Extension:Cargo/Storing data

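Within Cargo, data structure creation and data storage are both done only via templates. Any template that makes use of Cargo needs to include calls to the parser functions #cargo_declare and #cargo_store or, in rarer cases, #cargo_attach and #cargo_store. #cargo_declare defines the fields of a Cargo database table, #cargo_store stores data in that table, and #cargo_attach specifies that a template stores its data in a table that is declared elsewhere.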

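Declaring a table

A template that stores data in a table needs to either declare that table or "attach" itself to a table that is declared elsewhere. Since there is usually one table per template, most templates that make use of Cargo declare their own table. Declaring is done via the parser function #cargo_declare.

This function is called with the following syntax: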

{{#cargo_declare:
_table = table_name
|field_1 = field description 1
|field_2 = field description 2
...etc.
}}

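Table names and field names cannot contain spaces or dashes; instead, you can use underscores, CamelCase, etc. Underscores cannot be used at the beginning or end of table and field names.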
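The naming rules above can be checked mechanically. The following is a minimal Python sketch, not Cargo's own validation code: the function name is_valid_cargo_name is hypothetical, and it checks only the rules stated in this section, so Cargo may well reject names for other reasons too.

```python
def is_valid_cargo_name(name: str) -> bool:
    """Check a table or field name against the rules stated in this section.

    Hypothetical helper for illustration; Cargo may enforce further rules.
    """
    if not name:
        return False
    # Spaces and dashes are not allowed anywhere in the name.
    if " " in name or "-" in name:
        return False
    # Underscores are fine inside the name, but not at either end.
    if name.startswith("_") or name.endswith("_"):
        return False
    return True


print(is_valid_cargo_name("Publication_date"))  # True
print(is_valid_cargo_name("publication-date"))  # False
print(is_valid_cargo_name("_Books"))            # False
```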

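Field descriptions must start with the type of the field, and in many cases that is all they will contain. The following types are predefined in Cargo:

Field type Description Not indexed?
Page Holds the name of a page in the wiki (default max size: 300 characters)
String Holds standard, non-wikitext text (default max size: 300 characters)
Text Holds standard, non-wikitext text; intended for longer values
Integer Holds an integer
Float Holds a real number, i.e. not necessarily an integer
Date Holds a date, without a time
Start date, End date Similar to Date, but meant to hold the beginning and end of some duration. A table can hold either no Start date and End date fields, or exactly one of both.
Datetime Holds a date and time
Start datetime, End datetime Similar to Start date and End date, but include a time.
Boolean Holds a Boolean value, which should be 1 or 0, or "yes" or "no" (see the querying documentation for Cargo-specific details on querying these values)
Coordinates Holds geographical coordinates
Wikitext string Holds a short text that is meant to be parsed by the MediaWiki parser (default max size: 300 characters)
Wikitext Holds a longer text that is meant to be parsed by the MediaWiki parser
Searchtext Holds text that can be searched on, using the MATCHES command (requires MySQL 5.6+ or MariaDB 5.6+)
File Holds the name of an uploaded file or image in the wiki (similar to Page, but does not require specifying the "File:" namespace) (default max size: 300 characters)
URL Holds a URL (default max size: 300 characters)
Email Holds an email address (default max size: 300 characters)
Rating Holds a "rating" value, i.e. usually an integer from 1 to 5

Any other type that is specified will simply be treated as type "String". Fields of a non-indexed type are much slower to query or join on.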

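A field can also hold a list of any of these types. To define such a list, the type value should look like "List (delimiter) of type". For example, to have a field called "Authors" that holds a list of text values separated by commas, you would use the following parameter in the #cargo_declare call:

|Authors = List (,) of String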
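To make the shape of a list declaration concrete, here is a small Python sketch that splits a type string such as "List (,) of String" into its element type and delimiter. It is an illustration only, not how Cargo parses declarations: the names LIST_TYPE and parse_field_type are hypothetical, and parameters in trailing parentheses (described in the next section) are not handled.

```python
import re

# Matches declarations of the form "List (<delimiter>) of <Type>".
LIST_TYPE = re.compile(r"^List\s*\((?P<delimiter>.+?)\)\s*of\s+(?P<type>.+)$")


def parse_field_type(description: str):
    """Return (type_name, delimiter); delimiter is None for non-list fields."""
    description = description.strip()
    match = LIST_TYPE.match(description)
    if match:
        return match.group("type").strip(), match.group("delimiter").strip()
    return description, None


print(parse_field_type("List (,) of String"))  # ('String', ',')
print(parse_field_type("Date"))                # ('Date', None)
```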

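Field parameters

A description string can also have additional parameters; these are enclosed within parentheses after the type identifier, and separated by semicolons. The currently allowed parameters are:

Parameter Description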
size= For fields of type "Page", "String", "Wikitext string", "File", "URL" and "Email", sets the size of this field, i.e. the number of characters; the default is set by the global variable $wgCargoDefaultStringBytes, which in turn has a default value of 300 (although it can be modified in LocalSettings.php).
hierarchy Specifies that the field holds a hierarchy of values, as defined in the "allowed values" parameter (see next item).
allowed values= A set of allowed values that a field can have. (This is usually only done for fields of type "String" or "Page".) If "hierarchy" is not specified, this should simply be a set of comma-separated values. If "hierarchy" is specified, the values should be defined using the syntax of a bulleted list. In brief: every value should be on its own line, each line should start with at least one "*", the first line should start with exactly one "*", and the number of "*" should increase by no more than one at a time.
For example, to define a field called "Color" that has three allowed values, you could have the following declaration:
|Color=String (size=10;allowed values=Red,Blue,Yellow)
Meanwhile, to define a field called "Main ingredient" that is a hierarchy, you could have the following declaration:
|Main_ingredient = String (hierarchy;allowed values=*Fruits
**Mangoes
**Apples
*Vegetables
**Root vegetables
***Carrots
***Turnips
**Peppers)
Parameter Description
link text= For fields of type "URL", sets text that would be displayed as a link to that URL. By default the entire URL is shown.
hidden Takes no value. If set, the field is not listed in Special:Drilldown, although it is still queriable.
mandatory Takes no value. If set, the field is declared as mandatory, i.e. blank values are not allowed.
unique Takes no value. If set, all values for the field must be unique - a value that already exists for that field in the table will not be saved.
regex= Sets a regular expression for this field, which all values must match. For example if "regex=T\d+" is set, values for that field must consist of the letter "T" followed by one or more numerals.
dependent on= Takes in the name of another field in this table, to specify that this field should only be displayed in Special:Drilldown once the user has selected a value for that field.
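The regex= parameter described above requires the whole value to match the pattern, not just part of it. In Python terms this corresponds to re.fullmatch rather than re.search, as the sketch below shows. This is only an illustration: matches_field_regex is a hypothetical name, and the regular-expression dialect Cargo actually uses may differ from Python's in its details.

```python
import re


def matches_field_regex(pattern: str, value: str) -> bool:
    """Return True only if the entire value matches the pattern."""
    return re.fullmatch(pattern, value) is not None


print(matches_field_regex(r"T\d+", "T123"))   # True
print(matches_field_regex(r"T\d+", "T"))      # False: no numerals
print(matches_field_regex(r"T\d+", "XT123"))  # False: extra leading text
```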

Other #cargo_declare parameters

In addition to the table name and fields, the following parameters can also be added to #cargo_declare:

  • _parentTables - for setting one or more other Cargo tables as the "parent tables" of this table. This is used within Special:Drilldown, to let the user filter on fields from additional Cargo tables that are tied in some way to this one. It takes the following syntax:
|_parentTables= tableName1(_localField=localFieldName, _remoteField=remoteFieldName, _alias=tableAlias); tableName2(...); ...
Here, 'tableName1' is the name of the table you want to declare as the parent table. '_localField' and '_remoteField' specify the fields in the two tables that need to be joined on (the default values for both are "_pageName"). If '_alias' is defined, then that will be displayed in the drilldown instead of the parent table's name.
Example: This drilldown display shows additional drilldown fields from a parent table, "Items" (listed as "Item") (template here)
  • _drilldownTabs - for setting custom drilldown tabs in Special:Drilldown page. It can be declared like this:
|_drilldownTabs= Tab1(format=list;delimiter=\;;fields=A,B,C), Tab2(format=table; fields=A,C,D)
Here, 'Tab1' is the display name of the tab; the 'format' parameter takes the name of the desired format, followed by any parameters that format needs; and 'fields' holds the set of fields to be displayed.
Example: This drilldown display also shows custom tabs (template here)

#cargo_declare also displays a link to the Special:CargoTables page for viewing the contents of this database table.

Attaching to another table

In some cases, you may want more than one template to store data in the same Cargo table. In that case, only one of the templates should declare the table, while the others should simply "attach" themselves to that table, using the parser function #cargo_attach.

This function is called with the following syntax:

{{#cargo_attach:
_table = table_name
}}

A template should have no more than one call to #cargo_attach (and no more than one call to #cargo_declare, for that matter). Any template that contains a call to #cargo_store should also call either #cargo_declare or #cargo_attach.

Storing data in a table

A template that declares a table or attaches itself to one should also store data in that table. This is done with the parser function #cargo_store. Unlike #cargo_declare and #cargo_attach, which apply to the template page itself and thus should go into the template's <noinclude> section, #cargo_store applies to each page that calls that template, and thus should go into the template's <includeonly> section.

This function is called with the following syntax:

{{#cargo_store:
_table = table_name
|field_1 = value 1
|field_2 = value 2
...etc.
}}

The field names must match those in the #cargo_declare call elsewhere in the template.

The values will usually, but not always, be template parameters; but in theory they could hold anything.

For fields whose value is a template parameter, and where the name of the template parameter is the same as the name of the Cargo field (other than the presence of underscores instead of spaces), the field can be left out of the #cargo_store call; so, in many cases, the call could instead simply look like:

{{#cargo_store:
_table = table_name
}}

In fact, not even the table name really needs to be specified; so in many cases the call could even look like:

{{#cargo_store:}}

However, this is slightly less efficient (and maybe more confusing to readers) than specifying the table name.
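The matching of omitted fields to template parameters can be pictured with the Python sketch below. It is a simplified model under stated assumptions, not Cargo's code: the function fill_missing_fields and its arguments are hypothetical, and it assumes that a field is looked up first under its exact name and then with underscores replaced by spaces.

```python
def fill_missing_fields(declared_fields, explicit_values, template_args):
    """Build the row to store for one template call.

    declared_fields: field names from the table declaration.
    explicit_values: fields passed explicitly in the store call.
    template_args: parameters the page passed to the template.
    """
    row = dict(explicit_values)
    for field in declared_fields:
        if field in row:
            continue  # an explicitly passed value is kept as-is
        # A field named "Publication_date" matches a template parameter
        # named "Publication date" (assumed: the exact name also matches).
        for candidate in (field, field.replace("_", " ")):
            if candidate in template_args:
                row[field] = template_args[candidate]
                break
    return row


row = fill_missing_fields(
    ["Title", "Publication_date", "Genre"],
    {"Genre": "Fiction"},
    {"Title": "Dune", "Publication date": "1965"},
)
print(row)
```

In this example "Genre" comes from the explicit value, while "Title" and "Publication_date" are picked up from the template parameters.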

Storing a recurring event

Special handling exists for storing recurring events, which are events that happen regularly, like birthdays or weekly meetings. For these, the parser function #recurring_event exists. It takes in a set of parameters for a recurring event (representing the start date, frequency etc.), and simply prints out a string holding a list of the dates for that event. It is meant to be called within #cargo_store (for a field defined as holding a list of dates), and #cargo_store will then store the data appropriately. #recurring_event is called with the following syntax:

{{#recurring_event:
start=start date
|end=end date
|unit=day, week, month or year
|period=some number, representing the number of "units" between event instances (default is 1)
|include=list of dates, to be included in the list
|exclude=list of dates to exclude
|delimiter=delimiter for dates (default is ';')
}}

Of these parameters, only "start=" and "unit=" are required.

By default, if no end date is set, or if the end date is too far in the future, #recurring_event stores 50 instances of the event. To change this, you can add a setting for $wgCargoRecurringEventMaxInstances in LocalSettings.php, below the line that installs Cargo. For instance, to set the number to 100, you would add the following:

$wgCargoRecurringEventMaxInstances = 100;

If working with recurring events, declare the type of the field to be List (;) of Date.
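To show what expanding a recurring event into a list of dates involves, here is a Python sketch using the start, end, unit and period parameters and the default cap of 50 instances. It is not the code behind #recurring_event. The function names are hypothetical, the include and exclude parameters are left out, and three behaviours are assumptions: the end date is treated as inclusive, month and year steps clamp to the last day of a shorter month, and dates are printed in ISO format.

```python
import calendar
from datetime import date, timedelta
from typing import List, Optional


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(index, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return date(year, month_index + 1, min(d.day, last_day))


def recurring_dates(start: date, unit: str, period: int = 1,
                    end: Optional[date] = None,
                    max_instances: int = 50) -> List[date]:
    """List event dates from start, every `period` units, up to the cap."""
    dates: List[date] = []
    i = 0
    while len(dates) < max_instances:
        # Always compute from the start date, so clamping never accumulates.
        if unit == "day":
            current = start + timedelta(days=i * period)
        elif unit == "week":
            current = start + timedelta(weeks=i * period)
        elif unit == "month":
            current = add_months(start, i * period)
        elif unit == "year":
            current = add_months(start, 12 * i * period)
        else:
            raise ValueError(f"unknown unit: {unit}")
        if end is not None and current > end:
            break
        dates.append(current)
        i += 1
    return dates


weekly = recurring_dates(date(2024, 1, 1), "week", end=date(2024, 1, 31))
print(";".join(d.isoformat() for d in weekly))
# 2024-01-01;2024-01-08;2024-01-15;2024-01-22;2024-01-29
```

The result is joined with ";" to mirror the default delimiter, which is why the receiving field is declared as List (;) of Date.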

Examples

You can see two templates that make use of #cargo_declare and #cargo_store here and here.

Creating or recreating a table

No data is actually generated or modified when a template page containing a #cargo_declare call is saved. Instead, the data must be created or recreated in a separate process. There are two ways to do this:

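Via the web interface

File:Cargo recreate data interface.png
The form shown at "?action=recreatedata" when an existing table is being recreated

From the template page (the one that declares the table), select the tab action called either "Create data" or "Recreate data". This brings up an interface containing the checkbox "Recreate data into a replacement table, keeping the old one for querying". (That checkbox only appears if the Cargo table in question already exists.)

Once you click "OK", one of the following will happen: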

  1. If the checkbox was selected, a "replacement table" will be created, while the current table remains unaffected. This replacement table can be viewed by anyone, but its data will not be used in queries. (In the database, the actual table will have a name like "cargo__tableName__NEXT".) If/when you think this replacement table is ready to be used, you can click on the "Switch in table" link at Special:CargoTables. This link will delete the current Cargo table and rename the replacement table so that it becomes the official table. Conversely, if you don't want to use the replacement table, you can click on the "Delete" link for it.
  2. If the checkbox was not selected, the current table will be deleted immediately, and a new version will get created.
  3. If the checkbox was not there, it means that this is a new table. In that case, the table will be created.

In all three cases, MediaWiki jobs are used to cycle through all the relevant pages and recreate the data - a separate job is created for each page. This can be a lengthy process for large tables, which is why using the "replacement table" approach is recommended for large tables - it avoids a "down time" period when the table is mostly empty.

Depending on the MediaWiki configuration, a call to MediaWiki's runJobs.php script may be useful or even necessary for these jobs to actually start.

If any templates contain #cargo_attach, they too will get a "Create data" or "Recreate data" tab. If this tab is selected and activated, it will not drop and recreate the database table itself; instead, it will only recreate those rows in the table that came from pages that call that template.

Permissions

The ability to create/recreate data is available to users with the 'recreatecargodata' permission, which by default is given to sysops. You can give this permission to other users; for instance, to have a new user group, 'cargoadmin', with this ability, you would just need to add the following to LocalSettings.php:

$wgGroupPermissions['cargoadmin']['recreatecargodata'] = true;

Once a table exists for a template, any page that contains one or more calls to that template will have its data in that table refreshed whenever it is resaved; and new pages that contain call(s) to that template will get their data added in when the pages are created.

Command-line script

If you have access to the command line, you can also recreate the data by calling the script cargoRecreateData.php, located in Cargo's /maintenance directory. It can be called in one of two ways:

Command Description
php cargoRecreateData.php Recreates the data for all Cargo tables in the system
php cargoRecreateData.php --table tableName Recreates the data for the one specified Cargo table.

In addition, the script can be called with the --quiet flag, which turns off all printouts. For full usage information, call it with --help.

Storing page data

You can create an additional Cargo table that holds "page data": data specific to each page in the wiki, not related to infobox data. This data can then be queried either on its own or joined with one or more "regular" Cargo tables. The table is named "_pageData", and it holds one row for every page in the wiki. You must specify the set of fields you want the table to store; by default it will only hold the five standard Cargo fields (_pageName, _pageTitle, _pageNamespace, _pageID and _ID: see Database storage details). To include additional fields, add to the array $wgCargoPageDataColumns in LocalSettings.php, below the line that installs Cargo.

Currently there are eight more fields that can be added to the _pageData table; here are the fields, and the call to add each one:

Field Description LocalSettings.php call
_creationDate The date/time the page was created
$wgCargoPageDataColumns[] = 'creationDate';
_modificationDate The date/time the page was last modified
$wgCargoPageDataColumns[] = 'modificationDate';
_creator The username of the user who created the page
$wgCargoPageDataColumns[] = 'creator';
_fullText The (searchable) full text of the page
$wgCargoPageDataColumns[] = 'fullText';
_categories The categories of the page (a list, queriable using "HOLDS"). Note that spaces are stored as underscores.
$wgCargoPageDataColumns[] = 'categories';
_numRevisions The number of edits this page has had
$wgCargoPageDataColumns[] = 'numRevisions';
_isRedirect Whether this page is a redirect
$wgCargoPageDataColumns[] = 'isRedirect';
_pageNameOrRedirect The target of the page if it's a redirect, otherwise the page name
$wgCargoPageDataColumns[] = 'pageNameOrRedirect';

Once you have specified which fields you want the table to hold, go to the Cargo /maintenance directory, and make the following call to create, or recreate, the _pageData table:

php setCargoPageData.php

To recreate with replacement, add a --replacement flag:

php setCargoPageData.php --replacement

The replacement table can then be switched in normally using the Special:CargoTables interface.

If you want to get rid of this table, call the following instead:

php setCargoPageData.php --delete

You do not need to call the "--delete" option if you are planning to recreate the table; simply calling setCargoPageData.php will delete the previous version.

Storing file data

Similarly to page data, you can also automatically store data for each uploaded file. This data gets put in a table called "_fileData", which holds one row for each file. This table again has its own settings array, to specify which columns should be stored, called $wgCargoFileDataColumns.

There are currently five columns that can be set:

Field Description LocalSettings.php call
_mediaType The media type, or MIME type, of each file, like "image/png"
$wgCargoFileDataColumns[] = 'mediaType';
_path The directory path of the file on the wiki's server
$wgCargoFileDataColumns[] = 'path';
_lastUploadDate The date/time at which the file was last uploaded
$wgCargoFileDataColumns[] = 'lastUploadDate';
_fullText The full text of the file; this is only stored for PDF files
$wgCargoFileDataColumns[] = 'fullText';
_numPages The number of pages in the file; this is only stored for PDF files
$wgCargoFileDataColumns[] = 'numPages';

To store the full text of PDF files, you need to have the pdftotext utility installed on the server, and then add the following to LocalSettings.php:

$wgCargoPDFToText = '...path to file.../pdftotext';

pdftotext is available as part of several different packages. If you have the PdfHandler extension installed (and working), you may have pdftotext installed already.

To store the number of pages, you need to have the pdfinfo utility installed on the server, and then add the following to LocalSettings.php:

$wgCargoPDFInfo = '...path to file.../pdfinfo';

Once you have specified which fields you want the table to hold, go to the Cargo /maintenance directory, and make the following call to create, or recreate, the _fileData table:

php setCargoFileData.php

Database storage details

When the data for a template is created or recreated, a database table is created in the Cargo database that (usually) has one column for each specified field. This table will additionally hold the following columns:

Field Description
_pageName Holds the name of the page from which this row of values was stored.
_pageTitle Similar to _pageName, but leaves out the namespace, if there is one.
_pageNamespace Holds the numerical ID of the namespace of the page from which this row of values was stored.
_pageID Holds the internal MediaWiki ID for that page.
_ID Holds a unique ID for this row.

Storing lists

For fields that have lists of values, the handling is more complex: a whole separate database table is created to hold all the individual values for this field. This table will get the name "MainTableName__FieldName" (e.g. "Books__Authors"), and it will have the following fields:

Field Description
_rowID Holds the ID of the row (i.e., _ID) in the main table that this value corresponds to.
_value Holds the actual, individual value.
_position Holds the position of this value in the list (can be 1, 2, etc.)

So if an "Authors" field contained three values, the "Books__Authors" table would have three rows corresponding to that one page.
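The rows of such a helper table can be sketched in Python as follows. This only illustrates the _rowID, _value and _position columns described above; the function list_table_rows is hypothetical, and trimming whitespace and dropping empty items are assumptions rather than documented Cargo behaviour.

```python
def list_table_rows(row_id: int, raw_value: str, delimiter: str = ","):
    """Split one list-field value into rows for the helper table."""
    values = [v.strip() for v in raw_value.split(delimiter)]
    values = [v for v in values if v]  # assumed: empty items are skipped
    return [
        {"_rowID": row_id, "_value": value, "_position": position}
        for position, value in enumerate(values, start=1)
    ]


for helper_row in list_table_rows(12, "Alice, Bob, Carol"):
    print(helper_row)
# {'_rowID': 12, '_value': 'Alice', '_position': 1}
# {'_rowID': 12, '_value': 'Bob', '_position': 2}
# {'_rowID': 12, '_value': 'Carol', '_position': 3}
```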

There's one more complication for list fields: the corresponding field for a list field in the database table will not actually be given that name, but will rather be called "FieldName__full", e.g. "Authors__full". This is to enable the "true" field name to serve as a "virtual" field within the #cargo_query call, to make querying on the field values table easier (see 'The "HOLDS" command').

Storing hierarchies

For fields that have a set of allowed values that is defined as being a hierarchy, a separate database table is created to store the whole set of allowed values. This table will get the name "MainTableName__FieldName__hierarchy" (e.g. "Books__Genre__hierarchy"), and it will have the following fields:

Field Description
_value The allowed value.
_left The number of the leftmost node represented by this value.
_right The number of the rightmost node represented by this value.

For an explanation of this method of storage, see the Wikipedia article "Nested set model".
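As a rough illustration of the nested set idea, the Python sketch below turns a bulleted hierarchy (in the syntax of the "allowed values" parameter) into left and right numbers using a standard depth-first numbering. The function names are hypothetical, values are assumed to be unique, and the exact numbers Cargo stores may differ (for instance if it adds a root node), so treat the output as an example of the model rather than of Cargo's table contents.

```python
def parse_hierarchy(text: str):
    """Turn bulleted lines like '**Apples' into (depth, value) pairs."""
    items = []
    previous_depth = 0
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        depth = len(line) - len(line.lstrip("*"))
        # The first line needs exactly one "*"; depth may grow by one at most.
        if depth == 0 or depth > previous_depth + 1:
            raise ValueError(f"badly nested line: {line!r}")
        items.append((depth, line[depth:].strip()))
        previous_depth = depth
    return items


def nested_set(items):
    """Assign _left and _right numbers by a depth-first walk."""
    rows = {}
    stack = []   # values whose subtree is still open
    counter = 1
    for depth, value in items:
        # Close every open node that is not an ancestor of this one.
        while len(stack) >= depth:
            rows[stack.pop()]["_right"] = counter
            counter += 1
        rows[value] = {"_left": counter}
        counter += 1
        stack.append(value)
    while stack:
        rows[stack.pop()]["_right"] = counter
        counter += 1
    return rows


hierarchy = """
*Fruits
**Mangoes
**Apples
*Vegetables
**Root vegetables
***Carrots
***Turnips
**Peppers
"""
numbers = nested_set(parse_hierarchy(hierarchy))
print(numbers["Vegetables"])       # {'_left': 7, '_right': 16}
print(numbers["Root vegetables"])  # {'_left': 8, '_right': 13}
```

With this numbering, a value lies below another one exactly when its _left and _right both fall between the other value's _left and _right, which is what makes "everything under Vegetables" a simple range comparison.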

Storing file names

If a table has one or more fields of type "File", an additional table is created - for use in searching on files within Special:Drilldown - with the name "MainTableName__files" (e.g. "Books__files"), with the following fields:

Field Description
_pageName The name of the page from which this row of values was stored.
_pageID The internal MediaWiki ID for the page.
_fieldName The name of the relevant field of type "File".
_fileName The value of the field, i.e. the name of an uploaded file.

Storing coordinates

For fields of type 'Coordinates', like for fields that hold a list of values, no database field is created with the actual specified field name. Instead, the following three fields are created:

Field Description
fieldName__full Holds the coordinates as written in the page
fieldName__lat Holds the latitude from the coordinates, as a float
fieldName__lon Holds the longitude from the coordinates, as a float

If the coordinates cannot be parsed, the "__full" field still gets the value, but the "__lat" and "__lon" fields are set to null.


Storing dates

For fields of type 'Date' or 'Datetime', an extra field is created that is named "fieldName__precision". It holds an integer value representing the "precision" of each date value, i.e. whether it holds a full date, only a year, etc. The possible values are:

Value Description
0 Date and time (can only occur for 'Datetime' fields)
1 Date only
2 Year and month only
3 Year only
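The precision codes above can be illustrated with a short Python sketch that classifies a date string. It is not Cargo's date parser: the names PRECISION_PATTERNS and date_precision are hypothetical, and only ISO-style inputs (such as "2001-05-12") are recognized here, whereas Cargo accepts dates written in other ways as well.

```python
import re

# (precision code, pattern) pairs, from most to least precise.
PRECISION_PATTERNS = [
    (0, re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}(:\d{2})?")),  # date and time
    (1, re.compile(r"\d{4}-\d{2}-\d{2}")),                          # date only
    (2, re.compile(r"\d{4}-\d{2}")),                                # year and month
    (3, re.compile(r"\d{4}")),                                      # year only
]


def date_precision(value: str):
    """Return the precision code for an ISO-style value, or None."""
    value = value.strip()
    for precision, pattern in PRECISION_PATTERNS:
        if pattern.fullmatch(value):
            return precision
    return None


print(date_precision("2001-05-12 10:30"))  # 0
print(date_precision("2001-05-12"))        # 1
print(date_precision("2001-05"))           # 2
print(date_precision("2001"))              # 3
```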

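Storing Flex Diagrams data

The Flex Diagrams extension lets users define (and display) diagrams within wiki pages. If both Cargo and Flex Diagrams are installed on the same wiki, you can store some of the data from those diagrams in special Cargo tables, which lets you browse and query the data. Data for two of the diagram types can be stored: BPMN diagrams and Gantt charts. Unlike with Cargo's standard special tables, _pageData and _fileData, you cannot specify which columns get set; all of them always are.

BPMN diagrams

Data about BPMN (Business Process Model and Notation) diagrams is stored in the table _bpmnData. This table can be generated by calling the following in the Cargo /maintenance directory: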

php setCargoBPMNData.php

This table holds the following columns:

Field Description
_BPMNID The internal ID of a component
_name The external name assigned to the component
_type The type of the component; one of 'task', 'exclusiveGateway', 'sequenceFlow', or 'startEvent'
_connectsTo The IDs of the components this component connects to
_annotation The annotation of this component.

Gantt charts

Data about Gantt charts is stored in the table _ganttData. This table can be generated by calling the following in the Cargo /maintenance directory:

php setCargoGanttData.php

This table holds the following columns:

Field Description
_localID The internal ID of a task
_name The name of the task
_startDate The start date of the task
_endDate The end date of the task
_progress A decimal value (between 0 and 1) representing the progress of the task
_parent The ID of the parent task, if any
_linksToBB A list of IDs of tasks whose beginnings are connected to this task's beginning
_linksToBF A list of IDs of tasks whose ends are connected to this task's beginning
_linksToFB A list of IDs of tasks whose beginnings are connected to this task's end
_linksToFF A list of IDs of tasks whose ends are connected to this task's end