Files and licenses concept
This page is obsolete. It is being retained for archival purposes. It may document extensions or features that are obsolete and/or no longer supported. Do not rely on the information here being up-to-date.
This page is a collection of technical information regarding storing certain file properties separately from wikitext. Currently the aim is to store only basic copyright information (author and license), but this could later be extended.
Current situation / Introduction
Every page has an entry in the mw_page table, uniquely identified by page_id.
Every file has an entry in the mw_image table, identified by img_name.
Every revision of every page has an entry in the mw_revision table.
A revision is made when the page is created, edited, renamed, or protected.
Every text version of every page has an entry in the mw_text table.
If a revision didn't change the text, it keeps referring to the same mw_text row.
Licenses allowed to be chosen during upload are defined at MediaWiki:Licenses.
That page is kept as a list: the last piped part describes the license, and everything before it is wrapped between {{ and }} on the File-page under a == {{subst:MediaWiki:License-header}} == heading.
Information about the file is stored in the {{Information}} template.
Viewing a file:
- Page namespace/title is looked up in mw_page and mw_image.
- Information from other tables is retrieved by the page ID (mw_page.page_id).
Purpose and use cases
It should be easy to obtain the copyright information for a file (example: bug 25624). Use cases include:
- A data consumer (like the pdf creation tool, or a mobile app) wants to transform a page into a new format, and get all the information (esp. author(s) and license(s)) that is necessary to do proper attribution in the new format.
- Someone importing CC BY text from another source wants to ensure that the authors, source, and license of the original work are stored in a consistent way, so that the license can be complied with later in a standardized way.
The benefits can be summarized as follows:
- Getting information from the API for re-use (e.g. a WordPress plugin that searches images and attributes authors automatically in the article)
- Standardizing file-pages centrally (either by core or by MediaWiki message, but not per page with templates)
- Possibly letting search engines index files properly through xmlns:cc attributes or <meta copyright> tags, which are currently based on the general license for the wiki text instead of that of the file.
- A special file search with the ability to filter by license (just like ns0=&ns1&, lic1&lic2). Individual wikis could edit messages (1/2) and put links to search through certain sets of licenses.
- Automatically attributing authors in articles (in the case of CC-BY-*) and perhaps mentioning the license (in the case of CC-SA-*)
- Naming the author in search results (see Flickr)
The following (sub)requirements can be identified:
- Author and licensing information should be stored outside the wikitext, since parsing wikitext is often difficult for third-party data consumers (like the pdf creator)
- A list of files by copyright holder should be obtainable
- Copyright information should be provided by the user at upload
- Copyright information should be editable
- Copyright information should be versioned
- Administrators should be able to define a set of canonical licenses
- For each license basic information should be defined (name, full title, url to legal code/deed)
- The display of a license should be localizable
Proposed situation
Every page has an entry in the mw_page table, uniquely identified by page_id.
Every file has an entry in the mw_image table, identified by img_name. Special properties about this file are stored in mw_file_props.
Every revision of every page has an entry in the mw_revision table.
A revision is made when the page is created, edited, renamed, protected, or when its file properties change. Every revision contains a reference to a file properties version, which may be NULL.
Every text version of every page has an entry in the mw_text table.
If a revision didn't change the text, it keeps referring to the same mw_text row.
For every file-properties version of a file there are one or more entries in the mw_file_props table.
If a revision didn't change the properties, it keeps referring to the same mw_file_props row.
Licenses valid on the wiki are defined in the mw_license table, which is managed from [[Special:LicenseManager]]. Since there could potentially be many licenses, the ones choosable from the upload form are fetched from that table. The <select> would contain an <optgroup> for "Most used licenses" (top 5 or 10, ordered by lic_count) and an <optgroup> for all licenses in alphabetical order.
Links to information about the file (such as author and license) are stored in the mw_file_props table and displayed on the File-page through a centrally determined layout. Description, source, date, location and additional wikitext (like User-templates and categories) are stored in wikitext; only author and license are stored separately.
How to get there
Licenses
Table structure
Licenses have their own table (say mw_license), with columns like:
lic_id     PRI UNIQ AI
lic_name   VARBINARY(255)
lic_url    VARBINARY(255)
lic_count  INT
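As a rough illustration of the column list above, the following Python sketch builds the table in an in-memory SQLite database and runs the two orderings the upload form would need. The SQLite types (TEXT instead of VARBINARY), the NOT NULL constraints, the helper names and the lic_count values are all assumptions made for this example; only the column names and the two sample licenses come from this page.

```python
import sqlite3

# Hypothetical SQLite rendering of the proposed mw_license table.
# SQLite has no VARBINARY type, so TEXT stands in for VARBINARY(255).
SCHEMA = """
CREATE TABLE mw_license (
    lic_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    lic_name  TEXT NOT NULL,
    lic_url   TEXT NOT NULL,
    lic_count INTEGER NOT NULL DEFAULT 0
)
"""


def demo_connection():
    """Create an in-memory database holding the two example licenses."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO mw_license (lic_name, lic_url, lic_count) VALUES (?, ?, ?)",
        [
            # The usage counts are invented for illustration only.
            ("TASL", "http://tasl.org/licensedeed.html", 3),
            ("CC-BY-SA-3.0",
             "http://creativecommons.org/licenses/by-sa/3.0/legalcode", 120),
        ],
    )
    return conn


def most_used(conn, limit=5):
    """Return license names ordered by usage count, most used first."""
    rows = conn.execute(
        "SELECT lic_name FROM mw_license "
        "ORDER BY lic_count DESC, lic_name ASC LIMIT ?",
        (limit,),
    )
    return [name for (name,) in rows]


def all_alphabetical(conn):
    """Return every license name in alphabetical order."""
    rows = conn.execute("SELECT lic_name FROM mw_license ORDER BY lic_name ASC")
    return [name for (name,) in rows]
```

Keeping lic_count in the license row itself means the "most used" list needs no join against file properties, at the cost of having to update the counter whenever a file's license changes.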
- The text of each license is stored in [[MediaWiki:License-NAME-text]], which contains wikitext (where NAME is mw_license.lc_abbrev). When used on the File-page, the following parameters are passed:
  - $1: author (mw_file_props.fp_author)
  - $2: attribution (mw_file_props.fp_attribution; if NULL, same as author)
  - $3: title ({{int:License-ABBREV-title}})
- The title of each license is stored in [[MediaWiki:License-ABBREV-title]], which is plain text.
Example:
# Database-entry
lic_name  TASL
lic_url   http://tasl.org/licensedeed.html
# Message
[[MediaWiki:License-TASL-text]]
This file by $1 is licensed under $3. Please attribute the author as:<br />''$2''
[[MediaWiki:License-TASL-title]]
The Awesome Something License
Example:
# Database-entry
lic_name  CC-BY-SA-3.0
lic_url   http://creativecommons.org/licenses/by-sa/3.0/legalcode
# Message
[[MediaWiki:License-CC-BY-SA-3.0-text]]
{{Cc-by-sa-3.0|attribution=$2}}
[[MediaWiki:License-CC-BY-SA-3.0-title]]
Creative Commons Attribution Share-Alike 3.0 License
[[MediaWiki:License-CC-BY-SA-3.0-url]]
http://creativecommons.org/licenses/by-sa/3.0/deed.en
// The reason for keeping these in messages rather than in the database is to allow easier translation. Example for Dutch:
[[MediaWiki:License-CC-BY-SA-3.0-title/nl]]
Creative Commons Naamsvermelding Gelijk-Delen 3.0 licentie
[[MediaWiki:License-CC-BY-SA-3.0-url/nl]]
http://creativecommons.org/licenses/by-sa/3.0/deed.nl
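The parameter passing described above ($1 author, $2 attribution falling back to the author, $3 title) can be sketched in Python as follows. The function name is hypothetical, and MediaWiki's own message system would perform this substitution in practice; the snippet only illustrates the fallback rule.

```python
import re


def render_license_text(template, author, attribution, title):
    """Fill $1, $2 and $3 in a license message.

    attribution=None plays the role of a NULL attribution, in which
    case the author is used for $2 as well.
    """
    if attribution is None:
        attribution = author
    values = {"$1": author, "$2": attribution, "$3": title}
    # Substitute in a single pass so that a value which itself contains
    # "$1" etc. is not expanded a second time.
    return re.sub(r"\$[123]", lambda match: values[match.group(0)], template)
```

With the TASL message text from the first example and no separate attribution, both $1 and $2 resolve to the author's name.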
License management
[[Special:LicenseManager]]
- Lists all licenses (viewable by anyone, [*]). Editing is done on pages like [[Special:LicenseManager/12]] (by id, like AbuseFilter).
- The actual texts are stored in MediaWiki: messages, so they could contain a template to allow editing by non-sysops. Editing is limited to users with the licensemanager-modify right.
$wgGroupPermissions['*']['licensemanager-modify'] = false;
$wgGroupPermissions['sysop']['licensemanager-modify'] = true;
- Changes, creations and removals of licenses are publicly logged at Special:Log/licensemanager.
- Removal is only possible if the license is not in use. If a file is later reverted to a state in which it uses a removed license again, it would display {{int:License-notfound}} and be categorized internally into a category like Category:Files with previously deleted licenses.
Upload
Drop-down menu
During upload a license must be chosen from the drop-down menu. The drop-down menu is populated from the license table. The <select> would contain an <optgroup> for "Most used licenses" (top 5 or 10, ordered by lic_count) and an <optgroup> for all licenses in alphabetical order. Licenses that are marked as deleted are not shown and can't be used.
<select>
  <optgroup label="Most used licenses">
    <option value="4">Creative Commons Attribution Share-Alike 3.0 License</option><!-- Contents of {{int:License-CC-BY-SA-3.0-title}} -->
    <option value="2">GNU Free Documentation License (Version 1.2 or later)</option>
  </optgroup>
  <optgroup label="All licenses alphabetically">
    <option value="1">Creative Commons Attribution 3.0 License</option>
    <option value="4">Creative Commons Attribution Share-Alike 3.0 License</option>
    <option value="2">GNU Free Documentation License (Version 1.2 or later)</option>
    <option value="12">The Awesome Something License</option>
  </optgroup>
</select>
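A minimal Python sketch of how such a menu could be generated from license rows is given below. The function name, the tuple layout and the usage counts are assumptions for illustration; the titles would really come from the License-NAME-title messages, and licenses marked as deleted would have to be filtered out before calling it.

```python
from html import escape


def render_license_select(licenses, top_n=5):
    """Build the two-group <select> markup.

    licenses: list of (lic_id, title, lic_count) tuples (hypothetical layout).
    """
    # Most used first; ties are broken alphabetically by title.
    most_used = sorted(licenses, key=lambda lic: (-lic[2], lic[1]))[:top_n]
    alphabetical = sorted(licenses, key=lambda lic: lic[1])

    def options(rows):
        return [
            f'    <option value="{lic_id}">{escape(title)}</option>'
            for lic_id, title, _count in rows
        ]

    lines = ["<select>", '  <optgroup label="Most used licenses">']
    lines += options(most_used)
    lines += ["  </optgroup>", '  <optgroup label="All licenses alphabetically">']
    lines += options(alphabetical)
    lines += ["  </optgroup>", "</select>"]
    return "\n".join(lines)
```

Each license in the top group appears a second time in the alphabetical group, so a user who scans the full list still finds it there.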
File properties
Table structure
Metadata about the file itself is still kept in the mw_image table.
Page content information is still kept in mw_page and mw_revision.
Information about the file as a work is kept as properties in the new mw_file_props table.
The mw_file_props table is similar to the mw_text table in that it is kept per revision and only updated when needed.
A reference to the current file props is kept in the appropriate mw_revision rows, just as they keep a reference to mw_text.
When either doesn't change, the existing reference is kept, so no duplicate sets are made in mw_file_props.
mw_file_props contains:
fp_id          INT (mw_revision.rev_fileprops_id is a key to this column)
fp_key         VARBINARY(255)
fp_value_int   INT
fp_value_text  VARBINARY(255)
EXAMPLE
| fp_id | fp_key | fp_value_int | fp_value_text |
|---|---|---|---|
| 1 | author | 50 (mw_user.user_id of User:Krinkle) | NULL (if empty, the username is used) |
| 1 | author | 43 (mw_user.user_id of User:Catrope) | Roan (wiki user who wants a display name different from the username) |
| 1 | author | NULL | John Doe |
| 1 | license | 2 (mw_license.lic_id of CC-BY-SA-3.0) | |
| 1 | license | 5 (mw_license.lic_id of GFDL) | |
This file has three authors: Krinkle, Catrope (attributed as Roan) and John Doe (not a wiki user). It is dual-licensed.
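To make the key/value layout concrete, the following Python sketch resolves one property set into display names for authors and licenses. The function name and the two lookup dictionaries (standing in for mw_user and mw_license) are hypothetical; the rows mirror the example table.

```python
def summarize_file_props(rows, usernames, license_names):
    """Resolve one file-properties set into (authors, licenses).

    rows: iterable of (fp_id, fp_key, fp_value_int, fp_value_text).
    usernames: maps mw_user.user_id to a username (stand-in for mw_user).
    license_names: maps mw_license.lic_id to lic_name (stand-in for mw_license).
    """
    authors, licenses = [], []
    for _fp_id, key, value_int, value_text in rows:
        if key == "author":
            # An explicit display text wins; otherwise use the wiki username.
            authors.append(value_text if value_text else usernames[value_int])
        elif key == "license":
            licenses.append(license_names[value_int])
    return authors, licenses


EXAMPLE_ROWS = [
    (1, "author", 50, None),
    (1, "author", 43, "Roan"),
    (1, "author", None, "John Doe"),
    (1, "license", 2, None),
    (1, "license", 5, None),
]
```

Calling summarize_file_props(EXAMPLE_ROWS, {50: "Krinkle", 43: "Catrope"}, {2: "CC-BY-SA-3.0", 5: "GFDL"}) yields the three authors and two licenses described above.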
Management
An example of what the wikitext of a File-page could/would look like:
{{#file-descr:
{{en|Chiang Kai-shek Memorial Hall's gate at night in [[:en:Taipei|Taipei]].}}
{{fr|Porte du Chiang Kai-shek Memorial Hall de nuit à [[:fr:Taipei|Taipei]].}}
}}
{{#file-date|2011-01-14}}
{{#file-source|{{Own}}}}

{{user:guillom/photos}}

[[Category:National Taiwan Democracy Memorial Hall]]
[[Category:Gate of Great Centrality and Perfect Uprightness (Taipei)]]
[[Category:MediaWiki Projects]]
The following elements:
- Author (small textarea)
- Date (date picker; should eventually yield a 14-digit timestamp. The jQuery datepicker can prevent other characters from being entered; a server-side check that the value is a valid timestamp is also needed)
- License (dropdown plus a button to add/remove a license; multiple licenses are allowed)
- Attribution (small textarea)
... are kept outside of wikitext in their own respective fields (fetched from and saved to mw_file_props).
These separate input fields could be editable in two ways:
- Either on [[Special:FileProperties/File:Example.jpg]]
- Or in additional form elements above or below the textbox on the action=edit page
When saving properties, a revision is saved (like when moving or protecting the page) with the same rev_text_id but with a new reference in rev_fp_id to the added row in mw_file_props.
Likewise, when saving altered wikitext, a revision is saved with the same rev_text_id replaced by a new reference to the added row in mw_text, while rev_fp_id stays the same.
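The way the two references move independently can be sketched in Python as follows. The Revision class and both helper functions are hypothetical and only model the rev_text_id and rev_fp_id columns named above; None stands in for NULL.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Revision:
    rev_id: int
    rev_text_id: int
    rev_fp_id: Optional[int]  # None plays the role of NULL


def save_props(prev: Revision, new_rev_id: int, new_fp_id: int) -> Revision:
    """New revision for a properties change: the text reference is carried over."""
    return replace(prev, rev_id=new_rev_id, rev_fp_id=new_fp_id)


def save_text(prev: Revision, new_rev_id: int, new_text_id: int) -> Revision:
    """New revision for a wikitext change: the properties reference is carried over."""
    return replace(prev, rev_id=new_rev_id, rev_text_id=new_text_id)
```

Because each save copies the untouched reference, neither mw_text nor mw_file_props gains a row when only the other one changed.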
Idea: A user setting in the preferences decides whether the user sees his own language's description (if available - like {{LangSwitch}}) or all descriptions.
Display
When viewing a file-page the page is built like:
#filetoc
- ...
#file
- ...
#fileinformation
- Automated page generated like this.
- Layout is fixed (perhaps allow editing the layout in a MediaWiki-message, or don't and instead provide sufficient CSS-hooks).
[[MediaWiki:Fileinformation-template]]
- $1: description (tag-hook)
- $2: source (tag-hook)
- $3: author (file props)
- $4: license-titles (file props)
- $5: additional wikitext: all other page text (such as problem tags, user templates etc.) that wasn't filtered (for example, category and interwiki links are filtered from output)
==={{int:fileinformation-header}}===
__EMBEDMETADATA__
<div class=toccolours><legend>{{int:fileinformation-description}}</legend>
$1
</div>
<div class=toccolours><legend>{{int:fileinformation-datelocation}}</legend>
...
</div>
<div class=toccolours><legend>{{int:fileinformation-copyrightlicensing}}</legend>
{{int:fileinformation-author}} : $3
{{int:fileinformation-copyrightstatus}} : $4
{{int:fileinformation-source}} : $2
</div>
==={{int:fileinformation-similarmedia}}===
<div class=toccolours><legend>{{int:fileinformation-findmore}}</legend>
...
</div>
==={{int:fileinformation-additional}}===
$5
API
imageinfo
The 'prop=imageinfo' result format needs to be extended to include this metadata.
- format of results?
- iiprop key for including/excluding it?
This is a requirement for getting the metadata to pass cleanly to InstantCommons (ForeignAPIRepo) clients.
upload
'action=upload' needs to be able to pass metadata with a new upload, just as a user uploading directly to the web site needs to be able to add license info on the web UI.
- parameter names?
- parameter format?
- what's the best way to pass sets of arbitrary data like this to an api thingy? array should work?
In order for an API client upload tool to present available license options, it also needs to be able to query:
Query license options
In order to add license metadata to a new or modified upload, an upload client will need to be able to query the available settings. This same interface could/should probably also be used in things like the UploadWizard that supplement Special:Upload with more client-side ajaxy stuff.
- module name?
- parameters?
- result format?
Editing
In addition to initial uploads, license metadata on an existing file may need to be altered. Changed license metadata needs to be passable either to the existing page editing method, or a dedicated one for file metadata.
- existing or new module? module name?
- parameter format?
- what's the best way to pass sets of arbitrary data like this to an api thingy? array should work?
Export/import and dumps
If license metadata lives outside the page text, it may also need to be added to the Special:Export & data dump format.
- data structure
- add to special:export
- make sure dump tools handle it without exploding
Hm...
- Transition
Some kind of detection is required to fall back to the old way (i.e. don't render the new file-page layout, but render the page as a normal wiki page).
The easiest way to detect whether a page has been converted from the old wikitext (e.g. {{Information}}) to the new system is to check mw_revision.rev_fileprops_id: if a file page has NULL there (i.e. has no entry in mw_file_props), it is an old-style file. In that case, we don't generate the File-page layout, but just parse the wikitext the good ol' way and display it on the page.
- Sounds good afaik. Krinkle 17:00, 22 January 2011 (UTC)
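The fallback check described under Transition amounts to a NULL test on the revision's file-properties reference. A minimal Python sketch, in which the function name and the two layout labels are hypothetical:

```python
from typing import Optional


def choose_file_page_renderer(rev_fileprops_id: Optional[int]) -> str:
    """Pick the rendering path for a file page.

    rev_fileprops_id is the value of mw_revision.rev_fileprops_id for the
    current revision; None stands in for NULL.
    """
    if rev_fileprops_id is None:
        # Old-style file: no mw_file_props entry, so parse the wikitext as before.
        return "legacy-wikitext"
    # Converted file: build the centrally defined File-page layout.
    return "file-props-layout"
```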