API:Presenting Wikidata knowledge
Display multilingual information from Wikidata in your application.
This page shows how to retrieve and present relevant information from Wikidata by associating it with entities in your application.
You can use Wikidata items and properties to provide language-independent information about entities (real-world things) in your application—events, places, people, works of art, concepts, etc. This is more direct and consistent than presenting descriptions and snippets from Wikipedia articles about these things, as API:Page info in search results explains.
Example
Inventaire lets you create an inventory of your books and share it with others. It displays certain properties from Wikidata about books, such as P407 "original language of work" and P50 "author". To do so, it uses Wikidata's 'Q' IDs internally to identify books. For example, its URL https://inventaire.io/entity/wd:Q180736 shows certain properties from the Wikidata entity http://www.wikidata.org/entity/Q180736 (the book "Les Misérables"). The Wikidata glossary explains entities and properties in more detail.
Recipe
- Find existing wiki pages in the domain of your application, e.g. creative works, places, events, people, species.
- View the Wikidata information for those pages, and choose interesting properties.
- Associate Wikidata entity IDs with entities in your application.
- Display their Wikidata information in the user's language.
- Use the Wikidata "sitelinks" information about the item to provide links to the full Wikipedia article about the entity in the user's language.
Getting Wikidata entity IDs
To get an article's entity ID in Wikidata, you can use the following methods:
- Copy the link "Wikidata item" ('wikibase-dataitem' message key) in the sidebar in most skins. It ends with 'QNNNN'.
- Access the wgWikibaseItemId variable in client-side JavaScript with mw.config.get( 'wgWikibaseItemId' ).
- Use the API to query the page for the page property wikibase_item.
Result of querying the wikibase_item page property for "Les Misérables":
{
    "batchcomplete": true,
    "query": {
        "pages": [
            {
                "pageid": 61489,
                "ns": 0,
                "title": "Les Misérables",
                "pageprops": {
                    "wikibase_item": "Q180736"
                }
            }
        ]
    }
}
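The API method above can be sketched in Python as follows. This is a minimal illustration, not the page's own code: the English Wikipedia endpoint and the helper names `build_item_id_url` and `page_item_ids` are assumptions made for the example. The request uses the standard `action=query` module with `prop=pageprops` and `ppprop=wikibase_item`, and the parser expects the `formatversion=2` response shape shown in the result.

```python
from urllib.parse import urlencode

# Assumption: English Wikipedia's Action API endpoint; substitute your wiki's.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_item_id_url(title):
    """Build a query URL that asks only for the wikibase_item page property."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "prop": "pageprops",
        "ppprop": "wikibase_item",
        "redirects": "1",
        "titles": title,
    }
    return API_URL + "?" + urlencode(params)


def page_item_ids(response):
    """Map page titles to Wikidata IDs, skipping pages with no linked item."""
    ids = {}
    for page in response.get("query", {}).get("pages", []):
        item_id = page.get("pageprops", {}).get("wikibase_item")
        if item_id:
            ids[page["title"]] = item_id
    return ids
```

Fetching the URL with any HTTP client and passing the decoded JSON to `page_item_ids` gives the IDs to store alongside your application's entities.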
Choosing properties
If you view https://www.wikidata.org/wiki/Q180736, you can see the following information:
- Some localized information:
  - "label"
  - "description"
  - "aliases" (displayed as "Also known as")
- Many "Statements" about the item that give values for its properties such as "author" and "publication date"
- Many "sitelinks" for the item, providing the titles of pages about the item in various Wikipedias and other Wikimedia projects
Clicking the title of a statement takes you to a page about that property. For example, the "author" property is Property:P50. Property pages in turn have labels, descriptions, aliases, and further statements, much like the Wikidata pages for real-world items.
The set of properties in Wikidata is steadily growing. Not all items in Wikidata have properties, and not all property values have been translated into all languages. For example, Victor Hugo's occupation as an "author" has been translated into nearly all languages, but lesser known occupations may have fewer translations. You need to consider how to fall back to an available language if a property or value isn't translated into a language you are supporting, and you shouldn't build your application around a property that only appears in a few statements. The API performs language fallback for you if possible. (Of course you can help by contributing missing statements and translations to Wikidata.)
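If you want to control fallback yourself rather than rely on the API, one possible approach is to walk an ordered list of languages and take the first one that has a term. The sketch below assumes the `labels` or `descriptions` structure that wbgetentities returns (a mapping from language code to an object with `language` and `value`); the helper name `pick_term` and the fallback order in the usage note are hypothetical.

```python
def pick_term(terms, preferred_languages):
    """Return (text, language) for the first preferred language that has a term.

    `terms` is a labels or descriptions mapping from a wbgetentities response.
    Returns (None, None) when none of the preferred languages is available.
    """
    for language in preferred_languages:
        term = terms.get(language)
        if term:
            return term["value"], term["language"]
    return None, None
```

For example, `pick_term(labels, ["az", "tr", "en"])` would try Azerbaijani first, then the other languages in the order given. Returning the language alongside the text lets the interface mark text that is not in the user's language.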
Querying Wikibase
The extensions Wikibase Repository and Wikibase Client power Wikidata, together with related components. Most Wikimedia sites run Wikibase Client (check with Special:Version), while only wikidata.org itself runs Wikibase Repository. Wikibase Repository implements several modules for MediaWiki's Action API, all prefixed with wb. The main API module in Wikibase Repository is wbgetentities (see its generated API help). It returns the data that Wikidata has about items (QNNNNN entities) or properties.
Retrieving and displaying Wikidata information
Say you have associated Wikidata entity IDs with your application's entities, and you want to display the following information:
- the label and description of the item
- author (property P50)
- publication date (property P577)
- genre (property P136)
- a link to the Wikipedia article for more information
action=wbgetentities can return the same information that you see on an item's Wikidata page: labels, descriptions, aliases, "claims" (like statements), and sitelinks. Let's ask for this information about Les Misérables in a less popular language, Azerbaijani, to see how languagefallback and sitelinks/urls work.
- You can give wbgetentities page titles on a wiki, but in this scenario we provide it with the entity ids of the Wikidata items for entities in our application.
- You can specify the languages you want for the information, and it will only return the descriptions, labels, and aliases in those languages (if they are available).
  - You can also specify languagefallback= so that values and properties without a translation in your requested languages fall back to some value.
- wbgetentities has no means to specify which properties you want; instead, you request all claims about the entity. So in this scenario, we request props=labels|descriptions|claims|sitelinks/urls.
  - You can specify a sitefilter for the wiki site links you want. In this scenario we only want the Wikipedia page (if any) on the wiki for the same language.
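The request described above could be assembled roughly as follows. This is a sketch under stated assumptions: the helper name `build_entities_url` is hypothetical, and deriving the sitefilter as the language code plus "wiki" works for Azerbaijani ("azwiki") but does not hold for every language code.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_entities_url(item_ids, language):
    """Build a wbgetentities URL for the given item IDs and one language."""
    params = {
        "action": "wbgetentities",
        "format": "json",
        "ids": "|".join(item_ids),
        "languages": language,
        # Boolean parameter: its presence turns language fallback on.
        "languagefallback": "1",
        "props": "labels|descriptions|claims|sitelinks/urls",
        # Assumption: the Wikipedia site ID is "<language>wiki".
        "sitefilter": language + "wiki",
    }
    return WIKIDATA_API + "?" + urlencode(params)
```

Calling `build_entities_url(["Q180736"], "az")` yields the URL for the Les Misérables example; several IDs can be passed in one request.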
Response to this wbgetentities request (claims truncated):
{
    "entities": {
        "Q180736": {
            "type": "item",
            "id": "Q180736",
            "labels": {
                "az": {
                    "language": "az",
                    "value": "Səfillər"
                }
            },
            "descriptions": {
                "az": {
                    "value": "1862 Victor Hugo novel",
                    "language": "en",
                    "for-language": "az"
                }
            },
            "claims": {
                "P840": [
                    {
                        "mainsnak": {
                            "snaktype": "value",
                            "property": "P840",
                            "datavalue": {
                                "value": {
                                    "entity-type": "item",
                                    "numeric-id": 90
                                },
                                "type": "wikibase-entityid"
                            },
                            "datatype": "wikibase-item"
                        },
                        "type": "statement",
                        "id": "Q180736$f42b7321-40a9-758f-3722-72a960687f60",
                        "rank": "normal"
                    },
                    ...
                ]
            },
            "sitelinks": {
                "azwiki": {
                    "site": "azwiki",
                    "title": "Səfillər (roman)",
                    "badges": [],
                    "url": "https://az.wikipedia.org/wiki/S%C9%99fill%C9%99r_(roman)"
                }
            }
        }
    },
    "success": 1
}
From the response, you can see the following information:
- The label for Les Misérables is available in Azerbaijani ("Səfillər").
- No Azerbaijani description is available, so the description (marked "for-language": "az") falls back to the English description.
- There is a wiki page for it on Azerbaijani Wikipedia: az:Səfillər (roman).
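Reading those three pieces of information out of the response can be sketched as below. The function name `summarize_entity` is hypothetical, and it again assumes the Wikipedia site ID is the language code plus "wiki". It relies on the response shape shown above, where a fallback term reports the language actually used in its "language" field.

```python
def summarize_entity(entity, language):
    """Pull label, description and Wikipedia URL out of one wbgetentities entity."""
    label = entity.get("labels", {}).get(language)
    description = entity.get("descriptions", {}).get(language)
    # Assumption: the Wikipedia site ID is "<language>wiki".
    sitelink = entity.get("sitelinks", {}).get(language + "wiki")
    return {
        "label": label["value"] if label else None,
        "description": description["value"] if description else None,
        # Differs from the requested language when the API fell back.
        "description_language": description["language"] if description else None,
        "url": sitelink.get("url") if sitelink else None,
    }
```

Comparing `description_language` with the requested language tells you whether to present the description as a fallback, for example by setting the `lang` attribute on the element that displays it.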
Choosing sitelinks
The generated API help for action=wbgetentities includes all the possible values for site and sitefilter. (Wikimedia encompasses a lot of wikis!)
Visit Special:SiteMatrix for a table listing Wikimedia wikis.
The wiki names are pretty standardized except for some edge cases, so it's safe to use wiki names from that table in sitefilter, as long as they exist and are not struck out (meaning closed). If you want to, for example, ask for links to Wikiquote sites that may not exist yet, your code can also query the API module action=sitematrix (documentation) and look through its response to dynamically build a list of relevant sites for sitefilter.
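Building that list from a sitematrix response might look like the sketch below. The response shape is an assumption to verify against the module's documentation: it expects the value of the top-level "sitematrix" key, with language groups under numeric keys, each holding a "site" list whose entries have "dbname" and "code" fields plus a "closed" field only when the wiki is closed. The helper name `open_site_ids` is hypothetical.

```python
def open_site_ids(sitematrix, project_code):
    """Collect site IDs (dbnames) of open wikis for one project.

    `sitematrix` is assumed to be the value of the "sitematrix" key in an
    action=sitematrix response; `project_code` is for example "wikiquote".
    """
    site_ids = []
    for key, group in sitematrix.items():
        # Language groups are assumed to sit under numeric keys;
        # this skips entries such as "count" and "specials".
        if not key.isdigit():
            continue
        for site in group.get("site", []):
            # Assumption: "closed" is present only for closed wikis.
            if site.get("code") == project_code and "closed" not in site:
                site_ids.append(site["dbname"])
    return site_ids
```

The resulting list can be joined with "|" and passed as the sitefilter value.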
Parsing claims
The claims that give properties values are unavoidably complex: there can be more than one and they may disagree, they differ in rank, they are (ideally) backed up by references, and they may be qualified (for example, by the date range in which a claim applies).
As a result, for each property value you want, you must walk through an array of claims for it. In this example, for "author" and "genre" of Les Misérables, you would expect the value of a statement about them to be another item in Wikidata (rather than a simple number or date). To get the IDs for the genre (P136), we are looking for the following property (in the syntax for JSON elements used by jq):
.entities.Q180736.claims.P136[].mainsnak.datavalue.value."numeric-id"
In pseudocode, you would locate entities.Q180736.claims.P136 in the JSON response, then for each element in the array, you would check that its mainsnak.datavalue.value['entity-type'] exists and its value is "item"; then you can safely access the numeric-id.
The result is a set of numeric item IDs, in this case 8261 and 192239.
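The pseudocode above translates to Python roughly as follows. The function name `claim_item_ids` is hypothetical, and this sketch ignores rank, qualifiers, and references; it only collects item IDs from each claim's main snak.

```python
def claim_item_ids(entity, property_id):
    """Return 'Q' IDs of the items a property's claims point to."""
    ids = []
    for claim in entity.get("claims", {}).get(property_id, []):
        datavalue = claim.get("mainsnak", {}).get("datavalue")
        # Snaks of type "novalue" or "somevalue" carry no datavalue.
        if not datavalue:
            continue
        value = datavalue.get("value")
        # Only item-valued claims have an entity-type of "item".
        if isinstance(value, dict) and value.get("entity-type") == "item":
            ids.append("Q%d" % value["numeric-id"])
    return ids
```

The guard on `datavalue` matters because a claim can state that a property has no value or an unknown value, in which case the main snak has no datavalue to read.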
You then need to request the labels of items Q8261 and Q192239 in the user's language, making a similar action=wbgetentities request but only requesting props=labels.
For performance, you should batch up all these follow-on queries and build a local cache of item labels, so that you don't repeatedly query the Wikidata API to find that Q8261 is a "novel" ("Roman" in Azerbaijani).
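One way to batch and cache label lookups is sketched below. Everything here is illustrative: `LabelCache` and `chunked` are hypothetical names, the `fetch_labels` callable stands in for whatever code performs the wbgetentities request with props=labels, and the batch size of 50 assumes the usual per-request limit on IDs for ordinary clients.

```python
def chunked(ids, size=50):
    """Yield successive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


class LabelCache:
    """In-memory cache of item labels, keyed by (item ID, language)."""

    def __init__(self, fetch_labels):
        # Hypothetical callable(ids, language) -> {item_id: label}.
        self._fetch = fetch_labels
        self._labels = {}

    def get_many(self, item_ids, language):
        """Return {item_id: label}, fetching only IDs not cached yet."""
        missing = sorted(
            {i for i in item_ids if (i, language) not in self._labels}
        )
        for batch in chunked(missing):
            for item_id, label in self._fetch(batch, language).items():
                self._labels[(item_id, language)] = label
        return {i: self._labels.get((i, language)) for i in item_ids}
```

Collecting the IDs from all claims on a page first and then calling `get_many` once keeps the number of API requests small. A persistent store could replace the dictionary if labels should survive restarts.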
Getting the publication date (P577) is a little simpler since the value of a statement about it is a simple date rather than another item:
.entities.Q180736.claims.P577[].mainsnak.datavalue.value.time
In pseudocode, you would locate entities.Q180736.claims.P577 in the JSON response, then for each element in the array, you would check that its mainsnak.datavalue exists and its "type" is "time"; then you can use its value.
The result is a set of times, in this case one value: "+1862-01-01T00:00:00Z". A time value's format resembles ISO 8601; the Wikibase DataModel gives the details, including datavalue.value.precision, which in this case is 9, indicating this publication date is 1862.
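Turning such a time value into display text could be sketched as follows. The function name `format_time_value` is hypothetical. It handles only precision 9 (year), 10 (month), and 11 (day) from the Wikibase DataModel and returns None for coarser precisions, which need their own wording.

```python
import re

# Leading sign, year of any length, then month and day.
TIME_PATTERN = re.compile(r"^([+-])(\d+)-(\d\d)-(\d\d)T")


def format_time_value(value):
    """Format a Wikibase time value dict according to its precision."""
    match = TIME_PATTERN.match(value["time"])
    if not match:
        return None
    sign, year, month, day = match.groups()
    # Drop zero padding and keep a minus sign for negative years.
    year_text = ("-" if sign == "-" else "") + str(int(year))
    precision = value.get("precision", 11)
    if precision >= 11:
        return "%s-%s-%s" % (year_text, month, day)
    if precision == 10:
        return "%s-%s" % (year_text, month)
    if precision == 9:
        return year_text
    return None  # decades, centuries and coarser are not handled here
```

A regular expression is used instead of a date parser because these strings carry a leading sign and may have a month or day of 00 when the precision is coarser than a day.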
action=wbgetclaims for claims alone
If you want only the claims of an item (the equivalent of props=claims in wbgetentities), you can instead invoke the API module action=wbgetclaims. It returns similar information.
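A request to this module could be built as in the sketch below. The helper names are hypothetical. The `entity` and `property` parameters belong to wbgetclaims, and the parser assumes the response groups claims by property ID under a top-level "claims" key.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_claims_url(item_id, property_id=None):
    """Build a wbgetclaims URL, optionally limited to one property."""
    params = {"action": "wbgetclaims", "format": "json", "entity": item_id}
    if property_id:
        params["property"] = property_id
    return WIKIDATA_API + "?" + urlencode(params)


def claims_for(response, property_id):
    """Return the list of claims for one property, or an empty list."""
    return response.get("claims", {}).get(property_id, [])
```

Unlike wbgetentities, this lets you narrow the request to a single property such as P577, which keeps the response small when you need only one value.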
Alternatives
You can associate an entity in your application with a page in a particular language's Wikipedia. Then, as Page info in search results shows, you can query for and display useful information from that article, such as a lead image thumbnail, opening text, and description (action=query&prop=pageimages|pageterms|extracts). One downside is that page titles change, so you may have to deal with redirects. Another is that this approach is not multilingual: you have to know the page's title in other wikis (for example, the article in Greek Wikipedia about Les Misérables is Οι Άθλιοι), or track down a "sitelink" to the page in another language. That is why that article discusses page info in the context of search: if your user is searching for articles from a wiki, you know the language and wiki to query.
See also
- qLabel is a JavaScript library to help create multilingual web sites. You can mark up text elements with 'Q' IDs, and the library retrieves their Wikidata labels in the user's language and replaces the text.
- Reasonator and Autodesc are tools that create machine-generated articles and short descriptions about Wikidata items.
- Wikidata.org maintains a growing list of external tools.
- Consult or reuse the code in existing tools to parse claims.
  - For example, inventaire.io uses wikidata-sdk to query Wikidata and handle its responses.