Wikimedia Discovery

From mediawiki.org


For current work, see Wikimedia Search Platform


The Discovery Department of Wikimedia Engineering has the mission to make the wealth of knowledge and content in the Wikimedia projects easily discoverable. We have a number of projects detailed below that focus us on creating and supporting new forms of discovery.


Projects

Search Platform

Discovery is responsible for maintaining and enhancing the various Search features and APIs for MediaWiki. This includes the CirrusSearch extension which relies on Elasticsearch, the search backend used at the Wikimedia Foundation to support Wikimedia projects.

Learn more about Search and the current work of the team. Current work by this team is tracked on this Phabricator workboard and on the public Search Analytics Dashboard, which monitors and analyzes the impact of our efforts. The External Search Traffic dashboard gives a broad view of where our requests are coming from.

Current Goals (FY 2017-18 Q2)

  • Objective 1: Implement advanced methodologies such as “learning to rank” machine learning techniques and signals to improve search result relevance across language Wikipedias.
    • Begin to automate the machine learning pipeline, starting by targeting eight to ten languages, other than English, that match (at a minimum) current performance and then deploy those models.
  • Objective 2: Improve support for multiple languages by researching and deploying new language analyzers as they make sense to individual language wikis.
    • Investigate available open source language software and see whether it can be converted into Elasticsearch plugins.
    • Investigate usage of fall-back languages and fuzzy (phonetic) matching.
    • Continue general language support.
  • Objective 3: Investigate how to expand and scale the Wikidata Query Service to improve its ability to power on-wiki features for readers.
    • Work on sub-category filtering and searching within the Wikidata Query Service.
  • Address technical debt:
    • Convert existing Selenium tests to Node.js
    • Investigate ownership and maintenance of Logstash

Structured Data on Commons

Current Goals (FY 2017-18 Q2)

  • Objective 1: Commons search will be extended via CirrusSearch, Elasticsearch, and the Wikidata Query Service to support searching based on structured data elements describing media.
    • Determine advanced search requirements and measures for structured data on Commons.
  • Objective 2: Advanced search capabilities (e.g., Wikidata Query Service, SPARQL queries) will be updated to support the more specific media search filters and the relationships to the topics they represent.
    • Begin work on prefix and full-text search in Elasticsearch on Wikidata in preparation for the Structured Data on Commons project.

Wikidata Query Service (WDQS)

Searching structured data on Wikidata is an integral part of Discovery's work, which includes building the Wikidata Query Service. The service provides a SPARQL API through which tools can access Wikidata. Learn more about the Wikidata Query Service. Our current work is tracked on this Phabricator workboard, and weekly deployments of WDQS are documented on wikitech:Deployments. A public WDQS Analytics Dashboard is used to monitor and analyze the impact of our efforts.
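As a rough illustration of how a tool might use the SPARQL API, the sketch below builds a request URL for the public WDQS endpoint and reads one variable out of a response in the standard SPARQL JSON results layout. The helper names (`build_wdqs_url`, `extract_labels`), the example query, and the sample response are illustrative assumptions, not taken from this page or from the WDQS codebase. No network call is made.

```python
import json
from urllib.parse import urlencode

# Public WDQS SPARQL endpoint (not stated on this page).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Example query: a few items that are an instance of (P31) house cat (Q146).
CAT_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 5
"""


def build_wdqs_url(sparql, endpoint=WDQS_ENDPOINT):
    """Return a GET URL that asks the endpoint for JSON results."""
    return endpoint + "?" + urlencode({"query": sparql.strip(), "format": "json"})


def extract_labels(result_json, variable="itemLabel"):
    """Pull one variable out of a SPARQL JSON results document."""
    bindings = result_json["results"]["bindings"]
    return [row[variable]["value"] for row in bindings if variable in row]


# Hand-written sample in the SPARQL JSON results layout (not real WDQS output;
# the entity ID is a placeholder).
SAMPLE_RESPONSE = json.loads("""
{
  "head": {"vars": ["item", "itemLabel"]},
  "results": {"bindings": [
    {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q0000"},
     "itemLabel": {"type": "literal", "value": "Example cat"}}
  ]}
}
""")
```

Calling `build_wdqs_url(CAT_QUERY)` gives a URL that can be fetched with any HTTP client, and `extract_labels` can then be applied to the decoded JSON body.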

Current Goals (FY 2017-18 Q2)

The Wikidata Query Service goal for this quarter is to work on sub-category filtering and searching within the service. Stas and Guillaume will maintain it to support the continued growth and use of the service, and the Analysis team will help with statistics.

Wikipedia.org portal

Many people discover Wikipedia via https://www.wikipedia.org/ (roughly 1.5-2% of our total page views), and the Discovery team has been improving the user experience for these visitors. A report from 2015 details the Discovery team's initial analysis of what we can do to make the portal better.

Learn more about the work around the Wikipedia.org portal project. Current work by this team is tracked on a Phabricator workboard and a listing of upcoming A/B tests can be found here. We also track usage on a public Portal Analytics Dashboard to monitor and analyze the impact of our efforts.

Current Goals (FY 2017-18 Q2)

  • Update the Wikipedia.org portal codebase to be completely automated for ease of ongoing maintenance.
    • Automate portal project updates: statistics and translations

Maps

Discovery is about finding and navigating to content, and one way for users to do that is via maps. To provide better maps the team is working to make OpenStreetMap tiles available on all Wikimedia projects. The technical challenge is doing so at a scale sufficient for their widespread usage.
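To give a sense of the scale involved, the sketch below uses the widely used "slippy map" tile scheme that OpenStreetMap-style tile servers follow: each zoom level splits the world into a 2^zoom by 2^zoom grid, so the number of tiles grows by a factor of four per level. The function names and the `{z}/{x}/{y}.png` path template are illustrative assumptions and do not describe the Wikimedia tile service's actual URLs.

```python
import math


def lonlat_to_tile(lon_deg, lat_deg, zoom):
    """Convert WGS84 lon/lat to slippy-map tile indices (x, y) at a zoom level."""
    n = 2 ** zoom  # tiles per axis at this zoom level
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or extreme latitudes stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_at_zoom(zoom):
    """Total number of tiles needed to cover the world at one zoom level."""
    return 4 ** zoom


def tile_path(zoom, x, y, template="{z}/{x}/{y}.png"):
    """Format a tile path; the template is a hypothetical placeholder."""
    return template.format(z=zoom, x=x, y=y)
```

For example, `tiles_at_zoom(18)` is 4 to the power 18, which is tens of billions of tiles for a single zoom level. That is why caching and on-demand rendering matter for widespread use.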

Learn more about the Maps project in general. Work is tracked on a Phabricator workboard and on a Maps Analytics dashboard.

Current Goals (FY 2017-18 Q2)

  • Support the move to be more operationally centralized and roll out a new map style that has numerous updates and enhancements.
    • Finalize and deploy new map style; replicate maps test cluster in Wikimedia Cloud Service; monitor for critical bugs

Analysis

The analysis group within Discovery manages the Discovery Dashboard and analyzes A/B tests and other data. Learn more about the Discovery analysis team, and find even more information on how they do their analysis and its impact (on Meta). Current work by the analysis team is tracked on this Phabricator workboard.

Current Goals (FY 2017-18 Q2)

The team will continue to work closely with the Search Platform team to analyze A/B tests and other assorted data; they will also begin working on determining a baseline set of metrics for Structured Data on Commons.
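As a hedged illustration of what analyzing an A/B test can involve, the sketch below compares clickthrough rates between a control and a test group with a two-proportion z-test. This is one common textbook approach, shown under the assumption that the metric is a simple per-session click rate. It is not a description of the analysis team's actual methods, which are documented on Meta. The function and parameter names are hypothetical.

```python
import math


def two_proportion_z_test(clicks_a, sessions_a, clicks_b, sessions_b):
    """Return (z, two-sided p-value) for the difference in clickthrough rates."""
    rate_a = clicks_a / sessions_a
    rate_b = clicks_b / sessions_b
    # Pooled rate under the null hypothesis that both groups behave the same.
    pooled = (clicks_a + clicks_b) / (sessions_a + sessions_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / sessions_a + 1 / sessions_b))
    if std_err == 0:
        return 0.0, 1.0  # no variation at all, so no detectable difference
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution:
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

With identical rates in both groups the function returns a z-score of 0 and a p-value of 1. A large gap on large samples produces a large z-score and a small p-value.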

APIs

Application Programming Interfaces (APIs) provide developers ways to interact with the MediaWiki software.

API:Search and discovery lists the search APIs available and in development. View our public API Analytics Dashboard to monitor and analyze the impact of our efforts.
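As a minimal sketch of what calling a search API looks like, the example below builds a full-text search request for the MediaWiki Action API (`action=query` with `list=search`) and, optionally, fetches the matching titles. The endpoint shown is mediawiki.org's own API, used only as an example. The helper names and the User-Agent string are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: mediawiki.org is used only as an example endpoint; other wikis
# expose the same Action API at their own api.php URL.
API_ENDPOINT = "https://www.mediawiki.org/w/api.php"


def build_search_url(query, limit=5, endpoint=API_ENDPOINT):
    """Return a full-text search URL for the Action API (list=search)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
    }
    return endpoint + "?" + urlencode(params)


def search_titles(query, limit=5):
    """Run the search over the network and return matching page titles."""
    request = Request(
        build_search_url(query, limit),
        # Hypothetical User-Agent; real tools should identify themselves.
        headers={"User-Agent": "discovery-doc-example/0.1 (hypothetical)"},
    )
    with urlopen(request, timeout=10) as response:
        data = json.load(response)
    return [hit["title"] for hit in data["query"]["search"]]
```

`build_search_url("CirrusSearch")` can be inspected without any network access. `search_titles` performs the actual request and depends on the wiki being reachable.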

Other

For general questions about the work of the Discovery department, please see the FAQ. For any questions about the term "Knowledge Engine" please refer to this FAQ. You can find all of our data and key performance indicators on our data dashboard.

The team

Below is a list of sub-teams in the Discovery Department. This list was last updated on June 8th, 2017.

Each sub-team lists the names and team roles of anyone who spends a significant amount of time on a project, so some names are duplicated across teams. Team roles are not job titles; job titles are listed on the staff and contractors page and may or may not match a person's team role.

These lists are only intended to roughly convey who is working on what; no guarantees are made that the list is accurate to any particular level of detail. If you have questions, please contact Deb Tankersley.

Search Platform

Wikidata Query Service

Wikipedia Portal

Maps

Analysis

Cross-team support

Communications

See Updates below for Discovery weekly status updates

Mailing lists

Discovery - A public mailing list about Wikimedia Discovery projects. Examples of topics would include:

  • Announcements, including major upcoming initiatives, completed major releases, quarterly or annual plans, requests for feedback or input
  • Technical discussions and brainstorming regarding our work:
    • Search, Elastic, Cirrus, the Relevance Forge, and other relevant subjects
    • The portal and associated work
    • Our dashboards or related analysis
    • Note that there is a separate list for maps (below)
  • Departmental news, such as changes to team structure, significant changes to team process, or changes in how we use Phabricator or other tools like Gerrit

Maps - Discussion and development coordinating the integration of OpenStreetMap and other free map sources into Wikimedia projects.

IRC channels

#wikimedia-discovery connect

#wikimedia-interactive connect - for discussing all interactive Wikimedia projects (maps, graphs, etc.)

Twitter

https://twitter.com/WMF_Discovery

Meetup groups

Process

Discovery uses a "scrumban" process, which is a hybrid of Scrum and Kanban. It is described here: Discovery/Process.

Conferences, gatherings, and other events

Past events

Updates

Weekly Discovery status updates

See Discovery/Status updates for the archive of past Discovery updates (Subscribe)

The Search Platform team (formerly the Discovery Department) at the Wikimedia Foundation is working on many different projects. These weekly summaries are an attempt to keep interested people up-to-date on what the department is currently working on. Weekly summaries are posted to this page and on the Search Platform (formerly Discovery) mailing list every Friday.

Subscribe

Subscribe to receive new updates via on-wiki notification and (opt-in) email.

Contribute

Contribute to the next edition at Discovery/Status updates/Next.

Archives

2019

2018

2017


2016

Meeting minutes

Wikimedia Discovery/Meetings

Quarterly reviews

Data Analysis

The data access and analysis guidelines used by the Discovery team around data sources, or by other teams around Discovery data sources, are documented on Meta.

Deployers

Useful reference for who can deploy code. It's nice to know whom to bug if you need something:

Person | MediaWiki Deployer | Elasticsearch Deployer | Maps Deployer | Graphoid Deployer | Portals Deployer
dcausse Yes Yes
ebernhardsen Yes Yes
jan_drewniak Yes
gehel Yes Yes Yes

Code

The Discovery team supports the following code:

Repository Phabricator/Diffusion GitHub mirror Active?
CirrusSearch extension https://phabricator.wikimedia.org/diffusion/ECIR/ wikimedia/mediawiki-extensions-CirrusSearch
Elastica extension https://phabricator.wikimedia.org/diffusion/EELA/ wikimedia/mediawiki-extensions-Elastica
GeoData extension https://phabricator.wikimedia.org/diffusion/EGDA/ wikimedia/mediawiki-extensions-GeoData
Wikidata Query Service https://phabricator.wikimedia.org/diffusion/WDQR/ wikimedia/wikidata-query-rdf
Wikidata Query Service GUI https://phabricator.wikimedia.org/diffusion/WDQG/ wikimedia/wikidata-query-gui
WDQS deployment https://phabricator.wikimedia.org/diffusion/WDQD/ wikimedia/wikidata-query-deploy
WDQS GUI deployment wikimedia/wikidata-query-gui-deploy
Wikimedia Portals https://phabricator.wikimedia.org/diffusion/WPOR/ wikimedia/portals
PHP textcat https://phabricator.wikimedia.org/diffusion/WTEX/ wikimedia/wikimedia-textcat
Relevance Forge wikimedia/wikimedia-discovery-relevanceForge
Discernatron wikimedia/wikimedia-discovery-discernatron
Discovery Analytics https://phabricator.wikimedia.org/diffusion/WDAN/ wikimedia/wikimedia-discovery-analytics