Core Platform Team/Initiatives/Unify Parsers-Phase 2

From mediawiki.org

Initiative Description

Summary

MediaWiki currently has two wikitext parsers, the legacy parser and Parsoid, which support different use cases. This project aims to arrive at a single parser that supports all use cases.

Significance and Motivation

Parsoid was developed to support HTML-editing clients and is now also used for some, but not all, read-view use cases. Keeping two parsers is not tenable in the long term. It hamstrings development of and upgrades to the parsing codebase, wikitext, and templates, because any new support has to be added to both codebases. More importantly, the two parsers have different parsing pipelines, which makes replicating functionality in both of them more complex.

We would like to consolidate on Parsoid as the new default parser, given its support for HTML clients, its annotated HTML output, and its more structured internal pipeline. This requires identifying all output and feature incompatibilities between Parsoid and the legacy parser and bridging those gaps. It may also require updating (a) bots, (b) gadgets, (c) extensions, and (d) wikitext. This project aims to minimize such changes by handling any differences with appropriate tooling and support.

Once Parsoid is deployed as the default and only parser for all wikitext-based use cases, we can begin much-needed work to enhance wikitext and templates, making them easier to use, more performant, less error-prone, and easier to write tools for.

Outcomes

Reduce complexity in core

Baseline Metrics

None given

Target Metrics

None given

Stakeholders
  • Client teams (Web, VE, Flow, CX, Apps)
  • Bot, Gadget, and Extension authors (only as pertaining to the Wikimedia cluster initially)
  • Editing community
  • Core Platform

Known Dependencies/Blockers

Reduce Extension Interface Surface Area

Epics, User Stories, and Requirements

Time and Resource Estimates

Estimated Start Date

Late FY1920 Q1

Actual Start Date

None given

Estimated Completion Date

None given

Actual Completion Date

None given

Resource Estimates

18-24 months

3.5 FTE, plus 0.5 FTE of engineering and project management, for the duration

Other engineers may be added to augment the team, but this needs further clarification.

Collaborators
  • Parsing Team
  • Core Platform
  • Performance
  • SRE

Open Questions

  • To what extent do we want to refactor the Parsing Interface in Core? It is currently coupled with the legacy wikitext parser and the templating implementation.
  • What level of output disparity between Parsoid and the legacy PHP parser is acceptable? How do we decide this, and what qualitative analysis should be used?
  • What are our strategies for engaging with the community on any changes this might require them to make?
  • What additional work is required on the Linter extension to better support editors in making any required wikitext and template changes?

Phabricator

https://phabricator.wikimedia.org/tag/parsoid-read-views/

Plans/RFCs
  • The Long And Winding Road To Making Parsoid The Default MediaWiki Parser (slides, video)

Other Documents

Blocked: waiting for Phase 1 to be completed.

Some of the work will remain loosely defined until several tasks are complete, because those tasks are expected to define the rest of the project.