Data Platform Engineering/Data Engineering

From mediawiki.org

Responsibilities

The Data Engineering team is responsible for the core capabilities of the data platform, including data storage, batch and streaming infrastructure, and distributed query engines.

This platform supports ingestion of Wikimedia project content, web traffic, instrumentation, operational data and other datasets into the Data Lake. The team manages the ingress data pipelines, whereas the data producers manage their respective data pipelines and data products.

The team's responsibilities also include data quality, observability, and discoverability.

The Event Platform has been merged into this team.

Planning & Goal setting

The current quarterly plan (Q2) can be viewed here.

The corresponding OKRs are tracked in Asana.

Backlog & Sprint Backlog

The backlog and current sprint work of the Data Engineering team are tracked in the Data Engineering & Event Platform Phabricator board.

New backlog items are triaged every week. Sprints currently run on a three-week cadence.

Technical Documentation

We are currently organizing our documentation. In the meantime, have a look at Data Engineering.

Contact Us

Please see the Intake Process page to make a request or contact one of our Product Managers.