Engineering metrics in June:
Major news in June include:
Note: We're also providing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.
There are many opportunities for you to get involved and contribute to MediaWiki and technical activities to improve Wikimedia sites, both for coders and contributors with other talents.
For a more complete and up-to-date list, check out the Project:Calendar.
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.
- Sean Pringle joined the Technical Operations team as our Storage and Database Engineer (announcement).
- Brian Wolff joined the Wikimedia Platform Engineering group as a software developer for the summer, working on multimedia contribution and review (announcement).
- Ken Snider joined the Technical Operations team as an international contractor, poised to fill the Director of Technical Operations position (announcement).
- Toby Negrin joined the Engineering department as Director of Analytics (announcement).
Site infrastructure
- As part of our capacity planning work, Mark Bergsma upgraded most of our Varnish infrastructure (in EQIAD and ESAMS) with newer and faster servers. Next month, he will add new mobile Varnish servers in ESAMS. Rob Halsell and Daniel Zahn are pushing ahead with the migration of the other applications from Tampa to EQIAD. New Parsoid application and Varnish servers were also deployed in anticipation of the coming VisualEditor deployment. Meanwhile, Alexandros Kosiaris is starting work on the backup project; read more about the project and the technology.
- Mark also put the finishing touches on the deployment of all the new network infrastructure at ESAMS. With help from Mark and Leslie Carr, we finally got approval from ARIN for some new IPv4 addresses, needed for our new ULSFO buildup.
- Many people are refactoring Puppet code with the ultimate goal of having everything organized into Puppet modules. Andrew Bogott, Antoine Musso and Alexandros are setting up an automated testing infrastructure to support these efforts.
Data Dumps
- Our GSoC student, Petr Onderka, has been set up in Gerrit and has committed his first contributions to the Incremental Dumps project; you can follow his code, read his progress reports and check the current discussion on the mailing list. Additionally, we hold IRC meetings on weekdays at about 4:15 pm (UTC) in #wikimedia-tech; lurkers and contributors are welcome.
Wikimedia Labs
- Wikimedia Labs saw a lot of improvements in June, including the deployment of AJAX improvements for OpenStackManager to wikitech (added actions: console output; improvements: reboot), and a new interface for displaying quotas for projects in OpenStackManager. We ensured that all instances were properly running Puppet and Salt; many instances were running puppetmaster::self and needed to have local Puppet repo merges or rebases. We upgraded Salt everywhere and re-issued keys to fix a vulnerability in Salt. The team also worked on stabilizing the NFS server: we encountered a kernel bug with NFS, changed the scheduler from cfq to deadline, and decreased the read and write sizes of clients to 8 kilobytes. Progress has been made towards making the Labs database replicas available to Labs at large (as opposed to only the Tool Labs project). Last, much work has been done towards fulfilling user requests in Tool Labs, including work towards WSGI support.
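The NFS tuning described above can be illustrated with a minimal sketch. It assumes a Linux host, where the active I/O scheduler is shown in square brackets in /sys/block/<device>/queue/scheduler and where NFS clients set transfer sizes through the rsize and wsize mount options. The helper names are hypothetical, "8 kilobytes" is taken to mean 8192 bytes, and this is not the Labs team's actual tooling.

```python
def active_scheduler(sysfs_text: str) -> str:
    """Return the active I/O scheduler from sysfs text such as 'noop deadline [cfq]'.

    Linux marks the scheduler currently in use with square brackets.
    """
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError("no active scheduler marked in: %r" % sysfs_text)


def nfs_size_options(size_bytes: int = 8 * 1024) -> str:
    """Build the NFS client mount options for read and write transfer sizes.

    The 8192-byte default is an assumption based on the '8 kilobytes' in the report.
    """
    return "rsize=%d,wsize=%d" % (size_bytes, size_bytes)


if __name__ == "__main__":
    # Before the change the file would mark cfq as active, afterwards deadline.
    print(active_scheduler("noop deadline [cfq]"))
    print(active_scheduler("noop [deadline] cfq"))
    print(nfs_size_options())
```

A check like this could be run against each instance to confirm that the scheduler change took effect.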
VisualEditor
In June, the VisualEditor team completed the major new features that we prioritised over the past few months, in preparation for making VisualEditor available to most Wikipedia users in July. We have built an editor that lets users edit the majority of content without needing to use wikitext: it supports text editing, as well as adding and editing inclusions of references, templates, categories and media items. The deployed alpha of VisualEditor was updated four times as part of the transition to weekly deployments (1.22-wmf6, 1.22-wmf7, 1.22-wmf8 and 1.22-wmf9), with several mid-deployment releases to patch urgent issues as the code was developed. Part of this involved running an A/B test for new user accounts on the English Wikipedia, with half of the users being opted in to VisualEditor ahead of the wider release. More generally, we made a number of user interface improvements and fixed a number of bugs uncovered by the community.
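As an illustration of how new accounts might be split into two halves for such an A/B test, here is a minimal sketch. The report does not say how users were assigned, so the hash-based bucketing, the function name and the experiment label below are all assumptions, not the method actually used.

```python
import hashlib


def in_treatment(user_id: int, experiment: str = "ve-optin") -> bool:
    """Deterministically assign a user to the treatment half of an experiment.

    Hypothetical scheme: hash the experiment label together with the user ID
    and use the parity of the first digest byte, which splits users roughly 50/50.
    """
    digest = hashlib.sha256(("%s:%d" % (experiment, user_id)).encode("utf-8")).digest()
    return digest[0] % 2 == 0


if __name__ == "__main__":
    sample = range(10000)
    share = sum(in_treatment(uid) for uid in sample) / len(sample)
    print("share of users in treatment: %.3f" % share)
```

Because the assignment depends only on the user ID and the experiment label, a given account always lands in the same bucket, which keeps the experience consistent across sessions.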
Parsoid
Early this month, we deployed Parsoid to the new cluster and started to track all edits and template / image updates from all Wikipedia sites, which is close to the full load we'll see when VE is deployed to all of them. Our earlier optimization work paid off as the Parsoid cluster and the associated Varnish caches are handling the load very well. The extra load we put on the API cluster is low enough to not cause a problem. As expected, the VisualEditor deployment to the English Wikipedia hardly showed up in the load graphs.
Despite being very short-staffed this month (only two full-time developers), the absence of performance issues left us enough time for a lot more polishing before the VisualEditor release on July 1. As a result, the release went very well with clean diffs on almost all pages.
While more work is left to do, it is now clear that we have fundamentally achieved our goal of a clean translation between WikiText and HTML + RDFa. This not only enables visual HTML editing, but also makes Wikipedia's content easily accessible in a standardized format. It also opens up new opportunities for MediaWiki's core architecture, which we'll pursue this fiscal year.
Editor engagement features
Echo (Notifications)
In June, we released more features and bug fixes for Notifications on the English Wikipedia and mediawiki.org. Ryan Kaldari added a confirmation button for the 'Thanks feature', and updated notification fly-outs to show diff links for talk page and interactive notifications, based on a design by Vibha Bamba. Benny Situ continued development of HTML Email notifications and deployed a variety of feature updates. Erik Bernhardson developed a special 'Suppressed' content feature, while Matthias Mullie developed a range of new metrics dashboards. Dario Taraborelli and Aaron Halfaker ran a week-long A/B test of new user activity; results show that new users who received Echo notifications made more edits than those who did not, but their edits were reverted slightly more often. Fabrice Florin led the planning process for Notifications, as outlined in the 2013 roadmap, and hosted a day-long roundtable discussion to improve editor engagement features in collaboration with Wikipedia users (see Echo demo and Q&A video on YouTube). Later this summer, we plan to start deploying Notifications on more wiki projects, starting with Meta and the French Wikipedia. To learn more, visit the project portal, read the FAQ page and join the discussion on the talk page.
Article feedback
In June, we deployed final features and bug fixes for the Article Feedback Tool (AFT5) on the English, French and German Wikipedias. Matthias Mullie released an opt-in feature to enable or disable feedback on a page, based on designs by Pau Giner and specifications by Fabrice Florin. In collaboration with Dario Taraborelli, Matthias also developed an updated set of metrics dashboards showing how the new moderation tools are being used: for example, about half of moderated feedback is marked as 'no action needed', while about a tenth is marked as 'useful' (these results are generally consistent across different languages). The team also supported a wider deployment of AFT5 on over 40,000 articles on the French Wikipedia, as well as a poll by the German community, which elected not to adopt the tool. Now that feature development has ended for this project, we plan to make AFT5 available to other wiki projects in coming weeks, as outlined in the release plan. For tips on how to use Article feedback, visit the testing page, and let us know what you think on this talk page.
Editor engagement experiments
In June, the Editor Engagement Experiments (E3) team continued work on its experiments related to onboarding new Wikipedians, and launched several new extensions to Wikimedia projects.
First, the new Campaigns extension was added to all wikis. This analytics tool helps identify internal or external sources of new registrations, by adding a "campaign" name to the signup page URL. This month, E3 began running campaigns to learn about how many anonymous editors sign up on the top 10 Wikipedias, as well as how many sign up via the invitation to "Join Wikipedia" on the login page (see the list of active campaigns and analysis). Another piece of analytics infrastructure by the team is the new CoreEvents extension, which houses logging of MediaWiki core activity, like preference updates and page saves across all projects.
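To make the idea of tagging registrations with a campaign name concrete, here is a minimal sketch of building and reading such a signup URL. The query parameter names ("type" and "campaign"), the example.org base URL and the helper names are illustrative assumptions; the sketch does not reproduce the Campaigns extension's code.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def signup_url(base: str, campaign: str) -> str:
    """Append a campaign name to a signup page URL (parameter names are assumed)."""
    return "%s?%s" % (base, urlencode({"type": "signup", "campaign": campaign}))


def campaign_from_url(url: str) -> Optional[str]:
    """Return the campaign name carried by a URL, or None if there is none."""
    values = parse_qs(urlsplit(url).query).get("campaign")
    return values[0] if values else None


if __name__ == "__main__":
    url = signup_url("https://example.org/wiki/Special:UserLogin", "join-banner")
    print(url)
    print(campaign_from_url(url))
```

Logging the extracted name alongside each new registration is what would let an analyst attribute signups to a given internal or external source.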
For the Getting Started project, the team conducted usability testing (see results and documentation) of new designs. E3 also refactored and refined the guided tours extension in June, adding usability enhancements like new interface animations, support for community tours, and bug fixes. The team also planned and began work on an experiment to deliver guided tours to all first-time editors.
The team also assisted with A/B testing and research for VisualEditor before its July 1 launch date, assisting with experimental design, EventLogging instrumentation, and other work. After the VisualEditor launch, E3 started a week-long micro-survey of newly-registered users on English Wikipedia, to give us a first systematic look at the gender diversity of those creating accounts.
2012 Wikimedia fundraiser
Wikipedia Zero
This month, the team launched Wikipedia Zero with Dialog in Sri Lanka, patched logic and user interface bugs, enhanced the configuration editor, expanded logging and debugging for identification of anomalous access, further decoupled ZeroRatedMobileAccess from MobileFrontend, and proposed ESI- and JavaScript-based software re-architecture.
Mobile design/Uploads
This month, we focused on improving education around uploads, including an interactive Commons tutorial and first-time user copyright and scope check. We also released our "Nearby" feature to production, allowing users to find articles near them that are in need of images, take photos and upload them via mobile.
Mobile design/Wikipedia navigation
In beta, we started working on an update to our site and article navigation, including design tweaks to the left navigation menu and a new in-article contributory navigation that combines article actions (edit, upload, and watch) with a talk page link. We also experimented with Echo integration and successfully got Notifications up and running on the English Wikipedia mobile site. We hope to push all of this work to production next month.
MediaWiki 1.22/Roadmap
In June, the Platform Engineering group switched to a weekly deployment cycle for MediaWiki to the Wikimedia Foundation servers. This means that we have halved our previous cycle of two weeks. As such, we are now progressing through wmfXX versions of MediaWiki at a faster rate. In June, MediaWiki versions 1.22-wmf6 through wmf9 were branched and deployed.
Git/Conversion
Chad Horohoe and Christian Aistleitner upgraded our Gerrit instance from a pre-release version of 2.6 to a pre-release version of 2.7 in the last week of June. They also published a new version of the Bugzilla/Gerrit integration plugin. Details about new functionality can be found in the Gerrit 2.7 draft release notes.
Multimedia
In June, we started expanding our multimedia team: Fabrice Florin joined as product manager, and Brian Wolff began a summer contract as software engineer. We started work on improving the display of images in galleries and are now planning our next development steps in consultation with community members. Some of the first features under consideration include file curation and feedback tools, as well as media viewers, new video formats and other platform improvements, to be prioritized based on user feedback and technical feasibility. We are also recruiting for two more positions: a multimedia systems engineer and a senior software engineer. Please spread the word about this unique opportunity to create a richer multimedia experience for Wikipedia and MediaWiki sites!
Admin tools development
In June, the team worked on making the last changes to enable global AbuseFilter rules, and on the global account renaming tool. Some additional work was done on Single User Login finalisation, which will make all user accounts global across all of Wikimedia's public wikis, thereby allowing for cross-wiki notifications and better tools for editors.
Search
Work has largely shifted from supporting MWSearch/lsearchd to investigating and implementing Solr. Nik Everett and Chad Horohoe have begun writing an extension to implement Solr searching for MediaWiki, and much of the initial basic functionality is complete. Peter Youngmeister and Andrew Bogott will be handling the operations tasks for the new setup. Initial operations tasks will involve packaging Solr 4 and working with Chad to puppetize the whole design. Additionally, we're going to investigate ElasticSearch, as it has been suggested as an alternative to Solr.
Auth systems
In June, the team worked with the Wikimedia Foundation's user experience team to improve SUL2. The improvements were pushed to test wikis on July 1, and will be rolled out to other wikis in July. Implementation of OAuth is well underway, and planned for roll-out in July as well.
HipHop deployment
A Labs instance of MediaWiki running on HipHop is now available at http://hhvm.wmflabs.org.
Security auditing and response
The team continued to respond to reported security issues, and gave security-oriented tech talks on emerging DoS techniques and using OWASP's ZAP tool for vulnerability scanning.
Quality Assurance
This month saw a QA focus on automated browser tests. Besides creating new tests and new builds, and reporting issues identified by tests, we conducted a training session in San Francisco to create automated tests for the Wikilove feature. We continue to support all WMF software development projects, with the VisualEditor being a particular focus in June.
Beta cluster
Max Semenik wrote a script to synchronize CSS from production to beta. Steinsplitter and Antoine Musso fixed the AbuseFilter configuration to have a global list of filters on the Beta Cluster. Filters should be configured there and will be used by all the wikis. The PHP fatal errors caught by the wmerrors extension are now sent to the beta udp2log instance, which will greatly improve our troubleshooting process.
Continuous integration
Timo Tijhof and Antoine Musso triaged continuous integration bugs. Antoine has set up a Jenkins slave and migrated most jobs to it. It will be very easy to add new servers.
Quality Assurance/Browser testing
This month, the QA team added new browser tests for UniversalLanguageSelector and for Mobile (contributed by the Language engineering and Mobile engineering teams, respectively), as well as browser test contributions from volunteers. We created new builds in Jenkins to run browser tests against IE10. We created tests for VisualEditor, including some written with our Outreach Program for Women intern.
Analytics/Infrastructure
We made significant progress with our preparations for replacing udp2log with Kafka in our logging infrastructure. The C library librdkafka now supports the 0.8 protocol; a first version of varnishkafka, which will replace varnishncsa, is ready; the Apache Kafka project released their first beta of Kafka 0.8; and we have a Debianized and Puppetized version. We keep adding new metrics and alerts to monitor all the different parts of the webrequest dataflows into Kraken. We expect to keep making improvements in the coming months, until we have a fully reliable data pipeline into Kraken. We also continued our efforts to move Kraken out of beta: we puppetized Zookeeper, JMXtrans, and the Hadoop client nodes for Hive, Pig and Sqoop. We started reinstalling the Hadoop Datanode workers with a fully puppetized Hadoop installation; so far, we have replaced 3 nodes, and we'll replace the other seven in the coming weeks. Last, we enabled Jenkins continuous integration for the Grantmaking & Evaluation dashboards.
Analytics/Visualization, Reporting & Applications
This month, we completed the end-user documentation of UserMetrics (v1). We rebranded UserMetrics as Wikimetrics, and we will slowly start to use that as the new name when referring to UserMetrics v2 or UserMetrics replatforming. We focused on laying the foundation of Wikimetrics: a new database design, a new job queue design and lots of unit tests. In addition, we started porting over some of the features of UserMetrics v1 (like the 'namespace edits' metric and UI components), and we added user roles (so users can only see their own metrics) and authentication using OAuth. Last, we fixed some minor issues in UserMetrics v1, including the handling of user names that contain commas or single or double quotes.
Analytics/Data Releases
We delivered many analyses in June, including one of an Arabic cohort using UMAPI v1. Erik Zachte provided an analysis of Commons uploaders, and we provided the Wikipedia Zero team with a number of datasets to help them track adoption of the Wikipedia Zero project across the globe. We supported the VisualEditor and Editor Engagement teams with experimental design, data modeling and data analysis for two controlled experiments: a test of the impact of notifications and a first test of the impact of VisualEditor on new contributors. The tests were carried out in June and the reports are being updated with the results of the analysis. We started using the EE-dashboard instance on Labs to host dashboards related to editor engagement projects that were previously hosted on the Toolserver (see the metrics and features dashboards for the English Wikipedia). Last, we worked with the Features engineering team to expand MediaWiki's instrumentation and collect data on cluster-wide user preference changes and edit-related events to support VisualEditor analysis.
Bug management
Andre Klapper published the Bugzilla administrator policy and documented for which specific tasks Bugzilla admin rights are actually needed (which might also be helpful for other projects using Bugzilla). He started publishing weekly "Bugzilla tips and best practices" blog posts and reproposed introducing a "PATCH AVAILABLE" status in Bugzilla (as requested by several developers at the Amsterdam Hackathon) whilst work is ongoing to fulfill prerequisites. On the code side of Bugzilla, a new Bugzilla frontpage went live, providing useful links. Furthermore, the misleading term "login" was replaced by "email address", it is now possible to set the "Assigned" status directly when filing a new bug report, and smaller issues with the "Weekly Bugzilla Report" email sent to the wikitech-l mailing list were fixed. In Bugzilla's taxonomy, open tickets in the dormant "Wiktionary tools" product were retriaged and the product closed for new bug entry, and security-related components in Bugzilla were reorganized after a meeting with the Wikimedia Foundation's security engineer.
Mentorship programs
The 20 Google Summer of Code interns and the 1 Outreach Program for Women intern have completed the bonding period (with 3 exceptions, 2 of them justified) and they are now working on their projects. One accepted OPW candidate declined her participation due to a job offer. Monthly status updates are available for these projects:
We also met with SocialCoding4Good, who are relaunching their activities, and we refreshed the Wikimedia page. We expect this to become a regular channel for new technical contributors working in corporations with social/training programs.
Technical communications
In June, work on this topic mostly focused on perennial activities like Tech news and ongoing communications support to engineering staff, as Guillaume Paumier was lent to the VisualEditor deployment effort, working on communications, documentation and liaising with the French Wikipedia.
Volunteer coordination and outreach
The decision to focus on fewer, better-executed activities based on demand seems to be working out, although it's too soon to confirm the trend. Browser test automation is the number one priority for recruiting new contributors, and any help to succeed here is welcome. We created the QA mailing list as an umbrella to host people and discussions focusing on software quality assurance in all its aspects. We have more than 40 subscribers and an initial flow of activity. We had a successful first Browser Test Automation Workshop, with 40 participants in San Francisco and a few more online; we will iterate on this model. We have also helped organize a Tech Talk on Attack vectors & MediaWiki and OWASP ZAP, and one on the upcoming Solr-based Search.
The project to get automated community metrics based on vizGrimoire and provided by Bitergia has been approved, and a first prototype can be seen at http://korma.wmflabs.org. The project effectively starts on July 1 and includes a one-year period of maintenance. We agreed with the Analytics team that they will assume responsibility for this area during this period.
The Kiwix project is funded and executed by Wikimedia CH.
- Development of a new MediaWiki HTML dumper in Node.js has started. This tool exports Wikipedia articles to static files based on the Parsoid output. This solution looks really promising, and new JavaScript developers are welcome.
The Wikidata project is funded and executed by Wikimedia Deutschland.
- June in Wikidata was all about the sister projects. The development team published proposals for how Wikidata can support Commons and Wiktionary. Additionally, they worked on the ability of Wikidata to store language links to Wikivoyage in addition to Wikipedia; as a result, Wikivoyage will soon also be able to manage their language links via Wikidata. Another important step was the deployment of the geocoordinate datatype. This makes it possible, for example, to indicate the location of a city. Geocoordinates that are already in Wikidata can be seen on this map (huge version, updated daily).
- In a blog entry, Denny Vrandečić explained his understanding of the relation of Wikidata and the truth.
- In other news, further development of Wikidata has been supported through a large donation by the search engine company Yandex.
- The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts. Annual goals for the 2013–2014 fiscal year are currently being drafted.