Wikimedia Release Engineering Team/Checkin archive/20160926
2016-09-26
Vacations/Important dates
How to do it: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Time_off
- Oct 01: Start of Q2
- Oct 05: Morning, few hours, airport run - Tyler
- October 10: US Holiday (Indigenous People's Day https://theintercept.com/2015/10/12/columbus-day-is-the-most-important-day-of-every-year/ )
- October 17-21: Offsite in Washington D.C.
- October 31 & November 4th: Mukunda
- October 28 - Nov 2 (ish) - Chad (vacation to Cabo)
- November 24: US Holiday (Thanksgiving)
- January 9-11: Dev Summit
- January 12-13: All Hands
Team Business
Time spent spreadsheet
Rotating positions and absences
Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/u/blockers
weeks of Sep 19 and Sep 26
- Train: Tyler
- wmf.20
- no deploys week of Sept 26
- SoS: Dan
- Out:
- September 22-23: Željko at a conference
weeks of Oct 03 and Oct 10
- Train: Tyler
- wmf.21
- wmf.22
- SoS: Chad
- Out:
- October 10: US Holiday (Indigenous People's Day)
Oct 17 and Oct 24
- Train
- none on Oct 17
Actions from last meeting
TODO: Antoine write a migration plan for gallium
- In my head only. Been busy with wmf.19 explosion / random Zend 5.5 segfault etc.
- Still to do; got siphoned into the jobrunner issue / lack of monitoring / a bunch of reviews etc.
- Do this week
TODO: Talk about release process/strategy first week of Q2 (Oct 3) with Ops (Brandon)
Scrum of Scrums
- https://phabricator.wikimedia.org/project/board/64/
- Blocked on us: https://phabricator.wikimedia.org/maniphest/query/h7YTCBTJsepS/#R
This week
- Blocking
- Blocked
- Updates
- New scap (3.3.0)
- scap caches local config for its deployment (machines don't have to reach back to tin)
Last week
- Blocking
- Blocked
- Updates
- wmf.19 exploded: https://wikitech.wikimedia.org/wiki/Incident_documentation/20160915-MediaWiki
- Job shelling out to mwscript maintenance/getConfiguration.php
- HHVM writes to its bytecode cache (sqlite file) which fails due to ulimit
- No monitoring of jobs
- Antoine can't find a dashboard of failing jobs (neither in Grafana nor in Logstash)
- Reminder: no deploys week of Sept 26th
- Changes to beta cluster puppetmaster cherry-pick process Coming Soon™ https://phabricator.wikimedia.org/T135427
Other Team Business
- Contint root proposal
- Can we just have an ops person?
- do we know if vendors (Antoine and Željko) are coming to all hands?
- TLDR: yes
- Short term contractors budget
- explicit list, obvs
- time it takes to onboard
- teams need to make this an explicit goal themselves
Offsite
- Agenda being drafted at https://docs.google.com/document/d/1lmxtQkAuDJY4Vv8oFWihSmhz1y-JgUzsb11ebFCOz6g/edit#
Q1 goal/project check-in
Phase out Ubuntu Precise
Replace primary production Continuous Integration host (gallium) - task T95757
- Huge delay on figuring out a network LAN to host the new machine
- Puppet refactoring mostly done
- Need a migration plan then schedule the switch
Upgrade Phabricator database servers to Maria10/Jessie - task T138460
- Done
Upgrade Beta Cluster database servers to Maria10/Jessie - task T138778
- Done
- Gotta shut down then drop the old instances?
Move Gerrit off of ytterbium - task T125018
- Done
Reduce Technical Debt
Perform a technical debt analysis of software and services maintained by WMF Release Engineering - task T138225
- Done
Streamline deployments (long-lived branches)
keyresult task:
- Convert our production deployment strategy to use long-lived branches - task T89945
project view: https://phabricator.wikimedia.org/project/view/2117/
Non-Quarterly goal work
CI Scaling/Nodepool
Browser tests
- mediawiki_selenium feature to show/capture Selenium WebDriver requests to remote browser T94577
- Firefox v47 breaks mediawiki_selenium T137561
- Update mediawiki_selenium to use Marionette T137540
-> need geckodriver to be packaged ('tis a Rust app :() or drop Firefox?
Beta Cluster
- MW apps reimaged to Jessie with ops leading
- web servers (deployment-mediawiki*)
- deployment servers (deployment-mira is primary, deployment-tin02 backup)
- jobrunner02 done
- tmh to be done later
Phabricator
DB Inconsistencies
https://phabricator.wikimedia.org/T132416 and https://phabricator.wikimedia.org/T104459 (see also: https://www.mediawiki.org/wiki/Development_policy#Database_patches )
People status updates
Antoine
Last week
- Help with the wmf.19 issue as I can
- Gallium migration plan
- Overdue :(
- Nodepool upgrade hopefully
- Done (fewer API queries to OpenStack), follow-up from the August incident
- Monitored via list of tasks on https://grafana.wikimedia.org/dashboard/db/nodepool (look at 10 days)
- Migrate some jobs hopefully
This week
- Gallium migration plan
Chad
Last week
- Learn to play the ukulele
- Finally looping back on DB inconsistencies since I have free cycles this week (what I have what?!)
- Wrap up some logging-related cleanups I've been poking
- Yell at Timo re: static files.
This week
Dan
Last week
- Beta DBs
This week
Mukunda
Last week
- Fix Phab permissions issue: https://phabricator.wikimedia.org/T146055
- Hopefully get scap swat stuff code reviewed and deployed
- Code reviewed by Tyler; I'm addressing his feedback
- Looking into a way of organizing the swat patches that doesn't involve manual wikitext entry on https://wikitech.wikimedia.org/wiki/Deployments
- Made some progress but still figuring this out
This week
- Finish getting scap swat and cli stuff merged
- Talk with Greg about the automation of deployment blockers, release milestones/tasks, etc.
Tyler
Last week
- wmf.19
- scap3 updates (blocking things)
- Code review for llb (long-lived branches)
This week
- fixup https://gerrit.wikimedia.org/r/#/c/310719/
- scap3 catchup
Željko
Last week
- T94577 mediawiki_selenium feature to show/capture Selenium WebDriver requests to remote browser.
- T137561 Firefox v47 breaks mediawiki_selenium
- T137540 Update mediawiki_selenium to use Marionette
- T145718 CentralNotice: Intermittent unexplained browser test failures
- Testival.eu conference
- EU SWAT
This week
- MEDIAWIKI_URL may be set to incorrect value in mwext-mw-selenium job T144912
- mediawiki_selenium feature to show/capture Selenium WebDriver requests to remote browser T94577
- Improve documentation around running/writing (with lots of examples) browser tests T108108