Wikimedia Release Engineering Team/Checkin archive/20180115
2018-01-15
Vacations/Important dates
- Jan 15 (Mon): Martin Luther King Day (All US Staff)
- Jan 22/23: Dev Summit
- Jan 24: Tech Management F2F
- Jan 25/26: WMF All Hands
- Jan 29-31: Team offsite
- Feb 19 (Mon): President's Day (All US Staff)
- Mar 30 (Fri): WMF Holiday
- April 14 (Fri): WMF Holiday
- May 15?/16/17: Team offsite in Barcelona
- May 18-20: Wikimedia Hackathon in Barcelona
- May 21 (Mon): Tech-Mgt F2F
Team Business
Rotating positions and absences
Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/maniphest/?project=PHID-PROJ-fmcvjrkfvvzz3gxavs3a&statuses=open%28%29&group=none&order=newest#R
Jan 15 and Jan 22
- Train: Tyler
- SoS: Mukunda
- Out
- Jan 15 (Mon): Martin Luther King Day (All US Staff)
- Jan 22/23: Dev Summit
- Jan 24: Tech Management F2F
- Jan 25/26: WMF All Hands
Jan 29 and Feb 05
Feb 12 and Feb 19
Actions from last meeting
Scrum of Scrums
- Greg to copy to etherpad after meeting: https://etherpad.wikimedia.org/p/Scrum-of-Scrums
This week
Release Engineering
- Blocking
- None?
- Blocked
- "Stack overflow when Redis is down" - https://phabricator.wikimedia.org/T185055
- Need help from Operations and/or Performance
- Updates
- Catching up the train this week and rolling out the last version before the DevSummit/All Hands and RelEng team offsite weeks. [wiki][email]
- We moved Wednesday morning’s SWAT window 1 hour earlier (to 10am) to give us an hour's break before the new MW version rolls to the second set of wikis (all non-Wikipedias). This was a follow-up from a recent post-mortem. [wiki][email]
- We broke git-fat deploy repos in scap (the old config is no longer valid); a workaround/fix is available in all relevant repos.
- https://phabricator.wikimedia.org/T184882#3899710
- (Yes, we’re re-doing how the CI for scap is done, see: https://phabricator.wikimedia.org/T184628 )
- Updated the Debian packaging for Zuul (CI task scheduler) and released 2.5.0-8-gcbc7f62-wmf6, unblocking an upgrade of Gerrit.
- Converted our home-grown docker image builder to `docker-pkg` from Giuseppe
- Getting started with the basics of planning our team offsite pre Barcelona Hackathon. Submitted travel request form and let eng-admin@ know.
- Working on browser tests with Search (“selenium-CirrusSearch-jessie daily Jenkins job”).
Last week
- Blocking
- None?
- Blocked
- ops: zuul package update (blocks gerrit upgrade)
- ops: node-tunnel-agent package update (blocks moving node testing to docker in ci)
- Updates
- 2 weeks of normal MediaWiki deploys (this and next) followed by 2 weeks of no MediaWiki but SWATs as needed (DevSummit/All Hands followed by RelEng team offsite)
- Currently building nightlies of MediaWiki on the new “releases” (aka non-CI) Jenkins host. Working with Security on the best way to handle security patches; the goal is to ensure security patches stay cleanly applicable.
Puppet SWAT
- List of patches you want to submit to Puppet SWAT
Logspam / Last week's train updates
- Train was rolled back on Thursday due to a critical bug introduced by the new Revision storage infrastructure: https://phabricator.wikimedia.org/T184749
- The problem was apparently fixed on Friday, but the fix missed the window for deployment during the week.
- Monday was a US holiday.
- Therefore, 1.31.0-wmf.16 is finally to be deployed on Tuesday, January 16th, just as wmf.17 is being cut from master.
Other Team Business
Q3 goal/project check-in
- All of it in table form: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Goals/201718Q3
Quarterly Goals
Program 1: Outcome 5: Milestone 1: Develop and migrate to a JavaScript-based browser testing stack
- Due: End of this quarter
- What: Specific improvements to the now canonical framework, see: task T182421, notably:
- Upgrade webdriverIO to version 4.9
- Investigate replacing nodemw with mwbot
- Video recording for Selenium tests in Node.js
- Task: task T182421
- T175179 Create selenium-CirrusSearch-jessie daily Jenkins job
- Talked with David Causse and we decided that he will rewrite smoke tests (just a few tests in one file) from Cucumber to Mocha.
- The WIP commit is created https://gerrit.wikimedia.org/r/#/c/381785/
- Job configuration is ready: https://gerrit.wikimedia.org/r/#/c/398030/
- Test job is created and working: https://integration.wikimedia.org/ci/view/Selenium/job/selenium-CirrusSearch-jessie-381785/
Program 1: Outcome 5: Objective 1: Maintain existing shared Continuous Integration infrastructure
- Goals
- Draft requirements for a Kubernetes based solution for CI - task T183513
- Migrate MediaWiki PHPUnit tests to Shipyard (docker-based CI) (~40% of Nodepool usage) - task T183512
- Unify production and CI docker image build process - task T177276
- <insert here>
Program 3: Outcome 1: Objective 2: Identify and find stewards for high-priority/high use code segment orphans
- Due: End of quarter
- task T174091
- SLA definition
- The SLA structure is currently dependent on task/bug priority. As this is not a consistently used attribute, basing SLAs on it could be problematic. Started a dialog with Andre to see what alternatives we might have (such as severity) to segment the bugs into manageable sizes.
- Stewardship definition is being reviewed by Toby and Victoria. Desire is to get their support in rolling this out across WMF.
Program 3: Outcome 2: Objective 2: Define and implement a process to regularly address technical debt across the Foundation
- Due: End of quarter
- task T174095
- Restarted work on the TechDebt blog post series. Targeting 1/18 for review and the following week for publishing.
Program 3: Outcome 2: Objective 3: Promote and surface important technical debt topics at large gatherings of Wikimedia developers (e.g., DevSummit and Hackathon(s))
- Due: End of next quarter
- task T174096
- No progress
Program 6: Outcome 2: Objective 2: Set up a continuous integration and deployment pipeline
- Due: End of this quarter
- Keyword: SSD
- phab project: https://phabricator.wikimedia.org/project/view/2453/
- Goal:
- Verify basic functionality of 'production' deployment and image (initially targeting mathoid):
- Functional PoC within integration in the deployment-pipeline
- Deploy to isolated k8s
- Minikube packaging going slowly but happening (may need to pair with Dan at some point)
Quarterly non-goal "Work"
Program 1: Outcome 1: Objective 1: Scap (Tech Debt Sprint FY201718-Q2)
Program 1: Outcome 5: Objective 1: Maintain existing shared Continuous Integration infrastructure
Program 1: Outcome 6: Milestone 1: Maintain Gerrit
Program 1: Outcome 6: Milestone 2: Maintain Phabricator
- Streamline logspam workflows by adding some integration with phabricator
- Store git-lfs (and other phab uploads) in swift: task T182085
Program 1: Outcome 5: Objective 1: MW Nightlies server
Other work
- New service reviews and the review queue
- One of the outcomes of a recent post-mortem review meeting was the desire to better understand what we currently do to review new components/extensions/services prior to their first deployment to production. In addition to the initial review, I am also investigating what ongoing reviews are done to deployed components/extensions/services.
- Started conversation with Marko and Daniel on this topic. Goal is to see if "active stewardship" should be one of the pre-deployment requirements.
Grooming
Team Kanban Board Review and Triage
- Closed and touched in the last 7 days
- No update for 4 weeks
- No update for 3 weeks
- No update for 2 weeks
- No update for 1 week
- All Open
- Review To Triage column of #releng
Once / month-ish review of backlog(s)
- releng Review To Triage column of #releng
- releng-kanban Review unassigned in kanban
- releng-kanban Review 'backlog' column of -kanban
- releng-next - Review for things we need to put on our kanban backlog
- releng-backlog - oh my, the huge backlog of things...