Wikimedia Release Engineering Team/Checkin archive/20181015
2018-10-15
Vacations/Important dates
- Beginning to mid-October - Antoine taking some weeks/days off or working part time (October 1-14 according to https://phabricator.wikimedia.org/E40)
- October 21-28 - Greg in Portland for TechConf+TechMgrs F2F
- November 1 (Thursday) - Holiday (All Saints' Day - Željko)
- November 12th - Holiday (Veterans Day, Observed)
- November 22+23 - Holidays (Thanksgiving)
- November 25-December 2 - Mukunda vacation (in California ahead of the offsite)
- Week of December 3rd - Team offsite
- December 24-28 - Holidays (Christmas)
Rotating positions
Train
- Maniphest query for deployment blocker tasks: https://phabricator.wikimedia.org/maniphest/?project=PHID-PROJ-fmcvjrkfvvzz3gxavs3a&statuses=open%28%29&group=none&order=newest#R
- Oct 08 - wmf.25 - Dan (No train due to DC switchover)
- Oct 15 - wmf.26 - Mukunda <---- (last 1.32 wmf.XX release, 1.33 starts the next week)
- Oct 22 - wmf.1 - Mukunda (warning, TechConf happening, ping Greg if you need responses from anyone there...)
- Oct 29 - wmf.2 - Tyler
- Nov 05 - wmf.3 - Tyler
- Nov 12 - wmf.4 - Antoine
- Nov 19 - wmf.5 - No Train (Thanksgiving)
- Nov 26 - wmf.6 - Antoine
- Dec 03 - wmf.7 - No Train (Offsite)
- Dec 10 - wmf.8 - Zeljko
- Dec 17 - wmf.9 - Zeljko
- Dec 24 - wmf.10 - No Train (Holiday break)
- Dec 31 - wmf.11 - No Train (Holiday break)
- Jan 07 - wmf.12 - Dan
- Jan 14 - wmf.13 - Dan
- Jan 21 - wmf.14 - Mukunda
- Jan 28 - wmf.15 - No Train (All Hands)
- Feb 04 - wmf.16 - Mukunda
- Feb 11 - wmf.17 - Tyler
- Feb 18 - wmf.18 - Tyler
- Feb 25 - wmf.19 - Antoine
SoS
- Oct 10 - Zeljko
- Oct 17 - Zeljko <----
- Oct 24 - Zeljko
- Oct 31 - Zeljko
- Nov 07 - Zeljko
- Nov 14 - Zeljko
- Nov 21 - Zeljko
- Nov 28 - Zeljko
- Dec 05 - Zeljko
- Dec 12 - Zeljko
- Dec 19 - Zeljko
- Dec 26 - Zeljko
- Jan 02 - Zeljko
- Jan 09 - Zeljko
- Jan 16 - Zeljko
- Jan 23 - Zeljko
- Jan 30 - Zeljko
- Feb 06 - Zeljko
- Feb 13 - Zeljko
- Feb 20 - Zeljko
- Feb 27 - Zeljko
Team Business
Hiring
- Software Engineer position is open; reviewing and hiring for now
"all candidates are good at being them" - Greg
First Offsite
Details:
- Week of December 3rd
- At the Queen Mary hotel in Long Beach
- Deb T will be facilitating
Topics!
Needs attention
- gerrit security release 2018-10-08
- https://groups.google.com/forum/m/#!topic/repo-discuss/eH0iLt2XawU
- jGit update, we are unaffected
- may want to hold off until next week: https://bugs.chromium.org/p/gerrit/issues/detail?id=9836
- 2018-10-15 -- paladox tells me they're working on a fix and should have a 2.15.6 tagged Soon™
Scrum of Scrums
- Greg to copy to etherpad after meeting: https://etherpad.wikimedia.org/p/Scrum-of-Scrums
Release Engineering
- Blocked by:
- (MediaWiki-General-or-Unknown) T207288 Text in the Sidebar does no longer show the message text, only the message name
- Blocking:
- Fundraising Tech: CRM tests still regularly failing due to full mysql partition on integration hosts. Possible fix noted by Eileen on https://phabricator.wikimedia.org/T205950
- Updates:
- Interviewing on-going for our Developer Productivity position: https://boards.greenhouse.io/wikimedia/jobs/1225258?gh_src=f15731e11
- Train Health:
- Last week: no train, datacenter switchover T191071 1.32.0-wmf.25 deployment blockers
- This week: the last 1.32 release T191072 1.32.0-wmf.26 deployment blockers
- Still open/blocking: T207288 Text in the Sidebar does no longer show the message text, only the message name (MediaWiki-General-or-Unknown)
- Resolved: T207220 AFComputedVariable.php: Argument to getLinksFromDB() must be an instance of Article - the cause has been identified and reverted.
- Next week: 1.33 starts the next week - T191072 1.32.0-wmf.26 deployment blockers
- Log Health:
- T204871 Deployments of MediaWiki with scap cause a spam of "web request took longer than 60 seconds and timed out"
- Code Health:
- Metrics group meets weekly
- T207046 Code health metrics spike
Callouts
- Release Engineering
- Train blocked: (MediaWiki-General-or-Unknown) T207288 Text in the Sidebar does no longer show the message text, only the message name
Train status and happenings
- no train last week
Quarterly Goals for Q2
TEC1 (Maint): Outcome 1 / Output 1.1
- GOAL: Determine the procedure and requirements for an automated MediaWiki branch cut.
- WHO: Mukunda, Tyler, Antoine
- No update this week
- Need to decide where to keep JJB/whether or not to use JJB
- TODO: thcipriani to create task to discuss with relevant folks
TEC3 (Pipeline): Outcome 1 / Output 1.2
- GOAL: Formalize the collection of CI infrastructure and tooling metrics
- WHO: Dan, Antoine
TEC3 (Pipeline): Outcome 2 / Output 2.3
- GOAL: Develop set of metrics to assess incident reports/post mortems - task T206622
- WHO: Greg, Zeljko
- nothing this week
TEC3 (Pipeline): Outcome 3 / Output 3.1
- GOALS:
- Adopt more services into Deployment pipeline - task T205919
- Migrate graphoid to the Deployment pipeline
- Deploy zotero v2 to the Deployment pipeline
- Deploy blubberoid
- WHO: Dan, Tyler, Lars
- Zotero v2 blocked on the new release of blubber: https://phabricator.wikimedia.org/T206766
- Alexandros said he'd look at it this week
- Pairing with Lars to get it setup...maybe?
- thcipriani: will schedule pairing session for CD pipeline setup for zotero v2 on Friday
TEC12 (DevProd): Outcome 2 / Output 2.1
- GOAL: The Annual Developer Productivity Survey results are synthesized and shared, creating a first year baseline.
- WHO: Mukunda, Greg
- Legal wants to know about mailing lists and anonymized results.
- Greg: please respond to the email to confirm that I've got the details correct.
- subject: Developer Productivity Survey - Privacy Statement Request
TEC13 (Code Health): Outcome 1 / Output 1.1
- GOAL: Update/refresh review queue (review process for initial code deployment)
- WHO: JR
- task breakdown activities
TEC13 (Code Health): Outcome 2 / Output 2.2
- GOAL: 5 of the 15 prioritized repositories have at least 1 end-to-end test - task T206621
- WHO: Zeljko
- no activity last week
TEC13 (Code Health): Outcome 2 / Output 2.3
- GOAL: Assess Platform unit test practices and define improvement plan
- WHO: JR, Core Platform Team
- no activity
TEC13 (Code Health): Outcome 3 / Output 3.2
- GOAL: Core Platform and Search Platform teams are using TDM PoC
- WHO: JR, Core Platform Team
- no activity
TEC13 (Code Health): Outcome 3 / Output 3.4
- GOALs:
- Identify key Tech Debt areas
- Put in place Tech Debt management process for PEP
- WHO: JR, Core Platform Team
- no activity
TEC13 (Code Health): Outcome 4 / Output 4.1
- GOAL: Metrics defined and deployed for all 4 Code Health areas.
- WHO: JR, Code Health Metrics Working Group
- no meeting last week
- WG members made some progress async
- sharing information about various tools.
- updated tasks with more core metric candidates
Other work
Selenium
- T206624 Q2 Selenium framework improvements
- T179188 Video recording for Selenium tests in Node.js
- Waiting for clarification on code review feedback, I'm not sure what to do :| https://gerrit.wikimedia.org/r/c/mediawiki/core/+/422933
- T206640 selenium-daily-beta-Popups CI job failing - will debug
Gerrit
- Upgrade gerrit to 2.15.5 (updated from 2.15.4)
- may want to hold off until next week: https://bugs.chromium.org/p/gerrit/issues/detail?id=9836
Phabricator
- Need to get phab1002 ready with Daniel
Jenkins
QA
SCAP
Still investigating: https://phabricator.wikimedia.org/T121597#4652873
- "probably not scap", maybe eval.php or something with stderr
Standup!
Antoine
- What I plan to do this week
- I am 40 on this October 15th (It is also my 16th wikipedia birthday)
- 7 year hire anniversary on Wednesday, too :)
- Doc about overhauling MediaWiki testing
- Get back to phasing out Nodepool (really the vast bulk of it is done)
- (done) Formally meet/chat with Lars
- What I'm blocked on
- (done) 700+ emails to triage
- Other?
- More wall painting
Dan
- What I plan to do this week
- Finish up Blubberoid Swagger spec
- Get back on integration-prometheus now that CI is stable
- What I'm blocked on
- Other?
Greg
- What I plan to do this week
- Get back to Petaluma on Tuesday night
- respond to legal re survey -- done
- lots of last minute TechConf planning work
- Quarterly Reviews tomorrow
- probably some follow-up on l10nupdate, probably
- What I'm blocked on
- Other?
Jean-Rene
- What I plan to do this week
- interviews
- QCI prep/prez
- QA strategy stuff
- Update/refresh review queue
- Metrics WG task creation/breakdown
- What I'm blocked on
- Other?
Lars
- What I plan to do this week
- Learn how the deployment pipeline currently works. IIUC it deploys one microservice to Kubernetes for now.
- Also how it's meant to work.
- Find and review any documentation relevant to this.
- What I'm blocked on
- Lost in a sea of accounts and services.
- Other?
Mukunda
- What I plan to do this week
- Train
- More troubleshooting of the scap pre-deploy fatal check.
- Dev Productivity survey
- Developer productivity interviews at 9:00 AM on Monday and Tuesday
- What I'm blocked on
- Legal: need privacy statement
- Other?
Tyler
- What I plan to do this week
- pairing on Zotero v2 pipeline
- probable gerrit upgrade this week
- further troubleshooting of scap initial check
- Carry over from last week
- Releases-jenkins icinga stuff
- Moar keyholder review
- Docs for ORES github sync problem (with heavy disclaimer)
- What I'm blocked on
- Other?
Zeljko
- What I plan to do this week
- T199133 Find top 15 target projects that could use Selenium tests to prevent incidents
- T204068 QA: Automation Testing - port Echo Notification tests to Node.js
- "60 seconds" task https://phabricator.wikimedia.org/maniphest/query/bUA0dYsX1iBb/#R
- What I'm blocked on
- Other?
- T207018 RuntimeError: scap failed: average error rate on 4/11 canaries increased by 10x
Grooming
Team Kanban Board Review and Triage
- closed and touched in the last 7 days
- No update for 4 weeks
- No update for 3 weeks
- No update for 2 weeks
- No update for 1 week
- All Open
- Review To Triage column of #releng
Once / month-ish review of backlog(s)
- releng Review To Triage column of #releng
- releng-kanban Review unassigned in kanban
- releng-kanban Review 'backlog' column of -kanban
- releng-next - Review for things we need to put on our kanban backlog
- releng-backlog - oh my, the huge backlog of things...