Wikimedia Release Engineering Team/Checkin archive/2023-08-30
Wins
- https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Monthly_notable_accomplishments
- Aug '23 edition
- Developer Satisfaction Survey got presented
- Gerrit repo archiving script for GitLab migrations \o/
- Dan's back!
- Gerritlab adoption
- JWT auth changes
- T272693 - reviewed non-standard phabricator policies
- Downstream phabricator patches for php8 + logspam
- Upstream phorge patches for logspam
- Overwrote feed transaction default query in conduit (T344232#9092848)
- Scap3 can now be configured to disable the service on secondary hosts: https://phabricator.wikimedia.org/T343447
- Kokkuri is now using the new GitLab ID tokens (see the sketch after this list): https://phabricator.wikimedia.org/T337474
- We're on Phorge (assuming it sticks)
- Gitlab CI-built kask container image deployed today. (https://phabricator.wikimedia.org/T335691)
- Gitlab local hacks in progress
- Ahmon passed his CKA! Read Kubernetes in Action
- Merged 3 fixes to Phorge upstream for phab logspam
- Delayed announcement: Jeena's back, and she's a senior software engineer
- Blubber refactor that rips out the Dockerfile step and goes straight to LLB is passing acceptance tests
- Added another pool to our DO cloud runners: memory-optimized
- Refactored the patch to substantially tune down staging
- Now there are 4 runner-controller runners running + 4 nodes ready to go
- GerritLab commits merged to speed up sending patches and do the right things given GitLab's weirdness
- scap backport bugfix
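For the Kokkuri ID-token item above, a minimal sketch of how a GitLab CI job requests an ID token (a standard GitLab CI keyword). The job name, variable name, and audience URL are illustrative assumptions, not the values kokkuri actually uses.

```yaml
# Hypothetical .gitlab-ci.yml job using GitLab CI ID tokens (OIDC JWTs).
build-image:
  id_tokens:
    EXAMPLE_ID_TOKEN:
      aud: https://gitlab.wikimedia.org   # placeholder audience
  script:
    # GitLab exposes the signed JWT to the job as an environment variable.
    - test -n "$EXAMPLE_ID_TOKEN"
```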
OKR update
Last week
The six questions I answer week-by-week about our work. This is pretty much all CTPO/VP/Director-types see of what we're doing. If there are specific things to call out here, let's do so.
On track
- Progress update on the hypothesis for the week
- GitLab (Pipeline Services Migration) workboard for GitLab Pipeline Services – shows all services that can move to GitLab today. Tagged with teams (where available)
- T335691 – Migrate mediawiki/services/kask to GitLab: deployed this week (need to archive the old repository and we're done)
- T300819 – Speed up stacked merge requests: spent some time optimizing push requests for stacked patchsets in GitLab
- Any new metrics related to the hypothesis
- Repositories on Gerrit decreased (2022 last week → 2023 this week)
- Any emerging blockers or risks
- Finding teams to steward migration for some services may be tricky
- Needs more investigation, but Timo's comment about services previously stewarded by the Platform team is a good example of the challenges (T344739#9119148)
- Any unresolved dependencies - do you depend on another team that hasn't already given you what you need? Are you on the hook to give another team something you aren't able to give right now?
- No
- Have there been any new learnings from the hypothesis?
- No
- Are you working on anything else outside of this hypothesis? If so, what?
- Migrated our Phabricator installation to Phorge as an upstream, now working on bugfixes and features there.
- T344754 - Concurrently running Selenium tests end up captured in the same video causing confusion
- gerrit:949986 Provide a secondary database in MediaWiki test suite (quibble)
- Zuul migration from Buster to Bullseye
- MediaWiki 1.41.0-wmf.23
- 678 Patches in 187 repos by 58 authors
- 0 Rollbacks
- 0 Days of delay
- 1 Blocker
This week
Open source/Upstream contributions
Let's keep these empty
Code review
Gerrit Access requests
Private repo requests
[edit]https://phabricator.wikimedia.org/search/query/E7t2_WXX01bB/#R
Gerrit repo requests
GitLab Access requests
[edit]High priority tasks
- UBN! + High: https://phabricator.wikimedia.org/maniphest/query/PkxR1BXrbbU4/#R
- New in inbox: https://phabricator.wikimedia.org/maniphest/query/7vRDrcVnt8OI/#R
Vacations/Important dates
- https://office.wikimedia.org/wiki/HR_Corner/Holiday_List#2023
- https://wikitech.wikimedia.org/wiki/Deployments/Yearly_calendar
- https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Time_off
August 2023
- 09 Wed: International Day of the World's Indigenous Peoples, US staff with reqs
- 11 Fri: Brennen out for Folks Fest (?)
- 7-11 Mon-Fri: Dan out for family vacation
- 31 Mon Jul – 21 Mon Aug: Antoine
- 23 Fri Jun – 18 Fri Aug: Jeena – Mongolia :D :D :D
- 24 Aug – 04 Sep: Brennen
- 27 Aug Sun – 31 Aug Thu: Andre
September 2023
- 04 Sep: Labor Day (US staff with reqs)
- 26 Aug – 05 Sep: Brennen
- 13 Wed – 17 Sun: Brennen – KS (approximate)
October 2023
- 2-16 Oct: Jaime
Future
- 15 Jan – 15 Mar: Andre
Train
- https://tools.wmflabs.org/versions/
- https://train-blockers.toolforge.org/
- https://wikitech.wikimedia.org/wiki/Deployments/Yearly_calendar
- 2 Jan - wmf.17 - Dan + Antoine (Jaime out)
- 9 Jan - wmf.18 - Jeena + Dan (Jaime out)
- 16 Jan - wmf.19 - Jaime + Jeena
- 23 Jan - wmf.20 - Brennen + Jaime
- 30 Jan - wmf.21 - Ahmon + Brennen
- 6 Feb - wmf.22 - Chad + Ahmon
- 13 Feb - wmf.23 - Dan + Chad
- 20 Feb - wmf.24 - Antoine + Dan
- 27 Feb - wmf.25 - Jaime + Antoine
- 6 Mar - wmf.26 - Jeena + Jaime
- 13 Mar - wmf.27 - Brennen + Jeena
- 20 Mar - wmf.1 - Ahmon + Brennen
- 27 Mar - wmf.2 - Chad Dan + Ahmon
- 3 Apr - wmf.3 - Antoine + Dan
- 10 Apr - wmf.4 - Chad + Antoine
- 17 Apr - wmf.5 - Jaime + Chad
- 24 Apr - wmf.6 - Jeena + Jaime
- 1 May - wmf.7 - Brennen + Jeena
- 8 May - wmf.8 - Antoine + Brennen (Ahmon out + Antoine out 8th)
- 15 May - wmf.9 - Ahmon + Antoine (Dan out + Chad out)
- 22 May - wmf.10 - Chad + Ahmon (Dan out + Jeena out 26th)
- 29 May - wmf.11 - Dan + Chad (Memorial Day 29th)
- 5 Jun - wmf.12 - Jeena + Dan (Brennen out, Jaime out)
- 12 Jun - wmf.13 - Jaime + Jeena
- 19 Jun - wmf.15 - Cancelled for offsite
- 26 Jun - wmf.16 - Brennen + Jaime (Jeena out)
- 3 Jul - wmf.17 - Antoine + Brennen (3rd + 4th holidays)
- 10 Jul - wmf.18 - Dan + Antoine (Ahmon out)
- 17 Jul - wmf.19 - Ahmon + Dan (Brennen out Friday)
- 24 Jul - wmf.20 - Jaime + Ahmon
- 31 Jul - wmf.21 - Ahmon + Jaime (Jeena out, Antoine out) (Ahmon volunteered)
- 7 Aug - wmf.22 - No train
- 14 Aug - wmf.23 - Ahmon + Jaime (Jeena out, Antoine out)
- 21 Aug - wmf.24 - Dan (Brennen out, Jeena out, Antoine out)
- 28 Aug - wmf.25 - Jeena + Dan
- 04 Sep - wmf.26 - Antoine + Jeena + Andre as lurker!
- 11 Sep - wmf.27 - +Antoine
- 18 Sep - wmf.28 - Brennen+
- 25 Sep - wmf.29 -
Team discussions
- Let's do this! https://phabricator.wikimedia.org/T264231
DO runner pools
= a no-stupid-questions fireside chat with Dan Duvall
- What's this change? https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/commit/a6eab36c860b77c5855afcf34a5bab08a2d6e8b8
- What pools of runners do we have?
- Two runner pools
- Four different node pools
- General pool: unoptimized, standard DO droplets; runs the control plane + controllers
- Nginx controller
- Runner controller (the GitLab thing that schedules pods)
- Anything except buildkit
- Runners: cpu-optimized / memory-optimized
- Nodes are tainted by default: a taint repels workloads, so unless a pod explicitly tolerates the taint it will not be scheduled there (see the toleration sketch after this list)
- CPU-optimized: pods tolerate the workload=cpu taint (the toleration counteracts the taint)
- Memory-optimized: pods tolerate the workload=memory taint
- IOPS-optimized: nothing runs there except buildkitds, since image building is IO-intensive
- How do we control which jobs go to which pool?
- Both cpu-optimized + mem-optimized grab untagged jobs
- If a job includes the `cpu-optimized` tag then it will only go to the cpu-optimized pool; same for memory-optimized
- People control where their job lands by adding one of those tags (see the .gitlab-ci.yml sketch after this list)
- Scaling
- We messed with horizontal pod autoscaling
- There is no pod autoscaling for the runners
- The GitLab runner polls for jobs and spawns pods, and there's a concurrency setting there
- The gitlab-cloud-runner project has a Terraform template with this setting
- `concurrency` and/or `limit` setting in the runner yaml file (see the values sketch after this list)
- https://gitlab.wikimedia.org/repos/releng/gitlab-cloud-runner/-/blob/main/gitlab/gitlab-runner-values.yaml.tftpl
- The node pool will respond by autoscaling, spinning up new nodes (which is currently slow)
- Takes ~3 minutes
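To make the taint/toleration point above concrete, here is a generic Kubernetes pod-spec fragment for a workload that should land on the memory-optimized nodes. This is a sketch only: the NoSchedule effect and the nodeSelector label are assumptions for illustration, not necessarily what our runner configuration uses.

```yaml
# Generic Kubernetes sketch: tolerate the workload=memory taint so the pod
# can be scheduled onto the memory-optimized nodes (which repel everything else).
tolerations:
  - key: workload
    operator: Equal
    value: memory
    effect: NoSchedule    # assumed effect; the real taint may differ
nodeSelector:
  workload: memory        # hypothetical node label used to pin the pod to that pool
```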
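From the job author's side, routing to a pool is the standard GitLab CI `tags:` keyword. The job below is a made-up example that pins itself to the memory-optimized pool described above; untagged jobs can be picked up by either optimized pool.

```yaml
# Hypothetical .gitlab-ci.yml job pinned to the memory-optimized runner pool.
phpunit-large:
  tags:
    - memory-optimized   # drop the tag to let any pool pick the job up
  script:
    - composer install
    - composer test
```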
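On the scaling knobs: a rough sketch of where those settings live, assuming the gitlab-runner Helm chart values layout (the gitlab-runner-values.yaml.tftpl linked above appears to be a Terraform-templated version of those values, and the chart spells the global setting `concurrent`). The numbers and namespace are illustrative, not our production values.

```yaml
# Illustrative gitlab-runner Helm values (not production numbers).
# 'concurrent' caps jobs across the whole runner manager; 'limit' caps a single
# [[runners]] entry inside the embedded config.toml.
concurrent: 8
runners:
  config: |
    [[runners]]
      limit = 4
      [runners.kubernetes]
        namespace = "ci-runners"   # hypothetical namespace
```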
Next time: Resource request limits in k8s