User:AKlapper (WMF)/Bitergia data quality queries

From mediawiki.org

The data behind wikimedia.biterg.io regularly needs updates to keep our metrics reliable. The database can be queried via the Sortinghat Identities API, and edited via either that API or the web interface.

For convenience this page lists GraphQL queries and bash scripts that User:AKlapper (WMF) may occasionally run.

Find accounts which likely should have an affiliation / enrollment

  • By potential email address:
    • query { individuals(filters:{term: "@wikimedia.org", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
    • query { individuals(filters:{term: "@wikimedia.de", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
    • query { individuals(filters:{term: "@wikimedia.se", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
    • query { individuals(filters:{term: "hallowelt", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
    • query { individuals(filters:{term: "speedandfunction", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
    • query { individuals(filters:{term: "thisdot.co", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
  • By potential username:
    • query { individuals(filters:{term: "(WMF)", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
    • query { individuals(filters:{term: "-WMF", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
    • query { individuals(filters:{term: "(WMDE)", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
    • query { individuals(filters:{term: "-WMDE", isEnrolled:false, isBot:false}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
  • Review GitLab accounts and check whether they should be merged into existing accounts (very cumbersome, see phab:T306770; email addresses and/or group membership could be checked manually on https://ldap.toolforge.org/user/someusername, but that does not scale):
    • query { individuals(filters:{isEnrolled:false, isBot:false, source:"gitlab"}, page: 1, pageSize: 100) { pageInfo { page pageSize numPages hasNext totalResults } entities { mk profile { name email isBot } } } }
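The queries above differ only in the search term (or the source) and the page number, so they can be generated instead of copied by hand. The following is a minimal Python sketch of that idea; the helper name build_query and the TERMS list are illustrative assumptions, not part of Sortinghat. It only builds the query strings and does not send them anywhere.

```python
import json

# Field selection copied verbatim from the queries listed above.
FIELDS = (
    "pageInfo { page pageSize numPages hasNext totalResults } "
    "entities { mk profile { name email isBot } }"
)

# Search terms taken from the list above (email hosts and username markers).
TERMS = [
    "@wikimedia.org", "@wikimedia.de", "@wikimedia.se",
    "hallowelt", "speedandfunction", "thisdot.co",
    "(WMF)", "-WMF", "(WMDE)", "-WMDE",
]


def build_query(term=None, source=None, page=1, page_size=100):
    """Return one individuals() query string (hypothetical helper)."""
    filters = []
    if term is not None:
        # json.dumps adds the double quotes and escapes special characters.
        filters.append("term: " + json.dumps(term))
    filters += ["isEnrolled:false", "isBot:false"]
    if source is not None:
        filters.append("source:" + json.dumps(source))
    return (
        "query { individuals(filters:{" + ", ".join(filters) + "}, "
        + f"page: {page}, pageSize: {page_size}) {{ {FIELDS} }} }}"
    )


if __name__ == "__main__":
    # Print the first page of every term query, then the GitLab query.
    for term in TERMS:
        print(build_query(term=term))
    print(build_query(source="gitlab"))
```

When a result reports hasNext as true, calling build_query again with page=2, page=3 and so on yields the follow-up queries.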

Queries not possible due to GraphQL limitations

  • To identify people who should have an affiliation set: select user accounts in the Phabricator database by the hostname of their email address, then reuse those usernames as a condition in a GraphQL query on the Bitergia database.
  • To find duplicate Phabricator accounts which only changed their "Also Known As" (as long as phab:T305230 remains unresolved): Query for individuals which share the very same name and both have source:"phabricator" but have different mks.
    • The same applies to any other source which allows renaming accounts.
  • To find accounts with the same email address to merge: Query for individuals which share the very same email but have different mks.
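Comparisons across records like the last two items could instead be done on the client side, after all result pages of an individuals query have been downloaded. The following is a minimal Python sketch of that grouping step. It assumes each entity has the mk and profile { name email } shape requested by the queries on this page; the function name find_shared_values and the sample records are hypothetical and not real data.

```python
from collections import defaultdict


def find_shared_values(entities, field):
    """Map each profile[field] value to the mks sharing it, if more than one."""
    groups = defaultdict(set)
    for entity in entities:
        profile = entity.get("profile") or {}
        value = profile.get(field)
        if value:  # skip null or empty names / email addresses
            groups[value].add(entity["mk"])
    # Keep only values that occur under at least two different mks.
    return {value: sorted(mks) for value, mks in groups.items() if len(mks) > 1}


# Hypothetical sample records, for illustration only.
sample = [
    {"mk": "aaa111", "profile": {"name": "Example User", "email": "user@example.org"}},
    {"mk": "bbb222", "profile": {"name": "Example User", "email": "user@example.org"}},
    {"mk": "ccc333", "profile": {"name": "Other User", "email": None}},
]

if __name__ == "__main__":
    print(find_shared_values(sample, "email"))
    print(find_shared_values(sample, "name"))
```

For the Phabricator case, the entities passed in would be the ones fetched with the source:"phabricator" filter, grouped by "name" rather than "email". The comparison is an exact string match, so differences in case or whitespace are not treated as duplicates.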

Check whether detached accounts with the same MediaWiki and Phabricator usernames are connected and can be merged

Expensive / time-consuming. See the script and DB commands.

Query all existing Phab accounts about their connected MediaWiki.org accounts

Expensive / time-consuming because there are more than 10000 accounts. See the script and DB commands. (For a cheaper version that requires more manual checking, see phab:T170091.)