
Talk:Machine Learning

About this board

A liveblog and forum about machine learning at Wikimedia. Create new topics to ask questions or post updates.

Rooiratel (talkcontribs)

I would like to get ORES support for my wiki (Afrikaans Wikipedia).

But I see that ORES is deprecated in favour of LiftWing.

However I can't find any information on how to add a Wiki to LiftWing.

Please advise if I should be using ORES or LiftWing, and what the correct way of requesting support is.

CAlbon (WMF) (talkcontribs)

Hi Rooiratel! Here are the models hosted on Lift Wing and how to access them using the API. Let me know if this helps or if you are looking for something else!

Rooiratel (talkcontribs)

Thanks @CAlbon (WMF). I'm not looking to use the models personally via an API; I am looking to enable Lift Wing for our whole wiki (Afrikaans Wikipedia).

Specifically, I want damaging edits to be reverted automatically. How do I enable this for the whole wiki? Is it a MediaWiki extension that we need to install, or does it work some other way?

CAlbon (WMF) (talkcontribs)

Ah, got it! Let me talk to the product team that owns that and get back to you.

Rooiratel (talkcontribs)

Thank you so much!

Reply to "ORES vs LiftWing"

I am reworking the team page

CAlbon (WMF) (talkcontribs)

I am reworking the layout of the team page. The goal is to make a new section where I can post my weekly update on the work of the team. For this year this update has been an internal update as part of OKRs and Annual Planning, but I realized a few months ago that if I added some additional context to the updates, it would be valuable for the community to see.

HaeB (talkcontribs)

Thanks for reviving these regular public updates, they are valuable indeed!

Regarding context: Yes, including explanations, links etc. is great. Just don't let the perfect be the enemy of the good - even a barebones update that requires some googling (or Phabricator searching) to understand is better than none.

CAlbon (WMF) (talkcontribs)

Thanks, this is good to hear. I'll add some phab links where I can and will try to provide more context.

Reply to "I am reworking the team page"
Vanished user b37280c4674be9897c76da35c38f943b (talkcontribs)

Recently, I noticed that the ORES edit highlighting has disappeared from the Wikipedia recent changes list. Is this because of the new open source infrastructure? Are there any solutions other than just waiting?

CAlbon (WMF) (talkcontribs)
Vanished user b37280c4674be9897c76da35c38f943b (talkcontribs)

Ok, thanks!

CAlbon (WMF) (talkcontribs)

Hey Seaworlf35, the problem has been fixed and edit highlighting should be visible again.

Vanished user b37280c4674be9897c76da35c38f943b (talkcontribs)

Ok, thank you

Reply to "ORES Revision Scoring"

Exploring Wikidata Anti-Vandalism

BrokenSegue (talkcontribs)

I'm looking into the feasibility of designing an anti-vandal bot for Wikidata (potentially using OpenAI tech if I can secure funding). I understand that you have built tools for handling revscoring in the general case, but Wikidata items are a special snowflake in some senses. Are there tools being built that would work for Wikidata? [[Machine learning models]] lists "wikidatawiki-goodfaith", which I assume is what I want. But is that publicly usable? Thanks for any direction.


How is ORES trained?

Novem Linguae (talkcontribs)

Howdy folks. 1) Was just wondering how ORES is trained? For example, the "Vandalism" tags in PageTriage's Special:NewPagesFeed. Is there software somewhere where volunteers are presented with various pages and click a "vandalism yes/no?" button? If so where is the software? I'd like to check it out. 2) What are the PageTriage ORES configuration settings such as false positive target rate? I assume that this is a setting that can be adjusted up/down, which I assume is how anti-vandalism bot ClueBot NG achieves such a good false positive rate. I assume it's a tradeoff between false positives, and letting stuff slip through the cracks. 3) Any other hints about how ORES works in relation to PageTriage? I'll probably write some documentation about it. Thanks.

Ponor (talkcontribs)

I believe this is where edits are labeled, per wiki and on some quite old selection of edits. IDK if some other training options are in place. Would love to learn too!

Tzusheng (talkcontribs)

Hey @Novem Linguae and @Ponor, some researchers and I are currently working on a project to build a system that facilitates curating up-to-date data for training and evaluating ML models used in Wikipedia, including but not limited to ORES. We plan to recruit a small group of people for pilot testing around June. Please let me know if you're interested in participating or learning more about the project. Thanks!

CAlbon (WMF) (talkcontribs)

Hi @Novem Linguae, there is some general information here. The models are sometimes trained using human-curated training data (like @Ponor mentioned). Other times, data such as whether an edit was reverted is used. The models just output a probability; the thresholds are hardcoded in the MediaWiki extension itself.
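To illustrate the tradeoff discussed in this thread, here is a minimal Python sketch of how a fixed threshold turns a model's probability output into a flag, and how moving that threshold trades false positives against damage that slips through. The scores, labels, threshold values, and function names below are all made up for illustration; they are not the values hardcoded in the MediaWiki extension.

```python
from typing import Sequence


def false_positive_rate(
    scores: Sequence[float], labels: Sequence[bool], threshold: float
) -> float:
    """Share of good edits (label False) that are flagged at this threshold."""
    good = [s for s, is_damaging in zip(scores, labels) if not is_damaging]
    if not good:
        return 0.0
    return sum(1 for s in good if s >= threshold) / len(good)


def recall(
    scores: Sequence[float], labels: Sequence[bool], threshold: float
) -> float:
    """Share of damaging edits (label True) that are flagged at this threshold."""
    damaging = [s for s, is_damaging in zip(scores, labels) if is_damaging]
    if not damaging:
        return 0.0
    return sum(1 for s in damaging if s >= threshold) / len(damaging)


# Hypothetical model outputs: probability that each edit is damaging,
# paired with a hypothetical human label (True = damaging).
scores = [0.05, 0.20, 0.35, 0.60, 0.70, 0.85, 0.95]
labels = [False, False, False, False, True, True, True]

# Hypothetical thresholds; raising the threshold lowers the false positive
# rate but lets more damaging edits through.
for threshold in (0.3, 0.5, 0.8):
    fpr = false_positive_rate(scores, labels, threshold)
    rec = recall(scores, labels, threshold)
    print(f"threshold={threshold:.1f} false_positive_rate={fpr:.2f} recall={rec:.2f}")
```

A stricter threshold is one way a tool could reach a very low false positive rate, at the cost of flagging fewer damaging edits.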


Additionally, we are planning on deprecating the current ORES/Revscoring models in favor of more modern models such as RevertRisk and Outlink Topic Model, which cover multiple languages and take advantage of tools such as BERT. The ORES models will still be available for legacy reasons but we won't be updating them.


Recommendations for (introductory) machine learning course

Michael Große (WMDE) (talkcontribs)

Hey 👋

we at the WMDE Wikidata team will soonish (this year?) introduce a new feature that will affect how we evaluate the quality of Items. That means, we will probably need to retrain / update the articlequality model for Wikidata and maybe others.

For that reason in particular, I want to get somewhat deeper machine learning knowledge. I did Andrew Ng's Introduction to Machine Learning course a few years ago, before we last retrained and extended that model for Wikidata.

So I was wondering if you had any courses you would recommend? Especially for someone like me, who wants to upskill in order to work with / contribute to the articlequality model in particular and WMF/Wikimedia ML infrastructure in general.


Thanks!

CAlbon (WMF) (talkcontribs)

Hey Michael! I don't know about courses, but Sebastian Raschka is probably the best ML educator out there right now. He has a course and a book I believe.

Michael Große (WMDE) (talkcontribs)

Thank you! I will look him up :)


Damaging Filter Disappears on the English Wikipedia?

Tzusheng (talkcontribs)

I noticed that the damaging filter disappeared from the Recent Changes page on the English Wikipedia this morning (Eastern Time). Only the user intent prediction remains. Is there a rationale or announcement behind this change? Thanks!

CAlbon (WMF) (talkcontribs)

Hi Tzusheng, my apologies for the late reply, I was out of the office. I haven't heard anything about a change, and on our end the ML team hasn't changed anything. That said, I also don't see the damaging filter on the Recent Changes page. Let me investigate and get back to you.

CAlbon (WMF) (talkcontribs)

Machine Learning Modernization Project

CAlbon (WMF) (talkcontribs)

Hi All! I'm back from vacation!


After far too long we just published a page on our machine learning modernization work, which includes modern serving, model cards, and other plans. I hope this sparks some interesting discussions with you all about what we are working on and how we can work together.

Reply to "Machine Learning Modernization Project"
CAlbon (WMF) (talkcontribs)

My apologies but no update this week because I have been out sick.

Michael Große (WMDE) (talkcontribs)

Hope you get fully well soon!

Reply to "No Update This Week"

Machine Learning Weekly Update Dec 7, 2022

CAlbon (WMF) (talkcontribs)
  • We have hit another milestone for Lift Wing. Our ORES model infrastructure hosts ~110 machine learning models and for the first time all of those models are also publicly available on Lift Wing through the API Gateway. We still need to work out the details of reasonable rate limiting and optimizing performance (maybe use the API Gateway as a simple cache?), but you can access all the models right now without any internal WMF permissions. We will have some tutorials up soon for the community and an ask for testers to help us out.
  • To be clear: if folks are currently using ORES, nothing will change with ORES for at least an entire year. We are working on a complete year-long migration plan for ORES users to Lift Wing that includes outreach, tutorials, and technical support, and that plan will only begin once Lift Wing is officially launched in a few months.
  • After talking to AI ethics experts, wiki community members, WMF staff members, and frankly anyone else who would talk to us, we are working on generating model cards for all models on Lift Wing. The goal is for the model cards to be the main point of contact for questions, discussions, and ultimately governance around machine learning models hosted by WMF. The model cards will be individual articles on MediaWiki.org to make it easy for the community to use the tools they are familiar with. We should have some things to show everyone very soon.
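
As a rough illustration of what calling a model through the API Gateway could look like, here is a minimal Python sketch. The base URL, the `:predict` path suffix, the `rev_id` payload field, and the model name `enwiki-goodfaith` are assumptions based on how Lift Wing has been publicly documented; they are not stated in this update and may have differed at the time. The function names and the User-Agent string are hypothetical.

```python
import json
import urllib.request

# Assumed base URL for Lift Wing behind the API Gateway (not from this update).
LIFT_WING_BASE = "https://api.wikimedia.org/service/lw/inference/v1/models"


def build_predict_request(model_name: str, rev_id: int) -> urllib.request.Request:
    """Build (but do not send) a POST request asking a model to score a revision."""
    url = f"{LIFT_WING_BASE}/{model_name}:predict"
    body = json.dumps({"rev_id": rev_id}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Hypothetical User-Agent; replace with one that identifies your tool.
        "User-Agent": "example-liftwing-client/0.1 (contact: you@example.org)",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def fetch_score(model_name: str, rev_id: int, timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON response (network call)."""
    request = build_predict_request(model_name, rev_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Building the request does not touch the network, so it is safe to inspect.
example = build_predict_request("enwiki-goodfaith", 12345)
print(example.get_method(), example.full_url)

# To actually score a revision (performs a network call):
# result = fetch_score("enwiki-goodfaith", 12345)
```

Keeping request construction separate from sending makes it easy to add caching or client-side rate limiting around `fetch_score` later, which matters given that rate limits were still being worked out.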