Talk:Page Curation

Johnywhy (talkcontribs)

Hi,

Any movement toward making this tool compatible with any MediaWiki?

Reply to "General Release?"

Marked for deletion log

Summary by MarcoAurelio
Spinningspark (talkcontribs)

"This functionality works regardless of how the template is injected into the page". However, this does not work on any method of nomination other than page curation. There is a phabricator item open on this where it is said that this (Mediawiki) page is wrong.

Enabling the feature at other Wikipedia

Summary by 197.218.89.94
MarcoAurelio (talkcontribs)

I have made some use of this feature at the English Wikipedia and I wonder how hard it would be to get this tool ported to the Spanish one. We currently lack curation tools given that the scripts we used for that are mostly broken, and their maintainers are no longer active/around. Thank you.

Ymblanter (talkcontribs)

Do you have community consensus that the tool is needed? I am all for it, but in the English Wikipedia we discovered that associated backlogs may go uncontrollably high and are difficult to confine.

Kudpung (talkcontribs)

It's not been proven that the backlogs have anything to do with the page curation tools. In fact the system was developed in response to an issue concerning backlogs. The current backlog is still unacceptably high but it's not the 39,000 it was in 2011 before the tool was introduced. Since introducing the New Page Patroller group in Nov 2016, the backlog has gone down even more. From around 22,000 to 'only' around 16,000 as of today.

Ymblanter (talkcontribs)

Absolutely, but it still requires considerable community attention. Some communities can choose to not care and prioritize their activities differently.

MarcoAurelio (talkcontribs)

@Ymblanter I wanted to know if this is possible as of today. I see some threads below saying that this is so enwiki-specific that it might not be possible to "universalize" it. Before proposing it to the community, it'd be good to know first whether this could be actioned; otherwise I'd have wasted the community's time voting on something that cannot be done :)

MarcoAurelio (talkcontribs)

I guess the summary given above by 197.218.89.94 means that it is not possible for now.

Kaldari (talkcontribs)

Some parts of Page Curation are easy to port to other wikis, while some parts are not. The main part that is difficult is the deletion handling. English Wikipedia has 3 different deletion workflows: AfD, speedy deletion, and PROD. All 3 are supported by Page Curation, but due to the complexity a lot of that support is hard-coded into the software and would need to be modified for other wikis' deletion workflows. Almost everything else about Page Curation (like all the maintenance tags and notification templates) is configurable on-wiki and thus wouldn't be that difficult to port.

MarcoAurelio (talkcontribs)

Thank you @Kaldari this is helpful. As for eswiki we've also got speedy deletion, PROD and AfD. I guess what would be more needed would be the speedy deletion tags. If the other tags can be configured on-wiki then maybe I could propose this to be enabled there. However looking at the Phabricator board there are a lot of other wikis waiting for this feature as well.

It'd be good if we could develop a wiki-agnostic PageTriage extension which could be configured on-wiki for the most part.

Kudpung (talkcontribs)
MarcoAurelio (talkcontribs)

Hi @Kudpung Yes we've got Twinkle there, however it has not worked properly for years. I don't know if it is because we're using outdated code, lack of maintenance or both. Probably the latter. Maybe I shall have a look at how to update the whole Twinkle code we use at eswiki.

At <https://en.wikipedia.org/wiki/Wikipedia_talk:Twinkle/Archive_38#Using_it_at_other_Wikipedia> I requested some time ago guidance to properly port the tool. I'll repeat the request on GitHub and see if we can do something with it.

MarcoAurelio (talkcontribs)

A button for returning to the new page feed page.

Mys 721tx (talkcontribs)

It would be nice to have such a button so curators can go back to the full list if they decide to skip some of the new pages.

Modifying Page Info on disambiguation pages

Swpb (talkcontribs)

It's usually a good thing that Page Info lists "No citations" under "Possible issues" – but not in the case of disambiguation pages, which are explicitly not supposed to have citations. Could the Page Curation toolbar be modified to prevent this notice from appearing on any page containing a disambiguation tag (of which there are several, including aliases)?

Nemo bis (talkcontribs)
Quiddity (talkcontribs)

Filed as phab:T76198 (but will need a volunteer, as nobody is currently officially assigned to work on that extension)

Kozuch (talkcontribs)

Is there some labs demo or beta to test?

Reach Out to the Truth (talkcontribs)

Keep an eye on the PageTriage extension. It doesn't do much more than print "Hello World" at the moment, so there's nothing to try out.

Jorm (WMF) (talkcontribs)

So, we've been talking about the list of New Pages as a queue. People either work from the front or the back of the queue, as it were. I want to step outside of that thinking and propose a different system because there are several problems with it (not the least of which being that people edit conflict each other when working on the same side of the queue).

Let us instead think of putting new pages into a "stack". This is programmer terminology, so I'm loath to use it, but it's a more accurate term. A "queue" is a stack, but it's got a specified direction: "First in, first out" (FIFO), or "Last in, first out" (LIFO). Currently, patrolling from the front of the queue is a FIFO process, and patrolling from the back is a LIFO process.

We don't have to be bound by this idea. We can create unique stacks per user. We can also do things like "filter by categories" or other metadata that might be easy to pick up (and, in the future, with better metadata tags, get even fancier). Unique stacks would also help to alleviate edit conflicts and double patrolling.

Consider a system like the following:

Say that there are 1,000 articles in the total queue. They are (naturally) ordered by creation date. We have five people who start "Patrolling".

For each of those five people, the system will randomly select 20 articles from the entire queue, shuffle the order, and then present them to the Patroller. No two people will get the same articles as long as they are in the same "patrolling session". So if I start a session, the system may assign me article #45 as part of my personal stack. As long as I'm in the session, no one else will be assigned article #45. If I end the session, or close it, or skip #45, it will be put back into the main pool and can be assigned to someone else.

It's like a deck of cards. Currently, we only ever see the cards in order: AH, 2H, 3H, 4H, 5H, etc. Let us instead shuffle the deck. When you deal hands in poker, no two people will ever have the same cards (unless, you know, someone is cheating). Further, we're guaranteed a more equitable distribution of attention across the entire queue.
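The per-session "stack" idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PatrolPool` class, its method names, and the session identifiers are hypothetical and do not correspond to any existing PageTriage or MediaWiki code. It shows randomly dealt stacks of 20 where no page is held by two sessions at once, and where skipped or abandoned pages return to the main pool.

```python
import random


class PatrolPool:
    """Hypothetical pool of unpatrolled page IDs dealt out in per-session stacks."""

    def __init__(self, page_ids, stack_size=20, seed=None):
        self.available = set(page_ids)   # pages nobody currently holds
        self.assigned = {}               # session id -> list of page ids
        self.stack_size = stack_size
        self.rng = random.Random(seed)

    def start_session(self, session_id):
        """Deal a random, shuffled stack; dealt pages leave the shared pool."""
        count = min(self.stack_size, len(self.available))
        # sample() picks without replacement and returns the picks in random order
        stack = self.rng.sample(sorted(self.available), count)
        self.available.difference_update(stack)
        self.assigned[session_id] = stack
        return list(stack)

    def skip(self, session_id, page_id):
        """A skipped page goes back to the pool for someone else."""
        self.assigned[session_id].remove(page_id)
        self.available.add(page_id)

    def mark_patrolled(self, session_id, page_id):
        """A patrolled page leaves the system entirely."""
        self.assigned[session_id].remove(page_id)

    def end_session(self, session_id):
        """Ending a session returns every unhandled page to the pool."""
        self.available.update(self.assigned.pop(session_id, []))
```

With 1,000 pages and five patrollers, five calls to `start_session` hand out 100 distinct pages and leave 900 in the pool, which matches the card-dealing analogy: nobody holds the same card as anyone else.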

Patrollers could still choose to focus on the front or the back (FIFO, LIFO), obviously. They could also create "stacks" based around categories or namespaces. It may be possible to do stack creation based around WikiProject but I think that's too much to hope for.

Thoughts?

Kudpung (talkcontribs)

I certainly understand the function you are describing. However, we have to consider that very few volunteers actually do a 'session' in the way tasks would be allocated in an office environment. Even I, who work from a home office and often spend hours on end at NPP, relish the freedom to dodge around the site doing things such as working on my own articles and doing other random editing, admin tasks, and other Wikipedia project work. When doing NPP people stop to discuss issues with each other, such as for example today's conversation - we may even have a quick video chat to discuss some particularly difficult issues. On top of that there is the freedom of doing anything else in RL that comes along.

The same applies to the way we work for example at OTRS - and that's why I suggested a system similar to ticket enquiry software solutions. When you want, you grab a page and then it's blocked for you until you've dealt with it. Nevertheless, we must not forget that we rarely have more than 7 patrollers working at any time at the front, and perhaps one or two working at the back of the queue. Other people pop in and simply patrol a page or two on the fly. And that's for around 1,000 pages a day. When it's night time there in the US, I'm often the lone ranger - and that's when pages are flooding in from India, Asia, the Philippines, and Australia/NZ.

We need more patrollers, and ones who know how to do it, but it's most definitely not a task that appeals to everyone.

A useful tool developed by Snottywong clearly shows how this works.

WereSpielChequers (talkcontribs)

I like the idea of reducing edit conflicts. But I don't like the idea that attack pages sit for longer, so this would need to be combined with the red colour concept for high risk articles that are available to all patrollers. The category idea is good if it can be used to create wikiproject specific pages that would be subpages of each wikiproject, but that is largely for mid-queue to assist the end of the queue team, as a large proportion of new articles are uncategorised - some for days.

The drawback of the stack idea is that a lot of us have ways of judging what sort of article something is from the couple of lines at special newpages, and I suspect a lot of us like the ability to pick and choose. What would make a difference would be something that allowed you to ignore articles that another patroller was in edit mode in.

Kudpung (talkcontribs)

You've made a very important point about the 'couple of lines at special newpages'. The entry on the special:new pages should be a little bit longer to provide adequate description. This would enable editors like me who generally cherry pick for articles on BLP and bands, or otherwise toxic, etc., that are very likely to be customers for deletion.

This should be a very easy software tweak and perhaps we should file a bug for it. We also need to include this in any advice we give to NPPers.

Effectiveness of "catching the creators while they are still online and logged in"

WereSpielChequers (talkcontribs)

One of the big divides at NPP is between those editors who think it is, as Kudpung put it, "important to catch the creators while they are still online and logged in", and those of us who think that while it is good to quickly warn bad faith editors, good faith editors respond best to being helped, having some slack cut for them and being given a little space, but are driven away by threats to reject their work and delete their article. This divide makes certain changes difficult to agree on because we have very different perceptions of the problem. There are some changes that don't involve this and that both sides can support, but making changes at New Page patrol would be much easier if we had some research behind this to show whether very promptly templating articles and warning their creators encouraged or deterred the creators from fixing their articles, and in particular whether doing this very quickly was either more effective at getting articles improved or more efficient at driving newbies away.

I've done quite a bit of trawling through the BLPprod queues, and while I'm not claiming any statistical robustness, my impression is that if articles with a BLPprod tag get rescued it is usually an experienced editor who does so. But I would be open to persuasion on this, as I hope would be those who take the opposite view to me. Speedies are harder to measure in this way because they get deleted so quickly I find it difficult to spot ones where the author has been spurred on to make greater efforts to improve their article by the threat of deletion.

If we do some research it would be important to focus on the relatively recent data, as earlier this year the system on EN wiki was changed to default to emailing editors when they received a talkpage message. One would presume that this would reduce the advantage of catching the editor before they left.

Whether the results showed a pattern that the quicker the templating the less likely the editor was to stay, or the reverse, I believe it would be much easier to improve NPP if we had some research. I remember trying to get something like this into the WSOR program for 2011 and we might be able to use some of the datasets created for that program. NB G3, G7, G10, G11 and G12 tags need to be excluded, or better still the results analysed by deletion code, otherwise you risk this being skewed by our effectiveness at targeting attack pages and driving away vandals.

Kudpung (talkcontribs)

Any solution(s) has to be addressed around the three main players: 1) The NPPer, 2) the article creator, and 3) the deleting admin (in the case of CSD). Gathering research is difficult and probably the best indication is the empirical experience from the kind of new page patrolling that I have spent 50 or so hours doing over the last few days. If anyone wants some feedback on how the patrollers are performing, they can go here: http://toolserver.org/~snottywong/cgi-bin/patrolreport.cgi, then work their way systematically through the list looking at all the articles that have been patrolled, and if they have been wrongly patrolled, checking the NPPers' talk pages to see if they have been warned before, and then correcting any mis-tags on the fly. I'll list some points that I have mentioned many times before:

  1. NPP needs to be either a right, or NPPers must undergo some form of training. I have suggested a video tutorial as the best solution.
  2. Article creators are often SPA and don't come back to see what has happened to their article. Almost all new pages are by newly registered accounts.
  3. Do the deleting admins always check what has been tagged? Or do they take the NPPers at their word?

Solutions:

  • Train the NPPers and make NPP a user right.
  • De-index and move very poor but possibly salvageable articles to AfC or user space, with of course a suitable message to the creators:
Welcome to Wikipedia and thank you for your contributions. The article articlename that you recently created is unfortunately not suitable for immediate publication and has been moved to [xxxxxxxxxxx] where you will be able to develop it further without fear of deletion. When the article is ready, it can be moved back to mainspace by an established editor. Thank you, and happy editing!
  • Get Twinkle to leave a message on the creator's talk page when any maintenance tag is applied. What takes up most of my patrolling time is placing custom messages such as:
Welcome to Wikipedia and thank you for your contributions. The article articlename that you recently created has been flagged for urgent attention. Please consider returning to the article and addressing the issues that have been pointed out. If the article is likely to take longer to develop than you thought, perhaps you would prefer to develop it in your user space; an editor could move it there for you. Thank you, and happy editing!

Such features would be extremely easy to implement - but it appears these suggestions are not resonating. As no solution for CorenBot seems to be forthcoming, and in the light of the dozens of new articles (all slated for deletion) coming from India in the wake of the India Education Program, something needs to be done quickly.

WereSpielChequers (talkcontribs)

I think we have threads for those suggestions, some I agree with and some I don't. The problem with using Twinkle to tell article creators about tags placed on their new article is that it may well be counter-productive. I'm aware that there are people who think it important to communicate with newbies before they log off. But there is the alternative interpretation that we are driving away newbies with our templates and warnings, and if that is the case doing that more thoroughly will drive away more newbies more quickly. Since the divide is in people's perceptions of what is going on, I'm suggesting that we undertake some research to see whether rapid tagging is an efficient way of driving away newbies or an efficient way of getting those newbies to improve articles. Until we have such research it would be wasteful to invest in making the NPP process more efficient at templating the newbies.

Kudpung (talkcontribs)

It depends how the templates are worded. Research has already shown that most newbies believe the first messages they get are hand-written. I do it all the time, and with great response from the article creators. Unfortunately, where everyone wants stats, there are no metrics to prove it.

Steven (WMF) (talkcontribs)

What research showed that? I've seen pretty clear evidence that people who receive the current, passive voice and institutional-sounding templates think that Wikipedia is automatically warning them, rather than it being communication from a human being.

Kudpung (talkcontribs)

Wiki is a huge place, and I can't remember where I saw it, but I certainly did, because it's one of the areas I work on. You hit the nail on the head though when you said 'institutional-sounding', and that's the brunt of the problem. I have never understood why at en.Wiki we can't have friendly messages instead of the pompous walls of text that are usually composed. Based on my old studies of comm. sci., I have a very good idea why this is. All UK government official documents have lost their Dickensian touch over the last decade or so (probably thanks to the Internet) and websites are friendlier - even when applying online for a new passport or driving licence! To enhance the new user reception and experience, these concepts need to be borne in mind. The de.Wiki is different: almost everything in German sounds official, though I greatly appreciate that their more modern and more frequent use of 'Du' on their Wiki is very refreshing. Unfortunately, English is one of the few European languages that does not have such a distinction.

Note that the messages I use are not TLDR diatribes with shedloads of links to obscure policies; I tend to think a creator would be really happy if someone came on line, saw what they were doing, and offered some help to get it right. I know I would, but in the days when I created my first pages, I didn't even receive a welcome template for years - and when it came, I had no idea it was only a template and I was overjoyed at the thought that someone 'up there' had really noticed my painstaking work on Thailand articles. In fact one of my very first edits was to create a cut 'n paste move - I had absolutely no idea where all the rules were, especially for repairing a misspelt page name, and nobody noticed and put me right - or chided me for it!

WereSpielChequers (talkcontribs)

I think we need to remember that communication is not just a matter of posting messages on talkpages; communication also includes interacting by editing the same article. My modus operandi is to do a gnomish edit on an article, spot that another editor there has a redlinked talkpage and drop them a welcome template. I don't know whether 9% or 90% of newbies connect the two events, but my assumption is that at least a proportion of editors think the same way as me.

However there is a huge easy win awaiting whoever starts testing welcome messages and announces which ones work better than others (from my direct marketing days the one thing I'm sure of is that the messages that work best won't be the ones that a random focus group would predict would work best).

Rich Farmbrough (talkcontribs)

I have found a lot of BLP prods are rescuable, but I know that several people work at the back of the 10 day queue. So it's hard to judge what gets lost that shouldn't.

Enable No Index in mainspace. Put Noindex in badfaith speedy templates and all articles until patrolled

WereSpielChequers (talkcontribs)

One of the problems of the current system is that {{Noindex}} doesn't work in mainspace - if it did we'd be putting it in the templates for G3, G10 and probably G11 and G12. But if it is now possible to get IT resource to improve NPP then maybe we can start thinking big.

If unpatrolled new articles all had noindex then we'd have some huge painless changes at NPP.

Noindex would mean that Google etc. would ignore these articles, and neither cache them nor add them to search results, until they were patrolled.

Attack pages and vandalism which currently persist in Google caches would be gone as soon as we'd deleted them.

Spammers who rely on the Google caches, and on the fact that their spam sometimes persists for weeks, would find us a much less tempting target. Some of them might even go elsewhere or try to write in a somewhat less spammy style.

Immediatists who don't want us to accept poorly formatted unsourced new articles could console themselves that unpatrolled new articles were effectively drafts.

Good-faith article creators wouldn't get bitten because they wouldn't know their page spent its first hours or days noindexed, just as today they don't know if their article has been marked as patrolled or not. So no newbies would be bitten by this change, but presumably all the people who wanted to not have a large subset of these articles created would see this as an improvement.

Σ (talkcontribs)

Support!

But what if vandals had bots that would noindex featured articles and such?

WereSpielChequers (talkcontribs)

Well one option would be to limit this to unpatrolled pages; that would be difficult for vandals to game as they don't have a "mark as unpatrolled" button. I'd like to also noindex anything that is tagged as G3 or G10, and that sometimes includes very old articles. But if there isn't an easy way to do this then I could live with this just being unpatrolled articles. However if we could limit it to that then I think our FAs would be safe. If vandals start tagging Featured articles as G3 or G10 then the noindex aspect would be the least of our worries; if anything it would be a positive, as the brief moment when an FA was vandalised would not get Google cached.

Rich Farmbrough (talkcontribs)

"What if vandals had bots that did foo." is a standard question. The answer is we'd block and revert.

Steven (WMF) (talkcontribs)

As Σ points out, the potential for abuse of {{noindex}} in the mainspace seems really dangerous. Can you imagine what would happen if a particularly prolific or sneaky vandal managed to noindex any number of legitimate articles?

I think it's an important part of incubator-style spaces (such as what's proposed in Article creation workflow) that they be noindexed. But endangering our core mission by potentially preventing search engines from finding articles in the mainspace is a big risk to take.

Also, considering that many new articles (such as those about disasters or other breaking news) are extremely valuable, I can't imagine we'd want to have to wait for the backlog of patrolling to be done in order to make them visible to readers.

Kudpung (talkcontribs)

This is a case for adding a 'No index' feature to Twinkle that automatically noindexes any articles that are not good enough for publication but that might be eventually kept.

Before we get this far though, I think we need to consider a new page patrol process, similar to AfD or PROD, but not a proposal for deletion, that would noindex a page, send it to AfC, and leave a nice, friendly message on the creator's talk page. A not very often used solution is to move pages to a creator's sub page. Very similar really. I don't know if or where we have discussed such possibilities before.

These are also solutions that could be integrated in some way into the Creation workflow interface. BTW: is there actually any further development on that taking place? If ACTRIAL is never to be implemented, we need to be looking actively at these solutions, as well as at further development of the Zoom tool.

Steven (WMF) (talkcontribs)

As far as, "noindex a page, send it to AfC, and leave a nice, friendly message on the creator's talk page" I heartily agree, though I would hope that rather than having to manually move to AfC and noindex, we could have tools to demote articles to the noindexed draft workspace Brandon described in Article creation workflow. A single click to "move to draft" rather than tag/delete would save a lot of newbie biting and still remove articles not yet up to snuff away from the mainspace and search engines.

Kudpung (talkcontribs)

Yes, that's exactly what I had in mind. Again, as I have said many times, the success of this would depend very much on patroller education - they should not use this feature indiscriminately instead of A1, A3, or PROD, for example. It's not difficult to programme Twinkle to automate these steps:

  1. 'no-index' the page
  2. Move the page to AfC or pre-titled user sub page
  3. Semi-protect the page for page moving
  4. Leave a nice friendly message on the user talk page.
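
As a rough illustration of the four steps listed above, the sequence can be written as an ordered plan. This is a dry-run sketch only: `build_userfy_plan`, the action names, and the placeholder message are all hypothetical, it calls no Twinkle or MediaWiki API, and the user sub page is just one of the two move targets mentioned.

```python
def build_userfy_plan(title, creator):
    """Return the four steps as an ordered list of plain dicts (nothing is executed)."""
    # Assumed target: a pre-titled sub page in the creator's user space
    target = f"User:{creator}/{title}"
    return [
        {"step": 1, "action": "noindex", "page": title},
        {"step": 2, "action": "move", "from": title, "to": target},
        {"step": 3, "action": "protect_moves", "page": target, "level": "semi"},
        {
            "step": 4,
            "action": "notify",
            "page": f"User talk:{creator}",
            # Placeholder wording; the real message would be a friendly on-wiki template
            "text": f"The article {title} has been moved to {target}.",
        },
    ]


if __name__ == "__main__":
    for step in build_userfy_plan("Example band", "NewUser"):
        print(step)
```

Keeping the plan as data makes the order explicit and lets a patroller review what would happen before anything is changed.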

This would of course depend very much on patroller education; we would not want them userfying pages indiscriminately instead of A1, A3, PROD, etc.

Who is actually developing Zoom - is it Ian or Brandon?

Jorm (WMF) (talkcontribs)

Zoom is still in the design phase, so I am on point here.

I just want to point out that there are possible legal issues with automatically assigning NOINDEX to non-patrolled articles (which I would love to do). The issues stem from the idea of the Foundation expressing editorial judgement. We have a meeting with the legal team about this scheduled, but it won't be for a while yet (given that we have a week of "all hands" activity, followed by some vacations).

It is currently our plan to have our basic business requirements developed over the next two weeks, after which we begin a deep design phase. I can't speak to development resources (Ian is actually only working on this in his spare time - he isn't tasked with it) - so I don't have any schedules for that to give.

Kudpung (talkcontribs)

Two points: 'no-index' would only apply to pages moved to AfC or a user sub page - which as far as I know, are not indexed anyway. The main problem is that all new pages are indexed by Google at the speed of light, long before patrollers get to them.

Jorm (WMF) (talkcontribs)

Yes. My ideal solution would be that an article is set to "NOINDEX" until it has been patrolled. However, as I said, there may be legal issues with this regarding the Foundation as an entity exercising editorial judgement. I don't think it's a problem, but I want to make sure we have coverage on it first.

And, to be honest, I'm willing to open a bug about this the instant we get the "okay". It seems like an obvious and simple thing we can do, and do quickly.

Kudpung (talkcontribs)

Sounds good. Even though we only have up to about 7 patrollers working at the best of times, most pages, except those that are so difficult they end up on the backlog, get viewed by a patroller within a few seconds, so unless they are summarily deleted by an admin on patrol or tagged for CSD or PROD, they would be marked by the patroller as 'patrolled' and indexed. This wouldn't stop them being visible, but it would prevent the search engines getting them and caching them.

I still think that a useful new feature to add to Twinkle (and Zoom), though, would be to 'userfy' really bad pages that could be improved by the author in time, but would be unreasonable to leave live online as PROD or BLPPROD. I've provided the automated steps somewhere else on this page.

WereSpielChequers (talkcontribs)

Userfy is contentious, partly because we don't know whether it comes off as any less bitey than deletion. Very occasionally I userfy something that the editor probably did intend to put in their sandbox or userpage, but I wouldn't suggest it for potential articles.

I wouldn't worry too much that anyone is going to argue that the Foundation is exercising editorial control as to whose articles are noindexed or not. Appointing admins and autopatrollers and marking edits as patrolled are all volunteer actions, not Foundation ones. Also no indexing new articles is a bit like flagged revisions, except without the snarky bit about telling people their edits are not live until they've been checked. The new pages would still be live, they'd just take a little longer to come up in search engines.

Kudpung (talkcontribs)

- and I believe preventing some articles from showing up in search engines is the main issue. I still think 'userfy' is less understood by the NPPers, and not used often enough. The question of biteyness is one that can be addressed in the message that goes with it. Unfortunately, some of the people who write the text for user messages and warning templates may not have the right kind of experience in human communications.

Rich Farmbrough (talkcontribs)

That wouldn't be a problem since it is non-specific.

WereSpielChequers (talkcontribs)

I can't imagine that a breaking news story would be no-indexed for long if at all - remember all admins and Autopatrollers create articles that are already marked as patrolled, and most articles are tagged for deletion or marked as patrolled in their first few minutes. If we get the change that would display the "mark as patrolled" button to any experienced user then there is no risk of a significant news event being noindexed for even as long as ten minutes.

As for the risk of abuse in Mainspace, that would depend on how it was done. The most secure way would be to make this a feature of being patrolled. Since editors can't unpatrol an article they wouldn't be able to noindex one. I'd prefer that it also picked up on the G10 and G3 template and I suspect it would be possible to do this in a way that limited Noindex to those templates. But I wouldn't get too concerned at the risk of vandalism using the noindex tag, what motive would vandals have to hide their vandalism from people outside Wikipedia? Putting a penis photo in an article and simultaneously adding noindex would just mean the mirrors were less likely to repeat the vandalism.

WereSpielChequers (talkcontribs)

No index is only dangerous in mainspace if you allow it to be applied indiscriminately. Currently even admins can't mark a patrolled article as unpatrolled, so making unpatrolled articles no index should be pretty safe. I can see that tagging G3 and G10 articles as no-index would be a little tricky to do without somehow enabling people to noindex an article without tagging it for deletion. I'm hoping the devs will say it's possible; if not we'd have to just limit the idea to unpatrolled articles.
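
The rule proposed in this thread can be stated as a small decision function. This is a sketch of the proposal only, not of how MediaWiki or PageTriage actually decides indexing: `robots_policy` and its parameters are hypothetical names, and the G3/G10 branch is the optional part that may not be feasible.

```python
# Speedy-deletion codes the proposal would also noindex (vandalism and attack pages)
NOINDEX_TAGS = {"G3", "G10"}


def robots_policy(is_patrolled, speedy_tags=()):
    """Return "noindex" for unpatrolled pages or pages tagged G3/G10, else "index"."""
    if not is_patrolled:
        # Editors cannot unpatrol a page, so this branch cannot be triggered by vandals
        return "noindex"
    if NOINDEX_TAGS.intersection(speedy_tags):
        return "noindex"
    return "index"


if __name__ == "__main__":
    print(robots_policy(False))            # new, unpatrolled article
    print(robots_policy(True))             # patrolled article
    print(robots_policy(True, ["G10"]))    # patrolled but tagged as an attack page
```

Because the first branch depends only on patrol status, dropping the tag check leaves the fallback described above: noindex limited to unpatrolled articles.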

Ebe123 (talkcontribs)

So a user called Physics is all gnomes conducted NPP training. How about having patrollers do more training of that type and, once there is a patroller right, letting already experienced patrollers grant the permission, as with Image-reviewer on Commons?