User:YuviPanda/GSoC

The SelectionSifter helps anyone choose which wiki articles to collect into a selection. (This was a GSoC 2011 project.)

How to interpret estimates

The given values are lower bounds: multiply by 3 to get the upper bound, and by 2 to get the average. Overshooting timelines is to be expected. Estimates will be adjusted as the project motors along.
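For example, a task estimated at 4 hours should be expected to take between 4 and 12 hours, with 8 hours as the average.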

Current implementation

Rewrite specifications

  • Written in PHP
  • Backwards compatible with the assessment templates currently in use
  • Should be 'good enough' to be deployed on enwiki
  • Feature parity with the WP 1.0 bot

Components

  1. Assessment Data Collector
    1. Update Assessment Data whenever it is changed
    2. Log changes to assessments
    3. Import initial data from current Bot
  2. Querying interface (Assessment Statistics + Articles List)
    1. Arbitrary Querying of assessment data
    2. Embedding of arbitrary query results in different forms inside wiki articles (Statistical Table embedding)
  3. Creating, managing and exporting 'interim collections'
    1. Whether to use Extension:Collections is TBD

Component 1: Assessment Data Collector

Machine Readable Assessments

After talking with User:CBM and User:Awjrichards, we hit on a much better way of doing assessments. Since most assessment templates use w:en:Template:WPBannerMeta, we can modify that template to provide machine-readable assessment data that can then be read by the assessment parser. This eliminates the need to maintain support for each WikiProject separately.

Representing Assessment Info

My favored approach would be to modify the template to insert extra attributes (data-* attributes) on the link pointing to the WikiProject home page. data-wp-importance and data-wp-quality placed on that link would denote the importance and quality assessment for that particular article from that particular WikiProject, and a class would be added to the <a> tag to denote that it represents a WikiProject assessment. This is in line with the POSH principle from Microformats, without abusing class too much. It puts the assessments in machine-readable form right in the HTML.
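
For illustration, the generated link might look something like this (the data-* attribute names are from above; the wp-assessment class name and the example values are assumptions):

  <a href="/wiki/Wikipedia:WikiProject_India"
     class="wp-assessment"
     data-wp-quality="B"
     data-wp-importance="High">WikiProject India</a>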

Parsing out Assessment Info into Database

After each edit, we could either:

  1. Parse out the HTML (after it's generated) and pick out the assessment data
  2. Put an entry into the job queue, which executes code to pick out the assessment data

We then add it to the database if the info has changed, and record pertinent information in the log (user, timestamp, revision, etc.).
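
A rough sketch of what the extraction step could look like in PHP, using DOMDocument; it assumes the data-* attributes and the hypothetical wp-assessment class described above:

  <?php
  // Pull WikiProject assessments out of the rendered page HTML.
  function extractAssessments( $html ) {
      $doc = new DOMDocument();
      @$doc->loadHTML( $html ); // suppress warnings on imperfect HTML
      $xpath = new DOMXPath( $doc );
      $assessments = array();
      $links = $xpath->query( "//a[contains(@class, 'wp-assessment')]" );
      foreach ( $links as $link ) {
          $assessments[] = array(
              'project'    => $link->textContent,
              'quality'    => $link->getAttribute( 'data-wp-quality' ),
              'importance' => $link->getAttribute( 'data-wp-importance' ),
          );
      }
      return $assessments;
  }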

Open Issues
  1. Okay to use data-* attributes on WMF properties? No issues with browser compat, but still would like to get this clarified.
  2. Is the metatemplate good enough to actually insert these data-* attributes properly? I tried reading it (three times!) and got a headache. Need to contact User:MSGJ.
  3. We'll be parsing HTML to get data out. Is this considered dirty and sinful? Will I be punished by the WMF cabal? This is perhaps the most important issue.
  4. Parse out right after edit, or put in queue? Needs performance testing.
  5. How do I parse out the HTML? OutputPage doesn't build a DOM afaik, and I'd like to avoid reparsing if possible. External library?

Logs

Logs of assessment changes, recorded every time an assessment changes.

Tasks
  1. Develop the logging model, with DA code (2 hours); a sketch follows this list
  2. Write a special page extension to view/filter the log (14 hours). Filter by:
    1. Time of change
    2. Type of change (importance/quality/other)
    3. User making the change
    4. Direction of change (improve/deteriorate)
    5. Category/project of the article the change was made to
    6. Article name
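
A minimal sketch of the logging model, using MediaWiki's database abstraction layer; the assessment_log table and its field names are assumptions:

  <?php
  // Record one assessment change; table and column names are hypothetical.
  function logAssessmentChange( $article, $project, $type, $old, $new, $user ) {
      $dbw = wfGetDB( DB_MASTER );
      $dbw->insert( 'assessment_log', array(
          'al_page'      => $article->getID(),
          'al_project'   => $project,
          'al_type'      => $type, // 'importance', 'quality' or 'other'
          'al_old'       => $old,
          'al_new'       => $new,
          'al_user'      => $user->getId(),
          'al_timestamp' => $dbw->timestamp(),
      ) );
  }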

Component 2: Querying and Embedding

Query Engine

A set of core components that can execute arbitrary queries, producing both statistics and article lists.

Tasks
  1. Build a basic querying engine that can be extended in the future to cover other assessment backends (not just WikiProject-based assessments), with abstract, well-defined interfaces. The list of supported query operations would closely mirror that of LINQ; see the interface sketch after this list. (est: 12 hours)
  2. Implement the querying engine for the WikiProject-based assessments (Component #1) (est: 12 hours)
  3. Implement a specific statistical engine for WikiProject-based assessments, with support for overall and per-project tables (est: 12 hours)
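
A possible shape for the abstract querying interface; the AssessmentQuery name and the operation names are assumptions, loosely mirroring LINQ-style operations:

  <?php
  // Abstract interface that any assessment backend could implement.
  interface AssessmentQuery {
      public function where( $predicate );            // filter by project, quality, importance...
      public function orderBy( $field, $asc = true ); // sort article lists
      public function select( $fields );              // pick out the needed columns
      public function take( $count );                 // limit results, for paging
      public function count();                        // aggregate, feeds statistics tables
  }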

Querying Interface

A user interface for interactively querying the assessments, covering both overall statistics and article lists.

  1. Expose the query engine via a special page (est: 12 hours design + 12 hours implementation); see the sketch after this list
  2. Expose the statistical engine via a special page (est: 12 hours design + 12 hours implementation)
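
A bare-bones sketch of what the special page wrapping the query engine could look like; the AssessmentQuery page name is an assumption:

  <?php
  class SpecialAssessmentQuery extends SpecialPage {
      public function __construct() {
          parent::__construct( 'AssessmentQuery' );
      }

      public function execute( $par ) {
          global $wgOut;
          $this->setHeaders();
          // Render the query form here; on submit, run the query engine
          // and print the matching articles as a sortable table.
          $wgOut->addWikiText( 'Query form goes here.' );
      }
  }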

Embedding Interface

Magic words (or similar) that let you embed customizable statistical tables inside wiki pages.

  1. Build magic words to embed statistical tables/results in wiki pages (est: 6 hours); see the sketch after this list
  2. Build magic words to embed query results in wiki pages (est: 8 hours)
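
A sketch of how the embedding could be wired up as a parser function; the assessmentstats name and its argument are assumptions, and the magic word itself would also need registering (e.g. via the LanguageGetMagic hook):

  <?php
  // Register {{#assessmentstats:...}} as a parser function.
  $wgHooks['ParserFirstCallInit'][] = 'efAssessmentParserSetup';

  function efAssessmentParserSetup( $parser ) {
      $parser->setFunctionHook( 'assessmentstats', 'efRenderAssessmentStats' );
      return true;
  }

  function efRenderAssessmentStats( $parser, $project = '' ) {
      // Run the statistical engine for $project here and return the result
      // as wikitext; this stub just emits an empty table shell.
      return "{| class=\"wikitable\"\n|+ Assessment statistics for $project\n|}";
  }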