Article feedback/Data

From mediawiki.org
A complete, anonymized dump of article feedback ratings collected from the English Wikipedia between July 2011 and July 2012 is available from the DataHub.
Real-time, censored data is still replicated on the toolserver.

Historical data dumps collected with previous versions of the Article Feedback Tool (AFT), as well as weekly data dumps from the current AFT version, were available for download for the English Wikipedia only. All dumps are compressed CSV files released under a CC-BY-SA license. Real-time anonymized AFT data was also available via the toolserver.

Three different kinds of dumps are currently available: raw rating data, article rating summary, and call-to-action data.

Raw rating data

aa_combined dumps contain the raw data for each successfully submitted rating.

field meaning
aa_page_id the unique identifier for the page
page_namespace the namespace of the page (this field is currently disabled as AFT is only applied to the article namespace of the English Wikipedia)
page_title the full title of the page
rev_len the length of the revision expressed in bytes
aa_user_id this value is set to 0 for anonymous users and to 1 for registered users
aa_user_anon_token a unique token assigned to each individual rater
aa_revision the unique id of the specific page revision that was rated
aa_timestamp the time at which the ratings were submitted expressed as a 14-digit timestamp
aa_rating_wellsourced the rating submitted to the first quality dimension ("Well-sourced" [v.1] or "Trustworthy" [v.2+]) on a 1-5 scale. 0 means that no rating was submitted
aa_rating_neutral the rating submitted to the second quality dimension ("Neutral" [v.1] or "Objective" [v.2+]) on a 1-5 scale. 0 means that no rating was submitted
aa_rating_complete the rating submitted to the third quality dimension ("Complete") on a 1-5 scale. 0 means that no rating was submitted
aa_rating_readable the rating submitted to the fourth quality dimension ("Readable" [v.1] or "Well-written" [v.2+]) on a 1-5 scale. 0 means that no rating was submitted
aa_design_bucket this field was used to track users who were presented with a modified version of AFT with no expertise option during the A/B testing phase
expertise_general this value is set to 1 if the user ticked the following option: "I am highly knowledgeable about this topic (optional)"
expertise_studies this value is set to 1 if the user ticked the following option: "I have a relevant college/university degree"
expertise_profession this value is set to 1 if the user ticked the following option: "It is part of my profession"
expertise_hobby this value is set to 1 if the user ticked the following option: "It is a deep personal passion"
expertise_helpimprove_email this value is set to 1 if the user submitted their email address to help improve the article
expertise_other this value is set to 1 if the user ticked the following option: "The source of my knowledge is not listed here"
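
The timestamp and rating conventions described above can be illustrated with a minimal Python sketch. It assumes the 14-digit timestamp follows the usual MediaWiki YYYYMMDDHHMMSS layout and that each row has already been read into a dictionary keyed by field name. The helper names and the sample record are hypothetical and do not come from the dumps.

```python
from datetime import datetime

# The four rating fields of an aa_combined row, in table order.
RATING_FIELDS = (
    "aa_rating_wellsourced",
    "aa_rating_neutral",
    "aa_rating_complete",
    "aa_rating_readable",
)


def parse_timestamp(value):
    """Parse a 14-digit timestamp, assumed to be YYYYMMDDHHMMSS."""
    return datetime.strptime(value, "%Y%m%d%H%M%S")


def parse_ratings(record):
    """Return the four ratings as ints, mapping 0 (no rating) to None."""
    ratings = {}
    for field in RATING_FIELDS:
        score = int(record[field])
        ratings[field] = score if score != 0 else None
    return ratings


# Made-up record for illustration only; not taken from a real dump.
sample_record = {
    "aa_timestamp": "20110715123456",
    "aa_rating_wellsourced": "4",
    "aa_rating_neutral": "0",
    "aa_rating_complete": "3",
    "aa_rating_readable": "5",
}

submitted_at = parse_timestamp(sample_record["aa_timestamp"])
ratings = parse_ratings(sample_record)
```

Mapping 0 to None keeps missing ratings from being mistaken for a score below the 1-5 scale in later calculations.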

The following schema can be used to import aa_combined data into a MySQL table:

CREATE TABLE `aa_combined` (
  `aa_page_id` int(10) unsigned NOT NULL,
  `page_namespace` int(11) default NULL,
  `page_title` varchar(255) default NULL,
  `rev_len` int(8) default NULL,
  `aa_user_id` int(1) default NULL,
  `aa_user_anon_token` binary(32) default NULL,
  `aa_revision` int(10) unsigned NOT NULL,
  `aa_timestamp` datetime default NULL,
  `aa_rating_wellsourced` int(1) default NULL,
  `aa_rating_neutral` int(1) default NULL,
  `aa_rating_complete` int(1) default NULL,
  `aa_rating_readable` int(1) default NULL,
  `aa_design_bucket` int(1) default NULL,
  `expertise_general` int(1) default NULL,
  `expertise_studies` int(1) default NULL,
  `expertise_profession` int(1) default NULL,
  `expertise_hobby` int(1) default NULL,
  `expertise_helpimprove_email` int(1) default NULL,
  `expertise_other` int(1) default NULL,
  KEY `aa_revision` (`aa_revision`),
  KEY `aa_page_id` (`aa_page_id`),
  KEY `page_title` (`page_title`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
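
Once rows are loaded, a per-page mean for one quality dimension could be computed along the following lines. This is only a sketch: it assumes rows are available as dictionaries keyed by the column names above, and the function name and sample rows are hypothetical. Ratings of 0 are skipped because they mean that no rating was submitted.

```python
from collections import defaultdict


def mean_ratings_by_page(rows, field):
    """Average the given rating field per aa_page_id, ignoring 0 (no rating)."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in rows:
        score = int(row[field])
        if score == 0:
            continue  # 0 means the rater left this dimension blank
        page_id = row["aa_page_id"]
        totals[page_id] += score
        counts[page_id] += 1
    return {page_id: totals[page_id] / counts[page_id] for page_id in counts}


# Made-up rows for illustration only.
sample_rows = [
    {"aa_page_id": 1, "aa_rating_complete": 4},
    {"aa_page_id": 1, "aa_rating_complete": 0},
    {"aa_page_id": 1, "aa_rating_complete": 2},
    {"aa_page_id": 2, "aa_rating_complete": 5},
]

means = mean_ratings_by_page(sample_rows, "aa_rating_complete")
```

Pages whose ratings are all 0 for the chosen dimension do not appear in the result.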

Article rating summary

aap_combined dumps contain a summary of rating data per article, sorted alphabetically by page_title.

field meaning
aa_page_id the unique identifier for the page
page_namespace the namespace of the page (this field is currently disabled as AFT is only applied to the article namespace of the English Wikipedia)
page_title the full title of the page
page_len the length of the page in bytes at the time of the dump
aap_total_wellsourced the sum of the ratings submitted to the first quality dimension ("Well-sourced" [v.1] or "Trustworthy" [v.2+])
aap_count_wellsourced the total number of users who rated the article on the first quality dimension
aap_total_neutral the sum of the ratings submitted to the second quality dimension ("Neutral" [v.1] or "Objective" [v.2+])
aap_count_neutral the total number of users who rated the article on the second quality dimension
aap_total_complete the sum of the ratings submitted to the third quality dimension ("Complete")
aap_count_complete the total number of users who rated the article on the third quality dimension
aap_total_readable the sum of the ratings submitted to the fourth quality dimension ("Readable" [v.1] or "Well-written" [v.2+])
aap_count_readable the total number of users who rated the article on the fourth quality dimension

The following schema can be used to import aap_combined data into a MySQL table:

CREATE TABLE `aap_combined` (
  `aa_page_id` int(10) unsigned NOT NULL,
  `page_namespace` int(11) default NULL,
  `page_title` varchar(255) NOT NULL,
  `page_len` int(8) unsigned default NULL,
  `aap_total_wellsourced` int(10) NOT NULL default '0',
  `aap_count_wellsourced` int(10) NOT NULL default '0',
  `aap_total_neutral` int(10) NOT NULL default '0',
  `aap_count_neutral` int(10) NOT NULL default '0',
  `aap_total_complete` int(10) NOT NULL default '0',
  `aap_count_complete` int(10) NOT NULL default '0',
  `aap_total_readable` int(10) NOT NULL default '0',
  `aap_count_readable` int(10) NOT NULL default '0'
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
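
Because each dimension is stored as a total and a count, the mean rating for an article is the total divided by the count. A minimal Python sketch follows, assuming a summary row is available as a dictionary keyed by column name. The function name and sample values are hypothetical.

```python
def mean_rating(summary, dimension):
    """Return aap_total_<dimension> / aap_count_<dimension>, or None if unrated."""
    count = int(summary[f"aap_count_{dimension}"])
    if count == 0:
        return None  # nobody rated this dimension, so no mean exists
    return int(summary[f"aap_total_{dimension}"]) / count


# Made-up summary row for illustration only.
sample_summary = {
    "aap_total_wellsourced": 42,
    "aap_count_wellsourced": 12,
    "aap_total_neutral": 0,
    "aap_count_neutral": 0,
}

wellsourced_mean = mean_rating(sample_summary, "wellsourced")
neutral_mean = mean_rating(sample_summary, "neutral")
```

Returning None for a zero count avoids a division by zero, since both columns default to 0 in the schema.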

Call-to-action data

clicktracking dumps contain hourly data for call-to-action (CTA) events. The @n suffix in each field name (@8 in the fields below) refers to the internal version numbering used by AFT.

field meaning
hour the hour during which CTA events were tracked expressed as a 14-digit timestamp
ext.articleFeedback@8-pitch-bypass number of times no pitch was shown
ext.articleFeedback@8-pitch-edit-accept number of clicks on the edit-accept button
ext.articleFeedback@8-pitch-edit-reject number of clicks on the edit-reject button
ext.articleFeedback@8-pitch-edit-save-save-attempt number of attempts to save a revision resulting from the edit CTA
ext.articleFeedback@8-pitch-edit-save-save-complete number of successful attempts to save a revision resulting from the edit CTA
ext.articleFeedback@8-pitch-edit-show number of times the edit CTA was displayed
ext.articleFeedback@8-pitch-join-accept-login number of clicks on the log in button (CTA inviting raters to create an account)
ext.articleFeedback@8-pitch-join-accept-signup number of clicks on the sign up button (CTA inviting raters to create an account)
ext.articleFeedback@8-pitch-join-reject number of clicks on the join-reject button (CTA inviting raters to create an account)
ext.articleFeedback@8-pitch-join-show number of times the join CTA (account creation) was displayed
ext.articleFeedback@8-pitch-survey-accept number of clicks on the survey-accept button
ext.articleFeedback@8-pitch-survey-reject number of clicks on the survey-reject button
ext.articleFeedback@8-pitch-survey-show number of times the survey CTA was displayed
ext.articleFeedback@8-survey-cancel number of clicks on the survey-cancel button
ext.articleFeedback@8-survey-submit-attempt number of attempts to submit feedback in response to a survey CTA
ext.articleFeedback@8-survey-submit-complete number of completed attempts to submit feedback in response to a survey CTA
ext.articleFeedback@8-toolbox-link number of clicks on the "Rate this page" link in the toolbox

The following schema can be used to import clicktracking dumps into a MySQL table:

CREATE TABLE `clicktracking` (
  `hour` datetime default NULL,
  `8-pitch-bypass` smallint(3) default NULL,
  `8-pitch-edit-accept` smallint(3) default NULL,
  `8-pitch-edit-reject` smallint(3) default NULL,
  `8-pitch-edit-save-save-attempt` smallint(3) default NULL,
  `8-pitch-edit-save-save-complete` smallint(3) default NULL,
  `8-pitch-edit-show` smallint(3) default NULL,
  `8-pitch-join-accept-login` smallint(3) default NULL,
  `8-pitch-join-accept-signup` smallint(3) default NULL,
  `8-pitch-join-reject` smallint(3) default NULL,
  `8-pitch-join-show` smallint(3) default NULL,
  `8-pitch-survey-accept` smallint(3) default NULL,
  `8-pitch-survey-reject` smallint(3) default NULL,
  `8-pitch-survey-show` smallint(3) default NULL,
  `8-survey-cancel` smallint(3) default NULL,
  `8-survey-submit-attempt` smallint(3) default NULL,
  `8-survey-submit-complete` smallint(3) default NULL,
  `8-toolbox-link` smallint(3) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
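
The schema columns drop the ext.articleFeedback@ prefix that appears in the dump field names. The sketch below shows one way to map between the two and to derive an hourly acceptance rate for a CTA. It assumes rows are dictionaries keyed by the schema column names; the function names, the default version number, and the sample counts are hypothetical.

```python
PREFIX = "ext.articleFeedback@"


def schema_column(field_name):
    """Map a dump field name to its schema column by stripping the prefix."""
    if field_name.startswith(PREFIX):
        return field_name[len(PREFIX):]
    return field_name


def acceptance_rate(row, cta, version=8):
    """Accept clicks divided by times shown for one CTA (edit, join, survey).

    The join CTA has two accept columns (login and signup), so every column
    starting with '<version>-pitch-<cta>-accept' is summed. NULL values
    (None) are treated as 0.
    """
    shown = row.get(f"{version}-pitch-{cta}-show") or 0
    if shown == 0:
        return None  # the CTA was never displayed during this hour
    accept_prefix = f"{version}-pitch-{cta}-accept"
    accepted = sum((value or 0) for key, value in row.items()
                   if key.startswith(accept_prefix))
    return accepted / shown


# Made-up hourly row for illustration only.
sample_hour = {
    "hour": "20110715120000",
    "8-pitch-edit-show": 200,
    "8-pitch-edit-accept": 50,
    "8-pitch-join-show": 100,
    "8-pitch-join-accept-login": 10,
    "8-pitch-join-accept-signup": 15,
    "8-pitch-survey-show": 0,
    "8-pitch-survey-accept": None,
}

edit_rate = acceptance_rate(sample_hour, "edit")
join_rate = acceptance_rate(sample_hour, "join")
survey_rate = acceptance_rate(sample_hour, "survey")
```

Summing by column prefix is a convenience for this sketch; listing the accept columns explicitly would work equally well.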