Page Issues A/B test results
Assessment of the effects of the new design for page issues warning templates on mobile web readers
From September to November 2018, we ran A/B tests on the mobile web version of the Latvian, Persian, Russian, Japanese, and English Wikipedias. Our goal was to determine the performance of improvements we made to the appearance of templates that show messages regarding page content (page issues). Previously, the content of these templates was only accessible through a small link at the top of the article, making it difficult for readers and editors to know when the article they are reading on the mobile website has content issues. Through this work, we made the templates more visible by displaying part or all of the template text at the top of the article. We also wanted to convey the severity of the indicated page issue by displaying each issue template with the severity and color it has on desktop.

Our objective for the project was to increase awareness of particular issues within an article on the mobile web. Through this A/B test, we wanted to gain insight into whether this goal was achieved by comparing the rates of interaction (clickthrough) with our new treatment versus the original treatment. We also wanted to gauge the effectiveness of our severity level variants by comparing the clickthrough for issues of different severity levels. Finally, we wanted to assess whether these changes had any immediate effects on the rates of mobile editing. Our hypotheses were the following:
- The clickthrough rate for the new treatment will be higher than the old treatment
- The clickthrough rate for issues of higher severity will be significantly larger than the clickthrough rate for issues with lower severity
- Articles with the new treatment will be edited more frequently than articles with the old treatment
Our results support two of the above hypotheses: users interact with the new treatment more frequently than with the old one, and when they do, they interact with higher-severity issues at a higher rate than with lower-severity ones. However, our data do not support the hypothesis that the new design causes more frequent edits to pages with issues. We conclude that this feature improves awareness of issues with articles on the mobile website. Our results also suggest that readers interpret the severity of these issues correctly and pay more attention to issues of greater severity.
Does the new treatment increase the awareness among readers of page issues?
The clickthrough ratio for top-of-page issue notices increased markedly with the new treatment on all five wikis (e.g. over 7x on ruwiki). We can confidently conclude that the new design increases the awareness of page issues among readers.
The new design also reveals the severity of an issue before clicking on the notice (e.g. a notice that the article has been nominated for deletion has high severity). The clickthrough ratios differ significantly between all four severity levels, with "high" severity issues receiving far more clicks, indicating that readers perceive and understand this information in the new design. The differences between the ratios for the three lower severity levels (default, medium and low) are smaller.
| Issue severity | Clickthrough ratio |
|---|---|
| high | 2.70% |
| default | 0.73% |
| medium | 0.47% |
| low | 0.41% |
(Data for page-level issues on the English Wikipedia)
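As a rough illustration of how such ratios are derived, the sketch below computes clickthrough per severity from impression and click counts. The counts are hypothetical, chosen only to reproduce the ratios in the table above; they are not the actual EventLogging data.

```python
# Illustrative sketch only: these counts are made up to reproduce the
# ratios in the table above; they are not the real experiment data.
issue_impressions = {"high": 100_000, "default": 400_000, "medium": 250_000, "low": 150_000}
issue_clicks = {"high": 2_700, "default": 2_920, "medium": 1_175, "low": 615}

def clickthrough_ratio(clicks: int, impressions: int) -> float:
    """Share of issue-notice impressions that led to a tap."""
    return clicks / impressions

for severity, impressions in issue_impressions.items():
    print(f"{severity}: {clickthrough_ratio(issue_clicks[severity], impressions):.2%}")
```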
(Chart: these clickthrough ratios observed over time during the experiment on the English Wikipedia.)
Furthermore, we investigated where readers go from the page issues modal (the message that appears after tapping on a page issue notice). We distinguished the following four kinds of links:
- modal closes, i.e. the "X" that closes the modal and brings the user back to the article
- modal edit links, i.e. links leading to the edit screen for the current article (e.g. in the following text generated by the English Wikipedia's "More citations needed" template: "Please help improve this article by adding citations to reliable sources.")
- internal links, generally leading to policy and help pages (e.g. in the above example: "Please help improve this article by adding citations to reliable sources", linking to this help page) or the article's talk page.
- red links (to nonexistent pages on the same wiki), which appear to occur quite rarely, e.g. a link to the article's talk page when that talk page hasn't been created yet.
The following table shows the ratio of clickthroughs to these four kinds of links, relative to the number of all issues modal views on the corresponding wiki, in the old vs. new design. Unsurprisingly, the "X" that closes the modal is used most frequently. Internal links are fairly popular too, with large variation between the five wikis in the test.
| wiki | version | red_links % | internal_links % | modal_edit_links % | modal_closes % | issue_clicks |
|---|---|---|---|---|---|---|
| enwiki | new2018 | 0.02 | 10.19 | 1.60 | 29.27 | 223152 |
| enwiki | old | 0.02 | 14.65 | 2.40 | 25.60 | 45671 |
| fawiki | new2018 | 0.04 | 8.76 | 0.73 | 18.64 | 51947 |
| fawiki | old | 0.02 | 7.13 | 0.25 | 19.20 | 12290 |
| jawiki | new2018 | 0.07 | 4.21 | 1.23 | 21.56 | 339062 |
| jawiki | old | 0.23 | 8.18 | 0.57 | 24.58 | 59103 |
| lvwiki | new2018 | 0.99 | 4.96 | 3.37 | 29.56 | 504 |
| lvwiki | old | 1.00 | 9.00 | 2.00 | 29.00 | 100 |
| ruwiki | new2018 | 0.12 | 10.97 | 0.70 | 25.25 | 115331 |
| ruwiki | old | 0.21 | 16.80 | 0.99 | 27.34 | 8570 |
(Again, keep in mind that these ratios measure modal link clicks relative to all modal opens, not the clickthrough rate of a particular link relative to how often it appeared. In particular, red links show up rarely in the first place. Also, because we are already one step down the funnel, differences between the old and new design compound with the increased clickthrough rate for the issues modal itself (see above, or the right column in the table). For example, on enwiki the new design leads to more modal edit link clicks overall, even though per the table they are clicked less often per modal view than in the old design.)
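The funnel arithmetic behind that enwiki example can be sketched as follows, using the modal-view counts and per-modal rates from the table above (a back-of-the-envelope check, not the original analysis code):

```python
# Back-of-the-envelope: absolute modal edit link clicks on enwiki are the
# number of modal views multiplied by the per-modal-view click rate.
modal_views = {"new2018": 223_152, "old": 45_671}      # "issue_clicks" column
edit_link_rate = {"new2018": 0.0160, "old": 0.0240}    # "modal_edit_links %" column

abs_edit_link_clicks = {
    version: views * edit_link_rate[version]
    for version, views in modal_views.items()
}
# new2018: ~3570 clicks vs. old: ~1096 clicks -- more edit link clicks
# overall under the new design, despite its lower per-modal-view rate.
```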
Do mobile edits increase with page issues as referrer?
For practical reasons, we limited ourselves to measuring taps on the edit button (rather than actual saved edits).
Based on the data, we can reject the hypothesis that edit button clicks increase on pages with issues in the new design. On the contrary, we saw a slight but statistically significant drop in edit button clickthroughs on four of the five wikis. So far we don't regard this as evidence of a detrimental effect of the new design, in particular because there is no clear mechanism that would explain such an effect (keeping in mind that we could only measure taps on the button, not completed edits, so the observed effect might e.g. only involve unintentional taps). But it is something to remain aware of.
| wiki | version | edit clickthrough ratio | edit clicks | pageviews |
|---|---|---|---|---|
| enwiki | new2018 | 0.3682% | 145064 | 39398385 |
| enwiki | old | 0.3736% | 148880 | 39845772 |
| fawiki | new2018 | 0.4011% | 25358 | 6322570 |
| fawiki | old | 0.4057% | 25832 | 6367373 |
| jawiki | new2018 | 0.4917% | 352298 | 71651369 |
| jawiki | old | 0.4962% | 360676 | 72680944 |
| lvwiki | new2018 | 0.4854% | 397 | 81793 |
| lvwiki | old | 0.4521% | 369 | 81625 |
| ruwiki | new2018 | 0.4059% | 51272 | 12632030 |
| ruwiki | old | 0.4184% | 53322 | 12743416 |
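To see why such small differences can still be statistically significant at these sample sizes, a standard pooled two-proportion z-test can be applied to a row of the table. The sketch below (plain Python, not the original analysis queries) uses the enwiki figures:

```python
import math

def two_proportion_z(clicks_a: int, views_a: int, clicks_b: int, views_b: int) -> float:
    """Pooled two-proportion z statistic for comparing two clickthrough ratios."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    p_pool = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / views_a + 1 / views_b))
    return (p_a - p_b) / se

# enwiki, new2018 vs. old (figures from the table above)
z = two_proportion_z(145_064, 39_398_385, 148_880, 39_845_772)
# z comes out around -4, well beyond the conventional |z| > 1.96 threshold,
# so even this small drop in the ratio is statistically significant.
```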
Also, this ratio of edit button taps saw large changes over time during the experiment in both the test and control groups (i.e. changes not due to the page issues feature). In particular, four of the five wikis saw a sharp drop on October 19, for reasons that are so far unknown.
- Open question: do they increase more or less for anonymous users than for editors, or for editors per bucket?
Do page issues affect the time spent on each page?
Other notes and caveats
- Phabricator task with queries and other details: phab:T200794
- EventLogging schema documentation: m:Schema:PageIssues
- Several of the above metrics should normally be calculated based on data that is sampled per pageview. But in order to limit programming efforts, this instrumentation used sampling by browser session instead (enabling the reuse of existing, tested code). This can introduce inaccuracies, because e.g. the number of issue clicks per pageview may not be statistically independent within one session. However, we assume that this error is very small, considering the small average session lengths.