From mediawiki.org

Last update on: 2014-12-monthly


Ram made an update to reduce noise in log files; this has been reviewed and merged. Chad made an update to remove unused code and reduce build-time warnings; this is under review. Ram and Antoine worked on getting search updated on the Beta cluster. Ram created an initial patch for fixing bug 45266, which was reverted to make sure that the deployment would go smoothly, and that backwards-compatibility was taken into account.


The noise reduction patches are now in production and ops reports that the logs are now much cleaner. Ram has pushed a patch to fix the problem with updates being lost sometimes; it awaits review. Most of Chad's cleanup patches have been merged. Ram is instrumenting the code with better diagnostics so that we can troubleshoot search issues better.


Search deployed to Beta Cluster. Search code instrumented for better troubleshooting and identification of issues, and work is underway to add PoolCounter support. Plan for April to make search updates more robust.


Code has been instrumented (and will soon be deployed) to log more data to allow root cause analysis of the spurious "Zero results" issue. Some log analysis was also done. The Puppet configuration on beta was updated to limit lucene-search-2 memory usage on Labs.


Work has pretty much shifted from supporting MWSearch/lsearchd to investigating and implementing Solr. Nik Everett and Chad Horohoe have begun writing an extension to implement Solr searching for MediaWiki, and we've gotten a lot of the initial basic functionality completed. Peter Youngmeister and Andrew Bogott will be handling the operations tasks for the new setup. Initial operations tasks will involve packaging Solr 4 and working with Chad to puppetize the whole design. Additionally, we're going to do some investigation into ElasticSearch, as it's been suggested as an alternative to Solr.


Nik Everett and Chad Horohoe have continued writing an extension to implement ElasticSearch searching for MediaWiki, and we've finished most of the required features. Next comes getting it deployed, scaled, and fixing the inevitable bugs. We're aiming to deploy to the test site beta.wmflabs.org before the end of the month. Peter Youngmeister and Asher Feldman will be handling the operations tasks for the new setup.


In August we deployed CirrusSearch to test2.wikipedia.org and mediawiki.org and we're testing there. We're actively looking for other volunteers to test out CirrusSearch. Right now, CirrusSearch is not the primary search for mediawiki.org; you have to use a URL parameter to test it. We're hoping to make it the primary in September.


In September, we expanded the new CirrusSearch back-end to a number of wikis. Italian Wiktionary, Catalan Wikipedia and English Wikisource are all running CirrusSearch now. Additionally, we deployed to all "closed" wikis. Further feature refinement and bugfixing are ongoing, with roughly 2 to 3 deployments a week.


In October, CirrusSearch was deployed as a secondary search engine to Wikidata, all Wikivoyage wikis, and Wikipedia in Bengali. It became the primary search engine on Wiktionary in Italian, Wikipedia in Catalan and Wikisource in English. In November, we plan to deploy many more wikis including some larger than the Catalan Wikipedia. To expand to those larger wikis, we've negotiated some new hardware that should be deployed mid month.


Before November 18, we were spinning up an aggressive plan to add many new wikis to CirrusSearch. On November 18, we had multiple incidents that caused us to roll all wikis using CirrusSearch back to Lucene; we've spent the rest of November implementing fixes for all issues discovered on the 18th. That is now done and we plan to switch all wikis that used to have CirrusSearch back to running it as a secondary search engine on December 2. We'll attempt to restart our aggressive plan as soon as we're comfortable with it again.


We've continued our aggressive roll-out of Cirrus as a Beta Feature. You can now search 52% of pages, including Commons and Wikidata, via CirrusSearch. We've fallen behind somewhat on our goal to make Cirrus the primary search engine. Right now, we only handle about 1.5% of search traffic.

While we will be switching more wikis over to Cirrus as the primary search back-end in January, the theme of the month really is adding Cirrus as a Beta Feature to more wikis, including the English Wikipedia. We're not sure how many wikis we'll be able to add before we consider ourselves out of hardware space. We're planning on 50% more servers in February so we'll likely be able to finish adding wikis then.


As of February 3, CirrusSearch is available as a Beta Feature on wikis representing about three quarters of all pages, and serves about 7.5% of our search traffic. Next month, we hope to get the hardware that we need to be a Beta Feature on the remaining wikis. We also hope to be the primary search back-end for more wikis. To that end, we're working through performance and recall issues as well as trying to save space in the indexes.


This month, almost all LuceneSearch and MWSearch bugs have either been closed as problems that are fixed in CirrusSearch, or moved to the CirrusSearch component. We then prioritized all CirrusSearch bugs. After clearing out any remaining high priority issues, engineering work for an update to the design of the search results page is due to commence on March 10.


In March we upgraded to the newest version of Elasticsearch and expanded onto more wikis. We also started a performance assessment which has started showing us the work required to use Cirrus as the primary search back-end for the larger wikis. We then started in on that work.


We deployed Cirrus as a Beta Feature on all wikis that didn't yet have it. We're working on deploying a change to how snippets are generated that should be faster and better. We're also starting to work with Elasticsearch plugins for improved analysis of some languages as well as backup.


In May, we deployed changes to improve snippets generated by Cirrus to a handful of wikis, spent some time improving its analysis for Hebrew, and added more backwards compatibility with lsearchd's syntax to Cirrus.


CirrusSearch is running as the default search engine on all but the highest-traffic wikis at this point. Nik Everett and Chad Horohoe plan to migrate most of the remaining wikis in July, leaving only the German and English Wikipedias to migrate in August.


Our deployment of CirrusSearch to larger wikis as the primary search back-end turned out to be too ambitious. After encountering performance issues, we rolled back this change. We are now addressing the root of the problem by getting more servers (nearly doubling the cluster size) and putting together more optimizations to the portion of Cirrus that fell over (the working set). If everything goes as planned, the working set will be reduced by about 80%, trading some indexing performance for search performance. These optimizations will slightly change result relevance; please let us know if you notice any issues.


We started deploying Cirrus as the primary search back-end to more of the remaining wikis and we found what looks like our biggest open performance bottleneck. Next month's goal is to fix it and deploy to more wikis (probably not all). We're also working on getting more hardware.


In September we worked to mitigate the performance bottleneck that we found in August. We found there to be no silver bullet, but used what we learned to pick and order appropriate hardware to handle the remaining wikis. We also implemented our significantly improved wikitext Regular Expression search.

In October we've begun rolling out the wikitext Regular Expression search and received some of the hardware we need to finish cutting over the remaining wikis. We believe we'll get it all installed in October and cut the remaining wikis over in November.


In October we prepared for November, in which we deployed Cirrus to all the remaining wikis, by installing new servers and new versions of Elasticsearch and our plugins. We also fixed up regex search, which had caused a search outage.


In December we wound down the search project after successfully deploying CirrusSearch to enwiki. We fixed a few bugs and did some work on preventing very complex queries from putting undue load on the servers. We don't plan to do any work on this project in January.