(Ugh, Flow.) I just noticed that searching for "wg" on Wiktionary finds wikt:MediaWiki:Common.js, but "wg insource:/wg/" does not. Why?
Talk:Search/Old
I'm personally pretty neutral to positive on projects to build threaded talk pages. But I don't have the kind of experience with talk pages that others do. Anyway, that is beside the point.
For your real question: Lame reasons. See T88247.
moving forward will this extension be bundled into new versions of mediawiki like vector, nuke, wikieditor, etc? or is this something that will always have to be manually installed?
As much as I'd like to bundle CirrusSearch I'm afraid we can't as it'd require all 3rd parties to install Elasticsearch. That's a really really big discussion.
Hello. We have noticed a serious issue with the new search engine on fr.wikipedia. Many article titles contain characters such as the hyphen and others. With the previous search engine, when you typed a name without the hyphen, for example our ministers Najat Vallaud Belkacem (instead of Najat Vallaud-Belkacem) or Jean Yves Le Drian (instead of Jean-Yves Le Drian), the answer was immediate and we got a direct link to the article. Today, nothing. There is also another issue related to the use of ’ instead of ' in a title, and vice versa. It's urgent!
@User:NEverett (WMF), can you please take a look? Merci!
I took a look and I'm not sure what the old behavior was. The full text search seems to find the people when you search with or without the dash. Do you mean the find-as-you-type search?
Is the ’ instead of ' issue also with prefix search?
Thanks. Don't hesitate to request more information in order to understand our trouble in France as well as possible. First screenshot (not OK today, but it was OK with the previous search engine): https://commons.wikimedia.org/wiki/File:Jean-Yves_Le_Drian_-_Not_OK.jpg Second screenshot (it's OK if we add the hyphen): https://commons.wikimedia.org/wiki/File:Jean-Yves_Le_Drian_-_OK.jpg
Thanks for the screenshot. It's a prefix search issue. I've filed it here: https://bugzilla.wikimedia.org/show_bug.cgi?id=73560
Thanks for taking into account the bug related to ’ vs. ', but what about our main trouble related to the hyphen, as you can see in the two screenshots? Thanks for your answer.
This bug should cover both cases. I'll start work on it this morning, I believe. I don't think it's going to take that long to finish, but getting it deployed and the index rebuilt will take some time. The earliest this would be fixed is late late Monday CET.
I mostly find small problems in English Wikipedia and fix them. I have developed a variety of searches which find such errors. An example would be at this search for the ordinal error 1th which works with the old search, but not when the new search is turned on. I have a long series of similar searches set up at w:en:User:SchreiberBike/Workspace/Ordinals. I haven't found ways to do this in the new search. I think it is likely that I will have to rewrite my many queries, but right now I'm not even sure such things will be possible. Please suggest how to move forward.
I did some poking and think I found the root cause: -"quoted phrase" is being misinterpreted. I've filed that as Bugzilla:70301 and will work on it soon.
I proposed a fix for what I think caused your issue. In all likelihood it'll head to the Wikipedias next Thursday. You can work around it by replacing the - before the quotes with NOT. This won't be required after next Thursday. In addition the <<-广声法师>> clause won't work until this Thursday - there is a fix for that heading out as well.
And another thing: thanks for doing this work. It's something I really want to make sure we don't totally bust with Cirrus. I don't imagine it'll be a perfect switch but I want to make sure nothing becomes impossible.
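For anyone with a long list of saved queries, the temporary workaround described above (swapping the - in front of a quoted phrase for NOT) can be applied mechanically. The following is only a minimal sketch in Python; the function name is made up for illustration, and it assumes the minus sign sits directly in front of the opening quote.

```python
import re

# Matches a minus sign that directly precedes a quoted phrase and is either at
# the start of the query or preceded by whitespace. Hyphens inside words
# (for example wholly-owned) and negated bare words (-word) are not matched.
_NEGATED_PHRASE = re.compile(r'(^|\s)-(?=")')


def use_not_for_negated_phrases(query: str) -> str:
    """Rewrite -"phrase" as NOT "phrase" (hypothetical helper, not part of Cirrus)."""
    return _NEGATED_PHRASE.sub(lambda m: m.group(1) + "NOT ", query)
```

Once the fix is deployed the rewrite should no longer be needed, so it is probably best kept as a throwaway step rather than baked into the saved queries.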
Thanks for looking at it. I'll give it another try at the end of next week and let you know how it works.
I've tried turning New Search back on and the results I get look about the same as they did when I first posted above.
OK! I found another problem with Cirrus after trying your query again. It was kind of being masked by your first problem. I've proposed a fix for it (https://gerrit.wikimedia.org/r/#/c/161474/) but haven't yet got it reviewed. Once it's reviewed I'll try to get it released quickly and verify your query works. It might not be Monday but it should be soon.
Please have a look now!
This is great. It looks like I think it's supposed to look. I'll play with it some more and see if I can turn up any problems. Also, I'm getting more results with Cirrus than with the old search; that's good. The new search also takes the portions in quotes more literally than the old e.g. a search for "1th" only returns articles with that exact string included. Thanks!
Awesome! Thanks for finding and reporting this. I'm glad it's working well for you now!
I've been playing with Cirrus search in English Wikipedia for a bit and one difference I'm finding, which is significant for the kind of work I do, is that it doesn't see the hidden text of references. For instance, if I run a search for "XIth century -368vebleninstinct" (not in quotes) in the old system, the article w:en:Workmanship does not come up because one of the references in that article has the string 368vebleninstinct in a web address. However when I run that with the new search, it does come up. That also means that I can't search for other articles which use that same reference (although there may be a special search for that). If that's a deliberate choice, I can adjust my queries to not use the hidden text of references as exclusion criteria, but I'd rather not.
I've also noticed that the new search does not pick up the comment text of {{clarify}} templates in the form {{Clarify|date=October 2014|reason=Should this be '1st', '11th' or something else?}}. Again, that's something I can work around if needed, but if I don't have to that would be better.
Another possible problem: With the new search, a search for "XIXth century" (not in quotes) has w:en:Provisional Government of the French Republic in its results, but that article doesn't have the word century in it. Same for w:en:German military administration in occupied France during World War II. On the other hand, some articles, such as w:en:Cathar castles in a search for "XIth century", have come up in the new search that never did in the old search.
Thanks for trying it!
By default Cirrus tries hard to search on visible text to make results make more sense for casual readers. So not picking up hidden text in templates is totally intentional. It has a syntax to search in the article's source though. Searching for <<XIth century -insource:368vebleninstinct>> doesn't pick up w:en:Workmanship, which matches what lsearchd did with your old search. Is that a decent workaround?
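If the adjusted queries are run from a script rather than typed into the search box, the insource: exclusion can be assembled programmatically. This is a rough sketch, assuming the standard MediaWiki search API (action=query with list=search and srsearch); the helper names are invented here, and the code only builds the URL without sending a request.

```python
from urllib.parse import urlencode


def exclude_source_terms(visible_terms: str, hidden_terms: list[str]) -> str:
    """Append one -insource: clause per term that should be excluded via page source."""
    parts = [visible_terms] + [f"-insource:{term}" for term in hidden_terms]
    return " ".join(parts)


def search_api_url(query: str, host: str = "en.wikipedia.org", limit: int = 20) -> str:
    """Build a full-text search API URL (hypothetical helper; no request is made)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
    }
    return f"https://{host}/w/api.php?{urlencode(params)}"


query = exclude_source_terms("XIth century", ["368vebleninstinct"])
url = search_api_url(query)
```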
I can explain w:en:Provisional Government of the French Republic as well. It does contain the word century but it's hidden and Cirrus doesn't properly remove the text. It's in the navbox at the bottom of the page, which you can expand by clicking "French Topics". I've filed this as bugzilla:71562. I figured out what was up by adding ?action=cirrusdump to the page URL and searching for the word "century" in the result. It's in the "auxiliary_text" field, which is usually stuff like image captions and tables. In this case the navbox snuck in.
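The same check (which field of the dump contains a given word) can be scripted. The sketch below assumes only that the ?action=cirrusdump output is JSON; it walks whatever structure it is given and reports the path of every string value containing the word. The sample document is made up for illustration and is not a real dump.

```python
def cirrusdump_url(page_url: str) -> str:
    """Append action=cirrusdump to a page URL, as described above."""
    separator = "&" if "?" in page_url else "?"
    return f"{page_url}{separator}action=cirrusdump"


def find_fields_containing(doc, word, path=""):
    """Yield the path of every string value in parsed JSON that contains word."""
    needle = word.lower()
    if isinstance(doc, dict):
        for key, value in doc.items():
            child = f"{path}.{key}" if path else key
            yield from find_fields_containing(value, word, child)
    elif isinstance(doc, list):
        for index, value in enumerate(doc):
            yield from find_fields_containing(value, word, f"{path}[{index}]")
    elif isinstance(doc, str) and needle in doc.lower():
        yield path


# Hypothetical, heavily shortened stand-in for a parsed dump.
sample = [{"_source": {
    "title": "Example page",
    "text": "Visible body text.",
    "auxiliary_text": ["French Topics: 19th century, 20th century"],
}}]

hits = list(find_fields_containing(sample, "century"))
```

A hit under auxiliary_text rather than text would suggest, as in the navbox case above, that the word is not part of the main visible article body.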
I've been using the new search for a while and have liked it, especially the fact that it doesn't see things in the source text like linked URLs, but I ran into a possible issue with this search on English Wikipedia, for example. It looks for "11st", which is usually an error intended to be "11th". In the search above, it turns up places where one line ends in a "1" and the next line starts with a "1st". I don't think that is what is intended, so I thought I'd bring it to your attention.
Also, will "intitle" searches be available in the new search? They don't seem to work now. Thanks.
Filed the 11st issue here: https://bugzilla.wikimedia.org/show_bug.cgi?id=73558
I think it's caused by how we squash the extra text. You can work around it for now by searching for insource:/ 11st /. It's not as good because it wants spaces only rather than word breaks. You could also try insource:11st. It's _probably_ not as affected by the bug.
Also, can you give me an example of intitle not working? It's working for me: https://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=advanced&search=intitle%3Atelecom&fulltext=Search&ns0=1&profile=advanced
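To illustrate why a spaces-only pattern is weaker than one based on word breaks, here is a small comparison. It uses Python's re module purely as a stand-in; the insource:/.../ syntax uses a different regex dialect, so the \b form below is for illustration and is not something to paste into the search box. The sample text is invented.

```python
import re

# Invented sample: one plain occurrence, one embedded in a longer token,
# and one wrapped in punctuation.
text = "the 11st battalion; see note11st and (11st)"

space_delimited = re.compile(r" 11st ")    # analogue of insource:/ 11st /
word_delimited = re.compile(r"\b11st\b")   # word-break version (Python only)

space_hits = space_delimited.findall(text)
word_hits = word_delimited.findall(text)
```

The space-delimited pattern only sees the occurrence with a space on both sides, while the word-break pattern also catches the one in parentheses; neither matches the string embedded in "note11st".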
I don't remember what I was working on when I ran into the problem, but I think this is similar. Even with "Falklands" in quotes, it returns items without that string in the title. I get similar results for "Thrushes". Thanks
I do a lot of reference improvement on Wikipedia. Our cite templates put the last name of an author first, followed by a comma, and the given name (e.g., "Bloggs, Joe"). One very common task among people who improve references is to look for existing articles for reference authors so that authorlinks can be added. This was conveniently done, for instance in Firefox, by just highlighting the name and using the right-click menu to search on Wikipedia, as I have it set up to do. The old search was pretty amazing at finding the correct author article even when the search had the names backwards. The new search engine is hit or miss and overall I would say it is pretty terrible at it. For example, a search for "Gettleman, Jeffrey" on the English Wikipedia completely misses the existing article "Jeffrey Gettleman". I'd estimate that the new search "completely misses" the obvious intended page about a quarter of the time when using this search format. About half the time the intended target is within about the top 5 results. And the remaining quarter of the time it actually finds the target. This is compared to the old search, which is almost always spot on.
I don't know how the internals of the old search work, but I would venture a guess that when a search term like "X, Y" is given, it does a search for "X, Y" and "Y X". Perhaps that is missing in the new search. When I said above that the old search was pretty amazing, I meant it. Very often, it returns the intended result first even when initials are used for the first and/or middle name. It's also possible that it was taking into account redirects and stuff to figure out the intended target.
I've been using the new search (the beta implementation) for a while now. My overall impression is that except for the above issue, where the new search is inferior, I haven't noticed any significant change in quality of the returned results. They are about equally good and I would have trouble noticing which engine I was using if I had to guess.
Another example on the English Wikipedia I found right after my post above is "Georgiadis, Nicholas J." which does not list any author article in the results. Searching for "Nicholas J. Georgiadis" still does not find any author article. Finally if "Nicholas Georgiadis" is searched, it finds the article for "Nicholas Georgiadis". Clearly the search is not being as "fuzzy" as it needs to be.
I can explain the "Georgiadis, Nicholas J." issue - his page doesn't have a "J." in it at all. If you add a redirect from "Georgiadis, Nicholas J." to "Georgiadis, Nicholas" Cirrus will pick it up. Or something - so long as a J. ends up in the page. I tried the search with lsearch and it didn't find the "Georgiadis, Nicholas J." article at all either.
As for how the old search handles "X, Y" vs "Y X":
1. It searches for articles containing X and Y and unions the sets together.
2. Of those, it runs down the positions of X and Y, and if they appear close to each other and in the order they appear in the search query, it pushes that match up in the ranking.
3. If they appear close together but not in the right order, then it pushes them up, but not as far. (I think this is true, at least.)
Cirrus right now only does steps 1 and 2. I think I can replicate that last behavior in Cirrus which should help your searches.
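The three steps above can be sketched roughly as follows. This is a toy illustration of the idea only, not how lsearchd or Cirrus actually score results: the boost values (2 and 1), the window size, and the function name are all assumptions made for the example.

```python
def proximity_boost(tokens, x, y, window=2):
    """Toy ranking boost for a two-term query "x y" against a tokenized page.

    Returns None if the page lacks either term (step 1 filter), 2 if x is
    followed closely by y (step 2), 1 if they are close but in the other
    order (step 3), and 0 if both occur but never near each other.
    """
    x_positions = [i for i, token in enumerate(tokens) if token == x]
    y_positions = [i for i, token in enumerate(tokens) if token == y]
    if not x_positions or not y_positions:
        return None
    best = 0
    for i in x_positions:
        for j in y_positions:
            if 0 < j - i <= window:
                best = max(best, 2)   # close together, same order as the query
            elif 0 < i - j <= window:
                best = max(best, 1)   # close together, reversed order
    return best


page = "jeffrey gettleman is a journalist".split()
reversed_query = proximity_boost(page, "gettleman", "jeffrey")
in_order_query = proximity_boost(page, "jeffrey", "gettleman")
```

With only steps 1 and 2, the reversed query would get no boost at all on this page, which is consistent with "Gettleman, Jeffrey" ranking the obvious article poorly.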
There are other searches you could do in the meantime that'd pull the author up in the results but they aren't as quick to type. Stuff like <<Georgiadis, Nicholas hastemplate:"Template:Infobox person">>. It'd be useful for a tool but isn't fun to type.
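For a tool, the longer query could be generated from the highlighted "Last, First" text. The sketch below is a guess at what such a helper might look like; the function names are invented, and the default template name is simply the one from the example above.

```python
def flip_name(cited_name: str) -> str:
    """Turn "Last, First" into "First Last"; names without a comma are returned unchanged."""
    last, comma, given = cited_name.partition(",")
    if not comma:
        return cited_name.strip()
    return f"{given.strip()} {last.strip()}"


def author_queries(cited_name: str, template: str = "Template:Infobox person") -> list[str]:
    """Build candidate search strings for an author, most specific first (hypothetical helper)."""
    return [
        f'{cited_name} hastemplate:"{template}"',
        flip_name(cited_name),
    ]


candidates = author_queries("Georgiadis, Nicholas")
```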
This still seems to be an issue. The new search simply isn't good at finding articles when the given name and surname are reversed.
I agree. It's something that I spent some time working on but never got to finish.
Your current search engine is terrible. As an online writer for the past several years, I have used your photos to illustrate my writings, but it's impossible to find anything with this new search engine. The photos you used to have are all gone, and when you do a search for a specific subject, you get a bunch of manuscripts, not photos. I truly wish you'd return to the "thrilling days of yesteryear," when photos were photos, not manuscripts. Thanks for the opportunity to tell you how I feel. I'll be back when we have photos on Wikimedia again.
Can you give me an example of what used to work and doesn't now? I think you mean that searching commons used to find better photos.
For the next few months you can get the old behavior by searching and adding &srbackend=LuceneSearch to the end of the url. The parameter doesn't stick so you'd have to add it back after every search. It'd be super useful if you can provide an example of a search that works well with the parameter and sucks without it.
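Since the parameter doesn't stick, re-adding it by hand gets tedious when comparing several searches. A small sketch of doing it in a script follows; it only manipulates the URL string, the function name is made up, and it assumes srbackend=LuceneSearch is accepted as described above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_old_backend(search_url: str) -> str:
    """Return the search URL with srbackend=LuceneSearch set exactly once."""
    parts = urlsplit(search_url)
    # Drop any existing srbackend value so the function is safe to apply twice.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "srbackend"]
    query.append(("srbackend", "LuceneSearch"))
    return urlunsplit(parts._replace(query=urlencode(query)))


old_backend_url = with_old_backend(
    "https://commons.wikimedia.org/w/index.php?search=cat&fulltext=Search"
)
```

Opening the original URL and the rewritten one side by side is an easy way to produce the kind of before/after example asked for here.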
Why!!!
Have a look at http://magnusmanske.de/wordpress/?p=108. Magnus Manske wrote some JS code that shows you whether there is a Wikidata item and adds this info to Spezial:Search. Another positive feature is that it shows the label that has been added in Wikidata. For example, it tells you that Ågestasjön is a lake in Sweden. I added the code to my global.js page and it is really helpful. And there is another point. I have looked into the help pages of the new Cirrus search and the old search and I found lots of possibilities I didn't know about. I think you should try to find ways for the search box to give users more information about what is possible. For example, Spezial:Search could be used for searching files on Commons. My guess is that less than 1 percent of users know that. If Commons is integrated into Wikidata (commons:Structured data), search will become even more central to users. For example, below the full text search in the box there could be a search field for Commons search (and Wikidata search, called: "media and data search ...").
This image: No_SVG.svg can't be found on the first 100 hits with the exact file name. The image is from 2014-05-27
Seems now working...
Shut off the crappy MediaWiki search and use Google: http://www.google.de/search?q=site%3Acommons.wikimedia.org+filetype%3Asvg+file%3ANo_SVG.svg&oq=site%3Acommons.wikimedia.org+filetype%3Asvg++file%3ANo_SVG.svg
very good man i liked it. Perhelion:
lsearchd finds 19 hits for "wholly-owned", while CirrusSearch finds about 6,000 hits for "wholly-owned", almost all of them being "wholly owned", which is correct. How will mishyphenations be found and corrected?
Chris the speller: "wholly-owned"~0 should show this, but it seems it doesn't. This means that it doesn't have that exact combination in its history, which might be due to the index not being complete yet.
TheDJ: I've checked the lsearchd results, and any results there are out of date and thus no longer shown in the up to date cirrus search or the hits are in urls (which are excluded from the cirrus indexing I vaguely remember).
There is documentation on the possibilities of Cirrus here: mw:Help:CirrusSearch
TheDJ: Thanks, but I am already familiar with the possibilities and the state of the index. I have used CirrusSearch extensively. What I should have asked is "How will mishyphenations be found and corrected, now that the developers are determined to foist a search engine upon us that is bereft of function?"
It seems that 'upvoting' results causes the exact match to drop below the fold. I have reported this as: bugzilla:70905
The real fix for this problem will come in response to bugzilla:70950, not 70905.
The page for enabling beta features directs me to https://www.mediawiki.org/wiki/Search for information. There I do not find a short, understandable description of what the new search features are at present and how they differ from the old search. Therefore, I do not bother to test this feature.
I took a stab at adding that: https://www.mediawiki.org/w/index.php?title=Search&diff=1180052&oldid=1173154
How should a beta tester know what may be a bug and what not, if there is no specification of what is to be expected?
The old search system didn't have a spec at all. Cirrus has grown one in the form of somewhat readable tests (http://git.wikimedia.org/tree/mediawiki%2Fextensions%2FCirrusSearch.git/master/tests%2Fbrowser%2Ffeatures) but if you know something used to work and it doesn't or if you think the results are worse then that is a good time to file a bug with examples.