Jump to content

Talk:Quarry

About this board

Previous discussion was archived at Talk:Quarry/Archive 1 on 2015-04-17. Discussion area for discussion about Quarry itself and help about individual queries.

OperationalError('table resultsets already exists')

2
Enhancing999 (talkcontribs)

Any idea why that happens? I get it after long running queries. Sample: quarry:query/86096. If I happen to have the window open, I sometimes see the actual results before, meaning the query was successful. Usually I forget to export it before it disappears.

Framawiki (talkcontribs)
Reply to "OperationalError('table resultsets already exists')"

What's more efficient: AND NOT, not in, <>

2
Enhancing999 (talkcontribs)

Which is more efficient?

  • AND NOT ( lt_title = "ABC" ) AND NOT ( lt_title = "XYZ")
  • AND NOT in ( "ABC", "XYZ")
  • AND lt_title <> "ABC" AND lt_title <> "XYZ"

Agree that none is ideal.

Matěj Suchánek (talkcontribs)

As the first thing, the query engine builds a query plan, then executes the query according to the plan. You can check if the query plan is always the same (e.g., using Toolforge SQL Optimizer). If it is, there is no difference.

I prefer NOT IN.

Reply to "What's more efficient: AND NOT, not in, <>"
Enhancing999 (talkcontribs)

How to specify tables from two different databases (wikidatawiki_p and commonswiki_p)?

Enhancing999 (talkcontribs)

I tried

  • SELECT * FROM `commonswiki_p`.`pages` LIMIT 1
  • USE DATABASE commonswiki_p
  • USE commonswiki_p;

to override what's specified in the GUI.

Matěj Suchánek (talkcontribs)
TheDJ (talkcontribs)

They are completely separate DB servers, you cannot make queries across multiple servers in the same query.

Enhancing999 (talkcontribs)

Apparently it was possible (see sample in the topic referenced by Matej) but then un-featured.

I found an easier solution, as the gap between Wikidata and Commons is only partial: one table at Commons is updated ( wbc_entity_usage), but not the other (page_props): quarry:query/86040

TheDJ (talkcontribs)

"Apparently it was possible " Yes, until the infrastructure ran into scaling problems.

Enhancing999 (talkcontribs)

Are there any measure in place to keep the databases in sync? The gap mentioned above is minor in percentages (maybe 0.1%), but in absolute numbers 4600 is a lot.

Reply to "Using 2 databases"

combining SQL and SPARQL query

2
Jarekt (talkcontribs)
Enhancing999 (talkcontribs)
Reply to "combining SQL and SPARQL query"
Summary by GTrang

No more replag

GTrang (talkcontribs)

The enwiki database has been on replag for an entire week now. It should hopefully be fixed in the next week or so.

Quarry / SQL query - how to fix OperationalError('table resultsets already exists')?

4
Gluo88 (talkcontribs)

I was able to complete the above SQL query as seen in https://quarry.wmcloud.org/history/84807/911494/884555.  

However, when I run the same query again at https://quarry.wmcloud.org/query/84807 .   I got the error "OperationalError('table resultsets already exists')".           How should I fix the error?

I guess that resultsets may be a temporary table that was produced from running my previous  SQL query.   I tried to execute the following:

DROP TABLE IF EXISTS resultsets;

However,  I got the following information: “Access denied; you need (at least one of) the SUPER, READ_ONLY ADMIN privilege(s) for this operation”

May anyone help me on this issue?

Thanks.

Gluo88 (talkcontribs)

I just found that the above query is complete now. The issue looks to be automatically solved, at least for now, although I still don't know the reason of the "OperationalError('table resultsets already exists').


However, my query https://quarry.wmcloud.org/query/84817 has just failed still with the same reason of the "OperationalError('table resultsets already exists').

May anyone know how to handle "OperationalError('table resultsets already exists')?

Thanks.

Gluo88 (talkcontribs)

The issue looks to be automatically solved again. Did someone help in background?

Thanks a lot.

This post was hidden by Gluo88 (history)
Reply to "Quarry / SQL query - how to fix OperationalError('table resultsets already exists')?"
Samwilson (talkcontribs)

Is there a delay between a tool database being created and it being available in Quarry? It looks like s55926__wishlist_p can not be queried (it was only created today): https://quarry.wmcloud.org/query/11263

TheDJ (talkcontribs)

As far as I know, tools databases are not public, and so not in quarry.

Samwilson (talkcontribs)

@TheDJ: That used to be the case, but I'd thought that recently (phab:T151158) it had become possible. It's only databases with names ending in _p.

BDavis (WMF) (talkcontribs)
Samwilson (talkcontribs)

Ah that's good to know, thanks! I'll be patient. :-)

Reply to "Tool databases"

Quarry / SQL optimization - using DB indexes?

3
Fl.schmitt (talkcontribs)

Hi, I'm trying to optimize a simple SQL query for pages on commonswiki (table page) which start with a certain string (e.g. "SELECT * FROM page WHERE page_title LIKE "Building%" ORDER by page_title;"- see also https://quarry.wmcloud.org/query/83277 for an example with a smaller result set). The SQL Optimizer on Toolforge tells me that this query would use filesort instead of indexes which causes performance issues ("Query plan 1.1 is using filesort or a temporary table. This is usually an indication of an inefficient query. If you find your query is slow, try taking advantage of available indexes to avoid filesort."). The DB schema documentation tells me that there should be an index defined (key: page_name_title) on page_title and page_namespace columns. The MySQL docs tell me that i could use USE INDEX (page_name_title) to enforce using that index. If I add that clause to my query (SELECT * FROM page USE INDEX (page_name_title) WHERE page_title LIKE "Building%" ORDER by page_title;), the SQL optimizer complains about a "Query error: Key 'page_name_title' doesn't exist in table 'page'". At the Commons village pump, I've learned that the replicas may lack the indices. So, I'm not sure if there's a way to optimize such a query on Quarry. I would prefer using Quarry instead of the toolforge CLI because Quarry allows for linking queries and results.

Matěj Suchánek (talkcontribs)

This query cannot use the index since you don't have a condition on page_namespace.

Fl.schmitt (talkcontribs)

Aww - ok :-) - yes, with such a condition, it works like a charm - thanks a lot!

Reply to "Quarry / SQL optimization - using DB indexes?"
SoySauceOnRice (talkcontribs)

Hi, I am doing some research on sockpuppets. I'm trying to use the SQL database to assist in building my dataset, but queries seem to take a long time.

For instance, this previous query ran in 196 seconds:

https://quarry.wmcloud.org/query/61732

Whereas my identical query has been going for over 9 hours:

https://quarry.wmcloud.org/query/83588

A previous instance was running over a week before I restarted it.

Am I doing something wrong, or are these running times typical?

On another note, I have been looking for complete SQL dumps to run my own instance so that I don't cause issues with excessive queries, however I can only find complete xml dumps (https://dumps.wikimedia.org/enwiki/20240601/).

Is there any way to get a full SQL dump (complete with article revision history)?

Thank you for your help.

Matěj Suchánek (talkcontribs)

Using Toolforge CLI, the query returns 187814 rows in 31.187 sec. Maybe that's too much for Quarry.

Reply to "Long Query Time"
Plastikspork (talkcontribs)
Reply to "Queued queries"