User:Dantman/Abstract query language/old
Appearance
- Return a list of user objects
- Query from the list of revisions
- ?page is an unassigned variable that must be externally defined, expected to be a page, ie Title (the mapping of Revision:page to a page/Title is built into MediaWiki).
- The page of a revision is defined by ?page
- ?rev:user is ?u, so the User that is returned is extracted from there
- Order by the largest timestamp of the revisions for the distinct user.
DISTINCT { User ?u }
FROM { Revision ?rev }
WHERE { ?rev:page IS ?page, ?rev:user IS ?u }
ORDER { MAX(?rev:timestamp) }
Taking into account the database structure which would be described inside MediaWiki the query would translate about into this sql currently.
SELECT user.*, rev_user_text AS user_name, MAX(rev_timestamp) AS timestamp
FROM revision LEFT JOIN user ON (rev_user = user_id)
WHERE (rev_page = 1)
GROUP BY rev_user, rev_user_text
ORDER BY timestamp DESC
The query however when looped over would return objects where ->u would contain a User instance.
The key difference of course is:
- If the database structure changes, the generated sql will be different, the abstract query will not need to be updated to fit the new db structure as SQL would.
- This flexibility would actually flex out even to extreme database changes. That query could still be functional even if we dropped the entire user table, the rev_user_text field, and replaced them with a usermap table that maps anon ips and registered users to ids.
- Because the query is written more abstractly individual db engines could theoretically set rules for certain types of queries to optimize them for that specific DB engine.
- We should theoretically be able to keep our top MySQL performance.
- We might also get better performance in other dbs like pgsql.
PHP Api
[edit]There's a little undecided on the php api, there are two possibilities:
// Abstract query
$q->query("
DISTINCT { User ?u }
FROM { Revision ?rev }
WHERE { ?rev:page IS ?page, ?rev:user IS ?u }
ORDER { MAX(?rev:timestamp) }",
array( 'page' => $this->getTitle() ) );
// Abstract query (php syntaxed?)
$q->query(array(
"distinct" => "User ?u",
"from" => "Revision ?rev"
"where" => array( "?rev:page" => "?page", "?rev:user" => "?u" ),
"order" => "MAX(?rev:timestamp)" ),
array( 'page' => $this->getTitle() ) );
The advantage of the PHP syntax is that for a portion of the syntax mistakes php will throw a syntax error, however the abstract string dsl is more flexible for things like != and other comparisons.
We're beginning to target HipHop as a primary target, and we have the option in cases where performance is key to use a php extension, so performance wise a string based dsl shouldn't be a prooblem.