I don't understand when a revision is defined as old.
Which revisions get deleted exactly? Maybe all but the current one? Or can "old" be defined somewhere as for example "30 days"?
In other words, all non-current revisions.
(also discussing a bit what delete means in this context)
The text states "to delete all old (non-current) revisions", so I'd say all but the latest revision of a page, no matter when the revisions were made.
Good. However, since you are a new MediaWiki user, I am not sure why you need to reduce the size of the database. Personally, I would only do it if I really had an issue.
Yeah, thanks. I think I simply confused "shrinking db" with "getting rid of the history that is displayed for a page".
From what I saw, it is only possible to delete history entries of a page, but then they still appear greyed out and crossed out. I thought it might be possible to simply remove them entirely.
I'm not sure a wiki is the best thing for you to choose. Having a version history is one of the core features of a wiki. Not having it is like cutting off the arms and legs of the software, I believe. However, things could get philosophical if we discuss this further.
It might be good for beginners like me to add a link to Page information to show readers where to find the page ID.
Hi,
I've been digging a bit, and our wiki has pages with a huge number of revisions. I don't want to remove all old revisions, but it isn't necessary to keep everything either. What I would like is an option to keep a certain number of revisions, given as a parameter, e.g. 5. So when deleting revisions from the revision table, the number of revisions for a given page should be taken into account. If a page has 5 or fewer revisions, none will be removed. If a page has more than 5 revisions, all older revisions will be removed except the most recent 5. I've copied DeleteOldRevisions.php to DeleteOldRevisions_Keep.php and am working on modifying it, but it seems to be a tough job.
I'm progressing: the query
mysql> select rev_id from revision where rev_page=5591 order by rev_id desc limit 5;
+--------+
| rev_id |
+--------+
|  37402 |
|  37401 |
|  37400 |
|  37399 |
|  37398 |
+--------+
5 rows in set (0.00 sec)
mysql> select rev_id from revision where rev_page=5592 order by rev_id desc limit 5;
+--------+
| rev_id |
+--------+
|  37295 |
|  37294 |
|  37293 |
+--------+
3 rows in set (0.00 sec)
rev_page 5591 has 27 revisions and rev_page 5592 has 3 revisions.
Now I was wondering what would happen if I undo the latest revision for page 5591 to revert it back to 37401. Fortunately this creates a new revision, 37406, which tells me that I can use the above query to clean up everything except the latest 5 revisions.
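The rule described above (keep the newest N revisions of a page, delete the rest, and delete nothing when a page has N or fewer) can be sketched in plain PHP without a database. This is only an illustration: revisionsToDelete is a hypothetical helper, not part of MediaWiki, and it assumes that a higher rev_id always means a newer revision, which is the same assumption the query makes.

```php
<?php
/**
 * Return the revision IDs of one page that fall outside the newest $keep.
 * Hypothetical helper for illustration only; not a MediaWiki function.
 *
 * @param int[] $revIds All revision IDs of a single page, in any order
 * @param int $keep Number of most recent revisions to keep
 * @return int[] Revision IDs that would be deleted, newest first
 */
function revisionsToDelete( array $revIds, int $keep ): array {
    // Assumption: higher rev_id = newer revision ("ORDER BY rev_id DESC").
    rsort( $revIds );
    // Skip the newest $keep IDs; whatever remains counts as "old".
    return array_slice( $revIds, max( 0, $keep ) );
}

// Page with 3 revisions and a limit of 5: nothing to delete.
$none = revisionsToDelete( [ 37293, 37294, 37295 ], 5 ); // → []

// Page with 7 revisions and a limit of 5: the two oldest are returned.
$old = revisionsToDelete( [ 101, 102, 103, 104, 105, 106, 107 ], 5 ); // → [ 102, 101 ]
```

An undo or revert creates a new revision with a new, higher ID, so it simply becomes one of the newest N in this scheme.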
After some testing it's finished:
Code:
<?php
/**
 * Delete old revisions from the database and keep the latest 'N' revisions (default 10)
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.,
 * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 * http://www.gnu.org/copyleft/gpl.html
 *
 * @file
 * @ingroup Maintenance
 * @author Dick Pluim <dick.pluim@gmail.com>
 * (Based on deleteOldRevisions.php by Rob Church)
 */

require_once __DIR__ . '/Maintenance.php';

/**
 * Maintenance script that deletes old revisions from the database and keep the latest 'N' revisions (default 10).
 *
 * @ingroup Maintenance
 */
class DeleteOldRevisions extends Maintenance {
    public function __construct() {
        parent::__construct();
        $this->addDescription( 'Delete old revisions from the database and keep the latest N revisions (default 10)' );
        $this->addOption( 'delete', 'Actually perform the deletion' );
        $this->addOption( 'page_id', 'List of page ids to work on', false );
    }

    public function execute() {
        $this->output( "Delete old revisions\n\n" );
        $this->doDelete( $this->hasOption( 'delete' ), $this->mArgs );
    }

    function doDelete( $delete = false, $args = [] ) {
        # Data should come off the master, wrapped in a transaction
        $dbw = $this->getDB( DB_MASTER );
        $this->beginTransaction( $dbw, __METHOD__ );

        $revConds = "";
        $keepRevs = [];
        $keepLimit = 10; # default

        # If a parameter is given, we assume that this is the number of revisions to keep.
        # only first argument is being used.
        if ( count( $args ) > 0 ) {
            $keepLimit = $args[0];
            $this->output( "Keeping " . $keepLimit . " revisions\n" );
        }

        # make the pagelist
        $res = $dbw->select( 'page', 'page_id', 'page_id>0', array( 'ORDER BY' => 'page_id ASC' ) );
        foreach ( $res as $row ) {
            $revConds = "rev_page = $row->page_id order by rev_id desc limit $keepLimit";
            # make the list of revisions we want to keep for this page
            $res2 = $dbw->select( 'revision', 'rev_id', $revConds, __METHOD__ );
            foreach ( $res2 as $row2 ) {
                $keepRevs[] = $row2->rev_id;
            }
        }

        # Make the list of revisions which will be deleted
        $revConds = 'rev_id NOT IN (' . $dbw->makeList( $keepRevs ) . ')';
        $res = $dbw->select( 'revision', 'rev_id', $revConds, __METHOD__ );
        $oldRevs = [];
        foreach ( $res as $row ) {
            $oldRevs[] = $row->rev_id;
        }
        $this->output( "done.\n" );

        # Inform the user of what we're going to do
        $count = count( $oldRevs );
        $this->output( "$count old revisions found.\n" );

        # Delete as appropriate
        if ( $delete && $count > 0 ) {
            $this->output( "Deleting..." );
            $dbw->delete( 'revision', [ 'rev_id' => $oldRevs ], __METHOD__ );
            $this->output( "done.\n" );
        }

        # This bit's done
        # Purge redundant text records
        $this->commitTransaction( $dbw, __METHOD__ );
        if ( $delete ) {
            $this->purgeRedundantText( true );
        }
    }
}

$maintClass = "DeleteOldRevisions";
require_once RUN_MAINTENANCE_IF_MAIN;
--------------
Output:
[root@server maintenance]# php deleteOldRevisions_keep.php
Delete old revisions
Keeping 10 revisions
PHP Notice: Array to string conversion in /u01/mediawiki/tst/includes/db/Database.php on line 808
done.
2534 old revisions found.
[root@server maintenance]# php deleteOldRevisions_keep.php 5
Delete old revisions
Keeping 5 revisions
PHP Notice: Array to string conversion in /u01/mediawiki/tst/includes/db/Database.php on line 808
done.
6103 old revisions found.
[root@server maintenance]# php deleteOldRevisions_keep.php --delete 15
Delete old revisions
Keeping 15 revisions
PHP Notice: Array to string conversion in /u01/mediawiki/tst/includes/db/Database.php on line 808
done.
2026 old revisions found.
Deleting...done.
Searching for active text records in revisions table...done.
Searching for active text records in archive table...done.
Searching for inactive text records...done.
2024 inactive items found.
Deleting...done.
Tested with first 50 and then going slightly further down... ;-)
Can't figure out why I get the PHP Notice above. And there is sometimes a mismatch between old revisions found and inactive items found, but it's working in my test-environment.
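A possible explanation for the notice, offered as a guess rather than a confirmed diagnosis: in MediaWiki's database layer the fourth parameter of select() is the caller name ($fname) and the query options come fifth. The page query in the script passes array( 'ORDER BY' => 'page_id ASC' ) in the fourth position, so an array ends up where a string is expected. The stub below is hypothetical (selectStub is not a MediaWiki function and does not talk to a database); it only mimics that parameter order to show where each argument would go.

```php
<?php
/**
 * Hypothetical stand-in that mimics the parameter order of select():
 * table, fields, conditions, caller name, options.
 * It only builds an SQL string for illustration.
 */
function selectStub( string $table, string $vars, $conds = '', string $fname = 'unknown', array $options = [] ): string {
    // Array conditions become "field = value" pairs joined by AND.
    if ( is_array( $conds ) ) {
        $parts = [];
        foreach ( $conds as $field => $value ) {
            $parts[] = "$field = " . (int)$value;
        }
        $conds = implode( ' AND ', $parts );
    }
    $sql = "SELECT $vars FROM $table";
    if ( $conds !== '' ) {
        $sql .= " WHERE $conds";
    }
    if ( isset( $options['ORDER BY'] ) ) {
        $sql .= ' ORDER BY ' . $options['ORDER BY'];
    }
    if ( isset( $options['LIMIT'] ) ) {
        $sql .= ' LIMIT ' . (int)$options['LIMIT'];
    }
    // The caller name is a plain string appended as an SQL comment.
    return $sql . " /* $fname */";
}

// Caller name in the fourth position, options in the fifth.
$pages = selectStub( 'page', 'page_id', 'page_id > 0', 'doDelete',
    [ 'ORDER BY' => 'page_id ASC' ] );
$revs = selectStub( 'revision', 'rev_id', [ 'rev_page' => 5591 ], 'doDelete',
    [ 'ORDER BY' => 'rev_id DESC', 'LIMIT' => 5 ] );
echo $pages, "\n", $revs, "\n";
```

In the script itself this would correspond to passing __METHOD__ as the fourth argument and the ORDER BY and LIMIT settings as the fifth. Whether that removes the notice on this installation is untested.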
Running it a second time:
[root@server maintenance]# php deleteOldRevisions_keep.php --delete 15
Delete old revisions
Keeping 15 revisions
PHP Notice: Array to string conversion in /u01/mediawiki/tst/includes/db/Database.php on line 808
done.
0 old revisions found.
Searching for active text records in revisions table...done.
Searching for active text records in archive table...done.
Searching for inactive text records...done.
0 inactive items found.
Hi Dick,
Your option is a great addition to the script! It would be great if you could create an issue in Phabricator and put it into review, so that it can be added to the MediaWiki tarball and everyone can benefit from it!
Hi,
I'm also trying to find a good compromise between radically removing the history and storing lots of useless information. In my opinion, the best solution would be the ability to remove all the old "minor edits" from the history. Unfortunately, my coding skills are not sufficient for that... If someone has an idea...
Thanks
An option to remove only the edits that are marked as "minor" does not currently exist. Integrating such an option would cause problems:
First of all, it would break things like the calculation of size differences between revisions if the referenced revision is suddenly no longer there. While this is only a technical issue, which can perhaps be solved, there is another, much bigger problem:
An option to delete only minor edits would remove some edits from the history, but not others. Features like the history function of MediaWiki rely on the fact that all revisions stay in place. They compare revisions with each other and display the difference. However, if a revision in between has been removed, the difference will also include the changes made in that removed revision. That means changes will be attributed to a user although it is not clear whether that user really made them.
This is a very bad situation, which might even cause legal trouble, e.g. if part of an edit contains insults and, with the corresponding revision removed, it looks as if these insults came from user A, while they were in fact added by user B in a removed revision.
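The attribution problem can be illustrated with a toy history in plain PHP. Everything here is invented for the example: the three revisions, the users A and B, and the helpers addedWords and attribute, which use a crude word comparison instead of MediaWiki's real diff engine.

```php
<?php
/**
 * Words present in $new but not in $old: a crude stand-in for a diff.
 * Hypothetical helper for illustration only.
 */
function addedWords( string $old, string $new ): array {
    return array_values( array_diff( explode( ' ', $new ), explode( ' ', $old ) ) );
}

/** For each revision, list the words it appears to add over its predecessor. */
function attribute( array $history ): array {
    $result = [];
    $previous = '';
    foreach ( $history as $rev ) {
        $result[] = [ 'user' => $rev['user'], 'added' => addedWords( $previous, $rev['text'] ) ];
        $previous = $rev['text'];
    }
    return $result;
}

// Invented three-revision history; revision 2 is a minor edit by user B.
$history = [
    [ 'id' => 1, 'user' => 'A', 'minor' => false, 'text' => 'The sky is blue' ],
    [ 'id' => 2, 'user' => 'B', 'minor' => true, 'text' => 'The sky is blue and ugly' ],
    [ 'id' => 3, 'user' => 'A', 'minor' => false, 'text' => 'The sky is blue and ugly today' ],
];

// Full history: "and ugly" is attributed to B.
$full = attribute( $history );

// Same history with the minor edit removed: A's revision 3 is now
// compared against revision 1, so "and ugly" appears to come from A.
$withoutMinor = array_values( array_filter( $history, function ( $rev ) {
    return !$rev['minor'];
} ) );
$pruned = attribute( $withoutMinor );

echo $full[1]['user'], ': ', implode( ' ', $full[1]['added'] ), "\n";     // B: and ugly
echo $pruned[1]['user'], ': ', implode( ' ', $pruned[1]['added'] ), "\n"; // A: and ugly today
```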