snax

how to make a changeset-preserving svn mirror

I have a private svn repository, as well as a Trac instance, that I use for almost everything. I also have a bunch of Rubyforge projects with their own repositories (Polymorphs, Allison, Fauna). Since the Trac changeset browser is so nice, I wanted to be able to use it to browse my Rubyforge projects with changesets and commit messages intact.

install dependencies

Make sure you have Perl (quiet, you in the back), and then set up CPAN:

sudo apt-get install perl
sudo cpan

CPAN will ask you a billion install questions. Answer them sanely. Then when you are at the CPAN prompt, you can install SVN::Mirror:

force install SVN::Mirror

CPAN is not done with the questioning; it’s very social. It may ask you things like “Which directory for UUID store? [/tmp]” or “Ah, I see you already have installed libnet before; update your configuration?”. Also notice that it runs all the tests on install. Eventually it will finish. Type exit.

write the mirror script

Hold your breath; we need to write the updater script that we can call periodically with cron:

mirror_svn
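#!/usr/bin/env perl

use strict;
use warnings;
use SVN::Mirror;

# Remote base URL, local repository path (on disk, not a checkout),
# and the Rubyforge projects to mirror.
my $url = "svn://rubyforge.org/var/svn";
my $repository = "/svn";
my @projects = ("fauna", "allison", "polymorphs");

foreach (@projects) {
  # Mirror each project into /rubyforge/<project> in the local repository.
  my %opts = (source => "$url/$_",
    target => $repository,
    target_path => "/rubyforge/$_");
  # Pass --init on the first run to set the skip_to option.
  $opts{'skip_to'} = 1 if ($ARGV[0] && $ARGV[0] eq "--init");
  my $mirror = SVN::Mirror->new(%opts);
  $mirror->init;
  $mirror->run;
}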

I made you guys use $_. Just because.

Note that $repository refers to the real repository on your server, not a local checkout. The script must be run on the server that houses the actual repository.

initial import

Rubyforge has a tendency to drop the connection, and the initial import can be lengthy. If it ran from cron, two updates could overlap and cause a race condition, so we want to make sure the first import finishes cleanly before scheduling anything. Set the executable bit on the script, then run it by hand:

chmod u+x mirror_svn
./mirror_svn --init

Ignore the “Network connection closed unexpectedly” errors. Keep repeating until it finishes cleanly.
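
If retyping the command gets old, a small shell loop can do the repeating. This is only a sketch: the retry function and the MAX_TRIES and RETRY_DELAY variables are made-up names, and it assumes mirror_svn exits with a non-zero status when the connection drops, which is worth checking before you rely on it.

```shell
#!/bin/sh
# retry CMD...: rerun CMD until it exits 0, giving up after MAX_TRIES
# attempts. MAX_TRIES and RETRY_DELAY are hypothetical knobs for this sketch.
retry() {
  max=${MAX_TRIES:-20}
  delay=${RETRY_DELAY:-5}
  n=1
  until "$@"; do
    if [ "$n" -ge "$max" ]; then
      echo "giving up after $n attempts" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage, run from the directory that holds the script:
# retry ./mirror_svn --init
```

The attempt cap is there so a permanently broken URL does not loop forever.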

schedule a cronjob

Now we can run the script every two minutes to import any new changesets. Add the following cronjob:

*/2 * * * * /path/to/mirror_svn > /tmp/mirror_svn.log 2>&1

Sit back and the changes will roll in.
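
Since overlapping runs were the worry during the initial import, one way to keep cron from ever starting a second copy is to wrap the job in flock from util-linux. A minimal sketch, assuming flock is installed on the server; the run_exclusive name and the lock file path are made up for illustration:

```shell
#!/bin/sh
# run_exclusive LOCKFILE CMD...: run CMD only if no other process holds
# LOCKFILE. With -n, flock exits non-zero right away instead of waiting.
run_exclusive() {
  lockfile=$1
  shift
  flock -n "$lockfile" "$@"
}

# In the crontab this would look something like (hypothetical paths):
# */2 * * * * flock -n /tmp/mirror_svn.lock /path/to/mirror_svn > /tmp/mirror_svn.log 2>&1
```

The kernel releases the lock when the process exits, even if it was killed, so a leftover lock file does not block later runs.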

troubleshooting

You may get an error like so:

Waiting for sync lock on /rubyforge/fauna: server:21338.

This means the mirror script was forcefully killed (probably by you, playing fast and loose with CTRL-C, jerkface). To fix it, temporarily disable the cronjob and kill any running mirror_svn processes. Then find your Mirror.pm file, at /usr/local/share/perl/5.8.7/SVN/Mirror.pm or similar, and change line 518 to:

        while (0) {

This will ignore the lock. Now run the script once by hand, then change the file back, and re-enable the cronjob. A little icky, but it works.

Also, if you delete a mirror, make sure you remove any out-of-date svm:mirror properties from the repository root. That’s not a typo; svm:mirror is the correct name.

It would be nice to be able to make commits to the local repository and push the changes back out to the remote. I spent some time trying with svm mergeback and also SVN::Pusher. Nothing even partially worked, and every attempt broke my mirror setup.

trac considerations

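Note there is a rare chance that Trac will not display your changesets after a certain point if they get added in reverse-temporal order. This can happen, for example, when a commit lands in the local repository after a remote commit was made but before the mirror script has imported it, so the imported changeset carries an earlier date than the revision before it.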

The best solution to this is to just run the mirror update very frequently. Either way, your data is always safe. It’s just a Trac display issue.

conclusion

Why not some other tool? I couldn’t use svnsync because it requires a separate brand-new repository for every mirror source, and Trac doesn’t support external or multi-repository browsing. SVK is too much to learn and too invasive of my normal workflow. And Piston doesn’t preserve changesets.

Go Perl.