how to make a changeset-preserving svn mirror

I have a private svn repository, as well as a Trac instance, that I use for almost everything. I also have a bunch of Rubyforge projects with their own repositories (Polymorphs, Allison, Fauna). Since the Trac changeset browser is so nice, I wanted to be able to use it to browse my Rubyforge projects with changesets and commit messages intact.

install dependencies

Make sure you have Perl (quiet, you in the back), and then set up CPAN:

sudo apt-get install perl
sudo cpan
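
SVN::Mirror needs the Subversion Perl bindings (SVN::Core), which are much easier to get from the package manager than through CPAN. An extra step, assuming the Debian/Ubuntu package is named libsvn-perl:

sudo apt-get install libsvn-perl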

CPAN will ask you a billion install questions. Answer them sanely. Then when you are at the CPAN prompt, you can install SVN::Mirror:

force install SVN::Mirror

CPAN is not done with the questioning; it’s very social. It may ask you things like “Which directory for UUID store? [/tmp]” or “Ah, I see you already have installed libnet before; update your configuration?”. Also notice that it runs all the tests on install. Eventually it will finish. Type exit.
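
Before moving on, you can confirm the module actually loads; if this prints nothing and exits quietly, you're set:

perl -MSVN::Mirror -e 1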

write the mirror script

Hold your breath; we need to write the updater script that we can call periodically with cron:

mirror_svn
#!/usr/bin/env perl

use strict;
use warnings;
use SVN::Mirror;

# The remote source, the local repository (the real repository on disk, not a
# checkout), and the projects to mirror.
my $url = "svn://rubyforge.org/var/svn";
my $repository = "/svn";
my @projects = ("fauna", "allison", "polymorphs");

foreach (@projects) {
  # Mirror each project into /rubyforge/<project> in the local repository.
  my %opts = (source => "$url/$_",
    target => $repository,
    target_path => "/rubyforge/$_");
  # Only set skip_to on the initial import (invoked with --init, below).
  $opts{'skip_to'} = 1 if ($ARGV[0] && $ARGV[0] eq "--init");
  my $mirror = SVN::Mirror->new(%opts);
  $mirror->init;
  $mirror->run;
}

I made you guys use $_. Just because.

Note that $repository refers to the real repository on your server, not a local checkout. The script must be run on the server that houses the actual repository.
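
svnlook only works against a repository on local disk, so it doubles as a check that you're on the right machine; here assuming the repository lives at /svn as in the script:

svnlook youngest /svn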

initial import

Rubyforge has a tendency to drop the connection, and we want to make sure the initial import finishes cleanly before the cron updates start: the import could be lengthy, and two overlapping runs could race. Set the executable bit on the script, then run it by hand:

chmod u+x mirror_svn
./mirror_svn --init

Ignore the “Network connection closed unexpectedly” errors. Keep repeating until it finishes cleanly.
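
If you get tired of rerunning it by hand, here's a rough retry loop; it assumes the script exits non-zero when SVN::Mirror dies on a dropped connection, so check that assumption against your output first:

until ./mirror_svn --init; do echo "retrying in 30 seconds..."; sleep 30; done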

schedule a cronjob

Now we can run the script every two minutes to import any new changesets. Add the following cronjob (note the redirect: cron runs commands under /bin/sh by default, where bash's &> doesn't redirect stderr):

*/2 * * * * /path/to/mirror_svn > /tmp/mirror_svn.log 2>&1

Sit back and the changes will roll in.
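
If you want extra insurance against the overlapping-update race mentioned earlier, one option is to wrap the job in flock(1) so a run is skipped while the previous one is still going; a sketch, assuming util-linux's flock is installed (the lock file name is arbitrary):

*/2 * * * * flock -n /tmp/mirror_svn.lock /path/to/mirror_svn > /tmp/mirror_svn.log 2>&1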

troubleshooting

You may get an error like so:

Waiting for sync lock on /rubyforge/fauna: server:21338.

This means the mirror script was forcefully killed (probably by you, playing fast and loose with CTRL-C, jerkface). To fix it, temporarily disable the cronjob, kill any running mirror_svn processes, and then find your Mirror.pm file at /usr/local/share/perl/5.8.7/SVN/Mirror.pm or similar, and change line 518 to:

        while (0) {

This will ignore the lock. Now run the script once by hand, then change the file back, and re-enable the cronjob. A little icky, but it works.
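
The exact line number will drift between SVN::Mirror versions; if 518 doesn't look right, searching for the text of the error message should put you near the loop in question:

grep -n "Waiting for sync lock" /usr/local/share/perl/5.8.7/SVN/Mirror.pm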

Also, if you delete a mirror, make sure you remove any out-of-date svm:mirror properties from the repository root. That’s not a typo; svm:mirror is the correct name.
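
To inspect or clean up that property, svn propget works directly against the repository URL, and the simplest way to edit it is through a quick non-recursive checkout of the root; a sketch, assuming the repository lives at /svn and using a throwaway checkout in /tmp/svnroot:

svn propget svm:mirror file:///svn
svn co -N file:///svn /tmp/svnroot
svn propedit svm:mirror /tmp/svnroot
svn ci -m "remove stale svm:mirror entry" /tmp/svnroot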

It would be nice to be able to make commits to the local repository and push the changes back out to the remote. I spent some time trying with svm mergeback and also SVN::Pusher. Nothing even partially worked, and every attempt broke my mirror setup.

trac considerations

Note there is a rare chance that Trac will not display your changesets after a certain point if they get added in reverse-temporal order. This can happen in the following scenario:

  • mirror update runs
  • someone commits to the remote repository
  • someone commits right away to the local repository
  • mirror update runs
  • mirror commits the remote changes into the local repository

The best solution to this is to just run the mirror update very frequently. Either way, your data is always safe. It’s just a Trac display issue.
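
If Trac's cache does get confused, you can also ask it to rebuild its view of the repository; a hedged example, assuming a Trac environment at /path/to/trac-env:

trac-admin /path/to/trac-env resync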

conclusion

Why not some other tool? I couldn’t use svnsync because it requires a separate brand-new repository for every mirror source, and Trac doesn’t support external or multi-repository browsing. SVK is too much to learn and too invasive of my normal workflow. And Piston doesn’t preserve changesets.

Go Perl.

4 responses

  1. This is vaguely relevant to your discussion; I just thought it might be helpful. If you want to run a local repository and keep it synced with an external one, pushing and pulling changes between them, you can do this with SVK:

    svk mkdir //local
    svk mkdir //mirror
    svk mirror REMOTE_PATH DEPOT_PATH
    svk sync //mirror/project_name
    svk cp //mirror/project_name //local/project_name
    svk co //local/project_name ~/work/project_name
    

    In case you're not sure what the DEPOT_PATH is: that's where svk stores your repository. You probably want to mirror to //mirror/project_name.

    Then edit your files: svk ci will commit to the local repository, svk pull will grab any changes from the remote repository, and svk push will push any local changes back to the remote repository. It doesn't offer the convenience of ‘proper’ distributed SCMs, but it's great for hacking on your laptop and being able to commit without internet access.

  2. A.S., I’m trying to make this work, but I’m having trouble:

    chloe:/tmp/svk eweaver$ svk mkdir local
    path /tmp/svk/local is not a checkout path.
    

    I see that there is something special about the “//path” syntax, but I don't understand what it is or where SVK intends to secretly store my repositories.

    I would like to add some SVK-enabled subdirectories in my existing svn repository, as with the mirror above, so I could push and pull changesets through them to the remote repository. Is this possible with SVK?

  3. SVK works from a single local subversion repository, living at ~/svk/local by default (I believe). This is your “depot”. The // is how you refer to something in your “depot path”. How you manage it is up to you. I find having //local and //mirror useful.

    Let's say that I've done svk mkdir //local; I now have a directory called local in my depot path. I could check out from it, but it's empty. I want to import a project I've started on, so: svk import ~/work/superproject //local/superproject.

    This has just made a superproject directory in my repository, and imported the files. I can then check it out with svk co //local/superproject to get a working copy.

    I hope this helps explain the fundamentals of the SVK way of thinking. Take a look at svk help intro and svk help <command> for more information. I don't think that SVK-enabled subdirectories in an existing repository are going to work, but I'm really no expert. I just know enough to work with SVK for my needs (which are minimal; I don't even deal with edit conflicts and so on; it's just handy for disconnected operation and simple svn interoperability).

  4. Thanks for the walkthrough. I’m not sure if svk would help me or not. Offline check-in isn’t an issue because I have a bluetooth phone with a data account. I’d like to make an svk mirror on my server, and externalize that in my checkout so that I could push temporary commits and then choose when to move them upstream to Rubyforge, or to my master repository, for that matter.

    But that’s pretty complexified, and it seems that svk doesn’t even work with a non-local depot.