not invented here

Some neat libraries have appeared recently; none by me.


I found CSVScan on the RAA only a few days after I wrote my own fast CSV parser. CSVScan itself had been released only a few days before. It’s Ragel-based, and implements the most common CSV “spec”, unlike my Ccsv, which only supports a constrained format.

Ccsv is faster, but not enough to care about. Here’s a benchmark yielding 1,000,000 rows from a file:

               user        system      total
Ccsv           1.780000    0.030000    1.810000
CSVScan        2.280000    0.050000    2.330000
LightCsv      11.680000    0.110000   11.790000
FasterCSV     35.340000    0.200000   35.540000
CSV          115.050000    0.440000  115.490000
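Timings like these can be gathered with Ruby’s stdlib Benchmark module. Here’s a minimal sketch using only the stdlib CSV parser (the other libraries’ APIs are omitted, and the file is scaled down from the 1,000,000-row original):

```ruby
require 'benchmark'
require 'csv'
require 'tempfile'

# Generate a small sample file; the real benchmark used 1,000,000 rows.
file = Tempfile.new('bench')
10_000.times { file.puts "a,b,c,1,2,3" }
file.flush

# Time a full parse. Each additional parser would get its own report line.
Benchmark.bm(10) do |x|
  x.report("CSV:") do
    CSV.foreach(file.path) { |row| row }
  end
end
```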

I’m surprised it took until Sep 2007 for someone to write a C CSV parser.

Ccsv development is halted; if I had known about CSVScan, I wouldn’t have written it. It would be nice, though, if CSVScan had a foreach method and a gem version.

If you’re learning, Ccsv still makes an excellent example of a plain C extension. CSVScan makes an excellent example of a Ragel extension.


Ara wrote a leak detector called Dike, which bears investigation. Using the object finalizer is a good idea. Probably the ideal leak detector will be lifecycle-based (sort of like Dike) instead of snapshot-based (like BleakHouse), but C-implemented (like BleakHouse) so that we can guarantee we aren’t introducing leaks in the attempt to track them, and so that it’s fast enough to use in live production environments.

If we do lifecycle tracking we should be able to identify the exact line of app code that spawns the leaks.
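As a sketch of what lifecycle tracking means, here is a pure-Ruby toy (the class name and API are mine, not Dike’s; a production version would be C, for the reasons above). It records the birth site of every tracked object and forgets it at finalization, so whatever remains is a leak candidate:

```ruby
# Toy lifecycle-based leak tracker. Illustrative only; not Dike's API.
class LeakTracker
  def initialize
    @births = {} # object_id => the line of code that spawned the object
  end

  def track(obj)
    @births[obj.object_id] = caller.first
    # The finalizer fires at garbage collection and receives the object_id,
    # so registering it here does not keep the object alive.
    ObjectSpace.define_finalizer(obj, method(:died))
    obj
  end

  def died(object_id)
    @births.delete(object_id)
  end

  # Birth sites of objects that are still alive (or awaiting collection).
  def leaks
    @births.values
  end
end
```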

ruby east

I was at the Ruby East conference today. It was good. Mike Mangino gave an excellent talk on mocking. Also, I met Gregory Brown in real life. Good guy; hair longer than mine.


Despite Mike’s talk, I’ve added an integration suite to Ultrasphinx. No mocks here. I actually removed some of the unit tests; for a local service consumer like Ultrasphinx, the integration suite is definitely the way to go.

Now I can approach full coverage in the plugin itself instead of relying on CHOW as the integration test—that was definitely not optimal. Also I can spawn mongrels in setup and kill them in teardown, which lets me test awkward situations like development environment class reloading.
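The spawn-and-kill pattern is simple in outline. A sketch with a stand-in child process (the mongrel_rails command line in the comment is illustrative, not Ultrasphinx’s actual invocation):

```ruby
# Spawn a server process in setup and kill it in teardown.
def start_server
  fork do
    # A real suite would exec the server here, e.g.:
    #   exec "mongrel_rails start -e development -p 4000"
    # (illustrative command line). A sleeping child stands in for it.
    sleep
  end
end

def stop_server(pid)
  Process.kill("TERM", pid)
  Process.wait(pid)
end

pid = start_server
# ... exercise the app over HTTP here ...
stop_server(pid)
```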

Mark Lane contributed the sample app, so it’s still NIH even though I wrote the helper and initial tests. Good times.

bleak_house 3 tells you your leaks

BleakHouse 3:

  58%: core rails (783 births, 1241 deaths, ratio -0.23, impact 1.66)
  66%: recipe/new/GET (15182 births, 14593 deaths, ratio 0.02, impact 1.77)
  75%: core rails (766 births, 1168 deaths, ratio -0.21, impact 1.60)
  83%: recipe/list/GET (16423 births, 15991 deaths, ratio 0.01, impact 1.64)

65992 births, 66458 deaths.

Tags sorted by immortal leaks:
  recipe/show/GET leaks, averaged over 4 requests:
    5599 String
    80 Array
    2 Regexp
    2 MatchData
    2 Hash
    1 Symbol
  core rails leaks, averaged over 4 requests:
    238 String
    10 Array

Tags sorted by impact * ratio:
   0.0739: recipe/show/GET
   0.0350: recipe/new/GET
   0.0218: recipe/list/GET
  -0.6686: core rails

That’s a Symbol up there; the new BleakHouse walks the sym_tbl as well as the regular heap. We now track the history of every individual object instead of just class counts. This means we can accurately (fingers crossed) identify where lingering objects were spawned.

On the flipside, analyzing the log file is slow (a decent-sized logfile will have hundreds of millions of rows). I wrote a pure-C CSV parser, which helps, and there’s always the “better hardware” answer. I’ve been mainly running it on my Mac Mini; if I use the Opteron 2210 it goes much faster, since the analyzer is CPU-bound.

It doesn’t make pretty graphs anymore but I’m not sure exactly how they would help. It would be easy enough to add them back.

go go go

A gem, not a plugin, because it needs to compile a C extension. First, uninstall the old versions to prevent version conflicts:

sudo gem uninstall bleak_house -a -i -x

Then install the new version:

sudo gem install bleak_house

Also, you need to rebuild your ruby-bleak-house binary, even if you already have one. Just run:

bleak --build

The RDoc has updated usage instructions.

related in spirit

In the interests of there being less business all up in here, I have created a crazy blog system.

Related in spirit to: e, Hobix, Blosxom, Yurt.

The HTML is the framework. The cache and the store are one. Lines of code, 150ish, maybe less, all procedural. Managed via the filesystem. Also there is some kind of read-only API going on, for free.

Ok, so now a ridiculous benchmark. On the server, a dynamic request from Typo, complete with MySQL climbing painfully out of swap:

$ time curl --head localhost:4001
real    0m11.270s

Once more:

$ time curl --head localhost:4001
real    0m2.825s

Now with page caching:

$ time curl --head localhost:4001
real    0m0.015s

But, how about a dynamic request from the all-new Bax?

$ time curl --head localhost:4040
real    0m0.017s

Yep. Now to get that feed to validate.

svn branching best practices (in practice)

Notice: this article is extremely out of date. If you want to learn modern Subversion best practices, please look elsewhere.

You want to make a Subversion branch, and merge it later. You read the branching section in the official book, but are still confused. What to do?

creating the branch

1. Note the current head revision:

svn info svn:// | grep Revision

2. Make a clean, remote copy of trunk into the branches folder. Name it something. We’ll call it your_branch:

svn cp svn:// \
  svn:// \
  -m "Branching from trunk to your_branch at HEAD_REVISION"

Replace HEAD_REVISION with the revision number you noted in step 1.

Note that a backslash (\) means that the command continues onto the next line.

3. Switch your local checkout to point to the new branch (this will not overwrite your changes):

svn switch --relocate \
  svn:// \

You don’t really need the --relocate svn:// bit, but I’m in the habit of being explicit about it.

4. Check that your local checkout is definitely now your_branch, and that you can update ok:

svn info | grep URL
svn up

5. Commit your new changes.

(These steps will work even if you had already made local changes on trunk, but decided you wanted them on your_branch instead. If your trunk checkout was unmodified, just skip step 5.)

updating the branch

You’ve been developing for a while on your_branch, and so have other people on trunk, and now you have to add their changes to your_branch.

1. First, update your branch checkout and commit any outstanding changes.

2. Search the Subversion log to see at what revision number you last merged the changes (or when the original branch was made, if you’ve never merged). This is critical for making a successful merge:

svn log --limit 500 | grep -B 3 your_branch

3. Also note the current head revision:

svn info svn:// | grep Revision

4. Merge the difference of the last merged revision on trunk and the head revision on trunk into the your_branch working copy:

svn merge -r LAST_MERGED_REVISION:HEAD_REVISION \
  svn:// .

Replace LAST_MERGED_REVISION with the revision number you noted in step 2, and HEAD_REVISION with the revision number you noted in step 3.

Now look for errors in the output. Could all files be found? Did things get deleted that shouldn’t have been? Maybe you did it wrong. If you need to revert, run svn revert -R *.

5. Otherwise, if things seem ok, check for conflicts:

svn status | egrep '^C|^.C'

Resolve any conflicts. Make sure the application starts and the tests pass.

6. Commit your merge:

svn ci -m "Merged changes from trunk to your_branch: COMMAND"

Replace COMMAND with the exact command contents from step 4.

folding the branch back into trunk

Hey, your_branch is done! Now it has to become trunk, so everyone will use it and see how awesome it is.

This only happens once per branch.

1. First, follow every step in the previous section (“updating the branch”) so that your_branch is in sync with any recent changes on trunk.

2. Delete trunk completely:

svn del svn://

3. Move your_branch onto the old trunk location:

svn mv svn:// \

4. Relocate your working copy back to trunk:

svn switch --relocate \
  svn:// \

All done.


Subversion 1.5 is scheduled to bring automatic merge tracking (notice the ticket comment that says “tip of the iceberg”). Until that fine day, if you want to automate this, the tool is supposed to be pretty nice.

desensitize your mac

Adobe CS3 gives you a pleasant and fatal alert if you try to install it on a case-sensitive Mac filesystem. Here’s how to convert your boot drive back to case-insensitive without reinstalling:


1. Get a firewire hard drive the same size (or larger) as your boot drive. I do a full weekly backup so I already had such a drive.

2. Download and install Carbon Copy Cloner 3 Beta.

3. Start Carbon Copy Cloner and make a full copy of your boot drive to your backup. Make sure you choose both “copy everything from source to target” and “erase the target volume”. This will give you a bootable backup.

4. Restart and hold down the Option key. Choose your backup drive and press enter.

5. Start the Disk Utility application and erase your regular boot drive. Make sure the volume format is set to “Mac OS Extended (Journaled)”.

6. Start Carbon Copy Cloner again. Make a full copy of your backup drive to your regular boot drive. This time, don’t choose “erase the target volume”, or it will reformat it to be case-sensitive all over again.

7. Now we need to patch up some things. Start a terminal and run:

cd "/Volumes/Your Boot Drive/"
sudo cp -R /usr .
sudo cp -R /private .
sudo cp -R /sbin .
sudo ln -s private/tmp tmp
sudo ln -s private/var var
sudo bless --folder . --bootinfo --bootefi

8. Reboot. Hold down Option again, but this time choose the regular boot drive.

9. Download Applejack. Install it.

10. Reboot and hold down Command-S. At the single-user prompt, type:

applejack auto restart

Let everything finish. Now you’re clear.


I was surprised to see that Fauna has been near the top of the RubyForge most-active list recently.

Based on a clue from Tom Copeland, I pieced together the formula for project activity. For any given week:

activity =
  log(0.3 * number of file downloads) +
  log(number of repository commits) +
  log(3 * number of bugs filed) +
  log(3 * number of forum messages posted) +
  log(4 * number of ended tasks) +
  log(5 * number of files released) +
  log(5 * number of support requests made) +
  log(10 * number of patches filed)

This is then converted into a percentile of all active projects.
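In executable form, the reconstruction looks like this (the stat names and the zero-count guard are my own assumptions; how RubyForge actually handles zero counts is unknown):

```ruby
# Weekly activity score, per the reconstructed formula. Stat names are
# illustrative; counts of zero are clamped to 1 so log never returns
# -Infinity (an assumption, not the confirmed behavior).
def activity(stats)
  weights = { :downloads        => 0.3,
              :commits          => 1,
              :bugs             => 3,
              :forum_posts      => 3,
              :ended_tasks      => 4,
              :files_released   => 5,
              :support_requests => 5,
              :patches          => 10 }
  weights.inject(0) do |sum, (stat, weight)|
    sum + Math.log([weight * stats.fetch(stat, 0), 1].max)
  end
end
```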

It’s interesting that a project can have significant activity even if no developer is working on it.

attack of the beasts

Fauna is taking over.

what’s new

The Allison and has_many_polymorphs projects on RubyForge are now part of the Fauna project. The new repository URLs are:

svn://
svn://

Their forums have also moved. Eventually the old contents will follow.

Finally, the IRC channel has moved to #fauna on, since it’s not just about has_many_polymorphs any more.

getting help

Every Snax project now has complete RDoc documentation. The code page has up-to-date links.

The best place to make a bug report is usually the IRC channel. A post on the appropriate forum is also good.