ultimate architecture


  • every action loads only a few hyper-denormalized objects (ideally, one)
  • proxy chooses app server by predicting memcached object location
  • total memcached size is bigger than persistence store
  • write to memcached is synchronous with request; write-through to store is asynchronous
  • can use an MQ to avoid race writes (almost never important)


  • infinite scale-out
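A minimal sketch of that write path, assuming a plain Hash stands in for memcached and another Hash for the persistence store. The class name and both stand-ins are made up for illustration. The request thread updates the cache immediately, and a single background writer drains a queue into the store, which also serializes writes the way an MQ would.

```ruby
require "thread"

# Hypothetical write-behind cache: synchronous cache write, asynchronous store write.
class WriteBehindCache
  def initialize(store)
    @cache = {}          # stand-in for memcached
    @store = store       # stand-in for the persistence store
    @queue = Queue.new   # stand-in for an MQ; one consumer means no racing writes
    @writer = Thread.new do
      while (job = @queue.pop) != :stop
        key, value = job
        @store[key] = value
      end
    end
  end

  # Called inside the request: the cache is updated before we return.
  def write(key, value)
    @cache[key] = value
    @queue << [key, value]
    value
  end

  def read(key)
    @cache[key]
  end

  # Drain pending writes and stop the background writer.
  def flush
    @queue << :stop
    @writer.join
  end
end
```

Reads never touch the store in this sketch, which only makes sense if the cache really is bigger than the store, as the list above assumes.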


developing a facebook app locally

In which we host a Facebook/Rails app on our local machine and avoid the deploy/test cycle.


Make a file config/facebook.yml to hold your settings:

      key: 08192fc4c5916a75f3014c130ab241073
      secret: 62c551acccde773e011456957aab0f6f
    port: 10000
      key: ""
      secret: ""
    port: 10001
  host: yourapp.yourserver.com

No, those aren’t my real keys. But you get the idea.

Now, add to config/environment.rb:

FACEBOOK_CONFIG = YAML.load_file("#{RAILS_ROOT}/config/facebook.yml")[RAILS_ENV]

This lets individual developers override any global setting by adding the key under their username.
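One way the override could work, as a rough sketch. The file layout, the username "alice", and the helper name are all assumptions, since the exact nesting of the real facebook.yml is not shown here. Global settings sit directly under the environment, and a developer's block is merged over them.

```ruby
require "yaml"

# Assumed layout: globals under the environment, per-developer blocks nested beside them.
SAMPLE_FACEBOOK_YML = <<YAML
development:
  host: yourapp.yourserver.com
  port: 10000
  alice:
    port: 10001
YAML

# Hypothetical helper: returns the global settings with the user's overrides applied.
def facebook_config(yaml, env, user)
  settings  = YAML.load(yaml)[env] || {}
  overrides = settings[user].is_a?(Hash) ? settings[user] : {}
  globals   = settings.reject { |_, value| value.is_a?(Hash) } # drop per-developer blocks
  globals.merge(overrides)
end

puts facebook_config(SAMPLE_FACEBOOK_YML, "development", "alice").inspect
```

A developer without a block of their own just gets the global values.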

Now, install this rake task in lib/tasks/tunnel.rake:

namespace "tunnel" do
  desc "Start a reverse tunnel from FACEBOOK_CONFIG['host'] to localhost"
  task "start" => "environment" do
    # Forward the configured port on the remote host to the local Rails server.
    puts "Tunneling #{FACEBOOK_CONFIG['host']}:#{FACEBOOK_CONFIG['port']} to localhost:3000"
    exec "ssh -nNT -g -R *:#{FACEBOOK_CONFIG['port']}:localhost:3000 #{FACEBOOK_CONFIG['host']}"
  end

  desc "Check if reverse tunnel is running"
  task "status" => "environment" do
    # Look for a listening socket on the forwarded port of the remote host.
    if `ssh #{FACEBOOK_CONFIG['host']} netstat -an |
        egrep "tcp.*:#{FACEBOOK_CONFIG['port']}.*LISTEN" | wc`.to_i > 0
      puts "Seems ok"
    else
      puts "Down"
    end
  end
end

The rake task opens an SSH connection to our server and forwards a port on it to our local box. This will work no matter what kind of NAT or proxy you are behind, as long as you can access your server via SSH.

server setup

Your server’s /etc/ssh/sshd_config file must contain the following line:

GatewayPorts clientspecified

facebook setup

Have each developer set up a Facebook app for themselves, and set the callback URL to their port on your server, for example port 10000 on yourapp.yourserver.com.


Now, whatever computer you are on, just run rake tunnel:start and Facebook will be able to see your local machine on port 3000. Check that it’s up with rake tunnel:status.

If you’re on an unreliable network, you may want to make a cronjob to keep the tunnel alive.

further resources

Not many people are discussing how to develop a Facebook/Rails app in a sane way, so I might make a series of these “best practice”-style things. The best resources right now are:

Liverail also has a tutorial, but it assumes you know nothing about Rails, so it’s kind of a drag.

see your hand in front of your, uh, face: facebook_exceptions

Sucks to be working hard on your Tamagotchi armadillo + tumblelogging viral Facebook mashup and see:


Official documentation is here.


As usual, it lives in the Fauna. Install via Piston, or otherwise the old-school way:

script/plugin install -x svn://rubyforge.org/var/svn/fauna/facebook_exceptions/trunk

No tarballs/gems yet. Requires edge Rails, probably.


The support forum is here.

searching the world in 231 seconds


Please refer to the official documentation here, rather than this post.


I’m pleased to release the new CHOW search:

CHOW search

no delta, no no no

Sphinx is so fast that we don’t run an index queue. Re-indexing all of CHOW takes 4 minutes in production:

$ time indexer --config sphinx.production.conf complete
Sphinx 0.9.7
Copyright (c) 2001-2007, Andrew Aksyonoff

collected 405482 docs, 1095.1 MB
sorted 199.0 Mhits, 100.0% done
total 405482 docs, 1095082069 bytes
total 228.087 sec, 4801154.00 bytes/sec, 1777.75 docs/sec

real    3m51.321s
user    3m24.121s
sys     0m21.656s

That’s crazy.


Even though Solr/Lucene was available as an in-house CNET product, we dropped it in favor of Sphinx’s simplicity.

Sphinx accesses MySQL directly, so the interoperability happens at that level rather than in-app. This means you don’t need any indexing hooks in your models. Their lifecycle doesn’t affect the search daemon.

Plus, our old indexing daemon would mysteriously die and not restart. The Sphinx indexer just runs on a cronjob.

free codes

Kent Sibilev released acts_as_sphinx a while back, but I had already started on my implementation of a Rails Sphinx plugin. As it turned out, our needs were more sophisticated, so it’s good I did.

Mine is called Ultrasphinx, and features:

  • ActiveRecord-style SQL generation
    • association includes via GROUP_CONCAT
    • field merging
    • field aliasing
  • excerpt highlighting
  • runtime field-weighting
  • Memcached integration via cache_fu
  • query spellcheck via raspell
  • will_paginate compatibility
  • Google-style query parser
  • multiple deployment environments

Of course it inherits from Sphinx itself:

  • Porter/Soundex stemming
  • UTF-8 support
  • no stopwords (configurable)
  • ranged queries (for example, dates)
  • boolean operators
  • exact-phrase match
  • filters (“facets”)
  • rock-solid stability

Downsides of my plugin are:

  • API could be better
  • some features are MySQL-specific

The biggest benefit, really, is the SQL generation and index merging, which are related. The SQL generation lets you configure Sphinx via:

is_indexed(
  :fields => ["title", {:field => "post_last_created_at", :as => "published_at"}, "board_id"],
  :includes => [{:model => "Board", :field => "name", :as => "board"}],
  :concats => [{:model => "Post", :field => "content",
                :conditions => "posts.state = 0", :as => "body"}],
  :conditions => "topics.state = 0")

That is, you can :include fields from unary associations, and :concat fields from n-ary associations. For example, in this case, we are indexing all replies to a topic as part of that topic’s body.

Because the SQL is generated, by paying careful attention to Sphinx’s field expectations, we can create a merged index which allows us to rank totally orthogonal models by relevance.

Sphinx does require a unique ID for every indexed record. We work around this by using the alphabetical index of the model class as a modulus in an SQL function.
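To make the ID trick concrete, here is one plausible reading of it in plain Ruby rather than SQL. The model list and helper names are made up, and the arithmetic Ultrasphinx actually generates may differ. Each record's ID is multiplied by the number of indexed models and offset by the model's alphabetical position, so the remainder recovers the class and the quotient recovers the row.

```ruby
# Hypothetical set of indexed models, sorted alphabetically: Board, Post, Topic.
INDEXED_MODELS = %w[Topic Post Board].sort

# Map (model, record id) to an ID that is unique across the merged index.
def sphinx_id(model_name, record_id, models = INDEXED_MODELS)
  record_id * models.size + models.index(model_name)
end

# Reverse the mapping: the modulus gives the class, the quotient gives the record.
def decode_sphinx_id(id, models = INDEXED_MODELS)
  record_id, class_index = id.divmod(models.size)
  [models[class_index], record_id]
end

puts sphinx_id("Topic", 42)        # Topic is index 2 of 3
puts decode_sphinx_id(128).inspect
```

Two records with the same primary key in different tables can never collide, because their remainders differ.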


script/plugin install -x svn://rubyforge.org/var/svn/fauna/ultrasphinx/trunk

Documentation is here.

to the future

I don’t have much time to support this outside of our needs at CNET. So if you need something Certified and Enterprise Ready, I guess use Lucene, or maybe that French one I can’t spell.

If you need something faster, simpler, and more interesting, Sphinx + Ultrasphinx will be awesome.

Patches welcome; just ask if you want to be a committer. The support forum is here.


Who just searched for “señor fish”? Not kidding:

[eweaver@cnet search]$ rake ultrasphinx:daemon:tail
Tailing /opt/sphinx/var/log/query.log
  whole wheat pasta
  senor fish

Hopefully it’s better than the shrimp burrito I made the other day. That was kind of gross.

help, they are stealing my outlook webmail

I work for a company that uses Exchange for its internal email. This means I have to use Outlook Web Access to access my account there, since I am 100% remote, and Entourage doesn’t work over our VPN.


WebOutlook 0.3 released, with DELE support, and a bugfix. Download here.

to gmail!

Wouldn’t it be nice if we could scrape this in some way, and perhaps publish it with a regular POP server? Then Gmail could access it and merge it with our other accounts.

Thankfully, Adrian Holovaty[1] has written WebOutlook, a Python script to do just that. Except, it doesn’t support Gmail because the UIDL command isn’t implemented. Also it doesn’t support the HTTP Basic Authentication Scheme, which is how my particular OWA server authenticates me. Also, it deletes your mail without asking. Ok…
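For the curious, this is roughly what a UIDL reply looks like under RFC 1939, sketched in Ruby rather than Python. It is not how WebOutlook does it. The helper name and the choice of hashing the Message-ID are assumptions. The RFC only asks that each unique-id be 1 to 70 printable characters and stay stable for a given message.

```ruby
require "digest/md5"

# Hypothetical helper: build a multi-line UIDL response for a list of raw messages.
def uidl_response(messages)
  lines = messages.each_with_index.map do |msg, index|
    # Prefer the Message-ID header; fall back to the whole message if it is missing.
    message_id = msg[/Message-ID: <(.*?)>/m, 1] || msg
    "#{index + 1} #{Digest::MD5.hexdigest(message_id)}"
  end
  # "+OK", one "number unique-id" line per message, then a lone dot.
  (["+OK"] + lines + ["."]).join("\r\n") + "\r\n"
end
```

A client like Gmail compares these IDs between sessions to decide which messages it has already fetched, which is why they have to be stable.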

open-source ftw

After a few hours of reading RFC 1939 and struggling with Python’s combinatorics of tuples, dicts, and primitives, I am pleased to release WebOutlook 0.2, which you can download here. (GPL’ed because the original version is GPL’ed.)


  • Python 2.4
  • A web server (I’m actually using my local machine with a static IP)
  • A Gmail account


  • Set your Exchange server in popdaemon.py
  • Run python2.4 popdaemon.py
  • Log in to Gmail, and go to “Settings”, “Accounts”, “Get mail from other accounts”
  • Add a new account
    • Check “Leave mail on server”
    • Add a tag so you know where the messages came from

Now, watch as Google logs into your crappy local Python script and steals your mail out of your own house.

todo list

I welcome patches, and can give you subversion access if necessary. The following things need to be done:

  • Make the authentication mode configurable
  • Add support for sending mail
  • Add support for folders other than the inbox

code example

I had to write this thing in Python:

import re
re.search(r"Message-ID: <(.*?)>", msg, re.S).group(1)

In Ruby I would have written:

msg[/Message-ID: <(.*?)>/, 1]

Or in POP:

REGX Message-ID: <(.*?)>
STRG msg

Just kidding about the POP. But I’m not really seeing the benefit of Python here, even with the shortcuts (calling re.search directly instead of going through re.compile, re.S instead of re.DOTALL).


[1] Best known for playing the MacGyver theme song on acoustic guitar.


Borrowing from cdcarter:

>> def method_missing s; s end

Now we can write like the English Romantics:

>> for desires in heaven do not always end or fade
=> :each

What’s the shortest way to get irb to respond as follows?

=> some lovers remain unhappy even in paradise

I’m afraid that it’s not going to be very short. Maybe there’s a better closing line? Bonus if you keep up the rough anapests.

railsconf wrap-up

Just got back from HasManyPolymorphs Conf. Erm, RailsConf.


Portland is a very nice city, although it rained a lot. The local teenagers seemed pretty strung-out; not sure if that’s normal. The conference center itself was large and conveniently located, and the WiFi pretty much worked, which is unusual for Ruby conferences. I stayed at the Shilo Inn. Chris, you left your shrimp scampi in my fridge.

Portland has an excellent light rail transit system.


Chris Wanstrath did an excellent job explaining memcached, with reference to CHOW. I also went to Dirk Elmendorf’s “Lessons from the Real World”, an overview of converting a Java team to Rails.

The speaker for “Security on Rails” didn’t show, so it turned into an unconference lightning session, which was a lot of fun. Zed Shaw talked about his social art project Utu. Dave Fayram and Tom Preston-Werner demoed a combined loadbalancer and webserver for Rails written in Erlang, of all things. And Brian Takita (I think) showed off a pure-Ruby packrat syntax parser named Treetop, which greatly excited the Rubinius team.

My own talk was distinctly mediocre. John Nunemaker blogged about it, and so did Nick Sieger. I need to become a more entertaining and confident public speaker if I plan to keep this up. I’ll be recapping my talk at Philly on Rails on June 5th, and after that will post the slides.

For anyone who wondered, the speakers’ lounge was not very pimp.


DHH’s keynote was an overview of Rails 2.0, most of which you have already seen if you’ve been following the changelog. He did mention adding a per-action query cache, but I thought cache_fu already had that.

Avi Bryant needled us for not using Smalltalk as a platform. I do agree that SQL is no longer appropriate for most webapps, but moving Ruby to the commercial GemStone VM doesn’t really seem viable. He also brought up the old ghost of Strongtalk. It seems kind of telling that the Smalltalk community recommends a decade-old, win32-specific, incomplete platform as the open-source VM solution. Maybe it’s a trap.

Tim Bray talked about the JVM and how it was awesome and we should use Ruby on it. Then Sun can sell us hardware and software. Um, all right. He also mentioned REST and ETags.

Are you sensing a trend here? Lots of vendors…I guess the freewheeling community days for Rails are ending. Borland, Joyent, Sun, Engineyard, Rails Machine, FiveRuns, and others were there to ply us with t-shirts and beer and try to get us to adopt their IDEs and enterprise deployment stacks. Sorry, but no. On the other hand, Pivotal Labs hired a crazy band for no reason and didn’t try to sell anything. And the Powerset guys were very hospitable, as usual.

Ze Frank was hilarious, but not totally relevant.

Dave Thomas told us to challenge our assumptions and to give money to charities.


The Rubinius team is crazy intense. They are dedicated, smart, and have the good of the community in mind—one of their number (Kyle) even started to bring me around regarding JRuby, which I am pretty skeptical about.

I’m not sure that Avi realizes how far along Rubinius already is in terms of adopting the best features of Smalltalk-style VMs.

The IRC channel #railconf was the usual silliness that comes from imagining you are anonymous. However, for a few talks it became vicious. Yes, the speakers have the logs. Also, it’s not insulting to leave a talk you aren’t enjoying. I know some people left my talk, and really—it’s ok.

Someone made a “Ruby Chicks” website during the conference and started uploading commentable pictures of the few women there. Now that is insulting. Geoff Grosenbach recorded a podcast with the female attendees (I think discussing these community problems), which I look forward to hearing.

I also heard good things about the RejectConf unconference, but hadn’t been in the mood to go to it at the time. Luckily Geoff (again!) has recorded a secret unedited podcast.

Also, there had been a lot of community bluster about forking Rails going into the conference, due to the difficulty of getting patches accepted. However, midway through the conference, Josh Susser and Kevin Clark sat down with DHH and a few others on core and battled, erm, worked out their differences. The result is that core will try to be more transparent about their decision-making process. An example of this is the new #rails-contrib IRC channel.


Well…sometimes I had a good time. It was nice to meet a lot of people I previously only knew online. I learned a lot about public speaking (be confident in your lolcats!). I drank a lot of beer, but did not fall off any fire escapes.

I had hoped the talks and keynotes would be more forward-looking than they were. And the community as a whole seems to be fragmenting, which is a shame, but perhaps inevitable. I guess we will see.

See you at RubyConf.


I am pleased to announce that I am now a senior engineer at CNET Networks.

let me hit you with some knowledge

In which I demonstrate my awesome debugging workflow, and ramble about BleakHouse and Ruby’s memory behavior.

how to debug

I’ve been running BleakHouse in actual, deployed production on CHOW, and have learned some interesting things. The ruby-bleak-house build (which is really Eric Hodel’s work) is rock stable. Every deployed Ruby process on CHOW runs the patched build. BleakHouse itself is not enabled—the point is to have the symbols in place for emergencies.

Now say I notice a problem in a particular process. For example, monit shows this one using 300MB of RAM:

Let’s see what’s up. Using Mauricio’s gdb scripts, we can freeze the live process and inject ourselves into it:

chloe:~ eweaver$ ssh xx-xx.cnet.com
[eweaver@xx-xx.cnet.com ~]$ sudo -u app gdb ruby-bleak-house 20798
Attaching to program: /opt/ruby/1.8.6/bin/ruby-bleak-house, process 20798
(gdb) source ~/tools/ruby-gdb
(gdb) redirect_stdout
$1 = 2

Some poor user just got their request hung. Hopefully they’ll refresh it and get shifted to another mongrel.

Now, we need to tail the process log so that we can see standard output:

[1]+  Stopped sudo -u app gdb ruby-bleak-house 20798
[eweaver@xx-xx.cnet.com ~]$ tail -f /tmp/ruby-debug.20798 &
[2] 27095
[eweaver@xx-xx.cnet.com ~]$ fg %1
sudo -u app gdb ruby-bleak-house 20798

Back to our gdb prompt. Let’s look around, get our bearings:

(gdb) eval "caller"
["gems/mongrel-1.0.1/lib/mongrel/configurator.rb:274:in `run'",
"gems/mongrel-1.0.1/lib/mongrel/configurator.rb:274:in `loop'",

Ah, there wasn’t a request after all. We’re just in the mongrel wait loop. Now we know it’s not some particular action jammed up and eating memory. More likely, it’s a slow leak.

Well, what does the heap look like?

(gdb) eval "require '~/chow/vendor/plugins/bleak_house/lib/bleak_house/c'"

Hmm, something went wrong.

(gdb) eval "begin;
  require '~/chow/vendor/plugins/bleak_house/lib/bleak_house/c';
rescue Object => e;
  puts e.inspect;
end"
#<MissingSourceFile: no such file to load -- ~/chow/vendor/plugins/bleak_house/lib/bleak_house/c>


(gdb) eval "require '/home/eweaver/chow/vendor/plugins/bleak_house/lib/bleak_house/c'"

Ok. But wait:

Program received signal SIGPIPE, Broken pipe.
0x0000003ab5bb7ee2 in __write_nocancel () from /lib64/tls/libc.so.6

Crap! Someone sent us a request. Uh…carrying on:

Program received signal SIGPIPE, Broken pipe.
0x0000003ab5bb7ee2 in __write_nocancel () from /lib64/tls/libc.so.6
The program being debugged was signaled while in a function called from GDB.
Evaluation of the expression containing the function (rb_p) will be abandoned.
(gdb) eval "'test'"
warning: Unable to restore previously selected frame.
(gdb) eval "BleakHouse::CLogger.new.snapshot('/tmp/objs', 'gdb', true)"
Detaching after fork from child process 29366.

Fork? Ok, whatever. Is the objs file there?

[eweaver@xx-xx.cnet.com ~]$ ls -l /tmp/objs
-rw-rw-rw-  1 app app 22865 May 11 22:57 /tmp/objs

It is! Done with gdb:

(gdb) quit
The program is running.  Quit anyway (and detach it)? (y or n) y
Detaching from program: /opt/ruby/1.8.6/bin/ruby-bleak-house, process 20798

Let’s look at that file:

[eweaver@xx-xx.cnet.com ~]$ head /tmp/objs
- - 1178949448
  - :"memory usage/swap": 48608
    :"memory usage/real": 296612
    :"heap usage/filled slots": 670252
    :"heap usage/free slots": 1796759
    :"gdb::::String": 138879
    :"gdb::::_node": 480398
    :"gdb::::Regexp": 2281
    :"gdb::::Array": 20747

Our resident set size is very large. But there really aren’t that many filled slots in the Ruby heap. Maybe the usage shot up at one point, and then dropped back down. Maybe there’s a single array of immediate types that grows and grows, or something.

Let’s compare to a young, small process:

[eweaver@xx-xx.cnet.com ~]$ head /tmp/objs2
- - 1178950158
  - :"memory usage/swap": 48664
    :"memory usage/real": 90396
    :"heap usage/filled slots": 524930
    :"heap usage/free slots": 227851
    :"gdb::::String": 63716
    :"gdb::::_node": 443825
    :"gdb::::Regexp": 1593
    :"gdb::::Array": 2815

Well, the node count is about the same, which means there isn’t any leak in the AST itself. And the filled vs. free looks much more normal. So…we didn’t learn that much.
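To put numbers on "filled vs. free", here is a quick bit of arithmetic on the two snapshots above. The helper name is made up. The slot counts are copied from the head output.

```ruby
# Percentage of heap slots that are actually in use.
def fill_ratio(filled, free)
  (100.0 * filled / (filled + free)).round(1)
end

big_process   = fill_ratio(670_252, 1_796_759) # the 300MB process
young_process = fill_ratio(524_930, 227_851)   # the young, small process

puts "big: #{big_process}% filled, young: #{young_process}% filled"
```

The big process has a bit over a quarter of its slots filled, against roughly seven in ten for the young one.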

(At one point, I managed to get my gdb macros confused, so I killed gdb, leaving myself with an unkillable mongrel in the ‘T’ state. I had to send a -CONT signal and then a -KILL signal to kill the mongrel for real.)

What to do? Let’s make some heapspace fireworks.

bleak_house in real life

BleakHouse is pretty fast. So rather than simulate some crappy usage scenario, I shut down monit and swapped out one of the regular mongrels for one with BLEAK_HOUSE=true. Yeah—BleakHouse itself running in production mode on the live site. It worked great.

I restarted monit and eventually the process got big enough that monit killed it. Then, I copied bleak_house_production.yaml.log (all 147MB) to my local machine, and analyzed it. Let’s look at the graphs.

First, we have the root graph:

Wow. A couple things instantly stand out here:

  • memory use, the squiggly red line, is growing linearly
  • the total heap size stays low, then leaps to a new plateau
  • the heap jump correlates with a bizarre request in the boards controller that spikes way out of range
  • sometime before that, the boards controller suddenly went from being in the middle of the object-count pack to consistently the largest

The boards controller looks pretty suspicious. Before we look at that, though, let’s look at the heap in detail:

What a beautiful graph! And it verifies my guess from above—the filled slots stay within the same general bracket the whole time, but around 22:05 there is a huge, temporary need for lots of objects. Those slots then persist for the rest of the life of the process.

We’re running recent edge, so we should make sure that Rails itself isn’t leaking due to an introduced bug:

Looks pretty much fine. Although it shouldn’t really have an /unknown path…that’s a minor BleakHouse::Analyze bug.

The graph does have a slight upward trend, but basically stays within a consistent bracket after an initialization period. The continuing increase is probably some application code leaking objects, which then get misidentified the next time core is logged and tagged.

Now on to that crazy bastard, the boards controller:

Clearly the show action is to blame. Let’s look at that:

Hrm, ok, well that’s an awful lot of strings. The boards are one of the most heavily-trafficked sections of CHOW, so a leak there will eventually bring down the whole process.

Incidentally, CHOW gets about half a million page views per day.

onward, ever onward

Of course, now that we have a hypothesis about the action causing the leak, we have to find the exact code from which it springs. The general procedure for this is:

  • thrash only that particular action with a single, deterministic request, and make sure you can consistently reproduce the problem
  • sprinkle the action with BleakHouse::CLogger snapshots, tagged with something to identify their source (you could do this automatically with set_trace_func, perhaps, or manually pass __FILE__ and __LINE__)
  • look for large deltas
  • narrow it down
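The "look for large deltas" step is easy to script once two snapshots are loaded as hashes of class name to count. A minimal sketch, with a made-up helper name. The counts are borrowed from the two head listings earlier purely as sample numbers. In a real hunt you would compare snapshots tagged at different points in the same action.

```ruby
# Return the classes whose counts changed the most between two snapshots.
def largest_deltas(before, after, limit = 3)
  (before.keys | after.keys)
    .map { |name| [name, after.fetch(name, 0) - before.fetch(name, 0)] }
    .sort_by { |_, delta| -delta.abs }
    .first(limit)
end

before = { "String" => 63_716,  "_node" => 443_825, "Regexp" => 1_593, "Array" => 2_815 }
after  = { "String" => 138_879, "_node" => 480_398, "Regexp" => 2_281, "Array" => 20_747 }

largest_deltas(before, after).each do |name, delta|
  puts format("%-8s %+d", name, delta)
end
```

Whatever tops the list between two adjacent tags is where to sprinkle the next round of snapshots.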

And then…kill the fucker. But that will have to wait for another day.

community response to bleak_house

BleakHouse is getting a lot of attention. There’s an InfoQ article, and one on Ruby Inside. Lourens Naudé wrote some notes about memory usage and used some BleakHouse code. And the Solaris DTrace team sent me a mail.

a detour about profiling

Lourens mentions in his article some tools like a String counter and a debug flag for mongrel. Don’t use those. They use ObjectSpace. ObjectSpace is worse than nothing, because if you try to count references you end up preserving references by mistake. Misinformation is worse than no information. (I’ve removed the old ObjectSpace counter from BleakHouse for this reason.)

Incidentally Lourens references the same critical thread that I had been discussing with various people last week.

Also, the Ruby heap has a tendency to request memory from the OS and never give it back, even if the need goes away. This is normally fine, because unused portions of the space get swapped out. However, if your system is underloaded, there might not be any need to swap. This can give false impressions of the physical memory footprint of your app.

On the other hand, it would be nice if Ruby would just give the space back.


I gave BleakHouse its own documentation page.

leak proof: direct heap instrumentation for bleak_house

BleakHouse tracking a simple action, using the pure-Ruby memlogger:

ruby physics

ObjectSpace is quantum—you can’t interact with it without changing it. This makes it useless as a profiler. I didn’t realize how useless it was until I wrote BleakHouse, though. Until then, I had guessed that other implementations (such as Rublique) were just poorly done. But even after eliminating leaks in those…

Anyway. The new hotness, BleakHouse 5:

Bonus: it charts memory usage now (swap, real, and combined). And it’s faster.

get hooked

More gems required than before: gruff, rmagick, RubyInline, active_support.

You need to build a patched Ruby binary, unfortunately. I borrowed Eric Hodel’s patch from mem_inspect and wrote a Rake task to handle it all for you. To build, just install the plugin one way or another, and then from the bleak_house folder, run:

sudo rake ruby:build

It will give you a new Ruby 1.8.6 binary called ruby-bleak-house alongside your existing one.

(Build process tested on OS X. Should be fine on Linux. Won’t work on Windows, but maybe you can hack it out. You should be profiling in an environment that approximates your deployment environment, though, so a win32 version doesn’t really make sense. Unless you deploy on win32. You probably also chew nails for fun.)

using it

The options have changed a bit. Start your server with:

RAILS_ENV=production BLEAK_HOUSE=true ruby-bleak-house script/server

Fire requests. Bonus points if you do it in a repeatable way. Even Monte Carlo methods are more consistent than by-hand. Consider using open-uri, rfuzz, httperf, Selenium, curl, libcurl via curb, etc.
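A small sketch of the repeatable, Monte Carlo flavor, using only the standard library. The paths, base URL, and helper names are placeholders. Seeding the random generator means the "random" sequence of requests is identical on every run, so two profiling sessions stay comparable.

```ruby
require "net/http"
require "uri"

# Build a pseudo-random but reproducible list of paths to request.
def request_plan(paths, count, seed = 42)
  rng = Random.new(seed)
  Array.new(count) { paths[rng.rand(paths.size)] }
end

# Fire the plan at a server and collect the HTTP status codes.
def fire(base_url, plan)
  plan.map { |path| Net::HTTP.get_response(URI.join(base_url, path)).code }
end

plan = request_plan(%w[/ /boards /topics], 10)
puts plan.inspect
# fire("http://localhost:3000", plan)  # uncomment with a server running
```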


RAILS_ENV=production SMOOTHNESS=2 rake bleak_house:analyze

The smoothness setting averages series of frames. You can’t set it lower than 2. There’s no upper limit. It’s most useful when you aren’t hitting the same action every request. (It would be nice to use a Gaussian kernel instead of an average, but…)
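As an illustration of what averaging a series of frames can mean, here is a sketch that averages non-overlapping windows. The helper name is made up, and whether the analyzer uses non-overlapping or sliding windows is an assumption on my part.

```ruby
# Average each run of `smoothness` consecutive frames into a single point.
def smooth(frames, smoothness = 2)
  raise ArgumentError, "smoothness must be at least 2" if smoothness < 2
  frames.each_slice(smoothness).map { |window| window.sum.to_f / window.size }
end

puts smooth([10, 20, 30, 50]).inspect        # pairs of frames
puts smooth([10, 20, 30, 50, 70], 3).inspect # a trailing partial window is averaged too
```

A higher setting flattens the spikes you get when consecutive requests hit different actions, at the cost of time resolution.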

Check the source if you want to see some interesting things. You can use the memlogger separately, too—you don’t have to profile a Rails app.

And please report problems on the forum.

curtain call

Thanks much to Ryan Davis for RubyInline, Eric Hodel for mem_inspect, Evan Phoenix for telling me about mem_inspect, and David Goodlad for helping with the C extension.