snax

scaling rails

ree

We recently migrated Twitter from a custom Ruby 1.8.6 build to a Ruby Enterprise Edition release candidate, courtesy of Phusion. Our primary motivation was the integration of Brent's MBARI patches, which increase memory stability.

Some features of REE have no effect on our codebase, but we definitely benefit from the MBARI patchset, the Railsbench tunable GC, and the various leak fixes in 1.8.7p174. These are difficult to integrate, and Phusion has done a fine job.
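The Railsbench tunable GC reads its settings from environment variables when the interpreter starts, so each Mongrel has to be launched with them already set. Below is a minimal Python sketch of building that environment for a hypothetical launcher script. The variable names are the ones REE documents for the Railsbench GC patch. The values are illustrative placeholders, not the settings used in production here, and `mongrel_env` is a hypothetical helper.

```python
import os
from typing import Dict, Mapping, Optional

# Environment variables read at startup by the Railsbench GC patch (as shipped in REE).
# Values are illustrative placeholders only; tune them against your own workload.
GC_TUNING: Dict[str, int] = {
    "RUBY_HEAP_MIN_SLOTS": 500_000,        # object slots in the initial heap
    "RUBY_HEAP_SLOTS_INCREMENT": 250_000,  # slots added the next time the heap grows
    "RUBY_HEAP_SLOTS_GROWTH_FACTOR": 1,    # multiplier applied to the increment on each growth
    "RUBY_GC_MALLOC_LIMIT": 50_000_000,    # bytes malloc'd before a GC run is forced
}


def mongrel_env(base: Optional[Mapping[str, str]] = None,
                tuning: Mapping[str, int] = GC_TUNING) -> Dict[str, str]:
    """Hypothetical helper: copy the base environment and add the GC tuning variables."""
    env = dict(os.environ if base is None else base)
    for name, value in tuning.items():
        env[name] = str(value)  # environment values must be strings
    return env


if __name__ == "__main__":
    env = mongrel_env(base={})
    for name in sorted(GC_TUNING):
        print(f"{name}={env[name]}")
```

A launcher could pass the returned dict as the `env` argument of `subprocess.Popen` for each Mongrel. Keeping the knobs in a single table makes it easy to rerun the same benchmark with and without GC tuning.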

testing notes

I ran into an interesting issue. Ruby is faster if compiled with -Os (optimize for size) than with -O2 or -O3 (optimize for speed). Hongli pointed out that Ruby has poor instruction locality and benefits most from squeezing tightly into the instruction cache. This is an unusual phenomenon, although probably more common in interpreters and virtual machines than in "standard" C programs.
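The trade-off Hongli describes can be shown with a back-of-the-envelope model. In this model every instruction costs a base number of cycles, plus a stall whenever its fetch misses the instruction cache. This is a toy Python sketch, not a measurement. All the numbers (instruction counts, miss rates, the 20-cycle penalty) are invented to illustrate the mechanism. They show how a build that executes more instructions can still win if its smaller footprint misses the cache less often.

```python
from dataclasses import dataclass


@dataclass
class Build:
    """Toy model of one compiled interpreter binary (all numbers are hypothetical)."""
    name: str
    instructions: float      # dynamic instructions executed per request
    icache_miss_rate: float  # instruction-cache misses per instruction fetched


def cycles(build: Build, base_cpi: float = 1.0, miss_penalty: float = 20.0) -> float:
    """Estimated cycles: base_cpi per instruction plus miss_penalty per i-cache miss."""
    return build.instructions * (base_cpi + build.icache_miss_rate * miss_penalty)


# -O2-like build: fewer instructions, but a larger footprint that misses more often.
o2 = Build("-O2", instructions=1.00e6, icache_miss_rate=0.03)
# -Os-like build: about 10% more instructions, but a tighter footprint with fewer misses.
os_build = Build("-Os", instructions=1.10e6, icache_miss_rate=0.01)

for b in (o2, os_build):
    print(f"{b.name}: {cycles(b):,.0f} estimated cycles")
```

The model ignores data-cache effects, branch prediction, and everything else a real CPU does. It only captures the idea that miss stalls can outweigh a modest increase in instruction count.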

I also tested a build that included Joe Damato's heaped thread frames, but it would hang Mongrel in rb_thread_schedule() after the first GC run, which is not exactly what we want. Hopefully this can be integrated later.

benchmarks

I ran a suite of benchmarks via Autobench/httperf and plotted them with Plot. The hardware was a 4-core Xeon machine with RHEL5, running 8 Mongrels balanced behind Apache 2.2. I made a typical API request that is answered primarily from composed caches.
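Autobench works by driving httperf at a series of increasing request rates and recording the results at each step. Below is a minimal Python sketch of generating that sweep as httperf command lines. It does not reproduce Autobench's own option handling. The host, URI, and rate range are hypothetical, and `httperf_sweep` is an illustrative helper. The httperf options used (`--server`, `--uri`, `--rate`, `--num-conns`, `--num-calls`, `--timeout`) are standard ones.

```python
from typing import List


def httperf_sweep(server: str, uri: str, low_rate: int, high_rate: int,
                  rate_step: int, num_conns: int, timeout: int = 5) -> List[List[str]]:
    """Build one httperf argv per request rate, from low_rate to high_rate inclusive."""
    if rate_step <= 0 or low_rate > high_rate:
        raise ValueError("need rate_step > 0 and low_rate <= high_rate")
    commands = []
    for rate in range(low_rate, high_rate + 1, rate_step):
        commands.append([
            "httperf",
            "--server", server,
            "--uri", uri,
            "--rate", str(rate),           # new connections opened per second
            "--num-conns", str(num_conns), # total connections for this step
            "--num-calls", "1",            # one request per connection
            "--timeout", str(timeout),     # seconds to wait before counting an error
        ])
    return commands


# Hypothetical host and endpoint; substitute the Apache front end under test.
for argv in httperf_sweep("api.example.internal", "/statuses/example.json",
                          low_rate=10, high_rate=50, rate_step=10, num_conns=1000):
    print(" ".join(argv))
```

Each argv could be handed to `subprocess.run`. Pinning the number of connections per step keeps runs at different rates comparable.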

As usual, we see that tuning the GC parameters has the greatest impact on throughput, but there is a definite gain from switching to the REE bundle. It's also interesting how much the standard deviation is improved by the GC settings. (Some data points are skipped due to errors at high concurrency.)
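To compare configurations by both throughput and consistency, repeated runs at each request rate can be reduced to a mean and a standard deviation. Below is a minimal Python sketch of that reduction. It assumes a hypothetical input format in which `None` marks a run that failed (for example, because of errors at high concurrency). It is an assumed aggregation, not a description of how the charts above were produced.

```python
import statistics
from typing import Dict, List, Optional, Tuple


def summarize(samples: Dict[int, List[Optional[float]]]) -> Dict[int, Tuple[float, float]]:
    """Map each request rate to (mean, sample stddev) of replies/sec.

    None marks a failed run; failed runs are dropped, and rates left with
    fewer than two good runs are skipped, like the missing data points above.
    """
    summary: Dict[int, Tuple[float, float]] = {}
    for rate, runs in sorted(samples.items()):
        good = [r for r in runs if r is not None]
        if len(good) < 2:
            continue  # statistics.stdev needs at least two data points
        summary[rate] = (statistics.mean(good), statistics.stdev(good))
    return summary


# Hypothetical replies/sec from three runs per rate; None marks a failed run.
data = {100: [98.0, 100.0, 102.0], 200: [190.0, None, 210.0], 400: [None, None, 350.0]}
for rate, (mean, sd) in summarize(data).items():
    print(f"{rate} req/s: mean {mean:.1f}, stddev {sd:.1f}")
```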

upgrading

Moving from 1.8.6 to REE 1.8.7 was trivial, but moving to 1.9 will be more of an ordeal. It will be interesting to see which patches are still necessary on 1.9. Many of them are getting upstreamed, but some things (such as tcmalloc) will probably remain available only from third parties.

All in all, good times in MRI land.

September 24, 2009

24 comments

Luke Melia says (September 24, 2009):

Thanks for the great writeup, Evan, and for sharing the behind the scenes data. We used your earlier writeup about GC tuning to double the performance of Weplay and it's great to see REE 1.8.7 getting a Twitter-sized workout.

Antti-Ville Tuunainen says (September 24, 2009):

This is an unusual phenomenon...

More like the standard way things work these days. Every time I have tested the various settings with real, non-trivial programs, -Os wins, often by a huge margin. The speed difference between L1i and L2 is just massive for instruction fetch.

Chad says (September 24, 2009):

Is REE 1.8.7 publicly available? I only see 1.8.6 on the homepage and Github.

evan says (September 24, 2009):

Antti-Ville: Yeah, a lot of people are saying this is normal. I'm going to try recompiling some gems with -Os too.

Chad: Not yet. Soon.

Ashwin Jayaprakash says (September 25, 2009):

Wasn't there an article a while ago about Twitter moving to Scala/JVM? Or is Scala not used in production?

ryan king says (September 25, 2009):

We use Scala for a few things at Twitter, but the majority of the site is Ruby.

ehsanul says (September 25, 2009):

Nice writeup. Have you guys considered migrating to JRuby?

Mark Turner says (September 25, 2009):

Was 1.8.6p287 shipped with RHEL5, or did you use the source?

Attila Szegedi says (September 25, 2009):

Back in my C++ programming days, it was common wisdom that -Os produces code that actually runs faster than -O2. CPU frequencies are so high that, for at least a decade, machine code execution has been completely dominated by cache misses; making sure your hot execution path fits nicely into the CPU cache, with some space to spare for the data, gives you the best performance.

evan says (September 25, 2009):

ehsanul: See here.

Mark: It was a patched version from source in order to include the Railsbench GC optimizations.

DAddYE says (September 26, 2009):

When you build REE, are you enabling the MBARI API and pthreads? Also, why not Thin instead of Mongrel?

Ninh Bui says (September 29, 2009):

Thanks again for the kind words! As for REE 1.8.7, it's now up for grabs here. Enjoy!

roger says (September 29, 2009):

It would be nice to see how much RAM each of those processes is using. Also if you're looking for speed, some suggestions might be:

  • Use mysqlplus so that the mysql driver doesn't force a GC every 20 queries (which it currently does)
  • Compile with -march=native

Though I haven't benchmarked these.

john says (October 02, 2009):

Have you guys tried the experimental MacRuby 0.5, even for grins? The LLVM based VM is alleged to offer huge performance increases.

Matt Aimonetti says (October 02, 2009):

As part of the MacRuby team, I would be happy to help them experiment with MacRuby 0.5. However, MacRuby isn't running Rails yet, and unless Twitter wanted to switch from Scala to MacRuby for their queuing mechanism (using GCD, for instance), I don't really see much reason for them to invest time in it yet.

Maybe once we run Rails well, though. ;)

john says (October 02, 2009):

I understand the early stages nature of MacRuby 0.5.

I was just curious if Evan's people had done any work to look at 0.5. If what you're seeing in your test cases is reflected in any real world application, and there's no reason to suppose it wouldn't, that would be a major step forward.

While not wanting to speak for them, for Twitter I'd expect it to be enough of a major deal that they might consider devoting some resources (e.g., manpower) to help get MacRuby 0.5 to a point where it would be good enough for production, and maybe even ported to a system that supported GNUstep.

evan says (October 02, 2009):

john: We have not tried MacRuby. You seem to overestimate the resources available at a product company to devote to experimental system building.

There are many reasons to suppose that test cases are not representative of real-world performance; in my experience, that is the norm.

That said, if MacRuby ran Rails and worked on Linux, I would try it.

john says (October 04, 2009):

The LLVM version of MacRuby is something you really should be paying attention to if Ruby performance is important to you. See here for more detail. You'll see that the test cases I refer to are in fact the standard Ruby benchmark set, on which speedups of up to 37X over 1.8 and 7.8X over 1.9 have been seen, even at this early stage.

In my experience (which includes 25 years of performance engineering across a bunch of startups, in large-scale HPC, data center, real-time, and IO-oriented applications), those kinds of speedups in low-level benchmarks tend to make a significant dent in real-world applications.

It's not all peaches and cream: there are a bunch of major backwards regressions too, and the project is still at an early stage, with a ways to go. But overall I can say, without hype, that things look encouraging.

Now let's say, for the sake of argument, that someone had a Ruby implementation, arriving in a year, that would give you on average a 10X speedup. What would that be worth to you and your business? Would you consider it to be a competitive advantage or an enabling technology? And if so, what would you do to get that advantage 3 or 6 months ahead of time? Think about it.

ehsanul says (October 04, 2009):

John, you might want to note that as MacRuby does not yet support many features of Ruby proper, it's quite unrealistic to rely on those benchmarks. As more of Ruby is supported, it will become harder and harder for the MacRuby team to keep the performance so high. I certainly hope they do, but they have some way to go yet. Let's wait and see before speculating.

Here's a good post explaining why the benchmarks shouldn't be taken at face value.

john says (October 06, 2009):

You're missing my point. I understand almost exactly what stage the MacRuby project is at and what the benchmarks represent, and while I don't mean to speak for Evan or Twitter, it seems to me that the benchmarks show enough promise to be worth their investigating, even at this stage. I'm trying to encourage someone who may have the resources to help that project to actually do so, and help make the project happen. My point to Evan is that the specific payoff to him and his company should be calculable, and might be substantial enough to justify an allocation of resources.

What's motivating me here is that I would dearly like to see MacRuby both make it to production, and also be ported to Linux. And while the MacRuby guys have my confidence, I think they need major help for those two things to happen anytime soon.

evan says (October 06, 2009):

We didn't have enough time on our own to integrate the MBARI patchset into 1.8.7 and had to wait for REE to do it. We don't have enough time to port our own app to Ruby 1.9. There is absolutely no way we have time to work on MacRuby in the foreseeable future, especially if it needs "major help".

There may be an opportunity cost to ignoring it right now, but it is extremely small compared to everything else we can spend our resources on.

That said, I also think LLVM makes a good potential compilation host for dynamic runtimes.

john says (October 06, 2009):

Seems like I owe you an apology for getting on your case. I mistook you for the other Evan at Twitter. Things now make sense.

evan says (October 06, 2009):

What difference does it make? Putting resources into MacRuby is simply a bad business decision right now.
