We recently migrated Twitter from a custom Ruby 1.8.6 build to a Ruby Enterprise Edition release candidate, courtesy of Phusion. Our primary motivation was the integration of Brent’s MBARI patches, which increase memory stability.
Some features of REE have no effect on our codebase, but we definitely benefit from the MBARI patchset, the Railsbench tunable GC, and the various leak fixes in 1.8.7p174. These patches are difficult to integrate, and Phusion has done a fine job of bundling them.
I ran into an interesting issue. Ruby is faster if compiled with -Os (optimize for size) than with -O3 (optimize for speed). Hongli pointed out that Ruby has poor instruction locality and benefits most from squeezing tightly into the instruction cache. This is an unusual phenomenon, although probably more common in interpreters and virtual machines than in “standard” C programs.
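A comparison like this is easy to repeat on your own builds. Below is a minimal sketch, not the benchmark used for this post: the workload and the method names interpreter_heavy_workload and best_time are hypothetical, chosen only to exercise method dispatch and small allocations. Run the same script under each build of the interpreter and compare the reported times.

```ruby
require 'benchmark'

# Hypothetical workload (an assumption, not from the original benchmarks):
# lots of block calls, method dispatch, and short-lived objects.
def interpreter_heavy_workload(iterations)
  total = 0
  iterations.times do |i|
    lengths = [i, i + 1, i + 2].map { |n| n.to_s.length }
    total += lengths.inject(0) { |sum, n| sum + n }
  end
  total
end

# Run the workload several times and keep the best wall-clock time, in
# seconds, to reduce noise from other processes on the machine.
def best_time(runs, iterations)
  (1..runs).map do
    Benchmark.realtime { interpreter_heavy_workload(iterations) }
  end.min
end

if __FILE__ == $0
  puts "ruby #{RUBY_VERSION}: #{'%.3f' % best_time(5, 200_000)}s"
end
```

A microbenchmark like this only hints at interpreter overhead; a request-level test against the real application is the more trustworthy measurement.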
I also tested a build that included Joe Damato’s heaped thread frames, but it would hang Mongrel in rb_thread_schedule() after the first GC run, which is not exactly what we want. Hopefully this can be integrated later.
I ran a suite of benchmarks via Autobench/httperf and plotted them with Plot. The hardware was a 4-core Xeon machine with RHEL5, running 8 Mongrels balanced behind Apache 2.2. I made a typical API request that is answered primarily from composed caches.
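When comparing configurations in a run like this, it helps to reduce each one to a mean and a spread. The sketch below shows one way to do that in plain Ruby; the method names are my own and the sample values are made-up placeholders, not measurements from these benchmarks.

```ruby
# Arithmetic mean of a list of numeric samples.
def mean(samples)
  samples.inject(0.0) { |sum, x| sum + x } / samples.size
end

# Sample standard deviation (divides by n - 1); needs at least two samples.
def sample_stddev(samples)
  m = mean(samples)
  squared_error = samples.inject(0.0) { |sum, x| sum + (x - m)**2 }
  Math.sqrt(squared_error / (samples.size - 1))
end

if __FILE__ == $0
  # Placeholder reply rates in replies/sec (made up for illustration only).
  reply_rates = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
  puts "mean:   #{'%.2f' % mean(reply_rates)}"
  puts "stddev: #{'%.2f' % sample_stddev(reply_rates)}"
end
```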
As usual, we see that tuning the GC parameters has the greatest impact on throughput, but there is a definite gain from switching to the REE bundle. It’s also interesting how much the standard deviation is improved by the GC settings. (Some data points are skipped due to errors at high concurrency.)
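For reference, the tunable GC in REE is configured through environment variables read at interpreter startup. The sketch below builds a launch command with those variables set. The variable names are the ones REE documents; the values are illustrative only (the settings actually benchmarked are not listed here), and command_with_gc_env is a hypothetical helper of my own.

```ruby
# Illustrative GC settings. The variable names come from REE's tunable GC;
# the values are placeholders and should be tuned per application.
GC_SETTINGS = {
  'RUBY_HEAP_MIN_SLOTS'           => '500000',   # initial number of heap slots
  'RUBY_HEAP_SLOTS_INCREMENT'     => '250000',   # slots added when the heap grows
  'RUBY_HEAP_SLOTS_GROWTH_FACTOR' => '1',        # multiplier applied to later increments
  'RUBY_GC_MALLOC_LIMIT'          => '50000000'  # bytes malloc'ed before a GC is forced
}

# Hypothetical helper: prefix a command with `env KEY=VALUE ...` so the
# settings apply only to that process. Keys are sorted so the output is
# stable on Ruby 1.8, where hashes are unordered.
def command_with_gc_env(settings, command)
  assignments = settings.sort.map { |key, value| "#{key}=#{value}" }
  (['env'] + assignments + [command]).join(' ')
end

if __FILE__ == $0
  # Prints the command line; pass it to your process manager or to system().
  puts command_with_gc_env(GC_SETTINGS, 'ruby -v')
end
```

Setting the variables per process, rather than globally, makes it easy to benchmark tuned and untuned interpreters side by side on the same machine.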
Moving from 1.8.6 to REE 1.8.7 was trivial, but moving to 1.9 will be more of an ordeal. It will be interesting to see which patches are still necessary on 1.9. Many of them are getting upstreamed, but some things (such as tcmalloc) will probably remain available only from third parties.
All in all, good times in MRI land.