Thanks to Evan Phoenix, memcached.gem 1.3.2 is compatible with Rubinius again. I have added Rubinius to the release QA, so it will stay this way.
The master branch is compatible with JRuby, but a JRuby segfault (as well as a mkmf bug) prevents it from working for most people.
vm comparison
Memcached.gem makes an unusual benchmark case for VMs. The gem is highly optimized in general, and specially optimized for MRI. This means it tends not to reward speedups of “dumb” aspects of MRI, because it doesn’t exercise them, contrary to many micro-benchmarks.
                      user     system      total        real
JRuby-head
set: libm:ascii   2.440000   1.760000   4.200000 (  8.284000)
get: libm:ascii   [SEGFAULT]
RBX-head
set: libm:ascii   1.387198   1.590912   2.978110 (  6.576674)
get: libm:ascii   2.076829   1.705302   3.782131 (  7.237497)
REE 1.8.7-2011.03
set: libm:ascii   1.130000   1.530000   2.660000 (  6.331992)
get: libm:ascii   1.250000   1.540000   2.790000 (  6.142529)
Ruby 1.9.2-p290
set: libm:ascii   0.860000   1.490000   2.350000 (  5.917467)
get: libm:ascii   1.030000   1.580000   2.610000 (  6.238965)
JRuby’s performance is surprisingly OK, but only once Hotspot has been convinced to JIT the function to native code (which the benchmark does ahead of time). Rubinius’s performance is good. Ruby 1.9.2 is the fastest.
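To illustrate the warm-up point, here is a minimal sketch of a benchmark that exercises the call path untimed before measuring it. `FakeClient` and `measure_with_warmup` are hypothetical names made up for this example, the Hash-backed client only stands in for a real memcached client so the sketch runs without a server, and the iteration counts are arbitrary. The benchmark script shipped with the gem may handle warm-up differently.

```ruby
require 'benchmark'

# Hypothetical stand-in for a memcached client: a Hash with the same
# set/get shape, so the sketch runs without a server or the gem.
class FakeClient
  def initialize
    @store = {}
  end

  def set(key, value)
    @store[key] = value
  end

  def get(key)
    @store[key]
  end
end

# Run the block `warmup` times untimed, then time `iterations` runs.
def measure_with_warmup(warmup, iterations)
  # Untimed phase: gives a JIT-compiling VM a chance to compile the hot path.
  warmup.times { yield }
  # Timed phase: only these calls count toward the reported numbers.
  Benchmark.measure { iterations.times { yield } }
end

client = FakeClient.new
client.set('key', 'value')

result = measure_with_warmup(10_000, 50_000) { client.get('key') }
puts format('get: fake  %.6f user  %.6f system  (%.6f real)',
            result.utime, result.stime, result.real)
```

On MRI the warm-up phase should make little difference. On a JIT-compiling VM, skipping it would fold compilation time into the measurement.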
jruby client comparison
Curiously, memcached.gem is the fastest Ruby memcached client on every VM including JRuby. It is 70% faster than jruby-memcache-client, which wraps Whalin’s Java client via JRuby’s Java integration:
memcached 1.3.3
remix-stash 1.1.3
jruby-memcache-client 1.7.0
dalli 1.1.2

                        user     system      total        real
set: dalli:bin     10.720000   7.250000  17.970000 ( 17.859000)
set: libm:ascii     2.440000   1.760000   4.200000 (  8.284000)
set: libm:bin       2.280000   1.960000   4.240000 (  8.600000)
set: mclient:ascii  4.150000   3.010000   7.160000 ( 11.879000)
set: stash:bin      5.870000   2.970000   8.840000 ( 13.677000)
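As a quick check on the 70% figure, the sketch below recomputes the speedup from the libm:ascii and mclient:ascii rows. It assumes that mclient is the label for jruby-memcache-client, and `percent_faster` is a hypothetical helper written for this example. With these numbers, the figure appears to correspond to the total CPU column rather than to wall-clock time.

```ruby
# Timings in seconds, copied from the JRuby "set" rows of the benchmark output.
libm_ascii    = { total: 4.20, real: 8.284 }
mclient_ascii = { total: 7.16, real: 11.879 }

# How much faster `a` is than `b`, in percent, for one timing column.
def percent_faster(a, b, column)
  ((b[column] / a[column]) - 1.0) * 100.0
end

cpu  = percent_faster(libm_ascii, mclient_ascii, :total)
wall = percent_faster(libm_ascii, mclient_ascii, :real)

puts format('CPU time:   %.0f%% faster', cpu)   # prints "CPU time:   70% faster"
puts format('wall clock: %.0f%% faster', wall)  # prints "wall clock: 43% faster"
```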
conclusion
This is great performance for C extensions in both JRuby and Rubinius. It’s handy that MRI’s extension interface is so simple.
One possible performance improvement remains in memcached.gem itself: rewriting the bundled copy of libmemcached to talk directly to Ruby instead of going through SWIG, which introduces memory-copy overhead.
Also, someone needs to write a faster client for JRuby; there’s no reason why binding to a good native library like Whalin’s or xmemcached should be slow. It should be possible to equal the speed of memcached.gem on Ruby 1.9.
Could you please post/send the benchmark script you used?
The benchmark script is included in the gem. See https://github.com/fauna/memcached/blob/master/test/profile/benchmark.rb .
I just wrote a simple wrapper for xmemcached (JRuby) and benchmarked it against dalli (because I had no compile/compatibility issues with dalli). Both libs were using the binary protocol and a single server. I did not implement any marshaling. I used the JRuby parameters --server --fast -J-Xmn512m -J-Xms2048m -J-Xmx2048m.
The result is somewhat sad:
That’s 4k gets a second for xmemcached.
I did some tuning as well (disabling serialization, TCP nodelay, changing the TCP buffer size), but the changes aren’t significant. I’m a bit disappointed with the single-threaded performance of xmemcached. With 300 threads, each doing 10k loops, I get the following results:
It can’t be the Ruby code slowing it all down in this case.
Charles Nutter told me that the default JRuby/Java integration is not very fast, so I assume that’s the bulk of the problem.
Maybe you could test the speed of the integration itself with a do-nothing Java stub?
I talked to Charles about this issue too, and we’ll see better times for calling Java code in JRuby 1.7. However, this does not seem to be the problem here:
I just added a “fake” client (a Java class accepting the same args as the other clients). We clearly see the overhead (160ms for nothing is not that good), but it won’t help turn the benchmark around.
If we look at the benchmark over here, single-threaded mode is always below 10k ops/second, which is rather slow compared to your results using libmemcached, as far as I understand.
Any ideas?
Those tests are against remote servers; mine are against localhost. So not comparable unfortunately.
You’re right, though, my results are around 20k ops per second.
I had an email conversation with Charles and he decided to implement a JRuby extension for memcached. Find it over here.
So far, I haven’t had the time to test it intensively. It helped me understand some JRuby internals, though. Maybe you have the time to run the benchmark again?