memcached gem performance across VMs

Thanks to Evan Phoenix, memcached.gem 1.3.2 is compatible with Rubinius again. I have added Rubinius to the release QA, so it will stay this way. 

The master branch is compatible with JRuby, but a JRuby segfault (as well as a mkmf bug) prevents it from working for most people.

vm comparison

Memcached.gem makes an unusual benchmark case for VMs. The gem is highly optimized in general, and especially optimized for MRI. This means it tends not to reward speedups of “dumb” aspects of MRI, because it doesn’t exercise them, unlike many micro-benchmarks.

                                          user     system      total        real
JRuby-head
set: libm:ascii                       2.440000   1.760000   4.200000 (  8.284000)
get: libm:ascii                       [SEGFAULT]

RBX-head
set: libm:ascii                       1.387198   1.590912   2.978110 (  6.576674)
get: libm:ascii                       2.076829   1.705302   3.782131 (  7.237497)

REE 1.8.7-2011.03
set: libm:ascii                       1.130000   1.530000   2.660000 (  6.331992)
get: libm:ascii                       1.250000   1.540000   2.790000 (  6.142529)

Ruby 1.9.2-p290
set: libm:ascii                       0.860000   1.490000   2.350000 (  5.917467)
get: libm:ascii                       1.030000   1.580000   2.610000 (  6.238965)

JRuby’s performance is surprisingly OK, but only once HotSpot has been convinced to JIT the function to native code (the benchmark takes care of this with a warm-up before timing). Rubinius’s performance is good. Ruby 1.9.2 is the fastest.
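The harness that produced these numbers is not shown here. As a rough sketch of the general shape only, the following uses Ruby's standard `Benchmark` module with a warm-up pass before the timed runs. `FakeClient`, `LOOPS`, and the report labels are hypothetical stand-ins chosen so the example runs without a memcached server; they are not taken from the actual benchmark.

```ruby
require 'benchmark'

# Hypothetical in-memory stand-in for a memcached client, so the sketch
# runs without a server. A real run would use an actual client here.
class FakeClient
  def initialize
    @store = {}
  end

  def set(key, value)
    @store[key] = value
  end

  def get(key)
    @store[key]
  end
end

LOOPS  = 10_000 # assumed iteration count, not the article's
client = FakeClient.new
key    = 'benchmark_key'
value  = 'x' * 100

# Warm-up pass: exercises the hot paths before timing, which gives a JIT
# such as HotSpot under JRuby a chance to compile them first.
LOOPS.times do
  client.set(key, value)
  client.get(key)
end

# Timed runs. Benchmark.bm prints the same user/system/total/real columns
# as the tables in this post and returns one Benchmark::Tms per report.
results = Benchmark.bm(36) do |x|
  x.report('set: fake') { LOOPS.times { client.set(key, value) } }
  x.report('get: fake') { LOOPS.times { client.get(key) } }
end
```

On MRI the warm-up changes little; on a JIT-based VM, skipping it would fold compilation time into the measured numbers.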

jruby client comparison

Curiously, memcached.gem is the fastest Ruby memcached client on every VM, including JRuby. Measured by total CPU time for set, it is 70% faster than jruby-memcache-client, which wraps Whalin’s Java client via JRuby’s Java integration:

memcached 1.3.3
remix-stash 1.1.3
jruby-memcache-client 1.7.0
dalli 1.1.2
                                          user     system      total        real
set: dalli:bin                       10.720000   7.250000  17.970000 ( 17.859000)
set: libm:ascii                       2.440000   1.760000   4.200000 (  8.284000)
set: libm:bin                         2.280000   1.960000   4.240000 (  8.600000)
set: mclient:ascii                    4.150000   3.010000   7.160000 ( 11.879000)
set: stash:bin                        5.870000   2.970000   8.840000 ( 13.677000)
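To make the comparison concrete, here is a small sketch that works out each client's extra cost relative to `libm:ascii`, using the "total" column from the table above. The 70% figure matches that column; by wall-clock ("real") time the gap to jruby-memcache-client is smaller. The variable names are illustrative only.

```ruby
# "total" (user + system) CPU seconds for set, copied from the table above.
totals = {
  'libm:ascii'    => 4.20,
  'libm:bin'      => 4.24,
  'mclient:ascii' => 7.16,
  'stash:bin'     => 8.84,
  'dalli:bin'     => 17.97
}

baseline = totals['libm:ascii']

# Extra CPU time relative to libm:ascii, as a rounded percentage.
slowdown = totals.map { |name, total|
  [name, ((total / baseline - 1.0) * 100).round]
}.to_h

slowdown.each { |name, pct| puts format('%-14s %+4d%%', name, pct) }

# Same comparison on the "real" column for mclient:ascii vs libm:ascii.
real_gap = ((11.879 / 8.284 - 1.0) * 100).round
puts format('mclient:ascii wall-clock gap: %+d%%', real_gap)
```

By this measure jruby-memcache-client uses about 70% more CPU time than memcached.gem, while the wall-clock difference comes to about 43%.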

conclusion

This is great performance for C extensions in both JRuby and Rubinius. It’s handy that MRI’s extension interface is so simple.

One possible performance improvement remains in memcached.gem itself: rewriting the bundled copy of libmemcached to talk directly to Ruby instead of going through SWIG, which introduces memory-copy overhead.

Also, someone needs to write a faster client for JRuby; there’s no reason why binding to a good native library like Whalin’s or xmemcached should be slow. It should be possible to equal the speed of memcached.gem on Ruby 1.9.

7 responses

  1. I just wrote a simple wrapper for xmemcached (JRuby) and benchmarked it against dalli (because I had no compile/compatibility issues with dalli).

    Both libs were using the binary protocol and a single server. I did not implement any marshaling. I used the JRuby parameters --server --fast -J-Xmn512m -J-Xms2048m -J-Xmx2048m.

    The result is somewhat sad:

                                              user     system      total        real
    set: dalli:bin                       16.387000   0.000000  16.387000 ( 16.387000)
    set: xmemcached:bin                  12.662000   0.000000  12.662000 ( 12.661000)
    
    get: dalli:bin                       17.376000   0.000000  17.376000 ( 17.376000)
    get: xmemcached:bin                  12.202000   0.000000  12.202000 ( 12.202000)
    

    That’s 4k gets a second for xmemcached.

    I did some tuning (disabling serialization, TCP nodelay, changing the TCP buffer size) as well, but the changes aren’t significant. I’m a bit disappointed with the single-threaded performance of xmemcached. With 300 threads, each doing 10k loops, I get the following results:

    set: dalli:bin                        3.989000   0.000000   3.989000 (  3.989000)
    set: xmemcached:bin                   2.537000   0.000000   2.537000 (  2.537000)
    

    It can’t be the Ruby code slowing it all down in this case.

  2. Charles Nutter told me that the default JRuby/Java integration is not very fast, so I assume that’s the bulk of the problem.

    Maybe you could test the speed of the integration itself with a do-nothing Java stub?

  3. I talked to Charles about this issue too, and we’ll see better times for calling Java code in JRuby 1.7. This does not seem to be the problem here, though:

                                              user     system      total        real
    set: fake                             0.165000   0.000000   0.165000 (  0.165000)
    set: xmemcached:bin                  13.383000   0.000000  13.383000 ( 13.383000)
    get: dalli:bin                       24.084000   0.000000  24.084000 ( 24.084000)
    
    get: dalli:bin                       22.399000   0.000000  22.399000 ( 22.399000)
    get: fake                             0.194000   0.000000   0.194000 (  0.194000)
    get: xmemcached:bin                  12.918000   0.000000  12.918000 ( 12.918000)
    

    I just added a “fake” client (a Java class accepting the same args as the other clients). We clearly see the overhead (160ms for nothing is not that good), but it won’t help turn the benchmark around.

    If we look at the benchmark over here, single-threaded mode is always below 10k ops/second, which is rather slow compared to your results using libmemcached, as far as I understand it.

    Any ideas?

  4. Those tests are against remote servers; mine are against localhost. So not comparable unfortunately.

    You’re right, though, my results are around 20k ops per second.

  5. I had an email conversation with Charles and he decided to implement a JRuby extension for memcached. Find it over here.

    So far, I haven’t had the time to test it intensively. It helped me understand some JRuby internals, though. Maybe you have the time to run the benchmark again?
