valgrind and ruby

In which we learn about the two kinds of memory leaks, and how to use Valgrind on Ruby 1.8.6.

If you just came for the patch, it’s here.

remedial leaking

A Ruby application can leak memory in two ways.

First, it can leak space on the Ruby heap. This happens when a Ruby object does not go out of scope in your app. The Ruby garbage collector is aware of the object, but it is not allowed to garbage collect it because there still is a reference somewhere. These leaks tend to grow slowly. Your Rails app definitely has this kind of leak, especially if it uses the ActiveRecord session store.
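A minimal sketch of this kind of leak (the constant name is made up for illustration):

```ruby
# A long-lived collection that only ever grows: the GC can see every object,
# but the reference held by the constant keeps each one alive forever.
LEAKY_CACHE = []

def handle_request(params)
  LEAKY_CACHE << params  # never pruned, so the Ruby heap grows per request
end

1000.times { |i| handle_request({ :id => i }) }
LEAKY_CACHE.size  # 1000 objects the GC is not allowed to collect
```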

Second, it can leak C memory. This happens when a C extension calls malloc() or its friends at some point, but doesn’t always properly call free(). The process totally forgets that the space even exists. These kind of leaks can grow incredibly fast. Your Rails app might have this kind of leak if you use a broken C extension.
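You can simulate this class of leak from plain Ruby with the standard-library Fiddle (an anachronism here, since 1.8.6 predates it), because memory from `Fiddle.malloc` is invisible to the garbage collector:

```ruby
require "fiddle"

# Fiddle.malloc calls C's malloc() directly; the return value is just an
# Integer address, so the Ruby GC neither tracks nor reclaims the bytes.
ptr = Fiddle.malloc(1024)

# Dropping ptr without freeing it leaks 1 KB, exactly like a sloppy C
# extension. The fix is the same as in C: pair every malloc with a free.
Fiddle.free(ptr)
```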

Most C extensions are broken. We are going to see how to fix them with Valgrind, using the memcached client as an example.

about valgrind

Valgrind is a Unix application for detecting C-based memory leaks and race conditions. It is awesome. (It’s no longer Linux-only; thanks to the heroic work of Nicholas Nethercote and Julian Seward, Valgrind is OS X-compatible as of version 3.5.0.)

Valgrind works by running your entire process in an x86 virtual machine. It tracks memory allocations and deallocations in a parallel memory space. This means that it’s extremely accurate, but slow.

Valgrind, in Norse mythology, is the sacred gate to Valhalla through which only the chosen slain can pass. Pronounce it “val-grinned” or “vul-grinned”.

prerequisite setup

Install Valgrind in the usual way from the current releases page (it is also available in MacPorts):

mkdir ~/src
cd ~/src
wget 'http://valgrind.org/downloads/valgrind-3.5.0.tar.bz2'
tar xjf valgrind-3.5.0.tar.bz2
cd valgrind-3.5.0
./configure
make
sudo make install

Now you need to patch your Ruby build so that it doesn’t throw spurious Valgrind errors. Previously, people have used suppression files to hide the errors (Dave Balmain’s is probably the best), but that approach is kind of lame. Instead of marking particular functions as ignored, let’s mark the actual memory in question as safe.

I made a patch to do that on 1.8.6, and it’s here (note that Ruby 1.9 has a --with-valgrind flag in it already). So, to install:

cd ~/src
wget 'ftp://ftp.ruby-lang.org/pub/ruby/1.8/ruby-1.8.6-p111.tar.gz'
tar xzf ruby-1.8.6-p111.tar.gz
cd ruby-1.8.6-p111
curl 'http://github.com/fauna/bleak_house/tree/master%2Fruby%2Fvalgrind.patch?raw=true' \
  > valgrind.patch
patch -p0 < valgrind.patch
autoconf
export CFLAGS=" -ggdb -DHAVE_DEBUG"
./configure --enable-valgrind
make
sudo make install

Ideally this will not confuse our existing RubyGems install, but you might have to fuss with it a little bit. Also, we use the debugging flags so that Valgrind can give us nice backtraces.

Please, if you are installing this on a production server, use ./configure --enable-valgrind --prefix=/opt/ruby-valgrind to set up a parallel Ruby build instead of overwriting your existing one.

application setup

Now, let’s get the memcached client set up so that we can work on an actual leak. First, we need libmemcached (the C library it’s based on), and echoe, which gives us some handy Rake tasks:

cd ~/src
wget 'http://download.tangent.org/libmemcached-0.15.tar.gz'
tar xzf libmemcached-0.15.tar.gz
cd libmemcached-0.15
./configure --enable-debug
make
sudo make install
sudo gem install echoe

We also need memcached 1.2.4; you can install that one yourself if you don’t already have it.

Finally, we check out the extension code:

cd ~/src
svn co http://fauna.rubyforge.org/svn/memcached/branches/valgrind-example/ memcached
cd memcached

searching, always searching

We’re ready to go! In this case we will test the Memcached#get method. There’s a runner for it in test/profile/valgrind.rb. The part we are interested in looks like this:

when "get"
  @i.times do
    @cache.get @key1
  end

We’ll use some Rake tasks to try it out:

DEBUG=1 rake compile
METHOD=get LOOPS=100 rake valgrind

This spews out 150 very interesting lines. But for now, we only care about the end:

...
==19026== LEAK SUMMARY:
==19026==    definitely lost: 1,588 bytes in 134 blocks.
==19026==      possibly lost: 6,880 bytes in 225 blocks.
==19026==    still reachable: 3,598,680 bytes in 30,127 blocks.
==19026==         suppressed: 0 bytes in 0 blocks.
==19026== Reachable blocks (those to which a pointer was found) are not shown.
==19026== To see them, rerun with: --leak-check=full --show-reachable=yes

Hmm, what’s this? “Definitely lost” is of great concern. “Possibly lost” is of significant concern. “Still reachable” is the Ruby heap—we don’t really know if those are leaks or not, but they probably aren’t.

So…let’s increase the loops in order to get a delta:

METHOD=get LOOPS=200 rake valgrind

Now we have:

...
==19413== LEAK SUMMARY:
==19413==    definitely lost: 2,120 bytes in 235 blocks.
==19413==      possibly lost: 6,848 bytes in 224 blocks.
==19413==    still reachable: 3,598,567 bytes in 30,125 blocks.
==19413==         suppressed: 0 bytes in 0 blocks.
==19413== Reachable blocks (those to which a pointer was found) are not shown.
==19413== To see them, rerun with: --leak-check=full --show-reachable=yes

Oh man. Our “definitely lost” count went up significantly. This means there is a leak in our little bit of code. So let’s scroll up and find a section that leaked at least that much memory:

...
==19413== 1,000 bytes in 200 blocks are definitely lost in loss record 10 of 16
==19413==    at 0x4904E27: malloc (vg_replace_malloc.c:207)
==19413==    by 0x4F78A04: memcached_string_c_copy (memcached_string.c:146)
==19413==    by 0x4F74872: memcached_fetch (memcached_fetch.c:161)
==19413==    by 0x4F74D22: memcached_get_by_key (memcached_get.c:34)
==19413==    by 0x4F74CB4: memcached_get (memcached_get.c:12)
==19413==    by 0x4E1C8D1: memcached_get_rvalue (rlibmemcached_wrap.c:1875)
==19413==    by 0x4E3FF82: _wrap_memcached_get_rvalue (rlibmemcached_wrap.c:8723)
==19413==    by 0x41B7FC: call_cfunc (eval.c:5700)
==19413==    by 0x41ABAC: rb_call0 (eval.c:5856)
==19413==    by 0x41C287: rb_call (eval.c:6103)
==19413==    by 0x4146A0: rb_eval (eval.c:3479)
==19413==    by 0x4138D0: rb_eval (eval.c:3267)
==19413==    by 0x4156C7: rb_eval (eval.c:3658)
==19413==    by 0x41B432: rb_call0 (eval.c:6007)
==19413==    by 0x41C287: rb_call (eval.c:6103)
...

Hey, there are some of our methods, and it even mentions “200 blocks”—our exact loop count. So we are leaking 5 bytes per loop.
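The arithmetic, spelled out:

```ruby
bytes_lost = 1_000  # the "definitely lost" record above
loops      = 200    # one leaked block per loop iteration
bytes_lost / loops  # 5 bytes leaked on every call to Memcached#get
```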

fixin it

Let’s look at those methods mentioned in the output. We’ll start with memcached_get_rvalue, since the rest are part of libmemcached and SWIG, which we will assume are fine. This is the principle of “first cast out the beam out of thine own eye”, or “my own code is always at fault”:

VALUE memcached_get_rvalue(memcached_st *ptr, char *key, size_t key_length,
  uint32_t *flags, memcached_return *error) {
  VALUE ret;
  size_t value_length;
  char *value = memcached_get(ptr, key, key_length, &value_length, flags, error);
  ret = rb_str_new(value, value_length);
  return ret;
};

Hmm, that looks ok. It’s not like we malloc anything. But that char* seems shady. Let’s check the libmemcached man page about memcached_get:

memcached_get() is used to fetch an individual value from the server. You must pass in a key and its length to fetch the object. You must supply three pointer variables which will give you the state of the returned object: a uint32_t pointer to contain whatever flags you stored with the value, a size_t pointer which will be filled with the size of the object, and a memcached_return pointer to hold any error. The object will be returned upon success and NULL will be returned on failure. Any object returned by memcached_get() must be released by the caller application.

Well, crap. It is our responsibility to free that pointer. So:

  ret = rb_str_new(value, value_length);
+  free(value);
  return ret;

We recompile and run Valgrind again:

DEBUG=1 rake compile
METHOD=get LOOPS=200 rake valgrind

Sure enough:

...
==20475== LEAK SUMMARY:
==20475==    definitely lost: 1,120 bytes in 35 blocks.
==20475==      possibly lost: 6,848 bytes in 224 blocks.
==20475==    still reachable: 3,598,567 bytes in 30,125 blocks.
==20475==         suppressed: 0 bytes in 0 blocks.
==20475== Reachable blocks (those to which a pointer was found) are not shown.
==20475== To see them, rerun with: --leak-check=full --show-reachable=yes

And more importantly, that method is totally gone from the handful of other backtraces we got out of Valgrind. Success!

conclusion

Hey so, that pretty much rocked. Valgrind is simple and easy to use. Just to be clear, here’s the command string that the Rake task runs:

valgrind --tool=memcheck --leak-check=yes --num-callers=15 \
  --track-fds=yes ruby test/profile/valgrind.rb

Memcheck is the particular Valgrind tool we were using. There are also tools like Cachegrind, Callgrind, and Massif, which you can read about in the manual. Each one serves a different purpose; Valgrind is really a platform for a number of tools.

So…go to it.

safe nils in ruby

Groovy has a feature called safe navigation. It lets you send messages to null and have them return null when you explicitly ask for it. This is neat because it means in Grails views, where you might expect to write:

  ${ user.photo == null ? null : user.photo.url }

You can instead do:

  ${ user.photo?.url }

The special context is explicitly signalled by the ? on the accessor.

ok switching to grails

Nah. What if we could add it to Ruby? Note that empty arrays already behave this way, since enumerable methods on an empty array return another empty array, which is safe to keep calling (`[].map { |u| u.photo }.first` is nil and never raises). So we just need:

class NilClass
  def method_missing(*args)
    super unless caller.first =~ /`_run_erb/
  end
end

Now, as long as we’re in a special context (ERb), we can chain our nils.
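To see the guard fire, here’s a self-contained sketch where a method whose name happens to match the pattern stands in for ERb’s generated `_run_erb` methods (I’ve loosened the regex to `/_run_erb/`, since the backtick-quoted frame format varies across Ruby versions):

```ruby
class NilClass
  def method_missing(*args)
    # swallow the call only when the immediate caller looks like ERb
    super unless caller.first =~ /_run_erb/
  end
end

def _run_erb_demo
  nil.photo  # caller matches the pattern, so this quietly returns nil
end

_run_erb_demo          # => nil
nil.photo rescue :boom # outside the special context, still a NoMethodError
```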

where should we keep it?

For now environment.rb makes a fine home. If it makes people feel better, I could put it in a plugin, or maybe a Rails patch. I am not using this in production (yet), but I want to hear thoughts.

be the fastest you can be, memcached

New memcached client based on SWIG/libmemcached. 15 to 150 times faster than memcache-client, depending on the architecture. Full coverage, benchmarks.

tell me

Some nice results from OS X x86:

                                     user     system      total
set:ruby:noblock:memcached       0.100000   0.010000   0.110000
set:ruby:memcached               0.150000   0.140000   0.290000
set:ruby:memcache-client        18.070000   0.310000  18.380000
get:ruby:memcached               0.180000   0.140000   0.320000
get:ruby:memcache-client        18.210000   0.320000  18.530000
missing:ruby:memcached           0.290000   0.170000   0.460000
missing:ruby:memcache-client    18.110000   0.330000  18.440000
mixed:ruby:noblock:memcached     0.380000   0.340000   0.720000
mixed:ruby:memcached             0.370000   0.280000   0.650000
mixed:ruby:memcache-client      36.760000   0.700000  37.460000

Ubuntu/Xen AMD64 was similar to the above, while RHEL AMD64 was more like 20x. It’s weird how much better Ruby performance was on RHEL.

I’ll try to push a little more Ruby into C, because we’re already down to counting single dispatches. For any deep object, most of the time is spent in Marshal.
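To convince yourself of the Marshal overhead, a quick sketch (the shape of the object is made up, and the absolute numbers will vary by machine):

```ruby
require "benchmark"

# For a deep object, a round trip through Marshal dwarfs the cost of the
# handful of method dispatches the client itself performs.
deep = { "user" => { "posts" => Array.new(100) { { "title" => "x" * 50 } } } }

time = Benchmark.realtime { 1_000.times { Marshal.load(Marshal.dump(deep)) } }
```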

features

Built-in non-blocking IO, consistent key modulus, cross-language hash functions, append/prepend/replace operators, thread safety, and other fancy stuff. CAS (compare and swap) coming as soon as libmemcached finishes it.
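“Consistent key modulus” means every client, in any language, picks the same server for a given key. A generic illustration of the idea (CRC32 here is just for the sketch, not necessarily libmemcached’s actual hash, and the server names are made up):

```ruby
require "zlib"

SERVERS = %w[cache1:11211 cache2:11211 cache3:11211]

# Hash the key with a function every language implements identically,
# then take it modulo the server count to pick a server.
def server_for(key)
  SERVERS[Zlib.crc32(key) % SERVERS.size]
end
```

As long as the Perl, PHP, and Ruby clients agree on the hash function and the server order, they all read and write a given key on the same box.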

The API is not compatible with Ruby-MemCache/memcache-client, but it’s pretty close. Don’t drop it into Rails just yet.

ultrasphinx updates

You’re already using Sphinx, right? Of course. It’s not like it’s only the fastest, most stable Rails search solution there is. At least, based on rigorous anecdotal evidence.

better docs

Lots of updates, especially Ultrasphinx deployment docs. To help the fine conductors of EngineYard.

postgres

Postgres support is now considered mature. You need at least version 8.2. Personally I found Postgres totally annoying:

Lines of stored procedures needed:

MySQL: 0

Postgres: 73

sphinx r985

With Pat Allan’s excellent Riddle, good things coming: multivalues, float faceting, geocoding.

Soon, we will get a smoother is_indexed API. I’m taking suggestions. And some time after that…delta updates. A patch would really speed that along.

The most credit goes to the Russians. I’m just connecting pipes. Got plans to explain someday why exactly Sphinx is as fast as it is. Hint: it’s not the enterprise beans!!!

hello sports racers

evn: o i wrote benchmark/unit

evn: https://evanweaver.files.wordpress.com/2010/12/doc/fauna/benchmark_unit/

defunkt: blog post or it didnt happen

Hey, so, benchmark_unit. Machine-independent benchmark assertions for your unit tests:

$ ruby parser_test.rb
Loaded suite parser_test
Started
F
Finished in 4.253302 seconds.

  1) Failure:
test_parsing_speed(ParserTest) [parser_test.rb:7]:
<2.04597701149425 RubySeconds> is not faster than 1.5 RubySeconds.

1 tests, 1 assertions, 1 failures, 0 errors
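The “RubySeconds” unit is what makes the assertion machine-independent. My reconstruction of the idea (not the gem’s actual API): normalize the measured time by how long a fixed reference workload takes on the same machine:

```ruby
require "benchmark"

# Time the block, then divide by the time of a fixed reference workload,
# so a fast machine and a slow machine report roughly the same number.
def ruby_seconds(&block)
  reference = Benchmark.realtime { 100_000.times { "a" + "b" } }
  Benchmark.realtime(&block) / reference
end
```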

better rails caching

I’m pleased to release Interlock, a Rails plugin for maintainable and high-efficiency caching. Documentation is here.

what it does

Interlock uses memcached to make your view fragments and associated controller blocks march along together. If a fragment is fresh, the controller behavior won’t run. This eliminates duplicate effort from your request cycle. Your controller blocks run so infrequently that you can use regular ActiveRecord finders and not worry about object caching at all.
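The core idea, as a minimal sketch with a plain Hash standing in for memcached (the method name is hypothetical, not Interlock’s API):

```ruby
CACHE = {}

# Run the expensive controller block only when the fragment is missing;
# a fresh fragment means the block is skipped entirely.
def behavior_cache(key)
  CACHE[key] ||= yield
end

calls = 0
html  = nil
2.times { html = behavior_cache("posts/recent") { calls += 1; "<ul>...</ul>" } }
calls  # 1: the second "request" never ran the block
```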

Interlock automatically tracks invalidation dependencies based on the model lifecycle, and supports arbitrary levels of scoping per block.

production-ready

Interlock has full C0 test coverage and has been used in production on CHOW for five or six weeks already. We’ve seen 3-4x speedups on controllers that were already accelerated with Sphinx and cache_fu.

If you do have any problems (gosh), please report them on the forum instead of the blog comments.

Anyway, go read the docs; everything is there. In closing, here is a lolcat I made with my own kitten. He’s new: