valgrind and ruby

In which we learn about the two kinds of memory leaks, and how to use Valgrind on Ruby 1.8.6.

If you just came for the patch, it’s here.

remedial leaking

A Ruby application can leak memory in two ways.

First, it can leak space on the Ruby heap. This happens when a Ruby object does not go out of scope in your app. The Ruby garbage collector is aware of the object, but it is not allowed to garbage collect it because there still is a reference somewhere. These leaks tend to grow slowly. Your Rails app definitely has this kind of leak, especially if it uses the ActiveRecord session store.

Second, it can leak C memory. This happens when a C extension calls malloc() or its friends at some point, but doesn’t always properly call free(). The process totally forgets that the space even exists. These kind of leaks can grow incredibly fast. Your Rails app might have this kind of leak if you use a broken C extension.

Most C extensions are broken. We are going to see how to fix them with Valgrind, using the memcached client as an example.

about valgrind

Valgrind is a Unix application for detecting C-based memory leaks and race conditions. It is awesome. (It’s no longer Linux-only; thanks to the heroic work of Nicholas Nethercote and Julian Seward, Valgrind is OS X-compatible as of version 3.5.0.)

Valgrind works by running your entire process in an x86 virtual machine. It tracks memory allocations and deallocations in a parallel memory space. This means that it’s extremely accurate, but slow.

Image from here.

Valgrind, in Norse mythology, is the sacred gate to Valhalla through which only the chosen slain can pass. Pronounce it “val-grinned” or “vul-grinned”.

prereqs setup

Install Valgrind in the usual way from the current releases page (it is also available in MacPorts):

mkdir ~/src
cd ~/src
wget 'http://valgrind.org/downloads/valgrind-3.5.0.tar.bz2'
tar xjf valgrind-3.5.0.tar.bz2
./configure
make
sudo make install

Now you need to patch your Ruby build so that it doesn’t throw spurious Valgrind errors. Previously, people have used suppression files to try to avoid the errors, but that is kind of lame (I guess Dave Balmain’s version is the best). But instead of marking particular functions as ignored, let’s mark the actual memory in question as safe.

I made a patch to do that on 1.8.6, and it’s here (note that Ruby 1.9 has a --with-valgrind flag in it already). So, to install:

cd ~/src
wget 'ftp://ftp.ruby-lang.org/pub/ruby/1.8/ruby-1.8.6-p111.tar.gz'
tar xzf ruby-1.8.6-p111.tar.gz
cd ruby-1.8.6-p111
curl 'http://github.com/fauna/bleak_house/tree/master%2Fruby%2Fvalgrind.patch?raw=true' \
  > valgrind.patch
patch -p0 < valgrind.patch
autoconf
export CFLAGS=" -ggdb -DHAVE_DEBUG"
./configure --enable-valgrind
make
sudo make install

Ideally this will not confuse our existing RubyGems install, but you might have to fuss with it a little bit. Also, we use the debugging flags so that Valgrind can give us nice backtraces.

Please, if you are installing this on a production server, use ./configure --enable-valgrind --prefix=/opt/ruby-valgrind to set up a parallel Ruby build instead of overwriting your existing one.

application setup

Now, let’s get the memcached client set up so that we can work on an actual leak. First, we need libmemcached (the C library it’s based on), and echoe, which gives us some handy Rake tasks:

cd ~/src
wget 'http://download.tangent.org/libmemcached-0.15.tar.gz'
tar xzf libmemcached-0.15.tar.gz
cd libmemcached-0.15
./configure --enable-debug
make
sudo make install
sudo gem install echoe

We also need memcached 1.2.4; you can install that one yourself if you don’t already have it.

Finally, we check out the extension code:

cd ~/src
svn co http://fauna.rubyforge.org/svn/memcached/branches/valgrind-example/ memcached
cd memcached

searching, always searching

We’re ready to go! In this case we will test the Memcached#get method. There’s a runner for it in test/profile/valgrind.rb. The part we are interested in looks like this:

when "get"
  @i.times do
    @cache.get @key1
  end

We’ll use some Rake tasks to try it out:

DEBUG=1 rake compile
METHOD=get LOOPS=100 rake valgrind

This spews out 150 very interesting lines. But for now, we only care about the end:

...
==19026== LEAK SUMMARY:
==19026==    definitely lost: 1,588 bytes in 134 blocks.
==19026==      possibly lost: 6,880 bytes in 225 blocks.
==19026==    still reachable: 3,598,680 bytes in 30,127 blocks.
==19026==         suppressed: 0 bytes in 0 blocks.
==19026== Reachable blocks (those to which a pointer was found) are not shown.
==19026== To see them, rerun with: --leak-check=full --show-reachable=yes

Hmm, what’s this? “Definitely lost” is of great concern. “Possibly lost” is of significant concern. “Still reachable” is the Ruby heap—we don’t really know if those are leaks or not, but they probably aren’t.

So…let’s increase the loops in order to get a delta:

METHOD=get LOOPS=200 rake valgrind

Now we have:

...
==19413== LEAK SUMMARY:
==19413==    definitely lost: 2,120 bytes in 235 blocks.
==19413==      possibly lost: 6,848 bytes in 224 blocks.
==19413==    still reachable: 3,598,567 bytes in 30,125 blocks.
==19413==         suppressed: 0 bytes in 0 blocks.
==19413== Reachable blocks (those to which a pointer was found) are not shown.
==19413== To see them, rerun with: --leak-check=full --show-reachable=yes

Oh man. Our “definitely lost” count went up significantly. This means there is a leak in our little bit of code. So let’s scroll up and find a section that leaked at least that much memory:

...
==19413== 1,000 bytes in 200 blocks are definitely lost in loss record 10 of 16
==19413==    at 0x4904E27: malloc (vg_replace_malloc.c:207)
==19413==    by 0x4F78A04: memcached_string_c_copy (memcached_string.c:146)
==19413==    by 0x4F74872: memcached_fetch (memcached_fetch.c:161)
==19413==    by 0x4F74D22: memcached_get_by_key (memcached_get.c:34)
==19413==    by 0x4F74CB4: memcached_get (memcached_get.c:12)
==19413==    by 0x4E1C8D1: memcached_get_rvalue (rlibmemcached_wrap.c:1875)
==19413==    by 0x4E3FF82: _wrap_memcached_get_rvalue (rlibmemcached_wrap.c:8723)
==19413==    by 0x41B7FC: call_cfunc (eval.c:5700)
==19413==    by 0x41ABAC: rb_call0 (eval.c:5856)
==19413==    by 0x41C287: rb_call (eval.c:6103)
==19413==    by 0x4146A0: rb_eval (eval.c:3479)
==19413==    by 0x4138D0: rb_eval (eval.c:3267)
==19413==    by 0x4156C7: rb_eval (eval.c:3658)
==19413==    by 0x41B432: rb_call0 (eval.c:6007)
==19413==    by 0x41C287: rb_call (eval.c:6103)
...

Hey, there are some of our methods, and it even mentions “200 blocks”—our exact loop count. So we are leaking 5 bytes per loop.

fixin it

Let’s look at those methods mentioned in the output. We’ll start with memcached_get_rvalue, since the rest are part of libmemcached and SWIG, which we will assume are fine. This is the principle of “first cast out the beam out of thine own eye”, or “my own code is always at fault”:

VALUE memcached_get_rvalue(memcached_st *ptr, char *key, size_t key_length,
  uint32_t *flags, memcached_return *error) {
  VALUE ret;
  size_t value_length;
  char *value = memcached_get(ptr, key, key_length, &value_length, flags, error);
  ret = rb_str_new(value, value_length);
  return ret;
};

Hmm, that looks ok. It’s not like we malloc anything. But that char* seems shady. Let’s check the libmemcached man page about memcached_get:

memcached_get() is used to fetch an individual value from the server. You must pass in a key and its length to fetch the object. You must supply three pointer variables which will give you the state of the returned object. A uint32_t pointer to contain whatever flags you stored with the value, a size_t pointer which will be filled with size of of the object, and a memcached_return pointer to hold any error. The object will be returned upon success and NULL will be returned on failure. Any object returned by memcached_get() must be released by the caller application.

Well, crap. It is our responsibility to free that pointer. So:

  ret = rb_str_new(value, value_length);
+  free(value);
  return ret;

We recompile and run Valgrind again:

DEBUG=1 rake compile
METHOD=get LOOPS=200 rake valgrind

Sure enough:

...
==20475== LEAK SUMMARY:
==20475==    definitely lost: 1,120 bytes in 35 blocks.
==20475==      possibly lost: 6,848 bytes in 224 blocks.
==20475==    still reachable: 3,598,567 bytes in 30,125 blocks.
==20475==         suppressed: 0 bytes in 0 blocks.
==20475== Reachable blocks (those to which a pointer was found) are not shown.
==20475== To see them, rerun with: --leak-check=full --show-reachable=yes

And more importantly, that method is totally gone from the handful of other backtraces we got out of Valgrind. Success!

conclusion

Hey so, that pretty much rocked. Valgrind is simple and easy to use. Just to be clear, here’s the command string that the Rake task runs:

valgrind --tool=memcheck --leak-check=yes --num-callers=15 \
  --track-fds=yes ruby test/profile/valgrind.rb

Memcheck is the particular function of Valgrind we were using. There are also tools like Cachegrind, Callgrind, and Massif, which you can read about in the manual. Each one serves a different purpose—Valgrind is really the platform for a number of tools.

So…go to it.

16 responses

  1. I am seeing this in my output:

    ==24692== 155,728 (155,640 direct, 88 indirect) bytes in 1,511
    blocks are definitely lost in loss record 23 of 26
    ==24692==    at 0x4A059F6: malloc (vg_replace_malloc.c:149)
    ==24692==    by 0x43226B: ruby_xmalloc (gc.c:114)
    ==24692==    by 0x465E01: local_append (parse.y:5640)
    ==24692==    by 0x465F64: local_cnt (parse.y:5667)
    ==24692==    by 0x4646AC: assignable (parse.y:4902)
    ==24692==    by 0x458E80: ruby_yyparse (parse.y:844)
    ==24692==    by 0x45E5F4: yycompile (parse.y:2606)
    ==24692==    by 0x45E999: rb_compile_file (parse.y:2698)
    ==24692==    by 0x48B84D: load_file (ruby.c:967)
    ==24692==    by 0x48B337: proc_options (ruby.c:843)
    ==24692==    by 0x48BE10: ruby_process_options (ruby.c:1218)
    ==24692==    by 0x40FC26: ruby_options (eval.c:1531)
    ==24692==    by 0x40D7A4: main (main.c:47)
    

    I’m guessing it’s nothing to worry about, but for some reason it’s not marked as safe.

  2. That’s a pretty interesting thread regarding god.

    In answer to your question, the leak size doesn’t increase with my code, it’s due to some stuff in rails – which I was loading under valgrind to test a C extension I was writing.

    It’s proving difficult to track down those leaks due to the quantity of code involved, but at least I can say that doing the following does cause a tiny leak:

    require 'rubygems'
    Gem.searcher.find("something")
    

    (32 bytes definitely lost, 3712 bytes possibly lost)

    I’ll see if I can track down what causes that tomorrow .

  3. Tracking it through the code, I isolated this:

    a=eval "b=1"
    

    Which results in this:

    ==11194== 32 bytes in 1 blocks are definitely
    lost in loss record 1 of 6
    ==11194==    at 0x4A059F6: malloc (vg_replace_malloc.c:149)
    ==11194==    by 0x43226B: ruby_xmalloc (gc.c:114)
    ==11194==    by 0x465E01: local_append (parse.y:5640)
    ==11194==    by 0x465F64: local_cnt (parse.y:5667)
    ==11194==    by 0x4646AC: assignable (parse.y:4902)
    ==11194==    by 0x458E80: ruby_yyparse (parse.y:844)
    ==11194==    by 0x45E5F4: yycompile (parse.y:2606)
    ==11194==    by 0x45E999: rb_compile_file (parse.y:2698)
    ==11194==    by 0x48B84D: load_file (ruby.c:967)
    ==11194==    by 0x48B337: proc_options (ruby.c:843)
    ==11194==    by 0x48BE10: ruby_process_options (ruby.c:1218)
    ==11194==    by 0x40FC26: ruby_options (eval.c:1531)
    ==11194==    by 0x40D7A4: main (main.c:47)
    

    Plus there are 40 bytes of “possibly lost” memory.

    This is the same stack trace as in the issue I originally mentioned. But at this point I don’t know if the leak is illusory or real.

    Interestingly, if you change it to b=eval "b=1" the definite loss goes away.

  4. It seems to be a real leak. Test code:

    def grow
      for i in 1..100
        eval "b#{i}=1"
      end
    end
    5000.times {grow}
    

    And the output:

    ==18706== 217,988,864 bytes in 499,985 blocks are
    definitely lost in loss record 6 of 6
    ==18706==    at 0x4A05AF7: realloc (vg_replace_malloc.c:306)
    ==18706==    by 0x432398: ruby_xrealloc (gc.c:151)
    ==18706==    by 0x465E9C: local_append (parse.y:5649)
    ==18706==    by 0x465F64: local_cnt (parse.y:5667)
    ==18706==    by 0x4646AC: assignable (parse.y:4902)
    ==18706==    by 0x458E80: ruby_yyparse (parse.y:844)
    ==18706==    by 0x45E5F4: yycompile (parse.y:2606)
    ==18706==    by 0x45E8F4: rb_compile_string (parse.y:2676)
    ==18706==    by 0x41DDF3: compile (eval.c:6412)
    ==18706==    by 0x41E289: eval (eval.c:6493)
    ==18706==    by 0x41E817: rb_f_eval (eval.c:6611)
    ==18706==    by 0x41C765: call_cfunc (eval.c:5700)
    ==18706==    by 0x41BB04: rb_call0 (eval.c:5856)
    ==18706==    by 0x41D291: rb_call (eval.c:6103)
    ==18706==    by 0x415182: rb_eval (eval.c:3494)
    

    You can fiddle with the numbers and it will grow accordingly.

    I guess I’d better write a note to ruby-core.

  5. I’m not surprised eval has a leak, since it’s one of the few places you can generate a parse tree dynamically.

    Are you on the latest 1.8.6 patchlevel?

  6. Yes, I have p114, verified both with the valgrind patch applied and without.

    I popped a note onto ruby-core hoping Matz or someone closer to the parser than me will pick up on it, as figuring out the right way to fix this will be quite challenging.