rubygems memory patch

This patch against RubyGems 1.1.1 improves memory usage by not keeping every unused gemspec permanently in memory. It should have low CPU impact as long as you do your gem requires up-front.

For MacPorts:

cd /opt/local/lib/ruby/site_ruby/1.8
curl https://evanweaver.files.wordpress.com/2010/12/rubygems-memory-1_1_1.diff \
  | sudo patch -p0

Incidentally, I used BleakHouse to track down which references were getting retained.

twitter

Right now, Twitter is suffering slowdowns. Earlier today it was down again. :(

There are an excessive number of single points of failure in the current system, and through developer error* and external circumstance we have managed to hit quite a few of them in the last week. I am sorry and embarrassed.

In particular, we mis-estimated the impact of some cache policy changes. If your site runs so hot that it can’t function without memcached, you’d better understand exactly how much buffer capacity you actually have.

We’re working on fixing it all, but it takes a long time…

* I ain’t sayin’ it was me, but I ain’t sayin’ it wasn’t.

sweeper

Automatically tag your music collection with metadata from Last.fm.

what it is

A while back Last.fm released a command line tool to retrieve metadata for an arbitrary mp3 from their new fingerprint database. I tried it yesterday and it seemed way better than MusicBrainz. So, as a person with a lot of random mp3s, I cooked up a script for retagging entire folders of songs.

Some neat things used in the script:

  • id3lib-ruby for handling mp3 tags
  • Text for calculating Levenshtein distance to the nearest correct genre name (amatch is a compiled version of the same thing, but not Windows-compatible)
  • the incredibly comprehensive Last.fm API
  • XSD::Mapping for parsing the XML responses (better than Hpricot for small, well-formed documents)

A handy feature in the script is the ability to add the top 10 tagged genres to the comment field, so you can use iTunes or Foobar smart playlists for fancier multi-genre sorting. This is similar to lastfmtagger, but not Mac-specific.
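If you want to script something like this yourself, the pieces compose nicely. Here’s a minimal sketch; the file name and genre list are made up, and the frame helpers are from memory of the id3lib-ruby README, so check it before trusting them:

require 'rubygems'
require 'id3lib'
require 'text'

CANONICAL_GENRES = ['Rock', 'Psychedelic Rock', 'Folk']  # illustrative list

tag = ID3Lib::Tag.new('song.mp3')

# Snap a fuzzy genre string to the closest canonical name by edit distance.
guess = 'psychadelic rock'
best = CANONICAL_GENRES.sort_by { |genre|
  Text::Levenshtein.distance(guess.downcase, genre.downcase)
}.first

tag.genre = best
tag.comment = 'rock, psychedelic, mod'  # extra genres for smart playlists
tag.update!  # write the new frames back to the file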

demo

Before running sweeper --genre:

$ id3info 1_001.mp3
*** Tag information for 1_001.mp3
*** mp3 info
MPEG1/layer III
Bitrate: 128KBps
Frequency: 44KHz

After:

$ id3info 1_001.mp3
*** Tag information for 1_001.mp3
=== TPE1 (Lead performer(s)/Soloist(s)): Photon Band
=== TIT2 (Title/songname/content description): To Sing For You
=== WORS (Official internet radio station homepage): http://www.last.fm/music/Photon+Band/_/To+Sing+For+You
=== TCON (Content type): Psychadelic
=== COMM (Comments): ()[]: rock, psychedelic, mod, Philly
*** mp3 info
MPEG1/layer III
Bitrate: 128KBps
Frequency: 44KHz

quickstart

Documentation is here, but for OS X:

sudo port install id3lib
sudo gem install sweeper
sweeper --help

Linux is similar to the above, depending on your distribution.

On Windows, you can just download a zipfile from the Rubyforge page and extract sweeper.exe to somewhere in your path.

I expect this to be eventually replaced by an official Last.fm tool, but for now, patches are welcome. It would be especially nice if someone could write a tutorial to help non-Ruby people install the script.

If you are going to contribute some code, grab the SVN checkout from Fauna, since the gem doesn’t ship with the test mp3s.

SVN, I know—how embarrassing!

bleakhouse 4

BleakHouse 4 came to life this weekend.

new implementation

BleakHouse now tracks the spawn points of every object on the heap, somewhat like Valgrind and somewhat like Dike.

This means there is no framing necessary, and the analysis task runs in seconds instead of hours. And because the instrumentation is pure C, it’s fast enough to run in production, won’t introduce new leaks in your app, and can track T_NODE and other Ruby internals.

sample

After exactly 2000 requests:

$ bleak /tmp/bleak.13795.0.dump
1334329 total objects
Final heap size 1334329 filled, 1132647 free
Displaying top 100 most common line/class pairs
408149 __null__:__null__:__node__
273858 (eval):3:String
135304 __null__:__null__:String
29998 /opt/local/lib/ruby/gems/1.8/gems/mongrel-1.1.4/lib/mongrel.rb:122:String
14000 /rails/activesupport/lib/active_support/core_ext/hash/keys.rb:8:String
11825 /rails/actionpack/lib/action_controller/base.rb:1215:String
7022 /opt/local/lib/ruby/site_ruby/1.8/rubygems/specification.rb:557:Array
5995 /rails/actionpack/lib/action_controller/session/cookie_store.rb:145:String
4524 /opt/local/lib/ruby/gems/1.8/specifications/gettext-1.90.0.gemspec:14:String
4000 /opt/local/lib/ruby/1.8/cgi/session.rb:299:Array
4000 /rails/actionpack/lib/action_controller/response.rb:10:Array
...

Somebody’s got an eval leak, for sure. And those session.rb counts are pretty suspicious.
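For context, here’s a hypothetical illustration of how (eval) entries like that arise. Strings born inside an eval’d snippet get the eval itself as their spawn point, so a retained reference shows up as (eval):N:String:

retained = []
snippet = '"god " * 3'  # imagine this code string is rebuilt per-request
1000.times { retained << eval(snippet) }
# each retained result is credited to (eval):1:String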

The BleakHouse docs are here. The codebase is very solid and I look forward to adding some neat things in 4.1 and 4.2.

credit where it’s due

Part of the development of BleakHouse 4 was sponsored by a Rails company you have definitely heard of.

rails search benchmarks

I put together some benchmarks for the three main Rails fulltext search solutions: Sphinx/Ultrasphinx, Ferret/acts_as_ferret, and Solr/acts_as_solr. The book Advanced Rails Recipes was a big help in getting Ferret and Solr running quickly.

dataset

The dataset is the entire KJV Bible, indexed by verse and also by book. This gives us 31,102 smallish records and 66 large ones. Ferret and Solr both use a Ruby method for loading the per-book contents (since they traverse a Rails association), while Sphinx (with Ultrasphinx) uses :concatenate to generate a GROUP_CONCAT MySQL query.

You can check out or browse the benchmark app yourself from here. Especially note the model configurations. The app should be runnable; the migrations include the dataset load.
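The Verse and Book setup looks roughly like this (a sketch; the option names are from memory of the Ultrasphinx docs, so trust the app over this post):

class Verse < ActiveRecord::Base
  belongs_to :book
  is_indexed :fields => ['content']
end

class Book < ActiveRecord::Base
  has_many :verses
  # :concatenate pulls the book text together with GROUP_CONCAT in SQL,
  # instead of round-tripping every verse through Ruby.
  is_indexed :concatenate => [
    {:association_name => 'verses', :field => 'content', :as => 'content'}
  ]
end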

performance results

These results exercise some basic queries of varying sizes. Some things are not covered; I may update the benches in the future for facet, filter, phrase, and operator usage.

I search 300 times each for a common short word in verses, the same word in books, the same word in all classes, then a rare pair of words in all classes, and finally a very rare long phrase in all classes. All engines were configured for no stopwords.
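Each timing row below comes from a loop shaped like this (simplified; the real rake task also records result counts, index size, and memory usage):

require 'benchmark'

Benchmark.bm(25) do |x|
  x.report('verse:god') do
    300.times do
      Ultrasphinx::Search.new(:class_names => 'Verse', :query => 'god').run
    end
  end
end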

$ INDEX=1 rake benchmark

Sphinx
                               user     system      total        real
reindex                    0.000000   0.010000   2.310000 (  8.323945)
verse:god                  1.090000   0.160000   1.250000 ( 31.020551)
verse:god:no_ar            0.720000   0.080000   0.800000 ( 27.364780)
book:god                   0.980000   0.100000   1.080000 ( 26.839016)
all:god                    0.970000   0.100000   1.070000 ( 20.297412)
all:calves                 1.030000   0.110000   1.140000 ( 22.806805)
all:moreover               0.980000   0.120000   1.100000 ( 27.763920)
result counts: [3595, 64, 3659, 5, 2]
index size: 7.6M        total
memory usage in kb: {:virtual=>35356, :total=>35688, :real=>332}

Solr
                               user     system      total        real
reindex                  403.500000   4.650000 408.150000 (500.704153)
verse:god                  2.530000   0.500000   3.030000 ( 30.330766)
book:god                   1.910000   0.280000   2.190000 ( 30.164732)
all:god                    2.940000   0.360000   3.300000 ( 30.864319)
all:calves                 2.250000   0.330000   2.580000 ( 19.039895)
all:moreover               1.860000   0.300000   2.160000 ( 23.407134)
result counts: [4077, 64, 4141, 5, 2]
index size: 7.8M        total
memory usage in kb: {:virtual=>219376, :total=>298644, :real=>79268}

Ferret
                               user     system      total        real
reindex                    0.830000   2.130000   2.960000 (512.818894)
verse:god                  0.760000   0.210000   0.970000 (  2.557016)
book:god                   0.740000   0.030000   0.770000 (  1.914840)
all:god                  144.460000   4.430000 148.890000 (602.861915)
all:calves                 1.010000   0.050000   1.060000 (  3.033010)
all:moreover               0.710000   0.060000   0.770000 (  4.185469)
result counts: [3893, 64, 3957, 7, 2]
index size: 13M         total
memory usage in kb: {:virtual=>47272, :total=>112060, :real=>64788}

The horrible Ferret performance for “all:god” happened consistently. The log suggests that it does not use any kind of limit in multi-model search in order to ensure the relevance order is correct. This is a big fail.

The “real” column times for Sphinx and Solr are roughly the same, which suggests that the bulk of the elapsed time is socket overhead. The dataset is too small for the actual query time to have an effect. However, it looks like Ferret reuses the socket (via DRb), which is a point in its favor. Sphinx currently does not support persistent connections.

It is important to realize this does not mean that Ferret is fastest overall. It means that Ferret is fastest for small datasets where the constant socket overhead dwarfs the logarithmic actual lookup overhead.

Do note that the “total” time spent (that is, time spent in Ruby instead of waiting on IO) is much lower for Sphinx than for Solr.

It would help the benchmark validity to run many query containers in parallel on separate machines, and to use a much larger dataset.

Other people’s benchmarks suggest that Sphinx starts to scale really well as query volume increases. Solr is likely to be within the same order of magnitude.

quality results

An extremely crude evaluation of search quality: which result set for the word “God” has the word repeated the most times in the records?

# Sphinx
Ultrasphinx::Search.new(:class_names => 'Verse', :query => "God",
:per_page => 10).run.map(&:content).join(" ").split(" ").
select{|s| s[/god/i]}.size
=> 45

# Solr
Verse.find_by_solr("God", :limit => 10).docs.map(&:content).join(" ").
split(" ").select{|s| s[/god/i]}.size
=> 30

# Ferret
Verse.find_by_contents("God", :limit => 10).map(&:content).join(" ").
split(" ").select{|s| s[/god/i]}.size
=> 26

Not much, but it’s something.

thoughts on usage

It’s interesting how similar the acts_as_ferret and acts_as_solr query interfaces are, and how different Ultrasphinx’s is. Multi-model search is an afterthought in Ferret and Solr, and it shows. (No other Rails Sphinx plugin supports multi-model search.)

The configuration interfaces are pretty similar until you start to get into engine-specific stuff like Ultrasphinx’s :association_sql, or Ferret’s analysis modules. Solr has its scary schema.xml but acts_as_solr hides that from you.

Ultrasphinx has some initialization annoyances which acts_as_solr doesn’t suffer from.

Ferret acted weird alongside Ultrasphinx unless I specifically required some acts_as_ferret files in environment.rb. Ferret also will index your entire table when you first reference the constant, which was a big surprise. In general Ferret is overly coupled. Solr is better and acts_as_solr does an especially nice job of hiding the Java from you.

I didn’t test any faceting ability. Solr probably has the best facet implementation. Ferret doesn’t seem to support facets at all.

on coupling

Ferret is unstable under load, and due to its very tight coupling, takes down your Rails containers along with it. Solr is pretty stable, but suffers from the opposite problem—if something goes wrong in a Rails container, and a record callback doesn’t fire, that record will never get indexed.

Sphinx avoids both these problems because it integrates through the database, not through the application. This is what databases are for. Sphinx is incredibly stable, but even if something happens to it, the loose coupling means that the only thing that fails in your app is search. And since it doesn’t rely on container callbacks, your index is always correct. This is the main reason I wrote Ultrasphinx.

Both Solr and Ferret are too slow to reindex on a regular basis. They could be much, much faster if they didn’t have to roundtrip every record through Ruby, but that’s how they’re designed.

Takeaway lesson—be deliberate about your integration points.

delta indexing support in ultrasphinx

Ahead of schedule, Ultrasphinx 1.9 is out with delta indexing, ERB support in the .base files, and official compatibility with Sphinx 0.9.8-rc1.

what it is

Delta indexing speeds up your updates by not reindexing the entire dataset every time. Instead, it keeps a main index which is updated rarely, and a delta index, which is updated frequently and only contains recently changed records.

Of course, your records need timestamps for this to work.
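Conceptually, the timestamp is what lets the indexer split your table into two source queries, along these lines (an illustration only; Ultrasphinx generates the real Sphinx configuration for you):

last_full_index = Time.now - 24 * 3600  # when the main index was last rebuilt

main_sql  = "SELECT * FROM posts WHERE updated_at <= '#{last_full_index}'"
delta_sql = "SELECT * FROM posts WHERE updated_at >  '#{last_full_index}'"

# The main source gets reindexed rarely, the delta source frequently;
# searches consult both indexes, so recent changes show up right away.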

See the documentation for more details. There is also an explanation of the implementation on the forum.

gotchas

Note that there are some gotchas surrounding Sphinx and index merges, mainly that facet counts and text sorting may not be perfectly accurate. In an append-rich environment (most web apps) these tend not to matter.

valgrind and ruby

In which we learn about the two kinds of memory leaks, and how to use Valgrind on Ruby 1.8.6.

If you just came for the patch, it’s here.

remedial leaking

A Ruby application can leak memory in two ways.

First, it can leak space on the Ruby heap. This happens when a Ruby object does not go out of scope in your app. The Ruby garbage collector is aware of the object, but it is not allowed to garbage collect it because there still is a reference somewhere. These leaks tend to grow slowly. Your Rails app definitely has this kind of leak, especially if it uses the ActiveRecord session store.
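Here’s a minimal example of the first kind. Nothing ever removes entries from the class-level array, so the collector always sees a live reference:

class RequestLog
  STORE = []
  # Every recorded request stays reachable from the class, so the GC
  # can never reclaim it, even though nothing reads the old entries.
  def self.record(request)
    STORE << request
  end
end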

Second, it can leak C memory. This happens when a C extension calls malloc() or its friends at some point, but doesn’t always properly call free(). The process totally forgets that the space even exists. This kind of leak can grow incredibly fast. Your Rails app might have this kind of leak if you use a broken C extension.

Most C extensions are broken. We are going to see how to fix them with Valgrind, using the memcached client as an example.

about valgrind

Valgrind is a Unix application for detecting C-based memory leaks and race conditions. It is awesome. (It’s no longer Linux-only; thanks to the heroic work of Nicholas Nethercote and Julian Seward, Valgrind is OS X-compatible as of version 3.5.0.)

Valgrind works by running your entire process in an x86 virtual machine. It tracks memory allocations and deallocations in a parallel memory space. This means that it’s extremely accurate, but slow.

Valgrind, in Norse mythology, is the sacred gate to Valhalla through which only the chosen slain can pass. Pronounce it “val-grinned” or “vul-grinned”.

prereqs setup

Install Valgrind in the usual way from the current releases page (it is also available in MacPorts):

mkdir ~/src
cd ~/src
wget 'http://valgrind.org/downloads/valgrind-3.5.0.tar.bz2'
tar xjf valgrind-3.5.0.tar.bz2
cd valgrind-3.5.0
./configure
make
sudo make install

Now you need to patch your Ruby build so that it doesn’t throw spurious Valgrind errors. Previously, people have used suppression files to try to avoid the errors (I guess Dave Balmain’s version is the best), but that is kind of lame. Instead of marking particular functions as ignored, let’s mark the actual memory in question as safe.

I made a patch to do that on 1.8.6, and it’s here (note that Ruby 1.9 has a --with-valgrind flag in it already). So, to install:

cd ~/src
wget 'ftp://ftp.ruby-lang.org/pub/ruby/1.8/ruby-1.8.6-p111.tar.gz'
tar xzf ruby-1.8.6-p111.tar.gz
cd ruby-1.8.6-p111
curl 'http://github.com/fauna/bleak_house/tree/master%2Fruby%2Fvalgrind.patch?raw=true' \
  > valgrind.patch
patch -p0 < valgrind.patch
autoconf
export CFLAGS=" -ggdb -DHAVE_DEBUG"
./configure --enable-valgrind
make
sudo make install

Ideally this will not confuse our existing RubyGems install, but you might have to fuss with it a little bit. Also, we use the debugging flags so that Valgrind can give us nice backtraces.

Please, if you are installing this on a production server, use ./configure --enable-valgrind --prefix=/opt/ruby-valgrind to set up a parallel Ruby build instead of overwriting your existing one.

application setup

Now, let’s get the memcached client set up so that we can work on an actual leak. First, we need libmemcached (the C library it’s based on), and echoe, which gives us some handy Rake tasks:

cd ~/src
wget 'http://download.tangent.org/libmemcached-0.15.tar.gz'
tar xzf libmemcached-0.15.tar.gz
cd libmemcached-0.15
./configure --enable-debug
make
sudo make install
sudo gem install echoe

We also need memcached 1.2.4; you can install that one yourself if you don’t already have it.

Finally, we check out the extension code:

cd ~/src
svn co http://fauna.rubyforge.org/svn/memcached/branches/valgrind-example/ memcached
cd memcached

searching, always searching

We’re ready to go! In this case we will test the Memcached#get method. There’s a runner for it in test/profile/valgrind.rb. The part we are interested in looks like this:

when "get"
  @i.times do
    @cache.get @key1
  end

We’ll use some Rake tasks to try it out:

DEBUG=1 rake compile
METHOD=get LOOPS=100 rake valgrind

This spews out 150 very interesting lines. But for now, we only care about the end:

...
==19026== LEAK SUMMARY:
==19026==    definitely lost: 1,588 bytes in 134 blocks.
==19026==      possibly lost: 6,880 bytes in 225 blocks.
==19026==    still reachable: 3,598,680 bytes in 30,127 blocks.
==19026==         suppressed: 0 bytes in 0 blocks.
==19026== Reachable blocks (those to which a pointer was found) are not shown.
==19026== To see them, rerun with: --leak-check=full --show-reachable=yes

Hmm, what’s this? “Definitely lost” is of great concern. “Possibly lost” is of significant concern. “Still reachable” is the Ruby heap—we don’t really know if those are leaks or not, but they probably aren’t.

So…let’s increase the loops in order to get a delta:

METHOD=get LOOPS=200 rake valgrind

Now we have:

...
==19413== LEAK SUMMARY:
==19413==    definitely lost: 2,120 bytes in 235 blocks.
==19413==      possibly lost: 6,848 bytes in 224 blocks.
==19413==    still reachable: 3,598,567 bytes in 30,125 blocks.
==19413==         suppressed: 0 bytes in 0 blocks.
==19413== Reachable blocks (those to which a pointer was found) are not shown.
==19413== To see them, rerun with: --leak-check=full --show-reachable=yes

Oh man. Our “definitely lost” count went up significantly. This means there is a leak in our little bit of code. So let’s scroll up and find a section that leaked at least that much memory:

...
==19413== 1,000 bytes in 200 blocks are definitely lost in loss record 10 of 16
==19413==    at 0x4904E27: malloc (vg_replace_malloc.c:207)
==19413==    by 0x4F78A04: memcached_string_c_copy (memcached_string.c:146)
==19413==    by 0x4F74872: memcached_fetch (memcached_fetch.c:161)
==19413==    by 0x4F74D22: memcached_get_by_key (memcached_get.c:34)
==19413==    by 0x4F74CB4: memcached_get (memcached_get.c:12)
==19413==    by 0x4E1C8D1: memcached_get_rvalue (rlibmemcached_wrap.c:1875)
==19413==    by 0x4E3FF82: _wrap_memcached_get_rvalue (rlibmemcached_wrap.c:8723)
==19413==    by 0x41B7FC: call_cfunc (eval.c:5700)
==19413==    by 0x41ABAC: rb_call0 (eval.c:5856)
==19413==    by 0x41C287: rb_call (eval.c:6103)
==19413==    by 0x4146A0: rb_eval (eval.c:3479)
==19413==    by 0x4138D0: rb_eval (eval.c:3267)
==19413==    by 0x4156C7: rb_eval (eval.c:3658)
==19413==    by 0x41B432: rb_call0 (eval.c:6007)
==19413==    by 0x41C287: rb_call (eval.c:6103)
...

Hey, there are some of our methods, and it even mentions “200 blocks”—our exact loop count. So we are leaking 5 bytes per loop.

fixin it

Let’s look at those methods mentioned in the output. We’ll start with memcached_get_rvalue, since the rest are part of libmemcached and SWIG, which we will assume are fine. This is the principle of “first cast out the beam out of thine own eye”, or “my own code is always at fault”:

VALUE memcached_get_rvalue(memcached_st *ptr, char *key, size_t key_length,
  uint32_t *flags, memcached_return *error) {
  VALUE ret;
  size_t value_length;
  char *value = memcached_get(ptr, key, key_length, &value_length, flags, error);
  ret = rb_str_new(value, value_length);
  return ret;
};

Hmm, that looks ok. It’s not like we malloc anything. But that char* seems shady. Let’s check the libmemcached man page about memcached_get:

memcached_get() is used to fetch an individual value from the server. You must pass in a key and its length to fetch the object. You must supply three pointer variables which will give you the state of the returned object. A uint32_t pointer to contain whatever flags you stored with the value, a size_t pointer which will be filled with the size of the object, and a memcached_return pointer to hold any error. The object will be returned upon success and NULL will be returned on failure. Any object returned by memcached_get() must be released by the caller application.

Well, crap. It is our responsibility to free that pointer. So:

  ret = rb_str_new(value, value_length);
+  free(value);
  return ret;

We recompile and run Valgrind again:

DEBUG=1 rake compile
METHOD=get LOOPS=200 rake valgrind

Sure enough:

...
==20475== LEAK SUMMARY:
==20475==    definitely lost: 1,120 bytes in 35 blocks.
==20475==      possibly lost: 6,848 bytes in 224 blocks.
==20475==    still reachable: 3,598,567 bytes in 30,125 blocks.
==20475==         suppressed: 0 bytes in 0 blocks.
==20475== Reachable blocks (those to which a pointer was found) are not shown.
==20475== To see them, rerun with: --leak-check=full --show-reachable=yes

And more importantly, that method is totally gone from the handful of other backtraces we got out of Valgrind. Success!

conclusion

Hey so, that pretty much rocked. Valgrind is simple and easy to use. Just to be clear, here’s the command string that the Rake task runs:

valgrind --tool=memcheck --leak-check=yes --num-callers=15 \
  --track-fds=yes ruby test/profile/valgrind.rb

Memcheck is the particular function of Valgrind we were using. There are also tools like Cachegrind, Callgrind, and Massif, which you can read about in the manual. Each one serves a different purpose—Valgrind is really the platform for a number of tools.

So…go to it.

safe nils in ruby

Groovy has a feature called safe navigation. It lets you send messages to nil and have them return nil if you’re in a special context. This is neat because it means in Grails views, where you might expect to do:

  ${ if (user.photo != null) {user.photo.url} }

You can instead do:

  ${ user.photo?.url }

The special context is explicitly signalled by the ? on the accessor.

ok switching to grails

Nah. What if we could add it to Ruby? Be aware that empty arrays already behave this way, because methods like map and select on an empty array just return another empty array, which is safe to keep calling. For instance:
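  [].map { |user| user.name }.sort.first  # => nil, and no step ever raises

So we just need: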

class NilClass
  def method_missing(*args)
    # Swallow the call only when invoked from inside a compiled ERb
    # template; everywhere else, raise NoMethodError as usual.
    super unless caller.first =~ /`_run_erb/
  end
end

Now, as long as we’re in a special context (ERb), we can chain our nils.

where should we keep it?

For now environment.rb makes a fine home. If it makes people feel better, I could put it in a plugin, or maybe a Rails patch. I am not using this in production (yet), but I want to hear thoughts.

be the fastest you can be, memcached

New memcached client based on SWIG/libmemcached. 15 to 150 times faster than memcache-client, depending on the architecture. Full coverage, benchmarks.

tell me

Some nice results from OS X x86:

                                     user     system      total
set:ruby:noblock:memcached       0.100000   0.010000   0.110000
set:ruby:memcached               0.150000   0.140000   0.290000
set:ruby:memcache-client        18.070000   0.310000  18.380000
get:ruby:memcached               0.180000   0.140000   0.320000
get:ruby:memcache-client        18.210000   0.320000  18.530000
missing:ruby:memcached           0.290000   0.170000   0.460000
missing:ruby:memcache-client    18.110000   0.330000  18.440000
mixed:ruby:noblock:memcached     0.380000   0.340000   0.720000
mixed:ruby:memcached             0.370000   0.280000   0.650000
mixed:ruby:memcache-client      36.760000   0.700000  37.460000

Ubuntu/Xen AMD64 was similar to the above, while RHEL AMD64 was more like 20x. It’s weird how much better Ruby performance was on RHEL.

I’ll try to push a little more Ruby into C, because we’re already down to counting single dispatches. For any deep object, most of the time is spent in Marshal.

features

Built-in non-blocking IO, consistent key modulus, cross-language hash functions, append/prepend/replace operators, thread safety, and other fancy stuff. CAS (compare and swap) coming as soon as libmemcached finishes it.

The API is not compatible with Ruby-MemCache/memcache-client, but it’s pretty close. Don’t drop it into Rails just yet.
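The basics look like this (a quick sketch; misses raise rather than returning nil):

require 'memcached'

cache = Memcached.new('localhost:11211')

cache.set 'user:1', {'name' => 'evan'}  # values are marshaled by default
cache.get 'user:1'                      # => {"name"=>"evan"}

begin
  cache.get 'missing'
rescue Memcached::NotFound
  # unlike memcache-client, a miss raises instead of returning nil
end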

ultrasphinx updates

You’re already using Sphinx, right? Of course. It’s not like it’s only the fastest, most stable Rails search solution there is. At least, based on rigorous anecdotal evidence.

better docs

Lots of updates, especially to the Ultrasphinx deployment docs, to help the fine conductors of EngineYard.

postgres

Postgres support is now considered mature. You need at least version 8.2. Personally I found Postgres totally annoying:

Lines of stored procedures needed:

MySQL: 0

Postgres: 73

sphinx r985

With Pat Allan’s excellent Riddle, good things coming: multivalues, float faceting, geocoding.

Soon, we will get a smoother is_indexed API. I’m taking suggestions. And some time after that…delta updates. A patch would really speed that along.

Most of the credit goes to the Russians; I’m just connecting pipes. I’ve got plans to explain someday why exactly Sphinx is as fast as it is. Hint: it’s not the enterprise beans!!!
