rails search benchmarks

I put together some benchmarks for the three main Rails full-text search solutions: Sphinx/Ultrasphinx, Ferret/acts_as_ferret, and Solr/acts_as_solr. The book Advanced Rails Recipes was a big help in getting Ferret and Solr running quickly.

dataset

The dataset is the entire KJV Bible, indexed by verse and also by book. This gives us 31,102 smallish records and 66 large ones. Ferret and Solr both use a Ruby method for loading the per-book contents (since they traverse a Rails association), while Sphinx (with Ultrasphinx) uses :concatenate to generate a GROUP_CONCAT MySQL query.
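
For flavor, the model configurations look roughly like this. This is a paraphrased sketch from memory, not the exact code from the app; check the repository for the real declarations.

# Verse: 31,102 small records. (Sketch; option names approximate.)
class Verse < ActiveRecord::Base
  belongs_to :book
  is_indexed :fields => ['content']      # Ultrasphinx
  acts_as_solr :fields => [:content]     # acts_as_solr
  acts_as_ferret :fields => [:content]   # acts_as_ferret
end

# Book: 66 large records. Ferret and Solr index a Ruby method that walks
# the association; Ultrasphinx concatenates in SQL with GROUP_CONCAT.
class Book < ActiveRecord::Base
  has_many :verses
  is_indexed :fields => ['name'],
    :concatenate => [{:association_name => 'verses',
                      :field => 'content', :as => 'content'}]
  acts_as_solr :fields => [:name, :content]
  acts_as_ferret :fields => [:name, :content]

  def content
    verses.map(&:content).join(" ")      # loaded record by record, in Ruby
  end
end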

You can check out or browse the benchmark app yourself from here. Especially note the model configurations. The app should be runnable; the migrations include the dataset load.

performance results

These results exercise some basic queries of varying sizes. Some things are not covered; I may update the benchmarks in the future for facet, filter, phrase, and operator usage.

I search 300 times each for a common short word in verses, the same word in books, the same word in all classes, then a rare pair of words in all classes, and finally a very rare long phrase in all classes. All engines were configured for no stopwords.
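
Each case boils down to a timed loop of 300 identical searches. Here is a minimal sketch of the shape of one case, using Ultrasphinx as the example (hypothetical; the real cases live in the Rake script):

require 'benchmark'

# One hypothetical case: the same query, 300 times.
Benchmark.bm(20) do |x|
  x.report("verse:god") do
    300.times do
      Ultrasphinx::Search.new(:class_names => 'Verse', :query => 'god').run
    end
  end
end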

$ INDEX=1 rake benchmark

Sphinx
                               user     system      total        real
reindex                    0.000000   0.010000   2.310000 (  8.323945)
verse:god                  1.090000   0.160000   1.250000 ( 31.020551)
verse:god:no_ar            0.720000   0.080000   0.800000 ( 27.364780)
book:god                   0.980000   0.100000   1.080000 ( 26.839016)
all:god                    0.970000   0.100000   1.070000 ( 20.297412)
all:calves                 1.030000   0.110000   1.140000 ( 22.806805)
all:moreover               0.980000   0.120000   1.100000 ( 27.763920)
result counts: [3595, 64, 3659, 5, 2]
index size: 7.6M        total
memory usage in kb: {:virtual=>35356, :total=>35688, :real=>332}

Solr
                               user     system      total        real
reindex                  403.500000   4.650000 408.150000 (500.704153)
verse:god                  2.530000   0.500000   3.030000 ( 30.330766)
book:god                   1.910000   0.280000   2.190000 ( 30.164732)
all:god                    2.940000   0.360000   3.300000 ( 30.864319)
all:calves                 2.250000   0.330000   2.580000 ( 19.039895)
all:moreover               1.860000   0.300000   2.160000 ( 23.407134)
result counts: [4077, 64, 4141, 5, 2]
index size: 7.8M        total
memory usage in kb: {:virtual=>219376, :total=>298644, :real=>79268}

Ferret
                               user     system      total        real
reindex                    0.830000   2.130000   2.960000 (512.818894)
verse:god                  0.760000   0.210000   0.970000 (  2.557016)
book:god                   0.740000   0.030000   0.770000 (  1.914840)
all:god                  144.460000   4.430000 148.890000 (602.861915)
all:calves                 1.010000   0.050000   1.060000 (  3.033010)
all:moreover               0.710000   0.060000   0.770000 (  4.185469)
result counts: [3893, 64, 3957, 7, 2]
index size: 13M         total
memory usage in kb: {:virtual=>47272, :total=>112060, :real=>64788}

The horrible Ferret performance for “all:god” happened consistently. The log suggests that it does not apply any limit to multi-model searches, in order to guarantee that the relevance ordering is correct across models. This is a big fail.

The “real” column times for Sphinx and Solr are roughly the same, which suggests that the bulk of the time is socket overhead. The dataset is too small for the actual query time to have much effect. However, it looks like Ferret reuses the socket (via DRb), which is a point in its favor. Sphinx currently does not support persistent connections.

It is important to realize this does not mean that Ferret is fastest overall. It means that Ferret is fastest for small datasets where the constant socket overhead dwarfs the logarithmic actual lookup overhead.

Do note that the “total” time spent (i.e., time spent in Ruby, as opposed to waiting on IO) is much lower for Sphinx than for Solr.

It would help the benchmark validity to run many query containers in parallel on separate machines, and to use a much larger dataset.

Other people’s benchmarks suggest that Sphinx starts to scale really well as query volume increases. Solr is likely to be within the same order of magnitude.

quality results

An extremely crude evaluation of search quality: which result set for the word “God” has the word repeated the most times in the records?

# Sphinx
Ultrasphinx::Search.new(:class_names => 'Verse', :query => "God",
:per_page => 10).run.map(&:content).join(" ").split(" ").
select{|s| s[/god/i]}.size
=> 45

# Solr
Verse.find_by_solr("God", :limit => 10).docs.map(&:content).join(" ").
split(" ").select{|s| s[/god/i]}.size
=> 30

# Ferret
Verse.find_by_contents("God", :limit => 10).map(&:content).join(" ").
split(" ").select{|s| s[/god/i]}.size
=> 26

Not much, but it’s something.

thoughts on usage

It’s interesting how similar the acts_as_ferret and acts_as_solr query interfaces are, and how different Ultrasphinx’s is. Multi-model search is an afterthought in Ferret and Solr, and it shows. (No other Rails Sphinx plugin supports multi-model search.)

The configuration interfaces are pretty similar until you start to get into engine-specific stuff like Ultrasphinx’s :association_sql, or Ferret’s analysis modules. Solr has its scary schema.xml but acts_as_solr hides that from you.

Ultrasphinx has some initialization annoyances which acts_as_solr doesn’t suffer from.

Ferret acted strangely alongside Ultrasphinx unless I explicitly required some acts_as_ferret files in environment.rb. Ferret will also index your entire table the first time you reference the constant, which was a big surprise. In general, Ferret is overly coupled. Solr is better, and acts_as_solr does an especially nice job of hiding the Java from you.

I didn’t test any faceting ability. Solr probably has the best facet implementation. Ferret doesn’t seem to support facets at all.

on coupling

Ferret is unstable under load, and due to its very tight coupling, takes down your Rails containers along with it. Solr is pretty stable, but suffers from the opposite problem—if something goes wrong in a Rails container, and a record callback doesn’t fire, that record will never get indexed.

Sphinx avoids both these problems because it integrates through the database, not through the application. This is what databases are for. Sphinx is incredibly stable, but even if something happens to it, the loose coupling means that the only thing that fails in your app is search. And since it doesn’t rely on container callbacks, your index is always correct. This is the main reason I wrote Ultrasphinx.

Both Solr and Ferret are too slow to reindex on a regular basis. They could be much, much faster if they didn’t have to roundtrip every record through Ruby, but that’s how they’re designed.

Takeaway lesson—be deliberate about your integration points.

34 responses

  1. Nice write-up. Sphinx has shown itself to be the most stable and consistent under load of these three search options at Engine Yard. Ferret is not stable enough to run production apps on. Solr is a nice search engine and has some advanced search features Sphinx doesn’t have, but Sphinx wins on speed, stability, and lowest resource consumption.

  2. You probably should make a clearer distinction between Ferret and acts_as_ferret. Ferret is the underlying Ruby implementation of Lucene. It provides generic full-text searching. Acts_as_ferret is a nice Rails plugin that integrates Ferret with your Rails models. Your comments about indexing (including speed) really are particular to the way that acts_as_ferret is built. I can’t tell exactly, but the issue with multi-model search also may not be a Ferret issue per se.

    As for stability, we use Ferret (and a heavily modified acts_as_ferret) in production here at vodpod.com. Our approach is to perform indexing as a batch task and have Mongrel instances only serve search results. With this setup we very rarely see any Mongrel crashes (although I can’t say it never happens). Also, a good production setup should probably use a centralized search service accessed through DRb or HTTP, in which case Ferret faults will not affect your main server.

  3. I admit I may be missing something, but I don’t really see how to run the benchmark. I poked through the SVN repo and didn’t find the runnable code.

  4. Sujal: I had forgotten to check in the Rake script. It’s in now; you will need to install some dependencies, make sure the Ultrasphinx symlink is ok, and then run INDEX=1 rake benchmark.

    It would be great if people submitted patches for additional query types or engine optimizations.

    Scott: True, but in practice we have to use them together. Ferret and Solr are both datasource-agnostic, which means there has to be some other service filling the index queue. Sphinx is not source-agnostic, so it wouldn’t make sense to compare the plugins and the engines separately.

    If you are batching index updates, you might as well use Sphinx with delta support and cut out an entire layer of the stack. Also, I’m sure Ferret users would appreciate it if you released your modified acts_as_ferret.
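
    Delta support in Ultrasphinx looks roughly like this (a sketch; the option and task names are from memory, so treat them as approximate):

    class Verse < ActiveRecord::Base
      is_indexed :fields => ['content'], :delta => true
    end

    # then, from cron or similar:
    #   rake ultrasphinx:index:delta  # frequent; reindexes recent changes
    #   rake ultrasphinx:index:main   # occasional; full rebuild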

    Do you see a lot of Ferret daemon crashes? In my tests, it seemed like acts_as_ferret would lock Mongrel if it tried to start and the Ferret daemon was down.

    I agree that the multi-model search is technically an acts_as_ferret issue. Acts_as_solr doesn’t have this problem.

    Also, I did do the tests with the remote DRb server.

    Big thanks for sharing your Ferret experience.

  5. Awesome metrics, but I’d really like to get a feel for the resource usage of these three options. Really, just a rundown of the memory usage would be great.

  6. Thanks for adding the code. I was curious if you could provide a couple of other numbers (I need to install Sphinx and haven’t had a chance to do that yet, otherwise, I’d just run it myself).

    The reason I wanted to look at the code was to see how big a batch you were using in your Solr test.

    I won’t guess at what’s going on without running the test myself, but my first thought was that the memory is being used in different places by the different daemons, and the architecture of each daemon does change how you’d measure that.

    Solr’s rebuild method uses a ton of Ruby memory because it’s loading 4000 objects at a time, IIRC, into the Ruby VM and then pushing them out, so having the total memory used by the Ruby process would be interesting in that case.

    I’m less familiar with acts_as_ferret, so I can’t comment there. Assuming it works like Lucene in Java, though, I would guess that memory and speed might be affected by the Ruby VM’s GC and memory performance, too (not saying that this is why it’s slow, just that it’s possibly a factor).

    It’s been a while since we built our Sphinx-based solution (it’s sitting around until we launch our main app), but doesn’t its memory usage show up in the mysqld process?

    Curious what you’re seeing here, and thanks for doing this and working on Ultrasphinx and HMP. Love both plugins.

    We ended up using Solr in the app we’re building because we needed some analyzers that Sphinx doesn’t have yet (EdgeNGramTokenizer, for example).

  7. I was trying to measure the memory footprint of the query daemon. Sphinx in particular makes the indexer footprint uninteresting: the indexer is a separate process that runs and then exits, and it accepts an absolute memory cap, degrading performance if necessary rather than overstepping the bound.

    MySQL (or Postgres, for that matter) handles some of the Sphinx indexing load, as you note, but it also obeys a number of memory caps, so the indexing load would show up there, but not as memory usage.

    Solr seems severely handicapped by the speed at which Ruby can load records and ship them over. Maybe if you wrote some non-ActiveRecord script to populate the Solr index you could get better performance. Ferret seems daemon-bound. The Sphinx indexer is pretty balanced between loading from MySQL and actually computing the index.

    I really recommend that you install Sphinx and play with the benchmark app yourself; it’s easy.

    What does EdgeNGramTokenizer do, and where are you using it? I’m curious.

  8. EdgeNGramTokenizer builds substring indexes from one edge of a word/field. The main thing I’m using it for is a nice, fast little autocompleter that does good substring matches. For example, searching for Ev or We should offer Evan Weaver as a match.

    The tokenizer takes the stream and generates index entries for E, Ev, Eva, Evan, Evan W, etc. With some additional config you can do it starting with Evan Weaver or with Evan and Weaver as separate tokens (just by using the normal whitespace tokenizers before the EdgeNGram one).
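
    A toy illustration of what an edge n-gram tokenizer emits, in plain Ruby (just the concept; this is not how Solr implements it):

    def edge_ngrams(token, min_size = 1)
      (min_size..token.length).map { |i| token[0, i] }
    end

    edge_ngrams("Evan") # => ["E", "Ev", "Eva", "Evan"]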

    Google for “autocomplete” and “Solr” and that should turn up the same stuff I found.

    No phonetic/fuzzy matches, just straight, do-it-fast substring matches. Doing it in the search engine was the shortest path to getting the feature out and made the most sense.

  9. Ok, that’s a big name for a little concept.

    Sphinx supports infix/prefix indexing, which would store the data you want, but there’s no interface for getting it back independently of a particular record.

  10. So it does…hmmm.

    Getting the whole record isn’t a problem. I’m getting the whole record back from Solr, too (well, acts_as_solr is).

    Fast-ish? :-)

    I could probably do everything I’m using Solr for with Sphinx. I’ll have to do a feature bakeoff at some point. After we launch our site, though. Priorities.

  11. Have you heard of Xapian? It would be interesting to compare to it too. Thanks for the great article!

    PS: How do you generate those useful status lines: memory usage in kb: {:virtual=>35356, :total=>35688, :real=>332}?

  12. I don’t know anything about Xapian. It has Ruby bindings, but it looks like there’s no easy Rails integration. Maybe someone can submit a patch to the benchmark app that adds Xapian support. For that matter, we could probably add Hyper Estraier too. And there’s also Zebra but I don’t think it has Ruby bindings.

    For the memory usage lines, see the very bottom of benchmark.rake.
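
    It amounts to summing ps output for the daemon processes. A rough, hypothetical reconstruction (the real code is in benchmark.rake):

    # Sum RSS ("real") and VSZ ("virtual"), in kb, for processes matching
    # a pattern, e.g. /searchd/ for Sphinx. Note total = real + virtual.
    def memory_usage(pattern)
      rows = `ps -eo rss,vsz,command`.split("\n").grep(pattern)
      real, virtual = rows.map { |r| r.split.first(2).map { |n| n.to_i } }.
        inject([0, 0]) { |(rss, vsz), (r, v)| [rss + r, vsz + v] }
      {:virtual => virtual, :total => virtual + real, :real => real}
    end

    puts "memory usage in kb: #{memory_usage(/searchd/).inspect}"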

  13. I’ve been benchmarking Solr a decent amount lately, both with and without acts_as_solr, and acts_as_solr uses the slowest possible way to index. It defaults to indexing each record one at a time and then calling commit() on the Solr instance, which is insanely slow. If you override it to index a few hundred records at a time, it then spends longer deserializing Solr’s XML response into a Ruby array than Solr spends indexing.

    If you are willing to just dump all your records into Solr-format XML and then post that XML file to Solr with curl, you can bring your indexing time down to 1.2-2.5 times that of Sphinx, even when including the time spent writing the XML file.
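
    For concreteness, the bulk approach looks roughly like this. It’s a hypothetical sketch: the pk_i/content_t names follow acts_as_solr’s dynamic-field convention, and the port depends on your setup.

    require 'cgi'

    # Dump every record into a single Solr <add> file.
    File.open("verses.xml", "w") do |f|
      f.puts "<add>"
      Verse.find(:all).each do |v|
        f.puts "<doc><field name=\"pk_i\">#{v.id}</field>" +
               "<field name=\"content_t\">#{CGI.escapeHTML(v.content)}</field></doc>"
      end
      f.puts "</add>"
    end

    # Then post it and commit:
    #   curl http://localhost:8982/solr/update -H 'Content-Type: text/xml' \
    #     --data-binary @verses.xml
    #   curl http://localhost:8982/solr/update -H 'Content-Type: text/xml' \
    #     --data-binary '<commit/>'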

    Acts_as_solr also adds a before_save filter that delays every single AR::Base#save call until the server has indexed each record to Solr, one at a time, which also slows down actually using your app. Ugh. Rather than write my own cron jobs to send recent records to Solr for indexing, I think I’m just going to use Ultrasphinx.

  14. I did set the acts_as_solr batch size to 4000 in my benchmark, so it should be near the best performance the plugin currently gives. I didn’t realize that it wastes so much time parsing the Solr response.

    Per-doc indexers seem nice at first, but it seems like most people eventually move to an external batch process. With Rails at least, it’s just too slow and too unreliable to rely on container callbacks.

  15. I believe acts_as_ferret updates the index each time a change to the database is sent through the model. Does this make acts_as_ferret a better fit/easier to admin for a site with heavy database and search traffic?

    I’m a newbie to rails and have just implemented acts_as_ferret, and am wondering whether it makes sense to switch to Ultrasphinx.

  16. John: That’s what I was referring to in the last paragraph of my previous comment. The Rails callback seems good at first, but it’s actually worse, because it’s both a tighter and a less reliable coupling.

  17. On a PostgreSQL DB, I would be curious to know performance differences between TSearch/acts_as_tsearch and Sphinx/Ultrasphinx.

  18. Thanks for this benchmark. I have fairly good experience with Solr/acts_as_solr, which I used on a large job-search site (with good results on speed and stability), but I’m looking forward to trying out Sphinx/Ultrasphinx (by the way, does it have some support for facets and the like?).

  19. jney: Maybe I can make TSearch happen, or maybe you can.

    Luca: Ultrasphinx supports facets. The main limitation is that you can only get results for one facet group at a time; getting more requires multiple queries (not really a big deal in practice).
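
    A facet query looks something like this (a sketch; check the plugin docs for the exact option names):

    @search = Ultrasphinx::Search.new(:class_names => 'Verse',
      :query => 'god', :facets => ['book_id'])
    @search.run
    @search.facets # => facet values mapped to result counts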

  20. Here is the benchmark. It was an opportunity to try acts_as_tsearch.

    But I’m not sure about the results: I couldn’t figure out how to index numeric fields with acts_as_tsearch, so I just indexed content on the verses table.

    Sphinx
    
                        user   system    total        real
    verse:god       0.900000 0.180000 1.080000 ( 25.119664)
    verse:god:no_ar 0.590000 0.110000 0.700000 ( 27.899481)
    book:god        0.970000 0.130000 1.100000 ( 23.233440)
    all:god         0.820000 0.130000 0.950000 ( 18.713994)
    all:calves      1.030000 0.150000 1.180000 ( 24.452112)
    all:moreover    0.850000 0.140000 0.990000 ( 26.770875)
    result counts: [3595, 64, 3659, 9, 3]
    
    Tsearch
                        user   system    total        real
    verse:god       0.350000 0.020000 0.370000 ( 24.835954)
    book:god        0.160000 0.010000 0.170000 ( 0.358111)
    all:god         0.360000 0.020000 0.380000 ( 29.329491)
    all:calves      0.130000 0.020000 0.150000 ( 13.396735)
    all:moreover    0.140000 0.010000 0.150000 ( 14.048261)
    result counts: [4077, 23, 4077, 2, 1]
    
  21. jney: Thanks. Can you send me a diff of your changes to the benchmark app? Also, how does index time factor in? Try dropping the TSearch index and timing the recreate.

    Incidentally, those results seem pretty slow, considering that acts_as_tsearch presumably uses the existing database connection and doesn’t have to build and tear down a socket every time. What do you think about that?

    I’m working on rerunning everything with the Wikipedia dataset. It will be a while though because it’s humongous.

  22. Here is the diff.

    Note that I have commented out the Solr and Ferret declarations in the models, and moved the table declarations into the migration file.

    I re-ran the test with index rebuilding, and this time Sphinx is much more efficient:

    Tsearch
                              user     system      total        real
    reindex               0.000000   0.000000   0.000000 (  9.327885)
    verse:god             0.230000   0.020000   0.250000 ( 25.335588)
    book:god              0.290000   0.010000   0.300000 (  0.534942)
    all:god               0.230000   0.020000   0.250000 ( 31.835928)
    all:calves            0.130000   0.020000   0.150000 ( 13.870950)
    all:moreover          0.250000   0.020000   0.270000 ( 14.663606)
    result counts: [4077, 16, 4077, 2, 1]
    
    Sphinx
                              user     system      total        real
    reindex               0.000000   0.000000   0.010000 (  0.054219)
    verse:god             0.850000   0.170000   1.020000 ( 17.936695)
    verse:god:no_ar       0.550000   0.100000   0.650000 ( 15.710643)
    book:god              0.920000   0.130000   1.050000 ( 13.570308)
    all:god               0.800000   0.120000   0.920000 ( 17.987609)
    all:calves            0.860000   0.130000   0.990000 (  9.455561)
    all:moreover          0.950000   0.140000   1.090000 ( 18.283489)
    result counts: [3595, 64, 3659, 9, 3]
    index size:   0B        total
    

    I don’t know about the socket; you could probably answer that better than I can. But yes, the difference between TSearch and Sphinx is that Sphinx is an external daemon while TSearch is part of pgsql.

  23. You’re probably right; I feel stupid. But I did just wake up…

    Sphinx
                              user     system      total        real
    reindex               0.000000   0.000000   2.650000 ( 23.352280)
    verse:god             0.910000   0.170000   1.080000 ( 30.109867)
    verse:god:no_ar       0.590000   0.120000   0.710000 ( 29.060766)
    book:god              0.990000   0.140000   1.130000 ( 27.729506)
    all:god               0.830000   0.120000   0.950000 ( 22.954390)
    all:calves            0.920000   0.150000   1.070000 ( 27.429021)
    all:moreover          0.980000   0.150000   1.130000 ( 31.570498)
    result counts: [3595, 64, 3659, 9, 3]
    
    Tsearch
                              user     system      total        real
    reindex               0.000000   0.000000   0.000000 ( 12.749621)
    verse:god             0.240000   0.020000   0.260000 ( 45.059050)
    book:god              0.140000   0.010000   0.150000 (  0.408823)
    all:god               0.350000   0.010000   0.360000 ( 51.121836)
    all:calves            0.130000   0.020000   0.150000 ( 35.231199)
    all:moreover          0.140000   0.010000   0.150000 ( 35.676392)
    result counts: [4077, 7, 4077, 2, 1]

    My PostgreSQL db is getting slower and slower, but it’s a test config…

  24. Maybe you need to vacuum?

    Ultrasphinx might do some non-optimal things related to installing stored procedures, too; I dunno. Postgres is pretty fussy about that kind of thing. I don’t understand it very well.

  25. No, I already did that. This poor db suffers from abuse. Anyway, I’d like to hear your conclusions about TSearch vs. Sphinx, because I use TSearch on one project (without acts_as_tsearch, just stored procedures) and your (excellent) Ultrasphinx plugin on another.

  26. I think by the end of next week I should have the Wikipedia dataset results. I am waiting to get access to a bigger machine.

  27. Evan, jney… would love to see the Wikipedia dataset benchmark results on a comparable Ultrasphinx/TSearch (native) suite.

    Getting the configurations sane and comparable seems to be the crux, so I’ll leave it to you and your infinite search-benching wisdom ;)

  28. MySQL’s built-in FULLTEXT is normally an order of magnitude slower at runtime than all of these, and has worse-quality results. On the other hand, it’s very convenient.

  29. Evan, is your benchmark application still available? It’s down at the moment, and I would love to use it for some further benchmarking.

  30. Evan, could I ask for your help on a simple Ultrasphinx item? I am sure you are busy, but I haven’t had any replies on the forums, and I know you have the answer. It’s a simple question. I am very new to Ultrasphinx, and I simply need someone to look at my post. I would be willing to compensate you for your time, or buy you something (a book, etc.) that is of interest to you.

    http://railsforum.com/viewtopic.php?pid=95216#p95216

    Thanks!

    Chris