Francis Irving sent me a note about his work on a new Rails search plugin, acts_as_xapian. It uses the Xapian engine, which is a C++ indexer similar to Lucene. A particularly neat feature is built-in spellcheck.
I still plan to benchmark all these plugins on the Wikipedia dataset, but it's been delayed by the new job. If anyone has a big piece of iron I could use for a couple of weeks, I would appreciate it (16 GB RAM, hundreds of GB of free disk space, no production load).
If you do a new benchmark, please don't leave out acts_as_searchable/HyperEstraier.
At the moment, an updated, better-commented version can be found in my branch. Install instructions are here.
“If anyone has a big piece of iron I could use for a couple weeks I would appreciate it.”
Try an EC2 X-Large instance: $0.80 per hour (15 GB RAM, 4 cores).
I thought about that, but I’d rather not spend the $268 it takes to keep it running for two weeks, since I can’t finish the whole task at once.
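For reference, the $268 figure follows from the quoted $0.80 hourly rate, assuming the instance runs around the clock for the full 14 days:

$$
0.80\ \text{USD/h} \times 24\ \text{h/day} \times 14\ \text{days} = 268.80\ \text{USD}
$$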
Any results so far? We're really curious about the results here… :-)
I've used HyperEstraier before on a small dataset (fewer than 100,000 records) and it was fine. I did some complex filtering with it too, which worked a treat. The only problem is that _after_ I implemented it, I came across a number of reports that it has scaling issues, especially as it approaches 1 million entries: slowness, frequent long reindexing, etc. I didn't experience this myself, as our dataset never got anywhere near that large. I'd be interested to see how it performs with Wikipedia. I don't have any iron for you, though, sorry :-(
Acts_as_searchable is dead, long live search_do ;)
It basically does the same thing, but has a simpler architecture, far fewer bugs and quirks, and is 100% tested.
It's a cross-search-backend plugin, but so far the only implemented backend module is HyperEstraier.