top-secret tuned mysql configurations for rails

MySQL configuration is difficult, and a frequent source of mysterious performance problems.

target hardware

These configurations are for a typical small *nix server or VPS with 512MB or 1GB of RAM. The server is expected to act as the application server, httpd host, and DB server all at once, so MySQL needs to share. Connections are expected to occur on the local socket only. I am assuming you will run about 4 to 12 mongrels or fastcgi processes, and no memcached.

The files are also excellent for using on your development machine. Just divide the your RAM in half when choosing (to leave room for your heavyweight desktop apps).

which mysql version

Use MySQL version 5.0. Even 5.1 would be ok. Do not use 4.x.

Realize that the examples included with MySQL itself (my-medium.cnf, my-large.cnf, etc.) pretty much suck. Do not look to them for guidance. They might have made sense in 1998. It is no longer that year.

If you use Debian or Ubuntu, and installed MySQL via APT or deprec, you will get a strange Debian-specific table-checking configuration. Find the /etc/mysql/ folder and empty out the Debian scripts, turning them into no-ops:

sudo bash -c '> /etc/mysql/debian.cnf'
sudo bash -c 'echo \#\!/bin/bash > debian-start'

Now, onward.

choosing a storage engine

Deciding between MyISAM and InnoDB is a confusing and endless issue. Please read this exhilarating Choose Your Own Adventure novel, available at the special price of $0.00:

  • Are you confident in your MySQL abilities?
    • If no, use InnoDB.
    • If yes, keep reading.
  • Do you plan to keep good database backups?
    • If no, use InnoDB.
    • If yes, keep reading.
  • Do you have a typical content-based web app (a forum, for example—many SELECTs, some INSERTs, very few UPDATEs and DELETEs)?
    • If no, use InnoDB.
    • If yes, keep reading.
  • Do you plan to verify referential integrity in only the application layer?
    • If no—if you need real foreign keys and transactions—use InnoDB.
    • If yes, keep reading.
  • Do you need the maximum possible database speed?
    • If no, use InnoDB.
    • If yes, use MyISAM.

The end.

Note that lots of very large sites such as eBay, LiveJournal, and Gamespot verify integrity purely at the application layer (for Ebay, a source, for the other two, just hearsay, albeit reliable hearsay).

Also note that the InnoDB configurations I provide are more complicated than the MyISAM configurations. This is because InnoDB does more things for you. However, it can be slower than MyISAM for the typical web application.

backing up your existing data

If you have any data in your DB already that you need to keep, dump it with mysqldump, because your new setup won’t be compatible. Example:

mysqldump -u root --complete-insert --all-databases --comments > full_dump.sql

You can later restore it with:

cat full_dump.sql | mysql -u root

Note that if you are moving from InnoDB to MyISAM, you can have max key length errors, because InnoDB stores UTF8 keys in a more compact way. You could avoid this by using Latin1 instead, or perhaps by messing around with your index and key definitions. It’s a pain.

available configurations

Choose one of the files below, based on the storage engine you want and the amount of physical memory in your server:

Backup your existing my.cnf (usually in /etc/ or /etc/mysql/), copy the new configuration over it, and restart MySQL. You will have to run mysql_install_db again. Now load your data back in.

The configuration files are copyright 2007 Cloudburst, LLC and are licensed under the Academic Free License 3.0.

topic for debate

Would a typical small server like this be faster if we removed resources from MySQL and the mongrels, but added a memcached instance? What if we couldn’t page-cache?

shouts

Many thanks to Dormando, formerly of LiveJournal, who taught me much of what I know. The rest comes from bitter experience and reading the MySQL 5.0 docs.

bleak_house

It’s bleak to have leaks.

third update

Please see here for up-to-date documentation.

second update

There is now a pure-C heap instrumentation as well as the Ruby/ObjectSpace one. You really should be using the C version. It requires you to compile a custom binary, though. Just go to the plugin’s folder (vendor/plugins/bleak_house in your app, or bleak_house-5.1/ in your system’s gems/) and run:

sudo rake ruby:build

update

I had to junk Rublique because it was introducing its own leaks (via method unbinding) and used an unreliable delta algorithm. Instead the plugin now uses BleakHouse::MemLogger, which is faster and more accurate. Rublique gave me the original idea, so I can’t complain.

A gem version is now available, too. Install and then require 'bleak_house' in config/environment.rb. Usage is the same, but you have to manually install the Rake task in each app (the gem install message will explain).

postscript

You can report any issues on the forum. Also, Chris Carter says BleakHouse is starting a trend in emo Ruby-naming.

That’s all. Go scale something already.

i thought about this thing…

A small blog digest for you.

dependency troubles

Rails’ dependency loading in development mode has been making me unhappy. There are existing reports (6720, 6942), but the behavior persists. Other people are experiencing similar problems.

For example, rake test loads classes differently than (and incompatible with) regular ruby test_file.rb. Instance methods sometimes disappear from model classes. And association targets get instantiated with broken classes after the first request: User expected, got User. I can reproduce the behavior but I can’t always explain it.

missing generators

Be aware that in gem Rails, generators defined in plugins won’t get recognized if the plugin is symlinked into vendor/plugins, even though the rest of the the plugin will work fine. It’s an issue with the PathSource class, fixed in r. 6101.

polymorphs forum

The has_many_polymorphs forum has been buzzing with red bees lately; you might find honey.

camping is speaking loudly into the phone

Also, ActiveSupport and Camping had an argument. Jeremy McAnally and I fought a bug for a while—you can’t extend your Camping app’s main module from outside the module itself. But you should be able to, so why?

scaling debate

Regarding optimizing database access, nothing personal, but I don’t like Dr. Nic’s way. Revolution’s is better. (Although I don’t understand why the existing MySQL solutions weren’t the first resort.) The cost of the app servers is Twitter’s current swinging bridge, though.

David’s blog is flooded by trolls.

san francisco

I was in San Francisco all week at CNET and met some new Bay Rubyists. It was fun. On the plane back there were storms, and also a guy watched me program because he had never seen it done before.

snails conference

RailsConf is coming up. Some dude with the same name as me is giving a talk:

   Going Off Grid: Rails as Platform

   Friday, May 18, 2007

   11:45AM – 12:35PM

   Oregon Ballroom 203

You can come and heckle about polymorphs bugs, assuming you don’t want to hear Alan Francis on Agile, or Scott Raymond on REST, both in the same slot.

I’m not much for methodologies, so if you definitely don’t want to hear about Agile, come to mine. My methodology is to keep it simple, work hard, and always be learning.

This guy agrees.

dumb multi-file find and replace

puts “Done.”

update

Mike suggested rpl, below, and it seems good. This makes our script:

#!/usr/bin/env ruby
puts "Not enough args" or exit unless (A = ARGV)[1]
A.map!{|s| s.inspect[1..-2]}
formats = [".rhtml", ".rb", ".yml", ".rjs", "Rakefile"]
formats.map!{|s| " -x'#{s}'"}
system "rpl -Re #{formats} '#{A[0]}' '#{A[1]}' #{A[2] or '*'}"

I’m still using Ruby as a wrapper; it’s easiest.

dependency injection for rails models

The polymorphs plugin dynamically injects methods into child models. This means that if you referenced a child model before the parent was loaded, the methods would be missing.

inversion of control or what have you

I solved the problem by adding a dependency injection mechanism. Here’s the entire code:

module Dependencies
  mattr_accessor :injection_graph
  self.injection_graph = Hash.new([])

  def inject_dependency(target, *requirements)
    target, requirements = target.to_s, requirements.map(&:to_s)
    injection_graph[target] =
      ((injection_graph[target] + requirements).uniq - [target])
    requirements.each {|requirement| mark_for_unload requirement }
  end

  def new_constants_in_with_injection(*descs, &block)
    returning(new_constants_in_without_injection(*descs, &block)) do |found|
      found.each do |constant|
        injection_graph[constant].each {|req| req.constantize}
      end
    end
  end
  alias_method_chain :new_constants_in, :injection
end

explanation, usage

See what it does? Imagine that the Tag and Tagging classes modify the Recipe class (by injecting a new method, or relationship, or something). Normally in development mode if Recipe gets reloaded, the injections will get lost, since Rails doesn’t know it has to re-evaluate Tag and Tagging after Recipes is refreshed. But now we can do as follows:

Dependencies.inject_dependency("Recipe", "Tag", "Tagging")

This way, when the Recipe constant gets refreshed, Tag and Tagging will also get refreshed, and can go patch up Recipe again. Because constantize() doesn’t reload the same constant multiple times, there is no danger of infinite cycles.

remember, only for development mode

This is not at all useful in production mode, since classes aren’t reloaded. But in development mode it can make a big difference in sanity.

If there is interest I can release it as a separate plugin.

postscript: on the plugin boot process

Also in version 27.1, there is some method chaining to let the plugin finish booting itself after the config.after_initialize block runs. This is useful because users are supposed to set plugin start-up options in config.after_initialize (see here).

The startup sequence is like so (compressed from railties/lib/initializer.rb):

def process
    load_environment  # environment.rb
    load_plugins # init.rb for your plugin
    load_observers
    initialize_routing
    after_initialize # where your user configures your plugin
end

What if things in init.rb need to know the configuration options? Check lib/has_many_polymorphs/autoload.rb for an example of a fix.

Unfortunately config.after_initialize doesn’t allow multiple blocks the way Dispatcher.to_prepare does. There is a Rails patch waiting to happen here…

add gud spelning to ur railz app or wharever

I cleaned up the raspell gem a little bit and put it on Rubyforge. It’s a Ruby interface to the GNU Aspell spellcheck library:

instalnation

Mac people:

sudo port install aspell aspell-dict-en

Or, Ubuntu people:

sudo apt-get install aspell libaspell-dev aspell-en

Then:

sudo gem install raspell

Of course you shouldn’t install the gem without trusting me and/or auditing the source code, and also making sure your DNS isn’t poisoned with regard to the Rubyforge mirrors, and that the Rubyforge mirrors themselves haven’t been hacked.

Yeah.

usige

The above spell.rb file for your app:

require 'rubygems'
require 'raspell'

module Spell
  SP = Aspell.new("en")
  SP.suggestion_mode = Aspell::NORMAL
  SP.set_option("ignore-case", "false")

  def self.correct string
     string.gsub(/[\w\']+/) do |word|
       not SP.check(word) and SP.suggest(word).first or word
     end
  end
end

thuoghts

If you just need a “did you mean X” method, this is a great start. Of course sometimes Aspell gets it wrong. However, there are lots of options you can set, and different ways you can report the word possibilities, so you might be able to tune it to your specific situation.

Aspell supports custom wordlists and custom stemming and all kinds of cool stuff. Check the gem’s README and Aspell’s manual for more details.

audit your gems

Gems don’t have to be trustworthy:

prevention?

Can you find a way to inspect the gem without installing it? Somehow the file doesn’t appear in the project list. And gem unpack wants it installed first. Would the unpacked version even tell you anything?

What if I had slipped this into cgi_multipart_eof_fix, one of the most-downloaded gems? What if someone had compromised my Rubyforge account, and they did it?

Audit your gems. Update specific gems when you need new features, and avoid sudo gem update. Gems and Rubyforge are a great convenience. But know who you’re trusting.

make python quit like a normal person

Let’s add quit and exit command support to the Python interactive interpreter. We’ll ignore all the arguments for and against. It’s just something we want.

problem

You’ve seen this before:

mackenzie:~ eweaver$ python
Python 2.5 (r25:51908, Feb 25 2007, 06:35:16)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for
more information.
>>> quit
'Use Ctrl-D (i.e. EOF) to exit.'
>>> 

I’ll avoid editorializing, except to note:

>>> help()
Welcome to Python 2.5!  This is the online help utility.
To quit this help utility and return to the interpreter,
just type "quit".
help> quit
You are now leaving help and returning to the Python
interpreter.
>>> quit
'Use Ctrl-D (i.e. EOF) to exit.'
>>>

Hrm.

solution

Rubyists may find this very beautiful. First, in ~/.bash_profile:

export PYTHONSTARTUP=~/.pythonrc

Now, in ~/.pythonrc:

class Quit:
  def __repr__(self):
    import sys
    sys.exit()
exit = quit = Quit()
del Quit

Source your ~/.bash_profile or restart your shell. Try it out:

mackenzie:~ eweaver$ python2.5
Python 2.5 (r25:51908, Feb 25 2007, 06:35:16)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for
more information.
>>> quit
mackenzie:~ eweaver$ 

Hax!

three thoughts

A Ruby version:

quit = Object.new
def quit.inspect
  exit
end

I don’t know how to define eigenmethods in Python. Can someone fill me in? They were talking about borgs.

The Python solution is gleaned from this bug report. Contrary to what arigo says, I haven’t noticed any problems with displaying the builtins dict; dir() returns strings anyway. The issue is that you can’t do the rough equivalent of (method :quit).inspect.

an aside

I added an all articles view to Snax to make it easier to find old posts. Also the category pages use the new concise view.

how to make a changeset-preserving svn mirror

I have a private svn repository, as well as a Trac instance, that I use for almost everything. I also have a bunch of Rubyforge projects with their own repositories (Polymorphs, Allison, Fauna). Since the Trac changeset browser is so nice, I wanted to be able to use it to browse my Rubyforge projects with changesets and commit messages intact.

install dependencies

Make sure you have Perl (quiet, you in the back), and then set up CPAN:

sudo apt-get install perl
sudo cpan

CPAN will ask you a billion install questions. Answer them sanely. Then when you are at the CPAN prompt, you can install SVN::Mirror:

force install SVN::Mirror

CPAN is not done with the questioning; it’s very social. It may ask you things like “Which directory for UUID store? [/tmp]” or “Ah, I see you already have installed libnet before; update your configuration?”. Also notice that it runs all the tests on install. Eventually it will finish. Type exit.

write the mirror script

Hold your breath; we need to write the updater script that we can call periodically with cron:

mirror_svn
#!/usr/bin/env perl

use strict;
use warnings;
use SVN::Mirror;

my $url = "svn://rubyforge.org/var/svn";
my $repository = "/svn";
my @projects = ("fauna", "allison", "polymorphs");

foreach (@projects) {
  my %opts = (source => "$url/$_",
    target => $repository,
    target_path => "/rubyforge/$_");
  $opts{'skip_to'} = 1 if ($ARGV[0] && $ARGV[0] eq "--init");
  my $mirror = SVN::Mirror->new(%opts);
  $mirror->init;
  $mirror->run;
}

I made you guys use $_. Just because.

Note that $repository refers to the real repository on your server, not a local checkout. The script must be run on the server that houses the actual repository.

initial import

Rubyforge has a tendency to drop the connection, and we want to make sure the initial import finishes ok because it could be lengthy and a race condition could result if two cron updates overlap. Set the executable bit on the script, then run it by hand:

chmod u+x mirror_svn
./mirror_svn --init

Ignore the “Network connection closed unexpectedly” errors. Keep repeating until it finishes cleanly.

schedule a cronjob

Now we can run every two minutes to import any new changesets. Add the following cronjob:

*/2 * * * * /path/to/mirror_svn &> /tmp/mirror_svn.log

Sit back and the changes will roll in.

troubleshooting

You may get an error like so:

Waiting for sync lock on /rubyforge/fauna: server:21338.

This means the mirror script was forcefully killed (probably by you, playing fast and loose with CTRL-C, jerkface). To fix it, temporarily disable the cronjob, kill any running mirror_svn processes, and then find your Mirror.pm file at /usr/local/share/perl/5.8.7/SVN/Mirror.pm or similar, and change line 518 to:

        while (0) {

This will ignore the lock. Now run the script once by hand, then change the file back, and re-enable the cronjob. A little icky, but it works.

Also, if you delete a mirror, make sure you remove any out-of-date svm:mirror properties from the repository root. That’s not a typo; svm:mirror is the correct name.

It would be nice to be able to make commits to the local repository and push the changes back out to the remote. I spent some time trying with svm mergeback and also SVN::Pusher. Nothing even partially worked, and every attempt broke my mirror setup.

trac considerations

Note there is a rare chance that Trac will not display your changesets after a certain point if they get added in reverse-temporal order. This can happen in the following scenario:

  • mirror update runs
  • someone commits to the remote repository
  • someone commits right away to the local repository
  • mirror update runs
  • mirror commits the remote changes into the local repository

The best solution to this is to just run the mirror update very frequently. Either way, your data is always safe. It’s just a Trac display issue.

conclusion

Why not some other tool? I couldn’t use svnsync because it requires a separate brand-new repository for every mirror source, and Trac doesn’t support external or multi-repository browsing. SVK is too much to learn and too invasive of my normal workflow. And Piston doesn’t preserve changesets.

Go Perl.

sti abuse

I’ve noticed people misusing Rails’ single-table inheritance recently, with negative effects on maintainability. Admittedly, it is tempting to regard the class of a record as a piece of record data. But this is wrong. Instead, think of the class as a handle to a set of behaviors (sounds like duck typing, doesn’t it).

good sti

Consider a person and a dog. They both can eat things. They both can walk. But if you Dog#walk, you need to find someone with a leash, but if you Person#walk, the person walks on their own. The difference in the walking is at the model level, and sending respond_to? :walk doesn’t tell you what you need to know to walk properly. So we have to distinguish based on the behavior, rather than the data. Since STI is the only inheritance ActiveRecord supports, we must use STI to make that distinction.

bad sti

But say we have two styles of reports: receipts and invoices. Their behavior is identical (I know you’ve seen empty STI child classes too). However, some you render with the heading “Receipt” and some with “Invoice”. In this case the branch takes place in your view. So just use a report_name field and render that directly, or call a partial based on the name. Nothing here is related to the behavior of the record.

STI just confuses things in this situation. You end up using class constants as backhanded curried finders. Associations become unclear, and obj.class.name calls proliferate. Worse yet, you start writing separate routes and controllers for identical classes, and that soon leads to very wet views or strange hacks to share views among controllers.

But what if invoices have more fields than receipts? Is it “dry” to duplicate columns in separate tables? It is, because then you don’t have nulled columns in the base table. Ultimately, though, I usually recommend using a single, more abstract model in the first place, perhaps with a non-behavioral type field, and just not rendering irrelevant fields unless appropriate.

conclusion

Inheritance is about behavior. Not data. The data is secondary. If your schema has models that are basically the same, unify them. For maintainability’s sake, branch at the last possible moment, in the view.

Basically, STI has a sweet spot. You need to hit it exactly or you will be in for a rough time. Here is the same thought from the other direction; that is, don’t try to mix in behavior based on data.

That way lies madness, he says.

Stay in the middle. Don’t wobble the boat.