bleak_house

It’s bleak to have leaks.

third update

Please see here for up-to-date documentation.

second update

There is now a pure-C heap instrumentation mode in addition to the Ruby/ObjectSpace one. You really should be using the C version, although it requires you to compile a custom Ruby binary. Just go to the plugin’s folder (vendor/plugins/bleak_house in your app, or bleak_house-5.1/ in your system’s gems/) and run:

sudo rake ruby:build

update

I had to junk Rublique because it was introducing its own leaks (via method unbinding) and used an unreliable delta algorithm. Instead the plugin now uses BleakHouse::MemLogger, which is faster and more accurate. Rublique gave me the original idea, so I can’t complain.

A gem version is now available, too. Install it, then require 'bleak_house' in config/environment.rb. Usage is the same, but you have to install the Rake task manually in each app (the gem’s install message explains how).

postscript

You can report any issues on the forum. Also, Chris Carter says BleakHouse is starting a trend in emo Ruby-naming.

That’s all. Go scale something already.

i thought about this thing…

A small blog digest for you.

dependency troubles

Rails’ dependency loading in development mode has been making me unhappy. There are existing reports (6720, 6942), but the behavior persists. Other people are experiencing similar problems.

For example, rake test loads classes differently from (and incompatibly with) a plain ruby test_file.rb run. Instance methods sometimes disappear from model classes. And after the first request, association targets get instantiated with broken classes: User expected, got User. I can reproduce the behavior, but I can’t always explain it.
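The “User expected, got User” message is easier to picture with a small plain-Ruby sketch. No Rails is involved, so treat this as an illustration of one possible cause rather than a diagnosis of the tickets above. Development-mode reloading removes a constant and defines it again, which creates a new Class object under the old name; anything still holding the old class, such as a cached association target, no longer passes an is_a? check against the new one.

```ruby
class User; end
stale = User.new   # built from the first User class object

# Roughly what reloading does: drop the constant, then define it again.
# The second definition creates a brand-new Class object named "User".
Object.send(:remove_const, :User)
class User; end

puts stale.class.name          # the old class still reports the name "User"
puts stale.is_a?(User)         # false: User now refers to the new class
puts stale.class.equal?(User)  # false: two distinct Class objects
```

Both classes print as User, which is why the error message reads like a contradiction.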

missing generators

Be aware that in gem Rails, generators defined in plugins won’t get recognized if the plugin is symlinked into vendor/plugins, even though the rest of the plugin will work fine. It’s an issue with the PathSource class, fixed in r6101.

polymorphs forum

The has_many_polymorphs forum has been buzzing with red bees lately; you might find honey.

camping is speaking loudly into the phone

Also, ActiveSupport and Camping had an argument. Jeremy McAnally and I fought a bug for a while—you can’t extend your Camping app’s main module from outside the module itself. But you should be able to, so why?

scaling debate

Regarding optimizing database access, nothing personal, but I don’t like Dr. Nic’s way. Revolution’s is better. (Although I don’t understand why the existing MySQL solutions weren’t the first resort.) The cost of the app servers is Twitter’s current swinging bridge, though.

David’s blog is flooded by trolls.

san francisco

I was in San Francisco all week at CNET and met some new Bay Rubyists. It was fun. On the plane back there were storms, and also a guy watched me program because he had never seen it done before.

snails conference

RailsConf is coming up. Some dude with the same name as me is giving a talk:

   Going Off Grid: Rails as Platform

   Friday, May 18, 2007

   11:45AM – 12:35PM

   Oregon Ballroom 203

You can come and heckle about polymorphs bugs, assuming you don’t want to hear Alan Francis on Agile, or Scott Raymond on REST, both in the same slot.

I’m not much for methodologies, so if you definitely don’t want to hear about Agile, come to mine. My methodology is to keep it simple, work hard, and always be learning.

This guy agrees.

dumb multi-file find and replace

puts "Done."

update

Mike suggested rpl, below, and it seems good. This makes our script:

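#!/usr/bin/env ruby
# Usage: script SEARCH REPLACE [FILES]   (FILES defaults to *)
puts "Not enough args" or exit unless (A = ARGV)[1]
A.map!{|s| s.inspect[1..-2]}   # escape special characters, minus the quotes
formats = [".rhtml", ".rb", ".yml", ".rjs", "Rakefile"]
formats.map!{|s| " -x'#{s}'"}  # one rpl -x flag per file suffix
# -R: recurse into directories, -e: interpret escape sequences
system "rpl -Re #{formats.join} '#{A[0]}' '#{A[1]}' #{A[2] or '*'}"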

I’m still using Ruby as a wrapper; it’s easiest.
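If rpl isn’t installed, the same job can be done in Ruby alone. This is a minimal sketch, not the original script: replace_in_files is a hypothetical helper that walks a directory tree, keeps files whose names end in one of the given suffixes (mirroring the formats list above), and does a literal search-and-replace.

```ruby
# Hypothetical helper: literal find-and-replace across a directory tree,
# limited to files whose names end with one of the given suffixes.
# Returns the list of files that were changed.
def replace_in_files(root, from, to, suffixes)
  changed = []
  Dir.glob(File.join(root, "**", "*")).each do |path|
    next unless File.file?(path)
    next unless suffixes.any? { |suffix| path.end_with?(suffix) }
    text = File.read(path)
    next unless text.include?(from)
    # Block form: `to` is inserted verbatim, with no backslash escapes.
    File.write(path, text.gsub(from) { to })
    changed << path
  end
  changed
end
```

For example, replace_in_files(".", "OldName", "NewName", [".rhtml", ".rb", ".yml", ".rjs", "Rakefile"]). Unlike the rpl version, it does no escape-sequence interpretation, which keeps it predictable but less flexible.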

dependency injection for rails models

The polymorphs plugin dynamically injects methods into child models. This means that if you referenced a child model before the parent was loaded, the methods would be missing.

inversion of control or what have you

I solved the problem by adding a dependency injection mechanism. Here’s the entire code:

module Dependencies
  mattr_accessor :injection_graph
  self.injection_graph = Hash.new([])

  def inject_dependency(target, *requirements)
    target, requirements = target.to_s, requirements.map(&:to_s)
    injection_graph[target] =
      ((injection_graph[target] + requirements).uniq - [target])
    requirements.each {|requirement| mark_for_unload requirement }
  end

  def new_constants_in_with_injection(*descs, &block)
    returning(new_constants_in_without_injection(*descs, &block)) do |found|
      found.each do |constant|
        injection_graph[constant].each {|req| req.constantize}
      end
    end
  end
  alias_method_chain :new_constants_in, :injection
end

explanation, usage

See what it does? Imagine that the Tag and Tagging classes modify the Recipe class (by injecting a new method, or relationship, or something). Normally in development mode, if Recipe gets reloaded, the injections will get lost, since Rails doesn’t know it has to re-evaluate Tag and Tagging after Recipe is refreshed. But now we can do as follows:

Dependencies.inject_dependency("Recipe", "Tag", "Tagging")

This way, when the Recipe constant gets refreshed, Tag and Tagging will also get refreshed, and can go patch up Recipe again. Because constantize() doesn’t reload the same constant multiple times, there is no danger of infinite cycles.
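To see why the cycle guard matters, here is a plain-Ruby sketch of the same bookkeeping with Rails taken out. InjectionGraph and its reload method are hypothetical: reload only returns the order in which constants would be re-evaluated, and its seen list stands in for constantize refusing to reload a constant twice.

```ruby
class InjectionGraph
  def initialize
    # Missing keys read as a fresh empty array (nothing is stored).
    @graph = Hash.new { [] }
  end

  # Same rule as inject_dependency: whenever `target` is reloaded, the
  # `requirements` must be re-evaluated too. A constant never requires itself.
  def inject(target, *requirements)
    target, requirements = target.to_s, requirements.map(&:to_s)
    @graph[target] = (@graph[target] + requirements).uniq - [target]
  end

  # Depth-first walk from `name`. Anything already in `seen` is skipped,
  # so cycles such as Recipe -> Tag -> Recipe terminate.
  def reload(name, seen = [])
    return seen if seen.include?(name)
    seen << name
    @graph[name].each { |requirement| reload(requirement, seen) }
    seen
  end
end

graph = InjectionGraph.new
graph.inject("Recipe", "Tag", "Tagging")
graph.inject("Tag", "Recipe")          # a cycle back to Recipe
p graph.reload("Recipe")
```

Even with Tag pointing back at Recipe, reload visits Recipe, Tag, and Tagging once each and stops.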

remember, only for development mode

This is not at all useful in production mode, since classes aren’t reloaded. But in development mode it can make a big difference in sanity.

If there is interest I can release it as a separate plugin.

postscript: on the plugin boot process

Also in version 27.1, there is some method chaining to let the plugin finish booting itself after the config.after_initialize block runs. This is useful because users are supposed to set plugin start-up options in config.after_initialize (see here).

The startup sequence is like so (compressed from railties/lib/initializer.rb):

def process
  load_environment    # environment.rb
  load_plugins        # init.rb for your plugin
  load_observers
  initialize_routing
  after_initialize    # where your user configures your plugin
end

What if things in init.rb need to know the configuration options? Check lib/has_many_polymorphs/autoload.rb for an example of a fix.
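The method chaining mentioned above has roughly this shape. The sketch is a hypothetical stand-in (FakeInitializer is not the real Rails initializer, and this is not the actual autoload.rb code): the plugin aliases after_initialize and wraps it, so its own final boot step runs only after the user’s configuration has been applied.

```ruby
# Hypothetical stand-in for the initializer's startup sequence.
class FakeInitializer
  attr_reader :log

  def initialize
    @log = []
  end

  def process
    load_plugins      # the plugin's init.rb runs here...
    after_initialize  # ...but the user's configuration only arrives here
  end

  def load_plugins
    @log << :load_plugins
  end

  def after_initialize
    @log << :user_config
  end
end

# What the plugin does from init.rb: chain onto after_initialize so the
# rest of its boot happens once the user's options are in place.
class FakeInitializer
  alias_method :after_initialize_without_plugin, :after_initialize

  def after_initialize
    after_initialize_without_plugin
    @log << :plugin_finish_boot
  end
end

init = FakeInitializer.new
init.process
p init.log
```

This is the same alias-and-wrap pattern that alias_method_chain automates.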

Unfortunately config.after_initialize doesn’t allow multiple blocks the way Dispatcher.to_prepare does. There is a Rails patch waiting to happen here…
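One possible shape for that patch, as a hypothetical sketch (MultiBlockConfig is invented for illustration and is not Rails’ configuration class): keep every after_initialize block in an array instead of a single stored block, then run them all in the order they were registered.

```ruby
# Hypothetical configuration object that accepts any number of
# after_initialize blocks.
class MultiBlockConfig
  def initialize
    @after_initialize_blocks = []
  end

  # Register a block; each call adds to the list rather than replacing it.
  def after_initialize(&block)
    @after_initialize_blocks << block if block
  end

  # Called once at the end of startup: run every block in order.
  def run_after_initialize
    @after_initialize_blocks.each { |block| block.call }
  end
end

config = MultiBlockConfig.new
config.after_initialize { puts "first block" }
config.after_initialize { puts "second block" }
config.run_after_initialize
```

With a list like this, a plugin could register its own boot step without clobbering the user’s block.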

add gud spelning to ur railz app or wharever

I cleaned up the raspell gem a little bit and put it on Rubyforge. It’s a Ruby interface to the GNU Aspell spellcheck library:

instalnation

Mac people:

sudo port install aspell aspell-dict-en

Or, Ubuntu people:

sudo apt-get install aspell libaspell-dev aspell-en

Then:

sudo gem install raspell

Of course you shouldn’t install the gem without trusting me and/or auditing the source code, and also making sure your DNS isn’t poisoned with regard to the Rubyforge mirrors, and that the Rubyforge mirrors themselves haven’t been hacked.

Yeah.

usige

Here is a spell.rb file for your app:

require 'rubygems'
require 'raspell'

module Spell
  SP = Aspell.new("en")
  SP.suggestion_mode = Aspell::NORMAL
  SP.set_option("ignore-case", "false")

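  def self.correct string
    string.gsub(/[\w\']+/) do |word|
      # keep correctly spelled words; otherwise take Aspell's first
      # suggestion, or the original word if there are no suggestions
      SP.check(word) ? word : (SP.suggest(word).first || word)
    end
  end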
end
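Since raspell needs the Aspell C library, here is a dependency-free sketch of the same correct() logic for experimenting. StubSpeller is hypothetical: it fakes only the two calls Spell.correct uses, check and suggest, with a tiny hard-coded word list, so its results say nothing about Aspell’s actual suggestions.

```ruby
# Hypothetical stand-in for the Aspell object: only #check and #suggest.
class StubSpeller
  WORDS = %w[add good spelling to your rails app]
  FIXES = { "gud" => ["good"], "spelning" => ["spelling"], "railz" => ["rails"] }

  def check(word)
    WORDS.include?(word.downcase)
  end

  def suggest(word)
    FIXES.fetch(word.downcase, [])  # no suggestions -> empty array
  end
end

module StubSpell
  SP = StubSpeller.new

  # Same logic as Spell.correct: keep words that pass the check, otherwise
  # take the first suggestion, falling back to the original word.
  def self.correct(string)
    string.gsub(/[\w']+/) do |word|
      SP.check(word) ? word : (SP.suggest(word).first || word)
    end
  end
end

puts StubSpell.correct("add gud spelning to ur railz app")
```

Note that “ur” survives untouched: it fails the check, but there is no suggestion to replace it with.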

thuoghts

If you just need a “did you mean X” method, this is a great start. Of course sometimes Aspell gets it wrong. However, there are lots of options you can set, and different ways you can report the word possibilities, so you might be able to tune it to your specific situation.

Aspell supports custom wordlists and custom stemming and all kinds of cool stuff. Check the gem’s README and Aspell’s manual for more details.

audit your gems

Gems don’t have to be trustworthy:

prevention?

Can you find a way to inspect the gem without installing it? Somehow the file doesn’t appear in the project list. And gem unpack wants it installed first. Would the unpacked version even tell you anything?

What if I had slipped this into cgi_multipart_eof_fix, one of the most-downloaded gems? What if someone had compromised my Rubyforge account and done it themselves?

Audit your gems. Update specific gems when you need new features, and avoid sudo gem update. Gems and Rubyforge are a great convenience. But know who you’re trusting.

make python quit like a normal person

Let’s add quit and exit command support to the Python interactive interpreter. We’ll ignore all the arguments for and against. It’s just something we want.

problem

You’ve seen this before:

mackenzie:~ eweaver$ python
Python 2.5 (r25:51908, Feb 25 2007, 06:35:16)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for
more information.
>>> quit
'Use Ctrl-D (i.e. EOF) to exit.'
>>> 

I’ll avoid editorializing, except to note:

>>> help()
Welcome to Python 2.5!  This is the online help utility.
To quit this help utility and return to the interpreter,
just type "quit".
help> quit
You are now leaving help and returning to the Python
interpreter.
>>> quit
'Use Ctrl-D (i.e. EOF) to exit.'
>>>

Hrm.

solution

Rubyists may find this very beautiful. First, in ~/.bash_profile:

export PYTHONSTARTUP=~/.pythonrc

Now, in ~/.pythonrc:

class Quit:
  def __repr__(self):
    import sys
    sys.exit()
exit = quit = Quit()
del Quit

Source your ~/.bash_profile or restart your shell. Try it out:

mackenzie:~ eweaver$ python2.5
Python 2.5 (r25:51908, Feb 25 2007, 06:35:16)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for
more information.
>>> quit
mackenzie:~ eweaver$ 

Hax!

three thoughts

A Ruby version:

quit = Object.new
def quit.inspect
  exit
end

I don’t know how to define eigenmethods in Python. Can someone fill me in? They were talking about borgs.

The Python solution is gleaned from this bug report. Contrary to what arigo says, I haven’t noticed any problems with displaying the builtins dict; dir() returns strings anyway. The issue is that you can’t do the rough equivalent of (method :quit).inspect.
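Both versions rely on the same thing: an interactive interpreter evaluates your line and then displays the result by calling inspect on it (repr in Python), so a value whose inspect exits the process quits as soon as it is displayed. Here is a plain-Ruby sketch of that display step; repl_step is a hypothetical stand-in for one turn of irb’s loop, not irb’s actual code.

```ruby
# Hypothetical stand-in for one REPL turn: evaluate the line in the
# session's binding, then produce the text the REPL would print.
def repl_step(line, session)
  eval(line, session).inspect
end

quit = Object.new
def quit.inspect   # singleton method on this one object only
  exit             # raises SystemExit, which ends the process if uncaught
end

session = binding  # captures the local variable quit

puts repl_step("1 + 1", session)
begin
  repl_step("quit", session)
rescue SystemExit
  puts "quit was displayed, so the session exited"
end
```

Evaluating quit by itself does nothing special; the exit happens only when the REPL asks the object to describe itself.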

an aside

I added an all articles view to Snax to make it easier to find old posts. Also the category pages use the new concise view.

how to make a changeset-preserving svn mirror

I have a private svn repository, as well as a Trac instance, that I use for almost everything. I also have a bunch of Rubyforge projects with their own repositories (Polymorphs, Allison, Fauna). Since the Trac changeset browser is so nice, I wanted to be able to use it to browse my Rubyforge projects with changesets and commit messages intact.

install dependencies

Make sure you have Perl (quiet, you in the back), and then set up CPAN:

sudo apt-get install perl
sudo cpan

CPAN will ask you a billion install questions. Answer them sanely. Then when you are at the CPAN prompt, you can install SVN::Mirror:

force install SVN::Mirror

CPAN is not done with the questioning; it’s very social. It may ask you things like “Which directory for UUID store? [/tmp]” or “Ah, I see you already have installed libnet before; update your configuration?”. Also notice that it runs all the tests on install. Eventually it will finish. Type exit.

write the mirror script

Hold your breath; we need to write the updater script that we can call periodically with cron:

mirror_svn
#!/usr/bin/env perl

use strict;
use warnings;
use SVN::Mirror;

my $url = "svn://rubyforge.org/var/svn";
my $repository = "/svn";
my @projects = ("fauna", "allison", "polymorphs");

foreach (@projects) {
  my %opts = (source => "$url/$_",
    target => $repository,
    target_path => "/rubyforge/$_");
  $opts{'skip_to'} = 1 if ($ARGV[0] && $ARGV[0] eq "--init");
  my $mirror = SVN::Mirror->new(%opts);
  $mirror->init;
  $mirror->run;
}

I made you guys use $_. Just because.

Note that $repository refers to the real repository on your server, not a local checkout. The script must be run on the server that houses the actual repository.

initial import

Rubyforge has a tendency to drop the connection, and the initial import can be lengthy. We want to make sure it finishes cleanly before cron takes over, because a race condition could result if two cron updates overlapped. Set the executable bit on the script, then run it by hand:

chmod u+x mirror_svn
./mirror_svn --init

Ignore the “Network connection closed unexpectedly” errors. Keep repeating until it finishes cleanly.
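If you’d rather not babysit the terminal, the repeat-until-clean loop can be scripted. This is a small sketch; run_until_clean is a hypothetical helper, and it assumes mirror_svn exits with a nonzero status when the connection drops (check that before relying on it).

```ruby
# Call the block until it reports success, giving up after max_attempts.
# Returns the attempt number that succeeded, or nil if none did.
def run_until_clean(max_attempts)
  1.upto(max_attempts) do |attempt|
    return attempt if yield(attempt)
  end
  nil
end

# Usage (assumes a nonzero exit status means the import did not finish):
#   run_until_clean(20) { system("./mirror_svn", "--init") }
```

system returns true only when the command exits with status zero, which is what the loop keys on.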

schedule a cronjob

Now we can run the script every two minutes to import any new changesets. Add the following cronjob:

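*/2 * * * * /path/to/mirror_svn > /tmp/mirror_svn.log 2>&1

Cron runs jobs with /bin/sh, so the redirect uses the portable 2>&1 form rather than bash’s &>.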
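The setup above relies on the initial import finishing before cron starts. If you also want to guard against two cron runs overlapping, one option (not part of the original setup) is a non-blocking file lock around the call. A sketch, with with_run_lock as a hypothetical helper built on Ruby’s File#flock:

```ruby
# Run the block only if no other process holds the lock file; otherwise skip.
# Returns :ran or :skipped. The lock is released when the file is closed.
def with_run_lock(path)
  File.open(path, File::RDWR | File::CREAT, 0644) do |f|
    # LOCK_NB makes flock return false instead of waiting for the lock.
    return :skipped unless f.flock(File::LOCK_EX | File::LOCK_NB)
    yield
    :ran
  end
end

# Usage, e.g. from a small wrapper script that cron calls instead:
#   with_run_lock("/tmp/mirror_svn.lock") { system("/path/to/mirror_svn") }
```

A run that finds the lock taken simply exits; the next scheduled run picks up whatever it missed.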

Sit back and the changes will roll in.

troubleshooting

You may get an error like so:

Waiting for sync lock on /rubyforge/fauna: server:21338.

This means the mirror script was forcefully killed (probably by you, playing fast and loose with CTRL-C, jerkface). To fix it, temporarily disable the cronjob, kill any running mirror_svn processes, and then find your Mirror.pm file at /usr/local/share/perl/5.8.7/SVN/Mirror.pm or similar, and change line 518 to:

        while (0) {

This will ignore the lock. Now run the script once by hand, then change the file back, and re-enable the cronjob. A little icky, but it works.

Also, if you delete a mirror, make sure you remove any out-of-date svm:mirror properties from the repository root. That’s not a typo; svm:mirror is the correct name.

It would be nice to be able to make commits to the local repository and push the changes back out to the remote. I spent some time trying with svm mergeback and also SVN::Pusher. Nothing even partially worked, and every attempt broke my mirror setup.

trac considerations

Note there is a rare chance that Trac will not display your changesets after a certain point if they get added in reverse-temporal order. This can happen in the following scenario:

  • mirror update runs
  • someone commits to the remote repository
  • someone commits right away to the local repository
  • mirror update runs
  • mirror commits the remote changes into the local repository

The best solution to this is to just run the mirror update very frequently. Either way, your data is always safe. It’s just a Trac display issue.

conclusion

Why not some other tool? I couldn’t use svnsync because it requires a separate brand-new repository for every mirror source, and Trac doesn’t support external or multi-repository browsing. SVK is too much to learn and too invasive of my normal workflow. And Piston doesn’t preserve changesets.

Go Perl.

sti abuse

I’ve noticed people misusing Rails’ single-table inheritance recently, with negative effects on maintainability. Admittedly, it is tempting to regard the class of a record as a piece of record data. But this is wrong. Instead, think of the class as a handle to a set of behaviors (sounds like duck typing, doesn’t it?).

good sti

Consider a person and a dog. They both can eat things. They both can walk. But for Dog#walk, you need to find someone with a leash, while for Person#walk, the person walks on their own. The difference in the walking is at the model level, and sending respond_to? :walk doesn’t tell you what you need to know to walk properly. So we have to distinguish based on the behavior, rather than the data. Since STI is the only inheritance ActiveRecord supports, we must use STI to make that distinction.
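In ActiveRecord these would be subclasses sharing one table through a type column; the sketch below drops ActiveRecord so it runs on its own, and the class names and the handler argument are hypothetical. Both classes respond to walk, but the class decides how the walk happens.

```ruby
class Creature
  attr_reader :name

  def initialize(name)
    @name = name
  end

  def eat(food)            # shared behavior lives in the base class
    "#{name} eats #{food}"
  end
end

class Person < Creature
  def walk
    "#{name} walks on their own"
  end
end

class Dog < Creature
  # Hypothetical: walking a dog requires someone holding the leash.
  def walk(handler = nil)
    raise ArgumentError, "#{name} needs someone with a leash" unless handler
    "#{handler.name} walks #{name} on a leash"
  end
end

ann = Person.new("Ann")
rex = Dog.new("Rex")
puts ann.walk
puts rex.walk(ann)
```

respond_to?(:walk) is true for both, which is exactly why it can’t tell you how to walk them.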

bad sti

But say we have two styles of reports: receipts and invoices. Their behavior is identical (I know you’ve seen empty STI child classes too). However, some you render with the heading “Receipt” and some with “Invoice”. In this case the branch takes place in your view. So just use a report_name field and render that directly, or call a partial based on the name. Nothing here is related to the behavior of the record.
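A plain-Ruby sketch of that alternative, with Report as a hypothetical Struct standing in for the single ActiveRecord model: report_name is ordinary data, and the view-level branch is just a heading string or a partial path built from it.

```ruby
# One model for both styles; report_name is plain data, not a class.
Report = Struct.new(:report_name, :total)

# View-level helpers (hypothetical): branch on the data at render time.
def report_heading(report)
  format("%s: %.2f", report.report_name, report.total)
end

def report_partial(report)
  "reports/#{report.report_name.downcase}"
end

receipt = Report.new("Receipt", 12.5)
invoice = Report.new("Invoice", 40)
puts report_heading(receipt)
puts report_partial(invoice)
```

Both records share every method; only the rendered text differs.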

STI just confuses things in this situation. You end up using class constants as backhanded curried finders. Associations become unclear, and obj.class.name calls proliferate. Worse yet, you start writing separate routes and controllers for identical classes, and that soon leads to very wet views or strange hacks to share views among controllers.

But what if invoices have more fields than receipts? Is it “dry” to duplicate columns in separate tables? It is, because then you don’t have nulled columns in the base table. Ultimately, though, I usually recommend using a single, more abstract model in the first place, perhaps with a non-behavioral type field, and just not rendering irrelevant fields unless appropriate.

conclusion

Inheritance is about behavior. Not data. The data is secondary. If your schema has models that are basically the same, unify them. For maintainability’s sake, branch at the last possible moment, in the view.

Basically, STI has a sweet spot. You need to hit it exactly or you will be in for a rough time. Here is the same thought from the other direction; that is, don’t try to mix in behavior based on data.

That way lies madness, he says.

Stay in the middle. Don’t wobble the boat.

polymorphs 25: total insanity branch

I merged and released the ActiveRecord compatibility branch today for has_many_polymorphs, which represents about a 60% rewrite and a 100% code audit. Custom finders! This is huge.

new features overview

  • Custom finder support (:conditions, :order, :limit, :offset, and :group).
  • Custom conditions (same options as above) on the has_many_polymorphs macro itself, which also get propagated to the individual relationships in a reasonable way.
  • ActiveRecord-derived :has_many_polymorphs reflection type, which appears in .reflect_on_all_associations, so you can dynamically reason about the polymorphic relationship.
  • New extension parameters (:extend, :join_extend, :parent_extend). Toss around modules, procs, and blocks like you just don’t care.
  • Option support for double polymorphic relationships. Everything from single relationships, with the exception of :parent_extend, can be defined on one or both of the sides of the double join.
  • :ignore_duplicates option, which can be set to false if you want to associate more than once (or to handle duplicates with model validations).

Rails 1.2 or greater is now required.

custom finder usage

Some examples are in order. For instance, we can sort the target records by the join record’s creation date:

zoo = Zoo.find(:first)
zoo.animals.find(:all,
  :order => "animals_zoos.created_at ASC")

Or perhaps we have a decorated join model, and want to select on that, with a limit:

zoo.animals.find(:all, :limit => 2,
  :conditions =>
    "animals_zoos.last_cleaned IS NOT NULL")

We can even select on attributes of the targets, rather than the join, even though the targets are different models and don’t have overlapping fields. For example, we can order by the date the animals were born:

zoo.eaters.find(:all,
  :order => "IFNULL(monkeys.created_at,
    IFNULL(elephants.created_at, snakes.created_at)) ASC")

How can this work? In the polymorphic query, fields that aren’t present in a particular child table get nulled out, so you can conditionally traverse them for your query. This example is MySQL-specific (IFNULL), but other databases have equivalents, such as standard SQL’s COALESCE(). You can use any custom logic on any set of fields you want. Perform math, truncate strings, whatever.
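Here is the same idea simulated in plain Ruby. The rows are made up: each one carries every child table’s created_at column, with nil standing in for NULL where the row belongs to a different table, and small integers standing in for timestamps. ifnull_chain is a hypothetical helper that returns the first non-nil value, the way the nested IFNULL calls do.

```ruby
# Made-up rows as the polymorphic query might return them.
rows = [
  { "type" => "Snake", "monkeys.created_at" => nil,
    "elephants.created_at" => nil, "snakes.created_at" => 3 },
  { "type" => "Monkey", "monkeys.created_at" => 2,
    "elephants.created_at" => nil, "snakes.created_at" => nil },
  { "type" => "Elephant", "monkeys.created_at" => nil,
    "elephants.created_at" => 1, "snakes.created_at" => nil },
]

# Ruby analogue of IFNULL(a, IFNULL(b, c)): the first non-nil value wins.
def ifnull_chain(row, *columns)
  columns.map { |column| row[column] }.compact.first
end

columns = %w[monkeys.created_at elephants.created_at snakes.created_at]
sorted = rows.sort_by { |row| ifnull_chain(row, *columns) }
p sorted.map { |row| row["type"] }
```

Each row has exactly one non-null created_at, so the chain always lands on the column belonging to that row’s own table.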

explanation of the optional parameters

There are a lot of options now, so I have made a table to explain which relationships will receive which parameters, what they mean, and what values are allowed. The general rule is that the polymorphic association receives everything, while the individual and join associations receive less, depending on how orthogonal they are to the polymorphic association.

Note that when I say “parent” and “child” I am referring to the polymorphic association’s owner class and target classes, and not to some kind of class inheritance. The owner and target could have an inheritance relationship, but it is irrelevant to this discussion.

single polymorphism parameters

Each entry gives the key, its meaning or allowed values, and (after “Affects:”) what it applies to.

  • :from: an array of symbols for the child classes. Affects: the polymorph, which is then split to create the individual collections.
  • :as: a symbol for the relationship the parent acts as in the join. Affects: the parent model and the join model.
  • :through: a symbol for the join model. Affects: everything (the polymorph, the individual associations, the join associations, and the parent associations as seen from the children).
  • :join_class_name: the name of the join model’s class. Affects: the join model.
  • :foreign_key: the column for the parent’s id in the join. Affects: the join model.
  • :foreign_type_key: the column for the parent’s class in the join, if the parent itself is polymorphic. Affects: the join model, although usually only useful for double polymorphs.
  • :polymorphic_key: the column for the child’s id in the join. Affects: the join model.
  • :polymorphic_type_key: the column for the child’s class in the join. Affects: the join model.
  • :dependent: :destroy, :nullify, or :delete_all. Affects: how the join record gets treated on any associate delete (whether from the polymorph or from an individual collection); defaults to :destroy.
  • :ignore_duplicates: if true, will silently ignore pushing of already associated records. Affects: the polymorph and the individual associations; defaults to true.
  • :rename_individual_collections: if true, all individual collections are prepended with the polymorph name, and the children’s parent collection becomes :as + “_of_” + polymorph; for example, zoos_of_animals. Affects: the names of the individual collections everywhere.
  • :extend: one or an array of mixed modules and procs. Affects: the polymorph and the individual collections.
  • :join_extend: one or an array of mixed modules and procs. Affects: the join associations for both the parent and the children.
  • :parent_extend: one or an array of mixed modules and procs. Affects: the parent associations as seen from the children.
  • :table_aliases: a hash of ambiguous table/column names to disambiguated temporary names. Affects: the instantiation of the polymorphic query; never change this.
  • :select: a string containing the SELECT portion of the polymorphic SQL query. Affects: the polymorph only; never change this.
  • :conditions: an array or string of conditions for the SQL WHERE clause. Affects: everything.
  • :order: a string for the SQL ORDER BY clause. Affects: everything.
  • :group: an array or string of conditions for the SQL GROUP BY clause. Affects: the polymorph and the individual collections.
  • :limit: an integer. Affects: the polymorph and the individual collections.
  • :offset: an integer. Affects: the polymorph only.
  • :uniq: if true, the records returned are passed through a pure-Ruby uniq() before they are given to you. Affects: the polymorph; almost never useful (inherited from has_many :through).

Additionally, if you pass a block to the macro, it gets converted to a Proc and added to :extend.

Phew. Actually, the only required parameter, aside from the association name, is :from.

double polymorphism parameters

Double polymorphism allows the following general keys: :conditions, :order, :limit, :offset, :extend, :join_extend, :dependent, :rename_individual_collections. The general keys get applied to both sides of the relationship. Say you have two sides like:

acts_as_double_polymorphic_join :mammals => [...],
                                :amphibians => [...]

If you want to :extend just the :amphibians, use the key :amphibians_extend. If you have both :extend and :amphibians_extend present, the :amphibians will receive the extensions from both. Technically the general extensions take precedence, but it doesn’t matter unless you are doing something totally bizarre with module inclusion callbacks.

macro conditions devolution

Above, I mentioned that if you set a :conditions parameter on the has_many_polymorphs macro, those same conditions are used on the individual class collections. However, the individual collections are regular has_many :through’s (more or less) and don’t query the superset of fields that a polymorphic SELECT has. How does that not explode?

The plugin is your friend, here. When it builds the conditions for the individual collections, it checks which tables and columns will be still available. If your :conditions or :order or :group contains fields from irrelevant target tables, the plugin will devolve them to nulls. Take this single polymorphism example:

has_many_polymorphs :animals,
  :from => [:monkeys, :elephants],
  :conditions => "monkeys.volume > 1 OR elephants.weight > 10"

The :conditions string will become "NULL > 1 OR elephants.weight > 10" when you are dealing with zoo.elephants, but "monkeys.volume > 1 OR NULL > 10" when you are dealing with zoo.monkeys.
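A rough sketch of that devolution step, not the plugin’s actual implementation: devolve_conditions is a hypothetical helper that replaces any table.column reference to a table outside the current collection with NULL. It is a plain regex substitution, so unlike a real implementation it ignores quoting and string literals.

```ruby
# Replace "table.column" references for tables that are not available in
# the current collection's query with NULL. all_tables lists every child
# table in the polymorph; available lists the ones this query joins.
def devolve_conditions(sql, all_tables, available)
  (all_tables - available).inject(sql) do |s, table|
    s.gsub(/\b#{Regexp.escape(table)}\.\w+/, "NULL")
  end
end

conditions = "monkeys.volume > 1 OR elephants.weight > 10"
tables = %w[monkeys elephants]
puts devolve_conditions(conditions, tables, %w[elephants])  # zoo.elephants
puts devolve_conditions(conditions, tables, %w[monkeys])    # zoo.monkeys
```

The two calls reproduce the two strings described above for zoo.elephants and zoo.monkeys.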

new polymorphic reflection type

Just an example:

>> Tag.reflect_on_association(:taggables)
=> #<ActiveRecord::Reflection::PolymorphicReflection:0x2
  @active_record=Tag, @macro=:has_many_polymorphs,
  @join_class=Tagging, @options={:extend=>[],
  :join_class_name=>"Tagging", :dependent=>:destroy,
  :from=>[:categories, :recipes, :tags],
  :foreign_key=>"tagger_id", :join_extend=>[],
  :select=>"...", :as=>:tagger, :table_aliases=>{...}},
  @name=:taggables>

wishlist

Wait, what?

A few people asked if I had any wishlist or donation box set up so they could show their appreciation. So, I made an Amazon list and put some games and books on it:

Small price to pay :).

looking forward

We are moving closer and closer to supporting every possible polymorphic operation in O(1) time. The only big things left to implement are with_scope and infinite meta-:include (:include-ing polymorphs that :include other polymorphs). Also, I might roll in a tagging system generator, since people have shown so much interest in the flexible tagging that has_many_polymorphs supports.

Exciting times.