polymorphs 25: total insanity branch

I merged and released the ActiveRecord compatibility branch today for has_many_polymorphs, which represents about a 60% rewrite and a 100% code audit. Custom finders! This is huge.

new features overview

  • Custom finder support (:conditions, :order, :limit, :offset, and :group).
  • Custom conditions (same options as above) on the has_many_polymorphs macro itself, which also get propagated to the individual relationships in a reasonable way.
  • ActiveRecord-derived :has_many_polymorphs reflection type, which appears in .reflect_on_all_associations, so you can dynamically reason about the polymorphic relationship.
  • New extension parameters (:extend, :join_extend, :parent_extend). Toss around modules, procs, and blocks like you just don’t care.
  • Option support for double polymorphic relationships. Everything from single relationships, with the exception of :parent_extend, can be defined on one or both sides of the double join.
  • :ignore_duplicates option, which can be set to false if you want to associate more than once (or to handle duplicates with model validations).

Rails 1.2 or greater is now required.
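
For context in the finder examples below, assume a parent model wired up roughly like this (a sketch; the child models and the join name are hypothetical):

class Zoo < ActiveRecord::Base
  has_many_polymorphs :animals,
    :from => [:monkeys, :elephants, :snakes],
    :through => :animals_zoos
end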

custom finder usage

Some examples are in order. For instance, we can sort the target records by the join record’s creation date:

zoo = Zoo.find(:first)
zoo.animals.find(:all,
  :order => "animals_zoos.created_at ASC")

Or perhaps we have a decorated join model, and want to select on that, with a limit:

zoo.animals.find(:all, :limit => 2,
  :conditions =>
    "animals_zoos.last_cleaned IS NOT NULL")

We can even select on attributes of the targets, rather than the join, even though the targets are different models and don’t have overlapping fields. For example, we can order by the date the animals were born:

zoo.eaters.find(:all,
  :order => "IFNULL(monkeys.created_at,
    IFNULL(elephants.created_at,
    snakes.created_at)) ASC")

How can this work? In the polymorphic query, fields that aren’t present in a particular child table get nulled out, so you can conditionally traverse them for your query. This example is MySQL specific, but every RDBMS has a similar equivalent. You can use any custom logic on any set of fields you want. Perform math, truncate strings, whatever.
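
On most databases, the SQL-standard COALESCE function expresses the same thing more compactly than nested IFNULLs (same hypothetical tables as above):

zoo.eaters.find(:all,
  :order => "COALESCE(monkeys.created_at,
    elephants.created_at,
    snakes.created_at) ASC")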

explanation of the optional parameters

There are a lot of options now, so I have made a table to explain which relationships will receive which parameters, what they mean, and what values are allowed. The general rule is that the polymorphic association receives everything, while the individual and join associations receive less, depending on how orthogonal they are to the polymorphic association.

Note that when I say “parent” and “child” I am referring to the polymorphic association’s owner class and target classes, and not to some kind of class inheritance. The owner and target could have an inheritance relationship, but it is irrelevant to this discussion.

single polymorphism parameters

key | meaning/value | affected classes
--- | --- | ---
:from | an array of symbols for the child classes | the polymorph, and then is split to create the individual collections
:as | a symbol for the relationship the parent acts as in the join | the parent model, the join model
:through | a symbol for the join model | everything (the polymorph, the individual associations, the join associations, and the parent associations as seen from the children)
:join_class_name | the name of the join model’s class | the join model
:foreign_key | the column for the parent’s id in the join | the join model
:foreign_type_key | the column for the parent’s class in the join, if the parent itself is polymorphic | the join model, although usually only useful for double polymorphs
:polymorphic_key | the column for the child’s id in the join | the join model
:polymorphic_type_key | the column for the child’s class in the join | the join model
:dependent | :destroy, :nullify, :delete_all | how the join record gets treated on any associate delete (whether from the polymorph or from an individual collection); defaults to :destroy
:ignore_duplicates | if true, will silently ignore pushing of already associated records | the polymorph and individual associations; defaults to true
:rename_individual_collections | if true, all individual collections are prepended with the polymorph name, and the children’s parent collection becomes :as + “_of_” + polymorph; for example, zoos_of_animals | the names of the individual collections everywhere
:extend | one or an array of mixed modules and procs | the polymorph, the individual collections
:join_extend | one or an array of mixed modules and procs | the join associations for both the parent and the children
:parent_extend | one or an array of mixed modules and procs | the parent associations as seen from the children
:table_aliases | a hash of ambiguous table/column names to disambiguated temporary names | the instantiation of the polymorphic query; never change this
:select | a string containing the SELECT portion of the polymorphic SQL query | the polymorph only; never change this
:conditions | an array or string of conditions for the SQL WHERE clause | everything
:order | a string for the SQL ORDER BY clause | everything
:group | an array or string of conditions for the SQL GROUP BY clause | the polymorph, the individual collections
:limit | an integer | the polymorph, the individual collections
:offset | an integer | the polymorph only
:uniq | if true, the records returned are passed through a pure-Ruby uniq() before they are given to you | the polymorph; almost never useful (inherited from has_many :through)

Additionally, if you pass a block to the macro, it gets converted to a Proc and added to :extend.
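
For instance, a block like this ends up as an extension on the generated collections (a sketch; the method body is made up, reusing the hypothetical Zoo setup from above):

class Zoo < ActiveRecord::Base
  has_many_polymorphs :animals,
    :from => [:monkeys, :elephants, :snakes],
    :through => :animals_zoos do
    def newest
      # runs against the association, so the custom finders apply
      find(:first, :order => "animals_zoos.created_at DESC")
    end
  end
end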

Phew. Actually, the only required parameter, aside from the association name, is :from.

double polymorphism parameters

Double polymorphism allows the following general keys: :conditions, :order, :limit, :offset, :extend, :join_extend, :dependent, :rename_individual_collections. The general keys get applied to both sides of the relationship. Say you have two sides like:

acts_as_double_polymorphic_join :mammals => [...],
                                :amphibians => [...]

If you want to :extend just the :amphibians, use the key :amphibians_extend. If you have both :extend and :amphibians_extend present, the :amphibians will receive the extensions from both. Technically the general extensions take precedence, but it doesn’t matter unless you are doing something totally bizarre with module inclusion callbacks.
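
A hedged sketch of how the per-side keys might look in practice (the child models and modules are hypothetical):

acts_as_double_polymorphic_join(
  :mammals    => [:cats, :dogs],
  :amphibians => [:frogs, :newts],
  :extend => SharedHelpers,              # applied to both sides
  :amphibians_extend => AmphibianHelpers # applied to :amphibians only
)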

macro conditions devolution

Above, I mentioned that if you set a :conditions parameter on the has_many_polymorphs macro, those same conditions are used on the individual class collections. However, the individual collections are regular has_many :through’s (more or less) and don’t query the superset of fields that a polymorphic SELECT has. How does that not explode?

The plugin is your friend here. When it builds the conditions for the individual collections, it checks which tables and columns will still be available. If your :conditions or :order or :group contains fields from irrelevant target tables, the plugin will devolve them to NULLs. Take this single polymorphism example:

has_many_polymorphs :animals,
  :from => [:monkeys, :elephants],
  :conditions => "monkeys.volume > 1 OR elephants.weight > 10"

The :conditions string will become "NULL > 1 OR elephants.weight > 10" when you are dealing with zoo.elephants, but "monkeys.volume > 1 OR NULL > 10" when you are dealing with zoo.monkeys.

new polymorphic reflection type

Just an example:

>> Tag.reflect_on_association(:taggables)
=> #<ActiveRecord::Reflection::PolymorphicReflection:0x2
  @active_record=Tag, @macro=:has_many_polymorphs,
  @join_class=Tagging, @options={:extend=>[],
  :join_class_name=>"Tagging", :dependent=>:destroy,
  :from=>[:categories, :recipes, :tags],
  :foreign_key=>"tagger_id", :join_extend=>[],
  :select=>"...", :as=>:tagger, :table_aliases=>{...},
  @name=:taggables>
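
Because it shows up in reflect_on_all_associations, you can pick out the polymorphic associations dynamically. A sketch:

Tag.reflect_on_all_associations.select { |assoc|
  assoc.macro == :has_many_polymorphs
}.map { |assoc| assoc.name }
# => [:taggables]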

wishlist

Wait, what?

A few people asked if I had any wishlist or donation box set up so they could show their appreciation. So, I made an Amazon list and put some games and books on it.

Small price to pay :).

looking forward

We are moving closer and closer to supporting every possible polymorphic operation in O(1) time. The only big things left to implement are with_scope and infinite meta-:include (:include-ing polymorphs that :include other polymorphs). Also, I might roll in a tagging system generator, since people have shown so much interest in the flexible tagging that has_many_polymorphs supports.

Exciting times.

table indexes in rails

Table indexes are sometimes neglected in the rush to fancy RESTful web 2.0 double chocolate milkshake apps. But they are critical for performance. This post is MySQL-oriented, but the ideas apply to other databases too.

let’s take a ride, on the country side

Consider this ordered query. We will alter the index structure before each run, which also keeps the query cache from skewing the results:

>> Topic.count # => 357383
>> ActiveRecord::Migration.add_index :topics, :title
>> benchmark { Topic.find(:all, :limit => 1000, :order => "title asc"); nil }
0.049945 seconds

Pretty quick for my Mac Mini. Now, we’ll drop the index:

>> ActiveRecord::Migration.remove_index :topics, :title
>> benchmark { Topic.find(:all, :limit => 1000, :order => "title asc"); nil }
1.185544 seconds (2273% change)

Clearly this is something you might want (note: the benchmark method is here). But before you go adding an index to every possible column, it is important to understand what indexes actually do.
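
If you don’t have that helper around, a minimal stand-in (my own sketch, not necessarily the linked version) is:

def benchmark
  start = Time.now
  yield
  puts "#{Time.now - start} seconds"
end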

what is an index?

A table index is like a real index in a book or a magazine. Imagine you have a copy of Cosmopolitan, and you want to know which articles were written by “Justin Timberlake”. You have some choices: if you will never need to find out this information again, you can just flip through every page, in order, looking at the byline. But if this is a common search (of course it is!), you can write down on a separate piece of paper the bylines of every article in alphabetical order, and note what page each article is on. Then you can just use that paper the next time the search comes up, and if you just need to know the page you don’t even have to open the real magazine itself.

What’s the cost to this? Each time another article is added to the magazine, in this universe of dynamic magazines, you can’t just slip it into the right spot. You also have to add its byline to the big list of bylines off to the side. As you can imagine, though, since the list of bylines is already in order it’s not really a big deal. You do have to have enough desk space to keep the extra list, though.

what they can’t do

An index won’t help you when you need to match partial fields—usually. If you want articles by “Timberlake” and your index is organized like “Timberlake, Justin”, then you can just glance at the first part of each entry and the index still helps. But if you want articles by “Justin”, you have to look at every article again, unless you have a separate index for first names. So a regular index won’t help for fulltext searches unless you index every word separately (don’t do that; it would massively slow down your inserts).

rules of thumb

Primary keys are indexed by default, because indexing is also (usually) how the database enforces UNIQUE constraints. But for other fields, you usually want to index the following (a sample migration follows the list):

  • Any field used in a JOIN—usually foreign keys (for Rails, this means fields ending in _id). The :include ActiveRecord key generates JOIN clauses.
  • Any field used in an ORDER BY clause, for instance, with an ActiveRecord :order key.
  • Any field used in a GROUP BY clause, for instance, with the ActiveRecord :group key.
  • Any field used in a validates_uniqueness_of check, unless it’s really big.
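
As promised, a sample migration covering those cases (the comments and users tables here are hypothetical; topics.title is the index we benchmarked above):

class AddCommonIndexes < ActiveRecord::Migration
  def self.up
    add_index :comments, :post_id   # foreign key used in JOINs / :include
    add_index :topics, :title       # used in :order clauses
    add_index :users, :login        # checked by validates_uniqueness_of
  end

  def self.down
    remove_index :comments, :post_id
    remove_index :topics, :title
    remove_index :users, :login
  end
end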

What about multi-column indexes? An index is like a treasure map to a row, and sometimes you need to take multiple steps. If you have a query that uses multiple fields to home in on some particular set of records, you can add a multi-column index:

add_index :tagging, [:item_id, :item_type]

In this example from real life, we have a polymorphic association. An optimized join to the target table, such as the one generated by has_many_polymorphs, hinges on two columns. Indexing both at once gains us some speed, because MySQL generally uses only one index per table in a query. In creating this index, I started with the most selective column. But if you are also :order-ing a lot based on the other column, you could start with that instead.

have some explaining to do

A multi-column index can also be used in place of partial, less specific indexes, to avoid duplication. For example, an index on [item_id, item_type, item_pos] can be used if we just need to ORDER or SELECT based on item_id and then item_type, or on item_id alone. But it cannot be used if we need to select on item_type without item_id. The specificity has an order. If you’re not sure whether a particular query can use your index, add the index and then use the MySQL EXPLAIN command:

>> ActiveRecord::Base.logger = Logger.new(STDOUT)
>> Topic.find(:all, :limit => 1000, :order => "title asc"); nil
  Topic Load (0.003195)
  SELECT * FROM topics ORDER BY title asc LIMIT 1000

There’s the generated SQL, so we can now ask for an explanation even from within the Rails console:

>> s = "SELECT * FROM topics ORDER BY title asc LIMIT 1000"
>> puts `echo "EXPLAIN #{s}" | mysql -u root app_development`
id select_type table  type  possible_keys key  key_len ref  rows   Extra
1  SIMPLE      topics ALL   NULL          NULL NULL    NULL 357383 Using filesort

And again, but with the index added:

>> ActiveRecord::Migration.add_index :topics, :title
>> puts `echo "EXPLAIN #{s}" | mysql -u root app_development`
id  select_type table  type  possible_keys key                     key_len ref  rows   Extra
1   SIMPLE      topics index NULL          index_topics_on_title   257     NULL 357383

See that key column? For MySQL, key and index mean the same thing. In the first EXPLAIN, we see that no key is being used. Then we add the index, and yes, our index gets used. Good news.

The behavior of the possible_keys column doesn’t seem to match what I read in the docs. Maybe someone can clear this up for me.

fly away

That’s about the size of it. To find potential index points, watch your MySQL slow query log or your Rails production.log for slow requests. Benchmark when you’re unsure of a decision, and beware the query cache when you do. And be ready for some pleasant performance gains.

log system security events to twitter

Ok, so cdcarter’s server got compromised just recently. The attacker deleted a bunch of logs, so we don’t really know what went down.

you have got to be kidding

No, I’m not. There’s a really nice twitter gem all ready to go. So, here’s the core of our app, just 13 lines:

# config comes from the YAML file (/etc/twist.yml, see below)
twit = Twitter::Base.new config[:user], config[:password]
tail = "tail -n 200 /var/log/auth.log | grep -v '(pam_unix)'" # optionally filter some events

while (sleep 1)
  if @last_msg
    msgs = `#{tail}`.split("\n")
    # only post lines that appeared since the last one we tweeted
    msgs = msgs[msgs.index(@last_msg)+1..-1] if msgs.include? @last_msg
    msgs.map{|n| twit.update n }
    @last_msg = msgs.last unless msgs.empty?
  else
    twit.update "Twist rebooted at #{Time.now}"
    @last_msg = `#{tail}`.split("\n").last
  end
end

twitter + system = twist

Check out the code from the Fauna repository:

svn co svn://rubyforge.org/var/svn/fauna/twist

Create a configuration file, /etc/twist.yml:

---
:sysuser: localuser
:user: twitteruser
:password: twittersecret

If the :sysuser is not root, you will have to make sure /var/log/auth.log as well as /etc/twist.yml are readable by the :sysuser. I recommend using a non-privileged :sysuser and adding it to a group that has permission to read those two files.

Symlink twist.rb into /etc/init.d, set its permissions, and install it as a boot service (Ubuntu specific, your requirements may vary):

cd /etc/init.d/
sudo ln -s /home/you/twist/twist.rb twist.rb
sudo chown root twist.rb
sudo chgrp root twist.rb
sudo /usr/sbin/update-rc.d twist.rb defaults

Run /etc/init.d/twist.rb start to start it immediately.

but this isn’t secure!

It’s only insecure if you leave your twitterings set to public, since you can use the friend system as a privacy control. Also, it is certainly possible for an attacker to notice your Twist setup, see the password, and go into Twitter and start deleting entries. But if you have your personal Twitter user “follow” the server user and receive updates by SMS, then you will have a permanent, real-time log on your phone.

It’s not supposed to be the end-all of server monitoring. But it’s pretty fun.

how to find the most popular tags

As a faithful follower of the growing up article, you now have a pretty nice custom tagging system for your app. But how do you find which tags are the most popular? After all, your tag cloud is waiting!

late for the sky

Here’s a find_popular class method for you:

class Tag < ActiveRecord::Base
  # [snip]
  def self.find_popular(args = {})
    find(:all, :select => 'tags.*, count(*) as popularity',
      :limit => args[:limit] || 10,
      :joins => "JOIN taggings ON taggings.tag_id = tags.id",
      :conditions => args[:conditions],
      :group => "taggings.tag_id",
      :order => "popularity DESC"  )
  end
end

usage note

Note that because of the count(*) as popularity clause, the returned tags will have an extra field with the number of tagged objects:

>> Tag.find_popular[0]
=> #<Tag:0x3790d18 @attributes={
  "name"=>"love",
  "id"=>"2",
  "popularity"=>"3"}
>> _.popularity
=> "3"

Pretty neat.

conclusion

Implementing this will become easier once I add custom finders to the polymorphic collection, but for now, it’s not so painful to just drop to SQL.

Thanks to defunkt for working some of this out a while back and to dbii for bugging me about an explanation.

making a web app rewrite work

One of my friends was recently involved in a very large Rails rewrite project that failed pretty abjectly. Meanwhile, at CNET, we just deployed a total overhaul of a crufty old Rails app, replacing it with a new, pleasant, better-designed app. And Chad Fowler has been writing about big rewrites for a while. It must be in the water.

move move move

It is critical, during a rewrite, to get new code into production as fast as possible. Plan for this. Immediate, visible results are more persuasive and motivating than any projected maintenance benefit or performance increase. And deploying code will expose bad design decisions right away, rather than months down the road.

But this means we have to suffer some pain up front to keep the project flying, because the new application will have to work in parallel with the old one for quite some time. Our new app will have to maintain data compatibility with the old. We will have to connect to the same database as the old…and use the same schema.

noooo, our fancy new domain model, noooo

RDBMSes are separate from applications for a reason. We need to use this separation to our advantage. Instead of trying to maintain API compatibility while fragmenting the data store, we can use the database as the interoperability layer and focus on deploying new features. Old site written in PHP? Doesn’t matter. The database still talks only SQL.

Using the old schema will not break our fancy new domain model. It is relatively straightforward, if tedious, to graft a new domain model onto a partially orthogonal legacy schema.

models, unite!

Say we run eBay (hey, we’re rich!). But money != developer_happiness, and our old domain model, from 1896, has buyers and sellers. We just want users now. How do we treat buyers and sellers as a single model? They might even live in separate databases!

How about:

class Buyer < ActiveRecord::Base
  set_table_name "tbuytslq"
  alias_attribute :name, :buyer_name
end

class Seller < ActiveRecord::Base
  # MySQL can jump across databases, see below
  set_table_name "salsdb.seljhnlp"
  alias_attribute :name, :nm
end

What’s going on? Besides some legacy table configuration, we have to first encapsulate the old domain so that we can merge it into the new one. The alias calls make the legacy models expose a unified API. We can then work with them in our new model without pain. (If the tables contained the same information, but organized it differently, e.g., separate first_name and last_name fields, some helper methods would take care of the conversion in and out.)
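
For that last case, the conversion helpers might look like this (the first_nm/last_nm columns are hypothetical):

class Seller < ActiveRecord::Base
  set_table_name "salsdb.seljhnlp"

  # present the legacy split fields as a single attribute
  def name
    [first_nm, last_nm].compact.join(" ")
  end

  def name=(value)
    self.first_nm, self.last_nm = value.split(" ", 2)
  end
end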

pure magic?

Now for something cool:

class User # no inheritance

  def self.method_missing *args
    sources = [Buyer, Seller]
    if args.first.to_s =~ /^(find|create|new)/
      begin
        # try each legacy class in turn; retry re-runs the begin block
        self.new(sources.shift.send(*args))
      rescue
        retry unless sources.empty?
        raise
      end
    else
      raise "Not supported: #{args.first}"
    end
  end

  attr_accessor :proxy

  def initialize(_proxy)
     @proxy = _proxy
  end

  def method_missing *args
    proxy.send(*args)
  end

  # new business logic here

end

We can delegate the ActiveRecord operations of our new model to multiple old models, and keep the new business logic unified. Craziness! (We might want to add some id munging, so that for example find("b10") returns a Buyer-User, and find("s10") returns a Seller-User.)
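
The id munging could be as simple as a prefix convention (a sketch; the "b"/"s" scheme is made up):

def self.find(id)
  case id.to_s
  when /^b(\d+)$/ then new(Buyer.find($1))
  when /^s(\d+)$/ then new(Seller.find($1))
  else raise ActiveRecord::RecordNotFound
  end
end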

If we need to add fields to User which are not present in both Buyer and Seller, we will have to give it a table (probably in yet another database) and descend from ActiveRecord. The proxy trick will need updating:

class User < ActiveRecord::Base

  belongs_to :buyer
  belongs_to :seller

  def method_missing *args
    (buyer or seller).send(*args)
  end

  # new business logic here

end

Some method chaining and other hackery will be required to make find() work when we haven’t yet built the aggregate instance for the legacy instance we are interested in, but that will depend on the specifics of the schemas.

Regarding those associated legacy instances, MySQL can make table joins and generally act sane across disparate databases, as long as they live on the same server. I don’t know about Postgres. Oracle can make table joins across disparate servers (damn; something good in Oracle), but might require some fudging to get working. If it’s impossible to get the right joins together to make an :include work in our new aggregate model, we will have to rely on aggressive caching to handle performance issues.

Of course, you can go the other way, too, and split an overly complicated model into multiple simpler ones. Just set the table name and use undef_method to mask off irrelevant fields.

back to work

Now we can set up our Apache routes to forward the appropriate requests to the new app, copy over our CSS, and start rolling out new features while leaving the old app running and in place. When we have completely replaced some functionality in the old app, we can flip a switch in Apache and let the new app take over.

And if we need to use business logic in the old app from the new app, we can make POST requests to our (old) self. I’m not even kidding. This is what the web is for.

Of course a real-life scenario will be much more complicated than this, and the code above is somewhere between pseudo-code and something we can deploy. But we are on our way.

Will some annoying changes have to be made to both apps for interoperability purposes? Probably. Will it be slower than a pristine, solo deployment? Definitely. But it will work, and it will work now, and someday when the old application is completely replaced, the database can be migrated and the legacy schema interactions can be dropped. The users won’t notice a thing—they’ve been using the rewrite code since forever.

Baby steps; always baby steps.

update

There are some more obscure features in Rails that may help you with your legacy mapping. Check the composed_of field aggregator as well as delegate.
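
delegate, for example, lets the aggregate forward a known list of methods instead of a blanket method_missing (a sketch; the attribute names are hypothetical):

class User < ActiveRecord::Base
  belongs_to :buyer
  delegate :name, :email, :to => :buyer
end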

polymorphs 21: harder faster stronger

update

old content

You can now skip class reloading in model classes referenced by has_many_polymorphs in development mode.

Normally if you modify a model, Rails will unload and reload the file. This means the parent model has to build the polymorphic relationship all over again, which can be slow. But if the child classes aren’t unloaded in the first place, we avoid this problem. Of course while you are working on the models themselves you need to switch the caching off.

how to

In config/environments/development.rb:

config.after_initialize do
  config.has_many_polymorphs_cache_classes = true
end

That’s all.

preloading

The instructions in this section are unnecessary as of version 27.1 of the plugin.

Also at the top of app/controllers/application.rb, you need:

require 'model_that_contains_the_polymorphs_call'

This makes sure the parent model is always loaded at least once. Note that STI parent classes will also get ejected from the reloadable lists, since they need to be kept around for their children to function.

If you are working in the console or migrations and need to access the methods added by the plugin, you still need to manually reference your polymorphic parent class.

download

The usual place. Rails 1.2.0 or later is required for the new features.

polymorphs 20

Release 20 of the has_many_polymorphs plugin is available.

new features

  • clear method on the polymorphic collection
  • push, <<, delete, and clear on all individual subcollections

An undocumented API was changed.

With double polymorphic relationships (say, eaters and foods), instead of using .corn_as_eater to access the Corn subcollection, you now need to use .food_corn. Referencing the join tables still uses as. This better parallels the general collection API.

reconnaissance

Also, goodenough appeared on IRC and made a report:

class Hamster < ActiveRecord::Base
  has_many_polymorphs :toys,
    :from => [:wheels, :balls],
    :through => :hamsters
end

No join table! More to come.

web2nam.es

I am pleased to release Web2Names, a little toy to help you find a domain name. It does some tricky stemming on your search term to help find nearby words. Try searching on iphone (I kinda like “iphoni.es” and “iphoned.org”.)

implementation details

The site caches results very aggressively, because querying nameservers for scores of domains at a time is slow. It uses dig ns for the lookups, which means it occasionally returns false negatives (saying a domain is available when it really isn’t) if a site’s DNS configuration is completely hosed.

Web2Names is a Camping app. It is deployed behind Apache 2.2 with multiple load-balanced mongrels. It doesn’t use a database. Instead it uses Manfred Stienstra’s page cache library.

growing up your acts_as_taggable

The acts_as_taggable plugin, although built into Rails, is basically deprecated. And the gem version is old and requires a separate join table for every taggable model, which is silly.

update

Please see the updated has_many_polymorphs documentation, which has instructions for using the new, built-in tagging generator.

install

Uninstall whatever acts_as_taggable version you have, to prevent fighting among the plugins. Then, install has_many_polymorphs:

script/plugin install -x svn://rubyforge.org/var/svn/fauna/has_many_polymorphs/trunk

models

We will assume you are using the default acts_as_taggable models. You should have a Tag model, with a name field, as well as a Tagging join model. However, those models are part of the acts_as_taggable plugin source, whereas their migrations have to be part of your app (which is confusing). With has_many_polymorphs, we make the models part of the app, too:

class Tag < ActiveRecord::Base
  has_many_polymorphs :taggables,
    :from => [:books, :magazines],
    :through => :taggings,
    :dependent => :destroy
end

class Tagging < ActiveRecord::Base
  belongs_to :tag
  belongs_to :taggable, :polymorphic => true

  def before_destroy
    # disallow orphaned tags
    tag.destroy_without_callbacks if tag.taggings.count < 2
  end
end

See the line [:books, :magazines]? This line replaces all the acts_as_taggable macro calls strewn through your models. Simply list in the array the models that you want to be able to tag. (You can even tag Tags, see below.)

migrations

We need to make sure your existing tag schema fits this. It should be something like:

class AddTagSupport < ActiveRecord::Migration
  def self.up
    create_table :tags do |t|
      t.column :name, :string, :null => false
    end
    add_index :tags, :name, :unique => true

    create_table :taggings do |t|
      t.column :tag_id, :integer, :null => false
      t.column :taggable_id, :integer, :null => false
      t.column :taggable_type, :string, :null => false
    end
    add_index :taggings, [:tag_id, :taggable_id, :taggable_type], :unique => true
  end

  def self.down
    drop_table :tags
    drop_table :taggings
  end
end

If your schema isn’t like this already, you have two choices. You can add more options to the association macros in your models, to show them how to relate to your schema, or you can write a migration to convert your schema into canonical form. The first option is possibly quicker to implement, but the second one will be easier to maintain.

api

Hey, you’re done! Well, not really. The API is different, which is the biggest sticking point. We’ll write some convenience methods to mimic the old way. You should put the methods in RAILS_ROOT/lib/tag_extensions.rb or similar.

class ActiveRecord::Base
  def tag_with tags
    tags.split(" ").each do |tag|
      Tag.find_or_create_by_name(tag).taggables << self
    end
  end

  def tag_list
    tags.map(&:name).join(' ')
  end
end
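
Usage then looks much like the old API (assuming a Book instance; tag order in the list may vary):

>> book = Book.find(:first)
>> book.tag_with "cooking artichokes"
>> book.tag_list
=> "cooking artichokes"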

Pretty straightforward, which makes it easy to modify. For example, to make creating a model from params more transparent, we could add:

alias :tags= :tag_with

Or to enforce lowercase tags, just do:

def tag_with tags
  tags.downcase.split(" ").each do |tag|
    Tag.find_or_create_by_name(tag).taggables << self
  end
end

If you want to allow tags with spaces in them, we can accept an array instead of a string:

def tag_with *tags
  tags.flatten.each do |tag|
    Tag.find_or_create_by_name(tag).taggables << self
  end
end

To delete tags (don’t forget to downcase the tag_string if you need to):

def tag_delete tag_string
  split = tag_string.split(" ")
  tags.delete tags.select{|t| split.include? t.name}
end

To get all models for a tag (and all in a single SQL query, thanks to the plugin):

Tag.find_by_name("artichoke").taggables

To get only a specific model for a tag:

Tag.find_by_name("artichoke").books

Easy and powerful. And if you need to find which tags are most popular, see here.

Make sure that you require the file containing your API methods in environment.rb, since Rails won’t load it automatically:

require 'tag_extensions'

performance note

It is more efficient if we add :skip_duplicates => false to the has_many_polymorphs :taggables call. Then the taggables for each Tag won’t get loaded at all during the <<, because there is no reason to check them.

If we do this, though, and use database constraints to enforce uniqueness, we need to manually rescue assignment errors in our tag_with method:

def tag_with tags
  tags.split(" ").each do |tag|
    begin
      Tag.find_or_create_by_name(tag).taggables << self
    rescue ActiveRecord::StatementInvalid => e
      raise unless e.to_s[/Duplicate entry/]
    end
  end
end

self-referential tagging

What if you want to be able to tag tags? This is more useful than might first appear. It lets you create an ad-hoc non-exclusive hierarchy of categories without extra models. This is much more maintainable than one based on hard-coded models for each level. But you need some way to distinguish “the tags tagged by a tag” and “the tags a tag is tagged by”—a directed graph. So we will rename the parent relationship:

class Tag < ActiveRecord::Base
  has_many_polymorphs :taggables,
    :from => [:books, :magazines, :tags],
    :through => :taggings,
    :dependent => :destroy,
    :as => :tagger
end

class Tagging < ActiveRecord::Base
  belongs_to :tagger,
             :class_name => "Tag",
             :foreign_key => "tagger_id"
  belongs_to :taggable,
             :polymorphic => true
end

Modify the migration accordingly. Now, you can use some_tag.taggables to get the targets of some_tag, and some_tag.taggers to get the tags for which some_tag is itself a target.
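
Concretely, “modify the migration accordingly” mostly means the join’s tag_id column becomes tagger_id. A sketch of the delta for an existing schema (the migration name is made up):

class RenameTagForeignKey < ActiveRecord::Migration
  def self.up
    rename_column :taggings, :tag_id, :tagger_id
  end

  def self.down
    rename_column :taggings, :tagger_id, :tag_id
  end
end

If you created the unique index from the earlier migration, remember to rebuild it against tagger_id as well.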

Note that your tag_delete method might have to be more longwinded due to this long-outstanding Rails bug. (Rails 1.2.3 might be ok now; it’s unclear.)

wrapping up

You should find, once you complete the move, that tag lookup is faster, that the API is more intuitive and in line with regular ActiveRecord, that database compatibility is improved, and that the subsystem is more extensible. Quite a benefit!

Since has_many_polymorphs is basically a huge superset of the acts_as_taggable features, there is not much that isn’t supported. Many people use has_many_polymorphs in production for tagging already. Join them!