object allocations on the web

How many objects does a Rails request allocate? Here are Twitter’s numbers:

  • API: 22,700 objects per request
  • Website: 67,500 objects per request
  • Daemons: 27,900 objects per action

I want them to be lower. Overall, we burn 20% of our front-end CPU on garbage collection, which seems high. Each process handles ~29,000 requests before getting killed by the memory limit, and the GC is triggered about every 30 requests.

In memory-managed languages, you pay a performance penalty at object allocation time and again at collection time. Since Ruby lacks a generational GC (although patches are available), the collection penalty is linear in the number of objects on the heap.
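The allocation side of that penalty is easy to observe. On a modern Ruby you can count allocations with the built-in GC.stat; this small helper is my own sketch, not part of any 1.8-era tooling:

```ruby
# Count the objects allocated while a block runs. GC.stat is built
# into modern Ruby; on 1.8 you would need patched-GC equivalents.
def allocations
  GC.disable # keep a collection from running mid-measurement
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
ensure
  GC.enable
end

n = allocations { Array.new(100) { Object.new } }
puts "#{n} allocations" # 100 objects plus the array itself
```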

a note about structs and immediates

In Ruby 1.8, Struct instances use fewer bytes and allocate fewer objects than Hash and friends. This can be an optimization opportunity in circumstances where the Struct class is reusable.

A little bit of code shows the difference (you need REE or Sylvain Joyeux’s patch to track allocations):

def sizeof(obj)
  GC.enable_stats
  GC.clear_stats
  obj.dup     # allocate a fresh copy of obj to measure its cost
  puts "#{GC.num_allocations} allocations"
  puts "#{GC.allocated_size} bytes"
end

Let’s try it:

>> Struct.new("Test", :a, :b, :c)
>> struct = Struct::Test.new(1,2,3)
=> #<struct Struct::Test a=1, b=2, c=3>
>> sizeof(struct)
1 allocations
24 bytes

>> hash = {:a => 1, :b => 2, :c => 3}
>> sizeof(hash)
5 allocations
208 bytes

Watch out, though. The Struct class itself is expensive:

>> sizeof(Struct::Test)
29 allocations
1216 bytes

In my understanding, each key in a Hash is a VALUE pointer to another object, while each slot in a Struct is merely a named position.
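On a current Ruby you can check this difference with the objspace extension; ObjectSpace.memsize_of plays roughly the role that REE’s GC.allocated_size plays above (exact byte counts vary by Ruby version and platform):

```ruby
require 'objspace'

Point = Struct.new(:a, :b, :c)

struct_size = ObjectSpace.memsize_of(Point.new(1, 2, 3))
hash_size   = ObjectSpace.memsize_of({ :a => 1, :b => 2, :c => 3 })

puts "struct: #{struct_size} bytes"
puts "hash:   #{hash_size} bytes" # the hash costs more per instance
```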

Immediate types (Fixnum, nil, true, false, and Symbol) don’t allocate on the heap. Symbol is a partial exception: symbols are interned, and their string representations live on a special heap that is never garbage-collected.
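The interning is easy to see: every occurrence of the same symbol is the same object, while equal strings are distinct heap objects:

```ruby
# Symbols are interned: the same name always yields the same object.
a = :payload
b = :payload
same_symbol = a.equal?(b) # => true

# Equal strings, by contrast, are separate allocations.
s1 = String.new("payload")
s2 = String.new("payload")
same_string = s1.equal?(s2) # => false
```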

your turn

If you have allocation counts from a production web application, I would be delighted to know them. I am especially interested in Python, PHP, and Java.

Python should be about the same as Ruby. PHP, though, discards the entire heap per-request in some configurations, so collection can be dramatically cheaper. And I would expect Java to allocate fewer objects and have a more efficient collection cycle.

6 responses

  1. Generally, allocations take a lot of time on their own, and freeing them is a problem too. For example, I had a small piece of code that ran N iterations per second; when I allocated an array of a few million immediate values (large, but not huge), my benchmark dropped to about N/2 iterations, even though the code allocated only about 10 objects per iteration. The reason for the drop is that some of those objects were rather large (10k or more in some cases).

    This involves heap storage efficiency, suboptimal GC scanning, reallocation (growing existing objects is almost as bad as allocating new ones, and a bigger problem than many realize), dead weight in the AST, and so on.

    So it’s hard to quantify from allocation counts alone. In Java, for example, copying collectors make allocation of short-lived objects very cheap, so a program might allocate many objects yet use a semispace technique to keep scanning overhead to a minimum.

    When I tune my Ruby apps I check allocations first, since they trigger the GC. But the GC still fires and still needs to be fast, so the next step is to lighten the heap: you might not allocate much, but the GC time doesn’t disappear.

    To improve GC runtime itself, I recommend avoiding large amounts of rarely used or dead code: just don’t load code you don’t need. ActiveSupport did a good thing by making things lazily loaded, and allowing loaded code to be discarded once you know you’re done with it is also a good idea. Next, check for objects you rarely touch or use and decide whether they should be there at all.

    I’m surprised at how much GC time can grow, but a 200+ MB heap just sucks, so I try to design apps to work below that number. If an app goes above it, I try to refactor it into services. Small processes running at or under 100MB will do much better. If they leak or grow too large too often, then cut down startup time so you can fork new workers and kill off the large processes. Not pretty, and I don’t recommend killing rapidly, but it works and it improves performance.

    Anyway, those are my thoughts. Thanks for the post. I’ll try to collect some numbers on apps I have here sometime, but life is busy and it takes time to set up a clean environment for measurement. And now I get to pack, yay for moving. ;-)

  2. Working in Java, the cost of allocating short-lived objects is really small. Allocation just involves pushing a pointer in eden forward sizeof(object), and the vast majority of those objects are reclaimed by the young-generation copying collector.

    For this reason, I would expect a well-written Java app to allocate more objects than an equally well-written Ruby app, simply because developers can worry less about the cost of doing so.

  3. One advantage to JRuby, of course, is that you get to write in Ruby but take advantage of all the Java runtime infrastructure like the garbage collector.

    But even there, the less garbage, the better.

  4. On a somewhat related note, I have been experimenting with an IdentityMap in rails.

    Here is my incomplete, minimally tested and not ready for production code: http://gist.github.com/215925. From the tests I have run with it, requests are quicker (fewer allocations of AR::Base and all of the objects related to instantiating a model) and there is no need for a reload method (fewer DB hits).

    This is most useful in an app where a lot of circular relationships all get loaded in the same request. But it also helps when the same object gets loaded for each object in a collection. For example, a collection of Book objects may all share the same Author object. With the identity_map, all Authors with the same record id will be exactly the same object, no duplicates.
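    For anyone unfamiliar with the pattern, an identity map is just a per-request cache keyed by record id. A minimal sketch (not the gist’s code; the loader block stands in for the actual database fetch):

```ruby
# Minimal identity map: at most one live instance per record id.
class IdentityMap
  def initialize
    @objects = {}
  end

  # Returns the cached instance for +id+, calling the loader block
  # only on the first request for that id.
  def fetch(id)
    @objects[id] ||= yield(id)
  end
end

authors = IdentityMap.new
a1 = authors.fetch(1) { |id| "Author ##{id}" } # loads
a2 = authors.fetch(1) { |id| "Author ##{id}" } # cache hit
puts a1.equal?(a2) # true: the Books share one Author object
```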

  5. Interesting observations; however, I do think that in the current Ruby ecosystem it’s somewhat dangerous to focus on interpreter-specific optimizations rather than generalized solutions.

    I, along with a few coworkers, think the underlying issue is:

    a) The state of stdlib, and it being pretty much set in stone moving forward. lib/logger.rb and lib/time.rb specifically, given their widespread framework use.

    b) A lack of Ruby core or stdlib knowledge that leads to a cycle of 20 GitHub repos all doing the same thing, to varying degrees of efficiency.

    c) Framework authors/core contributors shooting patches down as “micro-optimizations.” I salute Merb and wycats for being totally anal about this: framework = critical path, always.

    I experimented with allocations at the libc level some time ago (I understand it’s mostly the creation side of the equation), and many seem to think an mmap’ed region exclusively for strings could work wonders.

    Runtime hints/hard raises for exceeding X allocations per method dispatch on class Y in a staging environment is interesting also.

  6. An interesting snippet from Daniel Berger transforms hashes to structs on the fly :

    class Hash
      def to_struct(struct_name)
        Struct.new(struct_name, *keys).new(*values)
      end
    end

    Example usage:

    foo = {:name=>"Dan","age"=>33}.to_struct("Foo")
    puts "name:  #{foo.name}"
    puts "age: #{foo.age}"

    The snippet can be found here.