snax

object allocations on the web

How many objects does a Rails request allocate? Here are Twitter’s numbers:

I want them to be lower. Overall, we burn 20% of our front-end CPU on garbage collection, which seems high. Each process handles ~29,000 requests before getting killed by the memory limit, and the GC is triggered about every 30 requests.

In memory-managed languages, you pay a performance penalty at object allocation time and also at collection time. Since Ruby lacks a generational GC (although there are patches available), the collection penalty is linear with the number of objects on the heap.

a note about structs and immediates

In Ruby 1.8, Struct instances use fewer bytes and allocate less objects than Hash and friends. This can be an optimization opportunity in circumstances where the Struct class is reusable.

A little bit of code shows the difference (you need REE or Sylvain Joyeux’s patch to track allocations):

GC.enable_stats
def sizeof(obj)
  GC.clear_stats
  obj.clone
  puts "#{GC.num_allocations} allocations"
  GC.clear_stats
  obj.clone
  puts "#{GC.allocated_size} bytes"
end

Let’s try it:

>> Struct.new("Test", :a, :b, :c)
>> struct = Struct::Test.new(1,2,3)
=> #<struct Struct::Test a=1, b=2, c=3>
>> sizeof(struct)
1 allocations
24 bytes

>> hash = {:a => 1, :b => 2, :c => 3}
>> sizeof(hash)
5 allocations
208 bytes

Watch out, though. The Struct class itself is expensive:

>> sizeof(Struct::Test)
29 allocations
1216 bytes

In my understanding, each key in a Hash is a VALUE pointer to another object, while each slot in a Struct is merely a named position.

Immediate types (Fixnum, nil, true, false, and Symbol) don’t allocate, except for Symbol. Symbol is interned and keeps its string representations on a special heap that is not garbage-collected.

your turn

If you have allocation counts from a production web application, I would be delighted to know them. I am especially interested in Python, PHP, and Java.

Python should be about the same as Ruby. PHP, though, discards the entire heap per-request in some configurations, so collection can be dramatically cheaper. And I would expect Java to allocate fewer objects and have a more efficient collection cycle.