snax

scaling rails

How many objects does a Rails request allocate? Here are Twitter's numbers:

  • API: 22,700 objects per request
  • Website: 67,500 objects per request
  • Daemons: 27,900 objects per action

I want them to be lower. Overall, we burn 20% of our front-end CPU on garbage collection, which seems high. Each process handles ~29,000 requests before getting killed by the memory limit, and the GC is triggered about every 30 requests.

In memory-managed languages, you pay a performance penalty at object allocation time and also at collection time. Since Ruby lacks a generational GC (although there are patches available), the collection penalty is linear with the number of objects on the heap.

a note about structs and immediates

In Ruby 1.8, Struct instances use fewer bytes and allocate less objects than Hash and...

October 21, 2009

I've released Scribe 0.1, a Ruby client for the Scribe remote log server.

sudo gem install scribe

Usage is simple:

client = Scribe.new
client.log("I'm lonely in a crowded room.", "Rails")

Documentation is here.

about scribe

The primary benefit of Scribe over something like syslog-ng is increased scalability, because of Scribe's fundamentally distributed architecture. Scribe also does away with the legacy syslog alert levels, and lets you define more application-appropriate categories on the fly instead.

Dmytro Shteflyuk has good article about installing the Scribe server itself on OS X. It would be nice if someone would put it in MacPorts, but it may be blocked on the release of Thrift.

September 30, 2009

We recently migrated Twitter from a custom Ruby 1.8.6 build to a Ruby Enterprise Edition release candidate, courtesy of Phusion. Our primary motivation was the integration of Brent's MBARI patches, which increase memory stability.

Some features of REE have no effect on our codebase, but we definitely benefit from the MBARI patchset, the Railsbench tunable GC, and the various leak fixes in 1.8.7p174. These are difficult to integrate and Phusion has done a fine job.

testing notes

I ran into an interesting issue. Ruby is faster if compiled with -Os (optimize for size) than with -O2 or -O3 (optimize for speed). Hongli pointed out...

September 24, 2009

One of the hardest gems to install is no more. It's now easy to install!

Memcached 0.15 features:

  • Update to libmemcached 0.31.1
  • Bundle libmemcached itself with the gem (antifuchs)
  • UDP connection support
  • Unix domain socket support (hellvinz)
  • AUTO_EJECT_HOSTS bugfixes (mattknox)

Install with gem install memcached. Since libmemcached is bundled in, there are no longer any dependencies.

on coordination

Andreas Fuchs suggested several months ago that I include libmemcached itself in the gem, but at the time I resisted. I was wrong.

My opposition was based on the idea that libmemcached itself would be an integration point, so running multiple versions on a system would be bad...

August 04, 2009

Cassandra is a hybrid non-relational database in the same class as Google's BigTable. It is more featureful than a key/value store like Dynomite, but supports fewer query types than a document store like MongoDB.

Cassandra was started by Facebook and later transferred to the open-source community. It is an ideal runtime database for web-scale domains like social networks.

This post is both a tutorial and a "getting started" overview. You will learn about Cassandra's features, data model, API, and operational requirements—everything you need to know to deploy a Cassandra-backed service.

Jan 8, 2010: post updated for Cassandra gem 0.7 and Cassandra version 0.5.

features

There are a number of reasons to choose Cassandra for your website. Compared to other databases, three big features stand out:

  • Flexible schema: with Cassandra, like a document store, you don't have to decide what fields you need in your records ahead of time. You can add and remove arbitrary fields on the fly. This is an incredible productivity boost, especially in large deployments.
  • True scalability: Cassandra scales horizontally in the purest sense. To add more capacity to a cluster, turn on another machine. You don't have restart any processes, change your application queries, or manually relocate any data.
  • Multi-datacenter awareness: you can adjust your node layout to ensure that if one datacenter burns in a fire, an alternative datacenter will have at least one full copy of every record.

Some other features that help put Cassandra above the competition ...

July 02, 2009

I've been reading a bunch of papers about distributed systems recently, in order to help systematize for myself the thing that we built over the last year. Many of them were originally passed to me by Toby DiPasquale. Here is an annotated list so everyone can benefit.

It helps if you have some algorithms literacy, or have built a system at scale, but don't let that stop you.

prelude

The Death of Architecture, Julian Browne, 2007.

First, a reminder of what it means to build a system that solves a business problem. Browne built the space-based billing system at Virgin Mobile, one of the most well-known SBAs outside the financial and research industries.

That lovely diagram showing clean service-oriented interfaces, between unified systems of record, holding clean data, performing well-defined business processes is never going to be....You have to roll up your sleeves, talk to the business analysts, developers, operations and make a contribution that makes those boxes and arrows real.

System failures in the web world are most often due to inflated technical requirements and integration risks, not an incorrect decision to skip two-phase commit...

May 02, 2009