ultimate architecture


  • every action loads only a few hyper-denormalized objects (ideally, one)
  • proxy chooses app server by predicting memcached object location
  • total memcached capacity is bigger than the persistence store
  • write to memcached is synchronous with request; write-through to store is asynchronous
  • can use an MQ to avoid racing writes (almost never important)


  • infinite scale-out
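A minimal sketch of the write path described above, with plain Python dicts standing in for memcached and the persistence store, and a thread standing in for the asynchronous write-through worker (all names here are illustrative):

```python
import queue
import threading

# In-memory stand-ins; a real deployment would use a memcached client
# and S3 or a database for the persistence store.
memcached = {}
store = {}
write_queue = queue.Queue()

def handle_write(key, obj):
    """The write path from the diagram: the cache write is synchronous
    with the request; persistence happens asynchronously."""
    memcached[key] = obj         # synchronous: the request sees this immediately
    write_queue.put((key, obj))  # asynchronous write-through to the store

def writer_daemon():
    """Drains the queue and persists objects. Because one worker applies
    writes in order, most racing-write problems go away."""
    while True:
        key, obj = write_queue.get()
        store[key] = obj
        write_queue.task_done()

threading.Thread(target=writer_daemon, daemon=True).start()

handle_write("user:42", {"name": "alice"})
write_queue.join()  # wait for the async write-through to catch up
assert memcached["user:42"] == store["user:42"]
```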


18 responses

  1. I think I heard your marker squeaking out that diagram earlier. Now, I can tell my kids what it was like to be there at the birth of the future.

    Oh, and I think legally this makes me a cofounder. Let me know when we’re rich.

  2. Preach on. I’ve been trying to fight the good fight on this because people want to stick to the same old box.

    I even think if you used a smart DNS scheme, you could remove the need for the proxy in front for the majority of apps.

    If Amazon expanded their EC2 platform to multiple data centers, or if you skipped that part and used multiple data centers yourself, you could add “completely fault tolerant” and “geographically distributed” to the benefits.

  3. I too have my eye on EC2. I like to have the director be a proxy instead of using DNS load-balancing, though, to avoid round-tripping at least one memcached read on the relatively high-latency internal Amazon network.

    Joins belong in the app, people.
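    For what it’s worth, the proxy’s “prediction” can be nothing more than sharing the cache clients’ hash function. A toy sketch (server names are hypothetical, and real memcached clients typically use consistent hashing rather than simple modulo):

```python
import hashlib

# Hypothetical server list; the proxy and every app server share it.
APP_SERVERS = ["app1", "app2", "app3"]

def server_for(key):
    """Route a request to the app server whose memcached node should
    already hold the object, by computing the same key hash the cache
    clients use."""
    digest = hashlib.md5(key.encode()).digest()
    return APP_SERVERS[int.from_bytes(digest[:4], "big") % len(APP_SERVERS)]

# Because the mapping is deterministic and shared, a key's reads always
# land where its cache entry lives.
assert server_for("user:42") == server_for("user:42")
assert server_for("user:42") in APP_SERVERS
```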

  4. Just keep a write-updated index. In a web app, you always know the types of queries your users will make ahead of time.

    More unpredictable queries (e.g. full-text) need an external indexer/daemon anyway.
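    A sketch of what a write-updated index might look like against a plain key-value store (the zipcode example and key names are hypothetical):

```python
# In-memory stand-in for the key-value store; in the architecture above
# this would be memcached backed by the persistence store.
kv = {}

def add_event(event_id, event):
    """Write-time index maintenance: when an object is written, also
    update the precomputed result sets for queries we know the app will
    make, e.g. "events in zipcode X"."""
    kv["event:%s" % event_id] = event
    kv.setdefault("zip:%s" % event["zip"], []).append(event_id)

def events_in_zip(zipcode):
    """The read side is a single key lookup; no query engine needed."""
    return [kv["event:%s" % eid] for eid in kv.get("zip:%s" % zipcode, [])]

add_event(1, {"zip": "94107", "name": "concert"})
add_event(2, {"zip": "94107", "name": "reading"})
assert [e["name"] for e in events_in_zip("94107")] == ["concert", "reading"]
```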

  5. There are lots of queries a database can do without an external indexer that are awkward, at best, on a key/value store. Or maybe I’m missing something?

    Full text is actually easier: just store the inverted list for each word in the key-value store. Or by n-gram if you want fuzzy matching.

    Range queries get harder, though… the best scheme I’m aware of is a PHT (prefix hash tree), and I don’t exactly relish the idea of implementing one.

    It’d help if S3 supported conditional writes so we could use a query/update protocol. Atomic append would be nice, too. You might get pretty far just with prefix list though.
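    A minimal sketch of the inverted-list idea, with an in-memory dict standing in for the key-value store:

```python
import re

kv = {}  # stand-in for the key-value store

def index_document(doc_id, text):
    """Store an inverted list per word under a "word:" key, as the
    comment suggests."""
    for word in set(re.findall(r"\w+", text.lower())):
        kv.setdefault("word:%s" % word, set()).add(doc_id)

def search(*words):
    """A multi-word query is just an intersection of inverted lists."""
    sets = [kv.get("word:%s" % w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()

index_document(1, "cheap flights to Paris")
index_document(2, "Paris hotel deals")
assert search("paris") == {1, 2}
assert search("paris", "flights") == {1}
```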

  6. Can you give an example of a query that couldn’t be easily pre-calculated? I’m thinking more in terms of write-time updating the result sets for preset queries (e.g. zipcode placements, paginated lists) rather than trying to make a perfectly general prefix tree.

    Also, I’m in favor of S3 staying simple. If you write asynchronously, you can just blast out updates and not care whether the data has changed (which removes an entire network round-trip from the process).

    And atomic appends—we have message queues for that.

  7. Any multipoint query with arbitrary constants. Is this concept really so hard we need examples? Let’s say we’re doing a travel website and want to let users find all events of a particular category for a particular date range. I imagine your initial response will be to just fix the ranges so as to allow a static hierarchy. Take it as a business constraint that this isn’t acceptable. What do you do?

    I don’t think you understood what I meant by conditional writes. It’s a form of optimistic concurrency control, not a performance optimization. For example, if we have a web store with an inventory count, with a conditional write we can check the count, find there’s enough for an order, and write back the modified count, but with a conditional tag from the first read that causes the write to fail if any concurrent writes also updated the value. It’s simple and fits well into the http picture of the world. It’s also a primitive sufficient to implement any concurrency control you like on top, all the way up to byzantine quorums, which could be quite useful in a clustered environment like EC2.

    I don’t see how a message queue implements an atomic append to my data if S3 is my data store. Sure, I can put a work item in the queue to defer the update, but to append data to a resource in S3, I first have to read it, add my data, then write it back. Due to the lack of conditional writes, something else can do the same thing in an overlapped way and erase my write. Conditional writes are sufficient to prevent this corruption, but atomic appends are a simple optimization to avoid the backoff/resolution protocol.

    And before you call me a kook, note that other people find this useful enough to look at implementing it in lightweight systems.
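    A sketch of the inventory example, using a toy store that supports the conditional-write primitive being asked for; the tag plays the role of an HTTP ETag (all names hypothetical):

```python
import itertools

# Toy store: each value carries a tag, and a write fails unless the
# caller presents the tag it read. This is the primitive S3 lacks.
store = {}
_tags = itertools.count()

def put(key, value):
    store[key] = (value, next(_tags))

def get(key):
    return store[key]  # returns (value, tag)

def conditional_put(key, value, expected_tag):
    """Succeeds only if no concurrent write changed the value since the
    tag was read: optimistic concurrency control."""
    if store[key][1] != expected_tag:
        return False
    store[key] = (value, next(_tags))
    return True

def order(key, quantity):
    """Read the count, check there's enough, write back conditionally;
    retry if a concurrent writer got in between."""
    while True:
        count, tag = get(key)
        if count < quantity:
            return False  # not enough stock
        if conditional_put(key, count - quantity, tag):
            return True

put("inventory:widget", 5)
assert order("inventory:widget", 3) is True
assert order("inventory:widget", 3) is False  # only 2 left
assert get("inventory:widget")[0] == 2
```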

  8. Ahh, wait, I missed something. You’re suggesting serializing writes through the queue. That is more interesting. Hashing resources to queues should let us scale write processors while keeping per-item serialization.
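    A sketch of the hashing-resources-to-queues idea, with plain lists standing in for the message queues: all writes for a given resource land on the same queue (so one processor applies them in order), while different resources spread across queues and proceed in parallel.

```python
import hashlib

NUM_QUEUES = 4  # one write processor per queue, hypothetically
queues = [[] for _ in range(NUM_QUEUES)]

def enqueue_write(resource_key, update):
    """Hash the resource key to pick a queue, so updates to the same
    resource are serialized through the same processor."""
    digest = hashlib.md5(resource_key.encode()).digest()
    q = int.from_bytes(digest[:4], "big") % NUM_QUEUES
    queues[q].append((resource_key, update))

enqueue_write("cart:7", "add item A")
enqueue_write("cart:7", "add item B")

# Both updates landed on one queue, in submission order.
nonempty = [q for q in queues if q]
assert len(nonempty) == 1
assert [u for _, u in nonempty[0]] == ["add item A", "add item B"]
```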

  9. Any concept can benefit from an example. :)

    In my view, the efficiency of any type of index is dependent on the maximum granularity of the query and the size of the global scope. So yes, implementing a fast, multi-faceted arbitrary query interface would be a pain.

    Some solutions might be:

      • restrict the ranges to day-size quanta (you say unacceptable)
      • restrict the scope to per-city (a sort of index sharding)
      • use an external faceted indexer like Lucene, Hyper Estraier, Sphinx/XMLpipe, etc., which people do anyway for SQL-based systems because SQL isn’t good enough. The situation hasn’t gotten any worse.

    I understand now your point about conditional writes. That definitely could be useful, although in my scheme it would need to happen in memcached rather than S3. S3 reads should only happen during recovery from catastrophic failure.

    In regard to your last comment: yes, exactly.