Well, it's been a long time. But! I have four papers to add to my original distributed systems primer:
coordination
CRDTs: Consistency Without Concurrency Control, Mihai Letia, Nuno Preguiça, and Marc Shapiro, 2009.
Guaranteeing eventual consistency by constraining your data structure, rather than adding heavyweight distributed algorithms. FlockDB works this way.
partitioning
Scaling Online Social Networks Without Pains , Josep M. Pujol, Georgos Siganos, Vijay Erramilli, and Pablo Rodriguez, 2010.
Optimally partitioning overlapping graphs through lazy replication. Think of applying this technique at a cluster level, not just a server level.
There's a better version of this paper titled "The Little Engines That Could"; I'll update the post when it's generally available.
Feeding Frenzy: Selectively Materializing Users' Event Feeds, Adam Silberstein, Jeff Terrace, Brian F. Cooper, and Raghu Ramakrishnan, 2010.
Judicious session management and application of domain knowledge allow for optimal high-velocity mailbox updates in a memory grid. Twitter's timeline system works this way.
systems integration
Dapper, a Large-Scale Distributed Systems Tracing Infrastructure, Benjamin H. Sigelman, Luiz André Barroso, Mike Burrows, Pat Stephenson, Manoj Plakal, Donald Beaver, Saul Jaspan, and Chandan Shanbhag, 2010.
Add a transaction-tracking, sampling profiler to a reusable RPC framework and get full stack visibility without performance degradation.
Happy scaling. Make sure to read the original post if you haven't.
August 12, 2010
