<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>snax</title>
	<atom:link href="http://blog.evanweaver.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.evanweaver.com</link>
	<description>on software</description>
	<lastBuildDate>Tue, 10 Jan 2012 23:44:56 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='blog.evanweaver.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>snax</title>
		<link>http://blog.evanweaver.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://blog.evanweaver.com/osd.xml" title="snax" />
	<atom:link rel='hub' href='http://blog.evanweaver.com/?pushpress=hub'/>
		<item>
		<title>ideal hdtv settings for xbox 360</title>
		<link>http://blog.evanweaver.com/2012/01/10/ideal-hdtv-settings-for-xbox-360/</link>
		<comments>http://blog.evanweaver.com/2012/01/10/ideal-hdtv-settings-for-xbox-360/#comments</comments>
		<pubDate>Tue, 10 Jan 2012 21:34:02 +0000</pubDate>
		<dc:creator>evan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.evanweaver.com/?p=2106</guid>
		<description><![CDATA[My XBox 360 broke, and since my new one supported HDMI, I reworked the connection to the TV (a Samsung PN50A450 plasma). It&#8217;s tricky to get the best performance out of the combination so I wanted to mention it here. scalers Even though the HDMI connection is digital, both the XBox and the TV have [...]]]></description>
			<content:encoded><![CDATA[<p>My XBox 360 broke, and since my new one supported HDMI, I reworked the connection to the TV (a Samsung PN50A450 plasma). It&#8217;s tricky to get the best performance out of the combination so I wanted to mention it here.</p>
<h2>scalers</h2>
<p>Even though the HDMI connection is digital, both the XBox and the TV have hardware scalers that degrade the signal. The conversion chain works like this:</p>
<p style="text-align:left;padding-left:30px;">Game resolution (for Battlefield 3, <code>704p</code>) → XBox HD resolution (for standard HD, <code>720p</code>) → TV native resolution (for this Samsung, <code>768p</code>)</p>
<p style="text-align:left;">Remember the XBox is essentially a Windows PC and games can choose <a href="http://forum.beyond3d.com/showthread.php?t=46241">whatever resolution they please</a>. Now, in a normal PC, the resolution requested by the game would be transmitted directly to the monitor, and the monitor&#8217;s scaler would scale it. If the game chooses the monitor&#8217;s native resolution then there is no scaling.</p>
<p style="text-align:left;">Also remember that a TV labeled 720p doesn&#8217;t necessarily have a native resolution of <code>720p</code>. The label only means <code>720p</code> &#8220;class&#8221;.</p>
<p style="text-align:left;">We can&#8217;t eliminate scaling entirely because we can&#8217;t change the game&#8217;s resolution, but we can still remove one scaler from the chain by having the XBox scale to the native TV resolution. When that happens, the Samsung shuts off its internal scaler (which also handles some post-processing effects). This gives us much sharper detail and reduces the notorious HDTV display latency.</p>
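<p>As a rough sketch of the chain above, the per-step scale factors can be worked out from the three vertical resolutions quoted in this post. The constant and method names are made up for illustration, and only the vertical axis is considered.</p>

```ruby
# Vertical resolutions taken from the conversion chain described above.
GAME_HEIGHT = 704   # Battlefield 3 render height
XBOX_HEIGHT = 720   # standard HD output from the console
TV_HEIGHT   = 768   # native panel height of this Samsung

# Scale factor applied by a single resampling step.
def scale_factor(from, to)
  to.to_f / from
end

# Default setup: the console scales to 720p, then the TV scales to 768p.
def default_chain
  [scale_factor(GAME_HEIGHT, XBOX_HEIGHT), scale_factor(XBOX_HEIGHT, TV_HEIGHT)]
end

# Native setup: the console scales straight to the panel height.
def native_chain
  [scale_factor(GAME_HEIGHT, TV_HEIGHT)]
end

puts "default: #{default_chain.length} steps, factors #{default_chain.map { |f| f.round(4) }.inspect}"
puts "native:  #{native_chain.length} step, factor #{native_chain.map { |f| f.round(4) }.inspect}"
```

<p>The two default factors multiply out to the single native factor, so the final picture is the same size either way. What changes is how many times the image gets resampled.</p>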
<p style="text-align:left;">How to do it:</p>
<ol>
<li>Connect the XBox to the TV via HDMI on <code>HDMI channel 2</code>.</li>
<li>Go into <code>Settings</code> → <code>Console Settings</code> → <code>Display</code> → <code>HDTV Settings</code> on the XBox and choose <code>1360 x 768</code>. If the setting isn&#8217;t available, it means you&#8217;re connected to the wrong HDMI port.</li>
</ol>
<p><a href="http://evanweaver.files.wordpress.com/2012/01/6675101191_43ded9c74d.jpg"><img class="aligncenter size-full wp-image-2109" title="Xbox 360 HDTV Display Settings" src="http://evanweaver.files.wordpress.com/2012/01/6675101191_43ded9c74d.jpg" alt="" width="490" height="312" /></a></p>
<p>Crispy pixels! This configuration also works for VGA output if your XBox doesn&#8217;t support HDMI.</p>
<p>You can also tell that you&#8217;re running at the native resolution via the TV&#8217;s menu, because the <code>Detailed Settings</code> option will be grayed out. This is because the TV&#8217;s scaler is not running.</p>
<h2>colors</h2>
<p>Now we need to open up the color response range of the TV while it&#8217;s in native resolution in order to return to high contrast. Since the scaler/post-processor is not running, my TV at least can&#8217;t do the usual 16-235 levels remapping for video signals.</p>
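<p>For reference, the 16-235 remapping mentioned above is usually a linear stretch of the limited video range onto the full 0-255 range. The sketch below uses that standard formula. It is an assumption about what such a remap does in general, not a description of the Samsung's internal processing, and <code>expand_level</code> is a name made up for illustration.</p>

```ruby
# Expand a limited-range ("video") level, 16-235, to full range, 0-255.
# Values outside 16-235 are clamped to the ends of the full range.
def expand_level(value)
  full = ((value - 16) * 255.0 / 219).round
  [[full, 0].max, 255].min
end

[16, 126, 235].each do |level|
  puts "#{level} maps to #{expand_level(level)}"
end
```

<p>When both the console and the TV work in full range, as in the settings below, no stretch like this is needed on either side.</p>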
<p>Go into the service menu on the Samsung. (Dangerous! Stay away from anything that says &#8220;calibration&#8221; or your TV can become unusable).</p>
<p><a href="http://evanweaver.files.wordpress.com/2012/01/6675783291_9c02233f61.jpg"><img class="aligncenter size-full wp-image-2128" title="Samsung PN50A450 Service Menu" src="http://evanweaver.files.wordpress.com/2012/01/6675783291_9c02233f61.jpg" alt="" width="478" height="368" /></a></p>
<ol>
<li>With the TV off, press <code>MUTE 1 8 2 POWER</code> on the remote.</li>
<li>Using the up and down menu keys, choose <code>ADC Target</code>.</li>
<li>Use the following settings for <code>1st PC</code>, <code>2nd PC</code>, and <code>2nd HDMI</code>:
<ol>
<li><code>Low: 0</code></li>
<li><code>High: 255</code></li>
<li><code>Delta: 0</code></li>
</ol>
</li>
<li>Press <code>MUTE MUTE POWER</code> on the remote to save your settings.</li>
</ol>
<p>Go into <code>Settings</code> → <code>Console Settings</code> → <code>Display</code> → <code>Reference Levels</code> on the XBox and set it to <code>Expanded</code>. Also set <code>HDMI Color Space</code> to <code>RGB</code>.</p>
<h2>references</h2>
<ul>
<li>PN50A450 <a href="http://www.avsforum.com/avs-vb/showthread.php?t=1019776">owner&#8217;s thread</a> on AVS Forum</li>
<li>XBox 360 <a href="http://www.highdefforum.com/gaming-systems/84647-xbox-360-display-settings-hdmi-color-reference-levels.html">color levels</a> on High Def Forum</li>
</ul>
<p>Also note that the Samsung service menu will display the resolution the TV is running at, which is handy.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.evanweaver.com/2012/01/10/ideal-hdtv-settings-for-xbox-360/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">evanweaver</media:title>
		</media:content>

		<media:content url="http://evanweaver.files.wordpress.com/2012/01/6675101191_43ded9c74d.jpg" medium="image">
			<media:title type="html">Xbox 360 HDTV Display Settings</media:title>
		</media:content>

		<media:content url="http://evanweaver.files.wordpress.com/2012/01/6675783291_9c02233f61.jpg" medium="image">
			<media:title type="html">Samsung PN50A450 Service Menu</media:title>
		</media:content>
	</item>
		<item>
		<title>memcached gem performance across VMs</title>
		<link>http://blog.evanweaver.com/2011/09/23/memcached-gem-performance-across-vms/</link>
		<comments>http://blog.evanweaver.com/2011/09/23/memcached-gem-performance-across-vms/#comments</comments>
		<pubDate>Fri, 23 Sep 2011 10:22:04 +0000</pubDate>
		<dc:creator>evan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.evanweaver.com/?p=1871</guid>
		<description><![CDATA[Thanks to Evan Phoenix, memcached.gem 1.3.2 is compatible with Rubinius again. I have added Rubinius to the release QA, so it will stay this way.  The master branch is compatible with JRuby, but a JRuby segfault (as well as a mkmf bug) prevents it from working for most people. vm comparison Memcached.gem makes an unusual [...]]]></description>
			<content:encoded><![CDATA[<p>Thanks to Evan Phoenix, <a href="http://rubygems.org/gems/memcached"><code>memcached.gem</code> 1.3.2</a> is compatible with Rubinius again. I have added Rubinius to the release QA, so it will stay this way. </p>
<p>The <a href="https://github.com/fauna/memcached">master branch</a> is compatible with JRuby, but a JRuby segfault (as well as a <code>mkmf</code> bug) prevents it from working for most people.</p>
<h2>vm comparison</h2>
<p><code>Memcached.gem</code> makes an unusual benchmark case for VMs. The gem is highly optimized in general, and specially optimized for MRI. This means it tends not to reward a VM for speeding up the &#8220;dumb&#8221; aspects of MRI, because it doesn&#8217;t exercise them, contrary to many micro-benchmarks.</p>
<pre>                                          user     system      total        real
JRuby-head
set: libm:ascii                       2.440000   1.760000   4.200000 (  8.284000)
get: libm:ascii                       [SEGFAULT]

RBX-head
set: libm:ascii                       1.387198   1.590912   2.978110 (  6.576674)
get: libm:ascii                       2.076829   1.705302   3.782131 (  7.237497)

REE 1.8.7-2011.03
set: libm:ascii                       1.130000   1.530000   2.660000 (  6.331992)
get: libm:ascii                       1.250000   1.540000   2.790000 (  6.142529)

Ruby 1.9.2-p290
set: libm:ascii                       0.860000   1.490000   2.350000 (  5.917467)
get: libm:ascii                       1.030000   1.580000   2.610000 (  6.238965)</pre>
<p>JRuby&#8217;s performance is surprisingly OK, but only once Hotspot has been convinced to JIT the function to native code (which the benchmark does ahead of time). Rubinius&#8217;s performance is good. Ruby 1.9.2 is the fastest.</p>
<h2>jruby client comparison</h2>
<p>Curiously, <code>memcached.gem</code> is the fastest Ruby memcached client on every VM <em>including</em> JRuby. It is 70% faster than <code>jruby-memcache-client</code>, which wraps <a href="https://github.com/gwhalin/Memcached-Java-Client/wiki/">Whalin&#8217;s Java client</a> via JRuby&#8217;s Java integration:</p>
<pre>memcached 1.3.3
remix-stash 1.1.3
jruby-memcache-client 1.7.0
dalli 1.1.2
                                          user     system      total        real
set: dalli:bin                       10.720000   7.250000  17.970000 ( 17.859000)
set: libm:ascii                       2.440000   1.760000   4.200000 (  8.284000)
set: libm:bin                         2.280000   1.960000   4.240000 (  8.600000)
set: mclient:ascii                    4.150000   3.010000   7.160000 ( 11.879000)
set: stash:bin                        5.870000   2.970000   8.840000 ( 13.677000)</pre>
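<p>The 70% figure appears to follow from the <code>total</code> CPU column in the table above. The sketch below redoes that arithmetic, assuming the <code>mclient:ascii</code> row is <code>jruby-memcache-client</code> and the <code>libm:ascii</code> row is <code>memcached.gem</code>. The constant and method names are made up for illustration.</p>

```ruby
# Timings for the "set" benchmark, copied from the table above (seconds).
LIBM_ASCII_TOTAL    = 4.20    # user + system CPU
MCLIENT_ASCII_TOTAL = 7.16
LIBM_ASCII_REAL     = 8.284   # wall-clock time
MCLIENT_ASCII_REAL  = 11.879

# How much more time the other client takes than the baseline, in percent.
def percent_slower(baseline, other)
  ((other / baseline - 1) * 100).round
end

puts "CPU time:   #{percent_slower(LIBM_ASCII_TOTAL, MCLIENT_ASCII_TOTAL)}% more"
puts "wall clock: #{percent_slower(LIBM_ASCII_REAL, MCLIENT_ASCII_REAL)}% more"
```

<p>By CPU time the gap comes out at about 70%. By wall-clock time it is smaller, about 43%.</p>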
<h2>conclusion</h2>
<p>This is great performance for C extensions in both JRuby and Rubinius. It&#8217;s handy that MRI&#8217;s extension interface is so simple.</p>
<p>One possible performance improvement remains in <code>memcached.gem</code> itself: rewriting the bundled copy of <a href="http://libmemcached.org/">libmemcached</a> to talk directly to Ruby instead of going through SWIG, which introduces memory copy overhead.</p>
<p>Also, someone needs to write a faster client for JRuby; there&#8217;s no reason why binding to a good native library like Whalin&#8217;s or <a href="http://code.google.com/p/xmemcached/">xmemcached</a> should be slow. It should be possible to equal the speed of <code>memcached.gem</code> on Ruby 1.9.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.evanweaver.com/2011/09/23/memcached-gem-performance-across-vms/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">evanweaver</media:title>
		</media:content>
	</item>
		<item>
		<title>simplicity</title>
		<link>http://blog.evanweaver.com/2011/07/25/simplicity/</link>
		<comments>http://blog.evanweaver.com/2011/07/25/simplicity/#comments</comments>
		<pubDate>Tue, 26 Jul 2011 06:49:58 +0000</pubDate>
		<dc:creator>evan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.evanweaver.com/?p=1996</guid>
		<description><![CDATA[Maximizing simplicity is the only guaranteed way to minimize software maintenance. Other techniques exist, but are situational. No complex system will be cheaper to maintain than a simple one that meets the same goals. &#8216;Simple&#8217;, pedantically, means &#8216;not composed of parts&#8217;. However! Whatever system you are working on may already be a part of a whole. [...]]]></description>
			<content:encoded><![CDATA[<p>Maximizing simplicity is the only guaranteed way to minimize software maintenance. Other techniques exist, but are situational. No complex system will be cheaper to maintain than a simple one that meets the same goals.</p>
<p>&#8216;Simple&#8217;, pedantically, means &#8216;not composed of parts&#8217;. However! Whatever system you are working on may already be a part of a whole. Your output should reduce the number and size of parts <em>over all</em>, not just in your own project domain.</p>
<div id="attachment_2000" class="wp-caption aligncenter" style="width: 159px"><a href="http://en.wikipedia.org/wiki/Electra"><img class="size-medium wp-image-2000 " title="Electra at the Tomb of Agamemnon" src="http://evanweaver.files.wordpress.com/2011/07/1869_frederic_leighton_-_electra_at_the_tomb_of_agamemnon-1.jpg?w=149&#038;h=300" alt="" width="149" height="300" /></a><p class="wp-caption-text">Electra at the Tomb of Agamemnon, Frederic Leighton</p></div>
<p>I&#8217;ve started asking myself, &#8220;does this add the least amount of new code?&#8221; A system in isolation may be quite simple, but if it duplicates existing functionality, it has increased complexity. The ideal change is <em>subtractive</em>, reducing the total amount of code: by collapsing features together, removing configuration, or merging overlapping components.</p>
<p>Better to put your configuration in version control you already understand, than introduce a remote discovery server. Better to use the crufty RPC library you already have, than introduce a new one with a handy feature—unless you entirely replace the old one.</p>
<p>Beware the daughter that aspires not to the throne of her mother.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.evanweaver.com/2011/07/25/simplicity/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">evanweaver</media:title>
		</media:content>

		<media:content url="http://evanweaver.files.wordpress.com/2011/07/1869_frederic_leighton_-_electra_at_the_tomb_of_agamemnon-1.jpg?w=149" medium="image">
			<media:title type="html">Electra at the Tomb of Agamemnon</media:title>
		</media:content>
	</item>
		<item>
		<title>performance engineering at twitter</title>
		<link>http://blog.evanweaver.com/2011/04/27/performance-engineering-at-twitter/</link>
		<comments>http://blog.evanweaver.com/2011/04/27/performance-engineering-at-twitter/#comments</comments>
		<pubDate>Wed, 27 Apr 2011 19:30:30 +0000</pubDate>
		<dc:creator>evan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.evanweaver.com/?p=1919</guid>
		<description><![CDATA[A few weeks ago I gave a performance engineering talk at QCon Beijing/Tokyo. The abstract and slides are below. abstract Twitter has undergone exponential growth with very limited staff, hardware, and time. This talk discusses principles by which the wise performance engineer can make dramatic improvements in a constrained environment. Of course, these apply to [...]]]></description>
			<content:encoded><![CDATA[<p>A few weeks ago I gave a performance engineering talk at QCon Beijing/Tokyo. The abstract and slides are below.</p>
<h2>abstract</h2>
<blockquote style="padding:0 40px;"><p>Twitter has undergone exponential growth with very limited staff, hardware, and time. This talk discusses principles by which the wise performance engineer can make dramatic improvements in a constrained environment. Of course, these apply to any systems architect who wants to do more with less. Principles will be illustrated with concrete examples of successes and lessons learned from Twitter&#8217;s development and operations history.</p></blockquote>
<h2>slides</h2>
<iframe frameborder="0" width="558" height="408" src="http://wpcomwidgets.com/?id=preziEmbed_x_cw02rviaqq&amp;name=preziEmbed_x_cw02rviaqq&amp;src=http%3A%2F%2Fprezi.com%2Fbin%2Fpreziloader.swf&amp;type=application%2Fx-shockwave-flash&amp;allowfullscreen=true&amp;allowscriptaccess=always&amp;width=550&amp;height=400&amp;bgcolor=%23ffffff&amp;flashvars=prezi_id%3Dx_cw02rviaqq%26lock_to_path%3D0%26color%3Dffffff%26autoplay%3Dno%26autohide_ctrls%3D0&amp;_tag=gigya&amp;_hash=3805d3e9cca20bc6040c1fac4ca4b70d" id="3805d3e9cca20bc6040c1fac4ca4b70d"></iframe>
<p><a title="QCon Beijing/Tokyo 2011" href="http://prezi.com/x_cw02rviaqq/performance-engineering-at-twitter/">Performance Engineering at Twitter</a> on <a href="http://prezi.com">Prezi</a></p>
<p>This is the first time I&#8217;ve used <a href="http://prezi.com">Prezi</a>; the non-linear flow is compelling.</p>
<h2>see it again sam</h2>
<p>I will be giving the same talk this fall at <a href="http://www.qconsp.com/">QCon São Paulo</a> and <a href="http://qconsf.com/">QCon San Francisco</a>, so you can catch it there, and I think eventually the video will be online. This was also my first time speaking publicly in two years. Tons of new things to share with the world!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.evanweaver.com/2011/04/27/performance-engineering-at-twitter/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">evanweaver</media:title>
		</media:content>
	</item>
		<item>
		<title>distributed systems primer, updated</title>
		<link>http://blog.evanweaver.com/2010/08/12/distributed-systems-primer-update/</link>
		<comments>http://blog.evanweaver.com/2010/08/12/distributed-systems-primer-update/#comments</comments>
		<pubDate>Thu, 12 Aug 2010 08:00:00 +0000</pubDate>
		<dc:creator>evan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.evanweaver.com/?p=117</guid>
		<description><![CDATA[Well, it&#8217;s been a long time. But! I have five papers to add to my original distributed systems primer: coordination CRDTs: Consistency Without Concurrency Control, Mihai Letia, Nuno Preguiça, and Marc Shapiro, 2009. Guaranteeing eventual consistency by constraining your data structure, rather than adding heavyweight distributed algorithms. FlockDB works this way. partitioning The Little Engines [...]]]></description>
			<content:encoded><![CDATA[<p>Well, it&#8217;s been a long time. But! I have five papers to add to my original <a href="http://blog.evanweaver.com/articles/2009/05/04/distributed-systems-primer/">distributed systems primer</a>:</p>
<h2>coordination</h2>
<p style="margin-bottom:2px;"><a href="http://pagesperso-systeme.lip6.fr/Marc.Shapiro/papers/RR-6956.pdf">CRDTs: Consistency Without Concurrency Control</a>, Mihai Letia, Nuno Preguiça, and Marc Shapiro, 2009.</p>
<p>Guaranteeing eventual consistency by constraining your data structure, rather than adding heavyweight distributed algorithms. <a href="http://github.com/twitter/flockdb">FlockDB</a> works this way.</p>
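<p>To make the idea concrete, here is a minimal sketch of one of the simplest CRDTs, a grow-only counter. It is much simpler than anything in the paper and is not how FlockDB is implemented. The class and method names are made up for illustration. The constraint on the data structure is that each replica only ever increases its own slot, which makes merging a plain per-slot maximum.</p>

```ruby
# Grow-only counter: each replica increments only its own slot,
# and merging two copies takes the per-replica maximum.
class GCounter
  attr_reader :counts

  def initialize(counts = {})
    @counts = counts.dup
  end

  # Record an increment made on the given replica.
  def increment(replica_id, by = 1)
    @counts[replica_id] = @counts.fetch(replica_id, 0) + by
    self
  end

  # The counter value is the sum over all replicas.
  def value
    @counts.values.inject(0) { |sum, n| sum + n }
  end

  # Merge is commutative, associative, and idempotent, so replicas can
  # exchange state in any order, any number of times, and still converge.
  def merge(other)
    merged = @counts.dup
    other.counts.each do |id, n|
      merged[id] = [merged.fetch(id, 0), n].max
    end
    GCounter.new(merged)
  end
end

a = GCounter.new.increment(:a, 2)
b = GCounter.new.increment(:b, 3)
puts a.merge(b).value
puts b.merge(a).value
```

<p>Both merge orders give the same value, with no locking or consensus round between the replicas.</p>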
<h2>partitioning</h2>
<p style="margin-bottom:2px;"><a href="http://ccr.sigcomm.org/online/files/p375.pdf">The Little Engines That Could: Scaling Online Social Networks</a>, Josep M. Pujol, Vijay Erramilli, Georgos Siganos, Xiaoyuan Yang, Nikos Laoutaris, Parminder Chhabra, and Pablo Rodriguez, 2010.</p>
<p>Optimally partitioning overlapping graphs through lazy replication. Think of applying this technique at a cluster level, not just a server level.</p>
<p style="margin-bottom:2px;"><a href="http://research.yahoo.com/files/sigmod278-silberstein.pdf">Feeding Frenzy: Selectively Materializing Users&#8217; Event Feeds</a>, Adam Silberstein, Jeff Terrace, Brian F. Cooper, and Raghu Ramakrishnan, 2010.</p>
<p>Judicious session management and application of domain knowledge allow for optimal high-velocity mailbox updates in a memory grid. Twitter&#8217;s timeline system works this way.</p>
<h2>systems integration</h2>
<p style="margin-bottom:2px;"><a href="http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/papers/dapper-2010-1.pdf">Dapper, a Large-Scale Distributed Systems Tracing Infrastructure</a>, Benjamin H. Sigelman, Luiz André Barroso, Mike Burrows, Pat Stephenson, Manoj Plakal, Donald Beaver, Saul Jaspan, and Chandan Shanbhag, 2010.</p>
<p>Add a transaction-tracking, sampling profiler to a reusable RPC framework and get full stack visibility without performance degradation.</p>
<p style="margin-bottom:2px;"><a href="http://www.percona.com/files/white-papers/forecasting-mysql-scalability.pdf">Forecasting MySQL Scalability with the Universal Scalability Law</a>, Baron Schwartz and Ewen Fortune, 2010.</p>
<p>An example of data-driven scalability modeling in a concurrent system, via a least-squares regression approach.</p>
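<p>For readers who have not met it, the Universal Scalability Law models relative capacity at concurrency N as N / (1 + &sigma;(N - 1) + &kappa;N(N - 1)), where &sigma; is the contention coefficient and &kappa; the coherency coefficient. The sketch below only evaluates the model. It does not do the regression from the paper, and the coefficient values are hypothetical, not taken from any measurement.</p>

```ruby
# Universal Scalability Law: relative capacity at concurrency n.
# sigma is the contention coefficient, kappa the coherency coefficient.
def usl_capacity(n, sigma, kappa)
  n / (1.0 + sigma * (n - 1) + kappa * n * (n - 1))
end

SIGMA = 0.05    # hypothetical value, for illustration only
KAPPA = 0.001   # hypothetical value, for illustration only

[1, 8, 16, 32, 64].each do |n|
  puts format("N=%-3d relative capacity=%.2f", n, usl_capacity(n, SIGMA, KAPPA))
end
```

<p>With these made-up coefficients, capacity rises, peaks near N = 31, and then falls as the coherency term dominates. That retrograde shape is what distinguishes the model from a simple contention-only curve.</p>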
<p>Happy scaling. Make sure to read the <a href="http://blog.evanweaver.com/articles/2009/05/04/distributed-systems-primer/">original post</a> if you haven&#8217;t.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.evanweaver.com/2010/08/12/distributed-systems-primer-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">evanweaver</media:title>
		</media:content>
	</item>
		<item>
		<title>object allocations on the web</title>
		<link>http://blog.evanweaver.com/2009/10/21/object-allocations-on-the-web/</link>
		<comments>http://blog.evanweaver.com/2009/10/21/object-allocations-on-the-web/#comments</comments>
		<pubDate>Wed, 21 Oct 2009 08:00:00 +0000</pubDate>
		<dc:creator>evan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.evanweaver.com/?p=116</guid>
		<description><![CDATA[How many objects does a Rails request allocate? Here are Twitter&#8217;s numbers: API: 22,700 objects per request Website: 67,500 objects per request Daemons: 27,900 objects per action I want them to be lower. Overall, we burn 20% of our front-end CPU on garbage collection, which seems high. Each process handles ~29,000 requests before getting killed [...]]]></description>
			<content:encoded><![CDATA[<p>How many objects does a Rails request allocate? Here are Twitter&#8217;s numbers:</p>
<ul>
<li>
<b>API</b>: 22,700 objects per request</li>
<li>
<b>Website</b>: 67,500 objects per request </li>
<li>
<b>Daemons</b>: 27,900 objects per action</li>
</ul>
<p>I want them to be lower. Overall, we burn 20% of our front-end CPU on garbage collection, which seems high. Each process handles ~29,000 requests before getting killed by the memory limit, and the GC is triggered about every 30 requests.</p>
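<p>To put those figures in proportion, here is a rough back-of-the-envelope calculation. It assumes the 30-request GC interval applies equally to API and website processes, which is a simplification and not something measured separately:</p>
<pre>
# Figures quoted above; applying one GC interval to both workloads is an assumption.
OBJECTS_PER_REQUEST = [['API', 22_700], ['Website', 67_500]]
REQUESTS_PER_PROCESS = 29_000
REQUESTS_PER_GC = 30

# Integer division: full GC cycles completed before the process is killed.
gc_runs = REQUESTS_PER_PROCESS / REQUESTS_PER_GC
puts "about #{gc_runs} GC runs per process lifetime"

OBJECTS_PER_REQUEST.each do |name, count|
  puts "#{name}: about #{count * REQUESTS_PER_GC} objects allocated between GC runs"
end
</pre>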
<p>In memory-managed languages, you pay a performance penalty at object allocation time and also at collection time. Since Ruby lacks a generational GC (although there are <a href="http://github.com/authorNari/patch_bag/blob/master/ruby/gc_partial_longlife_r23386.patch">patches</a> available), the collection penalty is linear with the number of objects on the heap.</p>
<h2>a note about structs and immediates</h2>
<p>In Ruby 1.8, <code>Struct</code> instances <a href="http://eigenclass.org/R2/writings/object-size-ruby-ocaml">use fewer bytes</a> and allocate fewer objects than <code>Hash</code> and friends. This can be an optimization opportunity in circumstances where the <code>Struct</code> class is reusable.</p>
<p>A little bit of code shows the difference (you need <a href="http://www.rubyenterpriseedition.com/">REE</a> or Sylvain Joyeux&#8217;s <a href="http://rubyforge.org/tracker/download.php/426/1700/11497/2087/ruby-track-alloc.patch">patch</a> to track allocations):</p>
<pre>
GC.enable_stats
def sizeof(obj)
  GC.clear_stats
  obj.clone
  puts "#{GC.num_allocations} allocations"
  GC.clear_stats
  obj.clone
  puts "#{GC.allocated_size} bytes"
end
</pre>
<p>Let&#8217;s try it:</p>
<pre>
&gt;&gt; Struct.new("Test", :a, :b, :c)
&gt;&gt; struct = Struct::Test.new(1,2,3)
=&gt; #&lt;struct Struct::Test a=1, b=2, c=3&gt;
&gt;&gt; sizeof(struct)
1 allocations
24 bytes

&gt;&gt; hash = {:a =&gt; 1, :b =&gt; 2, :c =&gt; 3}
&gt;&gt; sizeof(hash)
5 allocations
208 bytes
</pre>
<p>Watch out, though. The <code>Struct</code> class itself is expensive:</p>
<pre>
&gt;&gt; sizeof(Struct::Test)
29 allocations
1216 bytes
</pre>
<p>In my understanding, each key in a <code>Hash</code> is a <code>VALUE</code> pointer to another object, while each slot in a <code>Struct</code> is merely a named position.</p>
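<p>If you don&#8217;t have REE or the allocation-tracking patch, a similar comparison can be sketched on a current MRI (2.2 or later) with <code>GC.stat</code> and the <code>objspace</code> standard library. This is not the method used above; the <code>allocations</code> helper and the <code>Point</code> struct are names made up for the sketch, and the numbers will differ from the Ruby 1.8 figures:</p>
<pre>
require 'objspace'

# Count the objects allocated while the block runs.
# GC is disabled for the duration so a collection cannot interfere.
def allocations
  GC.disable
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
ensure
  GC.enable
end

Point = Struct.new(:a, :b, :c)   # hypothetical struct for the comparison
struct = Point.new(1, 2, 3)
hash = { a: 1, b: 2, c: 3 }

puts "struct: #{allocations { struct.clone }} allocations, #{ObjectSpace.memsize_of(struct)} bytes"
puts "hash:   #{allocations { hash.clone }} allocations, #{ObjectSpace.memsize_of(hash)} bytes"
</pre>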
<p>Immediate types (<code>Fixnum</code>, <code>nil</code>, <code>true</code>, <code>false</code>, and <code>Symbol</code>) don&#8217;t allocate objects on the heap. <code>Symbol</code> is a partial exception: symbols are interned, and their string representations are kept on a special heap that is not garbage-collected.</p>
<h2>your turn</h2>
<p>If you have allocation counts from a production web application, I would be delighted to know them. I am especially interested in Python, PHP, and Java.</p>
<p>Python should be about the same as Ruby. PHP, though, discards the entire heap per-request in some configurations, so collection can be  dramatically cheaper. And I would expect Java to allocate fewer objects and have a more efficient collection cycle.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.evanweaver.com/2009/10/21/object-allocations-on-the-web/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">evanweaver</media:title>
		</media:content>
	</item>
		<item>
		<title>scribe client</title>
		<link>http://blog.evanweaver.com/2009/09/30/scribe-client-gem/</link>
		<comments>http://blog.evanweaver.com/2009/09/30/scribe-client-gem/#comments</comments>
		<pubDate>Wed, 30 Sep 2009 08:00:00 +0000</pubDate>
		<dc:creator>evan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.evanweaver.com/?p=115</guid>
		<description><![CDATA[I&#8217;ve released Scribe 0.1, a Ruby client for the Scribe remote log server. sudo gem install scribe Usage is simple: client = Scribe.new client.log("I'm lonely in a crowded room.", "Rails") Documentation is here. about scribe The primary benefit of Scribe over something like syslog-ng is increased scalability, because of Scribe&#8217;s fundamentally distributed architecture. Scribe also [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.evanweaver.com&amp;blog=18067431&amp;post=115&amp;subd=evanweaver&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve released Scribe 0.1, a Ruby client for the <a href="http://sourceforge.net/apps/mediawiki/scribeserver/index.php?title=Main_Page">Scribe</a>  remote log server.</p>
<pre>sudo gem install scribe</pre>
<p>Usage is simple:</p>
<pre>
client = Scribe.new
client.log("I'm lonely in a crowded room.", "Rails")
</pre>
<p>Documentation is <a href="http://evanweaver.files.wordpress.com/2010/12/doc/fauna/scribe/files/README.html">here</a>.</p>
<h2>about scribe</h2>
<p>The primary benefit of Scribe over something like <a href="http://www.balabit.com/network-security/syslog-ng/">syslog-ng</a> is  increased scalability, because of Scribe&#8217;s fundamentally distributed architecture. Scribe also does away with the legacy  <code>syslog</code> alert levels, and lets you define more application-appropriate categories on the fly instead.</p>
<p>Dmytro Shteflyuk has a <a href="http://kpumuk.info/development/installing-and-using-scribe-with-ruby-on-mac-os/">good article</a> about installing the Scribe server itself on OS X. It would be nice if someone would put it in MacPorts, but it may be blocked on the release of Thrift.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.evanweaver.com/2009/09/30/scribe-client-gem/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">evanweaver</media:title>
		</media:content>
	</item>
		<item>
		<title>ree</title>
		<link>http://blog.evanweaver.com/2009/09/24/ree/</link>
		<comments>http://blog.evanweaver.com/2009/09/24/ree/#comments</comments>
		<pubDate>Thu, 24 Sep 2009 08:00:00 +0000</pubDate>
		<dc:creator>evan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.evanweaver.com/?p=114</guid>
		<description><![CDATA[We recently migrated Twitter from a custom Ruby 1.8.6 build to a Ruby Enterprise Edition release candidate, courtesy of Phusion. Our primary motivation was the integration of Brent&#8217;s MBARI patches, which increase memory stability. Some features of REE have no effect on our codebase, but we definitely benefit from the MBARI patchset, the Railsbench tunable [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.evanweaver.com&amp;blog=18067431&amp;post=114&amp;subd=evanweaver&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>We recently migrated Twitter from a custom Ruby 1.8.6 build to a <a href="http://www.rubyenterpriseedition.com/">Ruby Enterprise Edition</a> release candidate, courtesy of <a href="http://www.phusion.nl/">Phusion</a>. Our primary motivation was the integration of Brent&#8217;s <a href="http://github.com/brentr/matzruby/tree/v1_8_7_72-mbari">MBARI patches</a>, which increase memory stability.</p>
<p>Some features of REE have no effect on our codebase, but we definitely benefit from the MBARI patchset, the Railsbench tunable GC, and the various leak fixes in 1.8.7p174. These are difficult to integrate and Phusion has done a fine job.</p>
<h2>testing notes</h2>
<p>I ran into an interesting issue. Ruby is faster if compiled with <code>-Os</code> (optimize for size) than with <code>-O2</code> or <code>-O3</code> (optimize for speed). Hongli pointed out that Ruby has poor instruction locality and benefits most from squeezing tightly into the instruction cache. This is an unusual phenomenon, although probably more common in interpreters and virtual machines than in &#8220;standard&#8221; C programs.</p>
<p>I also tested a build that included Joe Damato&#8217;s <a href="http://timetobleed.com/fixing-threads-in-ruby-18-a-2-10x-performance-boost/">heaped thread frames</a>, but it would hang Mongrel in <code>rb_thread_schedule()</code> after the first GC run, which is not exactly what we want. Hopefully this can be integrated later.</p>
<h2>benchmarks</h2>
<p>I ran a suite of benchmarks via <a href="http://www.xenoclast.org/autobench/">Autobench/httperf</a> and plotted them with <a href="http://plot.micw.eu/">Plot</a>. The hardware was a 4-core Xeon machine with RHEL5, running 8 Mongrels balanced behind Apache 2.2. I made a typical API request that is answered primarily from composed caches.</p>
<div style="margin:0;padding:0;"><img src="http://evanweaver.files.wordpress.com/2010/12/ree_benchmark.png"/></div>
<p>As usual, we see that <a href="http://blog.evanweaver.com/articles/2009/04/09/ruby-gc-tuning/">tuning the GC parameters</a> has the greatest impact on throughput, but there is a definite gain from switching to the REE bundle. It&#8217;s also interesting how much the standard deviation is improved by the GC settings. (Some data points are skipped due to errors at high concurrency.)</p>
<h2>upgrading</h2>
<p>Moving from 1.8.6 to REE 1.8.7 was trivial, but moving to 1.9 will be more of an ordeal. It will be interesting to see what patches are still necessary on 1.9. Many of them are getting upstreamed, but some things (such as <a href="http://goog-perftools.sourceforge.net/doc/tcmalloc.html">tcmalloc</a>) will probably remain only available from 3rd parties.</p>
<p>All in all, good times in MRI land.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.evanweaver.com/2009/09/24/ree/feed/</wfw:commentRss>
		<slash:comments>24</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">evanweaver</media:title>
		</media:content>

		<media:content url="http://evanweaver.files.wordpress.com/2010/12/ree_benchmark.png" medium="image" />
	</item>
		<item>
		<title>memcached gem release</title>
		<link>http://blog.evanweaver.com/2009/08/04/memcached-gem-release/</link>
		<comments>http://blog.evanweaver.com/2009/08/04/memcached-gem-release/#comments</comments>
		<pubDate>Tue, 04 Aug 2009 08:00:00 +0000</pubDate>
		<dc:creator>evan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.evanweaver.com/?p=113</guid>
		<description><![CDATA[One of the hardest gems to install is no more. It&#8217;s now easy to install! Memcached 0.15 features: Update to libmemcached 0.31.1 Bundle libmemcached itself with the gem (antifuchs) UDP connection support Unix domain socket support (hellvinz) AUTO_EJECT_HOSTS bugfixes (mattknox) Install with gem install memcached. Since libmemcached is bundled in, there are no longer any [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.evanweaver.com&amp;blog=18067431&amp;post=113&amp;subd=evanweaver&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>One of the hardest gems to install is no more. It&#8217;s now easy to install!</p>
<p><a href="http://evanweaver.files.wordpress.com/2010/12/doc/fauna/memcached">Memcached 0.15</a> features:</p>
<ul>
<li>Update to libmemcached 0.31.1</li>
<li>Bundle libmemcached itself with the gem (<a href="http://github.com/antifuchs">antifuchs</a>)</li>
<li>UDP connection support</li>
<li>Unix domain socket support (<a href="http://github.com/hellvinz/">hellvinz</a>)</li>
<li>
<code>AUTO_EJECT_HOSTS</code> bugfixes (<a href="http://github.com/mattknox">mattknox</a>)</li>
</ul>
<p>Install with <code>gem install memcached</code>. Since <a href="http://tangent.org/552/libmemcached.html">libmemcached</a> is bundled in, there are no longer any dependencies.</p>
<h2>on coordination</h2>
<p><a href="http://github.com/antifuchs">Andreas Fuchs</a> suggested several months ago that I include libmemcached itself in the gem, but at the time I resisted. I was wrong.</p>
<p>My opposition was based on the idea that libmemcached itself would be an integration point, so running multiple versions on a system would be bad.</p>
<p>In real life, the hash algorithm became the integration point, not the library itself. And since the library&#8217;s ABI kept changing, the gem always required a very specific custom build. This annoyed the public and caused extra work for my operations team, who had to make sure to upgrade both the library and the gem at the same time.</p>
<p>Updates can come thick and fast now because I don&#8217;t have to worry about publishing custom builds or waiting for the libmemcached developers to merge my patches.</p>
<p>In retrospect it seems obvious—it&#8217;s always a win to remove coordination from a system.</p>
<h2>linker woes</h2>
<p>Unfortunately, it was easier to make that decision than it was to implement it. Linux and OS X link libraries differently, and I had a lot of trouble making sure that no system-installed version of libmemcached would get linked, instead of the custom one built during <code>gem install</code>.</p>
<p>When you link a shared object, OS X seems to maintain a reference to the original <code>.dylib</code>. Linux does not, and depends on ldconfig and <code>LD_LIBRARY_PATH</code> to find the object at runtime. Since you can&#8217;t modify the shell environment from within a running process, there&#8217;s no way to override <code>LD_LIBRARY_PATH</code>, so I needed to statically link libmemcached into the gem&#8217;s own <code>.so</code> or <code>.bundle</code>.</p>
<p>The only way I could do this on both systems was to configure libmemcached with <code>CFLAGS=-fPIC --disable-shared</code>, rename the <code>libmemcached.*</code> static object files to <code>libmemcached_gem.*</code>, and pass <code>-lmemcached_gem</code> to the linker rather than <code>-lmemcached</code>. Otherwise the linker would prefer the system-installed dynamic objects, even with the correct paths and <code>-static</code> option set.</p>
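<p>The rename step could look roughly like the following. This is a simplified sketch, not the code from the gem: <code>rename_static_libs</code> is a hypothetical helper, and the temporary directory stands in for wherever the bundled libmemcached build puts its static libraries.</p>
<pre>
require 'fileutils'
require 'tmpdir'

# Rename libmemcached.a, libmemcached.la, etc. to libmemcached_gem.*
# so that -lmemcached_gem can only match the bundled static build.
# Hypothetical helper; returns the new paths.
def rename_static_libs(lib_dir)
  Dir.glob(File.join(lib_dir, 'libmemcached.*')).map do |path|
    renamed = File.basename(path).sub('libmemcached', 'libmemcached_gem')
    target = File.join(lib_dir, renamed)
    FileUtils.mv(path, target)
    target
  end
end

# Demonstration against a throwaway directory with an empty placeholder file.
Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, 'libmemcached.a'))
  rename_static_libs(dir)
  puts Dir.glob(File.join(dir, '*')).map { |path| File.basename(path) }.inspect
end
</pre>
<p>In an <code>extconf.rb</code>, a step like this would presumably run after the bundled build and before the makefile is generated, with <code>-lmemcached_gem</code> added to the linker flags.</p>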
<p>Note that you can check what objects a binary has linked to via <code>otool -L</code> on OS X, and <code>ldd</code> on Linux.</p>
<p>Feel free to look at the <a href="http://github.com/fauna/memcached/blob/6f2c517df97a5a6c871802d93ef56fdb4358c22c/ext/extconf.rb">extconf.rb source</a> and let me know if there&#8217;s a better way to do this.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.evanweaver.com/2009/08/04/memcached-gem-release/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">evanweaver</media:title>
		</media:content>
	</item>
		<item>
		<title>up and running with cassandra</title>
		<link>http://blog.evanweaver.com/2009/07/06/up-and-running-with-cassandra/</link>
		<comments>http://blog.evanweaver.com/2009/07/06/up-and-running-with-cassandra/#comments</comments>
		<pubDate>Mon, 06 Jul 2009 08:00:00 +0000</pubDate>
		<dc:creator>evan</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://blog.evanweaver.com/?p=112</guid>
		<description><![CDATA[Cassandra is a hybrid non-relational database in the same class as Google&#8217;s BigTable. It is more featureful than a key/value store like Riak, but supports fewer query types than a document store like MongoDB. Cassandra was started by Facebook and later transferred to the open-source community. It is an ideal runtime database for web-scale domains [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=blog.evanweaver.com&amp;blog=18067431&amp;post=112&amp;subd=evanweaver&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p><a href="http://cassandra.apache.org/">Cassandra</a> is a hybrid non-relational database in the same class as Google&#8217;s BigTable. It is more featureful than a key/value store like <a href="http://www.basho.com/Riak.html">Riak</a>, but supports fewer query types than a document store like <a href="http://www.mongodb.org">MongoDB</a>.</p>
<p>Cassandra was started by Facebook and later transferred to the open-source community. It is an ideal runtime database for web-scale domains like social networks.</p>
<p>This post is both a tutorial and a &#8220;getting started&#8221; overview. You will learn about Cassandra&#8217;s features, data model, API, and operational requirements—everything you need to know to deploy a Cassandra-backed service.</p>
<p><strong>April 28, 2011</strong>: post updated for Cassandra gem 0.10 and Cassandra version 0.7.</p>
<h2>features</h2>
<p>There are a number of reasons to choose Cassandra for your website. Compared to other databases, three big features stand out:</p>
<ul>
<li><strong>Flexible schema</strong>: with Cassandra, like a document store, you don&#8217;t have to decide what fields you need in your records ahead of time. You can add and remove arbitrary fields on the fly. This is an incredible productivity boost, especially in large deployments.</li>
<li><strong>True scalability</strong>: Cassandra scales horizontally in the purest sense. To add more capacity to a cluster, turn on another machine. You don&#8217;t have to restart any processes, change your application queries, or manually relocate any data.</li>
<li><strong>Multi-datacenter awareness</strong>: you can adjust your node layout to ensure that if one datacenter burns in a fire, an alternative datacenter will have at least one full copy of every record.</li>
</ul>
<p>Some other features that help put Cassandra above the competition:</p>
<ul>
<li><strong>Range queries</strong>: unlike most key/value stores, you can query for ordered ranges of keys.</li>
<li><strong>List datastructures</strong>: super columns add a 5th dimension to the hybrid model, turning columns into lists. This is very handy for things like per-user indexes.</li>
<li><strong>Distributed writes</strong>: you can read and write any data to anywhere in the cluster at any time. There is never any single point of failure.</li>
</ul>
<h2>installation</h2>
<p>You need a Unix system. If you are using Mac OS 10.5, all you need is Git. Otherwise, you need to install Java 1.6, Git 1.6, Ruby, and Rubygems in some reasonable way.</p>
<p>Start a terminal and run:</p>
<pre>sudo gem install cassandra</pre>
<p>If you are using Mac OS, you need to export the following environment variables:</p>
<pre>export JAVA_HOME="/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home"
export PATH="/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/bin:$PATH"</pre>
<p>Now you can build and start a test server with <code>cassandra_helper</code>:</p>
<pre>cassandra_helper cassandra</pre>
<p>It runs! In another terminal run:</p>
<pre>cassandra_helper data:load</pre>
<p>Now your schema is loaded too.</p>
<h2>live demo</h2>
<p>To insert some data and make some queries, open another terminal window and start <code>irb</code>, the Ruby shell:</p>
<pre>irb</pre>
<p>In the <code>irb</code> prompt, require the Ruby client library:</p>
<pre>require 'rubygems'
require 'cassandra'
include SimpleUUID</pre>
<p>Now instantiate a client object:</p>
<pre>twitter = Cassandra.new('Twitter')</pre>
<p>Let&#8217;s insert a few things:</p>
<pre>user = {'screen_name' =&gt; 'buttonscat'}
twitter.insert(:Users, '5', user)

tweet1 = {'text' =&gt; 'Nom nom nom nom nom.', 'user_id' =&gt; '5'}
twitter.insert(:Statuses, '1', tweet1)

tweet2 = {'text' =&gt; '@evan Zzzz....', 'user_id' =&gt; '5', 'reply_to_id' =&gt; '8'}
twitter.insert(:Statuses, '2', tweet2)</pre>
<p>Notice that the two status records do not have all the same columns. Let&#8217;s go ahead and connect them to our user record:</p>
<pre>twitter.insert(:UserRelationships, '5', {'user_timeline' =&gt; {UUID.new =&gt; '1'}})
twitter.insert(:UserRelationships, '5', {'user_timeline' =&gt; {UUID.new =&gt; '2'}})</pre>
<p>The <code>UUID.new</code> call creates a collation key based on the current time; our tweet ids are stored in the values.</p>
<p>Now we can query our user&#8217;s tweets:</p>
<pre>timeline = twitter.get(:UserRelationships, '5', 'user_timeline', :reversed =&gt; true)
timeline.map { |time, id| twitter.get(:Statuses, id, 'text') }
# =&gt; ["@evan Zzzz....", "Nom nom nom nom nom."]</pre>
<p>Two tweet bodies, returned in recency order—not bad at all. In a similar fashion, each time a user tweets, we could loop through their followers and insert the status key into their follower&#8217;s <code>home_timeline</code> relationship, for handling general status delivery.</p>
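<p>That delivery loop might be sketched as follows. So that it runs without a server, the sketch uses <code>FakeClient</code>, a made-up in-memory stand-in that only mimics the shape of the <code>insert</code> calls above; <code>deliver_status</code> is likewise a hypothetical helper. With the real client you would pass <code>UUID.new</code> as the collation key instead of a float timestamp.</p>
<pre>
# Hypothetical in-memory stand-in for the Cassandra client.
class FakeClient
  attr_reader :rows

  def initialize
    @rows = Hash.new { |rows, id| rows[id] = Hash.new { |row, name| row[name] = {} } }
  end

  # Same call shape as the demo: column family, key, nested hash.
  def insert(column_family, key, super_columns)
    super_columns.each do |name, columns|
      @rows[[column_family, key]][name].merge!(columns)
    end
  end
end

# Write the status key into each follower's home_timeline.
def deliver_status(client, follower_ids, status_id, collation_key)
  follower_ids.each do |follower_id|
    client.insert(:UserRelationships, follower_id,
                  { 'home_timeline' =&gt; { collation_key =&gt; status_id } })
  end
end

client = FakeClient.new
deliver_status(client, ['6', '7'], '2', Time.now.to_f)
puts client.rows[[:UserRelationships, '6']]['home_timeline'].values.inspect
</pre>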
<h2>the data model</h2>
<p>Cassandra is best thought of as a 4 or 5 dimensional hash. The usual way to refer to a piece of data is as follows: a <strong>keyspace</strong>, a <strong>column family</strong>, a <strong>key</strong>, an <em>optional</em> <strong>super column</strong>, and a <strong>column</strong>. At the end of that chain lies a single, lonely value.</p>
<p>Let&#8217;s break down what these layers mean.</p>
<ul>
<li><strong>Keyspace</strong> (also confusingly called &#8220;table&#8221;): the outer-most level of organization. This is usually the name of the application. For example, <code>'Twitter'</code> and <code>'Wordpress'</code> are both good keyspaces. Keyspaces must be defined at startup in the <code>storage-conf.xml</code> file.</li>
<li><strong>Column family</strong>: a slice of data corresponding to a particular key. Each column family is stored in a separate file on disk, so it can be useful to put frequently accessed data in one column family, and rarely accessed data in another. Some good column family names might be <code>:Posts</code>, <code>:Users</code> and <code>:UserAudits</code>. Column families must be defined at startup.</li>
<li><strong>Key</strong>: the permanent name of the record. You can query over ranges of keys in a column family, like <code>:start =&gt; '10050', :finish =&gt; '10070'</code>—this is the only index Cassandra provides for free. Keys are defined on the fly.</li>
</ul>
<p>After the column family level, the organization can diverge—this is a feature unique to Cassandra. You can choose either:</p>
<ul>
<li>A <strong>column</strong>: this is a tuple with a name and a value. Good columns might be <code>'screen_name' =&gt; 'lisa4718'</code> or <code>'Google' =&gt; 'http://google.com'</code>. It is common to not specify a particular column name when requesting a key; the response will then be an ordered hash of all columns. For example, querying for <code>(:Users, '174927')</code> might return:
<pre>{'name' =&gt; 'Lisa Jones',
 'gender' =&gt; 'f',
 'screen_name' =&gt; 'lisa4718'}</pre>
<p>In this case, <code>name</code>, <code>gender</code>, and <code>screen_name</code> are all column names. Columns are defined on the fly, and different records can have different sets of column names, even in the same keyspace and column family. This lets you use the column name itself as either <strong>structure</strong> or <strong>data</strong>. Columns can be stored in recency order, or alphabetical by name, and all columns keep a timestamp.</li>
<li>A <strong>super column</strong>: this is a named list. It contains standard columns, stored in recency order. Say Lisa Jones has bookmarks in several categories. Querying <code>(:UserBookmarks, '174927')</code> might return:
<pre>{'work' =&gt; {
    'Google' =&gt; 'http://google.com',
    'IBM' =&gt; 'http://ibm.com'},
 'todo' =&gt; {...},
 'cooking' =&gt; {...}}</pre>
<p>Here, <code>work</code>, <code>todo</code>, and <code>cooking</code> are all super column names. They are defined on the fly, and there can be any number of them per row. <code>:UserBookmarks</code> is the name of the <strong>super column family</strong>. Super columns are stored in alphabetical order, with their sub columns physically adjacent on the disk.</li>
</ul>
<p>Super columns and standard columns cannot be mixed at the same (4th) level of dimensionality. You must define at startup which column families contain standard columns, and which contain super columns with standard columns inside them.</p>
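<p>One way to picture the two shapes is as plain nested Ruby hashes. This is only an in-memory illustration of the addressing scheme, not how Cassandra stores anything, and the <code>store</code> variable is made up for the sketch:</p>
<pre>
# keyspace -&gt; column family -&gt; key -&gt; (super column) -&gt; column -&gt; value
store = {
  'Twitter' =&gt; {
    :Users =&gt; {
      '174927' =&gt; { 'screen_name' =&gt; 'lisa4718' }
    },
    :UserBookmarks =&gt; {
      '174927' =&gt; { 'work' =&gt; { 'Google' =&gt; 'http://google.com' } }
    }
  }
}

# Four levels: keyspace, column family, key, column.
puts store['Twitter'][:Users]['174927']['screen_name']

# Five levels: keyspace, column family, key, super column, column.
puts store['Twitter'][:UserBookmarks]['174927']['work']['Google']
</pre>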
<p>Super columns are a great way to store one-to-many indexes to other records: make the sub column names TimeUUIDs (or whatever you&#8217;d like to use to sort the index), and have the values be the foreign key. We saw an example of this strategy in the demo, above.</p>
<p>If this is confusing, don&#8217;t worry. We&#8217;ll now look at two example schemas in depth.</p>
<h2>twitter schema</h2>
<p>Here is the schema definition we used for the demo, above. It is based on Eric Florenzano&#8217;s <a href="http://github.com/ericflo/twissandra/tree/master">Twissandra</a>, but updated for 0.7:</p>
<pre>{"Twitter":{
    "Users":{
      "comparator_type":"org.apache.cassandra.db.marshal.UTF8Type",
      "column_type":"Standard"},
    "Statuses":{
      "comparator_type":"org.apache.cassandra.db.marshal.UTF8Type",
      "column_type":"Standard"},
    "StatusRelationships":{
      "subcomparator_type":"org.apache.cassandra.db.marshal.TimeUUIDType",
      "comparator_type":"org.apache.cassandra.db.marshal.UTF8Type",
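      "column_type":"Super"},
    "UserRelationships":{
      "subcomparator_type":"org.apache.cassandra.db.marshal.TimeUUIDType",
      "comparator_type":"org.apache.cassandra.db.marshal.UTF8Type",
      "column_type":"Super"}
}}</pre>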
<p>You can load a schema with this command (replace <code>schema.json</code> with your own filename):</p>
<pre>bin/cassandra-cli --host localhost --batch &lt; schema.json</pre>
<p>The server must be running; as of version 0.7, Cassandra supports updating the schema at runtime.</p>
<p>What could be in <code>StatusRelationships</code>? Maybe a list of users who favorited the tweet? Having a super column family for both record types lets us index each direction of whatever many-to-many relationships we come up with.</p>
<p>Here&#8217;s how the data is organized:</p>
<div style="text-align:center;margin-bottom:10px;"><a href="http://evanweaver.files.wordpress.com/2010/12/twitter.jpg"><img src="http://evanweaver.files.wordpress.com/2010/12/twitter_small.jpg" alt="Click to enlarge" /></a></div>
<p>Cassandra lets you distribute the keys across the cluster either randomly, or in order, via the <code>Partitioner</code> option in the <code>storage-conf.xml</code> file.</p>
<p>For the Twitter application, if we were using the order-preserving partitioner, all recent statuses would be stored on the same node. This would cause hotspots. Instead, we should use the random partitioner.</p>
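<p>The hotspot effect can be sketched in plain Ruby. This is a simplification under stated assumptions: the node names and range boundaries are made up, and the modulo step is only a stand-in for real token ranges (the random partitioner does hash keys with MD5, but it assigns them to nodes by token range, not by modulo):</p>

```ruby
require 'digest/md5'

NODES = %w[node-a node-b node-c].freeze

# Order-preserving sketch: each node owns a contiguous, sorted range of keys.
# The range starts are invented for this example.
RANGE_STARTS = { 'node-a' => '', 'node-b' => '2010-07', 'node-c' => '2011-01' }.freeze

def ordered_node(key)
  # Pick the last node whose range start sorts at or before the key.
  RANGE_STARTS.select { |_node, start| start <= key }.keys.last
end

# Random sketch: hash the key with MD5, then map the hash onto a node.
def random_node(key)
  NODES[Digest::MD5.hexdigest(key).to_i(16) % NODES.size]
end

# One hundred time-ordered keys from the same day.
recent_keys = (0...100).map { |i| "2010-12-01-#{i}" }

ordered_nodes = recent_keys.map { |key| ordered_node(key) }.uniq
random_nodes  = recent_keys.map { |key| random_node(key) }.uniq
```

<p>In this sketch every recent key lands on the single node that owns the current time range, while the hashed keys are scattered across the nodes.</p>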
<p>Alternatively, we could preface the status keys with the user key, which has less temporal locality. If we used <code>user_id:status_id</code> as the status key, we could do range queries on the user fragment to get tweets-by-user, avoiding the need for a <code>user_timeline</code> super column.</p>
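<p>Here is a small sketch of that composite-key range query, again in plain Ruby with no Cassandra connection. It assumes the keys are kept in sorted order, and the sample rows and the <code>statuses_for</code> helper are hypothetical:</p>

```ruby
# Hypothetical rows keyed by "user_id:status_id".
statuses = {
  '5:0001'  => 'first tweet by user 5',
  '5:0002'  => 'second tweet by user 5',
  '50:0001' => 'tweet by user 50',
  '6:0001'  => 'tweet by user 6'
}

# Return every value whose key falls in the range ["user_id:", "user_id;").
# ';' is the character immediately after ':' in ASCII, so the half-open
# range covers exactly the keys that start with "user_id:".
def statuses_for(rows, user_id)
  range = ("#{user_id}:"..."#{user_id};")
  rows.keys.sort.select { |key| range.cover?(key) }.map { |key| rows[key] }
end

tweets_by_user_5 = statuses_for(statuses, '5')
```

<p>Note that the key for user 50 sorts outside the range for user 5, because the separator is part of the range bounds.</p>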
<h2>multi-blog schema</h2>
<p>Here&#8217;s another schema, suggested to me by <a href="http://spyced.blogspot.com/">Jonathan Ellis</a>, the primary Cassandra maintainer. It&#8217;s for a multi-tenancy blog platform:</p>
<pre>{"Multiblog":{
    "Blogs":{
      "comparator_type":"org.apache.cassandra.db.marshal.TimeUUIDType",
      "column_type":"Standard"},
    "Comments":{
      "comparator_type":"org.apache.cassandra.db.marshal.TimeUUIDType",
      "column_type":"Standard"}
}}</pre>
<p>Imagine we have a blog named 'The Cutest Kittens'. We will insert a row when the first post is made, as follows:</p>
<pre>require 'rubygems'
require 'cassandra/0.7'
include SimpleUUID

multiblog = Cassandra.new('Multiblog')

multiblog.insert(:Blogs, 'The Cutest Kittens',
  { UUID.new =&gt;
    '{"title":"Say Hello to Buttons Cat","body":"Buttons is a cute cat."}' })</pre>
<p><code>UUID.new</code> generates a unique, sortable column name, and the JSON hash contains the post details. Let&#8217;s insert another:</p>
<pre>multiblog.insert(:Blogs, 'The Cutest Kittens',
  { UUID.new =&gt;
    '{"title":"Introducing Commie Cat","body":"Commie is also a cute cat"}' })</pre>
<p>Now we can find the latest post with the following query:</p>
<pre>post = multiblog.get(:Blogs, 'The Cutest Kittens', :reversed =&gt; true).to_a.first</pre>
<p>On our website, we can build links based on the readable representation of the UUID:</p>
<pre>guid = post.first.to_guid
# =&gt; "b06e80b0-8c61-11de-8287-c1fa647fd821"</pre>
<p>If the user clicks this string in a permalink, our app can find the post directly via:</p>
<pre>multiblog.get(:Blogs, 'The Cutest Kittens', :start =&gt; UUID.new(guid), :count =&gt; 1)</pre>
<p>For comments, we&#8217;ll use the post UUID as the outermost key:</p>
<pre>multiblog.insert(:Comments, guid,
  {UUID.new =&gt; 'I like this cat. - Evan'})
multiblog.insert(:Comments, guid,
  {UUID.new =&gt; 'I am cuter. - Buttons'})</pre>
<p>Now we can get all comments (oldest first) for a post by calling:</p>
<pre>multiblog.get(:Comments, guid)</pre>
<p>We could paginate them by passing <code>:start</code> with a UUID. See <a href="http://www.slideshare.net/Eweaver/efficient-pagination-using-mysql">this presentation</a> to learn more about token-based pagination.</p>
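<p>A minimal sketch of token-based pagination follows, using a plain Ruby hash in place of the <code>Comments</code> row. The <code>page</code> helper and the integer column names are assumptions for illustration. The one behavior borrowed from the real API is that <code>:start</code> is inclusive, so the sketch fetches one extra column and uses its name as the token for the next page:</p>

```ruby
# Hypothetical comment columns; integer names stand in for sortable UUIDs.
comments = {
  10 => 'I like this cat. - Evan',
  20 => 'I am cuter. - Buttons',
  30 => 'Third comment',
  40 => 'Fourth comment',
  50 => 'Fifth comment'
}

# Return [values_for_this_page, token_for_next_page_or_nil].
def page(columns, start: nil, count: 2)
  names = columns.keys.sort
  names = names.select { |name| name >= start } if start
  slice = names.first(count + 1)                  # one extra to find the next token
  next_start = slice.size > count ? slice.last : nil
  [slice.first(count).map { |name| columns[name] }, next_start]
end

first_page, token   = page(comments)
second_page, token2 = page(comments, start: token)
last_page, token3   = page(comments, start: token2)
```

<p>Because each request starts from a column name rather than a numeric offset, the cost of fetching a page does not grow with how deep into the list the reader is.</p>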
<p>We have sidestepped two problems with this data model: we don&#8217;t have to maintain separate indexes for any lookups, and the posts and comments are stored in separate files, where they don&#8217;t cause as much write contention. Note that we didn&#8217;t need to use any super columns, either.</p>
<h2>storage layout and api comparison</h2>
<p>The storage strategy for Cassandra&#8217;s standard model is the same as BigTable&#8217;s. Here&#8217;s a comparison chart:</p>
<table class="tt">
<tbody>
<tr>
<th></th>
<th colspan="2">multi-file</th>
<th>per-file</th>
<th class="tt_right" colspan="4">intra-file</th>
</tr>
<tr>
<th>Relational</th>
<td>server</td>
<td>database</td>
<td>table*</td>
<td>primary key</td>
<td>column value</td>
<td></td>
<td class="tt_right"></td>
</tr>
<tr>
<th>BigTable</th>
<td>cluster</td>
<td>table</td>
<td>column family</td>
<td>key</td>
<td>column name</td>
<td>column value</td>
<td class="tt_right"></td>
</tr>
<tr>
<th>Cassandra, standard model</th>
<td>cluster</td>
<td>keyspace</td>
<td>column family</td>
<td>key</td>
<td>column name</td>
<td>column value</td>
<td class="tt_right"></td>
</tr>
<tr>
<th class="tt_footer">Cassandra, super column model</th>
<td class="tt_footer">cluster</td>
<td class="tt_footer">keyspace</td>
<td class="tt_footer">column family</td>
<td class="tt_footer">key</td>
<td class="tt_footer">super column name</td>
<td class="tt_footer">column name</td>
<td class="tt_footer tt_right">column value</td>
</tr>
</tbody>
</table>
<p style="font-size:12px;text-align:right;margin-top:0;">* With fixed column names.</p>
<p>Column families are stored in <strong>column-major</strong> order, which is why people call BigTable a column-oriented database. This is not the same as a column-oriented OLAP database like Sybase IQ—it depends on whether your data model considers keys to span column families or not.</p>
<div style="text-align:center;"><a href="http://evanweaver.files.wordpress.com/2010/12/row_oriented.jpg"><img src="http://evanweaver.files.wordpress.com/2010/12/row_oriented_small.jpg" alt="Click to enlarge" /></a></div>
<p>In row-orientation, the column names are the <strong>structure</strong>, and you think of the column families as <strong>containing keys</strong>. This is the convention in relational databases.</p>
<div style="text-align:center;"><a href="http://evanweaver.files.wordpress.com/2010/12/column_oriented.jpg"><img src="http://evanweaver.files.wordpress.com/2010/12/column_oriented_small.jpg" alt="Click to enlarge" /></a></div>
<p>In column-orientation, the column names are the <strong>data</strong>, and the column families are the structure. You think of the key as <strong>containing the column family</strong>, which is the convention in BigTable. (In Cassandra, super columns are also stored in column-major order—all the sub columns are together.)</p>
<p>In Cassandra&#8217;s Ruby API, parameters are expressed in storage order, for clarity:</p>
<table class="tt">
<tbody>
<tr>
<th>Relational</th>
<td class="tt_right"><code>SELECT `column` FROM `database`.`table` WHERE `id` = key;</code></td>
</tr>
<tr>
<th>BigTable</th>
<td class="tt_right"><code>table.get(key, "column_family:column")</code></td>
</tr>
<tr>
<th>Cassandra: standard model</th>
<td class="tt_right"><code>keyspace.get("column_family", key, "column")</code></td>
</tr>
<tr>
<th class="tt_footer">Cassandra: super column model</th>
<td class="tt_footer tt_right"><code>keyspace.get("column_family", key, "super_column", "column")</code></td>
</tr>
</tbody>
</table>
<p>Note that Cassandra&#8217;s internal Thrift interface mimics BigTable in some ways, but this is being changed.</p>
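<p>To make the storage-order idea concrete, here is a toy model in plain Ruby in which each argument descends one level of nesting. The nested hash, the sample data, and this <code>get</code> helper are assumptions for the example and are not the client library:</p>

```ruby
# Hypothetical data, nested in storage order:
# column family -> key -> (super column ->) column -> value.
keyspace = {
  'Statuses' => {                        # standard column family
    'status-1' => { 'text' => 'Nom nom nom.' }
  },
  'StatusRelationships' => {             # super column family
    'status-1' => { 'favorited_by' => { 'uuid-1' => 'user-5' } }
  }
}

# Each argument descends one storage level, left to right.
def get(store, *path)
  store.dig(*path)
end

standard  = get(keyspace, 'Statuses', 'status-1', 'text')
nested    = get(keyspace, 'StatusRelationships', 'status-1', 'favorited_by', 'uuid-1')
whole_row = get(keyspace, 'Statuses', 'status-1')
```

<p>Stopping early in the argument list returns everything stored beneath that level, such as a whole row.</p>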
<h2>going to production</h2>
<p>Cassandra is an alpha product and could, theoretically, lose your data. In particular, if you change the schema specified in the <code>storage-conf.xml</code> file, you must follow <a href="https://issues.apache.org/jira/browse/CASSANDRA-44">these instructions</a> carefully, or corruption will occur (this is going to be fixed). Also, the on-disk storage format is subject to change, making upgrading a bit difficult.</p>
<p>The biggest deployment is at Facebook, where hundreds of terabytes of token indexes are kept in about a hundred Cassandra nodes. However, their use case allows the data to be rebuilt if something goes wrong. Proceed carefully, keep a backup in an <a href="http://mashable.com/2009/01/30/magnolia-data-loss/">unrelated storage engine</a>&#8230;and submit patches if things go wrong. (Some other production deployments are listed <a href="http://www.dbms2.com/2010/07/06/riptano-and-cassandra-adoption/">here</a>.)</p>
<p>That aside, here is a guide for deploying a production cluster:</p>
<ul>
<li><strong>Hardware</strong>: get a handful of commodity Linux servers. 16GB memory is good; Cassandra likes a big filesystem buffer. You don&#8217;t need RAID. If you put the commitlog file and the data files on separate physical disks, things will go faster. Don&#8217;t use EC2 or friends without being aware that the virtualized I/O can be slow, especially on the small instances.</li>
<li><strong>Configuration</strong>: in the <code>storage-conf.xml</code> schema file, set the replication factor to 3. List the IP address of one of the nodes as the seed. Set the listen address to the empty string, so the hosts will resolve their own IPs. Now, adjust the contents of <code>cassandra.in.sh</code> for your various paths and JVM options—for a 16GB node, set the JVM heap to 4GB.</li>
<li><strong>Deployment</strong>: build a package of Cassandra itself and your configuration files, and deliver it to all your servers (I use <a href="http://en.wikipedia.org/wiki/Capistrano">Capistrano</a> for this). Start the servers by setting <code>CASSANDRA_INCLUDE</code> in the environment to point to your <code>cassandra.in.sh</code> file, and run <code>bin/cassandra</code>. At this point, you should see join notices in the Cassandra logs:
<pre>Cassandra starting up...
Node 10.224.17.13:7001 has now joined.
Node 10.224.17.14:7001 has now joined.</pre>
<p>Congratulations! You have a cluster. Don&#8217;t forget to turn off debug logging in the <code>log4j.properties</code> file.</li>
<li><strong>Visibility</strong>: you can get a little more information about your cluster via the included <code>bin/nodetool</code> tool:
<pre>$ bin/nodetool --host 10.224.17.13 ring
Token(124007023942663924846758258675932114665)  3 10.224.17.13  |&lt;--|
Token(106858063638814585506848525974047690568)  3 10.224.17.19  |   ^
Token(141130545721235451315477340120224986045)  3 10.224.17.14  |--&gt;|</pre>
<p>Cassandra also exposes various statistics over <a href="http://en.wikipedia.org/wiki/Java_Management_Extensions">JMX</a>.</li>
</ul>
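<p>The ring output and the replication factor fit together roughly as sketched below. This plain-Ruby example reuses the three tokens from the output above and assumes the simplest placement rule: a key belongs to the first node whose token is greater than or equal to the key token (wrapping around the ring), and the remaining replicas go to the next nodes in ring order. The <code>replicas_for</code> helper is hypothetical, and rack-aware placement would choose nodes differently:</p>

```ruby
# Tokens and addresses from the ring output, sorted into ring order.
RING = {
  106858063638814585506848525974047690568 => '10.224.17.19',
  124007023942663924846758258675932114665 => '10.224.17.13',
  141130545721235451315477340120224986045 => '10.224.17.14'
}.sort.freeze # array of [token, address] pairs

# Walk the ring from the owning node and collect replication_factor addresses.
def replicas_for(key_token, replication_factor)
  first = RING.index { |node_token, _address| node_token >= key_token } || 0
  (0...replication_factor).map { |i| RING[(first + i) % RING.size].last }.uniq
end

between_nodes = replicas_for(110_000_000_000_000_000_000_000_000_000_000_000_000, 2)
past_the_end  = replicas_for(150_000_000_000_000_000_000_000_000_000_000_000_000, 2)
everything    = replicas_for(1, 3)
```

<p>With three nodes and a replication factor of 3, every node ends up holding a copy of every key, which is why a small cluster configured this way can lose a node without losing data.</p>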
<p>Note that your client machines (not servers!) must have accurate clocks for Cassandra to resolve write conflicts properly. Use <a href="http://en.wikipedia.org/wiki/Network_Time_Protocol">NTP</a>.</p>
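<p>A small sketch shows why the client clocks matter. It assumes the simplest conflict rule, where the column version with the higher client-supplied timestamp wins. The <code>Column</code> struct and <code>resolve</code> helper are hypothetical, and tie-breaking is simplified to keeping the existing version:</p>

```ruby
# A column version as the client sends it: a value plus a client timestamp.
Column = Struct.new(:value, :timestamp)

# Keep whichever version carries the higher timestamp.
def resolve(existing, incoming)
  return incoming if existing.nil?
  incoming.timestamp > existing.timestamp ? incoming : existing
end

stored = nil
# Client A has an accurate clock and writes first.
stored = resolve(stored, Column.new('written first', 1_000_000))
# Client B writes later in real time, but its clock runs slow,
# so its timestamp is lower and the newer write is discarded.
stored = resolve(stored, Column.new('written second', 997_000))
```

<p>The write that happened second is silently lost, because the server only compares the timestamps it was given.</p>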
<h2>conclusion</h2>
<p>There is a misperception that if someone advocates a non-relational database, they either don&#8217;t understand SQL optimization, or they are generally a hater. This is not the case.</p>
<p>It is reasonable to seek a new tool for a new problem, and database problems have changed with the rise of web-scale distributed systems. This does not mean that SQL as a general-purpose runtime and reporting tool is going away. However, at web-scale, it is more flexible to separate the concerns. Runtime object lookups can be handled by a low-latency, strict, self-managed system like Cassandra. Asynchronous analytics and reporting can be handled by a high-latency, flexible, un-managed system like <a href="http://hadoop.apache.org/core/">Hadoop</a>. And in neither case does SQL lend itself to sharding.</p>
<p>I think that Cassandra is the most promising current implementation of a distributed OLTP database, but much work remains to be done.</p>
<p>Cassandra has excellent performance. There are some benchmark results for version 0.5 at the end of the <a href="http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf">Yahoo performance study</a>.</p>
<h2>further resources</h2>
<ul>
<li><a href="http://wiki.apache.org/cassandra/">Cassandra wiki</a></li>
<li>Presentation by Avinash Lakshman about Cassandra: <a href="http://www.slideshare.net/Eweaver/cassandra-presentation-at-nosql">slides</a>, <a href="http://vimeo.com/5185526">video</a></li>
<li>The <a href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/">cassandra-user</a> and <a href="http://mail-archives.apache.org/mod_mbox/incubator-cassandra-dev/">cassandra-dev</a> mailing lists</li>
<li>The #cassandra IRC channel on <a href="irc://irc.freenode.net/cassandra">irc.freenode.net</a></li>
<li>Cassandra&#8217;s <a href="http://issues.apache.org/jira/browse/CASSANDRA">bug tracker</a></li>
<li>Twitter&#8217;s Ruby client: <a href="http://evanweaver.files.wordpress.com/2010/12/doc/fauna/cassandra_client">docs</a>, <a href="http://github.com/fauna/cassandra_client/">source</a></li>
</ul>
<p>At this point, there are many better resources around the web than those listed above. Check the official <a href="http://cassandra.apache.org/">Cassandra website</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.evanweaver.com/2009/07/06/up-and-running-with-cassandra/feed/</wfw:commentRss>
		<slash:comments>26</slash:comments>
	
		<media:content url="" medium="image">
			<media:title type="html">evanweaver</media:title>
		</media:content>

		<media:content url="http://evanweaver.files.wordpress.com/2010/12/twitter_small.jpg" medium="image">
			<media:title type="html">Click to enlarge</media:title>
		</media:content>

		<media:content url="http://evanweaver.files.wordpress.com/2010/12/row_oriented_small.jpg" medium="image">
			<media:title type="html">Click to enlarge</media:title>
		</media:content>

		<media:content url="http://evanweaver.files.wordpress.com/2010/12/column_oriented_small.jpg" medium="image">
			<media:title type="html">Click to enlarge</media:title>
		</media:content>
	</item>
	</channel>
</rss>
