mcable gaming edition review

I got a Nintendo Switch and was very frustrated with the graphics quality. Aliasing and poor texture filtering abound in Zelda BOTW and Mario Kart 8. I think this is primarily an issue with the games, not a limitation of the Tegra X1 platform, but we need to fix it all the same.

bring in the cables

This led me to purchase a strange device I had wondered about for some time, the Marseille Networks mCable Gaming Edition. The mCable is a video processor similar to a Darbee, but in a very unusual form factor: a “smart” HDMI cable powered by USB.
The cable includes an ASIC that applies anti-aliasing at pixel edge boundaries, and slightly modifies the color balance. There may also be some softening filters to reduce posterization.

1080p 60fps sources are post-processed but remain 1080p. The mCable upscales 60fps sources below 1080p to 1080p, and apparently upscales 30fps or lower sources to UHD. UHD/4K sources are passed through with no processing. There is no configuration ability whatsoever.

zelda botw test

I captured a scene in Kakariko Village (direct HDMI capture with an AVerMedia LGP 2) to compare:

Regular - BOTW - 1

Regular - BOTW - 1 - Zoom


MCable - BOTW - 1

MCable - BOTW - 1 - Zoom


mario kart 8 test

The mCable also upscales 720p input to 1080p. If you happen to have a source that renders to a 720p framebuffer, the mCable’s scaler may be an improvement over the source’s or TV’s scaler. BOTW, though, renders to an internal 900p framebuffer, so setting the Switch HDMI output to 720p is strictly worse than 1080p, even with the mCable.

Mario Kart 8 also appears to render above 720p internally. Here’s a similar set of video captures, including 720p input:

Regular - Mario Kart 8 - 2 - 720p

Regular - Mario Kart 8 - 2 - 1080p

MCable - Mario Kart 8 - 2 - 720p

MCable - Mario Kart 8 - 2 - 1080p


As you can see (especially in the zoomed-in Zelda shots and the Mario Kart 1080p video), the mCable does a surprisingly good job given the limited input information (no geometry, no supersampling, no temporality). To me it looks better than the typical FXAA implementation.

The effect is even more pronounced on the Xbox 360 than the Switch, and I imagine the PS3 as well. The effect is less…effective…on softer and more photo-realistic images, and it can’t do anything at all to UHD sources. The most noticeable defect introduced is some pixel-width ringing around menu text and other extremely high-contrast lines.

I measured input lag with a Leo Bodnar device and did not see any material difference over an unprocessed connection.


I like the mCable. It makes games with missing AA look much better on big screens, increasing the apparent detail and reducing annoying shimmer and crawling lines.  I don’t like the fragile cable form factor or the lack of configurability compared to a traditional video processor. Nevertheless I bought two, one for the Switch and one for an Xbox in a different room my son uses to play Kinect.

I haven’t tried the mCable with movies (it switches to a different processing configuration on 24fps sources) or the PS4 Pro. I don’t really see how the mCable processing could improve on the native upconversion in my projector for Blu-ray, or in the PS4 Pro for games, but maybe someday I will test it.

If you don’t mind the price, and you have a big TV or projector, buy an mCable and your Switch games really will look better.

standalone sinatra jar with jruby

For Fauna, we needed to migrate the website (a Sinatra app) from Heroku to our own servers in AWS us-west-2. I wanted to get off MRI and bundle it up as a JAR as part of this process.

dad, why

We already deploy the core database as a monolithic JAR, so it made sense to deploy the website as a monolithic JAR as well. This conforms to our constraint-driven development methodology.

We also wanted to avoid having to set up a J2EE webserver to host a single WAR, but rather stick to a self-contained JVM per app. It turned out to be within the realm of straightforward.

the jar

Building a JAR from a Rack app requires three things:

  1. Put your code into /lib so Warbler adds it to the $LOAD_PATH.
  2. Rename your rackup so Warbler can’t find it and builds a JAR instead of a WAR. I put mine at /config/
  3. Copy the startup script from your webserver into /bin. (I wanted to use Thick, since we already use Netty, but ultimately Puma worked best.)

Now you can run jruby -S warble and get a dependency-free JAR with a bundled app, webserver, and copy of JRuby.
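The three steps can also be pinned down in a config/warble.rb. This is an illustrative sketch, not the config we shipped; the directory list, jar name, and feature choices are assumptions:

```ruby
# config/warble.rb -- Warbler loads this automatically when it runs.
Warbler::Config.new do |config|
  # Bundle these directories; code under lib/ lands on the in-JAR $LOAD_PATH.
  config.dirs = %w(bin config lib public)
  # Name of the output artifact: website.jar.
  config.jar_name = "website"
  # "runnable" produces a JAR you can start with plain `java -jar website.jar`.
  config.features = %w(runnable)
end
```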

the deployment

Deployment is a little strange because Warbler/JRuby do not expose the JAR contents as a true filesystem.

Basic file operations, and require itself, will work, but Rack::File does not because it relies on send_file. Neither does OpenSSL certificate loading. I tried various workarounds, but ended up having the deployment script (Ansible) unzip the public folder from the JAR and manage Sinatra’s :public_folder with an environment variable.
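The :public_folder workaround boils down to an environment-variable fallback. A dependency-free sketch of the idea; the PUBLIC_FOLDER variable name and the helper are my inventions for illustration, not what our deploy scripts actually use:

```ruby
# If the deploy script has unzipped public/ out of the JAR and exported
# PUBLIC_FOLDER, use the extracted copy; otherwise fall back to the path
# bundled alongside the app (which works in local development).
def resolve_public_folder(env = ENV)
  env.fetch("PUBLIC_FOLDER") { File.expand_path("public", __dir__) }
end

# In the Sinatra app:
#   set :public_folder, resolve_public_folder
```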

You may think your local JAR is working fine, but move it into an empty directory and try it from there before you deploy. Your app may be picking up the local filesystem and not the JAR for various non-Ruby dependencies. The rackup file suffered from this issue also and needed to be unpacked.

your turn

I put a stub app on GitHub so you can try it out. JAR away and forget about the Linux/rbenv/nginx/passenger flavor of the week. It doesn’t affect you!


JARs are nice. JRuby needs to implement a lower-level file interface for the JAR contents, though. I’d love to see JAR support robustly embedded in JRuby itself.

a programmer’s guide to healing RSI

I am not a doctor. This blog is not intended to substitute for professional medical advice. See your general practitioner to discuss your symptoms and treatment, as well as an orthopedic specialist and a licensed physical therapist if you are able, before making any changes that may impact your health.

I have struggled on two separate occasions with ongoing and debilitating repetitive stress injury in both my hands. For me, healing my RSI required solving a relatively complex interaction of ergonomic, physiological and psychological problems.

My understanding is that this combination of factors is very common, in varying degrees. Maybe you have just one cause. Or maybe two, or maybe three (like me). This makes the process of getting well much more mysterious and stressful than it otherwise would be.

RSI is not a specific injury, but rather a chronic pain syndrome. Once the syndrome is in place, there may no longer be a single sustaining cause. All factors of the syndrome must be addressed concurrently or the pain can persist. Trying solutions serially, looking for a magic bullet–an app, a fancy keyboard, a specific stretch, an exotic medical intervention–will rarely get results.

My hope is that by outlining the process I went through, other people can be empowered to find their own cures.

before we begin

There is a distinct lack of practical or comprehensive information on RSI treatment available online. RSI’s multiple-cause, multiple-symptom characteristics, common to many pain syndromes, explain why it can be so difficult to treat–and also why you find so much fear-mongering, contradictory advice, and implausibly specific “miracle cures” floating around. Were this not the case, I would never consider offering non-professional medical advice.

That aside, listen to your body! (Not necessarily normal behavior for a programmer.) Even if you feel fine, adopting a more deliberate attitude to your work environment and physical health will help keep things that way as you age.

Don’t exacerbate your RSI by trying to power through. If you are taking painkillers frequently or are avoiding everyday tasks because of hand pain, you need to do something about it. By acting early you can avoid lost work, painful or ineffective surgery, and lifelong stress and discomfort.

Your health is important–don’t fuck around. But don’t be afraid, either. Your body is not a delicate flower that once damaged cannot be repaired. Athletes, for example, recover from far worse injuries all the time. You just have to be practical and put in the work.

a little history

In the beginning of 2008 I began to have some minor ergonomic issues. At the time I was working over 10 hours a day, 6 days a week. I was already using a Microsoft Natural Keyboard, an Aeron chair, and a Wacom pen tablet as a mouse, because they were the most comfortable (more on device choice later). The Wacom tablet was positioned in my lap with a keyboard tray. This encouraged me to lean on my mouse hand to support my upper body, and I developed a painful tingling sensation on the outer side of my hand. To mitigate this I switched to working primarily on my laptop at the kitchen counter, and worked fewer hours.

The past.

This solved the first problem, but about 6 months later I started to experience tingling, numbness, and pain in my outer two fingers on both hands–in hindsight, a clear case of ulnar nerve entrapment. Switching back to the desktop setup did not help. As the problem got worse, my ability to work (as well as enjoy playing music and videogames) was severely impacted. I saw an orthopedic specialist briefly, and made some self-directed lifestyle and ergonomic changes that mitigated the pain, but remained at risk.

At the end of 2009, the pain returned. My company had moved offices, and months went by before I got an ergonomic setup back. Instead, I worked on my laptop again. In addition my stress was elevated by recently increased responsibility, a.k.a. a promotion.

By January 2010 the problem was so severe that I was unable to comfortably use a phone, play a video game, or mouse or type any sustained amount. Again, shooting pains, tingling, and numbness in my outer two fingers (pinky and ring) predominated, along with some shoulder, elbow, and neck tension and pain. Scared that my career and hobbies were permanently in jeopardy, I filed for worker’s compensation with the company, saw a workplace ergonomic specialist and a physical therapist (both of whom were helpful), and set out to find a permanent fix for the problem.

Rather than bore you with my search for solutions, I am simply going to describe what worked.

fixing your environment

The first thing you need to focus on is fixing the physical ergonomics of your workspace. The keyboard, mouse, and screen are frustrating and archaic devices, but there is a lot we can do.

The general principle is to keep your body aligned according to its natural and comfortable angles. Your back should be straight, feet flat on the ground, knees at or below your waist, your eyes looking straight ahead at the top of the screen, your shoulders low and relaxed, your elbows at a loose right angle, your hands at a 45° angle to the horizontal plane, fingers curved, and your wrists straight in all axes (for both keyboard and mouse–pivot laterally with your shoulder and elbow, not your wrist).

Your body should feel natural and at rest.

Note that all of these positions are physically impossible when using a laptop. Instead, your back is curved in, your wrists bend outwards and up, your hands are flat, your eyes look down, and your shoulders are hunched up. Totally horrendous.

There is lots of equipment you can buy to help your situation. Your employer has a moral (and usually legal) obligation to buy reasonable equipment for you, even if you work from home, so don’t be shy. If you are on worker’s comp you may have to get it prescribed first, which is a big hassle.

Here’s what I like:

The space age.

  • The OfficeMaster chair (YS88) is fully adjustable in every direction and not crazy-expensive. Much better than Aerons.
  • The desk height is important because it controls the angle of your forearms. You can use any old adjustable desk as long as it doesn’t have any hardware in the way of your knees. However, a motorized standing desk is nice because it lets you vary your position throughout the day. (I have the Ergo Depot AD117.)
  • The Kinesis Advantage is a great keyboard. It lets you keep your wrists straight and your hands fall naturally into the key wells. In addition it puts modifier keys under the thumbs, which are stronger than your pinkies and require less stretching. It takes a little getting used to.
  • Because the Kinesis is so deep, the Wacom Bamboo tablet is elevated next to it. I use it with my left hand and find the pen motion very comfortable. Note that the keyboard is off-center to make room for the tablet.
  • In front of the keyboard and tablet is an elevated, angled homemade foam pad. This was recommended by my ergonomics specialist and supports the forearms. Without such a pad, because the Kinesis is so high, your wrists will fall. Make sure the pad follows the exact curve of your forearms in relation to the keyboard, otherwise you won’t get even support. Mine was wrong for a while and made my shoulder bunch up. (For some people a chair with arms can fulfill the support function of this pad, but I always hated them.)
  • The awkwardly named “theBoom” headset, which I bought for speech recognition, is now most useful for phone calls.
  • After this picture was taken, I replaced the reflection-prone Thunderbolt Display with a Samsung 27A850D, which has a great anti-glare coating, especially with the small dot pitch.

Note also that the screen is well-elevated to keep your head upright. Learn to touch type in the unlikely event that you don’t already–otherwise you’ll keep looking down.

While working, when you feel discomfort, take a short break, stretch your arms and shoulders, and walk around.

Your body is unique, so experiment to find what feels best for you.

I’ve used speech recognition extensively and it sucks. It’s not directly usable for anything except chatting because of the extent of correction required. You can try it if you want, but I found it better to simply avoid low-value typing tasks like chat and email. Learn to be terse. Also make an effort to learn your editor shortcuts, so you don’t have to retype common keywords.

(I did switch to a Dvorak keyboard layout, which I use to this day. I don’t think it made a difference; maybe because typing code is different from typing the English sentences Dvorak is optimized for. Some people like it.)

Some people also find that changing their sleeping position can help, or that they have some other non-work bad habit that is compressing a nerve in the arm.

One last thing, if you play console video games, the XtendPlay makes the controller much more comfortable.

fixing your personal health

The second thing to focus on is getting your personal, physical health in line.

First see your general practitioner and get a blood test to indicate if you have any specific vitamin deficiency; then see an orthopedic specialist to rule out any acute physical problem. Also try to see a physical therapist for a few months (especially if it’s covered by your health insurance or worker’s comp). They can help identify and reduce the tension or inflammation in your muscles and nerves. Often the source of the tension is remote from the location of the pain, so it can be quite enlightening.

I have been deliberately avoiding addressing the internal mechanisms of RSI. However, the basic recovery principle of “stretch and strengthen” applies. Many muscles, tendons, and nerves operate together for proper upper body function. If they are not elastic enough, or too atrophied to function properly, you will have pain, which may be local, may be remote, or may be induced in some other part of the body that has to over-compensate. For example, my ulnar nerve compression appeared to originate around the shoulders.

Unless specific and severe physical trauma has been clearly identified, like in some carpal tunnel cases, surgery won’t accomplish anything except atrophy your muscles further.

Even once you switch to an ergonomic setup, you will not heal until you stretch and strengthen your body.

For me it was hard to overestimate the impact of exercising my shoulders and back. There are a lot of ways to stretch and strengthen. Yoga is good. Various kinds of massage can help with the stretch, although not the strengthen. Low- to mid-intensity upper body weight training is very, very good. I used to go to the gym and use the weight machines. These days I use Your Shape 2012 with the Xbox Kinect, which has a wide variety of exercises and supports the use of free weights.

Since you’re taking up a regular exercise program, add some cardio to the mix to help your health generally. Follow the usual recommendations for exercise: 20-45 minutes, 4 days a week.

You also should try to get plenty of rest and eat healthy food. The older you are, the slower you will heal, so prepare to keep up the exercise indefinitely to stave off future problems.

If you don’t regularly stretch and strengthen your body, you won’t get better.

fixing your brain

The third thing you need to fix is your mental health. No joke. One thing that keeps a lot of people locked into chronic RSI pain, even after they have solved their ergonomic or physiological problems, is psychosomatic stress expression.

You were probably stressed and working extra hard to begin with–otherwise you wouldn’t have strained yourself. On top of that, having RSI causes stress! It becomes your constant companion. You can’t enjoy your hobbies; you can’t perform common tasks like use the phone or even brush your teeth; your career suffers; you are afraid that everything you touch is making your problem worse; and you worry about what your life will become if you never get well.

The result is that your mind starts to avoid all this stress by sublimating it. In particular, your intense fear of RSI pain now triggers psychosomatic pain at the RSI sites! How inconvenient.

To clarify, the pain is real, not imaginary. The cause is simply no longer solely an external physical interaction, but also internal tension or inflammation induced by your unconscious mind. The induced pain conforms to the socially validated diagnosis, RSI, successfully sublimating the stress.

If you experienced sudden-onset RSI, without obvious physical trauma, at a stressful time in your life or career, it is likely to be at least partially psychosomatic. Other potential indicators of a psychosomatic preoccupation are pain that has lasted for more than several months, pain that “moves around”, intense fear and feelings of hopelessness, and utilization of a wide variety of healthcare resources without relief.

The good news is that this is pretty easy to fix; you just have to bring the stress response to conscious awareness. Then as you begin to re-engage in tasks with an open mind instead of with fear, you find that much of the pain is gone. I highly recommend reading John Sarno’s book, The Divided Mind. For me simply reading and understanding the book’s theory resolved the psychosomatic component (which was layered onto the original physical problems) in about 4 weeks.

Your mother would be proud.

Once you stop the psychosomatic pain, then the ergonomic improvements/exercise outlined above can rapidly improve any remaining externally induced pain.

I have previous experience with cognitive therapy, so a book of applied Freudian psychology was not so much of a stretch. It might be a bit of a mind bender for you. Sarno has an earlier book, The Mindbody Prescription, which is not as theoretically robust, but more therapy-oriented. (In hindsight, I suspect that the reason some people find success with various exotic behavior modification regimens is their inclusion of a crude mind/body component.)

All you have to bring to the table is an openness to the idea that your mind can affect your body.


To solve my problem, I had to repair all three types of issue simultaneously. In the past, addressing them one at a time, I would always backslide. Even when addressing them all at once, it took a good year for me to get back in shape. Your recovery may be faster, depending on which causes are contributing the most.

Once well, you will always be more sensitive to ergonomic issues, and at some risk of a relapse–or maybe you are just more aware of it now.  For example, laptops still irritate my hands and shoulders after sustained use. But if I must use one I do ok, and don’t have any lingering discomfort as long as I exercise.

Remember that in the end, you are responsible for figuring out how to get well. It may not be easy–but it is possible.

hello heroku world

I’ve been investigating various platform-as-a-service providers, and did some basic benchmarking on Heroku.

I deployed a number of HTTP hello-world apps on the Cedar stack and hammered them via autobench. The results may be interesting to you if you are trying to maximize your hello-world dollar.


Each Heroku dyno is an lxc container with 512MB of RAM and an unclear amount of CPU. The JVM parameters were -Xmx384m -Xss512k -XX:+UseCompressedOops -server -d64.

The driver machine was an EC2 m1.large in us-east-1a, running the default 64-bit Amazon Linux AMI. A single httperf process could successfully generate up to 25,000 rps with the given configuration. Timeouts were set high enough to allow any intermediate queues to stay flooded.

throughput results

In the below graphs, the response rate is the solid line ━━ and the left y axis; connection errors as a percentage are the dashed line ---- and the right y axis. The graphs are heavily splined, as suits a meaningless micro-benchmark.

Note that as the response rates fall away from the gray, dashed x=y line, the server is responding increasingly late and thus would shed sustained load, regardless of the measured connection error rate.

Finagle and Node made good throughput showings—Node had the most consistent performance profile, but Finagle’s best case was better. Sinatra (hosted by Thin) and Tomcat did OK. Jetty collapsed when pushed past its limit, and Bottle (hosted by wsgiref) was effectively non-functional. Finally, the naive C/Accept “stack” demonstrated an amusing combination of poor performance and good scalability.

As the number of dynos increases, the best per-dyno response rate declines from 2500 to below 1000, and the implementations become less and less differentiated. This suggests that there is a non-linear bottleneck in the routing layer. There also appears to be a per-app routing limit around 12,000 rps that no number of dynos can overcome (data not shown). For a point of reference, 12,000 rps is the same velocity as the entire Netflix API.

The outcome demonstrates the complicated interaction between implementation scalability, per-dyno scalability, and per-app scalability, none of which are linear constraints.

latency results

Latency was essentially equivalent across all the stacks–C/Accept and Bottle excepted. Again, we see a non-linear performance falloff as the number of dynos increases.

The latency ceiling at 100ms in the first two graphs is caused by Heroku’s load-shedding 500s; ideally httperf would exclude those from the report.

using autobench

Autobench is a tool that automates running httperf at increasing rates. I like it a lot, despite (or because of) its ancient Perl-ness. A few modifications were necessary to make it more useful.

  • Adding a retry loop around the system call to httperf, because sometimes it would wedge, get killed by my supervisor script, and then autobench would return empty data.
  • Adding --hog to the httperf arguments.
  • Fixing the error percentage to divide by connections only, instead of by HTTP responses, which makes no sense when issuing multiple requests per connection.
  • Counting only HTTP status code 200 as a successful reply.
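The retry loop in the first item can be sketched in Ruby (my reimplementation of the idea, not autobench’s actual Perl; the attempt count, time limit, and log path are arbitrary):

```ruby
require "timeout"

# Run a command under a time limit; if it wedges, kill it and retry
# instead of returning empty data. Returns captured output, or nil if
# every attempt failed.
def run_with_retry(cmd, attempts: 3, limit: 300, log: "run.log")
  attempts.times do
    pid = Process.spawn(cmd, out: log, err: log)
    begin
      Timeout.timeout(limit) { Process.wait(pid) }
      return File.read(log) if $?.success?
    rescue Timeout::Error
      Process.kill("KILL", pid)  # the wedged process has to go
      Process.wait(pid)          # reap it before retrying
    end
  end
  nil
end
```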

I am not sure what happens that wedges httperf. Bottle was particularly nasty about this. At this point I should probably rewrite autobench in Ruby and merge it with my supervisor script, and also have it drive hummingbird which appears to be more modern and controllable than httperf.

using httperf

Httperf is also old and crufty, but in C. In order to avoid lots of fd-unavail errors, you have to make sure that your Linux headers set FD_SETSIZE large enough. For the Amazon AMI:

/usr/include/bits/typesizes.h:#define __FD_SETSIZE 65535
/usr/include/linux/posix_types.h:#define __FD_SETSIZE 65535

Fix the --hog bug in httperf itself, and also drop the sample rate to 0.5:

src/httperf.c:#define RATE_INTERVAL 0.5

You can grab the pre-compiled tools below if you don’t want to bother with updating the headers or source manually.

Finally, you need to make sure that your ulimits are ok:

/etc/security/limits.conf:* hard nofile 65535


My benchmark supervisor, pre-compiled tools, and charting script, as well as all the raw data generated by the test, are on GitHub. Please pardon my child-like R.

Hello-world sources are here:

All the testing ended up costing me $48.52 on Heroku and $23.25 on AWS. I would advise against repeating it to avoid troubling the Heroku ops team, but maybe if you have a real application to test…

ideal hdtv settings for xbox 360

My Xbox 360 broke, and since my new one supported HDMI, I reworked the connection to the TV (a Samsung PN50A450 plasma). It’s tricky to get the best performance out of the combination, so I wanted to mention it here.


Even though the HDMI connection is digital, both the XBox and the TV have hardware scalers that degrade the signal. The conversion chain works like this:

Game resolution (for Battlefield 3, 704p) → XBox HD resolution (for standard HD, 720p) → TV native resolution (for this Samsung, 768p)

Remember the Xbox is essentially a Windows PC, and games can choose whatever resolution they please. Now, in a normal PC, the resolution requested by the game would be transmitted directly to the monitor, and the monitor’s scaler would scale it. If the game chooses the monitor’s native resolution then there is no scaling.

Also remember that a 720p-labeled TV doesn’t necessarily mean the TV’s native resolution is 720p. It’s just 720p “class”.

We can’t eliminate scaling entirely because we can’t change the game’s resolution, but we can still remove one scaler from the chain by having the Xbox scale to the native TV resolution. When that happens, the Samsung shuts off its internal scaler (which also handles some post-processing effects). This gives us much sharper detail and reduces the notorious HDTV display latency.

How to do it:

  1. Connect the Xbox to the TV via HDMI on HDMI channel 2.
  2. Go into Settings → Console Settings → Display → HDTV Settings on the Xbox and choose 1360 x 768. If the setting isn’t available, it means you’re connected to the wrong HDMI port.

Crispy pixels! This configuration also works for VGA output if your Xbox doesn’t support HDMI.

You can also tell that you’re running at the native resolution via the TV’s menu, because the Detailed Settings option will be grayed out. This is because the TV’s scaler is not running.


Now we need to open up the color response range of the TV while it’s in native resolution in order to return to high contrast. Since the scaler/post-processor is not running, my TV at least can’t do the usual 16-235 levels remapping for video signals.

Go into the service menu on the Samsung. (Dangerous! Stay away from anything that says “calibration” or your TV can become unusable.)

  1. With the TV off, press MUTE 1 8 2 POWER on the remote.
  2. Using the up and down menu keys, choose ADC Target.
  3. Use the following settings for 1st PC, 2nd PC, and 2nd HDMI:
    1. Low: 0
    2. High: 255
    3. Delta: 0
  4. Press MUTE MUTE POWER on the remote to save your settings.

Go into Settings → Console Settings → Display → Reference Levels on the Xbox and set it to Expanded. Also set HDMI Color Space to RGB.


Also note that the Samsung service menu will display the resolution the TV is running at, which is handy.

memcached gem performance across VMs

Thanks to Evan Phoenix, memcached.gem 1.3.2 is compatible with Rubinius again. I have added Rubinius to the release QA, so it will stay this way. 

The master branch is compatible with JRuby, but a JRuby segfault (as well as a mkmf bug) prevents it from working for most people.

vm comparison

Memcached.gem makes an unusual benchmark case for VMs. The gem is highly optimized in general, and specially optimized for MRI. This means it will tend to not reward speedups of “dumb” aspects of MRI because it doesn’t exercise them—contrary to many micro-benchmarks.

                                          user     system      total        real
JRuby
set: libm:ascii                       2.440000   1.760000   4.200000 (  8.284000)
get: libm:ascii                       [SEGFAULT]

Rubinius
set: libm:ascii                       1.387198   1.590912   2.978110 (  6.576674)
get: libm:ascii                       2.076829   1.705302   3.782131 (  7.237497)

REE 1.8.7-2011.03
set: libm:ascii                       1.130000   1.530000   2.660000 (  6.331992)
get: libm:ascii                       1.250000   1.540000   2.790000 (  6.142529)

Ruby 1.9.2-p290
set: libm:ascii                       0.860000   1.490000   2.350000 (  5.917467)
get: libm:ascii                       1.030000   1.580000   2.610000 (  6.238965)

JRuby’s performance is surprisingly OK, but only once Hotspot has been convinced to JIT the function to native code (which the benchmark does ahead of time). Rubinius’s performance is good. Ruby 1.9.2 is the fastest.
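For context, rows like the above come from timing tight set/get loops. A dependency-free sketch of that harness shape, with an in-memory Hash standing in for a real memcached connection (so these timings say nothing about any client, and the key/value sizes are arbitrary):

```ruby
require "benchmark"

N = 100_000
store = {}  # stand-in for something like Memcached.new("localhost:11211")

Benchmark.bm(20) do |x|
  # Cycle through 100 keys, as a cache workload would reuse keys.
  x.report("set: hash:ascii") { N.times { |i| store["key#{i % 100}"] = "x" * 100 } }
  x.report("get: hash:ascii") { N.times { |i| store["key#{i % 100}"] } }
end
```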

jruby client comparison

Curiously, memcached.gem is the fastest Ruby memcached client on every VM including JRuby. It is 70% faster than jruby-memcache-client, which wraps Whalin’s Java client via JRuby’s Java integration:

memcached 1.3.3
remix-stash 1.1.3
jruby-memcache-client 1.7.0
dalli 1.1.2
                                          user     system      total        real
set: dalli:bin                       10.720000   7.250000  17.970000 ( 17.859000)
set: libm:ascii                       2.440000   1.760000   4.200000 (  8.284000)
set: libm:bin                         2.280000   1.960000   4.240000 (  8.600000)
set: mclient:ascii                    4.150000   3.010000   7.160000 ( 11.879000)
set: stash:bin                        5.870000   2.970000   8.840000 ( 13.677000)


This is great performance for C extensions in JRuby and Rubinius both. It’s handy that MRI’s extension interface is so simple.

One possible performance improvement remains in memcached.gem itself: rewriting the bundled copy of libmemcached to talk directly to Ruby instead of via SWIG, which introduces memory-copy overhead.

Also, someone needs to write a faster client for JRuby; there’s no reason why binding to a good native library like Whalin’s or xmemcached should be slow. It should be possible to equal the speed of memcached.gem on Ruby 1.9.


Maximizing simplicity is the only guaranteed way to minimize software maintenance. Other techniques exist, but are situational. No complex system will be cheaper to maintain than a simple one that meets the same goals.

‘Simple’, pedantically, means ‘not composed of parts’. However! Whatever system you are working on may already be a part of a whole. Your output should reduce the number and size of parts overall, not just in your own project domain.

Electra at the Tomb of Agamemnon, Frederic Leighton

I’ve started asking myself, “does this add the least amount of new code?” A system in isolation may be quite simple, but if it duplicates existing functionality, it has increased complexity. The ideal change is subtractive, reducing the total amount of code: by collapsing features together, removing configuration, or merging overlapping components.

Better to put your configuration in version control you already understand, than introduce a remote discovery server. Better to use the crufty RPC library you already have, than introduce a new one with a handy feature—unless you entirely replace the old one.

Beware the daughter that aspires not to the throne of her mother.

performance engineering at twitter

A few weeks ago I gave a performance engineering talk at QCon Beijing/Tokyo. The abstract and slides are below.


Twitter has undergone exponential growth with very limited staff, hardware, and time. This talk discusses principles by which the wise performance engineer can make dramatic improvements in a constrained environment. Of course, these apply to any systems architect who wants to do more with less. Principles will be illustrated with concrete examples of successes and lessons learned from Twitter’s development and operations history.


Performance Engineering at Twitter on Prezi

This is the first time I’ve used Prezi; the non-linear flow is compelling.

see it again sam

I will be giving the same talk this fall at QCon São Paulo and QCon San Francisco, so you can catch it there, and I think eventually the video will be online. This was also my first time speaking publicly in two years. Tons of new things to share with the world!

distributed systems primer, updated

Well, it’s been a long time. But! I have five papers to add to my original distributed systems primer:


CRDTs: Consistency Without Concurrency Control, Mihai Letia, Nuno Preguiça, and Marc Shapiro, 2009.

Guaranteeing eventual consistency by constraining your data structure, rather than adding heavyweight distributed algorithms. FlockDB works this way.


The Little Engines That Could: Scaling Online Social Networks, Josep M. Pujol, Vijay Erramilli, Georgos Siganos, Xiaoyuan Yang, Nikos Laoutaris, Parminder Chhabra, and Pablo Rodriguez, 2010.

Optimally partitioning overlapping graphs through lazy replication. Think of applying this technique at a cluster level, not just a server level.

Feeding Frenzy: Selectively Materializing Users’ Event Feeds, Adam Silberstein, Jeff Terrace, Brian F. Cooper, and Raghu Ramakrishnan, 2010.

Judicious session management and application of domain knowledge allow for optimal high-velocity mailbox updates in a memory grid. Twitter’s timeline system works this way.

systems integration

Dapper, a Large-Scale Distributed Systems Tracing Infrastructure, Benjamin H. Sigelman, Luiz André Barroso, Mike Burrows, Pat Stephenson, Manoj Plakal, Donald Beaver, Saul Jaspan, and Chandan Shanbhag, 2010.

Add a transaction-tracking, sampling profiler to a reusable RPC framework and get full stack visibility without performance degradation.

Forecasting MySQL Scalability with the Universal Scalability Law, Baron Schwartz and Ewen Fortune, 2010.

An example of data-driven scalability modeling in a concurrent system, via a least-squares regression approach.
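The model itself is compact enough to evaluate directly. A sketch with illustrative coefficients (the paper fits σ and κ to measured throughput by least squares; the numbers below are mine):

```ruby
# Universal Scalability Law: relative capacity at N nodes is
#   C(N) = N / (1 + sigma*(N - 1) + kappa*N*(N - 1))
# where sigma models contention (serialization) and kappa models
# coherency (crosstalk) cost.
def usl(n, sigma, kappa)
  n / (1 + sigma * (n - 1) + kappa * n * (n - 1))
end

# With any kappa > 0 the curve peaks and then retrogrades:
curve = (1..64).map { |n| usl(n.to_f, 0.02, 0.0005) }
```

With these coefficients the peak lands around N ≈ 44; adding nodes past the peak makes aggregate throughput *worse*, which is the counterintuitive behavior the paper demonstrates on real MySQL data.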

Happy scaling. Make sure to read the original post if you haven’t.

object allocations on the web

How many objects does a Rails request allocate? Here are Twitter’s numbers:

  • API: 22,700 objects per request
  • Website: 67,500 objects per request
  • Daemons: 27,900 objects per action

I want them to be lower. Overall, we burn 20% of our front-end CPU on garbage collection, which seems high. Each process handles ~29,000 requests before getting killed by the memory limit, and the GC is triggered about every 30 requests.

In memory-managed languages, you pay a performance penalty at object allocation time and also at collection time. Since Ruby lacks a generational GC (although there are patches available), the collection penalty is linear with the number of objects on the heap.
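To get a feel for per-request counts like these on a current interpreter, allocations around a block can be measured with `GC.stat` — a sketch assuming Ruby 2.1+, where the `:total_allocated_objects` counter exists (the REE-era counters this post uses are a different, patched API):

```ruby
# Counts heap allocations performed by a block (Ruby 2.1+).
def allocations
  GC.disable                                   # keep the counter monotonic
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
ensure
  GC.enable
end

allocations { { :a => 1, :b => 2, :c => 3 } }  # a small Hash costs a handful of objects
```

Wrapping a whole request handler in something like this is the crude version of how numbers like “67,500 objects per request” get collected.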

a note about structs and immediates

In Ruby 1.8, Struct instances use fewer bytes and allocate fewer objects than Hash and friends. This can be an optimization opportunity when the Struct class itself is reusable across many instances.

A little bit of code shows the difference (you need REE or Sylvain Joyeux’s patch to track allocations):

def sizeof(obj)
  GC.clear_stats
  obj.dup
  puts "#{GC.num_allocations} allocations"
  puts "#{GC.allocated_size} bytes"
end

Let’s try it:

>>"Test", :a, :b, :c)
>> struct =, 2, 3)
=> #<struct Struct::Test a=1, b=2, c=3>
>> sizeof(struct)
1 allocations
24 bytes

>> hash = {:a => 1, :b => 2, :c => 3}
>> sizeof(hash)
5 allocations
208 bytes

Watch out, though. The Struct class itself is expensive:

>> sizeof(Struct::Test)
29 allocations
1216 bytes

In my understanding, each key in a Hash is a VALUE pointer to another object, while each slot in a Struct is merely a named position.

Immediate types (Fixnum, nil, true, false, and Symbol) don’t allocate on the heap. Symbols are a partial exception: each one is interned, and its string representation lives on a special heap that is never garbage-collected.
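Interning is easy to observe directly: the same Symbol literal is always the very same object, while each String literal allocates a fresh one. (A modern caveat: since Ruby 2.2, dynamically created Symbols can be garbage-collected; the never-collected behavior described above is true of 1.8.)

```ruby
# Interning and immediates, observed via object identity.
:foo.equal?(:foo)    # => true  -- one interned object, no new allocation
"foo".equal?("foo")  # => false -- two distinct String objects on the heap
1.equal?(1)          # => true  -- small integers are immediates
```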

your turn

If you have allocation counts from a production web application, I would be delighted to know them. I am especially interested in Python, PHP, and Java.

Python should be about the same as Ruby. PHP, though, discards the entire heap per-request in some configurations, so collection can be dramatically cheaper. And I would expect Java to allocate fewer objects and have a more efficient collection cycle.