hello heroku world

I’ve been investigating various platform-as-a-service providers, and did some basic benchmarking on Heroku.

I deployed a number of HTTP hello-world apps on the Cedar stack and hammered them via autobench. The results may be interesting to you if you are trying to maximize your hello-world dollar.

setup

Each Heroku dyno is an LXC container with 512 MB of RAM and an unspecified share of CPU. The JVM parameters were -Xmx384m -Xss512k -XX:+UseCompressedOops -server -d64.
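
For the JVM-based stacks, those flags go on the dyno’s start command; a minimal Procfile sketch (the jar name is illustrative, not the actual app):

web: java -Xmx384m -Xss512k -XX:+UseCompressedOops -server -d64 -jar server.jar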

The driver machine was an EC2 m1.large in us-east-1a, running the default 64-bit Amazon Linux AMI. A single httperf process could successfully generate up to 25,000 rps with the given configuration. Timeouts were set high enough to allow any intermediate queues to stay flooded.
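
A typical sweep looked something like this (the app name, rates, and connection counts are illustrative; the actual runs were coordinated by the supervisor script):

autobench --single_host --host1 example-app.herokuapp.com --port1 80 --uri1 / \
  --low_rate 100 --high_rate 4000 --rate_step 100 \
  --num_call 10 --num_conn 5000 --timeout 10 --file results.tsv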

throughput results

In the graphs below, the response rate is the solid line ━━, read against the left y-axis; the connection error percentage is the dashed line ----, read against the right y-axis. The graphs are heavily splined, as suits a meaningless micro-benchmark.

Note that as the response rates fall away from the gray, dashed x=y line, the server is responding increasingly late and thus would shed sustained load, regardless of the measured connection error rate.

Finagle and Node made good throughput showings—Node had the most consistent performance profile, but Finagle’s best case was better. Sinatra (hosted by Thin) and Tomcat did OK. Jetty collapsed when pushed past its limit, and Bottle (hosted by wsgiref) was effectively non-functional. Finally, the naive C/Accept “stack” demonstrated an amusing combination of poor performance and good scalability.

As the number of dynos increases, the best per-dyno response rate declines from 2,500 to below 1,000 rps, and the implementations become less and less differentiated. This suggests that there is a non-linear bottleneck in the routing layer. There also appears to be a per-app routing limit of around 12,000 rps that no number of dynos can overcome (data not shown). For a point of reference, 12,000 rps is the same velocity as the entire Netflix API.

The outcome demonstrates the complicated interaction between implementation scalability, per-dyno scalability, and per-app scalability, none of which are linear constraints.

latency results

Latency was essentially equivalent across all the stacks, C/Accept and Bottle excepted. Again, we see a non-linear performance falloff as the number of dynos increases.

The latency ceiling at 100ms in the first two graphs is caused by Heroku’s load-shedding 500s; ideally httperf would exclude those from the report.

using autobench

Autobench is a tool that automates running httperf at increasing request rates. I like it a lot, despite (or because of) its ancient Perl-ness. A few modifications were necessary to make it more useful.

  • Adding a retry loop around the system call to httperf, because httperf would sometimes wedge, get killed by my supervisor script, and leave autobench returning empty data.
  • Adding --hog to the httperf arguments.
  • Dividing the error percentage by connections only, instead of by HTTP responses, which makes no sense when issuing multiple requests per connection (sketched below).
  • Counting only HTTP status code 200 as a successful reply.
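
To make the last two items concrete, here is the corrected accounting as a small Python sketch (the real change is a patch to autobench’s Perl; the function and field names here are illustrative, not autobench’s actual variables):

# Illustrative sketch only; autobench itself is Perl and uses its own names.
def summarize(connections, errors, replies_by_status):
    # Count only HTTP 200 replies as successes.
    ok = replies_by_status.get(200, 0)
    # Divide errors by connections attempted, not by replies received; with
    # multiple calls per connection the old denominator understates errors.
    error_pct = 100.0 * errors / connections if connections else 0.0
    return ok, error_pct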

I am not sure what causes httperf to wedge; Bottle was particularly nasty about this. At this point I should probably rewrite autobench in Ruby, merge it with my supervisor script, and have it drive hummingbird, which appears to be more modern and controllable than httperf.

using httperf

Httperf is also old and crufty, but in C. In order to avoid lots of fd-unavail errors, you have to make sure that your Linux headers set FD_SETSIZE large enough. For the Amazon AMI:

/usr/include/bits/typesizes.h:#define __FD_SETSIZE 65535
/usr/include/linux/posix_types.h:#define __FD_SETSIZE 65535

Fix the --hog bug in httperf itself, and drop the sample interval to 0.5 seconds:

src/httperf.c:#define RATE_INTERVAL 0.5
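
If you are patching and rebuilding httperf from the source tarball, the usual autoconf steps apply afterwards (assuming a standard build toolchain):

./configure && make && sudo make install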

You can grab the pre-compiled tools below if you don’t want to bother with updating the headers or source manually.

Finally, you need to make sure that your ulimits are ok:

/etc/security/limits.conf:* hard nofile 65535
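
Note that limits.conf only applies to new login sessions; to raise the descriptor limit in the current shell (up to the hard limit, or beyond it as root), something like:

ulimit -n 65535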

sources

My benchmark supervisor, pre-compiled tools, and charting script, as well as all the raw data generated by the test, are on GitHub. Please pardon my childlike R.

Hello-world sources are here:

All the testing ended up costing me $48.52 on Heroku and $23.25 on AWS. I would advise against repeating it to avoid troubling the Heroku ops team, but maybe if you have a real application to test…

2 responses

  1. Is that the exact source you used for Bottle? It isn’t configured to reply on /, but on /hello/:name:

    from bottle import route

    @route('/hello/:name')
    def index(name='World'):
        return 'Hello %s!' % name
    

    If you adjusted this, sorry, but it may be worth retrying with:

    from bottle import route

    @route('/')
    def index(name='World'):
        return 'Hello World!'
    
  2. It’s not the exact source; I removed the dynamic elements and changed the route along the lines of your suggestion. I am a little mystified by its poor performance, but it seemed to be running correctly.
