I’ve been investigating various platform-as-a-service providers, and did some basic benchmarking on Heroku.
I deployed a number of HTTP hello-world apps on the Cedar stack and hammered them via
autobench. The results may be interesting to you if you are trying to maximize your hello-world dollar.
Each Heroku dyno is an LXC container with 512 MB of RAM and an unclear amount of CPU. The JVM parameters were
-Xmx384m -Xss512k -XX:+UseCompressedOops -server -d64.
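One plausible way to set these flags on Cedar is via a config var (a sketch: the Procfile command is an assumption, and on older toolbelts the command is `heroku config:add` rather than `config:set`):

```shell
# Assumes a Procfile whose web command interpolates JAVA_OPTS, e.g.
#   web: java $JAVA_OPTS -jar server.jar
heroku config:set JAVA_OPTS="-Xmx384m -Xss512k -XX:+UseCompressedOops -server -d64"
```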
The driver machine was an EC2 instance in
us-east-1a, running the default 64-bit Amazon Linux AMI. A single
httperf process could successfully generate up to 25,000 rps with the given configuration. Timeouts were set high enough to allow any intermediate queues to stay flooded.
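For the curious, a single run looked roughly like this (the target host, rate, and counts are illustrative, not the exact values used; autobench is what steps the rate upward across runs):

```shell
# Sketch of one httperf invocation; --hog lets it use many ephemeral ports,
# and a generous --timeout keeps intermediate queues flooded.
httperf --hog --server example.herokuapp.com --port 80 --uri / \
        --rate 2000 --num-conns 20000 --num-calls 1 --timeout 30
```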
In the graphs below, the response rate is the solid line ━━, read against the left y axis; connection errors as a percentage are the dashed line ----, read against the right y axis. The graphs are heavily splined, as suits a meaningless micro-benchmark.
Note that as the response rates fall away from the gray, dashed
x=y line, the server is responding increasingly late and thus would shed sustained load, regardless of the measured connection error rate.
Finagle and Node made good throughput showings—Node had the most consistent performance profile, but Finagle’s best case was better. Sinatra (hosted by Thin) and Tomcat did OK. Jetty collapsed when pushed past its limit, and Bottle (hosted by wsgiref) was effectively non-functional. Finally, the naive C/Accept “stack” demonstrated an amusing combination of poor performance and good scalability.
As the number of dynos increases, the best per-dyno response rate declines from 2500 to below 1000, and the implementations become less and less differentiated. This suggests a non-linear bottleneck in the routing layer. There also appears to be a per-app routing limit around 12,000 rps that no number of dynos can overcome (data not shown). For a point of reference, 12,000 rps is the same velocity as the entire Netflix API.
The outcome demonstrates the complicated interaction between implementation scalability, per-dyno scalability, and per-app scalability, none of which are linear constraints.
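To make that interaction concrete, here is a toy model. Nothing in it is fitted to the data; the numbers only echo the ballpark figures above (2500 rps best case, a floor near 1000 rps, and a ~12,000 rps per-app cap), and the decay curve is invented purely to illustrate the shape.

```c
/* Toy illustration of non-linear scaling: per-dyno throughput decays as
 * dynos are added, and the router clips the app at a fixed ceiling. */
#define APP_CAP 12000.0

/* Hypothetical per-dyno rate: 2500 rps at one dyno, approaching 1000 rps
 * as n grows. The 1/n decay is made up; only the endpoints come from the
 * measurements above. */
static double per_dyno(int n) { return 1000.0 + 1500.0 / n; }

/* Effective per-app throughput: linear scaling of a decaying per-dyno
 * rate, clipped by the routing layer's per-app cap. */
static double app_rate(int n) {
    double total = n * per_dyno(n);
    return total < APP_CAP ? total : APP_CAP;
}
```

Under this model, adding dynos yields less than linear gains long before the cap, and past the cap yields nothing at all.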
Latency was essentially equivalent across all the stacks, C/Accept and Bottle excepted. Again, we see a non-linear performance falloff as the number of dynos increases.
The latency ceiling at 100ms in the first two graphs is caused by Heroku’s load-shedding 500s; ideally
httperf would exclude those from the report.
Autobench is a tool that automates running
httperf at increasing rates. I like it a lot, despite (or because) of its ancient Perl-ness. A few modifications were necessary to make it more useful.
- Adding a retry loop around the system call to
httperf, because sometimes it would wedge, get killed by my supervisor script, and then
autobench would return empty data.
- The addition of
- Fixing the error percentage to divide by connections only, instead of by HTTP responses, which makes no sense when issuing multiple requests per connection.
- Only counting HTTP status code 200 as a successful reply.
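The retry loop amounts to something like this (a sketch in shell rather than autobench's actual Perl; `run_httperf` is a name I made up):

```shell
# Re-run a possibly-wedged httperf command line until it yields output.
# Empty output is what autobench saw after the supervisor killed httperf.
run_httperf() {
  attempt=1
  while [ "$attempt" -le 3 ]; do
    out=$("$@" 2>/dev/null)      # run the httperf command line passed in
    if [ -n "$out" ]; then       # any output at all counts as success here
      printf '%s\n' "$out"
      return 0
    fi
    attempt=$((attempt + 1))
  done
  return 1                       # wedged three times in a row; give up
}
```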
I am not sure what happens that wedges
httperf. Bottle was particularly nasty about this. At this point I should probably rewrite
autobench in Ruby and merge it with my supervisor script, and also have it drive hummingbird, which appears to be more modern and controllable than httperf.
To generate high connection rates, httperf has to be built against a larger FD_SETSIZE in the system headers:
/usr/include/bits/typesizes.h:#define __FD_SETSIZE 65535
/usr/include/linux/posix_types.h:#define __FD_SETSIZE 65535
You also have to work around the --hog bug in httperf itself, and also drop the sample rate to 0.5:
src/httperf.c:#define RATE_INTERVAL 0.5
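If you would rather patch by hand than grab the pre-compiled binaries, the edits are one-liners (paths as above; a sketch only, so verify the patterns against your own headers and httperf checkout before running):

```shell
# Raise the fd limit httperf compiles against (Linux; header paths vary).
sudo sed -i 's/#define __FD_SETSIZE.*/#define __FD_SETSIZE 65535/' \
    /usr/include/bits/typesizes.h /usr/include/linux/posix_types.h
# Lower httperf's sampling interval, then rebuild from the source tree.
sed -i 's/#define RATE_INTERVAL.*/#define RATE_INTERVAL 0.5/' src/httperf.c
./configure && make
```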
You can grab the pre-compiled tools below if you don’t want to bother with updating the headers or source manually.
Finally, you need to make sure that your
ulimits are ok:
/etc/security/limits.conf:* hard nofile 65535
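It's worth confirming the new limit actually took after logging back in:

```shell
# limits.conf raises the hard limit, so check that one explicitly.
ulimit -Hn    # should report 65535 after the change and a fresh login
```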
Hello-world sources are here:
- C/Accept (A libevent implementation would be more representative.)
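For flavor, the C/Accept idea is roughly the following (a from-memory sketch, not the linked source): a single blocking accept loop that writes a canned response and hangs up, with no event library and no concurrency at all, which is exactly why it scales across dynos while performing poorly on each one.

```c
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static const char RESPONSE[] =
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/plain\r\n"
    "Content-Length: 12\r\n"
    "Connection: close\r\n"
    "\r\n"
    "hello world\n";

/* Serve `n` connections on `port`, strictly one at a time, then return. */
static int serve_n(unsigned short port, int n) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    int one = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) { close(fd); return -1; }
    if (listen(fd, 128) < 0) { close(fd); return -1; }
    for (int i = 0; i < n; i++) {
        int c = accept(fd, NULL, NULL);
        if (c < 0) continue;
        /* Don't even read the request; just answer and hang up. */
        write(c, RESPONSE, sizeof RESPONSE - 1);
        close(c);
    }
    close(fd);
    return 0;
}
```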
All the testing ended up costing me $48.52 on Heroku and $23.25 on AWS. I would advise against repeating it to avoid troubling the Heroku ops team, but maybe if you have a real application to test…