Memcached is generally treated as a black box. But what if you really need to know what’s in there? Not for runtime purposes, but for optimization and capacity planning?
demo
$ sudo peep --pretty 2479
  time | exptime | nbytes | clsid | key                | exprd | flushd
  8658 |  613458 |    272 |     5 | "c3RhdH:171:5"     | false | false
  8658 |       0 |      6 |     1 | "current_c3RhdH:3" | false | false
  8658 |  613458 |    281 |     5 | "c3RhdH:171:26"    | false | false
  8678 |   95078 |      6 |     1 | "User:1:auth:m4Uq" | false | false
  8658 |       0 |      8 |     2 | "user_dGltZWxp:4"  | false | false
  8686 |  613486 |   1278 |     9 | "User:1:6"         | false | false
  8658 |  613458 |   1286 |     9 | "User:1:4"         | false | false
  ...
  8658 |  613458 |    283 |     5 | "c3RhdH:171:28"    | false | false
  8658 |  613458 |    277 |     5 | "c3RhdH:171:30"    | false | false
Peep uses ptrace to freeze a running memcached server, dump the internal key metadata, and return the server to a running state. If you have a good host ejection mechanism in your client, such as in the Twitter libmemcached builds, you won’t even have to change the production server pool. The instance is not restarted, and no data is lost.
Installation instructions are here. Requires Linux.
a note about heap management
Memcached has two separate memory management strategies:
- On read, if a key is past its expiry time, return NOT FOUND.
- On write, choose an appropriate slab class for the value; if it’s full, replace the oldest-used (read or written) key with the new one.
Note that the second strategy, LRU eviction, does not check the expiry time at all. This makes eviction an O(1) operation, because the tail of the LRU linked list is immediately available; checking within that list for expired keys first would raise the order to O(N). This optimization has subtle implications.
the problem
In the bad old days, we added a page cache to Twitter via memcached. We used a generational key scheme: a piece of user metadata was incremented on every change, and used to construct dependent keys. This eliminated direct invalidation concerns.
However, because of the orthogonal expiry and LRU strategies, we suspected that the generational keys were affecting evictions of the direct keys. The continual massive onslaught of new generational keys would sweep out older direct keys, even when the generational keys were expired but the direct keys were fresh.
But how could we prove it was happening? Eventually we ran peep on a representative server. Aggregate results showed that 75% of our cache pool was spent on expired page caches, and our cache horizon was only 5 hours. Moving the generational keys to a separate pool raised our cache horizon to two days, a 10x improvement.
aggregating results
The best way to aggregate peep results is to use MySQL’s excellent grouping abilities. Run peep with the --ugly (unformatted) switch and pipe the output to a file. Then you can load it into an appropriate MySQL schema:
CREATE TABLE entries (
  lru_time int(11) default NULL,
  expires_at_time int(11) default NULL,
  value_size int(11) default NULL,
  suffix_size int(11) default NULL,
  it_flag varchar(255) default NULL,
  class_id int(11) default NULL,
  key_size int(11) default NULL,
  key_name varchar(255) default NULL,
  is_expired varchar(255) default NULL,
  is_flushed varchar(255) default NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

LOAD DATA LOCAL INFILE '/tmp/peep.txt' INTO TABLE entries
  FIELDS TERMINATED BY ' | '
  LINES TERMINATED BY '\n';
This will load very quickly, even for millions of keys on a laptop; LOAD DATA LOCAL INFILE is an optimized pipeline.
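Once loaded, a quick sanity check is worth running; the optional indexes below are my own addition (the grouped queries that follow work fine without them, but the per-slab subqueries later on benefit):

-- confirm the row count roughly matches the number of keys peep reported
SELECT COUNT(*) FROM entries;

-- optional: indexes for the per-class and per-expiry lookups used further down
ALTER TABLE entries ADD INDEX idx_class_id (class_id), ADD INDEX idx_is_expired (is_expired);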
Here are some handy queries for revealing cache properties:
fresh vs. expired values, by count and size
SELECT is_expired, COUNT(*), SUM(value_size) FROM entries GROUP BY is_expired;

+------------+----------+-----------------+
| is_expired | COUNT(*) | SUM(value_size) |
+------------+----------+-----------------+
| false      |   688883 |       759401296 |
| true       |   472755 |      2430092660 |
+------------+----------+-----------------+
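A small variation of my own on the same query expresses the expired share as a percentage of all cached bytes, which is how a figure like the 75% mentioned above can be read off directly:

SELECT is_expired,
       SUM(value_size) AS bytes,
       -- share of all cached bytes held by fresh vs. expired values
       ROUND(100 * SUM(value_size) / (SELECT SUM(value_size) FROM entries), 1) AS pct_of_bytes
FROM entries
GROUP BY is_expired;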
histogram of values by size
SELECT ROUND(ROUND(LOG10(value_size) * 5, 0) / 5, 1) AS log,
       ROUND(AVG(value_size), 0) AS size,
       COUNT(*),
       RPAD('', COUNT(*) / 6500, '*') AS bar
FROM entries
GROUP BY log
ORDER BY log DESC;

+------+--------+----------+----------------------------------------+
| log  | size   | COUNT(*) | bar                                    |
+------+--------+----------+----------------------------------------+
|  5.6 | 422238 |        3 |                                        |
...
|  1.6 |     39 |     3652 | *                                      |
|  1.4 |     23 |   103434 | ************************************** |
|  1.2 |     14 |     6369 | *                                      |
|  1.0 |     11 |    36588 | ******                                 |
|  0.8 |      6 |    55795 | *********                              |
|  0.6 |      4 |    90332 | **************                         |
|  0.4 |      2 |      239 |                                        |
+------+--------+----------+----------------------------------------+
histogram of values by LRU age
SELECT ROUND(ROUND(LOG10( (SELECT MAX(lru_time) FROM entries) - lru_time) * 5, 0) / 5, 1) AS log,
       ROUND(AVG( (SELECT MAX(lru_time) FROM entries) - lru_time), -2) AS lru_age,
       COUNT(*),
       RPAD('', COUNT(*) / 6500, '*') AS bar
FROM entries
GROUP BY log
ORDER BY log DESC;

+------+---------+----------+-------------------------------------+
| log  | lru_age | COUNT(*) | bar                                 |
+------+---------+----------+-------------------------------------+
|  4.6 |   34800 |    18064 | ***                                 |
|  4.4 |   24200 |    96739 | ***************                     |
|  4.2 |   15700 |   212865 | *********************************   |
|  4.0 |   10200 |   224703 | *********************************** |
|  3.8 |    6500 |   158067 | ************************            |
|  3.6 |    4100 |   108034 | *****************                   |
...
|  0.4 |       0 |      672 |                                     |
|  0.0 |       0 |      319 |                                     |
| NULL |       0 |    13400 | **                                  |
+------+---------+----------+-------------------------------------+
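The cache horizon itself can be approximated from the same dump; this is a sketch of mine, assuming lru_time is measured in seconds of server time as in the demo output:

-- age of the least-recently-used item still resident, in hours
SELECT ROUND((MAX(lru_time) - MIN(lru_time)) / 3600, 1) AS cache_horizon_hours
FROM entries;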
histogram of values by size for a namespace
SELECT key_name,
       ROUND(ROUND(LOG10(value_size) * 5, 0) / 5, 1) AS log,
       ROUND(AVG(value_size), 0) AS size,
       COUNT(*),
       RPAD('', COUNT(*) / 10000, '*') AS bar
FROM entries
WHERE key_name LIKE '%chunk:%'
GROUP BY log
ORDER BY log DESC;

+----------------------+------+-------+----------+----------------------+
| key_name             | log  | size  | COUNT(*) | bar                  |
+----------------------+------+-------+----------+----------------------+
| reply_chunk:69913    |  3.4 |  2511 |     1032 | **                   |
| user_chunk:1868495   |  3.2 |  1530 |      972 | **                   |
...
| user_chunk:1405137   |  1.6 |    40 |     2822 | ******               |
| reply_chunk:2084579  |  1.4 |    24 |     4361 | *********            |
| user_chunk:2455162   |  1.2 |    14 |     6141 | ************         |
| user_chunk:5989656   |  1.0 |    12 |     2477 | *****                |
| user_chunk:2268781   |  0.8 |     6 |    16527 | ******************** |
+----------------------+------+-------+----------+----------------------+
slab allocation counts by class
SELECT class_id AS slab_class,
       MAX(value_size) AS slab_size,
       CEIL(COUNT(*) / ((1024 * 1024) / MAX(value_size))) AS allocated,
       ( (SELECT COUNT(*) FROM entries WHERE class_id = slab_class AND is_expired = "true") * 100
         / (SELECT COUNT(*) FROM entries WHERE class_id = slab_class) ) AS `%_expired`
FROM entries
GROUP BY class_id
ORDER BY class_id;

+------------+-----------+-----------+-----------+
| slab_class | slab_size | allocated | %_expired |
+------------+-----------+-----------+-----------+
|          1 |         8 |         1 |   99.8938 |
|          2 |        32 |         5 |    6.0320 |
|          3 |        71 |         9 |   36.4859 |
...
|         34 |    187783 |        28 |   61.2903 |
|         35 |    234205 |       130 |    1.8998 |
|         36 |    286110 |         7 |    0.0000 |
|         37 |    309832 |         2 |   75.0000 |
|         38 |    397500 |         1 |  100.0000 |
|         39 |    496014 |         1 |  100.0000 |
+------------+-----------+-----------+-----------+
This is comparable to the memcached stats slabs command, but lets you investigate things like the expired ratio per slab.
Play around and see what else you can come up with.
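For instance, here is one more query of my own devising: bytes tied up in expired values per slab class, which points at where a shorter TTL or a separate pool would pay off most:

SELECT class_id,
       SUM(value_size) AS total_bytes,
       -- bytes currently held by values that are past their expiry time
       SUM(CASE WHEN is_expired = 'true' THEN value_size ELSE 0 END) AS expired_bytes
FROM entries
GROUP BY class_id
ORDER BY expired_bytes DESC;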
conclusion
Generational keys are often presented as the easy way to cache, but consider their implications. How else are you abusing your cache? Some suggested things to check:
- Keys without expiry times, which will bite you when you alter the server pool (see the query sketch after this list)
- Excessive use of very large slabs
- Uneven LRU age by slab (indicating poor slab allocation)
- Unexpected namespaces (revealing invalidation bugs)
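For the first of those, a minimal sketch, assuming (as the demo output above suggests) that keys stored without an expiry show up with an expires_at_time of 0:

-- keys that will never expire on their own, and the bytes they hold
SELECT COUNT(*) AS keys_without_expiry,
       SUM(value_size) AS bytes_without_expiry
FROM entries
WHERE expires_at_time = 0;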
The only unfortunate thing about peep right now is that it blocks the memcached server. I could do an incremental version, but it would be quite slow. The current version is already CPU-intensive, taking 15 minutes to dump a 4GB memcached on modern hardware (turn off your client-facing processes if you run mixed front-ends!). A version in C would be much quicker, if someone is up for it, but for now the Ruby version is sufficient.
Happy peeping.
Pure awesomeness, thanks for sharing this.
Nice! Thanks for sharing. I was considering writing an ugly hack in the form of a wrapper to keep tabs. Clearly, this is a much better way.
By doing this, you have won many beers, bought by me.
Out of curiosity: how long does it take to dump, with say, a 16GB memcached instance?
I assume it’s linear. To be safe, you could remove the server from your cache pool ahead of time. The queries will still be correct, since they use the server’s most-recently-used time, not the wall clock, for comparisons.
Here’s what happens when you peep in production:
No big deal. :-)
Great post; please keep them coming.
It’s interesting to see how few tools like peep exist for memcached, the most popular caching solution at today’s big sites.
Why does memcached evict non-expired LRU keys before expired keys? That surprises me.
Why is checking two properties (expiry and timestamp) an expensive operation? I assume that (if memcached tracks keys with linked lists) checking for expired keys (as well as LRU keys) would be O(1).
Chris: I was wrong on this issue. The problem is that the free (expired) linked list is passive—elements are only inserted if they happen to be expired on an attempted read. This avoids a GC thread.
However, it means a guaranteed search for an expired key would require walking the entire LRU linked list, an O(N) operation. Instead, if the free list is empty, the LRU tail is used; that process finds a freeable item in O(1).
I updated the post.
Thanks for the explanation! You learn something new every day. <:)
Can you explain what feature in your libmemcached builds makes this better? Are the changes you’ve made documented somewhere?
The big change is AUTO_EJECT_HOSTS support, which lets libmemcached itself manage temporary server failure (useful in this case because peep induces such a failure). This work is getting integrated into the next version of libmemcached, finally.
If you have further issues, please put them in the peep tracker.