bcache.py: Grouped Cache Keys for Quick and Easy Invalidation
Before I get started, I'd like to point out that the code I'm going to demonstrate was inspired by Eric Florenzano's post, Tagging Cache Keys for O(1) Batch Invalidation.
The concept is pretty simple. Everyone knows that the hardest thing about working with a cache is dealing with invalidation. When something in the database changes, you want to clear the old, invalid data out of the cache, but to do that you need to know exactly which cache keys need to be removed. It gets worse if you have thousands of dynamically generated keys: not only is it difficult to track them all, but invalidating them one by one could take quite some time too.
The solution to both of these problems is quite brilliant really, and I want to again thank Eric Florenzano for his post, which inspired this one. Go check out his blog, it's filled with great Django tips. Of course, finish reading this first. What we're going to do is explain how the technique works, and then wrap it up in a nice module that you can import in place of django.core.cache.cache and use just as you always would, but with the added benefit of this awesome key-grouping technique.
The "trick" here is that we aren't actually going to delete anything from the cache. We're just going to let stale entries be purged when they expire or when the cache fills up. Instead of deleting data, we alter the cache keys going into all of the standard cache functions, inserting a small code that is unique to the group you want the key to belong to. Then, when you want to invalidate those entries, you simply change that unique code for the group. All of a sudden the cache keys are different, and lookups for the old entries will miss.
We're going to store the mapping of group names to codes directly in the cache, so that we can simply delete or overwrite that single key for a group in order to invalidate the whole set of related keys. We're also going to use UUIDs, because that's what Eric's example used, and because they're easy to generate and unique for all practical purposes.
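To make the idea concrete before touching Django, here is a minimal sketch that uses a plain dict as a stand-in for the cache. The names store, group_token, grouped_key and invalidate are made up for this illustration and are not part of the module described in this post.

```python
import uuid

# A plain dict stands in for the real cache (an assumption for this sketch).
store = {}


def group_token(group):
    """Return the group's current code, creating one if it is missing."""
    token_key = "bcache__" + group
    if token_key not in store:
        store[token_key] = uuid.uuid4().hex
    return store[token_key]


def grouped_key(group, key):
    """Build the real cache key by mixing in the group's current code."""
    return "%s__%s-%s" % (group, key, group_token(group))


def invalidate(group):
    """Drop the group's code; every key built from it becomes unreachable."""
    store.pop("bcache__" + group, None)


store[grouped_key("search-results", "results-for-django")] = ["hit"]
before = store.get(grouped_key("search-results", "results-for-django"))

invalidate("search-results")
after = store.get(grouped_key("search-results", "results-for-django"))
```

Here before holds the cached list and after is None. Note that the old entry is still sitting in store under its old key. Nothing was deleted; it simply can't be reached any more.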
The interface I'm going to offer is just a set of wrappers around the standard Django cache functions, each of which accepts an optional 'group' keyword argument.
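    from django.core.cache import cache
    from django.core.cache.backends.base import DEFAULT_TIMEOUT

    # Each wrapper rewrites the key only when a group is given, so calls
    # without a group behave exactly like the plain Django cache.

    def get(key, default=None, group=None):
        if group:
            key = _make_key(group, key)
        return cache.get(key, default=default)

    # DEFAULT_TIMEOUT defers to the backend's configured timeout. On current
    # Django versions a timeout of 0 would expire the entry immediately.
    def set(key, value, timeout=DEFAULT_TIMEOUT, group=None):
        if group:
            key = _make_key(group, key)
        return cache.set(key, value, timeout=timeout)

    def add(key, value, timeout=DEFAULT_TIMEOUT, group=None):
        if group:
            key = _make_key(group, key)
        return cache.add(key, value, timeout=timeout)

    def delete(key, group=None):
        if group:
            key = _make_key(group, key)
        return cache.delete(key)

    def get_many(keys, group=None):
        if not group:
            return cache.get_many(keys)
        # Look the group's code up once and reuse it for every key.
        hashkey = _get_hashkey(group)
        grouped = {_make_key(group, k, hashkey): k for k in keys}
        found = cache.get_many(list(grouped))
        # Map the results back to the keys the caller asked for.
        return {grouped[k]: v for k, v in found.items()}

    def incr(key, delta=1, group=None):
        if group:
            key = _make_key(group, key)
        return cache.incr(key, delta=delta)

    def decr(key, delta=1, group=None):
        if group:
            key = _make_key(group, key)
        return cache.decr(key, delta=delta)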
As you can see, we don't do anything at all unless the group keyword argument is passed in, which allows this module to be swapped in place of the standard Django cache with no problems. If the group argument is passed into one of these functions, we use it to look up the UUID for the group, or generate a new one if none exists yet. The functions which do this are defined as such:
    import uuid

    from django.conf import settings

    # This prefix is prepended to the group name to prevent cache key clashes.
    _KEY_PREFIX = getattr(settings, 'BCACHE_KEY_PREFIX', "bcache__")

    def _get_hashkey(group_name):
        hashkey = cache.get("%s%s" % (_KEY_PREFIX, group_name), None)
        if not hashkey:
            # Store the hex string so the cached value is a plain string.
            # It is saved with the backend's default timeout; if it expires,
            # the group is simply invalidated early.
            hashkey = uuid.uuid4().hex
            cache.set("%s%s" % (_KEY_PREFIX, group_name), hashkey)
        return hashkey

    def _make_key(group_name, cache_key, hashkey=None):
        """ Generates a new cache key which belongs to a group """
        # Passing in a precomputed hashkey can be useful if you're doing a
        # very large number of operations and you want to avoid all of the
        # extra cache hits.
        if not hashkey:
            hashkey = _get_hashkey(group_name)
        return "%s__%s-%s" % (group_name, cache_key, hashkey)
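The optional hashkey argument matters more than it might look: without it, every grouped key costs one extra cache read to fetch the group's code. The sketch below counts those reads using a small in-memory stand-in. CountingCache, get_hashkey and make_key are hypothetical names for this illustration that mirror the functions above; they are not Django APIs.

```python
import uuid


class CountingCache:
    """Hypothetical in-memory stand-in for a cache that counts reads."""

    def __init__(self):
        self.data = {}
        self.reads = 0

    def get(self, key, default=None):
        self.reads += 1
        return self.data.get(key, default)

    def set(self, key, value):
        self.data[key] = value


cache = CountingCache()
PREFIX = "bcache__"


def get_hashkey(group):
    """Fetch the group's code, creating it on first use."""
    token = cache.get(PREFIX + group)
    if token is None:
        token = uuid.uuid4().hex
        cache.set(PREFIX + group, token)
    return token


def make_key(group, key, hashkey=None):
    """Build a grouped key, looking the code up only if none was passed."""
    if hashkey is None:
        hashkey = get_hashkey(group)
    return "%s__%s-%s" % (group, key, hashkey)


names = ["a", "b", "c", "d"]

# Naive: one lookup of the group's code per key.
cache.reads = 0
naive = [make_key("search-results", n) for n in names]
naive_reads = cache.reads

# Batched: fetch the code once and pass it along.
cache.reads = 0
token = get_hashkey("search-results")
batched = [make_key("search-results", n, token) for n in names]
batched_reads = cache.reads
```

Both approaches produce the same keys, but the naive loop reads the group's code once per key while the batched one reads it once in total. Against a networked cache, that is the difference between N round trips and one.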
We also need a convenient way to invalidate groups, so we have this function:
    def invalidate_group(group_name):
        """ Invalidates all cache keys belonging to group_name """
        cache.delete("%s%s" % (_KEY_PREFIX, group_name))
Pretty simple, eh? That's all there is to it. To demo our new cache group functions, here is an example view which caches results for various search terms.
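    from django.shortcuts import render

    import bcache as cache  # used in place of django.core.cache.cache

    def search(request):
        query = request.GET.get('q', '')
        cache_key = "results-for-%s" % query
        results = cache.get(cache_key, group='search-results')
        if results is None:  # an empty list is still a valid cached hit
            # list() evaluates the queryset so concrete rows get cached.
            results = list(Thing.objects.filter(text__icontains=query))
            cache.set(cache_key, results, 999999, group='search-results')
        # 'search.html' is a placeholder template name.
        return render(request, 'search.html', {'results': results})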
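The view follows the usual "look up, compute on a miss, store" pattern. If you find yourself repeating it, it can be folded into a helper. Here is a rough sketch of one, again using a plain dict in place of the real cache; get_or_compute and the other names are invented for this example.

```python
import uuid

_store = {}        # hypothetical stand-in for the real cache
_MISS = object()   # sentinel that can never be a cached value


def _token(group):
    """Return the group's current code, creating one if needed."""
    return _store.setdefault("bcache__" + group, uuid.uuid4().hex)


def _key(group, key):
    return "%s__%s-%s" % (group, key, _token(group))


def get_or_compute(group, key, compute):
    """Return the cached value, calling compute() only on a miss."""
    full_key = _key(group, key)
    value = _store.get(full_key, _MISS)
    if value is _MISS:
        value = compute()
        _store[full_key] = value
    return value


def invalidate_group(group):
    _store.pop("bcache__" + group, None)


calls = []


def run_query():
    """Pretend database query that happens to return no rows."""
    calls.append(1)
    return []


first = get_or_compute("search-results", "results-for-x", run_query)
second = get_or_compute("search-results", "results-for-x", run_query)
calls_before = len(calls)

invalidate_group("search-results")
third = get_or_compute("search-results", "results-for-x", run_query)
calls_after = len(calls)
```

The sentinel is there so that an empty result still counts as a hit. Searches that match nothing are often the most expensive ones, so they are worth caching too.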
Then, in our model's custom save method, we can invalidate the whole group whenever something changes.
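    from django.db import models

    import bcache as cache

    class Thing(models.Model):
        name = models.CharField(max_length=255)
        text = models.TextField()

        def save(self, *args, **kwargs):
            super(Thing, self).save(*args, **kwargs)
            # Invalidate after the write, so a concurrent request cannot
            # re-cache stale results under the new group code.
            cache.invalidate_group('search-results')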
This has saved me a lot of hassle. It offers a nice clean interface that is just as easy to use as the regular cache functions, which is great. As always, suggestions and improvements are appreciated.