Latest Blog Posts

Before I get started, I'd like to point out that the code i'm going to demonstrate was inspired by Eric Florenzano's post Tagging Cache Keys for O(1) Batch Invalidation.

The concept is pretty simple. Everyone knows that the hardest thing about working with a cache is dealing with invalidation. when something in the database is changed, you want to clear the old invalid data out of the cache. but to do that, you need to know exactly which cache keys need to be removed. What's worse, if you have thousands of dynamically generated keys. Not only is it difficult to track them all, but the invalidation could take quite some time too.

The solution to both of the problem is quite brilliant really, and I want to again thank Eric Florenzano for his post which inspired this one. go check out his blog, its filled with great django tips.. or course, finish reading this first, What we're going to do is explain how the technique works, and then we're going to wrap it up in a nice module that you can import in place of django.core.cache.cache and use it just as you always would, but with the added benefit of this awesome key-grouping technique.

The "Trick" here, is that we aren't going to actually delete anything from the cache. We're just going to let it be purged when the cache fills up. Instead of deleting data, we need to alter the cache keys going into all of the standard cache functions and insert a small code that is unique to the group that you want this key to belong to. and then when you want to invalidate those entries, you simply change that unique code for the group, and all of a sudden the cache keys are different and the lookups will fail.

We're going to store the mapping of these group names to codes directly in the cache so that we can simply delete or overwrite that single key for a group in order to invalidate the whole set of related keys. We're also going to use uuid's, because that's what eric's example used, and they're guaranteed unique, and easy to generate.

The interface I'm going to offer is is just a set of wrappers around the standard django cache functions which all accept a single 'group' keyword argument.

from django.core.cache import cache

def get(key, default=None, group=None):
    if group:
        key = _make_key(group, key)
    return cache.get(key, default=default)

def set(key, value, timeout=0, group=None):
    if group:
        key = _make_key(group, key)
    return cache.set(key, value, timeout=timeout)

def add(key, value, timeout=0, group=None):
    if group:
        key = _make_key(group, key)
    return cache.add(key, value, timeout=timeout)

def delete(key, group=None):
    if group:
        key = _make_key(group, key)
    return cache.delete(key)

def get_many(keys, group=None):
    hashkey = _get_hashkey(group)
    keys = [_make_key(group, k, hashkey) for k in keys]
    return cache.get_many(keys)

def incr(key, delta=1, group=None):
    if group:
        key = _make_key(group, key)
    return cache.incr(key, delta=delta)

def decr(key, delta=1, group=None):
    if group:
        key = _make_key(group,  key)
    return cache.decr(key, delta=delta)

as you can see here, we dont do anything at all unless the group keyword argument is pass in, that allows this module to swapped in place of the standard django cache with no problems. if the group argument is passed into one of these functions, we use it to look up or generate a new UUID for the group.. the function which does this is defined as such:

from django.conf import settings
import uuid

# This prefix is appended to the group name to prevent cache key clashes.
_KEY_PREFIX = getattr(settings, 'BCACHE_KEY_PREFIX', "bcache__")

def _get_hashkey(group_name):
    hashkey = cache.get("%s%s" % (_KEY_PREFIX, group_name), None)
    if not hashkey:
        hashkey = uuid.uuid4()
        cache.set("%s%s" % (_KEY_PREFIX, group_name), hashkey)
    return hashkey

def _make_key(group_name, cache_key, hashkey=None):
    """ 
    Generates a new cache key which belongs to a group 
    
    """
    # This can be useful sometimes if you're doing a very large number 
    # of operations and you want to avoid all of the extra cache hits.
    if not hashkey:
        hashkey = _get_hashkey(group_name)
    return "%s__%s-%s" % (group_name, cache_key, hashkey)

We also need a convenient way to invalidate groups... so we have this function:

def invalidate_group(group_name):
    """
    Invalidates all cache keys belonging to group_name
    
    """
    cache.delete("%s%s" % (_KEY_PREFIX, group_name))

Pretty simple eh? Thats all there is to it. To demo our new cache group functions, here is an example view which caches results for various search terms.

def search(request):
    query = request.GET.get('q','')
    cache_key = "results-for-%s" % query
    results = cache.get(cache_key, group='search-results')
    if not results:
        results = Thing.objects.filter(text__icontains=query)
        cache.set(cache_key, results, 999999, group='search-results')        

Then maybe in our model's custom save handler, we might want to invalidate the whole group when something changes.

class Thing(models.Model):
    name = models.CharField(max_length=255)
    text = models.TextField()
    def save(self, *args, **kwargs):
        cache.invalidate_group('search-results')
        super(Thing, self).save(*args, **kwargs)

This has saved me a lot of hassle. if offers a nice clean interface that is just as easy to use as regular cache functions, which is great. as always, suggestions and improvements are appreciated.

Flexible Query-String-Building Template Tag

Posted: April 20, 2009
Comments: 0
Tags: django

You've probably been in a situation where you need to generate a bunch of links, all of which have only slight variations in get parameters. maybe for pagination, or some advanced list filters. Today we're going to make a template tag that will generate a querystring from a dict... Now hold on.. There's already a QueryDict class which has a .urlencode() method which we can use right? Well, that doesn't give us any flexibility to swap out variables in the querystring. This is probably best demonstrated with an example.

Let's say for example you have a dict in your template context called 'filters'

filters = {'page': 1, 'name__istartswith': 'a'}

now, say we want to generate a series of page links which will also include the name__istartswith argument (as well as any others that might be in that dict). Assuming we have another context variable called 'pages' which contains a list of page numbers, we can simply do this in our template.

{% for p in pages %}
  <a href="{% qs filters %}page={{ p }}{% endqs %}">{{ p }}</a>
{% endfor %}

This would generate the following html code

<a hef="?page=1&name__istartswith=a">1</a>
<a hef="?page=2&name__istartswith=a">2</a>
<a hef="?page=3&name__istartswith=a">3</a>

You can pass in multiple dicts to the tag which will be merged. You can also re-assign as many variables as you like, unset variables, or add new ones. here is a more advanced example

<!-- assuming params = {'other': 100} -->
<a href="{% qs filters params %}page=&name__istartswith=b&sort=1{% endqs %}">View All</a>
<!-- produces -->
<a href="?name__istartswith=b&sort=1&other=100">View All</a>

This would be even cooler if it was able to support multiple values like a real QueryDict. The problem then becomes how do you determine whether you are adding a value, or changing the value of a variable. If anyone has any thoughts on that, please let me know.

The full code for the template tag is below. Note that if a value in the dict is a boolean, it is converted to 1 or 0 before being put into the querystring.

from django import template
from django.http import QueryDict
register = template.Library()

def querystring(qsdict, extra=None):
    qs = QueryDict("").copy()
    qs.update(qsdict)
    if extra: 
        for bit in extra.split("&"):
            k, v = bit.split("=")
            if v:
                qs[k] = v
            elif k in qs:
                del(qs[k])
    for k in qs.keys():
        if isinstance(qs[k], bool):
            qs[k] = 1 if qs[k] else 0
        elif qs[k] is None:
            del(qs[k])
    return qs.urlencode()

@register.tag('qs')
def do_qs(parser, token):
    nodelist = parser.parse(('endqs',))
    parser.delete_first_token()
    dicts = token.split_contents()
    del(dicts[0])
    return QSNode(dicts, nodelist)

class QSNode(template.Node):
    def __init__(self, qdicts, nodelist):
        self.nodelist = nodelist
        self.qdicts = [template.Variable(qdict) for qdict in qdicts]

    def render(self, context):
        qdicts = [qdict.resolve(context) for qdict in self.qdicts]
        qdict = qdicts.pop(0)
        while qdicts:
            qdict.update(qdicts.pop(0))
        inner = self.nodelist.render(context).strip()
        qs = querystring(qdict, extra=inner)
        if qs: qs = "?" + qs
        return qs

Generic export to excel for any django model

Posted: April 09, 2009
Comments: 5
Tags: django excel

This post will show you how to extend the django admin to allow for any model to be exported to an excel spreadsheet. it uses the xlwt module to generate the spreadsheet.

views.py

def admin_export_xls(request, app, model):
    mc = ContentType.objects.get(app_label=app, model=model).model_class()
    wb = xlwt.Workbook()
    ws = wb.add_sheet(unicode(mc._meta.verbose_name_plural))
    for i, f in enumerate(mc._meta.fields):
        ws.write(0,i, f.name)
    qs = mc.objects.all()
    for ri, row in enumerate(qs):
        for ci, f in enumerate(mc._meta.fields):
            ws.write(ri+1, ci, unicode(getattr(row, f.name)))
    fd, fn = tempfile.mkstemp()
    os.close(fd)
    wb.save(fn)
    fh = open(fn, 'rb')
    resp = fh.read()
    fh.close()
    response = HttpResponse(resp, mimetype='application/ms-excel')
    response['Content-Disposition'] = 'attachment; filename=%s.xls' % \
          (unicode(mc._meta.verbose_name_plural),)
    return response

because the xlwt module doesn't allow you to write to an open file handle, we have to first create a temporary file using tempfile.mkstemp() and subsequently close the file, then save spreadsheet with the returned filename. This might not be the most efficient way, but it works.

in the urls file we need to point to the new view, be sure to put this line before the regular admin include since it acts as a sort of catch-all. [Update: Wrapped the view in admin.site.admin_view decorator.] urls.py

...

    (r'^admin/([^\/]+)/([^\/]+)/xls/$', admin_export_xls),
    (r'^admin/(.*)', admin.site.admin_view(admin.site.root)),
...

Now we need to add a button to the top of each change list template that we want to have this functionality.

change_list.html

{% extends 'admin/change_list.html' %}{% load i18n %}
{% block object-tools %}
<ul class="object-tools">
  {% if has_add_permission %}
  <li>
<a href="add/{% if is_popup %}?_popup=1{% endif %}" class="addlink">
      {% blocktrans with cl.opts.verbose_name as name %}Add {{ name }}{% endblocktrans %}
    </a>
</li>
  {% endif %}
  <li>
<a href="xls/" class="viewsitelink">Export to .XLS</a>
</li>
</ul>
{% endblock %}

Comments/suggestions appreciated.

Secure Downloads

Posted: March 24, 2009
Comments: 1
Tags: apache

When developing a website which deals with digital media such as E-Books, MP3s, Videos etc. you often need to be able to restrict access to these files only to users who have authenticated somehow. There are a few different approaches to solving this problem, but I've recently come across what I consider to be the simplest and most efficient way, and since it took me a while to actually find it, I thought I would document it here in hopes that I can save someone some googling time in the future.

The key to this solution is the X-SendFile header, which was originally developed for lighttpd, however an apache2 module is available as well. Installation of the module is beyond the scope of this blog post, but its a very straight forward process, and there is even a nice .deb package to make the process even easier on a debian or ubuntu host.

Once the module is installed, configuration is a snap. The module has 2 config options:

  • XSendFile on|off - Enable or disable processing of the X-SendFile header.
  • XSendFileAllowAbove on|off - Enable or disable serving files above the requested URI

The way the X-sendfile header works is so simple its brilliant, the application returns an HttpResponse which has the X-SendFile header pointing to a file on the local filesystem. When apache is returning this response to the user, it notices this header, and will serve the specified file instead of the body of the original response, so the trick is to make the file otherwise inaccessible to the user, but as long as the apache process has read access to the file, it will be able to serve it up via this directive.

a simple django view which serves up a pdf file only to authenticated users:

@login_required
def get_pdf(request, id):
    pdf = get_object_or_404(Pdf, pk=id)
    response = HttpResponse()
    response['X-SendFile'] = pdf.file.path  #The full path to the attached file.
    response['Content-Type'] = 'application/pdf'
    return response

in this example the file is only available if the user has logged in, but we can do any logic we want before deciding whether or not to grant access to the file.

There are other ways to accomplish this, such as having your application read the file from the filesystem and return it to the user, but that ties up the python threads and uses a lot of extra memory. Apache, lighty, and other web servers are much more efficient at serving static files, so we really want to let them handle it and free up the resources for more dynamic page requests.

Choosing randomly based on weight

Posted: March 20, 2009
Comments: 0
Tags: python

So, I was coding a django site which required displaying random ads on a page, each ad is given a weight which determines the frequency at which the ad is displayed. I will admit that my first thought was far from optimal, but I later came across a post on stackoverflow.com, with a more appropriate algorithm. The code on stackoverflow was written in C#, which is just... ugly, so here is the python version.

def random_by_weight(items, attr='weight'):
    """
    Randomly selects one item from 'items', using the
    'attr' property as a weight field.

    """
    
    total_weight = sum((getattr(i, attr) for i in items))
    rweight = random.randint(0, total_weight)
    for item in items:
        if rweight <= getattr(item, attr):
            return item
        rweight -= getattr(item, attr)

...and there you have it.