This post is in response to episode #1 of the ‘Scaling Rails’ screencast series presented by Gregg Pollack, which discusses various ways one can improve page load times by focusing on network issues, for example by caching static content or minifying files. All good stuff, but it’s worth going into a little more detail on a few points.
The first point is covered in the discussion here and here around using query strings as a way of expiring cached content, which is what Rails does by default (API docs). Essentially, the way browsers cache GET requests with a query string (something after the ‘?’ in the URL) is browser dependent. It seems that Internet Explorer and Firefox serve these requests from the cache without contacting the origin server, as long as the document is still fresh according to the caching headers the server supplied. Opera and Safari, however, will only use the cached copy after sending a validation request to the origin server with an If-Modified-Since header. That costs a server round trip which would have been avoided if the query string had been omitted altogether. Note that I have only confirmed this on IE 8 (rc1) and Opera 9.63 running on Vista. I also believe that the assertion in the Think Vitamin article that Opera-like behaviour is required by a strict reading of the HTTP specification is incorrect; as far as I can tell, RFC 2616 only forbids treating such responses as fresh when the server gives no explicit expiry time (Section 13.9).
Of course, this may well not be a serious enough issue to justify implementing a full solution when Rails gives you a good approximation for free. With IE and Firefox you’ll get the desired behaviour and therefore cover the vast majority of your users. Those who use Opera or Safari will still save some time, but not the maximum possible, as they’ll be served a 304 response rather than the full content for many of the requests they make. As described in the above articles, a solution which works for all browsers is to put the versioning information in the path rather than the query string, and use Apache rewrite rules (or the equivalent for your server) to serve the right content.
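To make the path-based idea concrete, here is a minimal sketch in Ruby. The helper names and the `.v<version>` naming convention are my own assumptions for illustration, not anything Rails provides. The first function builds the URL you would emit in your HTML; the second shows the mapping that a server rewrite rule would need to perform before looking the file up on disk.

```ruby
# Hypothetical helpers: the names and the ".v<version>" convention are
# illustrative only and are not part of Rails.

# Insert a version token before the file extension, e.g.
# "/images/logo.png" with version "1234" becomes "/images/logo.v1234.png".
# Paths without an extension are returned unchanged.
def versioned_path(path, version)
  ext = File.extname(path)
  return path if ext.empty?
  "#{path.chomp(ext)}.v#{version}#{ext}"
end

# Inverse mapping: drop the version token again. This is the job a
# rewrite rule on the web server would do before serving the file.
# Accepts decimal or lowercase hex tokens (timestamps or commit ids).
def strip_version(path)
  path.sub(/\.v[0-9a-f]+(\.[^.\/]+)\z/, '\1')
end
```

One caveat with this particular convention: a real file whose name already contains something like `.v2.` before its extension would be mangled by `strip_version`, so pick a token format that cannot collide with your own file names.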
The second point is related, and is actually covered in the Rails API documentation itself: using timestamps as versioning information requires all of your backend servers to agree on which timestamp to use. This is potentially an issue if you have mongrels spread over multiple machines and your deployment mechanism can’t guarantee that the modified times of all files will be the same on all machines (which is pretty much impossible unless all of the machines share a filesystem). One solution which works if you always deploy from git (or similar) via Capistrano is to use the id of the commit as the version number, although this makes minor patches of individual files without a full deploy harder. Unfortunately, I don’t know of any out-of-the-box solution to this problem, so you’ll probably have to roll your own.
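As a rough sketch of the commit-id approach: the function below reads a commit id from a file and shortens it into a version token. It assumes your deploy leaves the commit id in a file (Capistrano writes one called REVISION into each release directory); the function name and the default length are hypothetical.

```ruby
# Hypothetical helper: derive a short asset version from a commit id.
# Assumes the deploy process has written the commit id to a file; the
# path is passed in explicitly so nothing here depends on a particular
# directory layout.
def asset_version_from(revision_file, length = 8)
  revision = File.read(revision_file).strip
  raise ArgumentError, "empty revision in #{revision_file}" if revision.empty?
  # A prefix of the commit id is enough to distinguish deploys.
  revision[0, length]
end
```

Because every machine receives the same commit, every machine computes the same token, regardless of file modification times. As far as I can tell, the Rails asset helpers of this era use `ENV["RAILS_ASSET_ID"]` in place of the file timestamp when it is set, so assigning the result to that variable at boot may be all the wiring needed, but check this against your Rails version.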
The final point is that all of these solutions get a lot more complex when stylesheets come into play. If you refer to static content in your stylesheets, which is more or less unavoidable if you use background images, you’ll need to implement a mechanism for updating version numbers within those too. This could either be by generating them dynamically in the same way that HTML content is generated (typically with large amounts of page caching), or generating them as a separate step during deployment. In either case, be careful that you get the expiry process right – remember that when an image is updated you’ll need to expire both the image file itself and any stylesheets which refer to it. Again, I don’t know of a pre-rolled solution to this, so you may need to invest a bit of time putting something together, although Sass is potentially a useful technology here.
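For the deployment-step variant, something along these lines might serve as a starting point. It is only a sketch: the function and constant names are hypothetical, the regular expression handles simple `url(...)` references rather than every form CSS allows, and the caller supplies the versioning scheme as a block.

```ruby
# Matches url(...) references with optional single or double quotes.
CSS_URL = /url\(\s*(['"]?)([^'")\s]+)\1\s*\)/

# Hypothetical deploy-time step: rewrite every local url(...) reference
# in a stylesheet. The block receives the original target and returns
# the versioned one, so the scheme (query string or path) is up to the
# caller.
def rewrite_css_urls(css)
  css.gsub(CSS_URL) do
    # Capture the match data first, before any other regex can clobber it.
    original, quote, target = $~[0], $~[1], $~[2]
    if target.start_with?("data:", "//") || target.include?("://")
      original # leave external and inline resources alone
    else
      "url(#{quote}#{yield(target)}#{quote})"
    end
  end
end
```

Calling `rewrite_css_urls(css) { |target| "#{target}?#{version}" }` would give the query-string form. Note that the stylesheet's own URL needs a new version whenever any image it refers to changes, which falls out naturally if the version is per deploy rather than per file.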
(Update, 15 Feb 2009: It seems like the limitations of cache expiry via query strings may be more severe than I had thought. Will update this post once I have a greater understanding of the situation.)
See also: http://jamescrisp.blogspot.com/2007/04/improve-rails-performance-through.html