This post is about using pagelets to structure dynamic web applications. I’m currently experimenting with ways to do this in Rails, and have published some of my code, but hopefully some of the ideas covered here will be applicable to other frameworks and languages.

One of the patterns described below is Hierarchical Model-View-Controller (HMVC), which is excellently described in this post from Sam de Freyssinet. The PHP web framework Kohana (which I haven’t used) is based around HMVC.

Many of the techniques discussed below have long been used in large scale applications. Part of the aim of this work is to make it easier to extract benefit at smaller scales, and provide a path to more sophisticated configurations as the application grows. This project is a bit of an academic exercise at the moment and the code I’ve written has barely been used in production. Hopefully that will change soon, but until then please treat the content of this post as a discussion point rather than a description of what’s been proven to work.

Pagelets

Pagelets are chunks of a web page. I’ve taken the term from Facebook’s description of their BigPipe infrastructure, but you could alternatively call them modules, components, subpages, cells or one of a number of other terms. An example might be a list of related items on the product pages of a shopping site. Pagelets are a useful architectural construct from the point of view of maintainability (as they help keep responsibilities separate) and scalability (as they provide more flexibility in scaling out).

In this post I have a fairly specific definition of the term in mind, and not all page components fit into this category. To me, the characteristics of a good pagelet are something like:

  • Substantial content: Something like ‘the pagelet is complex enough to implement a meaningful domain model’.
  • Independence from the containing page: A component which can be rendered using just the ID of the primary page object is a better candidate than something which shares a complex, dynamically computed data structure with another part of the page.
  • Potential to be handled by a back-end service: If the content can be provided by a separate back-end service, pagelets potentially provide a useful tool for scaling.
  • Cacheability: If the content of a pagelet can be cached more aggressively than the containing page (or vice versa), there are potentially scalability and performance wins to be had.
  • Localisation: Pagelets should be embedded into the containing page at a single insertion point. It may be possible to generalise the ideas in this post to relax this, but I’d like to keep things simple for now.

These are guidelines rather than rules, and invariably exactly what should and shouldn’t be considered a pagelet will depend very much on the application in question and the context in which it’s being run.
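To make the independence and cacheability points more concrete, here is a minimal sketch assuming a simple in-process cache. The pagelet is addressed by a URL that needs only the primary object’s ID, so that URL is a complete cache key, and it is kept for its own TTL while the containing page is re-rendered on every view. PageletCache, its injectable clock and the /products/42/related path are all made up for illustration; they are not part of Rails or any plugin.

```ruby
# Hypothetical in-process cache: each pagelet is keyed by its URL and kept
# for its own TTL, independently of the page that contains it.
class PageletCache
  Entry = Struct.new(:body, :expires_at)

  # The clock is injectable so expiry can be demonstrated without waiting.
  def initialize(clock: -> { Time.now })
    @clock = clock
    @entries = {}
  end

  # Return the cached body for +path+ while it is fresh; otherwise render it
  # with the block and keep the result for +ttl+ seconds.
  def fetch(path, ttl:)
    entry = @entries[path]
    return entry.body if entry && @clock.call < entry.expires_at

    body = yield
    @entries[path] = Entry.new(body, @clock.call + ttl)
    body
  end
end

now = Time.at(0)
cache = PageletCache.new(clock: -> { now })
renders = 0

# The pagelet needs only the product ID, so its URL identifies it fully.
render_related = lambda do
  cache.fetch("/products/42/related", ttl: 300) do
    renders += 1
    "<ul><li>Related item</li></ul>"
  end
end

3.times { render_related.call } # three page views, one pagelet render
now += 301                      # move past the pagelet's TTL
render_related.call             # stale, so the pagelet is rendered again
puts renders                    # => 2
```

A pagelet that instead shared a complex, dynamically computed structure with the rest of the page would have no such simple key, which is one reason independence and cacheability tend to go together.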

I believe most applications will have reusable interface components which it doesn’t really make sense to consider as pagelets, for example repeated visual elements, standard representations of domain objects and ‘widgets’ implementing common interaction patterns. It’s definitely a good idea to structure these, but they present a different set of problems to those that the pagelet pattern is trying to solve. In some cases the HTML will already provide sufficient structure, so trying to abstract them further will end up making the code less readable.

The Facebook post referenced above includes a diagram showing the pagelets on the Facebook home page, which gives some indication of the scope I have in mind.

Implementation Patterns

Once pagelets have been identified, there are multiple ways to approach the process of rendering them, assembling the full page and delivering it to the user. For example:

  • HMVC: The pagelet is inserted into the containing page directly as it is rendered through delegation to a secondary MVC triad. All execution happens in process and sequentially, so execution of the containing page stops while the pagelet is being rendered.
  • Parallel execution: Pagelets are rendered in process but in parallel with the main page. Rendering of the main page halts at each insertion point until the relevant content is ready.
  • BigPipe: See this post on the Facebook engineering blog. Similar to parallel execution, except that pagelets are delivered to the client after the main body and positioned with JavaScript. This prevents blocking during rendering of the main page.
  • Forwarded: The pagelet content is requested from a separate back-end server. This can be used in conjunction with any of the above three approaches.
  • Edge side includes: Pagelets are inserted either by a caching proxy (such as Varnish) in front of the application, or on a content delivery network.
  • XHR: Pagelets are loaded client-side using JavaScript to make additional HTTP requests.

Which of these is most appropriate presumably depends on the content of the pagelet, the application it’s part of, the application’s usage patterns, the infrastructure supporting it, the skills of the team responsible for maintaining it and so on.
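The difference between the first two approaches can be sketched in plain Ruby, using threads rather than anything Rails provides. The pagelet names are made up, and sleep stands in for slow work such as a database query or a call to a back-end service. In the sequential (HMVC-style) version the containing page waits for each pagelet in turn; in the parallel version every pagelet starts up front and the page blocks at each insertion point (Thread#value) only until that particular pagelet is ready.

```ruby
# Hypothetical pagelets: each is a callable returning an HTML fragment.
PAGELETS = {
  "related" => -> { sleep 0.2; "<div>related</div>" },
  "reviews" => -> { sleep 0.2; "<div>reviews</div>" }
}

# HMVC-style: render each pagelet in process, in order, at its insertion point.
def render_sequential(pagelets)
  parts = ["<main>"]
  pagelets.each_value { |pagelet| parts << pagelet.call }
  parts << "</main>"
  parts.join
end

# Parallel execution: start every pagelet first, then wait at each insertion
# point only until that pagelet's thread has produced its value.
def render_parallel(pagelets)
  threads = pagelets.transform_values { |pagelet| Thread.new { pagelet.call } }
  parts = ["<main>"]
  threads.each_value { |thread| parts << thread.value }
  parts << "</main>"
  parts.join
end

# Measure wall-clock time for a block, returning [result, seconds].
def timed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

seq_html, seq_time = timed { render_sequential(PAGELETS) }
par_html, par_time = timed { render_parallel(PAGELETS) }
puts format("sequential: %.2fs, parallel: %.2fs", seq_time, par_time)
```

Both versions produce identical HTML. The threads only help because these pagelets spend their time waiting: under MRI’s global interpreter lock, CPU-bound pagelets would gain little from being rendered in parallel.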

Rails Support

I started off with a quick search for existing projects in this area. I’d imagine that a lot of this has been done in an ad hoc fashion as required for specific products, but there have been a few publicly released tools:

  • ActionController::Components: Rails up to 2.2 supported nested rendering out of the box, so you could define a separate controller and view for the pagelet. This had been deprecated since 2.0 and was dropped entirely in 2.3.0, apparently for performance reasons. The functionality does seem to exist as a plugin, although as far as I can tell it isn’t maintained.
  • Cells: A similar idea, but uses a separate type of object for the pagelet’s ‘controller’.
  • Embedded Actions: Another plugin providing functionality similar to that dropped from Rails, but with support for caching and (slightly unconventional) use of the ‘respond_to’ method to allow different behaviour when a controller is embedded.

None of these seemed flexible enough for what I wanted. Specifically, I wanted a solution which would make internal and external handling of pagelets as similar as possible and would require minimal deviation from a standard Rails application. To me this means that pagelets should use standard Rails controllers, and should be addressed by URL internally and externally. I also didn’t consider the performance issues in the original Rails implementation to fundamentally invalidate the approach: slow code can be optimised, and the dispatch infrastructure has changed a lot with the introduction of Rack in Rails 2.3.

ActionEmbedding

Enter ActionEmbedding. This is a prototype Rails plugin I’m using as a test platform. The particular design goals are:

  • To make it easy to benefit from pagelets in a regular Rails application without the need for additional infrastructure or significant changes to the configuration of the application.
  • To make it as easy as possible to change between pagelet delivery methods as the application evolves.

See the GitHub page for technical details, but essentially inline pagelets are implemented as regular controller actions with routable URLs. Using it is as simple as including an additional module in the ApplicationHelper class:

module ApplicationHelper
  include ActionEmbedding::Helpers
end

This provides a simple method to embed a pagelet within the view:

<%= embed_pagelet('/pagelets/two') %>

Changing the embedding method is done by passing an options hash to this method:

<%= embed_pagelet('/pagelets/two', :method => :proxy, :proxy_host => 'backendcluster.internal.net') %>

The plugin is very simple, but should work for Rails 2.2 and 2.3.
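ActionEmbedding’s actual code is on GitHub; what follows is only a hypothetical sketch, in plain Ruby, of how a single helper might switch delivery method while the pagelet keeps one URL. PageletEmbedder, the data-pagelet-src attribute and the stand-in APP are made up for illustration. The inline branch dispatches in process through a Rack-style call(env) interface, and the ESI branch emits the standard <esi:include src="..."/> tag that a proxy such as Varnish can fill in.

```ruby
require "erb"
require "net/http"
require "uri"

# Hypothetical embedder: +app+ is any Rack-style callable,
# app.call(env) => [status, headers, body].
class PageletEmbedder
  def initialize(app)
    @app = app
  end

  def embed(path, method: :inline, proxy_host: nil)
    case method
    when :inline
      # Dispatch in process using a minimal Rack env for the pagelet's URL.
      status, _headers, body = @app.call("REQUEST_METHOD" => "GET", "PATH_INFO" => path)
      status == 200 ? join_body(body) : ""
    when :proxy
      # Forwarded: fetch the same path from a back-end server.
      Net::HTTP.get(URI::HTTP.build(host: proxy_host, path: path))
    when :esi
      # Leave an Edge Side Includes tag for a caching proxy to resolve.
      %(<esi:include src="#{ERB::Util.html_escape(path)}"/>)
    when :xhr
      # Leave a placeholder for client-side JavaScript to load later.
      %(<div data-pagelet-src="#{ERB::Util.html_escape(path)}"></div>)
    else
      raise ArgumentError, "unknown embedding method: #{method.inspect}"
    end
  end

  private

  # Rack bodies only promise to respond to #each.
  def join_body(body)
    parts = []
    body.each { |chunk| parts << chunk }
    parts.join
  end
end

# Stand-in for the application: one routable pagelet, everything else 404s.
APP = lambda do |env|
  if env["PATH_INFO"] == "/pagelets/two"
    [200, { "Content-Type" => "text/html" }, ["<p>pagelet two</p>"]]
  else
    [404, {}, []]
  end
end

embedder = PageletEmbedder.new(APP)
puts embedder.embed("/pagelets/two")
puts embedder.embed("/pagelets/two", method: :esi)
```

The inline branch builds only a two-key env; a real Rails application expects a much fuller one, which Rack::MockRequest.env_for(path) can generate. The proxy branch is not exercised here because it needs a running back end.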

What Next?

I’ve got a lot of ideas, but really want to start using this in a real project to figure out what’s important. If you fancy giving this a go, any feedback or suggestions would be appreciated.

Repurposing Layouts

(I wrote this a while ago, but never got around to posting it anywhere, so there’s a chance Rails has moved on since then.)

I recently found myself in the position of needing to generate a large number of static pages using the application.html.erb layout. Normally this happens within the context of a web request, so the support functions provided by the controller are used. In this case, however, it didn’t make sense to treat each page independently, for efficiency reasons. Instead I decided to use ERB directly, which required a bit of digging inside Rails to figure out how to reproduce the correct behaviour with respect to templates. The script I ended up with is run through script/runner, so basic Rails functionality is available.

The main take-home point is that Rails does not use ERB#result to render templates. Instead, ERB is used to compile the template to Ruby code, which is then extracted (via ERB#src), manipulated and executed. As well as allowing for post-processing of the code (which is used to set local variables), this has the advantage that the execution context can be more easily controlled.

To make it work, you first need to construct a new class which will act as an execution context for the code generated from the template. Any instance variables referenced in the template need to be defined as appropriate on an instance of the class. You quite probably also want to mix in your application helper module so that the methods defined there are available. Next, load the layout through ERB and extract its source. This code is then used to define an instance method on the new class. This method should take a block so that yield can be used in the normal way. Finally, add another method to the class which calls the generated layout method, supplying a block which renders the page contents (possibly by running eval on source extracted from a second template). Ultimately you end up with something like this:

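require 'erb'

# Compile the templates; the "-" trim mode enables <%= ... -%> newline suppression.
$layout = ERB.new(File.read("#{RAILS_ROOT}/app/views/layouts/application.html.erb"), nil, "-")
$page_template = ERB.new(File.read("..."), nil, "-")

# Execution context for the code generated from the templates.
class Renderer
  include ApplicationHelper # helper methods become available to the templates

  def render
    @instance_var = "something" # instance variables used by the templates
    # The layout's yield calls this block, which evaluates the page
    # template's compiled source in the same context.
    generate_page_with_layout do |sbl|
      eval($page_template.src)
    end
  end
end

# Define the layout's compiled source as an instance method, so that yield
# inside it invokes the block passed in by render.
Renderer.module_eval("def generate_page_with_layout()\n#{$layout.src}\nend")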

Note that the third argument to ERB.new needs to be “-” if you use <%= … -%> notation to suppress line breaks in your templates; this seems to be a poorly documented feature of ERB.
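The script above relies on Rails being loaded through script/runner. Here is a self-contained sketch of the same technique in plain Ruby, assuming Ruby 2.6 or later, where ERB.new takes the trim mode as a trim_mode: keyword rather than as the third positional argument. The templates are inline strings rather than files, and PageHelpers stands in for ApplicationHelper; both are made up for illustration.

```ruby
require "erb"

# Hypothetical stand-in for ApplicationHelper.
module PageHelpers
  def shout(text)
    text.upcase
  end
end

# Inline templates instead of files under app/views, to keep this self-contained.
LAYOUT = ERB.new("<html><body>\n<%= yield -%>\n</body></html>\n", trim_mode: "-")
PAGE   = ERB.new("<h1><%= shout(@title) %></h1>\n", trim_mode: "-")

# Execution context: helpers are mixed in, instance variables set per render.
class Renderer
  include PageHelpers

  def initialize(title)
    @title = title # referenced by the page template
  end

  def render
    # The layout's <%= yield -%> calls this block, which renders the page body.
    render_layout { render_page }
  end
end

# Compile each template to Ruby source (ERB#src) and define it as an instance
# method, so it runs with a Renderer as self and can see helpers and @title.
Renderer.class_eval("def render_layout\n#{LAYOUT.src}\nend")
Renderer.class_eval("def render_page\n#{PAGE.src}\nend")

OUTPUT = Renderer.new("hello").render
puts OUTPUT
```

Defining the page template as a method rather than evaluating its source inside the block avoids re-parsing it on every render, which matters when generating many pages in one run.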

There are presumably a number of ways of improving on this. The code here was very much written for a one-off task and probably isn’t as flexible as it might be, so let me know if you have any suggestions.

This post is in response to episode #1 of the ‘Scaling Rails’ screencast series presented by Gregg Pollack, which discusses various ways one can improve page load times by focusing on network issues, for example by caching static content or minifying files. All good stuff, but it’s worth going into a little more detail on a few points.

The first point is covered in the discussion here and here around using query strings as a way of expiring cached content, which is what Rails does by default (API docs). Essentially, the behaviour of the cache for GET requests with a query string (something after the ‘?’ in the URL) is browser dependent. It seems that for Internet Explorer and Firefox these requests are served from the cache without reference to the origin server as long as the document is fresh according to the server-supplied caching headers. For Opera and Safari, however, the browser will only use the cached version of the document after sending a validation request to the origin server with an If-Modified-Since header, resulting in a server round trip that would have been avoided if the query string had been omitted altogether. Note that I have only confirmed this on IE 8 (rc1) and Opera 9.63 running on Vista. I also believe that the assertion in the Think Vitamin article that Opera-like behaviour is required by a strict reading of the HTTP specification is incorrect; as far as I can tell, RFC 2616 only specifies this when no explicit expiry time is given (Section 13.9).

Of course, this may well not be serious enough an issue to make it worth implementing a full solution when Rails gives you a good approximation for free. With IE and Firefox you’ll get the desired behaviour and therefore cover the vast majority of your users. Those who use Opera or Safari will still save some time, but not the maximum possible, as they’ll be served a 304 response rather than full content for many of the requests they make. As described in the above articles, a solution which works for all browsers is to put the versioning information in the path rather than the query string, and use Apache rewrite rules (or the equivalent for your server) to serve the right content.
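A small sketch of the two URL schemes, assuming a made-up /v<version>/ path prefix for path versioning. The helper names are hypothetical rather than Rails’ API. strip_version shows what the server-side rewrite has to do; in Apache that would be something along the lines of RewriteRule ^/v[0-9a-f]+/(.*)$ /$1 in the server configuration.

```ruby
# Hypothetical helpers (not Rails' API) contrasting two cache-busting schemes.

# Query-string versioning, the form Rails produces by default:
#   /stylesheets/app.css?1234567890
def query_string_url(path, version)
  "#{path}?#{version}"
end

# Path versioning: the version becomes part of the path itself.
#   /v1234567890/stylesheets/app.css
def path_versioned_url(path, version)
  "/v#{version}#{path}"
end

# What the server-side rewrite has to do: drop the version prefix to find
# the real file. Paths without a prefix pass through unchanged.
VERSION_PREFIX = %r{\A/v[0-9a-f]+(?=/)}

def strip_version(request_path)
  request_path.sub(VERSION_PREFIX, "")
end

url = path_versioned_url("/stylesheets/app.css", "1234567890")
puts query_string_url("/stylesheets/app.css", "1234567890") # /stylesheets/app.css?1234567890
puts url                                                    # /v1234567890/stylesheets/app.css
puts strip_version(url)                                     # /stylesheets/app.css
```

The prefix pattern is deliberately loose (it accepts decimal timestamps and hexadecimal commit IDs); in practice a more distinctive prefix avoids clashing with real top-level directories whose names happen to look like versions.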

The second point is related, and is actually covered in the Rails API documentation itself: using timestamps as versioning information requires all of your backend servers to agree on which timestamp to use. This is potentially an issue if you have mongrels spread over multiple machines and your deployment mechanism can’t guarantee that the modified times of all files will be the same on all machines (which is pretty much impossible unless all of the machines share a filesystem). One solution which works if you always deploy from git (or similar) via Capistrano is to use the id of the commit as the version number, although this makes minor patches of individual files without a full deploy harder. Unfortunately, I don’t know of any out-of-the-box solution to this problem, so you’ll probably have to roll your own.
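The disagreement is easy to reproduce by simulating two application servers in temporary directories. This sketch assumes the deploy records the commit ID in a REVISION file at the application root, as Capistrano does for each release; the commit ID and timestamps are made-up sample values, and the helper names are hypothetical.

```ruby
require "fileutils"
require "tmpdir"

# Timestamp versioning: each server derives the version from its own copy
# of the file, so servers disagree whenever the mtimes differ.
def mtime_version(asset_path)
  File.mtime(asset_path).to_i.to_s
end

# Commit-based versioning. Assumption: the deploy wrote the deployed commit
# ID into a REVISION file at the application root.
def revision_version(app_root)
  File.read(File.join(app_root, "REVISION")).strip
end

# Simulate one server receiving the same deploy at a given moment.
def fake_server(app_root, deployed_at)
  FileUtils.mkdir_p(File.join(app_root, "public"))
  asset = File.join(app_root, "public", "app.css")
  File.write(asset, "body { color: black }\n")
  File.write(File.join(app_root, "REVISION"), "3f2c9e1\n") # sample commit ID
  File.utime(deployed_at, deployed_at, asset)              # set atime and mtime
  asset
end

root = Dir.mktmpdir
begin
  app1 = File.join(root, "app1")
  app2 = File.join(root, "app2")
  asset1 = fake_server(app1, Time.at(1_234_567_890))
  asset2 = fake_server(app2, Time.at(1_234_567_950)) # copied a minute later
  mtime_versions = [mtime_version(asset1), mtime_version(asset2)]
  revision_versions = [revision_version(app1), revision_version(app2)]
ensure
  FileUtils.remove_entry(root)
end

p mtime_versions    # ["1234567890", "1234567950"]
p revision_versions # ["3f2c9e1", "3f2c9e1"]
```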

The final point is that all of these solutions get a lot more complex when stylesheets come into play. If you refer to static content in your stylesheets, which is more or less unavoidable if you use background images, you’ll need to implement a mechanism for updating version numbers within those too. This could either be by generating them dynamically in the same way that HTML content is generated (typically with large amounts of page caching), or generating them as a separate step during deployment. In either case, be careful that you get the expiry process right – remember that when an image is updated you’ll need to expire both the image file itself and any stylesheets which refer to it. Again, I don’t know of a pre-rolled solution to this, so you may need to invest a bit of time putting something together, although Sass is potentially a useful technology here.
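For the deployment-time variant, one possible sketch, assuming stylesheets reference images by absolute path and that image URLs use a made-up /v<version>/ prefix: rewrite each url(...) reference as the stylesheet is copied out, and keep track of which stylesheets refer to which images so that both can be expired together. The function names are hypothetical, and the regular expression is deliberately simple: it leaves relative and external URLs alone and does not special-case protocol-relative //host URLs.

```ruby
# Matches url(/path), url('/path') or url("/path") with an absolute path.
CSS_URL = /url\(\s*(['"]?)(\/[^'")\s]+)\1\s*\)/

# Rewrite absolute url(...) references to carry the version prefix.
def version_css_urls(css, version)
  css.gsub(CSS_URL) { "url(#{$1}/v#{version}#{$2}#{$1})" }
end

# When an image changes, these stylesheets must be expired along with it.
def stylesheets_referencing(stylesheets, image_path)
  stylesheets.select do |_name, css|
    css.scan(CSS_URL).any? { |_quote, path| path == image_path }
  end.keys
end

sheets = {
  "site.css"  => %(body { background: url("/images/bg.png") }),
  "print.css" => "h1 { color: black }"
}

puts version_css_urls(sheets["site.css"], "42")
# body { background: url("/v42/images/bg.png") }
p stylesheets_referencing(sheets, "/images/bg.png")
# ["site.css"]
```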

(Update, 15 Feb 2009: It seems like the limitations of cache expiry via query strings may be more severe than I had thought. I will update this post once I have a better understanding of the situation.)
