I’ve made some changes to my build script to rework the caching strategy for this site.

For my purposes, caching rules are defined in three places:

  1. S3 metadata (the origin). This is set to 64 days by default, with a few exceptions.
  2. CloudFront time-to-live (TTL). The maximum is set to one year.
  3. The HTTP header set by CloudFront (the browser). This is set to 36 hours.

The CloudFront documentation contains a helpful table that explains CloudFront’s caching behaviour. You can choose to defer to the origin for everything, which is the simple strategy. The complication is that files can be cached in two different places: in the CDN and in the user’s browser. Ideally you want things cached for a very long time in the CDN and for a shorter time in the browser, because you can’t control the browser cache as easily,1 and because the layers compound: invalidating the CDN does nothing to a copy already sitting in someone’s browser.
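The table boils down to a few rules, modelled here as a simplified Python sketch. If the origin sends `s-maxage` or `max-age`, CloudFront caches for that value clamped between its minimum and maximum TTL, and the browser follows `max-age`. With neither directive, CloudFront falls back to its default TTL and the browser uses its own heuristics. The sketch ignores `Expires`, `no-cache`, `no-store` and `private`. The function names are hypothetical, and the default TTL of one day used in the examples is CloudFront’s standard default rather than a setting from this site.

```python
import re
from typing import Optional, Tuple

def directive(cache_control: str, name: str) -> Optional[int]:
    """Return the integer value of a Cache-Control directive (e.g. max-age), or None."""
    match = re.search(
        rf"(?:^|,)\s*{re.escape(name)}\s*=\s*(\d+)", cache_control, flags=re.IGNORECASE
    )
    return int(match.group(1)) if match else None

def effective_ttls(
    cache_control: Optional[str], min_ttl: int, default_ttl: int, max_ttl: int
) -> Tuple[int, Optional[int]]:
    """Simplified model of CloudFront's expiration table.

    Returns (seconds cached in CloudFront, seconds cached in the browser, or None
    when the browser is left to its own heuristics).
    """
    header = cache_control or ""
    max_age = directive(header, "max-age")
    s_maxage = directive(header, "s-maxage")
    # s-maxage targets shared caches such as CloudFront and takes precedence there.
    origin_ttl = s_maxage if s_maxage is not None else max_age
    if origin_ttl is None:
        # No usable directive: CloudFront uses its default TTL.
        return default_ttl, None
    # CloudFront clamps the origin's value between its minimum and maximum TTL.
    cdn_ttl = min(max(origin_ttl, min_ttl), max_ttl)
    # Browsers follow max-age as passed through from the origin.
    return cdn_ttl, max_age

DAY = 24 * 60 * 60
ONE_YEAR = 365 * DAY

print(effective_ttls(f"max-age={64 * DAY}", 0, DAY, ONE_YEAR))  # (5529600, 5529600)
print(effective_ttls(f"max-age={36 * 60 * 60}, s-maxage={64 * DAY}", 0, DAY, ONE_YEAR))  # (5529600, 129600)
print(effective_ttls(None, 0, DAY, ONE_YEAR))  # (86400, None)
```

The second example shows how `s-maxage` lets the CDN lifetime be much longer than the browser’s.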

Most files on this site rarely change, if at all, and can safely be cached for months without anyone noticing. The exceptions are the front page and the RSS feed, which change every time a new post is added.

I switched the build script to invalidate only these changeable files, but that didn’t seem to work: for about a month, new posts weren’t appearing on the front page or in the feed. The cache was stuck somewhere, and because caching behaviour is subject to so many overlapping rules, the issue was difficult to track down.
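Selective invalidation comes down to sending CloudFront a list of paths. Here is a minimal sketch in Python, assuming the build uses boto3. The batch matches the shape boto3’s `create_invalidation` expects, but the call itself is left commented out, and the feed path and distribution ID are hypothetical. One thing worth checking, offered as a possibility rather than a diagnosis: CloudFront keys its cache on the request path, so `/` and `/index.html` may be cached as separate objects and need invalidating separately.

```python
import time
from typing import Dict, List

def invalidation_batch(paths: List[str]) -> Dict:
    """Build the InvalidationBatch argument for CloudFront's CreateInvalidation call."""
    # Invalidation paths are written with a leading slash; add one if it's missing.
    normalised = [p if p.startswith("/") else "/" + p for p in paths]
    # Drop duplicates while keeping the original order.
    unique = list(dict.fromkeys(normalised))
    return {
        "Paths": {"Quantity": len(unique), "Items": unique},
        # CallerReference should be unique for each new invalidation request.
        "CallerReference": f"build-{time.time_ns()}",
    }

# Hypothetical paths for the files that change with every post.
batch = invalidation_batch(["/", "/index.html", "feed.xml"])
print(batch["Paths"])  # {'Quantity': 3, 'Items': ['/', '/index.html', '/feed.xml']}

# With boto3 installed and credentials configured, the batch would be sent like this
# (the distribution ID is a placeholder):
# import boto3
# boto3.client("cloudfront").create_invalidation(
#     DistributionId="EXAMPLEDISTID", InvalidationBatch=batch
# )
```

Flushing the whole cache, as described next, is the same call with the single wildcard path `/*`.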

I reverted to flushing the entire cache whenever I build the site, which fixed the problem but still feels unnecessary. Doing it properly is probably a challenge for the future. These two files also now have their own max-age rules, with limits of four and two hours, which should (hopefully) apply at every level.
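A per-file rule can live in the `Cache-Control` metadata written to S3 at upload time. The sketch below assumes Python and boto3’s `upload_file`, whose `ExtraArgs` accepts `CacheControl` and `ContentType`. The file names are hypothetical, and giving four hours to the front page and two to the feed is an illustrative guess. Whether the value then holds at every layer still depends on CloudFront’s TTL bounds and on the header CloudFront sets for browsers.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Tuple

HOUR = 60 * 60
DAY = 24 * HOUR

# Hypothetical rules, checked in order; the first matching pattern wins.
# Which file gets four hours and which gets two is an illustrative guess.
RULES: List[Tuple[str, int]] = [
    ("index.html", 4 * HOUR),
    ("feed.xml", 2 * HOUR),
]
DEFAULT_MAX_AGE = 64 * DAY  # the site-wide default on the S3 objects

def cache_control_for(key: str) -> str:
    """Return the Cache-Control value to store as S3 metadata for an object key."""
    for pattern, seconds in RULES:
        if fnmatchcase(key, pattern):
            return f"max-age={seconds}"
    return f"max-age={DEFAULT_MAX_AGE}"

def upload_extra_args(key: str, content_type: str) -> Dict[str, str]:
    """ExtraArgs for boto3's S3 upload_file, e.g.
    s3.upload_file(local_path, bucket, key, ExtraArgs=upload_extra_args(key, ...))."""
    return {"CacheControl": cache_control_for(key), "ContentType": content_type}

for key in ["index.html", "feed.xml", "posts/hello/index.html"]:
    print(key, cache_control_for(key))
```

Because `fnmatchcase` matches the whole key, only the top-level `index.html` gets the short rule; post pages keep the 64-day default.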

On top of all this, I’ve turned on CloudFront’s new Origin Shield feature. It puts an extra caching layer in front of the origin and promises a higher cache hit ratio, but in my case the origin is an S3 bucket in the same region, so I’m sceptical that the additional cache will noticeably improve performance.
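Origin Shield is switched on per origin in the distribution config. As a sketch, assuming Python and the `DistributionConfig` dictionary that boto3’s `get_distribution_config` returns, it amounts to one `OriginShield` entry on the S3 origin. The origin ID, bucket name and region below are hypothetical, and the config is trimmed far below what CloudFront actually requires.

```python
import copy
from typing import Any, Dict

def with_origin_shield(
    distribution_config: Dict[str, Any], origin_id: str, region: str
) -> Dict[str, Any]:
    """Return a copy of a CloudFront DistributionConfig with Origin Shield enabled on one origin."""
    config = copy.deepcopy(distribution_config)
    for origin in config["Origins"]["Items"]:
        if origin["Id"] == origin_id:
            origin["OriginShield"] = {"Enabled": True, "OriginShieldRegion": region}
            return config
    raise KeyError(f"no origin with Id {origin_id!r}")

# A heavily trimmed, hypothetical config: a real one has many more required fields.
example = {
    "Origins": {
        "Quantity": 1,
        "Items": [{"Id": "site-bucket", "DomainName": "example-bucket.s3.amazonaws.com"}],
    }
}
updated = with_origin_shield(example, "site-bucket", "eu-west-2")
print(updated["Origins"]["Items"][0]["OriginShield"])  # {'Enabled': True, 'OriginShieldRegion': 'eu-west-2'}
```

Applying the change would mean sending the modified config back with `update_distribution`, passing the `ETag` returned by the earlier read as `IfMatch`.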

Much of the performance tweaking I do now comes down to marginal gains. Still, I feel this is a much tighter setup than I had before.

  1. This is without setting entity tag (ETag) headers on everything. Another strategy for immutable content is simply to change the URL whenever the content changes.
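Changing the URL along with the content usually means putting a hash of the file’s bytes into its name, so every version gets its own URL. A minimal sketch in Python, with an arbitrary eight-character SHA-256 prefix. The function name is hypothetical, and a real build would also have to rewrite every reference to the renamed files.

```python
import hashlib
from pathlib import PurePosixPath

def fingerprinted_name(path: str, content: bytes, digest_length: int = 8) -> str:
    """Insert a short SHA-256 digest of the content before the file extension."""
    p = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()[:digest_length]
    # e.g. css/style.css -> css/style.<digest>.css
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))

print(fingerprinted_name("css/style.css", b"body { margin: 0 }"))
```

Because each new version lives at a new URL, files named this way never need invalidating and can be given the longest cache lifetimes.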