Caching and Fingerprinting

A short story about a great caching strategy

Security! Security!

With a good chunk of the web moving to HTTPS for good, my previous team and I decided to join the trend and put all our front-end applications behind the now fashionable security layer. You don’t wanna miss the S parteeeeeeey!

We used to host our front-end applications on Amazon S3, which is essentially just file storage. Its static website endpoints don’t support HTTPS on their own; you need to put Amazon CloudFront in front of S3 if you want a static website behind HTTPS. When it comes to multicontinental CDNs like CloudFront, the first words that come to my mind are cache and invalidation; whenever those two words come together in a sentence, my soul is inescapably taken by petrifying memories and I’m dragged into a state of paralysis for a few moments. Just imagine a website that looks all right to users located in Eritrea but not as much to those located in Japan… Imagine how many VPN sessions you’d need to go through in order to fix that sort of bug. So, no cache invalidation for me, gracias.

What’s in a filename

An effective and well-known way of circumventing cache invalidation is hashing file names, also known as fingerprinting. The technique is ingenious: you change a file, it gets a new name. Done. The hash that’s used to name the file is computed from the content of the file itself, so any modification produces a different hash (collisions are unlikely enough to ignore in practice) and, consequently, a different file name. For example, after updating styles.css, you’d end up with something like styles-O02LUkjfskj394cJuaw.css (this is just a silly example).
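To make the idea concrete, here is a minimal sketch of content-based naming in Node.js. The helper name fingerprintName and the name-hash.ext pattern are assumptions made for illustration; real build tools have their own naming schemes.

```javascript
const crypto = require("crypto");
const path = require("path");

// Hypothetical helper: derive a fingerprinted name from a file's name and content.
function fingerprintName(fileName, content) {
    // The hash depends only on the content, so it changes whenever the content does.
    const hash = crypto.createHash("md5").update(content).digest("hex");
    const { name, ext } = path.parse(fileName);
    return `${name}-${hash}${ext}`;
}

const before = fingerprintName("styles.css", "body { color: black; }");
const after = fingerprintName("styles.css", "body { color: navy; }");

console.log(before);           // styles-<32 hex characters>.css
console.log(after);            // same pattern, different hash
console.log(before !== after); // true
```

Unchanged content always maps to the same name, so files that did not change between deployments keep their cached copies.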

Of course, you don’t want to push that monstrosity to source control: that would result in a new file in Git on every update, which is crazy. Instead, most frameworks that support this feature apply fingerprinting only during the build process, which generates minified, ready-to-be-deployed files that shouldn’t be part of a project’s codebase (.gitignore is your friend).

Then, when you send the new file to a CDN, nothing will point to the old one anymore, because, hopefully, your super smart application framework will have updated the HTML files to import the new CSS file (65036e67d90fb0bbb02d2ac7ee93eb9e.css) instead. The old 277b8aa9ccaf157c4d21fc249c333676.css ultimately gets forgotten and is sent to the CDN’s purgatory.
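The reference update is conceptually just a search and replace over the generated HTML. Here is a rough sketch of that step; the function rewriteReferences, the assetMap shape and the assets/app.css path are all hypothetical and are not Ember CLI's actual API.

```javascript
// Hypothetical sketch: swap original asset names for fingerprinted ones in an HTML string.
// `assetMap` maps original names to fingerprinted names (an assumed shape for illustration).
function rewriteReferences(html, assetMap) {
    let result = html;
    for (const [original, fingerprinted] of Object.entries(assetMap)) {
        // split/join replaces every occurrence without any regex escaping concerns.
        result = result.split(original).join(fingerprinted);
    }
    return result;
}

const html = '<link rel="stylesheet" href="assets/app.css">';
const assetMap = {
    "assets/app.css": "assets/app-65036e67d90fb0bbb02d2ac7ee93eb9e.css"
};

console.log(rewriteReferences(html, assetMap));
// prints: <link rel="stylesheet" href="assets/app-65036e67d90fb0bbb02d2ac7ee93eb9e.css">
```

Because the old file stays on the CDN under its old name, any page that still references it keeps working until it is replaced by the new HTML.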

Figure: Example of fingerprinting with MD5.

Ember CLI is an example of such a framework. And Ember CLI is what we were using. And we were happy. Or almost.

Tweaking, and breaking, Ember CLI

By default, Ember CLI will apply fingerprinting only when you build your project for production (--environment=production). But we wanted the functionality for all environments, because it would simplify our deployment pipeline. Also, we were frequently showcasing our beta product to clients in non-production environments, and we wanted HTTPS on those too. In short, we needed all versions and all environments behind HTTPS, and, due to the S3 limitations mentioned above, we had no choice but to put S3 behind CloudFront.

This is what I did in our ember-cli-build.js:

// Inside the options object passed to new EmberApp(defaults, { ... }):
fingerprint: {
    enabled: true
}

And we got our files hashed when building (ember build) the project. Yay! And we also got our files hashed when serving (ember serve) the project. Nay…

How to fix it

You don’t want your files fingerprinted when running the app locally: it will break e-ve-ry-thing. So, I had to tell Ember to fingerprint only when building, not when serving. Back to ember-cli-build.js, this is how I fixed it:

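// True when the CLI was invoked as `ember serve` (or its `s` / `server` aliases).
let isServing = false;
for (const arg of process.argv) {
    if (["s", "serve", "server"].includes(arg)) {
        isServing = true;
        break;
    }
}

// Inside the options object passed to new EmberApp(defaults, { ... }):
fingerprint: {
    enabled: !isServing
}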

The loop above is used to make sure fingerprinting is disabled when the project is being served and enabled otherwise.
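If you want to verify that check without starting Ember, the same logic can be pulled into a small function. This is only a sketch: isServeCommand is a hypothetical name, and the argument arrays below are made-up examples of what process.argv might contain.

```javascript
// Hypothetical helper: true when any CLI argument is `serve` or one of its aliases.
function isServeCommand(argv) {
    const serveCommands = ["s", "serve", "server"];
    return argv.some((arg) => serveCommands.includes(arg));
}

console.log(isServeCommand(["node", "ember", "serve"])); // true
console.log(isServeCommand(["node", "ember", "s"]));     // true
console.log(isServeCommand(["node", "ember", "build"])); // false

// In ember-cli-build.js, the result would feed the option shown earlier:
// fingerprint: { enabled: !isServeCommand(process.argv) }
```

Note that this matches any argument equal to s, serve or server, not only the command itself, so an unrelated argument with one of those exact values would also disable fingerprinting.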

Conclusion

The code examples shown here are for Ember CLI, but the basic idea is applicable to any other framework that does file fingerprinting. I hope that now you have a better idea of how to deploy your application to CloudFront (or any other CDN) without having to worry about cache invalidation.