This website is built using Remix, deployed using Fly, and sits behind Cloudflare. Cloudflare provides DDoS protection and caches everything.
Or at least I thought it cached everything. I realised yesterday that once you had landed on the website and were navigating within it, none of the extra content you saw would be cached.
For example, if you had landed on /posts, that route itself would be cached, but when you clicked through to one of the listed posts you would actually hit the server to get the data for that post, even if someone else had just viewed the same page! Disaster. I spotted this by looking at the network tab and seeing that the response headers included the following:
Cache-Control: public, max-age=300, s-maxage=3600
Cf-Cache-Status: DYNAMIC
We want to see a cache status of “HIT”; instead, we see the very ominous “DYNAMIC” - which can be translated as “not cached”.
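If you would rather check this in code than squint at the network tab, something like the following would do. It is only a sketch: the helper names are made up for illustration, and it treats anything other than an exact "HIT" as a miss, which is stricter than Cloudflare's full list of statuses.

```typescript
// Hypothetical helper: read Cloudflare's Cf-Cache-Status response header.
// Headers.get() is case-insensitive, so the casing of the name does not matter.
function cacheStatus(headers: Headers): string {
  return (headers.get("cf-cache-status") ?? "NONE").toUpperCase();
}

// Hypothetical helper: true only when the edge reported a cache hit.
function servedFromEdgeCache(headers: Headers): boolean {
  return cacheStatus(headers) === "HIT";
}

// The two headers copied from the network tab.
const observed = new Headers({
  "Cache-Control": "public, max-age=300, s-maxage=3600",
  "Cf-Cache-Status": "DYNAMIC",
});

console.log(cacheStatus(observed), servedFromEdgeCache(observed)); // DYNAMIC false
```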
Why does this happen? It’s an unfortunate confluence of the way Remix works and the way Cloudflare works.
In Remix, when you first land on a page, your browser makes a request and receives a rendered HTML document, along with the necessary JavaScript and CSS files. Once you’re on the site, if you click an internal link, your browser instead requests only the new JavaScript required for that route and the JSON data needed to render it - the HTML is rendered locally.
So:
- If we go straight to johnwhiles.com/posts/foo, we get an HTML document.
- If we navigate from johnwhiles.com/posts to johnwhiles.com/posts/foo, we instead request a blob of JSON.

Remix lets you set the headers that will be sent along with these two possible responses separately. In my case I was returning the same cache headers for both, but the blobs of JSON were not being cached by Cloudflare.
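Roughly, the two places those headers come from look like this in a route module. This is a sketch rather than the code this site actually runs: the post data is a placeholder, the parameter type is written out by hand instead of imported from Remix, and it assumes the data-request behaviour described above (not Remix's newer Single Fetch mode).

```typescript
// Cache policy copied from the header value quoted in this post.
const CACHE_CONTROL = "public, max-age=300, s-maxage=3600";

// Loader: on a client-side navigation Remix fetches this response as JSON,
// so the headers set here are the ones attached to the data request.
export function loader(): Response {
  const post = { slug: "foo", title: "Placeholder post" }; // hypothetical data
  return new Response(JSON.stringify(post), {
    headers: {
      "Content-Type": "application/json",
      "Cache-Control": CACHE_CONTROL,
    },
  });
}

// `headers` export: Remix calls this when it serves the HTML document.
// Reusing the loader's value keeps both kinds of response on the same policy.
export function headers({ loaderHeaders }: { loaderHeaders: Headers }) {
  return {
    "Cache-Control": loaderHeaders.get("Cache-Control") ?? CACHE_CONTROL,
  };
}
```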
To understand why this happened, we need to understand a bit about how Cloudflare caches data. This is something I had neglected to do, assuming it would #JustWork.
Cloudflare first looks at the cache headers returned from your server. Because I returned public, max-age=300, s-maxage=3600 from most routes of this website, I assumed Cloudflare would cache basically everything.
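To spell out what that header asks for: under standard HTTP caching rules, max-age applies to the browser's cache, and s-maxage overrides it for shared caches such as a CDN. The sketch below uses hypothetical helper names and a deliberately naive parser (no quoted values) to pull out both lifetimes. It only shows what the header requests, not what any particular CDN decides to do with it.

```typescript
// Parse a Cache-Control value into a map of directive -> argument.
// Directives without an argument (e.g. "public") map to true.
function parseCacheControl(value: string): Map<string, string | true> {
  const directives = new Map<string, string | true>();
  for (const part of value.split(",")) {
    const trimmed = part.trim();
    if (trimmed === "") continue;
    const eq = trimmed.indexOf("=");
    if (eq === -1) {
      directives.set(trimmed.toLowerCase(), true);
    } else {
      directives.set(trimmed.slice(0, eq).toLowerCase(), trimmed.slice(eq + 1));
    }
  }
  return directives;
}

// How long (in seconds) the header asks each kind of cache to keep a response.
function freshnessSeconds(value: string, cache: "browser" | "shared"): number {
  const d = parseCacheControl(value);
  // For shared caches, s-maxage takes precedence over max-age when present.
  const raw = cache === "shared" && d.has("s-maxage") ? d.get("s-maxage") : d.get("max-age");
  const seconds = Number(raw);
  return typeof raw === "string" && Number.isFinite(seconds) ? seconds : 0;
}

const header = "public, max-age=300, s-maxage=3600";
console.log(freshnessSeconds(header, "browser")); // 300  (5 minutes)
console.log(freshnessSeconds(header, "shared")); // 3600 (1 hour)
```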
However, Cloudflare also uses their own heuristics to decide what to cache. Even if you tell them to publicly cache a request, they may decide not to if they think the request is “dynamic”. By dynamic, they mean something that is likely to be different for each user. A request for JSON is assumed to be dynamic.
There have been a lot of cases where sensitive data has been revealed by improperly caching responses in public caches, so this is probably a sensible default for Cloudflare to have - even if it means fewer cache hits for all the people who don’t bother to learn how Cloudflare actually works.
I found the fix at the bottom of a page called Create Edge Cache TTL page rules - essentially, we need to explicitly tell Cloudflare which routes we are happy for it to cache, even when the content looks dynamic.
I went to the dashboard and added a page rule.
This page rule tells Cloudflare that any route under /posts can be cached. Instantly, I could see that JSON requests were indeed being cached. Hopefully, this means that my site will perform better and be cooler. Because the only private routes on my site are in the Top Secret Admin Zone (TSAZ), I am probably going to set a page rule which will cache everything except routes starting with /admin - but we’ll see.
You should be careful with this. You don’t want to accidentally cache private data! Because my site is almost entirely public, this is less risky. But remember, it’s better to have your CDN not cache things than to leak important user data.
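One way to hedge against that on the origin side is to make sure private routes never ask to be cached publicly in the first place. The sketch below is hypothetical and not something this site does today: the function name is invented, and whether a CDN honours these headers depends on how its rules are configured, so treat it as a belt-and-braces measure rather than a guarantee.

```typescript
// Hypothetical helper: pick a Cache-Control value based on the request path.
// Anything at /admin or beneath it is marked private and uncacheable;
// everything else gets the public policy used elsewhere in this post.
function cacheControlFor(pathname: string): string {
  const isAdmin = pathname === "/admin" || pathname.startsWith("/admin/");
  return isAdmin ? "private, no-store" : "public, max-age=300, s-maxage=3600";
}

console.log(cacheControlFor("/posts/foo")); // public, max-age=300, s-maxage=3600
console.log(cacheControlFor("/admin/drafts")); // private, no-store
```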