title: I think I might put my whole site behind a CDN
url: https://www.peterbe.com/plog/i-think-i-might-put-my-whole-site-behind-a-cdn
hash_url: 42c30a5bf07ac03b4d7ad1fd1017371c
<p><strong>tl;dr; I'm going to put this blog behind KeyCDN and I expect a 2-4x performance boost (on Time To First Byte).</strong></p><p>
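The next paragraph describes how the origin currently decides what to serve. As a rough illustration only (this is not the actual Nginx configuration), that lookup can be modelled in Python. Several details are assumptions: the names <code>accepted_encodings</code> and <code>resolve</code>, the preference for Brotli over gzip, and the simplified <code>Accept-Encoding</code> parsing (q-values are ignored and there is no path-traversal check).</p>

```python
from pathlib import Path
from typing import Optional, Set, Tuple

# Pre-compressed variants to try, best first (assumption: prefer Brotli).
VARIANTS = (("br", ".br"), ("gzip", ".gz"))


def accepted_encodings(accept_encoding: str) -> Set[str]:
    """Parse an Accept-Encoding header into a set of coding names.

    Simplified on purpose: q-values are ignored, so "br;q=0" still counts.
    """
    return {
        part.split(";")[0].strip().lower()
        for part in accept_encoding.split(",")
        if part.strip()
    }


def resolve(
    root: Path, url_path: str, accept_encoding: str
) -> Optional[Tuple[Path, Optional[str]]]:
    """Return (file_on_disk, content_encoding), or None to fall through to Django.

    No path-traversal protection: illustration only.
    """
    base = root / url_path.strip("/") / "index.html"
    accepted = accepted_encodings(accept_encoding)
    for coding, suffix in VARIANTS:
        candidate = base.with_name(base.name + suffix)  # index.html.br / index.html.gz
        if coding in accepted and candidate.is_file():
            return candidate, coding
    if base.is_file():
        return base, None  # uncompressed cached copy
    return None  # nothing cached: let the application server render it
```

<p>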
Right now, requests to my blog go straight to an Nginx server on DigitalOcean in NYC, USA. 99% of the time, the Nginx server serves the blog posts (and static assets) as <code>index.html</code> files straight from disk. If the request is <code>GET /plog/some-slug</code>, it looks for a file called <code>/path/to/cached/files/plog/some-slug/index.html</code> (or <code>index.html.br</code> or <code>index.html.gz</code>, depending on the user agent's <code>Accept-Encoding</code> header). Only if that file doesn't exist on disk does the request go through to Django (via uWSGI built into Nginx). All of it is served over HTTP/2 and uses Let's Encrypt for SSL.</p><p><strong>This has been working great, but it's time to step it up. It's time to put the whole site behind a CDN, and I think I'm going to use <a href="https://www.keycdn.com/">KeyCDN</a> for it.</strong></p><p>In the past, it <em>used</em>
to be best practice to serve your HTML document from your smart server (e.g. Django) and put only the static assets on a CDN, like this:</p><div class="highlight"><pre><span class="p">&lt;</span><span class="nt">html</span><span class="p">&gt;</span>
<span class="p">&lt;</span><span class="nt">link</span> <span class="na">rel</span><span class="o">=</span><span class="s">"stylesheet"</span> <span class="na">href</span><span class="o">=</span><span class="s">"https://myaccount123.cloudakamaifastlyflare.com/static/main.d910ef9a33.css"</span><span class="p">&gt;</span>
...
<span class="p">&lt;</span><span class="nt">body</span><span class="p">&gt;</span>
<span class="p">&lt;</span><span class="nt">img</span> <span class="na">src</span><span class="o">=</span><span class="s">"https://myaccount123.cloudakamaifastlyflare.com/images/hero.jpg"</span><span class="p">&gt;</span>
...
</pre></div><p>
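What makes the pattern above costly is the second hostname. Before fetching anything from it, the browser usually needs a separate DNS lookup, TCP connection, and TLS handshake. A small helper built on Python's standard-library <code>html.parser</code> can list the extra origins a page loads resources from. It is hypothetical: the names <code>OriginCollector</code> and <code>extra_origins</code> are made up, and so is the choice to look only at <code>src</code> attributes and <code>&lt;link href&gt;</code>.</p>

```python
from html.parser import HTMLParser
from typing import Set
from urllib.parse import urlsplit


class OriginCollector(HTMLParser):
    """Collect scheme://host[:port] for the resources a page loads."""

    def __init__(self) -> None:
        super().__init__()
        self.origins: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # src on any tag (img, script, ...); href only on <link> (stylesheets etc.)
            if value and (name == "src" or (tag == "link" and name == "href")):
                parts = urlsplit(value)
                # Skip relative URLs (and, for simplicity, protocol-relative ones).
                if parts.scheme and parts.netloc:
                    self.origins.add(f"{parts.scheme}://{parts.netloc}")


def extra_origins(html: str, page_origin: str) -> Set[str]:
    """Origins other than the page's own; each usually means another connection."""
    collector = OriginCollector()
    collector.feed(html)
    collector.close()
    return collector.origins - {page_origin}
```

<p>Run on the markup above with <code>page_origin="https://yourcooldomain.com"</code>, it returns just <code>{"https://myaccount123.cloudakamaifastlyflare.com"}</code>. That is one extra origin the browser has to connect to.</p><p>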
But with HTTP/2, this becomes an anti-pattern for web performance: your client has already paid for an expensive HTTP/2 connection (and SSL negotiation) to <code>https://yourcooldomain.com</code>, so it's cheap to download the rest over that same connection. I used to do it like that too, and I don't regret it. As a matter of fact, <a href="https://songsear.ch">https://songsear.ch</a> goes straight to Nginx, but all its images are (lazy) loaded via <code>songsearch-2916.kxcdn.com</code>. When time allows, I think I'll put all of Song Search behind a CDN too.</p><p>Basically, it's time to put the whole site behind a CDN. With smart purging techniques and smarter CDNs that respect the cache-control headers on your dynamic content, it's time to share the load ...all over the world.</p><h2>CDN Choices</h2><p>There are many sites that compare CDNs, but many of them are affiliated with, or even made by, one of the CDNs, so it's hard to find impartial comparisons. For example, <a href="https://www.keycdn.com/cdn-comparison">
KeyCDN demonstrates they're the cheapest</a> by comparing themselves with 5 others that they picked. (But mind you, that does seem reasonably backed up by <a href="https://cdn.reviews/cdn-pricing-comparison/">this comparison</a> on <code>cdn.reviews</code>.)</p><p><a href="https://www.cdnperf.com/cdn-compare?type=performance&amp;location=world&amp;cdn=akamai-cdn,aws-cloudfront-cdn,cloudflare-cdn,fastly-cdn,google-cloud-cdn,keycdn">CDNPerf</a> does a decent job with cool graphs and stuff. Incidentally, it ranks my current favorite (KeyCDN) as the slowest of the well-known giants I compared it with.</p><p><a href="https://www.peterbe.com/cache/ba/f3/baf32e8600eff760e6a0007a78576e24.png"><img src="https://www.peterbe.com/cache/3d/7d/3d7dfdd0ffbbc4d8d9a0016d9d243285.png" alt="CDNPerf graph"/></a></p><p>But mind you, the difference between the winner (topmost in the graph as of today) and KeyCDN is 36ms vs. 47ms, and both are fantastic numbers.</p><p><a href="https://www.peterbe.com/cache/8a/f1/8af1e977b43230f211c58b9f6f62a8fe.png"><img src="https://www.peterbe.com/cache/80/39/8039c5c53bd55c280f777e3c363052f9.png" alt="CDNPerf list"/></a></p><p>It's hard to compare CDNs because they're all pretty fast and, actually, all reasonably cheap. What really matters is the features, and those are a lot harder to compare. Cloudflare often comes up as a CDN provider with impressive features. I've never actually used it, but it does list "Fast cache purge" and "API programmability" among its key features. On the other hand, it doesn't mention Brotli caching, which I know KeyCDN supports.</p><p><a href="https://www.keycdn.com">KeyCDN</a> has been great to me in the past when I've used it to host static assets. I'm familiar with their interface, and they recently launched an API for things like purge-by-tag and purge-by-URL. 
They're cheap, which matters because everything I want to put behind a CDN here is side-project stuff. They have a <a href="https://pypi.org/project/keycdn/">Python library</a> which, although very rough around the edges, works. And, also very important: I've communicated very successfully with them through their support, and they've been responsive and helpful. <strong>So I'll go with KeyCDN</strong>.</p><h2>The Opportunity</h2><p>Before turning my domain <code>www.peterbe.com</code> into a CNAME for one of their CDN domains, I wanted to experiment a little to see how it works and what performance numbers I'd get for comparison. So I set up <code>beta.peterbe.com</code> and did some Django and Nginx wiring so that it works the same way, except that everything goes through the CDN.</p><p>Then I picked a <a href="https://www.peterbe.com/about">random page</a> and set up a <a href="https://hyperping.io">Hyperping monitor</a>
from all of its available regions, and let it brew for a while. Unfortunately, Hyperping doesn't let you compare two monitors side-by-side, so you'll have to compare the graphs with your own eyes:</p><p><a href="https://www.peterbe.com/i-think-i-might-put-my-whole-site-behind-a-cdn/www.png"><img src="https://www.peterbe.com/cache/43/f3/43f39859b55222734938e53227907860.png" alt="www means no CDN, just the origin Nginx"/></a><br/><em>NOT behind a CDN (server is in New York, USA)</em></p><p><a href="https://www.peterbe.com/i-think-i-might-put-my-whole-site-behind-a-cdn/beta.png"><img src="https://www.peterbe.com/cache/ed/f5/edf53729903322df2188b6cf63aa00ad.png" alt="beta means with a CDN in front"/></a><br/><em>Behind a CDN</em></p><p>The "total Response Time" in Hyperping doesn't really make sense here. It's an average across <em>all regions</em>
it pings from. If you live in, for example, Germany, the only response time that matters to you is 1,215 ms without the CDN versus 40 ms with it. Likewise, if you live somewhere near New York, the only response time that matters to you is 20 ms versus 64 ms.</p><p>I also ran another benchmark, using Python like this:</p><div class="highlight"><pre>
<span class="kn">import</span> <span class="nn">time</span>
<span class="kn">import</span> <span class="nn">requests</span>
<span class="c1"># Wall-clock time for one complete request: connect, wait for the response, download the body</span>
<span class="n">t0</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
<span class="n">r</span> <span class="o">=</span> <span class="n">requests</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">'https://www.peterbe.com/plog/some-slug'</span><span class="p">)</span>
<span class="n">t1</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
<span class="k">print</span><span class="p">(</span><span class="s2">"Took"</span><span class="p">,</span> <span class="n">t1</span> <span class="o">-</span> <span class="n">t0</span><span class="p">)</span>
</pre></div><p>I did this from South Carolina, which means my nearest <a href="https://www.keycdn.com/network">KeyCDN edge location</a>
could be Atlanta, Miami, or New York. Either way, I'm reasonably near New York (compared to the rest of the world), so it's a fair performance comparison for US east coast traffic. (Insert disclaimer here.) The actual benchmark <em>downloads</em> the most recent blog posts <em>in repeated cycles</em>, which gives the CDN a solid chance to warm up, and then it compares the <em>median</em> of the last 100 downloads. The output is as follows:</p><pre>beta
COUNT 1854 (but only using the last 100)
HIT RATIO 100.0%
AVERAGE (all) 63.12ms
MEDIAN (all) 61.89ms
www
COUNT 1856 (but only using the last 100)
HIT RATIO 100.0%
AVERAGE 136.22ms
MEDIAN 135.61ms</pre><p><em>("HIT RATIO" for the non-CDN URL means the page was served entirely without Django server rendering.)</em></p><p>What this means is that the <em>median</em> is <strong>62ms</strong> <em>with</em> a CDN and <strong>135.6ms
</strong> without. That's a <strong>2x boost</strong>.</p><p>The crawler stats script is available here: <a href="https://github.com/peterbe/peterbecom-cdn-crawl">github.com/peterbe/peterbecom-cdn-crawl</a>, and I would be thrilled if you could clone it, run it, and report what numbers you get and where you're running it from.</p><h2>Notes and Conclusion</h2><p>Mind you, 62ms vs. 136ms might sound like a silly difference when <a href="https://www.webpagetest.org/result/190423_KK_75a1e7c276197615ebe4a40e0c6e53e9/">WebPageTest</a> says it takes 700ms until the page is interactive (on an LTE connection). And this is a tiny, super-optimized page. But never forget that A) <strong>we can't all live in the US east-coast area</strong>, and B) if the HTML downloads even marginally faster, the browser can parse it sooner and start downloading all the other stuff much sooner. It'll make a big difference! I'm sure you've all seen graphs like this:</p><p><a href="https://www.peterbe.com/cache/e7/46/e7461ea9b550d1646283682b91ea7566.png"><img src="https://www.peterbe.com/cache/df/25/df254c3ad66f5f6e50c2b6322267097d.png" alt="Cold-cache MDN page on 4G"/></a><br/><em>Imagine if all those static asset downloads could have started a whole second "to the left"</em></p><p>Of course a CDN is faster. That's not news. But it's also a hassle, and it costs money. It's 2019, and most good CDNs now support Brotli, fast purge-by-URL, and HTTP/2. It's time to make the switch! It's not like cache invalidation is hard.</p><p><strong>UPDATE April 23 2019 (same day)</strong></p><p>KeyCDN has a neat-looking tool that is similar to Hyperping but more of a one-off kind of deal. It's called <a href="https://tools.keycdn.com/performance">Performance Test</a>, and I wouldn't be surprised if it's biased as heck, because they probably run these pings from roughly the same locations as their edge locations. Anyway, the results are nevertheless juicy. Note the numbers in the last column, <strong>TTFB</strong>.
</p><p><a href="https://www.peterbe.com/cache/1b/c4/1bc4e68f5f84ea72b843ff6d2c8182f3.png"><img src="https://www.peterbe.com/cache/99/bc/99bc06105114c99f70ea6896847ba1a6.png" alt="Performance Test without CDN"/></a><br/><em>Performance Test <strong>without</strong> CDN</em></p><p><a href="https://www.peterbe.com/cache/f8/89/f8894c26f31b75dbbe80808320f713bc.png"><img src="https://www.peterbe.com/cache/07/6c/076c151d9552cc3feeda3deb95dfeb78.png" alt="Performance Test with CDN"/></a><br/><em>Performance Test <strong>with</strong> CDN</em></p>
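<p>The summary lines in the benchmark output earlier in the post (COUNT, HIT RATIO, AVERAGE, MEDIAN) can be approximated with a small function. This is not the code from <code>peterbecom-cdn-crawl</code>. It is a hypothetical <code>summarize()</code> that assumes each download was recorded as an <code>(elapsed_seconds, was_cache_hit)</code> pair, with the hit flag taken from whatever response header the CDN or origin exposes. It also assumes the ratio, average, and median are computed over the last 100 samples only.</p>

```python
import statistics
from typing import Dict, List, Tuple


def summarize(samples: List[Tuple[float, bool]], last_n: int = 100) -> Dict[str, float]:
    """Summarize (elapsed_seconds, was_cache_hit) download samples.

    "count" covers every sample; the hit ratio, average and median are
    computed over the most recent `last_n` samples only, so the cold-cache
    start of the run doesn't skew them.
    """
    if not samples:
        raise ValueError("need at least one sample")
    recent = samples[-last_n:]
    times_ms = [elapsed * 1000 for elapsed, _ in recent]
    hits = sum(1 for _, hit in recent if hit)
    return {
        "count": len(samples),
        "hit_ratio": 100.0 * hits / len(recent),
        "average_ms": statistics.mean(times_ms),
        "median_ms": statistics.median(times_ms),
    }
```

<p>Plugging in the two medians reported above, 135.61 / 61.89 ≈ 2.19, which is where the roughly 2x boost comes from.</p>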
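<p>Putting the HTML itself behind the CDN, not just the static assets, leans on two things from the post: the CDN honoring the origin's cache-control headers, and fast purge-by-URL. Below is a hypothetical origin-side helper, <code>cache_control()</code>, that picks a <code>Cache-Control</code> value per URL path. The <code>/static/</code> prefix, the assumption that files there are fingerprinted, and the exact lifetimes are all illustrative. Whether a given CDN honors <code>s-maxage</code> also depends on how the zone is configured.</p>

```python
def cache_control(path: str) -> str:
    """Return a Cache-Control value for a URL path (hypothetical policy)."""
    if path.startswith("/static/"):
        # Assumed fingerprinted (e.g. main.d910ef9a33.css): the content behind a
        # given URL never changes, so browsers and the CDN can keep it for a year.
        return "public, max-age=31536000, immutable"
    # HTML and everything else: browsers treat it as fresh for 60 seconds,
    # while shared caches such as a CDN use s-maxage and keep it for a day.
    # After publishing an edit, the page would be purged by URL on the CDN.
    return "public, max-age=60, s-maxage=86400"
```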