title: Engineering for Slow Internet
url: https://brr.fyi/posts/engineering-for-slow-internet
hash_url: 9e9c6f97d732010e14201f1624782ddc
archive_date: 2024-05-31
og_image: https://brr.fyi/media/engineering-for-slow-internet/engineering-for-slow-internet-icon.png
description: How to minimize user frustration in Antarctica.
favicon: https://brr.fyi/favicon-32x32.png
language: en_US
<p><em>Hello everyone! I got partway through writing this post while I was still in Antarctica, but I departed
before finishing it.</em></p>
<p><em>I’m going through my old draft posts, and I found that this one was nearly complete.</em></p>
<p><em>It’s a bit of a departure from the normal content you’d find on brr.fyi, but it reflects my
software / IT engineering background.</em></p>
<p><em>I hope folks find this to be an interesting glimpse into the on-the-ground reality of using the Internet
in bandwidth-constrained environments.</em></p>
<p><em>Please keep in mind that I wrote the majority of this post ~7 months ago, so it’s likely that the
IT landscape has shifted since then.</em></p>
<hr>
<p>Welcome back for a <strong><del>bonus post</del></strong> about Engineering for Slow Internet!</p>
<p>For a 14-month period, while working in Antarctica, I had access to the Internet only through an
extremely limited series of satellite links provided by the United States Antarctic Program.</p>
<p>Before I go further, this post requires a special caveat, above and beyond my standard disclaimer:</p>
<hr>
<p><em>Even though I was an IT worker within the United States Antarctic Program,</em> <strong><em>everything</em></strong> <em>I am going to
discuss in this post is based either on publicly-available information, or on my own observations as a
regular participant living on ice.</em></p>
<p><em>I have not used any internal access or non-public information in writing this post.</em></p>
<p><em>As a condition of my employment, I agreed to a set of restrictions regarding public disclosure
of non-public Information Technology material. I fully intend to honor these restrictions.
These restrictions are ordinary and typical of US government contract work.</em></p>
<p><em>It is
unlikely that I will be able to answer additional questions about matters I discuss in this post.
I’ve taken great care to write as much as I am able to, without disclosing non-public information regarding
government IT systems.</em></p>
<hr>
<p>Good? Ok, here we go.</p>
<p><em>… actually wait, sorry, one more disclaimer.</em></p>
<hr>
<p><em>This information reflects my own personal experience in Antarctica, from August 2022 through December
2022 at McMurdo, and then from December 2022 through November 2023 at the South Pole.</em></p>
<p><em>Technology moves quickly, and I make no claims that the circumstances of my own specific experience
will hold up over time. In future years, once I’ve long-since forgotten about this post,
please do not get mad at me when the on-the-ground IT experience in Antarctica evolves away from the snapshot
presented here.</em></p>
<hr>
<p>Ok, phew. Here we go for real.</p>
<p>It’s a non-trivial feat of engineering to get <strong>any</strong> Internet at the South Pole! If you’re bored,
check out the <a href="https://www.usap.gov/technology/sctnsouthpolesats.cfm">South Pole Satellite Communications</a>
page on the public USAP.gov website, for an overview of the limited selection of satellites available for
Polar use.</p>
<p>
<a href="/media/engineering-for-slow-internet/south-pole-radomes-01.jpg">
<picture>
<source srcset="/media/engineering-for-slow-internet/south-pole-radomes-01-small.webp" type="image/webp"></source>
<source srcset="/media/engineering-for-slow-internet/south-pole-radomes-01-small.jpg" type="image/jpg"></source>
<img src="/media/engineering-for-slow-internet/south-pole-radomes-01-small.jpg" alt="Radomes 01">
</picture>
<em>South Pole's radomes, out in the RF sector. These radomes contain the equipment
necessary to communicate with the outside world using our primary satellites.</em>
</a>
</p>
<p>If you’re interested, perhaps also look into the
<a href="https://www.usap.gov/news/4685/">2021 Antarctic Subsea Cable Workshop</a>
for an overview of some hurdles associated with running traditional fiber to the continent.</p>
<p><strong><em>I am absolutely not in a position of authority to speculate on the future of Antarctic connectivity!</em></strong>
Seriously. I was a low-level, seasonal IT worker in a large, complex organization.
Do not email me your
ideas for improving Internet access in Antarctica – I am not in a position to do anything with them.</p>
<p>I do agree with the widespread consensus on the matter: There is <strong>tremendous interest</strong> in
improving connectivity to US research stations in Antarctica. I would timidly conjecture that, at some point,
there will be engineering solutions to these problems.
Improved connectivity will eventually arrive in Antarctica,
either through enhanced satellite technologies or through the arrival of fiber to the continent.</p>
<p>But – that world will only exist at some point in the future. Currently, Antarctic connectivity is
<em>extremely limited</em>. What do I mean by that?</p>
<p>Until very recently, at McMurdo, nearly <strong>a thousand people</strong>, plus numerous scientific
projects and operational workloads, all relied on a series of links
that provided maximum aggregate speeds of a few dozen megabits
per second to the <strong>entire station</strong>.
For comparison, that’s less bandwidth shared by everyone <strong>combined</strong> than what
everyone <strong>individually</strong> can get on a typical 4G cellular network in an American suburb.</p>
<p>Things <strong>are</strong> looking up! The NSF recently
<a href="https://www.nsf.gov/news/news_summ.jsp?cntn_id=307974&amp;org=OPP">announced</a> some important developments
regarding Starlink at McMurdo and Palmer.</p>
<p>I’m aware that the on-the-ground experience in McMurdo and Palmer is better now than it was even just a year ago.</p>
<p>But – as of October 2023, the situation was still pretty dire at the South Pole.
As far as I’m aware, similar developments regarding Starlink have <strong>not</strong> yet been announced for South Pole Station.</p>
<p>As of October 2023, South Pole had the limitations described above,
<strong>plus</strong> there was only connectivity for a few hours a day, when the satellites rose above the horizon
and the station was authorized to use them.
The satellite schedule generally shifts forward (earlier) by about 4 minutes per day, due to
the <a href="https://en.wikipedia.org/wiki/Sidereal_time">difference between Sidereal time and Solar (Civil) time</a>
(a sidereal day is about 23 hours and 56 minutes, so each pass arrives roughly 4 minutes earlier by the clock each day).</p>
<p>The current satellite schedule can be found online, on the
<a href="https://www.usap.gov/technology/sctnsouthpolesats.cfm">South Pole Satellite Communications</a> page
of the public USAP.gov website. Here’s an example of the schedule from October 2023:</p>
<p>
<a href="/media/engineering-for-slow-internet/satellite-schedule-01.png">
<picture>
<source srcset="/media/engineering-for-slow-internet/satellite-schedule-01.png" type="image/png"></source>
<img src="/media/engineering-for-slow-internet/satellite-schedule-01.png" alt="Satellite Schedule 01">
</picture>
<em>South Pole satellite schedule, for two weeks in October 2023.</em>
</a>
</p>
<p>These small intermittent links to the outside world are shared by <strong>everyone at Pole</strong>, for operational,
science, and community / morale usage.</p>
<p>Complicating matters further is the unavoidable physics of this connectivity.
These satellites are in a high orbit, thousands of miles up. This means high latency. If you’ve used
a consumer satellite product such as HughesNet or ViaSat, you’ll understand.</p>
<p>From my berthing room at the South Pole, it was about <strong>750 milliseconds</strong>, round trip,
for a packet to get to and from a terrestrial US destination.
This is about <strong>ten times</strong> the latency of a round trip
between the US East and West coasts (up to 75 ms).
And it’s about <strong>thirty times</strong> the expected
latency of a healthy connection from your home, on a terrestrial cable or fiber connection,
to most major content delivery networks (up to 25 ms).</p>
<p>Seriously, I can’t emphasize enough how jarring this is. At my apartment back home, on GPON fiber,
it’s about 3 ms roundtrip to Fastly, Cloudflare, CloudFront, Akamai, and Google.
At the South Pole, the latency was over <strong>two hundred and fifty times greater</strong>.</p>
<p>I can’t go into more depth about how USAP does prioritization, shaping,
etc, because I’m not authorized to share these details. Suffice it to say, if you’re an enterprise network
engineer used to working in a bandwidth-constrained environment, you’ll feel right at home with the
equipment, tools, and techniques used to manage Antarctic connectivity.</p>
<p>Any individual trying to use the Internet for community use at the South Pole, as of October 2023,
likely faced:</p>
<ul>
<li>Round-trip latency averaging around 750 milliseconds, with jitter between
packets sometimes exceeding several seconds.</li>
<li>Available speeds, to the end-user device, that range from a couple kbps (yes, you read that right),
up to 2 Mbps on a <strong>really good</strong> day.</li>
<li>Extreme congestion, queueing, and dropped packets, far in excess of even the worst oversaturated ISP links
or bufferbloat-infested routers back home.</li>
<li>Limited availability, frequent dropouts, and occasional service preemptions.</li>
</ul>
<p>These constraints <em>drastically</em> impact the modern web experience! Some of this is unavoidable. The link characteristics
described above are truly bleak. But – a lot of the end-user
impact is caused by web and app engineering that fails to take slow/intermittent links
into consideration.</p>
<p>If you’re an app developer reading this, can you tell me, off the top of your head, how your app behaves
on a link with 40 kbps available bandwidth, 1,000 ms latency, occasional jitter of up to 2,000 ms,
packet loss of 10%, and a complete 15-second connectivity dropout every few minutes?</p>
<p>It’s probably not great! And yet – these are real-world performance parameters that I encountered,
under certain conditions, at the South Pole.
It’s normally better than this, but this does occur, and it occurs often enough
that it’s worth taking seriously.</p>
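<p>If you want a rough feel for this before shipping, you can put a deliberately awful link between your app and its backend. Below is a crude sketch of a localhost TCP relay in Python that adds delay and throttles throughput. It’s only an approximation: real packet loss and jitter happen below TCP (Linux netem is the right tool for that), and the port, upstream target, and numbers here are placeholders to adjust.</p>

```python
# slowlink.py -- crude localhost relay that delays and throttles a TCP
# stream, to approximate a very slow link. Placeholders throughout; real
# loss and jitter should be injected below TCP (e.g. Linux "tc ... netem").
import asyncio

LISTEN_PORT = 8888               # point your app at 127.0.0.1:8888
UPSTREAM = ("example.com", 443)  # placeholder upstream host/port
ONE_WAY_DELAY = 0.5              # seconds added per chunk, each direction
BYTES_PER_SECOND = 5_000         # ~40 kbps

async def pump(reader, writer):
    # Forward one direction, sleeping to simulate latency + low bandwidth.
    # (Sleeping per chunk is crude: it stacks delay on large transfers.)
    while data := await reader.read(1024):
        await asyncio.sleep(ONE_WAY_DELAY + len(data) / BYTES_PER_SECOND)
        writer.write(data)
        await writer.drain()
    writer.close()

async def handle(client_reader, client_writer):
    upstream_reader, upstream_writer = await asyncio.open_connection(*UPSTREAM)
    await asyncio.gather(
        pump(client_reader, upstream_writer),
        pump(upstream_reader, client_writer),
    )

async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", LISTEN_PORT)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```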
<p>This is what happens when you have a tiny pipe to share among high-priority
operational needs, plus dozens of community users. Operational needs are aggressively prioritized,
and the community soaks up whatever is left.</p>
<p>I’m not expecting miracles here! Obviously no amount of client engineering can make, say, real-time video
conferencing work under these conditions. But – getting a few bytes of text in and out <strong>should</strong> still be possible!
I know it is possible, because some apps are still able to do it. Others are not.</p>
  152. <h2 id="detailed-real-world-example">Detailed, Real-world Example</h2>
  153. <p>One day at the South Pole, I was trying to load the website of <strong><em>&lt;$enterprise_collaboration_platform&gt;</em></strong>
  154. in my browser. It’s <em>huge</em>! It needed to load nearly 20 MB of Javascript, <em>just</em> to render the main screen!
  155. And of course, the app had been updated since last time I loaded it, so all of my browser’s cached assets were
  156. stale and had to be re-downloaded.</p>
  157. <p>Fine! It’s slow, but at least it will work… eventually, right? Browsers do a decent job of handling
  158. slow Internet. Under the hood, the underlying protocols do a decent job at congestion control.
  159. I should get a steady trickle of data. This will be
  160. subject to the negotiated send and receive windows between client and server,
  161. which are based on the current level of congestion on the link, and which are further influenced by any
  162. shaping done by middleware along the way.</p>
<p>It’s a complex webapp, so the app developer would also need to implement some
of their own retry logic. This allows for recovery in the event that individual assets fail,
especially for those long, multi-second total connectivity dropouts.
But eventually, given enough time, the transfers should complete.</p>
<p>Unfortunately, this is where things broke down and got really annoying. <em>The developers implemented
a global failure trigger somewhere in the app.</em>
If the app didn’t fully load within the parameters specified by the developer
(time? number of retries? I’m not sure.), then the app
<strong>stopped, gave up, redirected you to an error page, dropped all the loading progress you’d made, and
implemented aggressive cache-busting countermeasures for next time you retried.</strong></p>
<p>
<a href="/media/engineering-for-slow-internet/load-error-01.png">
<picture>
<source srcset="/media/engineering-for-slow-internet/load-error-01.png" type="image/png"></source>
  177. <img src="/media/engineering-for-slow-internet/load-error-01.png" alt="Load Success">
</picture>
<em>The app wasn't loading fast enough, and the developers decided that the app should give up instead
of continuing to load slowly.</em>
</a>
</p>
<p>I cannot tell you how frustrating this was! Connectivity at the South Pole was never going to meet the
performance expectations set by engineers using a robust terrestrial Internet connection.
It’s not a good idea to hardcode a single, static, global expectation for how long
20 MB of Javascript should take to download.
Why not let me load it at my own pace? I’ll get there
when I get there. <em>As long as data is still moving, however slow, just let it run.</em></p>
<p>But – the developers decided that if the app didn’t load within the parameters they set,
I couldn’t use it at all.
And to be clear – this was primarily a <strong>messaging</strong> app. The actual content payload here, when
the app is running and I’m chatting with my friends, is measured in <em>bytes</em>.</p>
<p>As it turns out, our Internet performance at the South Pole was <em>right on the edge</em> of what the app
developers considered “acceptable”. So, if I kept reloading the page, and if I kept
letting it re-download the same 20 MB of Javascript, and if I kept putting up with the
developer’s cache-busting shenanigans, <em>eventually</em> it finished before the artificial failure criteria.</p>
<p>What this means is that I wasted <em>extra</em> bandwidth doing all these useless reloads, and it sometimes took
<strong>hours</strong> before I was able to use the app. All of this hassle, even though, if left alone,
I could have completed the necessary data transfer in 15 minutes.
Several hours (and a shameful amount of retried Javascript) later, I was finally able to send a short,
text-based message to my friends.</p>
<p>
<a href="/media/engineering-for-slow-internet/load-success-01.png">
<picture>
<source srcset="/media/engineering-for-slow-internet/load-success-01.png" type="image/png"></source>
<img src="/media/engineering-for-slow-internet/load-success-01.png" alt="Load Success">
</picture>
<em>A successful webapp load, after lots of retrying. 809 HTTP requests, 51.4 MB of data transfer,
and 26.5 minutes of loading... </em>
</a>
</p>
<p>
<a href="/media/engineering-for-slow-internet/chat-success-01.png">
<picture>
<source srcset="/media/engineering-for-slow-internet/chat-success-01.png" type="image/png"></source>
<img src="/media/engineering-for-slow-internet/chat-success-01.png" alt="Chat Success">
</picture>
<em>...all so that I could send a 1.8 KB HTTPS POST...</em>
</a>
</p>
<p>
<a href="/media/engineering-for-slow-internet/chat-content-01.png">
<picture>
<source srcset="/media/engineering-for-slow-internet/chat-content-01.png" type="image/png"></source>
<img src="/media/engineering-for-slow-internet/chat-content-01.png" alt="Chat Content">
</picture>
<em>...containing a 6-byte message.</em>
</a>
</p>
<p>Does this webapp <strong>really need</strong> to be 20 MB? What all is
being loaded that could be deferred until it is needed, or included in an “optional” add-on bundle?
Is there a possibility of a “lite” version, for bandwidth-constrained users?</p>
<p>In my 14 months in Antarctica, I collected <strong>dozens</strong> of examples of apps like this, with artificial
constraints built in that rendered them unusable or borderline-unusable.</p>
<p>For the rest of this post, I’ll outline some of my major frustrations, and what I would have liked
to see instead, to mitigate the issues.</p>
<p>I understand that not every app is in a position to implement all of these! If you’re
a tiny app, just getting off the ground, I don’t expect you to spend all of your development time optimizing
for weirdos in Antarctica.</p>
<p>Yes, Antarctica is an edge case! Yes, 750 ms / 10% packet loss / 40 kbps <strong>is</strong> rather extreme.
But the South Pole was not <strong>uniquely</strong> bad. There are entire commercial marine vessels that rely on older
<a href="https://www.inmarsat.com/">Inmarsat</a> solutions for a few hundred precious kbps of data while at sea.
There’s someone at a remote research site deep in the mountains right now, trying to load your app on
a <a href="https://www.iridium.com/products/thales-missionlink-700/">Thales MissionLink</a> using the Iridium
Certus network at a few dozen kbps.
There are folks behind misconfigured routers, folks with flaky
wifi, folks stuck with fly-by-night WISPs delivering sub-par service. Folks who still use dial-up
Internet connections over degraded copper phone lines.</p>
<p>These folks are worthy of your consideration. At the very least, you should make an effort to avoid
<strong>actively interfering</strong> with their ability to use your products.</p>
<p>So, without further ado, here are some examples of development patterns that routinely caused me grief at
the South Pole.</p>
  253. <h2 id="hardcoded-timeouts-hardcoded-chunk-size">Hardcoded Timeouts, Hardcoded Chunk Size</h2>
  254. <p>As per the above example, <strong>do not hardcode your assumptions about how long a given payload will take to
  255. transfer, or how much you can transfer in a single request.</strong></p>
  256. <ol>
  257. <li>If you have the ability to measure whether bytes are flowing, and they are, <strong>leave them alone</strong>, no
  258. matter how slow. Perhaps show some UI indicating what is happening.</li>
  259. <li>If you are doing an HTTPS call,
  260. fall back to a longer timeout if the call fails. Maybe it just needs more time under current network
  261. conditions.</li>
  262. <li>If you’re having trouble moving large amounts of data in a single HTTPS call, break it up. Divide the
  263. content into chunks, transfer small chunks at a time, and <strong>diligently keep track of the progress</strong>, to
  264. allow resuming and retrying small bits without losing all progress so far. Slow, steady,
  265. incremental progress is better than a one-shot attempt to transfer a huge amount of data.</li>
  266. <li>If you can’t get an HTTPS call done successfully, do some troubleshooting. Try DNS, ICMP,
  267. HTTP (without TLS), HTTPS to a known good status endpoint, etc. This information might be helpful
  268. for troubleshooting, and it’s better than blindly retrying the same end-to-end HTTPS call.
  269. This HTTPS call requires a bunch of under-the-hood stuff to be working properly. Clearly it’s not,
  270. so you should make an effort to figure out why and let your user know.</li>
  271. </ol>
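<p>Here’s what points 1 and 2 can look like in practice. This is a minimal sketch using Python’s <code>requests</code> library, not production code: the read timeout applies per chunk rather than to the whole transfer, so a slow-but-alive trickle is never killed, and the connect timeout escalates across attempts.</p>

```python
# Sketch: escalating connect timeouts, plus per-chunk read timeouts so a
# transfer is never aborted while bytes are still flowing.
import requests

def fetch(url, connect_timeouts=(10, 30, 90)):
    for attempt, connect_timeout in enumerate(connect_timeouts, start=1):
        try:
            # timeout=(connect, read): the read timeout applies to each
            # chunk, not the whole body -- a 40 kbps trickle stays alive.
            with requests.get(url, stream=True,
                              timeout=(connect_timeout, 60)) as resp:
                resp.raise_for_status()
                body = b""
                for chunk in resp.iter_content(chunk_size=16 * 1024):
                    body += chunk  # bytes are flowing: leave them alone
                return body
        except requests.exceptions.RequestException as exc:
            print(f"attempt {attempt} failed ({exc}); "
                  f"retrying with a longer timeout")
    # All attempts failed: time to troubleshoot (DNS? ICMP? plain HTTP?),
    # not to blindly retry the same call forever.
    raise RuntimeError(f"could not fetch {url}")
```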
<p>A popular desktop application tries to download some configuration information from the vendor’s website
at startup. There is a hardcoded timeout for the HTTPS call. <strong>If it fails, the app will not load.</strong> It’ll
just keep retrying the same call, with the same parameters, forever. It’ll sit on the loading page,
without telling you what’s wrong. I’ve confirmed this is what’s happening by reading the logs.</p>
<p>
<a href="/media/engineering-for-slow-internet/hardcoded-timeout-02.png">
<picture>
<source srcset="/media/engineering-for-slow-internet/hardcoded-timeout-02.png" type="image/png"></source>
<img src="/media/engineering-for-slow-internet/hardcoded-timeout-02.png" alt="Hardcoded Timeout 02">
</picture>
<em>Excerpt from debug log for a commercial desktop application, showing a request timing out
after 15 seconds.</em>
</a>
</p>
<p>Luckily, if you kept trying, the call would eventually make it through under the network conditions I
experienced at the South Pole.</p>
<p>It’s frustrating
that just a single hardcoded timeout value, in an otherwise perfectly-functional and enterprise-grade
application, can render it almost unusable. The developers could have:</p>
<ol>
<li>Fallen back to increasingly-long timeouts to try and get a successful result.</li>
<li>Done some connection troubleshooting to infer more about the current network environment, and
responded accordingly.</li>
<li>Shown UX explaining what was going on.</li>
<li>Used a cached or default-value configuration, if it couldn’t get the live one, instead of simply
refusing to load.</li>
<li>Provided a mechanism for the user to manually download and install the required data, bypassing the
app’s built-in (and naive) download logic.</li>
</ol>
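<p>Items 1 and 4 from that list are cheap to implement. A hedged sketch of the pattern (the URL, cache path, and defaults below are placeholders, and this is my illustration of the idea, not the vendor’s code):</p>

```python
# Sketch: fetch live config with escalating timeouts, then degrade
# gracefully to a cached copy, then to shipped defaults. Placeholders
# throughout -- the point is the fallback chain, not the specifics.
import json
import os
import requests

CONFIG_URL = "https://vendor.example.com/config.json"        # placeholder
CACHE_PATH = os.path.expanduser("~/.myapp/config-cache.json") # placeholder
DEFAULTS = {"feature_flags": {}, "endpoints": {}}             # known-good defaults

def load_config():
    for timeout in (15, 60, 180):  # escalating, not one hardcoded value
        try:
            resp = requests.get(CONFIG_URL, timeout=timeout)
            resp.raise_for_status()
            config = resp.json()
            os.makedirs(os.path.dirname(CACHE_PATH), exist_ok=True)
            with open(CACHE_PATH, "w") as f:
                json.dump(config, f)  # refresh the cache for next time
            return config
        except requests.exceptions.RequestException:
            continue
    # Couldn't get a live config. Don't refuse to start -- use the last
    # known-good copy (and tell the user the app is running on it).
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH) as f:
            return json.load(f)
    return DEFAULTS
```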
  301. <h3 id="example-2---chat-apps">Example 2 - Chat Apps</h3>
  302. <p>A popular chat app (“app #1”) maintains a websocket for sending and receiving data.
  303. The initialization process for that websocket uses a <strong>hardcoded 10-second timeout</strong>.
  304. Upon cold boot, when network conditions are especially congested, that websocket setup can sometimes take
  305. more than 10 seconds! We have to do a full TCP handshake, then set up a TLS session, then set up the
  306. websocket, then do initial signaling over the websocket.
  307. Remember – under some conditions, each individual roundtrip at the South Pole took multiple seconds!</p>
  308. <p>If the 10-second timeout elapses, the app simply does not work. It enters a very long backoff state
  309. before retrying. The UX does not clearly show what is happening.</p>
<p>
<a href="/media/engineering-for-slow-internet/hardcoded-timeout-01.png">
<picture>
<source srcset="/media/engineering-for-slow-internet/hardcoded-timeout-01.png" type="image/png"></source>
<img src="/media/engineering-for-slow-internet/hardcoded-timeout-01.png" alt="Hardcoded Timeout 01">
</picture>
<em>Excerpt from debug log for chat app #1, showing the hardcoded 10-second timeout. We did indeed
have Internet access at this time -- it was just too congested to complete an entire TCP handshake +
TLS negotiation + websocket setup + request within this timeframe.
With a few more seconds, it may have finished.</em>
</a>
</p>
<p>On the other hand, a competitor’s chat app (“app #2”) does <em>very well</em> in extremely degraded network
conditions! It has multiple strategies for sending network requests, for resilience against
certain types of degradation.
It aggressively re-uses open connections. It dynamically adjusts timeouts.
In the event of a failure, it intelligently chooses a retry cadence. And, throughout all of this,
it has clear UX explaining current network state.</p>
<p>The end result is that I could often use app #2 in network conditions when I could not use app #1. Both of
them were just transmitting plain text! Just a few actual bytes of
content! And even when I could not use app #2, it was at least telling
me what it was trying to do. App #1 is written naively, with baked-in assumptions
about connectivity that simply did not hold true at the South Pole.
App #2 is written well, and it responds gracefully to the conditions it encounters in the wild.</p>
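<p>I obviously haven’t seen app #2’s source, but the connect behavior it exhibits looks roughly like the following sketch (using the third-party Python <code>websockets</code> library; the growth factors and caps are invented for illustration):</p>

```python
# Sketch: websocket setup with an escalating open timeout and a jittered,
# bounded retry cadence -- instead of one hardcoded 10-second deadline.
import asyncio
import random
import websockets  # third-party library, used here for illustration

async def connect_resilient(uri):
    open_timeout = 10  # a common default...
    backoff = 1
    while True:
        try:
            # ...but grow it on failure instead of giving up: on a badly
            # congested link, the TCP + TLS + websocket setup alone can
            # take longer than 10 seconds.
            return await websockets.connect(uri, open_timeout=open_timeout)
        except (OSError, asyncio.TimeoutError,
                websockets.exceptions.WebSocketException):
            open_timeout = min(open_timeout * 2, 120)
            await asyncio.sleep(backoff + random.random())  # jittered backoff
            backoff = min(backoff * 2, 30)
            # This is also the place to surface state to the user.
```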
  334. <h3 id="example-3---incremental-transfer">Example 3 - Incremental Transfer</h3>
  335. <p>A chance to talk about my own blog publishing toolchain!</p>
  336. <p>The site you’re reading right now is a static Jekyll blog.
  337. Assets are stored on S3 and served through CloudFront. I build the static files locally here on my laptop,
  338. and I upload them directly to S3. Nothing fancy. No servers, no QA environment, no build system,
  339. no automated hooks, nothing dynamic.</p>
  340. <p>Given the extreme connectivity constraints at the South Pole,
  341. I wrote a Python script for publishing to S3 that
  342. worked well in the challenging environment. It uses the S3 API to upload assets in small chunks.
  343. It detects and resumes failed uploads without losing progress. It waits until everything is safely uploaded
  344. before publishing the new version.</p>
  345. <p>If I can do it, unpaid, working alone, for my silly little hobby blog, in 200 lines of Python…
  346. surely your team of engineers can do so for your flagship webapp.</p>
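<p>My actual script has more edge-case handling, but the core of it is just S3’s multipart-upload API, which boto3 exposes directly. A trimmed-down sketch (bucket, key, and state-file names are placeholders): each part is uploaded and recorded separately, a failure costs at most one part, and S3 doesn’t publish the object until <code>complete_multipart_upload</code> succeeds.</p>

```python
# Sketch: resumable S3 upload via the multipart API. Progress is persisted
# to a local state file after every part, so a dropped link costs at most
# one part. Names below are placeholders.
import json
import os
import boto3

BUCKET = "example-blog-bucket"    # placeholder
KEY = "index.html"                # placeholder
PART_SIZE = 5 * 1024 * 1024       # S3's minimum part size (except the last part)
STATE_FILE = "upload-state.json"  # progress persisted between attempts

s3 = boto3.client("s3")

def load_state():
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return json.load(f)
    resp = s3.create_multipart_upload(Bucket=BUCKET, Key=KEY)
    return {"upload_id": resp["UploadId"], "parts": {}}

def save_state(state):
    with open(STATE_FILE, "w") as f:
        json.dump(state, f)

def upload(path):
    state = load_state()
    with open(path, "rb") as f:
        part_number = 1
        while chunk := f.read(PART_SIZE):
            if str(part_number) not in state["parts"]:  # skip finished parts
                resp = s3.upload_part(Bucket=BUCKET, Key=KEY,
                                      UploadId=state["upload_id"],
                                      PartNumber=part_number, Body=chunk)
                state["parts"][str(part_number)] = resp["ETag"]
                save_state(state)
            part_number += 1
    # Nothing becomes visible in the bucket until this call succeeds.
    s3.complete_multipart_upload(
        Bucket=BUCKET, Key=KEY, UploadId=state["upload_id"],
        MultipartUpload={"Parts": [
            {"PartNumber": int(n), "ETag": etag}
            for n, etag in sorted(state["parts"].items(),
                                  key=lambda kv: int(kv[0]))
        ]})
    os.remove(STATE_FILE)
```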
<p>It’s amazing what usability improvements come along with some proactive engineering. I had friends at Pole
with blogs on commercial platforms, and who routinely shared large files to social
media sites. They had
to carefully time their day to maximize the likelihood of a successful “one-shot” upload, during a
satellite window, using their platform’s poorly-engineered publishing tools.
Often it took several retries, and it wasn’t always clear what was happening at every step of the process
(Is the content live? Did the upload finish? Is it safe / should I hit “Post” again?).</p>
<p>Meanwhile, I was able to harvest whatever connectivity I could find.
I got a few kilobytes uploaded here and there,
whenever it was convenient. If a particular chunked POST failed, no worries!
I could retry or resume, with minimal lost progress, at a later time.
Once it was all done and staged, I could safely publish the new version.</p>
<p>
<a href="/media/engineering-for-slow-internet/upload-process-01.png">
<picture>
<source srcset="/media/engineering-for-slow-internet/upload-process-01.png" type="image/png"></source>
<img src="/media/engineering-for-slow-internet/upload-process-01.png" alt="Upload Process">
</picture>
<em>My custom publishing script for this blog, to handle intermittent and unreliable Internet.</em>
</a>
</p>
  368. <h2 id="bring-your-own-download">Bring-Your-Own-Download</h2>
  369. <p><strong>If you’re going to build in a downloader into your app, you have a high bar for quality that you have
  370. to meet.</strong> Otherwise, it’s going to fail in profoundly annoying or catastrophic ways.</p>
  371. <p>If I had to give one piece of advice:
  372. <strong><em>Let the user break out of your in-app downloader and use their own, if at all possible.</em></strong></p>
  373. <p>Provide a manual download link, ideally one that leads to whatever differential patch file the app was going
  374. to download. Don’t punish the user by making them download the full installer, just because your in-app patch
  375. downloader doesn’t meet their needs.</p>
  376. <p>This has the following benefits:</p>
<ol>
<li>If your downloader fails, the user can still get the file manually, using a more robust downloader
of their choice, such as a web browser.</li>
<li>The user can download the file one time, and share it with multiple devices.</li>
<li>The user can download the file on a different computer than the one running the application.</li>
<li>The user has the flexibility to schedule or manage the download based on whatever constraints they face.</li>
</ol>
<p><strong>Why is this all so important when considering users at the South Pole?</strong></p>
<p>Downloads <strong>are</strong> possible at the South Pole, but they are subject to unique constraints. The biggest
constraint is the lack of 24x7 Internet. While I was there, I <em>knew</em> we would lose Internet access at a
certain time!</p>
<p>It’s a frustrating reality: with most apps that do their own downloads, we were powerless to do
anything about this known break in connectivity. We just had to sit there and watch it fail, and
often watch all our progress be lost.</p>
<p>Let’s say I had a 4-hour window, every day, during which I could do (very slow!!) downloads. If the total
amount of data I could download in those 4 hours was <strong>less</strong> than the total size of the payload
I was downloading, then there was <em>no way</em> I could complete the download in one shot!
I’d <em>have</em> to split it over multiple Internet windows. Often the app wouldn’t let me do so.</p>
<p>And that’s not even considering the fact that access might be unreliable during that time! What if the
underlying connection dropped and I had to resume the download? What if my plans changed and I needed
to pause? I didn’t want to waste whatever precious little progress I’d made so far.</p>
<p>A lot of modern apps include their own homegrown, artisanal, in-app downloaders for large payloads.
By “in-app downloader”, I’m referring to the system that obtains the content for automatic updates,
patches, content database updates, etc. The common theme here is
that the app transparently downloads content for you, without you being exposed to the underlying
details such as the URL or raw file. This includes UI patterns such as <em>Check for updates</em>,
<em>Click here to download new version</em>, etc.</p>
<p>
<a href="/media/engineering-for-slow-internet/in-app-downloader-01.png">
<picture>
<source srcset="/media/engineering-for-slow-internet/in-app-downloader-01.png" type="image/png"></source>
<img src="/media/engineering-for-slow-internet/in-app-downloader-01.png" alt="In-App Downloader 01">
</picture>
<em>An in-app download notification for a popular chat application. This app apparently wants to
download 83 MB of data in the background! This is tough at the South Pole. Will the UI be
accommodating of the unique constraints at Pole?</em>
</a>
</p>
<p>Unfortunately, most of these in-app downloaders are woefully ill-equipped for the task! Many of them lack
pause/resume functionality, state notifications, retry logic, and progress tracking. Many of them have
frustrating restrictions, such as time limits for downloading the payload. While most of these issues
are mere annoyances in the land of fast Internet, at the South Pole, they can make or break
the app entirely.</p>
<p>
<a href="/media/engineering-for-slow-internet/in-app-downloader-progress-01.png">
<picture>
<source srcset="/media/engineering-for-slow-internet/in-app-downloader-progress-01.png" type="image/png"></source>
<img src="/media/engineering-for-slow-internet/in-app-downloader-progress-01.png" alt="In-App Downloader Progress 01">
</picture>
<em>Unfortunately, that's a resounding "no". There's no speed indication, no ETA, no pause button,
no cancel button, no URL indication (so we can download manually), and no way to get at the underlying file.</em>
</a>
</p>
<p>It was always frustrating to face down one of these interfaces, because I knew how much time,
and data transfer, was going to be wasted.</p>
<p>
<a href="/media/engineering-for-slow-internet/in-app-downloader-failure-01.png">
<picture>
<source srcset="/media/engineering-for-slow-internet/in-app-downloader-failure-01.png" type="image/png"></source>
<img src="/media/engineering-for-slow-internet/in-app-downloader-failure-01.png" alt="In-App Downloader Failure 01">
</picture>
<em>Darn, it failed! This was expected -- an uninterrupted 83 MB download is tough.
Unfortunately, all progress has been lost, and now it's not even offering a patch on the
retry -- the download size has ballooned to 133 MB, the size of the full installer.</em>
</a>
</p>
<p>Every app that includes an in-app downloader has to compete with an extraordinarily high bar for usability:
<strong>web browsers</strong>.</p>
<p>Think about it! Every modern web browser includes a download manager that contains
Abort, Pause, and Resume functionality. It allows you to retry failed
downloads (assuming the content URL doesn’t include an expiring token).
It clearly shows you current status, download speed, and estimated time remaining. It allows you to choose
where you save the underlying file, so you can copy it around if needed. And – it doesn’t include arbitrary
performance cutoffs! If you really want to download a multi-gigabyte file at 60 kbps, go for it!</p>
<p>
<a href="/media/engineering-for-slow-internet/browser-download-progress-01.png">
<picture>
<source srcset="/media/engineering-for-slow-internet/browser-download-progress-01.png" type="image/png"></source>
<img src="/media/engineering-for-slow-internet/browser-download-progress-01.png" alt="Browser Download Progress 01">
</picture>
<em>A fully-featured download experience, from a major web browser. Downloading an app installer from
the vendor's website. Note the status, speed, estimated
time remaining, full URL, and pause / cancel buttons.</em>
</a>
</p>
<p>
<a href="/media/engineering-for-slow-internet/browser-partial-file-01.png">
<picture>
<source srcset="/media/engineering-for-slow-internet/browser-partial-file-01.png" type="image/png"></source>
<img src="/media/engineering-for-slow-internet/browser-partial-file-01.png" alt="Browser Partial File 01">
</picture>
<em>A partially-downloaded file, for the above-mentioned download.</em>
</a>
</p>
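<p>And there’s no deep magic in what the browser is doing here. Resumable downloads are mostly just HTTP <code>Range</code> requests against a server that supports them. A minimal sketch of the idea in Python (placeholder URL; a real downloader would add backoff, integrity checks, and UI):</p>

```python
# Sketch: a resumable download using HTTP Range requests. Each attempt
# picks up where the last one left off, so a dropout costs one chunk.
import os
import requests

URL = "https://example.com/big-installer.dmg"  # placeholder
DEST = "big-installer.dmg.partial"

def download():
    pos = os.path.getsize(DEST) if os.path.exists(DEST) else 0
    headers = {"Range": f"bytes={pos}-"} if pos else {}
    with requests.get(URL, headers=headers, stream=True,
                      timeout=(30, 120)) as resp:
        resp.raise_for_status()
        # 206 means the server honored the Range header; a 200 means it
        # ignored it, and we must start over from the beginning.
        mode = "ab" if resp.status_code == 206 else "wb"
        with open(DEST, mode) as f:
            for chunk in resp.iter_content(chunk_size=64 * 1024):
                f.write(chunk)

# Call download() in a retry loop with backoff until it completes,
# then rename DEST to its final filename.
```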
<p>Here are a few more examples of where in-app downloaders caused us grief.</p>
<h3 id="example-1---macos-updates">Example 1 - macOS Updates</h3>
<p>It’s no secret that macOS updates are huge. This is sometimes annoying even back home, and it was
much worse at the South Pole.</p>
<p>The patch size for minor OS updates is usually between 0.5 and 1.5 gigabytes. Major OS upgrade patches are
sometimes 6+ gigabytes. Additional tools, such as Xcode, are often multiple gigabytes.</p>
<p>
<a href="/media/engineering-for-slow-internet/macos-update-prompt-01.png">
<picture>
<source srcset="/media/engineering-for-slow-internet/macos-update-prompt-01.png" type="image/png"></source>
<img src="/media/engineering-for-slow-internet/macos-update-prompt-01.png" alt="macOS Update Prompt 01">
</picture>
<em>Sigh, yet another 1 GB patch for my personal macOS device at the South Pole.</em>
</a>
</p>
<p>If every single macOS device at the South Pole downloaded these updates, directly from Apple, we would
have wasted a tremendous amount of bandwidth. And the built-in macOS downloader certainly wanted
us to do this! Look at this interface – few controls, no way to break out and easily get
the underlying patch files. If I canceled the download, or if it failed for some reason, it didn’t always
intelligently resume. Sometimes, I lost all my progress.</p>
<p>
<a href="/media/engineering-for-slow-internet/apple-download-progress-01.png">
<picture>
<source srcset="/media/engineering-for-slow-internet/apple-download-progress-01.png" type="image/png"></source>
<img src="/media/engineering-for-slow-internet/apple-download-progress-01.png" alt="Apple Download Progress 01">
</picture>
<em>The macOS updater. No pause button! Was I expected to leave my laptop on, connected to the Internet,
and untouched, for 15 days??</em>
</a>
</p>
<p>Now – Apple <em>does</em> have a caching server feature built into macOS. In theory, this should alleviate some
of the burden! We should be able to leverage this feature to ensure each patch is only downloaded to
the South Pole one time, and then client Macs will hit the cache.</p>
<p>I experimented with this feature in my spare time, with a handful of my own Apple devices.
In practice, this feature still
required each client MacBook to make a successful HTTPS call directly to Apple, to negotiate cache parameters.
If this call failed, which it often did (<em>because of hardcoded short timeouts!!!</em>), then
the client Mac just fetched the patch from the public Apple servers. No retry, no notification, no
recourse; the client Mac simply made a unilateral decision to bypass the cache, without even telling
the user. In practice, this initial cache negotiation call failed often enough
at the South Pole that the caching feature wasn’t useful.</p>
<p>What we <em>could</em> do was fetch the full installer (12 gigabytes!) from Apple. Links to the full installer
packages are conveniently aggregated on the
<a href="https://mrmacintosh.com/macos-ventura-13-full-installer-database-download-directly-from-apple/">Mr. Macintosh</a>
blog. We could pull the full installer down to the South Pole slowly
and conscientiously: throttled, at low background priority, using robust, interrupt-tolerant
tooling, with support for caching and resumption of paused or failed transfers.
Once we had the file, we could distribute it on station.
This process could take several days, but it was reliable.</p>
<p>
<a href="/media/engineering-for-slow-internet/macos-installer-01.png">
<picture>
<source srcset="/media/engineering-for-slow-internet/macos-installer-01.png" type="image/png"></source>
<img src="/media/engineering-for-slow-internet/macos-installer-01.png" alt="macOS Installer 01">
</picture>
<em>The macOS full installer, painstakingly and conscientiously downloaded to the South Pole.</em>
</a>
</p>
<p>But <em>even this</em> didn’t solve the problem! If the client Mac was Apple Silicon, it <em>still insisted</em> on
downloading additional content directly from Apple, <em>even if</em> you ran the update using the full, 12 GB
installer.
There was no way to bypass or cache this. If the OS update required certain types of
firmware updates or Rosetta updates, then <em>every Apple Silicon client Mac</em> would <em>still</em>
download 1-2 GB of data directly from Apple during the install process.</p>
<p>Even worse, the download process was sometimes farmed out to a separate component in
macOS, which didn’t even report progress to the installer! Installing a macOS update at
the South Pole meant staring at a window that said “installing, 32 minutes remaining”,
for <em>several hours</em>, while a subcomponent of macOS downloaded a gigabyte of un-cacheable
data in the background.</p>
<p>Apple naively assumed that the 1 GB download would be so fast that they didn’t bother
incorporating download speed feedback into the updater’s time estimate.
They did not anticipate people installing
macOS updates from a location where a gigabyte of downloads can take <strong>several hours</strong>, if not <strong>days</strong>.</p>
<p>You can’t cache it, and you can’t download it directly using a web browser (or other mechanism). You have
to let Apple’s own downloader handle it. And, of course, there’s no pause button. It is a major
inconvenience to users, and a major waste of bandwidth, for each individual client Mac to
download 1-2 GB of data in a single, uninterrupted shot.</p>
<p><strong>Ways that Apple could make this significantly better for users with slow or otherwise-weird Internet:</strong></p>
<ol>
<li>Compute the required patch, and then give us a download link, so we can download it outside of
Apple’s downloader.</li>
<li>Improve the built-in update download tool with pause/resume functionality and intelligent state
management, to ensure progress isn’t lost.</li>
<li>Fix the full installer, so it includes <em>everything</em>, including the currently-excluded items
such as firmware and Rosetta updates for Apple Silicon Macs. I
could then download it once and distribute it, without worrying about each Mac <em>still</em> needing to fetch
additional data from Apple.</li>
<li>Improve the Apple Caching Server feature, so it’s more reliable in situations where direct Internet
access is unreliable. Give us more controls so that we can force a Mac to use it, and so that we can force
the caching server to proactively download an item that we know will be needed in the future.</li>
</ol>
<p>As it stands, it was a huge hassle for me to help people with macOS updates at the South Pole.</p>
  565. <h2 id="example-2---samsung-android-phone-os-updates">Example 2 - Samsung Android Phone OS Updates</h2>
<p>My Samsung Android phone receives periodic OS updates. These updates include relevant Android
patches, as well as updates to the Samsung UI and other OS components.</p>
<p>The updater is a particularly bad example of an app that fails to consider slow / intermittent Internet
use cases.</p>
<p>
<a href="/media/engineering-for-slow-internet/phone-download-progress-01.png">
<picture>
<source srcset="/media/engineering-for-slow-internet/phone-download-progress-01.png" type="image/png"></source>
<img src="/media/engineering-for-slow-internet/phone-download-progress-01.png" alt="Phone Download Progress 01">
</picture>
<em>Downloading an OS update for my Samsung Android phone at the South Pole.</em>
</a>
</p>
<p>First, the basics. There is no speed indicator, no numeric progress indicator (good luck counting pixels on
the moving bar), no pause button, no cancel button, no indicator of the file size, and no way to get at
the underlying file to download separately.</p>
<p>Second – if the download fails, it cannot be resumed. It will restart from the beginning.</p>
<p>In practice, at the South Pole, the phone could not download an entire OS update on a single satellite
pass. So – it inevitably failed as soon as connectivity dropped, and I had to restart it from the
beginning.</p>
<p>The <strong>only way</strong> I was able to get this done was by <strong>turning off the phone entirely</strong>, right before
Internet access dropped, and then turning it back on when Internet access resumed at the next satellite pass.
This tricked the phone into not giving up on the download, because it was totally off during the period
without Internet. It never had a chance to fail.
By doing this, I was able to spread out the download across multiple satellite passes, and I could complete
the download.</p>
<p>This is an absurd workaround! I should not have had to do this.</p>
<p>My US carrier (Verizon) does offer a downloadable application for macOS and Windows which should, in theory,
allow me to flash the OS updates from my computer, instead of relying on the phone to download the patches.
In practice, the Verizon app is even worse: buggy, unreliable, and it also insists on using its own in-app
downloader to fetch the update files (sigh…).</p>
<p>I’m sure there’s a way I could have gotten the images and flashed them manually.
This is not an invitation for a bunch
of Android enthusiasts to email me and explain bootloaders and APKs and ROMs and sideloading and whatever else is
involved here.
That’s not the point. The point is – the mainstream tools that vendors ship are <em>hopelessly deficient</em>
for users on slow Internet, and that’s a bummer.</p>
  603. <h2 id="example-3---small-app-auto-updater">Example 3 - Small App Auto-Updater</h2>
  604. <p>A small desktop app has an in-app downloader for updates. Can you spot the issues?</p>
  605. <p>
  606. <a href="/media/engineering-for-slow-internet/in-app-downloader-02.png">
  607. <picture>
  608. <source srcset="/media/engineering-for-slow-internet/in-app-downloader-02.png" type="image/png"></source>
  609. <img src="/media/engineering-for-slow-internet/in-app-downloader-02.png" alt="In-App Downloader 02">
  610. </picture>
  611. <em>Downloading an in-app update.</em>
  612. </a>
  613. </p>
  614. <p>Let’s count them:</p>
  615. <ol>
  616. <li>No pause button.</li>
  617. <li>No cancel button.</li>
  618. <li>No progress indicator of any kind.</li>
  619. <li>No speed / time remaining indicator.</li>
  620. <li>No way to get at the underlying URL, so I can use my own downloader.</li>
  621. <li>No progress tracking and no graceful resumption of an interrupted download.</li>
  622. </ol>
  623. <p>This is actually one of my favorite desktop apps! It’s a shame to call them out like this.
  624. A quick, easy way to make this MUCH better for users at the South Pole would be to provide a manual download
  625. link. Then, the developers wouldn’t need to reimplement all the nice download features that my
  626. browser provides. I could just use my browser.</p>
  627. <h2 id="example-4---yet-another-app-auto-updater">Example 4 - Yet Another App Auto-Updater</h2>
  628. <p>Here’s another one!</p>
  629. <p>
  630. <a href="/media/engineering-for-slow-internet/in-app-downloader-03.png">
  631. <picture>
  632. <source srcset="/media/engineering-for-slow-internet/in-app-downloader-03.png" type="image/png"></source>
  633. <img src="/media/engineering-for-slow-internet/in-app-downloader-03.png" alt="In-App Downloader 03">
  634. </picture>
  635. <em>Downloading another in-app update.</em>
  636. </a>
  637. </p>
  638. <p>Let’s count the issues:</p>
  639. <ol>
  640. <li>No pause button.</li>
  641. <li>No numeric progress / speed indicator.</li>
  642. <li>No way to get at the underlying URL, so I can use my own downloader.</li>
  643. <li>No progress tracking and no graceful resumption of an interrupted download.</li>
  644. </ol>
  645. <p>It does have a few things going for it:</p>
  646. <ol>
  647. <li>Cancel button.</li>
  648. <li>Visual progress indicator.</li>
  649. </ol>
  650. <p>But – overall, still a frustrating user experience for users with slow or intermittent Internet access.</p>
  651. <h2 id="example-5---microsoft-office-for-mac">Example 5 - Microsoft Office for Mac</h2>
  652. <p>Credit where credit is due – Microsoft has a GREAT auto-updater built into Office for Mac! Check it out:</p>
  653. <p>
  654. <a href="/media/engineering-for-slow-internet/microsoft-autoupdate-01.png">
  655. <picture>
  656. <source srcset="/media/engineering-for-slow-internet/microsoft-autoupdate-01.png" type="image/png"></source>
  657. <img src="/media/engineering-for-slow-internet/microsoft-autoupdate-01.png" alt="Microsoft Autoupdate 01">
  658. </picture>
  659. <em>Downloading Office for macOS updates at the South Pole.</em>
  660. </a>
  661. </p>
  662. <p>Look at all these nice features!</p>
  663. <ol>
  664. <li>Pause button!</li>
  665. <li>Cancel buttons!</li>
  666. <li>Progress indicator!</li>
  667. <li>Speed and time remaining indicators!</li>
  668. <li>Graceful resumption of interrupted downloads!</li>
  669. </ol>
  670. <p>The only thing that could have made this better is a link to get at the underlying URL, so I could use
  671. my own downloader. But, given how good this interface is, I didn’t mind using it, even at
  672. the South Pole.</p>
  673. <h1 id="conclusion">Conclusion</h1>
  674. <p>I hope the examples I’ve shown in this post have been a helpful illustration of how minor oversights
  675. or under-developed features back home can become <strong>major issues</strong> in a place with slow Internet.</p>
  676. <p>Again, I’m not asking that every app developer spend a huge amount of time optimizing for edge cases like
  677. the South Pole.</p>
<p>And I’m also definitely not asking for people to work miracles. Internet access at the South Pole, as of
October 2023, was <strong><em>slow</em></strong>. I don’t expect immersive interactive streaming media to work
under the conditions I described here, but it would be nice if apps were resilient enough to
get a few bytes of text up or down. Unfortunately, what
often ended up happening was that apps got stuck in a loop because of an ill-advised hardcoded timeout.</p>
<p>I hope everyone found this helpful, or at least interesting.</p>
<p>And thank you again to everyone who followed along with me on my Antarctic journey! I’ve been off-ice for about
six months now, and going through my old posts here has brought back fond memories.</p>
<p>I hope the current winter-over
crew is doing well, and that everyone is enjoying the <a href="/posts/polar-night">Polar Night</a>. If the egg supply and
consumption rate are the same as they were during Winter 2023, they should soon be
finishing up <a href="/posts/the-last-egg">The Last Egg</a>.</p>
<p>I won’t promise any more content, but I do have a handful of other half-finished posts sitting in my
drafts. We’ll see!</p>