  1. title: In Pursuit of Production Minimalism
  2. url: https://brandur.org/minimalism
  3. hash_url: 232f1334b14f4f17bac10b5c8c3fb555
  4. <p>While working at Lockheed during the cold war, Kelly
  5. Johnson was reported to have coined <a href="https://en.wikipedia.org/wiki/KISS_principle">KISS</a> (“keep it
  6. simple, stupid”), a principle that suggests glibly that
  7. systems should be designed to be as simple as possible.</p>
  8. <p>While complexity is never a conscious design goal of any
  9. project, it arises inherently as new features are pursued
  10. or new components are introduced. KISS encourages designers
  11. to actively counteract this force by making simplicity an
  12. objective in itself, and thus produce products that are
  13. more maintainable, more reliable, and more flexible. In the
  14. case of jet fighters, that might mean a plane that can be
  15. repaired in the field with few tools and under the
  16. stressful conditions of combat.</p>
  17. <p>During his tenure, Lockheed’s Skunk Works would produce
  18. planes like the U-2 and SR-71, so notable for their
  19. engineering excellence that they’ve left a legacy that we
  20. reflect on even today.</p>
  21. <figure>
  22. <p><a href="/assets/minimalism/sr71@2x.jpg"><img src="/assets/minimalism/sr71.jpg" srcset="/assets/minimalism/sr71@2x.jpg 2x, /assets/minimalism/sr71.jpg 1x" class="overflowing"/></a></p>
  23. <figcaption>The famous SR-71, one of the flagships of Lockheed's Skunk Works. Very fast even if not particularly simple.</figcaption>
  24. </figure>
  25. <p>Many of us pursue work in the engineering field because
  26. we’re intellectually curious. Technology is cool, and new
  27. technology is even better. We want to be using what
  28. everyone’s talking about.</p>
  29. <p>Our news sources, meetups, conferences, and even
  30. conversations bias towards shiny new tech that’s either
  31. under active development or being energetically promoted.
  32. Older components that sit quietly and do their job well
  33. disappear into the background.</p>
  34. <p>Over time, technologies are added, but are rarely removed.
  35. Left unchecked, production stacks that have been around
  36. long enough become sprawling patchworks combining
  37. everything under the sun. This effect is dangerous:</p>
  38. <ul>
  39. <li><p>More parts means more cognitive complexity. If a system
  40. becomes too difficult to understand then the risk of bugs
  41. or operational mishaps increases as developers make
  42. changes without understanding all the intertwined
  43. concerns.</p></li>
  44. <li><p>Nothing operates flawlessly once it hits production.
  45. Every component in the stack is a candidate for failure,
  46. and with sufficient scale, <em>something</em> will be failing all
  47. the time.</p></li>
  48. <li><p>With more technologies, engineers will tend to become
  49. jacks of all trades, but masters of none. If a
  50. particularly nefarious problem comes along, it may be
  51. harder to diagnose and repair because there are few
  52. specialists around who are able to dig deeply.</p></li>
  53. </ul>
  54. <p>Even knowing this, the instinct to expand our tools is hard
  55. to suppress. Oftentimes persuasion is a core competency of
  56. our jobs, and we can use that same power to convince
  57. ourselves and our peers that it’s critical to get new
  58. technologies into our stack <em>right now</em>. That Go-based HA
  59. key/value store will take our uptime and fault resilience
  60. to new highs. That real-time event stream will enable an
  61. immutable ledger that will become a foundational keystone for
  62. the entire platform. That sexy new container orchestration
  63. system will take ease of deployment and scaling to new
  64. levels. In many cases, a step back and a moment of
  65. dispassionate thought would reveal that their use could be
  66. withheld until a time when they’re known to be well vetted,
  67. and it’s well understood how they’ll fit into the current
  68. architecture (and what they’ll replace).</p>
  69. <p>In his book <em>Nine Chains to the Moon</em> (published 1938),
  70. inventor R. Buckminster Fuller described the idea of
  71. <strong><em>ephemeralization</em></strong>:</p>
  72. <blockquote>
  73. <p>Do more and more with less and less until eventually you
  74. can do everything with nothing.</p>
  75. </blockquote>
  76. <p>It suggests increasing productive output by
  77. continually improving the efficiency of a system even while
  78. keeping input the same. I project this onto technology to
  79. mean building a stack that scales to more users and more
  80. activity while the people and infrastructure supporting it
  81. stay fixed. This is accomplished by building systems that
  82. are more robust, more automatic, and less prone to problems
  83. because the tendency to grow in complexity that’s inherent
  84. to them has been understood, harnessed, and reversed.</p>
  85. <p>For a long time we had a very big and very aspirational
  86. goal of ephemeralization at Heroku. The normal app platform
  87. that we all know was referred to as “user space” while the
  88. internal infrastructure that supported it was called
  89. “kernel space”. We wanted to break up the kernel
  90. and move it piece by piece to run inside the user
  91. space that it supported, in effect rebuilding Heroku so
  92. that it itself ran <em>on Heroku</em>. In the ultimate
  93. manifestation of ephemeralization, the kernel would
  94. diminish in size until it vanished completely. The
  95. specialized components that it contained would be retired,
  96. and we’d be left with a single, perfectly uniform stack.</p>
  97. <p>Realistic? Probably not. Useful? Yes. Even falling short of
  98. an incredibly ambitious goal tends to leave you somewhere
  99. good.</p>
  100. <p>Here are a few examples of minimalism and ephemeralization
  101. in practice from Heroku’s history:</p>
  102. <ul>
  103. <li><p>The core database that tracked all apps, users, releases,
  104. configuration, etc. used to be its own special snowflake
  105. hosted on a custom-built AWS instance. It was eventually
  106. folded into Heroku Postgres, and became just one more
  107. node to be managed along with every other customer DB.</p></li>
  108. <li><p>Entire products were retired where possible. For example,
  109. the <code>ssl:ip</code> add-on (providing SSL/TLS termination for an
  110. app), which used to be provisioned and run on its own
  111. dedicated servers, was end-of-lifed completely when a
  112. better (and cheaper) option for terminating SSL was
  113. available through Amazon. With SNI support now
  114. widespread, <code>ssl:endpoint</code> will eventually follow suit.</p></li>
  115. <li><p>All non-ephemeral data was moved out of Redis so that the
  116. only data store handling persistent data for internal
  117. apps was Postgres. This had the added advantage of stacks
  118. being able to tolerate a downed Redis and stay online.</p></li>
  119. <li><p>After a misguided foray into production polyglotism, the
  120. last component written in Scala was retired. Fewer
  121. programming languages in use meant that the entire system
  122. became easier to operate, and by more engineers.</p></li>
  123. <li><p>The component that handled Heroku orgs was originally run
  124. as its own microservice. It eventually became obvious
  125. that there had been a time when our microservice
  126. expansion had been a little overzealous, so to simplify
  127. operation, we folded a few services back into the hub.</p></li>
  128. </ul>
  129. <p>To recognize the effort that went into tearing down or
  130. replacing old technology, we created a ritual called a
  131. <a href="/fragments/burn-parties">burn party</a>, where we
  132. symbolically fed dead components to a flame. The time and
  133. energy spent on some of these projects would in some cases be
  134. as great as, or even greater than, that of shipping a new product.</p>
  135. <figure>
  136. <p><a href="/assets/minimalism/fire@2x.jpg"><img src="/assets/minimalism/fire.jpg" srcset="/assets/minimalism/fire@2x.jpg 2x, /assets/minimalism/fire.jpg 1x" class="overflowing"/></a></p>
  137. <figcaption>At Heroku, we'd hold regular "burn parties" to recognize the effort that went into deprecating old products and technology.</figcaption>
  138. </figure>
  139. <p>Practicing minimalism in production is mostly about
  140. recognizing that the problem exists. After achieving that,
  141. mitigations are straightforward:</p>
  142. <ul>
  143. <li><p><strong><em>Retire old technology.</em></strong> Is something new being
  144. introduced? Look for opportunities to retire older
  145. technology that’s roughly equivalent. If you’re about to
  146. put Kafka in, maybe you can get away with retiring Rabbit
  147. or NSQ.</p></li>
  148. <li><p><strong><em>Build common service conventions.</em></strong> Standardize on
  149. one database, one language/runtime, one job queue, one
  150. web server, one reverse proxy, etc. If not one, then
  151. standardize on <em>as few as possible</em>.</p></li>
  152. <li><p><strong><em>Favor simplicity and reduce moving parts.</em></strong> Try to
  153. keep the total number of things in a system small so that
  154. it stays easy to understand and easy to operate. In some
  155. cases this will be a compromise because a technology
  156. that’s slightly less suited to a job may have to be
  157. re-used even if there’s a new one that would technically
  158. be a better fit.</p></li>
  159. <li><p><strong><em>Don’t use new technology the day, or even the year,
  160. that it’s initially released.</em></strong> Save yourself time and
  161. energy by letting others vet it, find bugs, and do the
  162. work to stabilize it. Avoid it permanently if it doesn’t
  163. pick up a significant community that will help support it
  164. well into the future.</p></li>
  165. <li><p><strong><em>Avoid custom technology.</em></strong> Software that you write is
  166. software that you have to maintain. Forever. Don’t
  167. succumb to NIH (not invented here) when there’s a well-supported public
  168. solution that fits just as well (or even almost as well).</p></li>
  169. <li><p><strong><em>Use services.</em></strong> Software that you install is software
  170. that you have to operate. From the moment it’s activated,
  171. someone will be taking regular time out of their schedule
  172. to perform maintenance, troubleshoot problems, and
  173. install upgrades. Don’t succumb to NHH (not hosted here)
  174. when there’s a public service available that will do the
  175. job better.</p></li>
  176. </ul>
  177. <p>It’s not that new technology should <em>never</em> be introduced,
  178. but it should be done with rational defensiveness, and with
  179. a critical eye toward how it’ll fit into an evolving (and
  180. hopefully ever-improving) architecture.</p>
  181. <p>Antoine de Saint-Exupéry, a French poet and pioneering
  182. aviator, had this to say on the subject:</p>
  183. <blockquote>
  184. <p>It seems that perfection is reached not when there is
  185. nothing left to add, but when there is nothing left to
  186. take away.</p>
  187. </blockquote>
  188. <figure>
  189. <p><a href="/assets/minimalism/sea@2x.jpg"><img src="/assets/minimalism/sea.jpg" srcset="/assets/minimalism/sea@2x.jpg 2x, /assets/minimalism/sea.jpg 1x" class="overflowing"/></a></p>
  190. <figcaption>Nothing left to add. Nothing left to take away.</figcaption>
  191. </figure>
  192. <p>Most of us can benefit from architecture that’s a little
  193. simpler, a little more conservative, and a little more
  194. directed. Only by concertedly building a minimal stack
  195. that’s stable and nearly perfectly operable can we maximize
  196. our ability to push forward with new products and ideas.</p>