A place to cache linked articles (think custom and personal wayback machine)
title: What is good code?
url: http://loup-vaillant.fr/articles/good-code
hash_url: 5e437b4eb16b4f47015a9502938dc431
**Good code is cheap code that meets our needs. The cheaper the better.**

Well, assuming code is just a means to an end. Sometimes we want to enjoy the code for itself. Most of the time however, the overriding concern is the _bottom line_.

I could almost stop right there, but I feel like I should elaborate.

Good (enough) programs
----------------------
There is the code, and there is the program. Code is read by a human. Programs are interpreted by computers (or virtual machines).

Programs have many requirements, most of which are domain specific. Some requirements however are universal:

- **Correctness.** The program must not have too many bugs. That would make it worse than useless.
- **Performance.** The program must not be too slow. It would waste the users' time, make them angry, or even behave incorrectly (in the case of real-time software).
- **Frugality.** The program must not use too many resources. We have plenty of CPU and memory… but they still have limits. Use too many resources and your program will drain the battery of your phone, waste your bandwidth, or plain stop working.

We don't always think of these requirements explicitly. Still, they must be met, and they have a cost. Want more performance, or want to waste fewer resources? Your code will be more complex. Want fewer bugs? You will have to spend more time shaking them out.

Once a program does meet all its requirements, however, it is good enough. There is little point in trying to make it even better. With that goal set, the only thing left to optimise is the cost of code.
Code as a dependency graph
--------------------------

Before we can hope to estimate the cost of code, we must understand a few things about its structure.

Basically, your code is made up of little chunks that can fit in your head (if they didn't, you couldn't have written them in the first place). Most of the time, a chunk is a function, or a method, or a procedure. Each such chunk is a node of your graph, and their dependencies form the edges.

Like in any directed graph, when you have N nodes, you can have up to N² edges. This has important consequences when estimating the cost of code. The number of edges relative to the number of nodes is called the _density_ of the graph. Dense graphs have many edges (close to N²). Sparse graphs have relatively few edges (close to N).
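To make the density figure concrete, here is a minimal sketch (the graph and its node names are made up for illustration, nothing from the article):

```python
def density(nodes, edges):
    """Fraction of the N² possible directed edges actually present."""
    return len(edges) / (len(nodes) ** 2)

# Sparse: each chunk depends on roughly one other chunk (close to N edges).
sparse = density({"a", "b", "c", "d"},
                 {("a", "b"), ("b", "c"), ("c", "d")})

# Dense: (almost) everything depends on everything (close to N² edges).
dense = density({"a", "b", "c", "d"},
                {(x, y) for x in "abcd" for y in "abcd" if x != y})

print(sparse, dense)  # 0.1875 0.75
```

Even on four nodes the dense graph carries four times the edges of the sparse one; the gap only widens as N grows.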
Each chunk of code has two levels of understanding: interface and implementation.

- If you just want to _use_ a piece of code, understanding its interface is enough.
- If you want to _contribute_ to a piece of code, you have to understand its implementation, which requires understanding the interfaces of all its direct dependencies.

A few caveats, however:

- Dependencies aren't always obvious. With purely functional code, the dependency graph and the call graph are one and the same. On the other hand, things like shared state create more edges. _Implicit_ edges. Those implicit dependencies are often easy to miss.
- To understand a properly documented interface, you generally don't have to look at anything else. Of course, some _aren't_ properly documented, forcing you to check the implementation… This little dance can go [pretty far][TC].

[TC]: https://en.wikipedia.org/wiki/Transitive_closure (Transitive closure (Wikipedia))
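The shared-state caveat can be shown with a small hypothetical sketch: the two functions below never call each other, yet one silently depends on the other through a module-level variable, so the call graph misses an edge that the dependency graph has.

```python
# Purely functional style: the only dependency is the explicit call.
def price_with_tax(price, rate):
    return price * (1 + rate)

# Shared-state style: an implicit edge through a global variable.
_discount = 0.0  # hypothetical shared state

def set_discount(d):
    global _discount
    _discount = d

def checkout(price):
    # checkout() never calls set_discount(), yet its result depends
    # on whoever called set_discount() last. Easy to miss.
    return price * (1 - _discount)

set_discount(0.25)
print(checkout(100.0))  # 75.0
```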
The cost of code
----------------

Basically, code costs however much time we spend on it. I personally break it down into three activities:

- **Typing.** We need to write the code.
- **Understanding.** We can't type randomly.
- **Coordination.** We should avoid duplicate or incompatible work.

Some would talk about development, maintenance, testing… But this is not an interesting way to break things down. Development and maintenance have a _lot_ in common. Even testing involves writing and reading code. And all three activities involve typing, understanding, and coordination.
### Typing

It is generally admitted that we spend much more time thinking about our code than typing it down. I still find typing speed interesting, however, because it provides an obvious lower bound. Code _cannot_ cost less than the time taken to type it.

Imagine a 20,000-line program. By current standards, this is not a big program. If you were to print it, it would fit in a medium-sized book: 400 pages, 50 lines per page. Prose would typically have about 12 words per line, but lines of code are shorter. Let's say 6 words per line. That's 120,000 words.

Assuming you type 50 words per minute (a fair speed), typing it all down would take 40 hours. A full work week of mindless typing, and the program is not even "big". I say current standards are insane.
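The arithmetic above can be checked in a few lines, using exactly the figures from the text:

```python
lines = 20_000        # "not big" by current standards
words_per_line = 6    # lines of code are short
typing_speed = 50     # words per minute, a fair speed

words = lines * words_per_line   # 120,000 words
minutes = words / typing_speed   # 2,400 minutes
hours = minutes / 60             # a full work week

print(words, hours)  # 120000 40.0
```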
### Understanding

You can't write code randomly. You need to know what you're doing. More specifically, you need to understand three things:

- New code (that you are about to write).
- Existing code. (_This_ is why code is so expensive.)
- Prerequisites. (Required background knowledge.)
#### New code

The depth of understanding required to write new code is significant. This is going to take longer than 50 words per minute. Those 20,000 lines aren't going to write themselves in a week. Nevertheless, assuming you work on this code piecemeal (it is impossible not to), the time taken to understand new code is still proportional to the length of that code.

Oh, right. _Length._

Intuitively, the time required to understand a piece of code is proportional to its _complexity_, not its length. Length, measured in lines of code, is an incredibly crude proxy. But it works. It is strongly correlated with most complexity measures we have come up with so far, and those don't have more predictive power than length alone. For instance, if you know a program's length, learning its cyclomatic complexity won't tell you anything more about things like time to completion or number of bugs.

Besides a [few exceptions][GOLF], complexity and length are roughly proportional. Of two chunks of code solving the same problem, if one is twice as big, it is probably twice as complex. Roughly. Amazingly, this heuristic works across languages: terser languages make code cheaper. (Again, with some [exceptions][APL].)

[GOLF]: https://en.wikipedia.org/wiki/Code_golf (Code Golf (Wikipedia))
[APL]: https://en.wikipedia.org/wiki/APL_%28programming_language%29 (APL (Wikipedia))
#### Existing code

Any new code you write will use, or be used by, existing code. You need to understand some of the old code before you write anything new. Unless you're just starting your project, but you won't be starting forever.

The ideal case is when you work alone, and everything you have written so far is still fresh in your mind. Understanding it _again_ costs you nothing.

Well, that never happens. You forget about code you have written long ago. You don't work alone, and must understand code _others_ have written. Or maybe you arrived late in the project. Now the density of your dependency graph matters a great deal:

- If the graph is sparse, you'll rarely have to understand more than a few dependencies to contribute. The cost is not nil, but it can stay proportional to the size of the code base.
- If the graph is dense however, you may have to understand many things before you do anything. In the worst case, that cost can rise to the _square_ of the size of the code base. Even if no real graph gets _that_ dense, density can still kill your project.
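A toy model of the two cases above makes the gap vivid (the constant for the sparse case is an arbitrary illustration, not a measured figure):

```python
def sparse_cost(n, deps_per_chunk=3):
    # Sparse graph: each contribution needs only a handful of
    # interfaces understood, so total cost stays proportional to n.
    return n * deps_per_chunk

def dense_cost(n):
    # Dense graph, worst case: every chunk may require
    # understanding every other chunk.
    return n * n

for n in (10, 100, 1000):
    print(n, sparse_cost(n), dense_cost(n))
```

At a thousand chunks, the dense worst case is a million units of understanding against three thousand for the sparse one.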
#### Prerequisites

This one is highly context specific. Common knowledge costs nothing, because everyone knows it (by definition), and some knowledge is required to merely understand the problem.

Some background knowledge, however, relates to _how_ you solve your problem. There are different ways to write a program, and some are more… esoteric than others. Take for instance that little [metacompiler][] I have written in Haskell. Very handy when you need to parse some textual data format. On the other hand, you will need to know about top-down parsing, parsing expression grammars, parser combinators, monads, applicative functors… are you _sure_ you want to learn all that just to parse some text? (By the way, I no longer maintain this tool.)

[metacompiler]: ../projects/metacompilers

Required background knowledge is the reason why lambdas and recursion are often frowned upon in mainstream settings. The typical OOP programmer is not used to this eldritch stuff from [FP hell][FP]. Not yet, at least.

[FP]: ../tutorials/from-imperative-to-functional

I don't have a magic bullet for this problem. If you don't know some useful concept or external tool, you can't use it unless you spend time to learn it. Good luck with the cost-benefit analysis.
### Coordination

_(I have worked in several teams, but never led one. Take this section with a grain of salt.)_

Most of the time, you will work in a team. You have to. Most programmers can't do everything a full system requires: domain-specific algorithms, user interface, network, database, security… Even if you're one of those miraculous full-stack experts, you probably won't have the time to code it all by yourself.

Hence coordination. In the worst case, everyone communicates with everyone else all the time. The cost of that overhead is quadratic with respect to the size of the team. Not a problem with 4 people. Quite a mess with 40. In practice, when teams grow large enough, two things inevitably happen:

- Someone ends up leading the team. Most communications go through that person. This brings the amount of communication back to linear (mostly), but it is all concentrated on one person.
- When the team gets _really_ big, it gets divided into sub-teams, each with their own leader. That means less overhead for the main leader, without increasing the communication overhead too much.

(If neither happens, communication overhead explodes and little gets done.)
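The quadratic overhead is just the number of pairwise communication channels, easy to compute for the team sizes mentioned above:

```python
def channels(n):
    # Every pair of people is a potential communication channel:
    # n choose 2, i.e. n * (n - 1) / 2.
    return n * (n - 1) // 2

print(channels(4))   # 6: not a problem
print(channels(40))  # 780: quite a mess
```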
How that relates to code is explained by [Conway's law][CL]:

> organizations which design systems … are constrained to produce
> designs which are copies of the communication structures of these
> organizations

[CL]: https://en.wikipedia.org/w/index.php?title=Conway%27s_law (Conway's law (Wikipedia))

This works both ways. The organisation of your team will shape the code it produces, and the code you ask of your team will shape its organisation. You just can't build a big monolith with separate sub-teams. Either the teams will communicate a lot (effectively merging them together), or they will come up with a more modular design.
Driving down the costs
----------------------

From the above we can deduce four levers to reduce the cost of code:

- **Write less code.** Less code is just plain cheaper. Enough said.
- **Keep it modular.** Make your dependency graph as sparse as you can. It will reduce communication overheads, let your team work in parallel, and reduce the amount of code you need to understand before you write something new.
- **Ponder your external dependencies.** External tools are a tough call. While they can massively reduce the amount of code you need to write, they can have a steep learning curve, or a significant maintenance burden. Make sure your external dependencies are worth your while.
- **Use fewer concepts.** If your code requires less background knowledge, the time saved not learning it can be put to good use. Be careful, though: the right concepts often make your code shorter, and are often widely applicable.

Pretty standard stuff. I just want to stress one thing: these four levers are probably the _only_ ones that matter. Found a new technique, a new language, a new methodology? It has to do one of these:

- reduce the amount of code;
- increase its modularity;
- replace or subsume heavier external dependencies;
- or _maybe_ reduce the amount of required background knowledge.

Otherwise it won't reduce your costs.