- title: What is good code?
- url: http://loup-vaillant.fr/articles/good-code
- hash_url: 5e437b4eb16b4f47015a9502938dc431
-
- **Good code is cheap code that meets our needs. The cheaper the
- better.**
-
- Well, assuming code is just a means to an end. Sometimes we want to
- enjoy the code for itself. Most of the time however, the overriding
- concern is the _bottom line_.
-
- I could almost stop right there, but I feel like I should elaborate.
-
-
- Good (enough) programs
- ----------------------
-
- There is the code, and there is the program. Code is read by a human.
- Programs are interpreted by computers (or virtual machines).
-
- Programs have many requirements, most of which are domain specific.
- Some requirements however are universal:
-
- **Correctness.** The program must not have too many bugs. Too many
- bugs would make it worse than useless.
-
- - **Performance.** The program must not be too slow. It would waste
- the users' time, make them angry, or even behave incorrectly (in the
- case of real time software).
-
- **Frugality.** The program must not use too many resources. We have
- plenty of CPU and memory… but they still have limits. Use too many
- resources and your program will drain the battery of your phone,
- waste your bandwidth, or plain stop working.
-
- We don't always think of these requirements explicitly. Still, they
- must be met, and they have a cost. Want more performance, or to waste
- fewer resources? Your code will be more complex. Want fewer bugs? You
- will have to spend more time shaking them out.
-
- Once a program does meet all its requirements, however, it is good
- enough. There is little point in trying to make it even better. With
- that goal set, the only thing left to optimise is the cost of code.
-
-
- Code as a dependency graph
- --------------------------
-
- Before we can hope to estimate the cost of code, we must understand a
- few things about its structure.
-
- Basically, your code is made up of little chunks that can fit in your
- head (if they didn't, you couldn't have written them in the first
- place). Most of the time, a chunk is a function (or a method, or a
- procedure). Each such chunk is a node of your graph, and the
- dependencies between chunks form the edges.
-
- Like in any directed graph, when you have N nodes, you can have up to
- N² edges. This has important consequences when estimating the cost of
- code. The number of edges, relative to the number of nodes, is called
- the _density_ of the graph. Dense graphs have many edges (close to
- N²). Sparse graphs have relatively few edges (close to N).
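-
- To make the sparse/dense distinction concrete, here is a tiny sketch
- (the graph and its names are made up for illustration; "density" here
- means edges per node, matching the informal definition above):
-
- ```haskell
- import qualified Data.Map as M
-
- -- A dependency graph as an adjacency list: each chunk maps to the
- -- chunks it directly depends on.
- type Graph = M.Map String [String]
-
- -- Edges per node.
- density :: Graph -> Double
- density g = fromIntegral edges / fromIntegral (M.size g)
-   where edges = sum (map length (M.elems g))
-
- sparse :: Graph
- sparse = M.fromList
-   [ ("main",   ["parse", "render"])
-   , ("parse",  ["lex"])
-   , ("render", [])
-   , ("lex",    []) ]
- -- density sparse == 0.75: under one edge per node.
- -- A dense graph on N nodes approaches N - 1 edges per node.
- ```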
-
- Each chunk of code has two levels of understanding: interface and
- implementation. (A short example follows the list below.)
-
- - If you just want to _use_ a piece of code, understanding its
- interface is enough.
- - If you want to _contribute_ to a piece of code, you have to
- understand its implementation. Which requires understanding the
- interface of all the direct dependencies.
-
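- To make those two levels concrete, here is a minimal, made-up Haskell
- chunk (any language would do; the split is the same everywhere):
-
- ```haskell
- import Data.List (sort)
-
- -- Interface: the name, the type signature, and this comment.
- -- To *use* the chunk, this is all you need to read.
- -- | Median of a non-empty list (upper median for even lengths).
- median :: Ord a => [a] -> a
-
- -- Implementation: the body.  To *contribute* to it, you must also
- -- understand the interfaces of its direct dependencies
- -- (here, Data.List.sort and list indexing).
- median xs = sort xs !! (length xs `div` 2)
- ```
-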
- A few caveats, however:
-
- - Dependencies aren't always obvious. With purely functional code,
- the dependency graph and the call graph are one and the same. On
- the other hand, things like shared state create more edges.
- _Implicit_ edges. Those implicit dependencies are often easy to
- miss.
-
- - To understand a properly documented interface, you generally don't
- have to look at anything else. Of course, some _aren't_ properly
- documented, forcing you to check the implementation… This little
- dance can go [pretty far][TC].
-
- [TC]: https://en.wikipedia.org/wiki/Transitive_closure (Transitive closure (Wikipedia))
-
-
- The cost of code
- ----------------
-
- Basically, code costs however much time we spend on it. I personally
- break it down into 3 activities:
-
- - **Typing.** We need to write the code.
- - **Understanding.** We can't type randomly.
- - **Coordination.** We should avoid duplicate or incompatible work.
-
- Some would talk about development, maintenance, testing… But this is
- not an interesting way to break things down. Development and
- maintenance have a _lot_ in common. Even testing involves writing and
- reading code. And all three activities involve typing, understanding,
- and coordination.
-
-
- ### Typing
-
- It is generally accepted that we spend much more time thinking about
- our code than we do typing it down. Typing time is still interesting,
- however, because it provides an obvious lower bound. Code _cannot_
- cost less than the time taken to type it.
-
- Imagine a 20,000-line program. By current standards, this is not a
- big program. If you were to print it, it would fit in a medium-sized
- book: 400 pages, 50 lines per page. Prose would typically have about
- 12 words per line, but lines of code are shorter. Let's say 6 words
- per line. That's 120,000 words.
-
- Assuming you type 50 words per minute (a fair speed), typing it all
- down would take 40 hours. A full work week of mindless typing, and
- the program is not even "big". I say current standards are insane.
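-
- For what it's worth, that back-of-the-envelope figure is easy to
- check. All the numbers below are the assumptions from the paragraph
- above, not measurements:
-
- ```haskell
- linesOfCode, wordsPerLine, wordsPerMinute :: Double
- linesOfCode    = 20000
- wordsPerLine   = 6
- wordsPerMinute = 50
-
- typingHours :: Double
- typingHours = linesOfCode * wordsPerLine / wordsPerMinute / 60
- -- 120,000 words at 50 words per minute: typingHours == 40.0
- ```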
-
-
- ### Understanding
-
- You can't write code randomly. You need to know what you're doing.
- More specifically, you need to understand three things:
-
- - New code (that you are about to write).
- - Existing code. (_This_ is why code is so expensive.)
- - Prerequisites. (Required background knowledge.)
-
- #### New code
-
- The depth of understanding required to write new code is significant.
- This is going to take longer than 50 words per minute. Those 20,000
- lines aren't going to write themselves in a week. Nevertheless,
- assuming you work on this code piecemeal (it is impossible not to),
- the time taken to understand new code is still proportional to the
- length of that code.
-
- Oh, right. _Length._
-
- Intuitively, the time required to understand a piece of code is
- proportional to its _complexity_, not its length. Length, measured in
- lines of code, is an incredibly crude proxy. But it works. It is
- strongly correlated with most complexity measures we have come up with
- so far, and those don't have more predictive power than length alone.
- For instance, if you know a program's length, learning its cyclomatic
- complexity won't tell you anything more about things like time to
- completion or number of bugs.
-
- Besides a [few exceptions][GOLF], complexity and length are roughly
- proportional. Of two chunks of code solving the same problem, if one
- is twice as big, it is probably twice as complex. Roughly.
- Amazingly, this heuristic works across languages. Terser languages
- make the code cheaper. (Again, we may have some [exceptions][APL]).
-
- [GOLF]: https://en.wikipedia.org/wiki/Code_golf (Code Golf (Wikipedia))
- [APL]: https://en.wikipedia.org/wiki/APL_%28programming_language%29 (APL (Wikipedia))
-
- #### Existing code
-
- Any new code you write will use, or be used by, existing code. You
- need to understand some of the old code before you write anything new.
- Unless you're just starting your project, but you won't be starting forever.
-
- The ideal case is when you work alone, and everything you have written
- so far is still fresh in your mind. Understanding it _again_ costs
- you nothing.
-
- Well, that never happens. You forget about code you have written long
- ago. You don't work alone, and must understand code _others_ have
- written. Or maybe you joined the project late. Now the density
- of your dependency graph matters a great deal:
-
- - If the graph is sparse, you'll rarely have to understand more than a
- few dependencies to contribute. The cost is not nil, but it can
- still be proportional to the size of the code base.
-
- If the graph is dense however, you may have to understand many
- things before you do anything: each of the N chunks may pull in up
- to N - 1 others. In the worst case, that cost can rise up to the
- _square_ of the size of the code base. Even if no graph gets _that_
- dense, density can still kill your project.
-
- #### Prerequisites
-
- This one is highly context specific. Common knowledge costs nothing,
- because everyone knows it (by definition), and some knowledge is
- required to merely understand the problem.
-
- Some background knowledge however relates to _how_ you solve your
- problem. There are different ways to write a program, and some are
- more… esoteric than others. Take for instance that little
- [metacompiler][] I have written in Haskell. Very handy when you need
- to parse some textual data format. On the other hand, you will need
- to know about top-down parsing, parsing expression grammars, parser
- combinators, monads, applicative functors… are you _sure_ you want to
- learn all that just to parse some text? By the way, I no longer
- maintain this tool.
-
- [metacompiler]: ../projects/metacompilers
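-
- (To give a taste of the background knowledge involved, here is a tiny
- parser written with the parsec library. This is not the metacompiler
- itself, just a generic, illustrative sketch:)
-
- ```haskell
- import Text.Parsec
- import Text.Parsec.String (Parser)
-
- -- A parser for comma-separated integers, e.g. "12,34,5".
- -- Even this much assumes you know about parser combinators
- -- (many1, sepBy) and applicative style (<$>).
- ints :: Parser [Int]
- ints = (read <$> many1 digit) `sepBy` char ','
-
- -- parse ints "" "12,34,5"  ==  Right [12,34,5]
- ```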
-
- Required background knowledge is the reason why lambdas and recursion
- are often frowned upon in mainstream settings. The typical OOP
- programmer is not used to this eldritch stuff from [FP hell][FP]. Not
- yet, at least.
-
- [FP]: ../tutorials/from-imperative-to-functional
-
- I don't have a magic bullet for this problem. If you don't know some
- useful concept or external tool, you can't use it unless you spend
- time to learn it. Good luck with the cost-benefit analysis.
-
-
- ### Coordination
-
- _(I have worked in several teams, but never lead one. Take this
- section with a grain of salt.)_
-
- Most of the time, you will work in a team. You have to. Most
- programmers can't do everything a full system requires. Domain
- specific algorithms, user interface, network, database, security… Even
- if you're one of those miraculous full stack experts, you probably
- won't have the time to code it by yourself.
-
- Hence coordination. In the worst case, everyone communicates with
- everyone else all the time. The cost of that overhead is quadratic
- with respect to the size of the team. Not a problem with 4 people.
- Quite a mess with 40. In practice, when teams grow large enough, two
- things inevitably happen:
-
- - Someone ends up leading the team. Most communications go through
- that person. This brings the amount of communication back to linear
- (mostly), but it is all concentrated on one person.
- - When the team gets _real_ big, it gets divided into sub-teams, each
- with their own leader. That means less overhead for the main
- leader, without increasing the communication overhead too much.
-
- (If neither happens, communication overhead explodes and little gets
- done.)
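-
- To put numbers on that quadratic overhead, count one communication
- channel per pair of people (a crude model, but it shows the trend):
-
- ```haskell
- -- Pairwise communication channels in a team of n people.
- channels :: Int -> Int
- channels n = n * (n - 1) `div` 2
-
- -- channels 4  ==   6   -- not a problem
- -- channels 40 == 780   -- quite a mess
- ```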
-
- How that relates to code is explained by [Conway's law][CL]:
-
- [CL]: https://en.wikipedia.org/w/index.php?title=Conway%27s_law (Wikipedia)
-
- > organizations which design systems … are constrained to produce
- > designs which are copies of the communication structures of these
- > organizations
-
- This works both ways. The organisation of your team will shape the
- code it produces, and the code you ask of your team will shape its
- organisation. You just can't build a big monolith with separate
- sub-teams. Either the teams will communicate a lot (effectively
- merging them together), or they will come up with a more modular
- design.
-
-
- Driving down the costs
- ----------------------
-
- From the above we can deduce 4 levers to reduce the cost of code:
-
- - **Write less code.** Less code is just plain cheaper. Enough said.
-
- - **Keep it modular.** Make your dependency graph as sparse as you
- can. It will reduce communication overheads, let your team work in
- parallel, and reduce the amount of code you need to understand
- before you write something new.
-
- - **Ponder your external dependencies.** External tools are a tough
- call. While they can massively reduce the amount of code you need
- to write, they can have a steep learning curve, or a significant
- maintenance burden. Make sure your external dependencies are worth
- your while.
-
- - **Use fewer concepts.** If your code requires less background
- knowledge, the time saved not learning it can be put to good use.
- Be careful, though. The right concepts often make your code
- shorter, and are often widely applicable.
-
- Pretty standard stuff. I just want to stress one thing: those 4
- levers are probably the _only_ ones that matter. Found a new
- technique, a new language, a new methodology? It has to do one of
- those:
-
- - Reduce the amount of code;
- - increase its modularity;
- - replace or subsume heavier external dependencies;
- - or _maybe_ reduce the amount of required background knowledge.
-
- Otherwise it won't reduce your costs.