A place to cache linked articles (think custom and personal wayback machine)
title: What is good code?
url: http://loup-vaillant.fr/articles/good-code
hash_url: 5e437b4eb16b4f47015a9502938dc431
**Good code is cheap code that meets our needs. The cheaper the better.**

Well, assuming code is just a means to an end. Sometimes we want to enjoy the code for itself. Most of the time however, the overriding concern is the _bottom line_.

I could almost stop right there, but I feel like I should elaborate.

Good (enough) programs
----------------------
There is the code, and there is the program. Code is read by a human. Programs are interpreted by computers (or virtual machines).

Programs have many requirements, most of which are domain specific. Some requirements however are universal:

- **Correctness.** The program must not have too many bugs. That would make it worse than useless.
- **Performance.** The program must not be too slow. It would waste the users' time, make them angry, or even behave incorrectly (in the case of real-time software).
- **Frugality.** The program must not use too many resources. We have plenty of CPU and memory… but they still have limits. Use too many resources and your program will drain the battery of your phone, waste your bandwidth, or plain stop working.

We don't always think of these requirements explicitly. Still, they must be met, and they have a cost. Want more performance, or want to waste fewer resources? Your code will be more complex. Want fewer bugs? You will have to spend more time shaking them out.

Once a program does meet all its requirements, however, it is good enough. There is little point in trying to make it even better. With that goal set, the only thing left to optimise is the cost of code.
Code as a dependency graph
--------------------------

Before we can hope to estimate the cost of code, we must understand a few things about its structure.

Basically, your code is made up of little chunks that can fit in your head (if they didn't, you couldn't have written them in the first place). Most of the time, a chunk is a function, or a method, or a procedure. Each such chunk is a node of your graph, and their dependencies form the edges.

Like in any directed graph, when you have N nodes, you can have up to N² edges. This has important consequences when estimating the cost of code. The number of edges relative to the number of nodes is called the _density_ of the graph. Dense graphs have many edges (close to N²). Sparse graphs have relatively few edges (close to N).
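To make the density figure concrete, here is a minimal sketch (the graph and its node names are made up for illustration, nothing from the article):

```python
def density(nodes, edges):
    """Fraction of the N² possible directed edges actually present."""
    return len(edges) / (len(nodes) ** 2)

# Sparse: each chunk depends on roughly one other chunk (close to N edges).
sparse = density({"a", "b", "c", "d"},
                 {("a", "b"), ("b", "c"), ("c", "d")})

# Dense: (almost) everything depends on everything (close to N² edges).
dense = density({"a", "b", "c", "d"},
                {(x, y) for x in "abcd" for y in "abcd" if x != y})

print(sparse, dense)  # 0.1875 0.75
```

Even on four nodes the dense graph carries four times the edges of the sparse one; the gap only widens as N grows.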
Each chunk of code has two levels of understanding: interface and implementation.

- If you just want to _use_ a piece of code, understanding its interface is enough.
- If you want to _contribute_ to a piece of code, you have to understand its implementation, which requires understanding the interfaces of all its direct dependencies.

A few caveats, however:

- Dependencies aren't always obvious. With purely functional code, the dependency graph and the call graph are one and the same. On the other hand, things like shared state create more edges. _Implicit_ edges. Those implicit dependencies are often easy to miss.
- To understand a properly documented interface, you generally don't have to look at anything else. Of course, some _aren't_ properly documented, forcing you to check the implementation… This little dance can go [pretty far][TC].

[TC]: https://en.wikipedia.org/wiki/Transitive_closure (Transitive closure (Wikipedia))
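The shared-state caveat can be shown with a small hypothetical sketch: the two functions below never call each other, yet one silently depends on the other through a module-level variable, so the call graph misses an edge that the dependency graph has.

```python
# Purely functional style: the only dependency is the explicit call.
def price_with_tax(price, rate):
    return price * (1 + rate)

# Shared-state style: an implicit edge through a global variable.
_discount = 0.0  # hypothetical shared state

def set_discount(d):
    global _discount
    _discount = d

def checkout(price):
    # checkout() never calls set_discount(), yet its result depends
    # on whoever called set_discount() last. Easy to miss.
    return price * (1 - _discount)

set_discount(0.25)
print(checkout(100.0))  # 75.0
```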
The cost of code
----------------

Basically, code costs however much time we spend on it. I personally break it down into three activities:

- **Typing.** We need to write the code.
- **Understanding.** We can't type randomly.
- **Coordination.** We should avoid duplicate or incompatible work.

Some would talk about development, maintenance, testing… But this is not an interesting way to break things down. Development and maintenance have a _lot_ in common. Even testing involves writing and reading code. And all three activities involve typing, understanding, and coordination.
### Typing

It is generally admitted that we spend much more time thinking about our code than typing it down. I still find typing speed interesting, however, because it provides an obvious lower bound. Code _cannot_ cost less than the time taken to type it.

Imagine a 20,000-line program. By current standards, this is not a big program. If you were to print it, it would fit in a medium-sized book: 400 pages, 50 lines per page. Prose would typically have about 12 words per line, but lines of code are shorter. Let's say 6 words per line. That's 120,000 words.

Assuming you type 50 words per minute (a fair speed), typing it all down would take 40 hours. A full work week of mindless typing, and the program is not even "big". I say current standards are insane.
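The arithmetic above can be checked in a few lines, using exactly the figures from the text:

```python
lines = 20_000        # "not big" by current standards
words_per_line = 6    # lines of code are short
typing_speed = 50     # words per minute, a fair speed

words = lines * words_per_line   # 120,000 words
minutes = words / typing_speed   # 2,400 minutes
hours = minutes / 60             # a full work week

print(words, hours)  # 120000 40.0
```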
### Understanding

You can't write code randomly. You need to know what you're doing. More specifically, you need to understand three things:

- New code (that you are about to write).
- Existing code. (_This_ is why code is so expensive.)
- Prerequisites. (Required background knowledge.)
#### New code

The depth of understanding required to write new code is significant. This is going to take longer than 50 words per minute. Those 20,000 lines aren't going to write themselves in a week. Nevertheless, assuming you work on this code piecemeal (it is impossible not to), the time taken to understand new code is still proportional to the length of that code.

Oh, right. _Length._

Intuitively, the time required to understand a piece of code is proportional to its _complexity_, not its length. Length, measured in lines of code, is an incredibly crude proxy. But it works. It is strongly correlated with most complexity measures we have come up with so far, and those don't have more predictive power than length alone. For instance, if you know a program's length, learning its cyclomatic complexity won't tell you anything more about things like time to completion or number of bugs.

Besides a [few exceptions][GOLF], complexity and length are roughly proportional. Of two chunks of code solving the same problem, if one is twice as big, it is probably twice as complex. Roughly. Amazingly, this heuristic works across languages: terser languages make code cheaper. (Again, with some [exceptions][APL].)

[GOLF]: https://en.wikipedia.org/wiki/Code_golf (Code Golf (Wikipedia))
[APL]: https://en.wikipedia.org/wiki/APL_%28programming_language%29 (APL (Wikipedia))
#### Existing code

Any new code you write will use, or be used by, existing code. You need to understand some of the old code before you write anything new. Unless you're just starting your project, but you won't be starting forever.

The ideal case is when you work alone, and everything you have written so far is still fresh in your mind. Understanding it _again_ costs you nothing.

Well, that never happens. You forget about code you have written long ago. You don't work alone, and must understand code _others_ have written. Or maybe you arrived late in the project. Now the density of your dependency graph matters a great deal:

- If the graph is sparse, you'll rarely have to understand more than a few dependencies to contribute. The cost is not nil, but it can stay proportional to the size of the code base.
- If the graph is dense however, you may have to understand many things before you do anything. In the worst case, that cost can rise to the _square_ of the size of the code base. Even if no real graph gets _that_ dense, density can still kill your project.
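A toy model of the two cases above makes the gap vivid (the constant for the sparse case is an arbitrary illustration, not a measured figure):

```python
def sparse_cost(n, deps_per_chunk=3):
    # Sparse graph: each contribution needs only a handful of
    # interfaces understood, so total cost stays proportional to n.
    return n * deps_per_chunk

def dense_cost(n):
    # Dense graph, worst case: every chunk may require
    # understanding every other chunk.
    return n * n

for n in (10, 100, 1000):
    print(n, sparse_cost(n), dense_cost(n))
```

At a thousand chunks, the dense worst case is a million units of understanding against three thousand for the sparse one.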
#### Prerequisites

This one is highly context specific. Common knowledge costs nothing, because everyone knows it (by definition), and some knowledge is required to merely understand the problem.

Some background knowledge, however, relates to _how_ you solve your problem. There are different ways to write a program, and some are more… esoteric than others. Take for instance that little [metacompiler][] I have written in Haskell. Very handy when you need to parse some textual data format. On the other hand, you will need to know about top-down parsing, parsing expression grammars, parser combinators, monads, applicative functors… are you _sure_ you want to learn all that just to parse some text? (By the way, I no longer maintain this tool.)

[metacompiler]: ../projects/metacompilers

Required background knowledge is the reason why lambdas and recursion are often frowned upon in mainstream settings. The typical OOP programmer is not used to this eldritch stuff from [FP hell][FP]. Not yet, at least.

[FP]: ../tutorials/from-imperative-to-functional

I don't have a magic bullet for this problem. If you don't know some useful concept or external tool, you can't use it unless you spend time to learn it. Good luck with the cost-benefit analysis.
### Coordination

_(I have worked in several teams, but never led one. Take this section with a grain of salt.)_

Most of the time, you will work in a team. You have to. Most programmers can't do everything a full system requires: domain-specific algorithms, user interface, network, database, security… Even if you're one of those miraculous full-stack experts, you probably won't have the time to code it all by yourself.

Hence coordination. In the worst case, everyone communicates with everyone else all the time. The cost of that overhead is quadratic with respect to the size of the team. Not a problem with 4 people. Quite a mess with 40. In practice, when teams grow large enough, two things inevitably happen:

- Someone ends up leading the team. Most communications go through that person. This brings the amount of communication back to linear (mostly), but it is all concentrated on one person.
- When the team gets _really_ big, it gets divided into sub-teams, each with their own leader. That means less overhead for the main leader, without increasing the communication overhead too much.

(If neither happens, communication overhead explodes and little gets done.)
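The quadratic overhead is just the number of pairwise communication channels, easy to compute for the team sizes mentioned above:

```python
def channels(n):
    # Every pair of people is a potential communication channel:
    # n choose 2, i.e. n * (n - 1) / 2.
    return n * (n - 1) // 2

print(channels(4))   # 6: not a problem
print(channels(40))  # 780: quite a mess
```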
How that relates to code is explained by [Conway's law][CL]:

> organizations which design systems … are constrained to produce
> designs which are copies of the communication structures of these
> organizations

[CL]: https://en.wikipedia.org/w/index.php?title=Conway%27s_law (Conway's law (Wikipedia))

This works both ways. The organisation of your team will shape the code it produces, and the code you ask of your team will shape its organisation. You just can't build a big monolith with separate sub-teams. Either the teams will communicate a lot (effectively merging them together), or they will come up with a more modular design.
Driving down the costs
----------------------

From the above we can deduce four levers to reduce the cost of code:

- **Write less code.** Less code is just plain cheaper. Enough said.
- **Keep it modular.** Make your dependency graph as sparse as you can. It will reduce communication overheads, let your team work in parallel, and reduce the amount of code you need to understand before you write something new.
- **Ponder your external dependencies.** External tools are a tough call. While they can massively reduce the amount of code you need to write, they can have a steep learning curve, or a significant maintenance burden. Make sure your external dependencies are worth your while.
- **Use fewer concepts.** If your code requires less background knowledge, the time saved not learning it can be put to good use. Be careful, though: the right concepts often make your code shorter, and are often widely applicable.

Pretty standard stuff. I just want to stress one thing: these four levers are probably the _only_ ones that matter. Found a new technique, a new language, a new methodology? It has to do one of these:

- reduce the amount of code;
- increase its modularity;
- replace or subsume heavier external dependencies;
- or _maybe_ reduce the amount of required background knowledge.

Otherwise it won't reduce your costs.