  1. title: Interfaces - The Most Important Software Engineering Concept
  2. url: http://blog.robertelder.org/interfaces-most-important-software-engineering-concept/
  3. hash_url: 0f39895846c57b132dc47e2ab3a34207
  4. <h1>Synopsis</h1>
  5. <p>     An interface can be thought of as a contract between the system and the environment.  In a computer program, the 'system' is the function or module in question, and the 'environment' is the rest of the project.  The interface formally describes what can pass between the system and the environment.  An 'implementation' can be defined as the system minus the interface.  Interfaces in languages like Haskell can be extremely specific, or very non-specific like in Python.  The types of interfaces used can affect the amount of technical debt that is created (a mathematical formula is provided), and programmer productivity.  A method for quantifying and comparing interfaces is proposed.  Based on these comparisons, you can make a number of observations about the way a language or tool is used.</p>
  6. <p>     Read the comment thread on <a href="https://news.ycombinator.com/item?id=11180030">Hacker News</a>.</p>
  7. <h1>Overview</h1>
  8. <p>     The most important concept in software engineering is the concept of an <strong>interface</strong>.  This article is not about interfaces in <em>Java</em>, it is about interfaces in software design, and to a lesser extent, interfaces <em>anywhere in the universe</em>.  There are many other important concepts used in software development, but I would argue that many of them actually end up relating back to why interfaces are so important.  In this article, I will discuss:</p>
  9. <h1 id="what-is-an-interface">What is an Interface?</h1>
  10. <p>     In university we learned of a couple succinct definitions for what an interface is that I really like:</p>
  11. <p class="obvious">An interface is <strong>a contract between the system and the environment.</strong></p>
  12. <p>     or alternatively</p>
  13. <p class="obvious">An interface is <strong>the intersection between the system and the environment.</strong></p>
  14. <p>
  15. <span>Interface = System ∩ Environment</span>
  16. </p>
  17. <img src="https://s3.amazonaws.com/re-software-images/interface-intersection.gif" alt="Interface: Intersection between system and environment"/>
  18. <p>     The intersection definition fits well when the 'system' is actually a physical object.  The above definitions are very abstract, so let's go directly to a specific example of someone typing on a keyboard:</p>
  19. <table>
  20. <tr>
  21. <td>
  22. <img src="https://s3.amazonaws.com/re-software-images/hands.jpg" alt="Hands typing on keyboard."/>
  23. </td>
  24. <td>
  25. <img src="https://s3.amazonaws.com/re-software-images/example-interface.jpg" alt="Hands typing on keyboard overlayed with interface."/>
  26. </td>
  27. </tr>
  28. </table>
<p>     In the above example, the system represents the laptop computer as a whole, the environment is the person's hands (and any nearby cats that like to step on keyboards).  The interface, therefore, must be any part of the interaction between the hands and the computer that is not exclusively attributable to one or the other, but can only be attributed to both.  Normally we think of hands and keyboards as being distinctly separate, so the precise boundary of the interface in this case is up for philosophical debate.  It is up to the reader to decide whether they consider the entire keyboard, or just the individual atoms that come in contact with the fingers or keyboard to be part of the interface.  You might be wondering how this example can relate to the definition of an interface as a contract: The "contract" in this case is the convention that we all spent much effort learning back when we had to program our brains with all the muscle memory to know where all the keys are.  There are also more subtle aspects to the contract, like the fact that pressing a key and holding it down has a different meaning than pressing it quickly and releasing.</p>
  30. <p>     This is a nice bit of philosophy, but what does it have to do with writing software?  Well, interfaces in programming are all around us, even if you're not aware of it.  If you're a Java programmer you explicitly name them for what they are, but they also exist in other languages like C.  Let's consider the interface of the function 'add_numbers' in the following example:</p>
<pre>
<code class="c">
unsigned int add_numbers(unsigned int, unsigned int);

void other_function(void){
    add_numbers(3,4);
}

unsigned int add_numbers(unsigned int a, unsigned int b){
    return a + b;
}

int main(void){
    add_numbers(9,99);
    return 0;
}
</code>
</pre>
  46. <p>     Let's apply the same highlighting technique to describe the environment, the 'add_numbers' system, and the interface:</p>
  47. <p>     In the above illustrations, the 'system' in question consists of the 'add_numbers' function.  You would be correct to say that the main method, or the 'other_function' method could also be considered an individual system, however for simplicity the images above have considered the 'add_numbers' function as a system in isolation.  It would also be reasonable to consider the invocations of the 'add_numbers' function to be part of the interface too.  Note that we've added a fourth idea: an 'Implementation'.  It is difficult to discuss interfaces without making reference to implementations, so let's go ahead and try to formally define what an implementation is:</p>
  48. <p class="obvious">An implementation is <strong>the system minus the interface</strong>:</p>
  49. <p>
  50. <span>Implementation = System ∖ Interface</span>
  51. <br/>
  52. <span>Implementation = System ∖ (System ∩ Environment)</span>
  53. </p>
  54. <img src="https://s3.amazonaws.com/re-software-images/implementation-intersection.gif" alt="Implementation"/>
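<p>     The two set definitions can be tried out directly with Python sets.  This is only a sketch: the string labels below are hypothetical names for pieces of the earlier C example, and deciding which pieces belong to the 'system' or the 'environment' is a modelling assumption, not something the definitions dictate.</p>

```python
# Hypothetical labels: each string names one "piece" of the earlier C example.
system = {
    "add_numbers prototype",  # declared name, parameter types, return type
    "add_numbers body",       # the 'return a + b;' statement
}
environment = {
    "add_numbers prototype",  # callers can see and rely on the prototype
    "main",
    "other_function",
}

# Interface = System ∩ Environment
interface = system & environment

# Implementation = System ∖ Interface
implementation = system - interface

print(interface)
print(implementation)
```

<p>     Under these assumptions the interface contains only the prototype, and the implementation contains only the function body.</p>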
<p>     Please note that I've never actually heard of (or don't remember) anyone else defining an implementation this way, but it seems like such an irresistible extension of the set-based definition of an interface and it has a couple other benefits I'll discuss shortly.  If you're a poor student studying for an exam, your professor will probably have never heard of this definition.  I also wouldn't be surprised if this definition was in conflict with some Object Oriented Programming taxonomy, but even if it does, I don't plan to change it.  Let those crazy OOP people change their textbooks to match my definition.</p>
  56. <p>     Defining implementations this way leads us to other reasonable conclusions: When we talk about interfaces on physical systems, we typically think of the 'system implementation' as the entire physical object, and it would be unnatural to consider the 'real' system implementation to exclude the buttons, screens or any other physical part.  This pushes us to consider the interface to include as little as possible of the physical system, and represent more of a convention.  It is almost as if an interface were just a set of promises, guarantees, or some kind of ... <strong>CONTRACT BETWEEN THE SYSTEM AND THE ENVIRONMENT!</strong></p>
  57. <h1 id="interface-as-contract">An Interface as a Contract</h1>
  58. <p>     Let's consider the interface to the function 'add_numbers' in the previous example as a contract, and see what it guarantees:</p>
  59. <ul>
  60. <li>'add_numbers' is a function that exists.</li>
  61. <li>'add_numbers' takes exactly two parameters, both of which are 'unsigned int's.</li>
  62. <li>'add_numbers' returns exactly one 'unsigned int'.</li>
  63. </ul>
  64. <p>     The interface for this function does not say anything about
  65. <br/>
  66. </p><ul>
  67. <li>Whether 'add_numbers' will ever halt.</li>
  68. <li>The asymptotic run-time complexity of 'add_numbers'.</li>
  69. <li>The quantity of free memory required to run 'add_numbers'.</li>
  70. <li>What the implementation of 'unsigned int' really is.</li>
  71. <li>Side effects (like allocating memory, and modifying global variables)</li>
  72. </ul>
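<p>     The gap between what a declared interface promises and what it leaves unsaid can be sketched in Python.  The example assumes type-annotated signatures as a rough analogue of the C prototype (Python does not enforce them at run time), and 'add_numbers_logged' and 'call_log' are hypothetical names introduced only for illustration.</p>

```python
import inspect

call_log = []  # global state that neither signature mentions


def add_numbers(a: int, b: int) -> int:
    """Add two numbers with no side effects."""
    return a + b


def add_numbers_logged(a: int, b: int) -> int:
    """Same declared interface, but also modifies a global list."""
    call_log.append((a, b))
    return a + b


# Both functions advertise an identical signature...
same = inspect.signature(add_numbers) == inspect.signature(add_numbers_logged)
print(same)

# ...yet only one of them changes global state when called.
add_numbers(3, 4)
add_numbers_logged(3, 4)
print(call_log)
```

<p>     The signatures compare equal, so nothing in the declared interface tells a caller which of the two functions has the side effect.</p>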
  73. <p>     The interface to 'add_numbers' described above is known as a function 'prototype', and in earlier versions of K&amp;R C, there was a weaker form of describing interfaces:</p>
  74. <pre>
  75. <code class="c">
  76. unsigned int add_numbers();
  77. </code>
  78. </pre>
<p>     Defining an interface as a "contract" is very convenient for programming since most programming tasks simply amount to defining and requiring sets of axioms. Post-conditions and pre-conditions are all guarantees about certain properties or behaviour.  Before two parties engage in doing business together, they ought to have a contract prepared.  The contract spells out what the deliverables are, how much money is paid, and when.  Other topics like early termination, indemnification, and expenses are all laid out in advance.  When the contract is breached, a court or an arbitrator can resolve the situation, but if you forget to define something in the contract, then unexpected surprises are more likely.  In a computer program we have the same thing: Modules and functions specify what they want, and (sometimes) what they will return.  A breach of this contract will result in a compile error, a run-time error, program fault, build system or linter failure or even your manager yelling at you.  I would go so far as to say that the concept of defining an interface as a "contract" is not even metaphorical.  It really is the same concept as a business contract, even though a business contract is typically not as detailed.</p>
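<p>     As a rough illustration of pre-conditions and post-conditions, here is one way the 'add_numbers' contract might be written down as run-time checks in Python.  The 32-bit width of 'unsigned int' and the choice to raise 'ValueError' on a breach are assumptions made for this sketch.</p>

```python
UINT_MAX = 2**32 - 1  # assumption: a 32-bit 'unsigned int'


def add_numbers(a: int, b: int) -> int:
    # Pre-conditions: what the function requires from its environment.
    if not (0 <= a <= UINT_MAX and 0 <= b <= UINT_MAX):
        raise ValueError("arguments must fit in an unsigned 32-bit integer")

    # Wrap around on overflow, as unsigned arithmetic does in C.
    result = (a + b) % (UINT_MAX + 1)

    # Post-condition: what the function guarantees in return.
    assert 0 <= result <= UINT_MAX
    return result


print(add_numbers(3, 4))
print(add_numbers(UINT_MAX, 1))
```

<p>     A caller that passes a negative number has breached the contract, and here the breach surfaces as a run-time error rather than a compile error.</p>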
  80. <h1 id="patents-and-copyright">Patents, Copyright and Interfaces</h1>
  81. <p>     This section does not consist of legal advice and may even be in contradiction with existing law, all statements herein are the opinions of the author.</p>
  82. <p>     In the previous section, I stated that I would consider an interface to <em>literally</em> be a 'business contract' between two entities, and I emphasized that I don't consider this to be a metaphor.  I believe this interpretation is one that appeals to the concerns of both computer scientists and also to legal professionals who aim to protect creative works.</p>
  83. <p>     Should an interface be patentable?  Using the definition included in this article, that an interface is a contract between the system and the environment, I do not believe that interfaces should be patentable, and so far the existing case law seems to agree with me.  Keep in mind, however, that the word 'interface' is very generic, and is often used in a way that is different than how I have defined it in this article.  </p>
  84. <p>     Should an interface be copyrightable?  Using the definition included in this article, that an interface is a contract between the system and the environment, I do believe that the "Source Code" of an interface should be copyrightable.  Furthermore, the copyrightable aspects of an interface should extend no further than the point just before they begin to cover the aspects of an interface that make interfaces so special.   The copyright should cover only the medium (source code or handwritten copy), but not the guarantees or constraints.  If any guarantees or constraints of the interface become inseparable from any part of the medium, then those parts of the medium should be disqualified from copyrightability.  I'll propose a simple test that could be applied to determine if something is <strong>not</strong> copyrightable:</p>
<p class="obvious">If you consider a set of attributes of an interface that you'd like to consider copyrightable, given any conceivable third-party piece of software that successfully uses the interface in question in any way, it should always be possible to build some drop-in replacement that declares and implements the same interface and is successfully used by the third-party software without any modification to the third-party software, and without infringing any copyrights.  If every possible drop-in replacement would cause infringement or require that the third-party software be modified or regress in functionality, then the chosen set of copyrightable attributes is too aggressive and must be reduced.</p>
  86. <p>     I believe the above test would be appropriate to test for patentability as well.  Note that this test would only determine if something is <strong>not</strong> copyrightable or patentable.  It would say nothing about conclusively determining whether it <em>is</em> copyrightable or patentable.  Finally, the above test is just my opinion, don't confuse it with actual law.</p>
<p>     An important thing to point out in relation to the above test is that any criteria that could be considered part of the interface in one language may not be part of the interface in another language.  For example, in Java the order in which functions are declared does not affect program execution.  If you were tempted to casually say 'the order of functions in the file never matters', then you would be incorrect if you consider the following Python program:</p>
<pre>
<code class="python">
def foo():
    print("asdf")

def foo(abc):
    print(abc)

foo("lol")
</code>
</pre>
  97. <p>     Giving consideration to the legal aspects of interfaces prompted me to go back and take a look at the famous <a href="http://www.potomaclaw.com/oracle-v-google-copyrightability-apis/">Oracle Vs. Google</a> case.  The provided link includes details of the case that would be interesting to software developers, so that's what I'll draw on for my analysis.  In summary, based on what I see, I can't find a reason to disagree with the outcome in favour of Oracle.  That's not to say that I support it, since the publicly available case details that I can find are fairly sparse.</p>
  98. <p>     I think the concern among most software developers was that the outcome of the case might set a precedent that would allow copyright or patents to cover parts of interfaces that would cause the above test I proposed to fail.</p>
  99. <p>     The outcome of the case rested on the district court's finding that 'the "structure, sequence and organization" of an API was copyrightable.'  As I mentioned above, I don't think there is a problem with this, as long as the definition of "structure, sequence and organization" does not cause the above test to fail.  Here are a few quotes from the linked article that are key:</p>
  100. <p>     "The district court concluded that 'there is only one way to write' the declarations to interface with Java. If true, the use of identical declarations would not be copyrightable. However, except for three of the API packages, Google did not dispute the fact that it could have written its own API packages to access the Java language." and finally "Google conceded that it copied the declarations verbatim."</p>
<p>     It would seem that the district court made the right decision in concluding that the intrinsically unique properties of an interface are not copyrightable, but Google also admitted to copying declarations 'verbatim'.  If 'verbatim' can be taken to include literal copying and pasting, which includes non-functional aspects like white space and spelling mistakes in comments, then I think it would be very reasonable to consider this as copyright infringement.  The non-copyrightability of an interface does not need to prevent an individual artistic expression of an interface from being copyrighted.</p>
<p>     My knowledge of this case only comes from what I can read online, but it would appear to me that Google created verbatim copies of Java source code which happened to include interfaces.  Google itself appears to have been of the opinion that its use of Java required licensing, because prior to 2010 Google pursued licensing deals with Sun to license the use of Java.  After Sun was acquired by Oracle, the licensing negotiations fell through.  The fact that Google was pursuing licensing deals that didn't come to fruition, but continued using 'verbatim' copies of code doesn't seem to help their case.  I suspect that Google's lawyers may have known they had a weak case, and so they attempted to use a defense related to the very legitimate claim that interfaces should not be copyrighted, and hoped that the source code representation of an interface, and the more philosophical concept would become conflated, allowing them to win the case.</p>
  103. <h1 id="what-is-module-or-abstraction">What is a 'Module' or 'Abstraction'?</h1>
  104. <p>     When I think about a 'Module' this is what I picture in my mind:</p>
  105. <img src="https://s3.amazonaws.com/re-software-images/shape-sorter.jpg" alt="Module"/>
  106. <p>     The reason I think this representation applies so well is because it clearly emphasizes the importance of the boundary of the module, and how it interfaces with the rest of its environment.  Furthermore, the interface of the cube above imposes very strong constraints about how the external world can interact with what is inside.  You can't go around the interface, so if you want to interact with it, you must do so through the means it exposes to you.  Finally, there is nothing in the cube, but we actually don't care, because it's not what's on the inside that counts (sorry cube), it's the interface that is exposed to the world.</p>
  107. <p>     Another example I really like is that of a cell and its membrane:  Features on the cell surface like transport and receptor proteins only permit certain things in the extra-cellular matrix to influence what happens inside the cytoplasm according to very specific rules:</p>
  108. <img src="https://s3.amazonaws.com/re-software-images/cell-membrane.gif" alt="Cell Membrane"/>
<p>     In the context of this article, I will refer to 'modules' and 'abstractions' as if they were the same concept.  The dictionary definition of these words is certainly not the same, and even between programming languages these concepts have different meanings.  The key properties I'm interested in are that both of them can be thought of as a 'system' as we've been using the term in this article: Abstractions and modules can be thought of as something consisting of an interface, and an implementation.  You could think of an individual function in C as a module, a 'module' in Python, a class or package in Java.  Anything that has some kind of externally presented interface, and some 'hidden' implementation.  Note that the 'hiddenness' of the implementation can be imposed by the rules of the language, or even just by convention of the programmer.</p>
  110. <h1 id="what-is-abstraction-leak">Abstraction Leaks</h1>
<p>     As far as I can tell, the idea of an abstraction leak can be traced back to <a href="http://www.joelonsoftware.com/articles/LeakyAbstractions.html">an essay by Joel Spolsky</a>.  There are a few good examples of specific abstraction leaks in the essay, but I'd like to add one of my own: The concept of a 'map' is very common in programming, and represents a data structure consisting of key and value pairs.  An important constraint that the map guarantees is that all map keys must be unique: Trying to write a new value to a given key either results in an error, or overwriting the previous value for that key.  The result is never to have duplicate keys.  An extremely common programmer requirement is to want to iterate over all the keys of the map.  Since ordering of keys is not necessarily a guarantee maps provide, you might wonder what order the keys will be in when you iterate over them?  Well, the ordering is not defined, because the map interface does not provide any guarantee of ordering.  Therefore, any ordering is considered acceptable, but in practice the keys are likely to be sorted in some way.  Why would they be sorted?  Well, sorting happens to be an efficient way of organizing the data.  It can make things like checking for pre-existing keys easier.</p>
  112. <p>     Iterating over sorted data can produce very different results than iterating over random data.  For example, if you're trying to find the minimum number in a list:</p>
<pre>
<code class="c">
min = null;
list = map.getMapKeys();
for (item in list){
    if ( min == null ){
        min = item;
    }else if (item &lt; min){
        min = min; /* This line has a bug */
    }
}
</code>
</pre>
<p>     The 'else if' branch will never execute if your data is sorted in ascending order, and even if you do randomised input testing, your program will never uncover the problem with this line.  This is a huge problem, because if you swap the map implementation out for another one that doesn't return sorted keys, then your code is suddenly going to start running the buggy code path.  At this point, you've completely forgotten about this code, and it is hidden inside a huge monolithic project.</p>
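<p>     To see the effect concretely, here is a small Python port of the loop above with the bug deliberately left in.  The two key lists are stand-ins (an assumption of this sketch) for two map implementations: one that happens to return keys in sorted order, and one that returns them in insertion order.</p>

```python
def buggy_min(keys):
    """Python port of the loop above, with the bug left in on purpose."""
    minimum = None
    for item in keys:
        if minimum is None:
            minimum = item
        elif item < minimum:
            minimum = minimum  # the bug: should be 'minimum = item'
    return minimum


data = {5: "e", 2: "b", 9: "i", 1: "a"}

sorted_keys = sorted(data)   # stands in for a map that returns sorted keys
unsorted_keys = list(data)   # stands in for a map that returns insertion order

print(buggy_min(sorted_keys))    # 1, which happens to be correct
print(buggy_min(unsorted_keys))  # 5, which is wrong: the true minimum is 1
```

<p>     The function is identical in both calls.  Only the unspecified ordering of the keys changed, and that alone is enough to turn a hidden bug into a wrong answer.</p>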
  127. <p>     I'm going to propose my own definition of an <strong>abstraction leak</strong> for use later in this article:</p>
  128. <p class="obvious">An abstraction leak exists when it is possible for an implementation to affect the environment in a way that was not agreed upon in the interface.</p>
  129. <p>     Using this definition, it would seem that nearly <em>every</em> abstraction is leaky, because specifying every environmental effect in the interface is only practical in the most rigorous mathematical systems.  For physical systems, you could probably also make a connection to <a href="https://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_theorems">Gödel's incompleteness theorems</a> here.  The idea that most abstractions are leaky is not unfounded since that is essentially what Joel Spolsky implies with his 'The Law of Leaky Abstractions':</p>
  130. <p class="obvious">"All non-trivial abstractions, to some degree, are leaky."</p>
  131. <p>     Well if every abstraction is leaky why bother talking about it?  Problems only occur when a part of the environment begins to rely on one of these unspecified environmental effects that originated from the system in question.  These are the problematic abstraction leaks that everybody talks about.</p>
  132. <p>     This has far-reaching consequences, not only for casual bugs but also in the security domain.  There is one well-known phrase related to the security of physical systems, where unintended effects from the system leak into the environment in a way that compromises its security: A <a href="https://en.wikipedia.org/wiki/Side-channel_attack">Side-channel attack</a>.  Combining this with the claim that all abstractions are leaky would give the following conclusion:</p>
  133. <p class="obvious">Every physical implementation of a cryptosystem is vulnerable to a side-channel attack.</p>
  134. <p>     Given what we've discussed above, it is not unreasonable to extend this idea to include not only physical implementations, but also emulated ones as well.</p>
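<p>     A classic software example is a string comparison that returns as soon as it finds a mismatch.  The sketch below counts comparison steps instead of measuring wall-clock time so that the result is deterministic; the step count is an assumed stand-in for the running time an attacker could observe, and 'leaky_equals' is a hypothetical name.</p>

```python
def leaky_equals(secret: str, guess: str):
    """Compare two strings, returning (equal, characters_compared).

    The declared purpose is only the boolean.  The step count stands in
    for running time, which a caller could measure from the outside.
    """
    steps = 0
    if len(secret) != len(guess):
        return False, steps
    for s, g in zip(secret, guess):
        steps += 1
        if s != g:
            return False, steps  # early exit: work done depends on the secret
    return True, steps


secret = "hunter2"
print(leaky_equals(secret, "xxxxxxx"))  # (False, 1)
print(leaky_equals(secret, "hunxxxx"))  # (False, 4)
```

<p>     Both guesses are rejected, but the second one takes more steps, which reveals that its first three characters were correct.  Python's standard library provides 'hmac.compare_digest' as a comparison designed to reduce this kind of timing leak.</p>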
  135. <h1 id="quantifying-and-comparing">Quantifying and Comparing Interfaces</h1>
  136. <p>     As we saw above, interfaces in C specify things like the return type, and the number of parameters that can be passed into a function.  But what do interfaces in Python specify?  Note that I'm using the term 'interface' in a way that is consistent with this article, which is likely more general than any literature you've read before on 'interfaces' in Python.</p>
<pre>
<code class="python">
def add_numbers(a,b):
    return a + b

print(add_numbers(3,1))
print(add_numbers("abc","def"))
</code>
</pre>
<p>     In Python you don't have to specify the types on the interface to a function.  This has the benefit of making the function easier to define and invoke because there is less information to specify, and the disadvantage of fewer constraints that can be checked ahead of time (to detect possible programming errors).</p>
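<p>     A short sketch of both sides of that trade-off, reusing the same 'add_numbers' function.  The list arguments and the mismatched call are extra cases added here for illustration.</p>

```python
def add_numbers(a, b):
    return a + b


# Any pair of values whose types support '+' passes through the interface.
print(add_numbers(3, 1))          # 4
print(add_numbers("abc", "def"))  # abcdef
print(add_numbers([1, 2], [3]))   # [1, 2, 3]

# A mismatched pair is only rejected when the call actually runs.
try:
    add_numbers(3, "def")
except TypeError as error:
    print("rejected at run time:", error)
```

<p>     Nothing in the function definition rules out the last call ahead of time; the error only appears once that line executes.</p>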
  146. <p>     I think there is something to be said about comparing and quantifying the different characteristics of an interface in terms of how many ways you can send information through them.  This could be done for a specific interface, but also from the perspective of all interfaces that can be specified in a given programming language.  It may also be useful for comparing the safety of specific interfaces within the same language.  For the 'add_numbers' example in C, let's consider how much information we can send both through, and around the interface through abstraction leaks:</p>
  147. <table class="characteristic-compare">
  148. <tr>
  149. <td class="compare-side" valign="top">
  150. <h2 class="compare-header">Information Through C Interface</h2>
  151. <table class="characteristic-description">
  152. <tr>
  153. <th><h3>Characteristic Description</h3></th>
  154. <th><h3>Number of Possible States</h3></th>
  155. </tr>
  156. <tr>
  157. <td>Parameter 1 Type</td>
  158. <td>1 (unsigned int)</td>
  159. </tr>
  160. <tr>
  161. <td>Parameter 2 Type</td>
  162. <td>1 (unsigned int)</td>
  163. </tr>
  164. <tr>
  165. <td>Return Value Type</td>
  166. <td>1 (unsigned int)</td>
  167. </tr>
  168. <tr>
  169. <td>Parameter 1 Value</td>
  170. <td>2^(# bits in 'unsigned int')</td>
  171. </tr>
  172. <tr>
  173. <td>Parameter 2 Value</td>
  174. <td>2^(# bits in 'unsigned int')</td>
  175. </tr>
  176. <tr>
  177. <td>Return Value</td>
  178. <td>2^(# bits in 'unsigned int')</td>
  179. </tr>
  180. </table>
  181. </td>
  182. <td class="compare-side" valign="top">
  183. <h2 class="compare-header">Information Around C Interface</h2>
  184. <table class="characteristic-description">
  185. <tr>
  186. <th><h3>Characteristic Description</h3></th>
  187. <th><h3>Number of Possible States</h3></th>
  188. </tr>
  189. <tr>
  190. <td>Global Variable States</td>
  191. <td>(# global variables) * (# global variable states)</td>
  192. </tr>
  193. <tr>
  194. <td>Filesystem</td>
  195. <td># filesystem states</td>
  196. </tr>
  197. <tr>
  198. <td>CPU Time Taken</td>
  199. <td>Unbounded</td>
  200. </tr>
  201. <tr>
  202. <td>Heap State</td>
  203. <td># heap states</td>
  204. </tr>
  205. <tr>
  206. <td>Many Others...</td>
  207. <td>...</td>
  208. </tr>
  209. </table>
  210. </td>
  211. </tr>
  212. </table>
<p>     And these are the number of things that can be communicated through the Python interface to 'add_numbers':</p>
  214. <table class="characteristic-compare">
  215. <tr>
  216. <td class="compare-side" valign="top">
  217. <h2 class="compare-header">Information Through Python Interface</h2>
  218. <table class="characteristic-description">
  219. <tr>
  220. <th><h3>Characteristic Description</h3></th>
  221. <th><h3>Number of Possible States</h3></th>
  222. </tr>
  223. <tr>
  224. <td>Parameter 1 Type</td>
  225. <td>practically infinite</td>
  226. </tr>
  227. <tr>
  228. <td>Parameter 2 Type</td>
  229. <td>practically infinite</td>
  230. </tr>
  231. <tr>
  232. <td>Return Value Type</td>
  233. <td>practically infinite</td>
  234. </tr>
  235. <tr>
  236. <td>Parameter 1 Value</td>
  237. <td>practically infinite</td>
  238. </tr>
  239. <tr>
  240. <td>Parameter 2 Value</td>
  241. <td>practically infinite</td>
  242. </tr>
  243. <tr>
  244. <td>Return Value</td>
  245. <td>practically infinite</td>
  246. </tr>
  247. </table>
  248. </td>
  249. <td class="compare-side" valign="top">
  250. <h2 class="compare-header">Information Around Python Interface</h2>
  251. <table class="characteristic-description">
  252. <tr>
  253. <th><h3>Characteristic Description</h3></th>
  254. <th><h3>Number of Possible States</h3></th>
  255. </tr>
  256. <tr>
  257. <td>Global Variable States</td>
  258. <td>(# global variables) * (# global variable states)</td>
  259. </tr>
  260. <tr>
  261. <td>Filesystem</td>
  262. <td># filesystem states</td>
  263. </tr>
  264. <tr>
  265. <td>CPU Time Taken</td>
  266. <td>Unbounded</td>
  267. </tr>
  268. <tr>
  269. <td>Heap State</td>
  270. <td># heap states</td>
  271. </tr>
  272. <tr>
  273. <td>Many Others...</td>
  274. <td>...</td>
  275. </tr>
  276. </table>
  277. </td>
  278. </tr>
  279. </table>
  280. <p>     Now if you take a look at the types of interfaces we can describe in Haskell (Thanks to <a href="https://github.com/hudon">James Hudon</a> for reviewing this, since I barely know any Haskell):</p>
<pre>
<code class="python">
add_numbers :: Int -&gt; Int -&gt; Int
add_numbers 3 4 = 7
main = print (add_numbers 3 4)
</code>
</pre>
  288. <p>     With the above Haskell code, the interface 'add_numbers' can accept the following information:</p>
  289. <table class="characteristic-compare">
  290. <tr>
  291. <td class="compare-side" valign="top">
  292. <h2 class="compare-header">Information Through Haskell Interface</h2>
  293. <table class="characteristic-description">
  294. <tr>
  295. <th><h3>Characteristic Description</h3></th>
  296. <th><h3>Number of Possible States</h3></th>
  297. </tr>
  298. <tr>
  299. <td>Parameter 1 Type</td>
  300. <td>1 (Int)</td>
  301. </tr>
  302. <tr>
  303. <td>Parameter 2 Type</td>
  304. <td>1 (Int)</td>
  305. </tr>
  306. <tr>
  307. <td>Return Value Type</td>
  308. <td>1 (Int)</td>
  309. </tr>
  310. <tr>
  311. <td>Parameter 1 Value</td>
  312. <td>1 (the value 3)</td>
  313. </tr>
  314. <tr>
  315. <td>Parameter 2 Value</td>
  316. <td>1 (the value 4)</td>
  317. </tr>
  318. <tr>
  319. <td>Return Value</td>
  320. <td>at least 2^30[<a href="https://en.wikibooks.org/wiki/Haskell/A_Miscellany_of_Types">1</a>]</td>
  321. </tr>
  322. </table>
  323. </td>
  324. <td class="compare-side" valign="top">
  325. <h2 class="compare-header">Information Around Haskell Interface</h2>
  326. <table class="characteristic-description">
  327. <tr>
  328. <th><h3>Characteristic Description</h3></th>
  329. <th><h3>Number of Possible States</h3></th>
  330. </tr>
  331. <tr>
  332. <td>CPU Time Taken</td>
  333. <td>Unbounded</td>
  334. </tr>
  335. <tr>
  336. <td>CPU/Memory Cache effects</td>
  337. <td>Unbounded</td>
  338. </tr>
  339. <tr>
  340. <td>Possibly Others...</td>
  341. <td>...</td>
  342. </tr>
  343. </table>
  344. </td>
  345. </tr>
  346. </table>
  347. <p>     For a specific interface in a given language, you can quantify a couple different things:</p>
  348. <ul>
  349. <li>The number of unique ways you can communicate information through the interface</li>
  350. <li>The number of unique ways you can communicate information around the interface through abstraction leaks</li>
  351. </ul>
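<p>     As a rough sketch of the first kind of count, the per-characteristic numbers from the "Information Through Haskell Interface" table can be combined into a single figure.  Treating the characteristics as independent, so that their counts multiply, is an assumption made here for illustration, and the names <code>haskell_through</code>, <code>count_states</code> and <code>narrow_to_inputs</code> are hypothetical:</p>

```python
from math import prod

# State counts loosely transcribed from the Haskell table above.
# Assumption: the characteristics are independent, so the counts multiply.
haskell_through = {
    "parameter 1 type": 1,
    "parameter 2 type": 1,
    "return value type": 1,
    "parameter 1 value": 1,
    "parameter 2 value": 1,
    "return value": 2 ** 30,  # the table says "at least" this many
}


def count_states(characteristics):
    """Multiply per-characteristic state counts into one combined total."""
    return prod(characteristics.values())


def narrow_to_inputs(characteristics):
    """Keep only the characteristics the caller controls (the parameters)."""
    return {k: v for k, v in characteristics.items() if k.startswith("parameter")}


total = count_states(haskell_through)
inputs_only = count_states(narrow_to_inputs(haskell_through))
```

<p>     Under that assumption, the caller of 'add_numbers' has exactly one way to send information into the interface, which is what makes it so specific.</p>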
  352. <p>     From the perspective of programming languages you can also make observations about</p>
  353. <ul>
  354. <li>How restrictive the language lets you be about how much or how little information goes through an interface</li>
  355. <li>What tools the language provides you with for preventing communication around the interface.</li>
  356. </ul>
  357. <p>     You can extend the same type of analysis to other interfaces, for example a graphical user interface where you can change directories:</p>
  358. <img src="https://s3.amazonaws.com/re-software-images/folders.jpg" alt="Interface: Intersection between system and environment"/>
  359. <table class="characteristic-compare">
  360. <tr>
  361. <td class="compare-side" valign="top">
  362. <h2 class="compare-header">Information Through GUI</h2>
  363. <table class="characteristic-description">
  364. <tr>
  365. <th><h3>Characteristic Description</h3></th>
  366. <th><h3>Number of Possible States</h3></th>
  367. </tr>
  368. <tr>
  369. <td>Click On Folder 1</td>
  370. <td># of pixels Folder 1 takes on screen * number of clicks</td>
  371. </tr>
  372. <tr>
  373. <td>Click On Folder 2</td>
  374. <td># of pixels Folder 2 takes on screen * number of clicks</td>
  375. </tr>
  376. <tr>
  377. <td>Hover On Folder 1</td>
  378. <td># of pixels Folder 1 takes on screen</td>
  379. </tr>
  380. <tr>
  381. <td>Hover On Folder 2</td>
  382. <td># of pixels Folder 2 takes on screen</td>
  383. </tr>
  384. <tr>
  385. <td>Time between hover/click events</td>
  386. <td>Infinite</td>
  387. </tr>
  388. <tr>
  389. <td>Common Keyboard events</td>
  390. <td># common key combinations</td>
  391. </tr>
  392. <tr>
  393. <td>GUI Screen Area</td>
  394. <td># Pixels used for GUI display</td>
  395. </tr>
  396. </table>
  397. </td>
  398. <td class="compare-side" valign="top">
  399. <h2 class="compare-header">Information Around GUI</h2>
  400. <table class="characteristic-description">
  401. <tr>
  402. <th><h3>Characteristic Description</h3></th>
  403. <th><h3>Number of Possible States</h3></th>
  404. </tr>
  405. <tr>
  406. <td>Hidden UI features</td>
  407. <td>Unlimited</td>
  408. </tr>
  409. <tr>
  410. <td>Non-standard keyboard shortcuts</td>
  411. <td># non-standard key combinations</td>
  412. </tr>
  413. <tr>
  414. <td>Other unexpected UI features</td>
  415. <td>...</td>
  416. </tr>
  417. </table>
  418. </td>
  419. </tr>
  420. </table>
  421. <p>     And if you review the same task of changing directories performed on the command line using 'cd':</p>
  422. <table class="characteristic-compare">
  423. <tr>
  424. <td class="compare-side" valign="top">
  425. <h2 class="compare-header">Information Through Command Line</h2>
  426. <table class="characteristic-description">
  427. <tr>
  428. <th><h3>Characteristic Description</h3></th>
  429. <th><h3>Number of Possible States</h3></th>
  430. </tr>
  431. <tr>
  432. <td>Number of possible directories typed</td>
  433. <td>Unlimited</td>
  434. </tr>
  435. </table>
  436. </td>
  437. <td class="compare-side" valign="top">
  438. <h2 class="compare-header">Information Around Command Line</h2>
  439. <table class="characteristic-description">
  440. <tr>
  441. <th><h3>Characteristic Description</h3></th>
  442. <th><h3>Number of Possible States</h3></th>
  443. </tr>
  444. <tr>
  445. <td>Environment Variables</td>
  446. <td>Unlimited</td>
  447. </tr>
  448. </table>
  449. </td>
  450. </tr>
  451. </table>
  452. <p>     For the information sent through GUIs and the command line, there is actually another piece of data that I didn't include in the above tables:  The amount of noise in the signal.  If you consider how hard it is to exactly repeat a sequence of keyboard strokes (key by key) versus a sequence of mouse movements (pixel by pixel), you'll note that there is always way more error in the data you get from a mouse movement or click versus a keyboard stroke.  GUIs compensate for this by making the semantics they accept more non-specific.  Can you imagine if the clickable area on "OK" and "Cancel" buttons was only 1 pixel wide?  In addition, this analysis can get even more complex when you consider how the error rates change for differently abled individuals.</p>
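<p>     The point about clickable areas can be sketched with a small hit test.  The <code>Button</code> class, its field names and the pixel coordinates below are all hypothetical, chosen only to show how a large target tolerates a noisy click that a 1-pixel target would reject:</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Button:
    """A hypothetical rectangular button; coordinates are in pixels."""
    label: str
    left: int
    top: int
    width: int
    height: int

    def contains(self, x: int, y: int) -> bool:
        """True if the click at (x, y) lands anywhere inside the button."""
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)

    def accepted_clicks(self) -> int:
        """Number of distinct pixel positions that all mean 'press this button'."""
        return self.width * self.height


ok = Button("OK", left=100, top=200, width=80, height=24)
tiny_ok = Button("OK", left=100, top=200, width=1, height=1)

# A click that is a few pixels off target, as a slightly shaky hand would produce.
shaky_click = (103, 205)
```

<p>     Here <code>ok.contains(*shaky_click)</code> is true while <code>tiny_ok.contains(*shaky_click)</code> is false: the larger button maps many different inputs onto the same meaning, which is exactly the non-specificity that absorbs the noise.</p>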
  453. <p>     Now that I've reviewed one possible way of quantifying and comparing interfaces, I'll make a few extrapolations from these examples and my own personal experience:</p>
  454. <ul>
  455. <li>Human beings tend to prefer interfaces that aren't very specific about the information they accept, especially when they are unfamiliar with that interface</li>
  456. <li>Interfaces that aren't very specific about what information they accept are prone to being misused.</li>
  457. <li>Catch-all interfaces that accept large amounts of information are seen as powerful, but are often misused.</li>
  458. <li>Humans tend to communicate information around an interface when communication becomes tedious.</li>
  459. <li>Communicating around an interface through abstraction leaks is very prone to undesirable surprises.</li>
  460. </ul>
  461. <h1 id="leaky-and-specific-interfaces">Leaky and Specific Interfaces</h1>
  462. <p>     I'm going to make a lot of observations based on the analysis in the previous section, so I'll define a couple terms for the sake of clarity:</p>
  463. <p class="obvious">A <strong>Leaky</strong> interface exists when the interface is prone to being ignored during any communication between the system and the environment.</p>
  464. <p class="obvious">An interface is <strong>Specific</strong> if it has a relatively small number of possible inputs and outputs.</p>
  465. <p>     For more details on <strong>Leaky</strong> interfaces, consult the section on <a href="#what-is-abstraction-leak">abstraction leaks</a>.  A good example of what I mean by a <strong>Specific</strong> interface would be piecewise defined functions, defined only for a very small number of inputs.</p>
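<p>     To make the two terms concrete, here is a minimal Python sketch.  The function names, the lookup table and the global <code>offset</code> are hypothetical and exist only for illustration; the first function mirrors the piecewise Haskell 'add_numbers' from earlier, and the second shows information travelling around a function signature through a global variable:</p>

```python
# A 'specific' interface: defined for exactly one pair of inputs.
_ADD_TABLE = {(3, 4): 7}


def add_numbers_specific(a, b):
    """Accept only the inputs listed in _ADD_TABLE; reject everything else."""
    try:
        return _ADD_TABLE[(a, b)]
    except (KeyError, TypeError):
        # TypeError covers unhashable arguments such as lists.
        raise ValueError(f"add_numbers_specific is not defined for {(a, b)!r}") from None


# A 'leaky' interface: the result depends on a global that is not part of
# the function signature, so information travels around the interface.
offset = 0  # hypothetical global, used only for illustration


def add_numbers_leaky(a, b):
    """Looks like plain addition, but silently depends on the global offset."""
    return a + b + offset


before = add_numbers_leaky(3, 4)  # 7 while offset is 0
offset = 10                       # some unrelated code changes the global
after = add_numbers_leaky(3, 4)   # same arguments, different result: 17
```

<p>     The caller of <code>add_numbers_leaky</code> passed the same arguments twice and got two different answers, which is the kind of undesirable surprise that communication around an interface tends to produce.</p>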
  466. <p>     If you can meaningfully quantify how 'leaky', or 'specific' interfaces are, I think it is worth defining a spectrum where interfaces that are very specific and non-leaky are on one end, and non-specific and leaky interfaces are on the other:</p>
  467. <img src="https://s3.amazonaws.com/re-software-images/interface-spectrum.gif" alt="Interface Spectrum"/>
  468. <p>     There are probably reasonable arguments to move any of the items in the spectrum above either more to the right, or the left, but you get the idea.  Note that you could probably split this up into two spectrums:  One for how much the interfaces allow for 'leaky' abstractions, and one for how specific the interfaces are, although in general these two concepts seem to be correlated.  Another correlation that I would propose based on my experience is that 'errors' that come from tools on the "Non User Friendly" end of this spectrum are less frequent, and when they do happen, they are more likely to be caused by failures in <a href="https://en.wikipedia.org/wiki/Software_verification_and_validation">validation</a>.  For the "User Friendly" end of the spectrum, errors are more frequent, and more likely to be <a href="https://en.wikipedia.org/wiki/Software_verification_and_validation">verification</a> errors.</p>
  469. <h1 id="complexity-of-technical-debt">The Asymptotic Complexity of Technical Debt</h1>
  470. <p>     I'm going to start this section with a claim:</p>
  471. <p class="obvious">The majority of technical debt in a project originates from an inappropriate reliance on abstraction leaks, or a reliance on extremely non-specific interface contracts that have difficult to foresee consequences.</p>
  472. <p>     When a project starts there are only one or two modules, and the amount of work you need to do to specify a good interface contract is O(1). If you design a bad interface, the amount of technical debt you will create is O(1) too, so there is not much payoff to taking the time to get the interface contract right.  But as the number of modules N increases linearly, the worst-case number of inter-module communications increases according to O(N^2).  Therefore, if you make bad interface contracts, the worst-case number of invocations of these bad interface contracts will scale according to N^2 (if every module talks to every other module).</p>
  473. <img src="https://s3.amazonaws.com/re-software-images/technical-debt.gif" alt="Technical Debt"/>
  474. <p>     In the above graph, you can see that it is initially less work to skip creating well-defined interfaces.  However, this advantage is quickly overtaken, because inter-module communication problems occur at a rate that is quadratic in the number of modules, whereas the work required to create good interface specifications is linear in the number of modules.  The quadratic cost comes from the handshake problem: in the worst case, every one of N modules communicates with every other module, which gives N(N-1)/2 pairs.  The average project will have communication requirements that scale more slowly than O(N^2), but they will almost certainly grow faster than O(N).  There is also another factor that deceptively shifts the rapid increase off into the future: human memory.  When you first start out, even if you have 20 modules, you can probably keep in your head what all of them do, so vague function names and esoteric conventions are all the contracts that you need.  Once the project gets large enough that you forget these, or you bring in someone else, the quadratic cost dominates.</p>
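<p>     The crossover can be sketched numerically.  The handshake count N(N-1)/2 is standard; the fixed cost of 5 units of work per interface contract is an arbitrary assumption chosen only to make the numbers concrete, and the function names are hypothetical:</p>

```python
def worst_case_links(n_modules: int) -> int:
    """Handshake problem: every module talks to every other module once."""
    return n_modules * (n_modules - 1) // 2


def contract_work(n_modules: int, cost_per_contract: int = 5) -> int:
    """Assumed linear cost: a fixed amount of work per module's contract."""
    return n_modules * cost_per_contract


def break_even(cost_per_contract: int = 5) -> int:
    """Smallest module count where the pairwise links exceed the contract work."""
    n = 1
    while worst_case_links(n) <= contract_work(n, cost_per_contract):
        n += 1
    return n


for n in (2, 10, 20, 100):
    print(n, contract_work(n), worst_case_links(n))
```

<p>     With these made-up units the two curves cross at 12 modules, and at 100 modules the worst case is 4950 pairwise links against 500 units of contract work.  Changing the per-contract cost moves the crossover point but not the shape of the curves.</p>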
  475. <h1 id="why-still-command-line">Why do People Still Use the Command Line?</h1>
  476. <p>     When you ask this question, people generally give a few different answers, none of which are what I consider to be the most important:</p>
  477. <ul>
  478. <li>It is very powerful and flexible.</li>
  479. <li>The command line uses less resources.</li>
  480. <li>Using the command line gives you more understanding about how things work under the hood.</li>
  481. </ul>
  482. <p>     The most important reason why we still use the command line is <strong>AUTOMATION</strong>!  It is difficult to overestimate the productivity gain you get by automating tasks.  If you need to launch a cluster of 100 servers, are you going to log into each server and manually install your software stack by clicking on a bunch of GUIs?  Even if you want to automate the task of clicking on the GUIs, you'd need some kind of file storage that remembers how and where to click on things.  Some sort of file full of flexible... commands.</p>
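<p>     A minimal sketch of what that file full of commands might look like for the 100-server case.  The host names and the install command are hypothetical placeholders, and the script only builds and prints the commands rather than running them:</p>

```python
import shlex

# Hypothetical host names and install command, used only for illustration.
hosts = [f"server-{i:03d}.example.com" for i in range(1, 101)]
install_command = "sudo apt-get install -y nginx"


def build_ssh_command(host, remote_command):
    """Return the argument list for running one command on one host via ssh."""
    return ["ssh", host, remote_command]


commands = [build_ssh_command(host, install_command) for host in hosts]

# Print the plan instead of executing it; each line could be pasted into a shell.
for argv in commands[:3]:
    print(shlex.join(argv))
```

<p>     Every step is plain text, so the same list of commands can be reviewed, stored in version control and replayed, which is hard to do with a sequence of clicks.</p>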
  483. <p>     Another general relationship I'll make with the <a href="#quantifying-and-comparing">section on quantifying and comparing interfaces</a>, is that even though we could automate things through automated clicks and screen grabbers, this type of communication is designed for humans, and thus it exposes a very non-specific interface that does not allow you to be very precise.  The result is that your automatic clicker is very likely to get stuck on a screen because a window moved when it didn't expect it to, or perhaps a colour or font changed.  There are too many variables with a GUI.  With the command line, everything is much more precise, and you communicate everything through a very narrow unforgiving interface, which is why many humans don't like it but other computer programs do.</p>
  484. <p>     There are, of course, situations where the imprecise communication of a GUI is a virtue.  For example, when doing graphic art work you generally don't care about specifying every individual pixel shade and colour, but you do want <em>something</em> to be specified for every pixel.  In this situation the noise from any motions of your hand as you move the cursor actually becomes meaningful informational content for the final product.</p>
  485. <h1 id="choosing-a-language">Choosing the Right Language</h1>
  486. <p>     If you read the section on <a href="#complexity-of-technical-debt">the asymptotic complexity of technical debt</a> you might come away with the impression that you should always start a project in a language with very specific interface contracts, like Haskell or Java.  That is not at all the message I want to convey.  If you're making a decision about what language to use for a particular project, I think this question is likely to help:</p>
  487. <p class="obvious">How likely is it that the requirements for the project will change?</p>
  488. <p>     If you're starting a business, the answer is almost certainly going to be 'very likely', especially if you're building a small product from scratch and you're still establishing product market fit.  If you already know exactly what the requirements are, for example, if you're building a compiler, or something based on an international standard, then you'd probably answer 'not very likely'.</p>
  489. <p>     If you answered 'very likely' to this question, then you would probably want to go with a language that doesn't cause you to waste a lot of time specifying interface contracts, because they will likely work against you when the requirements change.  After all, the goal at this stage isn't to get the perfect implementation of the requirements, it is to get the perfect requirements so you can <em>start</em> the final implementation.  An exception to this would be if your MVP actually consists of a huge system that is likely to have hundreds of modules.  If there are already many people involved in building the software then good interface contracts will be necessary to prevent them from stepping on each other's toes.</p>
  490. <p>     If you answered 'not very likely' to this question, then you should probably start off with a language that has very strong interface contracts.  It will be more work to get started, but it will also be less work to add new features on day number 1523.  The only exception would be if you're writing something small (say a few hundred lines).</p>
  491. <p>     Back in the day there was much discussion about how Twitter started out using Ruby on Rails, and then later encountered a number of scaling issues because of this.  They later switched to using Scala.  Some would probably claim that this represented a failure, and that the right decision was to pick Scala all along.  I don't believe this is true.  The idea of Twitter itself is extremely simple, so with a number of potential competitors their initial primary goal was to gain enough market share to be dominant.  They needed to grow as fast as possible at all costs.  This meant iterating on features as fast as possible to figure out what product people actually wanted them to build.  The scaling issues are not a symptom of failure, but a symptom of success.  The vision for what the 'product' of Twitter actually was had been articulated; all that was left was to build it.  From the perspective of developers, this is a nirvana-like situation that every programmer dreams about, but never experiences:  When your boss says "Re-write this crappy code from scratch in your favourite language in whatever way you want so that it is easier to work on later."  Re-writing something from scratch when you have a weaker reference implementation is not as much work as figuring out what product you actually need to build to start a rocket ship company.  Unfortunately, most companies only see this type of switch as an unnecessary cost meant to scratch a programmer's obsessive itch, and waste a lot of time trying to scale something that just wasn't meant to scale.</p>
  492. <h1 id="why-python-is-popular">Why is Python So Popular?</h1>
  493. <p>     In the section on <a href="#leaky-and-specific-interfaces">leaky and specific interfaces</a>, I discussed how you can classify interfaces according to how prone they are to abstraction leaks, and how specific the interface definitions can be.  I also pointed out that languages that are considered more 'user friendly' and 'productive' are often the ones that are on the highly leaky, non-specific end of this spectrum.</p>
  494. <p>     I claim that the reason Python is so popular is because it is an excellent introductory language due to its extremely lean interface contracts.  This is also the same reason that Python becomes difficult to maintain as the size of a project increases.</p>
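<p>     A short sketch of what a lean interface contract looks like in practice.  The second function, <code>add_numbers_checked</code>, is a hypothetical stricter variant added only for contrast:</p>

```python
def add_numbers(a, b):
    """No declared types: anything that supports '+' is accepted."""
    return a + b


results = [
    add_numbers(3, 4),      # integers are added
    add_numbers("3", "4"),  # strings are concatenated
    add_numbers([3], [4]),  # lists are concatenated
]


def add_numbers_checked(a: int, b: int) -> int:
    """Same operation with the contract enforced at run time."""
    if not (isinstance(a, int) and isinstance(b, int)):
        raise TypeError("add_numbers_checked expects two ints")
    return a + b
```

<p>     The first version is trivial to write and works on the first try for a beginner.  It also accepts calls that were probably mistakes, and in a large project nothing at the call site says which of the three behaviours was intended.</p>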
  495. <p>     Python is also very popular in the scientific community, and for people experimenting with numerical computation.  The very nature of experimentation requires that you constantly iterate on the design of what you're building, and for this reason more specific interfaces would slow down the experimentation process.</p>
  496. <h1 id="why-does-enterprise-use-java">Why is Enterprise Software Usually Java/C++?</h1>
  497. <p>     I claim that the reason is exactly the opposite of the one given in the previous section, 'Why is Python So Popular?'.  In the section on <a href="#leaky-and-specific-interfaces">leaky and specific interfaces</a>, I discussed the tradeoffs related to different types of interfaces.  Interfaces in Java and C++ fall more on the specific end of the spectrum than those found in other languages like Python or Ruby.  C++ and Java can still be leaky, and of course there are even more specific languages like Haskell, but Java and C++ seem to strike a balance between scalability, user friendliness, and iteration time.  These languages also provide the programmer with flexibility in how leaky they would like their interfaces to be as a matter of project convention.  An example would be how you can make variables or functions private, public, or protected, depending on the needs of the project.</p>
  498. <h1 id="how-to-cut-corners">How to Cut Corners Efficiently</h1>
  499. <p>     If there is one thing you should take away from this article, it is this: If you have to cut corners in your project, do it inside the <em>implementation</em>, and wrap a <strong>very good</strong> interface around it.  You might be thinking that if the implementation is bad enough, then the problems in that implementation can leak to other parts of your system, but they shouldn't!  If they do, I would call that bad interface design!  For the sake of clarity, I'll explicitly list out what I mean by 'interfaces' here:</p>
  500. <ul>
  501. <li>Function Prototypes</li>
  502. <li>Java 'Interfaces'</li>
  503. <li>Public Class Methods</li>
  504. <li>Public Member Variables</li>
  505. <li>Header (.h) files in C/C++</li>
  506. <li>RESTful API endpoints</li>
  507. <li>URL Routing</li>
  508. <li>Publicly visible aspects of 'Modules' or 'Packages'</li>
  509. <li>Database Schema (DDL)</li>
  510. <li>Many Others...</li>
  511. </ul>
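<p>     As a minimal sketch of cutting a corner inside the implementation only, consider the hypothetical function below.  The name <code>sort_scores</code> and its contract are made up for illustration; the point is that the documented behaviour is precise while the body is deliberately naive:</p>

```python
def sort_scores(scores):
    """Return a new list with the scores in ascending order.

    Contract: the input list is not modified, and non-integer items
    are rejected with a TypeError.
    """
    if not all(isinstance(s, int) for s in scores):
        raise TypeError("sort_scores expects a list of ints")
    # Corner cut on purpose: a slow O(n^2) selection sort stands in for a
    # hasty implementation. Callers only see the contract above, so the
    # body can be replaced later without touching any call site.
    remaining = list(scores)
    ordered = []
    while remaining:
        smallest = min(remaining)
        remaining.remove(smallest)
        ordered.append(smallest)
    return ordered
```

<p>     Swapping the body for <code>return sorted(scores)</code> later would change nothing the callers can observe, which is the property that keeps the shortcut from turning into technical debt elsewhere.</p>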
  512. <h1 id="conclusion">Conclusion</h1>
  513. <p>     As you can see, the concept of an 'interface' is an incredibly important one with a variety of far-reaching consequences.  There are legal consequences, productivity consequences, and a number of very philosophical connections you can make to other aspects of system design.  Ask any experienced programmer what they think about interfaces and you're likely to get an earful.</p>
  514. <p>     A final thanks to <a href="https://github.com/hudon">James Hudon</a> who provided some feedback and corrections to this article.</p>