title: Software is Increasingly Complex. That Can Be Dangerous.
Marc Andreessen has said software is eating the world. Maybe it’s not eating the world, but every day, software becomes ever more important for the functioning of the world as we know it. The complexity of that software also keeps growing, with new bugs popping up like multi-headed hydras in systems we expect to “just work” all the time.
The Apollo 11 moonshot was done with about 145,000 lines of code and a lot less computing power than your printer. Today’s Microsoft Windows contains some 50 million lines of code. A Boeing 787 runs on 7 million lines of code, but a modern car actually runs on 10 to 100 million lines of code. Google’s infrastructure is estimated to have 2 billion lines of code. It takes an army of programmers to build and maintain these systems, but it is increasingly hard to code and test every permutation of what machines and users might do.
All those millions of lines of code are not written overnight, nor are they rewritten for every new release of a system or product. Systems are layered over time, and complexity and “crust” creep in. Often one of today’s mission-critical systems might layer on the shiny veneer of a new mobile app, but still rely on a codebase that’s been around for 20 years.
While there is nothing inherently wrong with the above, new user interfaces and usage patterns tend to surface problems in code that was never architected for them. The new layers inherently trust the older layers underneath, which perhaps have a new modern API grafted onto existing functionality. But a security flaw or a functional flaw in the layer underneath can cause unforeseen bugs. Apple’s recent admin login bug could be an example of old crust, a testing problem, a back door that inadvertently made it into a distribution build, or all of the above. Whatever the cause, it shows this happens even at top companies with the best reputations for quality control.
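To make the layering problem concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration (the user table, the function names, the blank password); it is not a reconstruction of Apple’s bug or of any real system. It only shows how a new layer that passes a request straight through inherits whatever the old layer quietly allows.

```python
# Hypothetical legacy layer, written years ago for trusted internal callers.
# The "service" account still has the blank password it was set up with.
_LEGACY_USERS = {"alice": "s3cret", "service": ""}


def legacy_check_password(user: str, password: str) -> bool:
    """Old routine: compares against the stored value, blank or not."""
    return _LEGACY_USERS.get(user) == password


def new_api_login(user: str, password: str) -> bool:
    """New layer that trusts the legacy answer as-is."""
    return legacy_check_password(user, password)


def new_api_login_checked(user: str, password: str) -> bool:
    """Same new layer, but it checks its own assumption before delegating."""
    if not password:
        # The new layer refuses blank passwords regardless of the old data.
        return False
    return legacy_check_password(user, password)


print(new_api_login("service", ""))          # True: the old flaw leaks through
print(new_api_login_checked("service", ""))  # False: the new layer catches it
```

The second version is not a cure for old crust, but it illustrates the design choice: each layer validates what it depends on rather than assuming the layer below already did.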
Will software soon become too complex to fix?
Computer researcher Bret Victor, a Caltech graduate and former UX designer at Apple, thinks part of the complexity in today’s software arises because programmers are divorced from the problem they’re working on. Most of today’s code is still based on constructs of letters and symbols. While these are far easier to write and understand than yesterday’s assembly language and FORTRAN (going back to that Apollo timeframe), they still force the programmer to think only in terms of their module’s interfaces and outputs, without necessarily understanding the use case or the system it fits in. And that model, despite the aids provided by today’s sophisticated development environments (IDEs like Microsoft’s Visual Studio or the open source Eclipse), is still largely how code is developed.
In 2012, Victor’s “Inventing on Principle” talk at the Canadian University Software Engineering Conference went viral. He discussed how programmers need to be able to better visualize what they are creating. In complex systems with millions of lines of code, it might be hard to make that immediate connection, as running a full system build is not exactly like rebuilding an iPhone app. But his point is that the model of building software, not just the toolset, needs to change so that programmers can understand in real time what they’re building and how the changes they introduce affect the final product.
Machine learning and AI may well end up being what “eats the world.” Machine learning is replacing the model of coding for every possible input and outcome in a given application. It’s a game changer, because programmers are developing learning algorithms that gain knowledge from experience with vast quantities of data. In linear coding, humans are programming computers for all the situations they imagine need to be handled. In machine learning, the algorithm is training the machine to deal with situations by simply encountering as many as possible. It’s what’s enabling rapid advances in self-driving car technology, as well as deciding what Facebook posts to show you at any given moment.
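The contrast between the two approaches can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how production machine learning works: the overheating task, the 90-degree cutoff, and the six labeled readings are all made up. In the first function a human picks the rule in advance; in the second, the rule is derived from examples.

```python
def rule_based(temp_c: float) -> bool:
    """Linear coding: the programmer decides the cutoff up front."""
    return temp_c > 90.0  # assumed cutoff, chosen by a human


def learn_threshold(examples: list[tuple[float, bool]]) -> float:
    """Learning: pick the cutoff that misclassifies the fewest examples."""
    candidates = sorted(temp for temp, _ in examples)
    best_cut, best_errors = candidates[0], len(examples) + 1
    for cut in candidates:
        # Count examples where "temp > cut" disagrees with the label.
        errors = sum((temp > cut) != label for temp, label in examples)
        if errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut


# Hypothetical labeled readings: (temperature, was it actually overheating?)
training_data = [
    (60.0, False), (70.0, False), (80.0, False),
    (85.0, True), (95.0, True), (100.0, True),
]

learned_cut = learn_threshold(training_data)


def learned(temp_c: float) -> bool:
    """Same kind of rule as rule_based, but the cutoff came from data."""
    return temp_c > learned_cut


print(learned_cut)       # 80.0
print(rule_based(85.0))  # False: the hand-picked rule misses this case
print(learned(85.0))     # True: the data contained a similar case
```

The learned rule is only as good as the examples it saw. With different or skewed training data, the same code would settle on a different cutoff.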
But machine learning introduces yet more complexity into the mix. Neural networks are many layers deep, and the algorithm developers don’t always know exactly how they end up at a specific outcome. In a sense, it can be a black box. Programmers are inserting visualizations into neural network algorithms to better understand how the machine “learns” – it’s not unlike trying to understand the unpredictable thought patterns human brains go through in making a decision.
Sometimes, the results can be surprising. An early version of Google Photos’ image recognition algorithm was tagging some African-American faces as gorillas. Despite the racist implication, the cause was simply an algorithm that needed tuning and perhaps a lot more experience with the nuances of certain images. In a world that leans more on machine learning algorithms than linear coding, programmers will have less absolute control over the machine. They’ll need to be more like coaches, teachers, and trainers – teaching the algorithms, like a child, about the environment they operate in and the proper behaviors in it.
As software takes over the world, we are increasingly dependent on things controlled by code. The world used to automate things with mechanical and electrical solutions, physical things we could actually see much of the time. Going back 30 years or more, it was not unusual for people to diagnose at least some of the simple things that might go wrong with technology. If your car stopped running, you might run through some checks to see if the problem was the alternator, a loose spark plug wire, or something else you could actually see or get to. Some cars today might shut the powertrain down completely based on a sensor detecting a potential problem or a drive-by-wire system failing – but you may have no idea what happened other than the car flashing a warning for you to call your dealer immediately. If your smartphone unexpectedly freezes, and every time you reboot it the same thing happens, do you really know how to fix it? With cloud-based software updates, and the increasingly locked-down nature of devices, it’s harder for a user to figure out what’s wrong with a piece of technology they may be utterly dependent upon for communicating with family, navigating, and remembering where they were supposed to be an hour ago.
Our machines will be increasingly controlled by software, not us. If that’s the case, software quality has to improve. Leslie Lamport, a computer scientist now at Microsoft Research, thinks programmers jump into coding too quickly rather than thoroughly thinking through design and architecture. He also postulates that programmers today need to have a better grasp of the advanced math that underlies system theory and algorithms. Indeed, today’s popular Agile approach to software development may exacerbate the tendency to jump into code. The Agile methodology advocates building something in a short sprint, getting it to a user base to hammer on it and give feedback, fleshing it out, and iterating until you have a finished product the users accept. Market pressures also sometimes push companies to build new features into systems that millions of people might use and become dependent on, without adequate testing or an understanding of the full impact of that functionality on the rest of the infrastructure they ride on.
If we’re going to be so dependent on software, we’ll need to make sure we understand what it’s doing. If that software is a machine-learning algorithm, we’ll need to understand what it’s learning from and how to teach it appropriately. Ultimately, we may need better models for building tomorrow’s systems.