How Machine Learning Extends Software Engineering
A bank transfer system. A stock exchange. A physics simulator. These are classic examples of traditional software engineering: deterministic, rule-based instruction sets written by humans. Machine learning extends traditional software engineering by applying statistics to software decision making. With statistics, software can learn from observations of the world, generating the instruction sets autonomously.
Traditional software involves defining the breadth of cases that a computer may encounter in executing a task, and writing instructions (“programs”) to deal with those cases, ahead of time. Many real-world systems can be simplified and isolated to provide the required specification, allowing the engineer to construct a software solution.
Consider a metro line. In a rail network, there is a fixed number of stations and a known number of trains, each with a known location and a predictable travel time between stations. Our software can therefore simulate the outcome of any change to the system. To determine whether putting a train on another track will cause a collision, the software first checks whether the track is free of other trains. To predict the duration of delays to a service if a given track is shut, the software can search through routing options which use only the remaining tracks. The solution to any problem in our automated rail network is reachable by a series of definite steps. The traditional software engineer writes instruction sets that let the rail operators inspect and manipulate the state of the network.
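A toy sketch of those definite steps, to make the point concrete. The track names and data structures here are invented for illustration, not taken from any real signalling system:

```python
# Toy rail network: occupancy is fully known, so safety checks
# reduce to definite, enumerable steps.
occupied = {"track_3": "train_A", "track_7": "train_B"}

def is_move_safe(track: str) -> bool:
    """A train may enter a track only if no other train is on it."""
    return track not in occupied

def routes_avoiding(routes: list, closed_track: str) -> list:
    """Keep only routing options that skip a shut track."""
    return [route for route in routes if closed_track not in route]

routes = [
    ["track_1", "track_3"],
    ["track_2", "track_7"],
    ["track_2", "track_5"],
]

print(is_move_safe("track_5"))             # True: no train on that track
print(routes_avoiding(routes, "track_7"))  # the two routes not using track_7
```

Every question the operator can ask has an answer computable by checks like these, which is exactly what makes the system amenable to traditional engineering.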
The trouble comes when we are in an environment so chaotic that the engineer cannot come up with a series of steps to solve the problem. Consider an entirely different kind of challenge: recognising a face in a crowded train station along the same metro line we envisioned earlier. At the most basic level, we do not understand the problem. It is human intuition which tells us that it is the person in the familiar red cap who is our friend – not the other chap wearing a red shirt, and not the red fire hydrant box. We recognise our friend based on some murky and complicated combination of checks done in the mind, behind closed doors, inaccessible and nebulous. Facial recognition is closed off to traditional software engineering, since we cannot articulate a set of instructions to perform it.
For the sake of example, let’s try to do facial recognition by instruction anyway. Our input data is a colour image, perhaps a standard HD 1280×720 pixel image. That’s 921,600 pixels, each carrying three colour values between 0 and 255. We need three colour values to mix red, green, and blue; together, they can produce any visible colour. Each pixel therefore takes one of 256 × 256 × 256 = 16,777,216 = 2^24 possible values, which means there are (2^24)^921,600 = 2^(24 × 921,600) = 2^22,118,400 possible HD images in total. Given our poor understanding of what uniquely distinguishes one human face from another, writing a program to recognise faces reliably across 2^22,118,400 possible images is a very hard problem. Simple logical operations and for-loops are a no-go.
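The counting argument above takes only a few lines to reproduce; the dimensions are just those of the HD frame in the example:

```python
# How many distinct HD images exist?
width, height = 1280, 720
pixels = width * height              # pixels in one frame
values_per_pixel = 256 ** 3          # three channels, 256 levels each

print(pixels)                        # 921600
print(values_per_pixel == 2 ** 24)   # True: 16,777,216 = 2^24

# One colour choice per pixel: (2^24)^921,600 = 2^(24 * 921,600).
exponent = 24 * pixels
print(exponent)                      # 22118400
```

The number itself, 2^22,118,400, is far too large ever to enumerate, which is the point: no hand-written lookup or case analysis can cover the input space.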
The principal innovation of modern learning machines is that they write the facial recognition instructions for us. Hence, an entirely new class of problems is opened to automation. At first, the engineer creates a random model of facial recognition. It will be wrong much more than it will be right. The engineer designs an algorithm which progressively adjusts the model based on how accurately it spots our friend’s face. By training on images of faces in varying positions, lighting conditions, and orientations, the model learns to distinguish a face. Is it more reliable to identify a single distinctive attribute, or check for many, less distinctive characteristics? We do not need to explicitly define these distinctions, though we do shape the model’s behaviour through the structure we specify. Ultimately, the machine finds the best solution to our problem for us.
The model’s learning mechanism is typically some form of iterative adjustment. Each time the model errs, its parameters are nudged away from the settings that produced the mistake; each time it gets one right, the current settings are reinforced, and the model edges closer to a reliable solution. Learning machines vary in complexity, but their core function remains the same: discover key patterns in data too massive and complex for human engineers to wrangle.
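The adjustment loop can be sketched with a deliberately tiny “model” – a single weight and bias learning a one-dimensional rule. The task, starting values, and learning rate here are invented for illustration, and a real face recogniser has millions of parameters rather than two, but the pattern – nudge the parameters whenever a prediction is wrong – is the same in spirit:

```python
import random

random.seed(0)

# Toy task: learn to predict whether x > 0.5 from labelled examples.
# The "model" is one weight and one bias, adjusted iteratively.
w, b = random.random(), random.random()
examples = [(x / 100, 1 if x / 100 > 0.5 else 0) for x in range(100)]

for _ in range(50):                    # repeated passes over the data
    for x, label in examples:
        prediction = 1 if w * x + b > 0 else 0
        error = label - prediction     # 0 if right; +1 or -1 if wrong
        # Wrong answers push the parameters; right ones leave them be.
        w += 0.1 * error * x
        b += 0.1 * error

correct = sum((1 if w * x + b > 0 else 0) == label
              for x, label in examples)
print(f"{correct}/100 correct after training")
```

The initial random parameters misclassify most inputs; after repeated small corrections the model discovers the boundary on its own, with no human ever writing “check whether x exceeds 0.5”.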
In machine learning, the role of the engineer shifts. Traditionally, software engineers design instructions to solve a problem. In machine learning, the engineer designs the model’s structure and a training program for it, then executes the training program and evaluates the quality of the model’s solution. Alongside these primary concerns of structure design, training design, and evaluation, there are several high-performance engineering considerations, since large models cost millions to train and to run thereafter. There is, first, the engineering of systems to split models across many computers during training; without this “distributed computing”, training large models would be impossibly slow. There is also the optimisation of training and execution programs at the foundational level: models are generally executed on hardware specialised for fast vector mathematics, requiring specialised knowledge of the hardware, of networking, and of mathematics at the machine level.
Machine learning models are prone to inheriting human biases: trained on human data, they can absorb and reproduce human prejudices. Models may also be opaque in their construction, making it difficult to tell which (biased) factors are contributing to their outputs. Engineers must evaluate their machines carefully to avoid the business and human risks posed by bad models.
Finally, it is important to note that machine learning is inappropriate for many software applications. If a task is simple, it is generally much easier, more reliable, and cheaper to use traditional software methods. Learning machines are trained on only a limited set of observations of the environment or system of interest, and are prone to error in unexpected or confusing cases, making them inappropriate for high-risk tasks. Furthermore, they often have idiosyncratic behaviour: poor model design might produce an image recogniser which flips from “apple” to “skyscraper” with the change of a single pixel. These idiosyncrasies make them vulnerable to deliberate manipulation by “adversarial attack”, introducing a security risk. In many applications, it is more appropriate to continue using traditional software.
Machine learning extends the capabilities of computer software, automating extremely complex problems. Instead of designing rigid rule-based systems, engineers now build models that learn patterns on their own. Engineering learning machines involves careful model design, optimising large-scale computation, and analysing these systems for fairness. The role of the engineer evolves: from writing instructions, to designing the mechanism which discovers them. Though machine learning offers new capabilities, it is more expensive and less reliable than traditional software methods. Nonetheless, it makes solvable problems that were previously beyond the reach of computerisation.
