The Atomic Human, page 7
The trap we have fallen into is to assume that a machine-intelligence solution should copy the human approach to the problem. It triggers a form of introspection that misleads us about the way a machine could complete the task. We make the mistake of assuming that the machine’s solution should be human-like. We like to share information through narratives, but computers are much more comfortable sharing information through statistics. This means the machine has shortcuts which are not available to us. These shortcuts go to the heart of how machine learning works.
Radio operators have habits, and it turns out many of these habits are shared across different radio operators. We can make predictions about what radio operators are likely to do based on knowledge of what they have done in the past. We use past examples to teach the machine. The machine doesn’t need to understand humans to discern patterns in human behaviour. It can just use the statistics about how a human has behaved in the past to predict how they will behave in the future.
Bletchley Park didn’t have the technology to use machines to automate the process of producing cribs, but the codebreakers did systematize the process of crib production, and their system can help us understand the techniques Amazon and Facebook use today in their machine-learning algorithms.
Bletchley Park codebreakers used the historic behaviour of different radio operators in different units to predict what their future behaviour might be. They kept paper files which contained words commonly used by these operators. The foibles of the different operators were stored in the filing system, and when a prediction for a new crib was required codebreakers could look up these previous behaviours to get an idea of suitable cribs. In machine learning we would call this approach a bag-of-words model, because the system stores the individual words the different operators use. Bag-of-words models are a simple form of machine learning; early spam filters for email used them. In Bletchley Park these words were stored in filing cabinets; in machine learning we store the words in the computer.
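The filing-cabinet approach can be sketched in a few lines of code. This is a minimal illustration of a bag-of-words model, not a reconstruction of the actual Bletchley Park files: the operator names and message words below are invented.

```python
from collections import Counter

# Toy corpus: past messages from two hypothetical radio operators.
# Names and phrases are invented for illustration.
past_messages = {
    "operator_a": ["wetter bericht nordsee", "wetter bericht ostsee"],
    "operator_b": ["keine besonderen ereignisse", "keine feindbewegung"],
}

# A bag-of-words model: for each operator, count how often each word
# appears, discarding word order entirely -- just a "bag" of words.
bags = {
    op: Counter(word for msg in msgs for word in msg.split())
    for op, msgs in past_messages.items()
}

def likely_crib(operator, top_n=2):
    """Return the operator's most frequent past words -- candidate cribs."""
    return [word for word, _ in bags[operator].most_common(top_n)]

print(likely_crib("operator_a"))  # ['wetter', 'bericht']
```

The computer plays the role of the filing cabinet: look up an operator, and their most habitual words come back as candidate cribs. Early spam filters worked on the same principle, counting which words appeared most often in spam versus legitimate mail.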
Much of the automatable cognitive labour at Bletchley Park was performed by human operators, whether it was filing or computing, but the overall organization was also structured in a way that made information processing more efficient. Complex problems were decomposed into simpler tasks. Those tasks were completed by humans like Diana or machines like the bombes. Henry Ford mass-produced an affordable car by constructing an assembly line – this involves decomposing the different processes that are needed to produce the car into repeatable activities, each of which is performed at a different place in the line. By decomposing the decryption problem, Bletchley Park created an assembly line for information. Modern artificial intelligence systems work in much the same way. In computer science, this strategy is known as divide and conquer, but the strategy was well known across the Industrial Revolution. Charles Babbage’s work On the Economy of Machinery and Manufactures was published in 1832. So even in the 1940s these ideas were not new, but the big difference between the 1940s and today is the availability of data and the power and flexibility of modern computing machines.
The story of Bletchley Park is the story of decryption on an industrial scale. The machines that were built there, for specific tasks, were much faster than humans. Authority was devolved to machines to complete these jobs, but these were not tasks where there was any room for judgement or interpretation. These tasks were ‘information complete’ – the answer could be deduced given the information available without resorting to judgement. Autonomy was not devolved in the way it was by Eisenhower and Bezos in running the Allied armies and Amazon. Bletchley Park had a very rigid information topography; it was organized into different huts that worked on different German codes. Across the whole process, just as the storage of cribs was systematized, the processing of information was systematized. As in many factories, this production line was operated by a combination of humans and machines: machines were used to automate the highly repetitive tasks, and humans filled in as necessary to set up the machines or pass the information between the different processes. This pattern is common in automation. The machine rigidly performs repetitive tasks and humans’ flexible capabilities are used to bridge the gaps.
The secret messages that were being processed at Bletchley Park were also constructed using machines: the Germans based most of their military codes on a machine called the Enigma. Reports and orders were sent via radio Morse code, one letter at a time. To encrypt the messages, the Enigma mapped each letter from each word in the original message to different letters in a coded message. In cryptography, the original message is called the plaintext and the coded message the ciphertext.
At Bletchley Park we can already see three components to an information assembly line: first develop an insight that gives you a clue that reduces the search space for the puzzle; at Bletchley these were the cribs. They used their contextual knowledge for this. The second step is to translate that insight into terms the machine can understand and compute with. This second job was given to the female operators, and today we would call it ‘programming’. It involved setting up the machine to handle the information in the crib. Finally, the machine was used to exhaustively search through the different settings to solve the puzzle. These three steps remain similar for information-processing systems today. The change since the 1940s is the extent to which we have been able to devolve authority to the machine.
This process of breaking down a task into parts is called decomposition. It’s second nature to computer scientists to take any complex process and decompose it. Once decomposed, each part can be independently automated. The ideas came to computer science from manufacturing. Production lines process objects; computers process information. In both cases, humans are integrated in the processing. This brings us into a relationship with the machine. In a physical production line, humans set up the machines and accommodate any irregular circumstances, such as a manufacturing defect or any repairs for the machine. In an information assembly line we translate the ideas we have into terms the machine can understand. We program the machine according to that understanding. Across history, this has been our relationship with automation. Mechanical production lines require flexible and adaptable humans to service them, and until recently our information-processing pipelines have also required flexible human cognition to interpret and translate the ideas for processing. In the past, the machine could not adapt to us because it had no sense of who we are and how to communicate with us. Until recently, all the intelligences we created were designed for specific goals. In this sense, they are directed intelligences – they solve a particular task that is part of a larger whole. They are given the context of the task: the inputs and the necessary outputs. The performance of the solution they provide can then be measured, for example by how quickly the problem is solved. Automatic decryption gives us the background to how these intelligences came about: in brute-force decryption the goal is clear, but for our own intelligence there is often more ambiguity about what the goal should be. For a brute-force solution such as exhaustively exploring possible combination keys we can definitively say that the machine performs better than the human. 
This was urgent work – the phases of the Battle of the Atlantic shifted with the changes in the codebreakers’ abilities to read the messages. Tons of shipping and thousands of lives depended on the ability of the people and machines at Bletchley Park to read the Enigma.
The Enigma machine hid plaintext messages in the form of a substitution cipher. Each letter from the original message in this cipher is mapped to a different letter in the locked ciphertext. So ‘A’ might be mapped to ‘G’, ‘B’ might be mapped to ‘N’, and so on. The statistical attack that is used to break a substitution cipher is known as frequency analysis. Frequency analysis works by noticing, for example, that ‘E’ is the most common letter in German. If ‘E’ is the most common letter in the plaintext and every ‘E’ is substituted with a different letter, say ‘X’, then ‘X’ will be the most common letter in the ciphertext. This attack is not guaranteed to work because the message might be an unusual one without the letter ‘E’ in it, but it is likely to work, which is why it’s called a statistical attack.
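The attack is simple enough to demonstrate. The sketch below uses a Caesar shift (one particularly simple substitution cipher) and an invented English message; it assumes the most common ciphertext letter stands for ‘E’, and recovers the key from that assumption alone.

```python
from collections import Counter

def encrypt(plaintext, shift=3):
    """A toy substitution cipher: shift every letter a fixed number of
    places (a Caesar cipher)."""
    return "".join(
        chr((ord(c) - ord("A") + shift) % 26 + ord("A")) if c.isalpha() else c
        for c in plaintext.upper()
    )

# An invented message, encrypted with an unknown-to-the-attacker shift.
ciphertext = encrypt("SEND MORE WEATHER REPORTS TO THE EASTERN FRONT")

# Frequency analysis: assume the most common ciphertext letter stands
# for 'E', the most common letter in English, and recover the shift.
letters = [c for c in ciphertext if c.isalpha()]
most_common = Counter(letters).most_common(1)[0][0]
recovered_shift = (ord(most_common) - ord("E")) % 26
print(recovered_shift)  # 3 -- the key, recovered purely from statistics
```

No judgement or understanding of the message is needed: the attack succeeds whenever the message's letter frequencies resemble those of the language, and fails on the rare message where they don't. That is what makes it statistical.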
The Enigma machine had a keyboard with the twenty-six letters of the alphabet printed on the keys, and every time you pressed a key the machine would change the substitution letter. From a mathematical perspective, it was implementing a function. In mathematics, a function maps one set of values to another set of values. Like a machine, a function has inputs and it has outputs. But in a function those inputs and outputs are in the form of symbols or, more usually, numbers.
Sorry for the maths – you don’t have to follow it in detail, but just get the sense that machines can be real-world implementations of mathematics. Sometimes it’s obvious when this is happening, like with a calculator. But sometimes it’s not so obvious, like with the Enigma machine.
What is an example of a simple function? Well, we can think about the simplest machine: one that does nothing. In mathematics that function is known as the identity function. It takes numbers as inputs and gives the same numbers as outputs, meaning 1→1, 2→2. Another relatively simple function is one that turns numbers upside down. This is called the reciprocal function, in which 2→½, 3→⅓, and so on.
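Both functions are simple enough to write down directly. A minimal sketch in Python, using exact fractions so that ½ and ⅓ are represented precisely:

```python
from fractions import Fraction

def identity(x):
    """The simplest function: the output equals the input."""
    return x

def reciprocal(x):
    """Turn a number 'upside down': 2 -> 1/2, 3 -> 1/3."""
    return Fraction(1, x)

print(identity(2))    # 2
print(reciprocal(2))  # 1/2
print(reciprocal(3))  # 1/3
```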
The idea of using machines to implement mathematical functions is at the core of machine learning. Perhaps that now gives you some more insight into the term ‘machine learning’. At one level, the term is quite misleading: machine-learning engineers don’t wear overalls and get covered in oil; they work with mathematics and computation, because it’s not really machines that do the learning, it’s mathematical functions implemented on computers that do the learning.
In the Enigma machine the input values are the original letters of the message, and the output values are the substituted letters of the secret message. The Enigma machine implements the mathematical function through electrical wires going from the keyboard letters to a set of light bulbs.
The Enigma machine provides a function that takes in a message and outputs an encrypted message. Modern machine-learning methods allow us to create machines which have mathematical functions that can ‘see’. By that I mean they can detect faces or other objects in images. To allow machines to detect faces and objects, we have to decide what numbers represent which faces and objects. We call these numbers labels, then we create a mathematical function that takes in images and outputs labels for the objects. Even large language models like ChatGPT have a function that takes words and phrases as input, then gives new words and phrases as the output. If the input is a question or a comment, and the output is a response, then you have a chatbot. We can even do this across languages to automatically translate them – the words or letters in a sentence are converted into a set of numbers, then we learn a mathematical function that converts those numbers from the original language to the right numbers for French, Swahili, German, Mandarin or a range of other languages. A big difference is the complexity of the machine. The mathematical functions we use today would be difficult to write down; they are implemented in modern digital computers. The Enigma machine also translated letters from the plaintext to the ciphertext (and back), but the functions it used to do this were simple enough that they could be packaged in a portable box using a combination of electrical and mechanical technologies already available in the 1920s.
The Enigma machine had a set of rotors. When the rotors were in a fixed position, the machine performed a substitution cipher, but if that’s all it did, it would have made the machine vulnerable to frequency analysis, a statistical attack for breaking the code. To prevent this, the substitution changed with every key press. The rotors would turn every time the operator pressed a key, changing the wiring and substituting the letters differently. By using three rotors together, the Enigma machine could cycle through 17,576 (26 × 26 × 26) different substitution ciphers.
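A toy single-rotor machine shows the principle. The stepping model below is deliberately simplified (a real Enigma had three rotors, a reflector and a plugboard); the wiring string is that of the historical Enigma Rotor I, though any fixed scrambling of the alphabet would illustrate the same point.

```python
import string

ALPHABET = string.ascii_uppercase

# Wiring of the historical Enigma Rotor I: position 0 maps A->E, etc.
WIRING = "EKMFLGDQVZNTOWYHXUSPAIBRCJ"

def encipher(message, start_position=0):
    """Toy single-rotor machine: the rotor steps before every key press,
    so the same plaintext letter maps to different ciphertext letters."""
    out = []
    position = start_position
    for letter in message:
        position = (position + 1) % 26  # the rotor turns on each key press
        i = (ALPHABET.index(letter) + position) % 26
        out.append(ALPHABET[(ALPHABET.index(WIRING[i]) - position) % 26])
    return "".join(out)

# 'AAA' no longer encrypts to three identical letters:
print(encipher("AAA"))  # 'JKC'
```

Because the substitution changes with every key press, counting letter frequencies in the ciphertext no longer reveals the plaintext frequencies, and the simple statistical attack fails.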
To complicate the code further, the three rotors were interchangeable: they could be placed in any order. The machine also came with two spare rotors: there were five possible rotors in total. Different rotors would be used on different days. Setting the machine up involved picking the rotors, selecting their position and connecting the keyboard through a set of plugs. All these settings together are the equivalent of the combination key setting for the mechanical lock. The combination of different settings meant that instead of the 10,000 different key settings of a mechanical combination lock, the three-rotor military Enigma had over 150 billion billion possible keys.
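The arithmetic behind that figure can be checked directly. The sketch below assumes the standard three-rotor military configuration with a ten-cable plugboard:

```python
from math import factorial, perm

# Rotor order: choose and arrange 3 of the 5 available rotors.
rotor_orders = perm(5, 3)  # 5 x 4 x 3 = 60

# Rotor start positions: each of the 3 rotors has 26 positions.
rotor_positions = 26 ** 3  # 17,576

# Plugboard: 10 cables pairing up 20 of the 26 letters; the other
# 6 letters are unpaired, and the 10 pairs are unordered.
plugboard = factorial(26) // (factorial(6) * factorial(10) * 2 ** 10)

total = rotor_orders * rotor_positions * plugboard
print(f"{total:.3e}")  # about 1.6e+20 -- over 150 billion billion keys
```

Almost all of the key space comes from the plugboard; the rotors alone contribute only about a million combinations.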
The cribs gave clues to the codebreakers which reduced the number of keys that needed to be searched, but even with this reduction the number of keys was more than Diana Russell-Clarke and the other human computers could explore on their own. So Alan Turing designed the bombes to do this repetitive work – large machines that could automatically test different settings of the Enigma.
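The bombes' job can be caricatured in code. As a much simpler stand-in for an Enigma attack, the sketch below brute-forces the key of a Caesar shift cipher, keeping only the keys consistent with a crib; the message and crib are invented.

```python
import string

ALPHABET = string.ascii_uppercase

def encrypt(plaintext, key):
    """Caesar shift: a stand-in for the real (far more complex) Enigma."""
    return "".join(ALPHABET[(ALPHABET.index(c) + key) % 26] for c in plaintext)

def decrypt(ciphertext, key):
    return encrypt(ciphertext, -key)

# An intercepted message (invented) encrypted with an unknown key.
ciphertext = encrypt("WETTERBERICHT", 17)

# A crib: we suspect the message contains 'WETTER' ('weather').
crib = "WETTER"

# Brute force: exhaustively test every key and keep those consistent
# with the crib -- the repetitive labour devolved to the bombes.
candidates = [k for k in range(26) if crib in decrypt(ciphertext, k)]
print(candidates)  # [17]
```

With only 26 keys this loop is instant; with the Enigma's 10²⁰ keys, the same exhaustive strategy is only feasible because the crib first cuts the search space down to something machines of the 1940s could sweep through.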
The social network of the 1930s was the telephone. In the 1940s Claude Shannon developed his theory of information in response to the demand for telephone lines and exchanges. When telephone exchanges were first built, human operators switched the lines to connect your call. But by the late 1930s many telephone exchanges had been automated. Switches called relays were automatically opened and closed by magnets to direct calls to the correct destination, automating the operation. These operations could be accurately modelled with logic. The relays operated as automatons; there was no space for devolved autonomy or interpretation. No risk of relays going off track to visit families when they were supposed to be waiting in Hampshire. This wave of automation relied on complete information.
The same relays were used by Turing in the bombes to automatically test different combinations of keys to decode messages. Turing had designed a machine that used these relays to automate the brute-force attack. It provided a simple automation of what had been human labour. The instructions for these machines became known as the program, and that’s why today we still talk about programming computers.
The Bletchley Park operation was an information factory. But in devolving cognitive labour to the machine, there was no room for innovation, no tolerance for disobedience; there was no autonomy or judgement associated with these machines.
The Bletchley Park information topography fed a much larger ecosystem of decision-making, and on 5 June 1944 Eisenhower was at the hub of that system. But unlike the automatons that ran the brute-force calculations in Bletchley, Eisenhower didn’t have complete information, he only had ‘the best information available’. His decision required judgement: at the time he made it he knew he could be wrong, and being wrong would have dreadful consequences for thousands of soldiers and the long-term course of the war. Judgements of this form remained firmly the preserve of the human.
3.
Intent
My first job after university was as a field engineer working on oil rigs for a company called Schlumberger. The graduate-recruitment brochure for the company spoke of high salaries, large bonuses and exotic locations. The role also promised a form of ‘supreme power’. I would be managing a small team, as well as expensive equipment. Having read the brochure, I imagined myself wandering in Egyptian deserts, floating on Nigerian oceans and sweating in Colombian jungles. I would stride the planet and measure the earth. In practice, my experience was more mundane. I was deployed to two faded English seaside resorts: Morecambe and Great Yarmouth. I worked on oil rigs in Liverpool Bay, gas rigs off Humberside and a series of land-based wells across England. The only common feature of each of these locations seemed to be that they each required a six-hour cross-country drive from wherever I was based.
The job was tiring, but it also involved a lot of waiting. The centre of activity on an oil rig is the drill floor. From there the driller threads long lengths of pipe into the earth to excavate the borehole and access the oil. My work required me to have full access to make my measurements. When I worked, the rig waited. Conversely, when the driller and his crew worked, I waited. These idle times combined with cross-country drives left time for thought and reading. My job was to measure the rock formations below the rig. The power delegated to me was responsibility for a large computer, a winch, an explosives store, a set of radioactive sources, the measurement tools to run into the well, and my team of two operators who helped to set everything up. Once I had access to the drill floor, I worked closely with my operators to prepare my tools and then with a geologist to identify formation types and the location of any oil or gas. The geologist would analyse my measurements and declare the results in a report.
The geologist was a representative of the oil company, and, while I had power over my team, the oil company was our customer, so the geologist had power over me. But just as things worked better in my team if I worked collaboratively with my operators, so things worked better with the geologist if we worked in collaboration. To help with this, I learned aspects of the geologist’s role. Specifically, I began to understand how the rock formations were identified. Sometimes the rules seemed very simple, so I began to wonder why the computer I was using couldn’t do this work for us.
Our tools measured properties of the rocks: their density, their porosity, their electrical conductivity. These properties revealed whether the formation was formed from limestone, sandstone or mudstone. But just as I thought I’d understood one of the geologist’s rules, an exception would emerge. The rules were never hard and fast – there was always some nuance or context the geologist was considering. It reminded me of my attempts to learn languages at school. I would be taught a set of rules, and then when I tried to speak or write the language I would encounter a set of exceptions. This nuance came naturally to the geologist, just like the vagaries of language come naturally to a native speaker, but, like in my struggles with French at school, I stumbled as I struggled to formalize my understanding.
