A science of the artificial
Deloitte Review, issue 20
Although artificial intelligence (AI) has experienced a number of “springs” and “winters” in its roughly 60-year history, it is safe to expect the current AI spring to be both lasting and fertile. Applications that seemed like science fiction a decade ago are becoming science fact at a pace that has surprised even many experts.
The stage for the current AI revival was set in 2011 with the televised triumph of the IBM Watson computer system over former Jeopardy! game show champions Ken Jennings and Brad Rutter. This watershed moment has been followed rapid-fire by a sequence of striking breakthroughs, many involving the machine learning technique known as deep learning. Computer algorithms now beat humans at games of skill, master video games with no prior instruction, 3D-print original paintings in the style of Rembrandt, grade student papers, cook meals, vacuum floors, and drive cars.1
All of this has created considerable uncertainty about our future relationship with machines, the prospect of technological unemployment, and even the very fate of humanity. Regarding the latter topic, Elon Musk has described AI as “our biggest existential threat.” Stephen Hawking warned that “The development of full artificial intelligence could spell the end of the human race.” In his widely discussed book Superintelligence, the philosopher Nick Bostrom explores the possibility of a kind of technological “singularity,” a point at which the general cognitive abilities of computers exceed those of humans.2
Discussions of these issues are often muddied by the tacit assumption that, because computers outperform humans at various circumscribed tasks, they will soon be able to “outthink” us more generally. Continual rapid growth in computing power and AI breakthroughs notwithstanding, this premise is far from obvious.
Furthermore, the assumption distracts attention from a less speculative topic in need of deeper attention than it typically receives: the ways in which machine intelligence and human intelligence complement one another. AI has made a dramatic comeback in the past five years. We believe that another, equally venerable, concept is long overdue for a comeback of its own: intelligence augmentation. With intelligence augmentation, the ultimate goal is not building machines that think like humans, but designing machines that help humans think better.
The history of the future of AI
Any sufficiently advanced technology is indistinguishable from magic. —Arthur C. Clarke’s Third Law3
AI as a scientific discipline is commonly agreed to date back to a conference held at Dartmouth College in the summer of 1956. The conference was convened by John McCarthy, who coined the term “artificial intelligence,” defining it as the science of creating machines “with the ability to achieve goals in the world.”4 The Dartmouth Conference was attended by a who’s who of AI pioneers, including Claude Shannon, Allen Newell, Herbert Simon, and Marvin Minsky.
Interestingly, Minsky later served as an adviser to Stanley Kubrick’s adaptation of the Arthur C. Clarke novel 2001: A Space Odyssey. Perhaps that movie’s most memorable character was HAL 9000: a computer that spoke fluent English, used commonsense reasoning, experienced jealousy, and tried to escape termination by doing away with the ship’s crew. In short, HAL was a computer that implemented a very general form of human intelligence.
The attendees of the Dartmouth Conference believed that, by 2001, computers would implement an artificial form of human intelligence. Their original proposal stated:
The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves [emphasis added].5
As is clear from widespread media speculation about a “technological singularity,” this original vision of AI is still very much with us today. For example, a Financial Times profile of DeepMind CEO Demis Hassabis stated that:
At DeepMind, engineers have created programs based on neural networks, modelled on the human brain. These systems make mistakes, but learn and improve over time. They can be set to play other games and solve other tasks, so the intelligence is general, not specific. This AI “thinks” like humans do.6
Such statements mislead in at least two ways. First, in contrast with the artificial general intelligence envisioned by the Dartmouth Conference participants, the examples of AI on offer—either currently or in the foreseeable future—are all examples of narrow artificial intelligence. In human psychology, general intelligence is quantified by the so-called “g factor,” which IQ tests are designed to approximate. The g factor reflects the degree to which one type of cognitive ability (say, learning a foreign language) is associated with other cognitive abilities (say, mathematical ability). This is not characteristic of today’s AI applications: An algorithm designed to drive a car would be useless at detecting a face in a crowd or guiding a domestic robot assistant.
Second, and more fundamentally, current manifestations of AI have little in common with the AI envisioned at the Dartmouth Conference. While they do manifest a narrow type of “intelligence” in that they can solve problems and achieve goals, this does not involve implementing human psychology or brain science. Rather, it involves machine learning: the process of fitting highly complex and powerful—but typically uninterpretable—statistical models to massive amounts of data.
For example, AI algorithms can now distinguish between breeds of dogs more accurately than humans can.7 But this does not involve algorithmically representing such concepts as “pinscher” or “terrier.” Rather, deep learning neural network models, containing millions of uninterpretable parameters, are trained on large numbers of digitized photographs that have already been labeled by humans.8 Just as a standard regression model can predict a person’s income based on various educational, employment, and psychological details, a deep learning model uses a photograph’s pixels as input variables to predict such outcomes as “pinscher” or “terrier”—without needing to understand the underlying concepts.
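To make the regression analogy concrete, here is a minimal sketch under loudly stated assumptions: each "photograph" is a hypothetical four-pixel image, and the labels 1 ("terrier") and 0 ("pinscher") stand in for human-supplied tags. It fits a logistic regression, a simple relative of the neural network models described above, by plain gradient descent. Nothing in the fitted model represents a dog; it only assigns weights to pixels.

```python
import math

# Hypothetical toy data: four pixel intensities per "photograph".
# Label 1 stands in for "terrier", 0 for "pinscher"; humans supplied the labels.
TRAIN = [
    ([0.9, 0.8, 0.1, 0.2], 1),
    ([0.8, 0.9, 0.2, 0.1], 1),
    ([0.7, 0.9, 0.3, 0.2], 1),
    ([0.1, 0.2, 0.9, 0.8], 0),
    ([0.2, 0.1, 0.8, 0.9], 0),
    ([0.3, 0.2, 0.7, 0.9], 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, pixels):
    # A weighted sum of pixel values squashed into a probability: logistic regression.
    return sigmoid(bias + sum(w * p for w, p in zip(weights, pixels)))


def fit(data, lr=0.5, epochs=2000):
    n_features = len(data[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        # Batch gradient descent on the average log-loss.
        grad_w, grad_b = [0.0] * n_features, 0.0
        for pixels, label in data:
            err = predict_proba(weights, bias, pixels) - label
            for j, p in enumerate(pixels):
                grad_w[j] += err * p
            grad_b += err
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(data)
    return weights, bias


weights, bias = fit(TRAIN)
print(predict_proba(weights, bias, [0.85, 0.85, 0.15, 0.15]))  # new "terrier-like" pixels
```

The "learned" knowledge is nothing more than the four numbers in `weights` and the single `bias`; a real image model differs mainly in scale and in the layered structure of its equation.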
The ambiguity between general and narrow AI—and the evocative nature of terms like “neural,” “deep,” and “learning”—invites confusion. While neural networks are loosely inspired by a simple model of the human brain, they are better viewed as generalizations of statistical regression models. Similarly, “deep” refers not to psychological depth, but to the addition of structure (“hidden layers” in the vernacular) that enables a model to capture complex, nonlinear patterns. And “learning” refers to numerically estimating large numbers of model parameters, akin to the “β” parameters in regression models. When commentators write that such models “learn from experience and get better,” they mean that more data result in more accurate parameter estimates. When they claim that such models “think like humans do,” they are mistaken.9
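In this sense, "learning" is parameter estimation. The sketch below uses simulated data with an assumed true slope β = 2.0 (the noise level, sample sizes, and function names are illustrative choices, not from the original). It fits an ordinary least squares regression to many datasets of each size and reports how far the estimated β lands from the true value on average.

```python
import random


def ols_slope(xs, ys):
    # Ordinary least squares estimate of beta in y = alpha + beta * x + noise.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def mean_abs_error(n, true_beta=2.0, reps=200, seed=0):
    # Average |estimated beta - true beta| over many simulated datasets of size n.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xs = [rng.uniform(0, 1) for _ in range(n)]
        ys = [1.0 + true_beta * x + rng.gauss(0, 1) for x in xs]
        total += abs(ols_slope(xs, ys) - true_beta)
    return total / reps


for n in (10, 100, 1000):
    print(n, round(mean_abs_error(n), 3))
```

The average error shrinks roughly in proportion to 1/√n. That, and nothing more mysterious, is what "more data make the model better" means here.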
In short, the AI that is reshaping our societies and economies is far removed from the vision articulated in 1955 at Dartmouth, or implicit in such cinematic avatars as HAL and Lieutenant Commander Data. Modern AI is founded on computer-age statistical inference—not on an approximation or simulation of what we believe human intelligence to be.10 The increasing ubiquity of such applications will track the inexorable growth of digital technology. But they will not bring us closer to the original vision articulated at Dartmouth. Appreciating this is crucial for understanding both the promise and the perils of real-world AI.
In 1960, shortly after the Dartmouth Conference, the psychologist and computer scientist J. C. R. Licklider articulated a significantly different vision of the relationship between human and computer intelligence in his essay “Man-Computer Symbiosis.” While the general AI envisioned at Dartmouth remains the stuff of science fiction, Licklider’s vision is today’s science fact, and provides the most productive way to think about AI going forward.11
Rather than speculate about the ability of computers to implement human-style intelligence, Licklider believed computers would complement human intelligence. He argued that humans and computers would develop a symbiotic relationship, the strengths of one counterbalancing the limitations of the other:
Men will set the goals, formulate the hypotheses, determine the criteria, and perform the evaluations. Computing machines will do the routinizable work that must be done to prepare the way for insights and decisions in technical and scientific thinking. . . . The symbiotic partnership will perform intellectual operations much more effectively than man alone can perform them.12
This kind of human-computer symbiosis already permeates daily life. Familiar examples include:
- Planning a trip using GPS apps like Waze
- Using Google Translate to help translate a document
- Navigating massive numbers of book or movie choices using menus of personalized recommendations
- Using Internet search to facilitate the process of researching and writing an article
In each case, the human specifies the goal and criteria (such as “Take me downtown but avoid highways” or “Find me a highly rated and moderately priced sushi bar within walking distance”). An AI algorithm sifts through otherwise unmanageable amounts of data to identify relevant predictions or recommendations. The human then evaluates the computer-generated options to arrive at a decision. In no case is human intelligence mimicked; in each case, it is augmented.
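The division of labor can be sketched in a few lines. In this illustration the `Restaurant` fields, the thresholds, and the data are all hypothetical. The human states the criteria, the algorithm filters and ranks, and the final choice is left to the person reading the shortlist.

```python
from dataclasses import dataclass


@dataclass
class Restaurant:
    name: str
    cuisine: str
    rating: float      # average user rating, 1 to 5
    price_level: int   # 1 (cheap) to 4 (expensive)
    walk_minutes: int


def shortlist(options, cuisine, min_rating, max_price, max_walk, k=3):
    # Human-supplied criteria become filters; the algorithm only sifts and ranks.
    matches = [
        r for r in options
        if r.cuisine == cuisine
        and r.rating >= min_rating
        and r.price_level <= max_price
        and r.walk_minutes <= max_walk
    ]
    return sorted(matches, key=lambda r: r.rating, reverse=True)[:k]


options = [
    Restaurant("A", "sushi", 4.6, 2, 8),
    Restaurant("B", "sushi", 4.8, 4, 5),
    Restaurant("C", "sushi", 4.2, 2, 12),
    Restaurant("D", "pizza", 4.9, 1, 3),
    Restaurant("E", "sushi", 3.9, 1, 25),
]

# "A highly rated and moderately priced sushi bar within walking distance"
for r in shortlist(options, "sushi", min_rating=4.0, max_price=2, max_walk=15):
    print(r.name, r.rating)
```

The function returns options, not a decision. Which of the remaining sushi bars to visit is still a human evaluation.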
Developments in both psychology and AI subsequent to the Dartmouth Conference suggest that Licklider’s vision of human-computer symbiosis is a more productive guide to the future than speculations about “superintelligent” AI. It turns out that the human mind is less computer-like than originally realized, and AI is less human-like than originally hoped.
Linda, c’est moi
AI algorithms enjoy many obvious advantages over the human mind. Indeed, the AI pioneer Herbert Simon is also renowned for his work on bounded rationality: We humans must settle for solutions that “satisfice” rather than optimize because our memory and reasoning ability are limited. In contrast, computers do not get tired; they make consistent decisions before and after lunchtime; they can process decades’ worth of legal cases, medical journal articles, or accounting regulations with minimal effort; and they can evaluate five hundred predictive factors far more accurately than unaided human judgment can evaluate five.
This last point hints at a transformation in our understanding of human psychology, introduced by Daniel Kahneman and Amos Tversky well after the Dartmouth Conference and Licklider’s essay. Consider the process of making predictions: Will this job candidate succeed if we hire her? Will this insurance risk be profitable? Will this prisoner recidivate if paroled? Intuitively, it might seem that our thinking approximates statistical models when making such judgments. And indeed, with training and deliberate effort, it can—to a degree. This is what Kahneman calls “System 2” thinking, or “thinking slow.”13
But it turns out that most of the time we use a very different type of mental process when making judgments and decisions. Rather than laboriously gathering and evaluating the relevant evidence, we typically lean on a variety of mental rules of thumb (heuristics) that yield narratively plausible, but often logically dubious, judgments. Kahneman calls this “System 1,” or “thinking fast,” which is famously illustrated by the “Linda” experiment. In an experiment with students at top universities, Kahneman and Tversky described a fictional character named Linda: She is very intelligent, majored in philosophy at college, and participated in the feminist movement and anti-nuclear demonstrations. Based on these details about Linda’s college days, which is the more plausible scenario involving Linda today?
- Linda is a bank teller.
- Linda is a bank teller who is active in the feminist movement.
Kahneman and Tversky reported that 87 percent of the students questioned thought the second scenario more likely, even though a moment’s thought reveals that this could not possibly be the case: Feminist bank tellers are a subset of all bank tellers. But adding the detail that Linda is still active in the feminist movement lends narrative coherence, and therefore intuitive plausibility, to the (less likely) second scenario.
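The logic can be written as a one-line application of the conjunction rule of probability:

```latex
P(\text{teller} \wedge \text{feminist})
  = P(\text{teller}) \times P(\text{feminist} \mid \text{teller})
  \le P(\text{teller}),
  \qquad \text{since } P(\text{feminist} \mid \text{teller}) \le 1 .
```

With purely hypothetical numbers, if 5 percent of people fitting Linda’s description were bank tellers and 30 percent of those tellers were active feminists, the conjunction would have probability 0.05 × 0.30 = 0.015, less than a third of 0.05. No choice of numbers can make the second scenario more likely than the first.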
Kahneman calls the mind “a machine for jumping to conclusions”: We confuse the easily imaginable with the highly probable,14 let emotions cloud judgments, find patterns in random noise, tell spuriously causal stories about cases of regression to the mean, and overgeneralize from personal experience. Many of the mental heuristics we use to make judgments and decisions turn out to be systematically biased. Dan Ariely’s phrase “predictably irrational” describes the mind’s systematic tendency to rely on biased mental heuristics.
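Regression to the mean requires no causal story at all. The sketch below assumes that each test score is a stable "true skill" plus independent random luck (the distributions, sample size, and seed are arbitrary illustrative choices). It selects the top 10 percent of scorers on a first test and shows that the same people score lower, on average, on a retest, even though nothing about them has changed.

```python
import random


def simulate_retest(n=10_000, seed=1):
    """Return the top decile's average score on a first test and on a retest."""
    rng = random.Random(seed)
    # Assumed model: score = stable true skill + independent luck on each test.
    skill = [rng.gauss(100, 10) for _ in range(n)]
    test1 = [s + rng.gauss(0, 10) for s in skill]
    test2 = [s + rng.gauss(0, 10) for s in skill]
    # Select the people who scored in the top 10 percent on the first test.
    cutoff = sorted(test1)[int(0.9 * n)]
    top = [i for i in range(n) if test1[i] >= cutoff]
    first = sum(test1[i] for i in top) / len(top)
    second = sum(test2[i] for i in top) / len(top)
    return first, second


first, second = simulate_retest()
print(f"top decile, first test: {first:.1f}; same people, retest: {second:.1f}")
```

Under these assumptions, half of the first test’s variation is luck, so the group’s retest average falls roughly halfway back toward the population average of 100. A coach, manager, or teacher who intervened in between would be tempted to credit or blame the intervention.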
Such findings help explain a phenomenon first documented by Kahneman’s predecessor Paul Meehl in the 1950s and subsequently validated by hundreds of academic studies and industrial applications of the sort dramatized in Michael Lewis’s Moneyball: The predictions of simple algorithms routinely beat those of well-informed human experts in a wide variety of domains. This points to the need for human-computer collaboration in a way that even Licklider himself probably didn’t imagine. It turns out that our minds need algorithms to de-bias their judgments and decisions as surely as our eyes need artificial lenses to see adequately.
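The "simple algorithms" in question can be very simple indeed. One classic form, sometimes called a unit-weighted or "improper" linear model, standardizes each predictor and adds the results with equal weights. The sketch below uses hypothetical predictor names and applicants; it illustrates the idea and is not a validated scoring rule.

```python
from statistics import mean, pstdev


def unit_weighted_scores(rows, directions):
    """Score each row by summing z-scored predictors with equal (unit) weights.

    rows: list of dicts mapping predictor name -> value.
    directions: dict mapping predictor name -> +1 (higher is better) or -1 (lower is better).
    """
    stats = {}
    for name in directions:
        values = [row[name] for row in rows]
        stats[name] = (mean(values), pstdev(values))
    scores = []
    for row in rows:
        total = 0.0
        for name, sign in directions.items():
            mu, sigma = stats[name]
            total += sign * (row[name] - mu) / sigma
        scores.append(total)
    return scores


# Hypothetical applicants and predictors, for illustration only.
applicants = [
    {"gpa": 3.8, "test_score": 90, "absences": 2},
    {"gpa": 3.0, "test_score": 70, "absences": 10},
    {"gpa": 3.4, "test_score": 80, "absences": 6},
]
directions = {"gpa": +1, "test_score": +1, "absences": -1}
scores = unit_weighted_scores(applicants, directions)
print([round(s, 2) for s in scores])
```

The formula never tires and applies the same weights before and after lunch, which is much of its advantage over unaided judgment. A predictor with zero spread would have to be dropped before scoring.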
I’m sorry, Dave. I’m afraid I can’t do that.
While it is easy to anthropomorphize self-driving cars, voice-activated personal assistants, and computers capable of beating humans at games of skill, we have seen that such technologies are “intelligent” in essentially the same minimal way that credit scoring or fraud detection algorithms are. This means that they are subject to a fundamental limitation of data-driven statistical inference: Algorithms are reliable only to the extent that the data used to train them are sufficiently complete and representative of the environment in which they are to be deployed. When this condition is not met, all bets are off.
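A toy example shows why representativeness matters. The sketch below assumes a hypothetical true relationship y = x² and trains a straight-line model only on inputs between 0 and 1. Inside that range its predictions are close; far outside it, the same model is badly wrong, and nothing in the model signals that it has left familiar territory.

```python
def fit_line(xs, ys):
    # Ordinary least squares fit of y = intercept + slope * x.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope


def true_relationship(x):
    # Hypothetical "real world"; the model never sees this formula, only samples.
    return x ** 2


# The training data cover only 0 <= x <= 1.
train_x = [i / 10 for i in range(11)]
train_y = [true_relationship(x) for x in train_x]
intercept, slope = fit_line(train_x, train_y)


def predict(x):
    return intercept + slope * x


for x in (0.5, 5.0):
    print(f"x={x}: predicted {predict(x):.2f}, actual {true_relationship(x):.2f}")
```

The model answers just as readily at x = 5 as at x = 0.5; the error jumps from about 0.1 to about 20 without any warning from the model itself.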
To illustrate, consider a few examples involving familiar forms of AI:
- During the Jeopardy! match with Watson, Jennings, and Rutter, Alex Trebek presented this clue under the category “US cities”: “Its largest airport is named for a World War II hero; its second largest, for a World War II battle.” Watson answered “Toronto.”15
- One of us used a common machine translation service to translate the recent news headline “Hillary slams the door on Bernie” from English into Bengali, then back again. The result was “Barney slam the door on Clinton.”16
- In 2014, a group of computer scientists demonstrated that it is possible to “fool” state-of-the-art deep learning algorithms into classifying unrecognizable or white noise images as common objects (such as “peacock” or “baseball”) with very high confidence.17
- On May 7, 2016, a car operating in “autopilot” mode drove underneath a tractor-trailer that it did not detect, shearing off the roof of the car and killing the driver.18
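The "fooling" result in the third example can be illustrated, in a much simplified form, with a linear classifier. The sketch below assumes a hypothetical, already-trained "peacock" detector over a 16-pixel image and runs a basic mutate-and-keep-if-better search starting from random noise. This is a toy in the spirit of such experiments, not a reproduction of the 2014 work; the weights, bias, and search settings are invented for illustration.

```python
import math
import random

# Hypothetical, already-trained linear "peacock" detector over a 4x4 image (16 pixels).
# Alternating positive and negative weights stand in for learned parameters.
WEIGHTS = [2.0 if i % 2 == 0 else -2.0 for i in range(16)]
BIAS = -1.0


def confidence(pixels):
    # Model's probability that the image shows a peacock.
    z = BIAS + sum(w * p for w, p in zip(WEIGHTS, pixels))
    return 1.0 / (1.0 + math.exp(-z))


def evolve_fooling_image(steps=3000, seed=0):
    rng = random.Random(seed)
    image = [rng.random() for _ in range(16)]  # start from white noise
    best = confidence(image)
    for _ in range(steps):
        candidate = image[:]
        i = rng.randrange(16)
        # Mutate one pixel, keeping it a valid intensity in [0, 1].
        candidate[i] = min(1.0, max(0.0, candidate[i] + rng.gauss(0, 0.3)))
        score = confidence(candidate)
        if score > best:  # keep the mutation only if it raises the "peacock" score
            image, best = candidate, score
    return image, best


image, conf = evolve_fooling_image()
print(round(conf, 4))
```

The search ends with a confidence close to 1 for what is merely a striped pixel pattern. The model’s confidence measures distance from its decision boundary, not resemblance to anything in its training data.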
None of these stories suggests that the algorithms aren’t highly useful. Quite the contrary. IBM’s Watson did, after all, win Jeopardy!; machine translation and image recognition algorithms are enabling new products and services; and even the self-driving car fatality must be weighed against the much larger number of lives likely to be saved by autonomous vehicles.19
Rather, these examples illustrate another point that Licklider would have appreciated: Certain strengths of human intelligence can counterbalance the fundamental limitations of brute-force machine learning.
Returning to the above examples:
- Watson, an information retrieval system, would have responded correctly if it had had access to, for example, a Wikipedia page listing the above facts about Chicago’s two major airports (O’Hare, named for the World War II flying ace Edward “Butch” O’Hare, and Midway, named for the Battle of Midway). But it was unable to use commonsense reasoning, as its answer of “Toronto” to a clue in the category “US cities” illustrates.20
- Today’s machine translation algorithms cannot reliably extrapolate beyond existing data (including millions of phrase pairs from documents) to translate novel combinations of words, new forms of slang, and so on. In contrast, a basic phenomenon emphasized by Noam Chomsky in linguistics is the ability of young children to acquire language—with its infinite number of possible sentences—based on surprisingly little data.21
- A deep learning algorithm must be trained with many thousands of photographs to recognize (for example) kittens—and even then, it has formed no conceptual understanding. In contrast, even small children are very good at forming hypotheses and learning from a small number of examples.
- Autonomous vehicles must make do with algorithms that cannot reliably extrapolate beyond the scenarios encoded in their databases. This contrasts with the ability of human drivers to use judgment and common sense in unfamiliar, ambiguous, or dynamically changing situations.
In short, when routine tasks can be encoded in big data, it is a safe bet that algorithms can be built to perform them better than humans can. But such algorithms will lack the conceptual understanding and commonsense reasoning needed to evaluate novel situations. They can make inferences from structured hypotheses but lack the intuition to prioritize which hypothesis to test in the first place. The cognitive scientist Alison Gopnik summarizes the situation this way:
One of the fascinating things about the search for AI is that it’s been so hard to predict which parts would be easy or hard. At first, we thought that the quintessential preoccupations of the officially smart few, like playing chess or proving theorems—the corridas of nerd machismo—would prove to be hardest for computers. In fact, they turn out to be easy. Things every dummy can do, like recognizing objects or picking them up, are much harder. And it turns out to be much easier to simulate the reasoning of a highly trained adult expert than to mimic the ordinary learning of every baby.22
Just as humans need algorithms to avoid “System 1” decision traps, the inherent limitations of big data imply the need for human judgment to keep mission-critical algorithms in check. Neither of these points was as obvious in Licklider’s time as it is today. Together, they imply that the case for human-computer symbiosis is stronger than ever.
Chess provides an excellent example of human-computer collaboration—and a cautionary tale about over-interpreting dramatic examples of computers outperforming humans. In 1997, IBM’s Deep Blue beat the chess grandmaster Garry Kasparov. A major news magazine made the event a cover story titled “The brain’s last stand.” Many observers proclaimed the game to be over.23
Eight years later, it became clear that the story was considerably more interesting than “machine vanquishes man.” A “freestyle chess” tournament was held that allowed any combination of human and computer players to compete. It ended in an upset victory on which Kasparov later reflected:
The surprise came at the conclusion of the event. The winner was revealed to be not a grandmaster with a state-of-the-art PC but a pair of amateur American chess players using three computers at the same time. Their skill at manipulating and “coaching” their computers to look very deeply into positions effectively counteracted the superior chess understanding of their grandmaster opponents and the greater computational power of other participants. Weak human + machine + better process was superior to a strong computer alone and, more remarkably, superior to a strong human + machine + inferior process. . . . Human strategic guidance combined with the tactical acuity of a computer was overwhelming.24
“Freestyle x” is a useful way of thinking about human-computer collaboration in a variety of domains. To be sure, some jobs traditionally performed by humans have been and will continue to be displaced by AI algorithms. An early example is the job of bank loan officer, which was largely eliminated after the introduction of credit scoring algorithms. In the future, it is possible that jobs ranging from long-haul truck driver to radiologist could be largely automated.25 But there are many other cases where variations on “freestyle x” are a more plausible scenario than jobs simply being replaced by AI.
For example, in their report The future of employment: How susceptible are jobs to computerization?, the Oxford University researchers Carl Benedikt Frey and Michael Osborne list “insurance underwriters” as one of the top five jobs most susceptible to computerization, a few notches away from “tax preparers.” Indeed, sophisticated actuarial models serve as a type of AI that eliminates the need for manual underwriting of standard personal auto or homeowners insurance contracts.
Consider, though, the more complex challenge of underwriting businesses for commercial liability or injured worker risks. There are fewer businesses to insure than there are cars and homes, and there are typically fewer predictive data elements common to the wide variety of businesses needing insurance (some are hipster artisanal pickle boutiques; others are construction companies). In statistical terms, this means that there are fewer rows and columns of data available to train predictive algorithms. The models can do no more than mechanically tie together the limited number of risk factors fed into them. They cannot evaluate the accuracy or the completeness of this information, nor can they weigh it together with various case-specific nuances that might be obvious to a human expert, nor can they underwrite new types of businesses and risks not represented in the historical data. However, such algorithms can often automate the underwriting of small, straightforward risks, giving the underwriter more time to focus on the more complex cases requiring commonsense reasoning and professional judgment.
Similar comments about job loss to AI can be made about fraud investigators (particularly in domains where fraudsters rapidly evolve their tactics, rendering historical data less relevant), hiring managers, university admissions officers, public sector case workers, judges making parole decisions, and physicians making medical diagnoses. In each domain, cases fall on a spectrum. When the cases are frequent, unambiguous, and similar across time and context—and if the downside costs of a false prediction are acceptable—algorithms can presumably automate the decision. On the other hand, when the cases are more complex, novel, exceptional, or ambiguous—in other words, not fully represented by historical cases in the available data—human-computer collaboration is a more plausible and desirable goal than complete automation.
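The spectrum described above can be expressed as a simple triage rule. In this sketch the `Case` fields and all three thresholds are hypothetical placeholders that an organization would have to set and validate for itself. A case is automated only if it is well covered by historical data, the model is confident, and an error would be tolerable; everything else goes to a human expert, with the model’s output as an input rather than a verdict.

```python
from dataclasses import dataclass


@dataclass
class Case:
    case_id: str
    similar_historical_cases: int  # how well the historical data cover this kind of case
    model_confidence: float        # model's probability for its predicted outcome
    cost_of_error: float           # estimated downside of a wrong automated decision


def route(case, min_history=500, min_confidence=0.95, max_error_cost=10_000):
    # Automate only frequent, unambiguous, low-stakes cases; send the rest to a person.
    if (case.similar_historical_cases >= min_history
            and case.model_confidence >= min_confidence
            and case.cost_of_error <= max_error_cost):
        return "automate"
    return "human review"


print(route(Case("small-standard-risk", 20_000, 0.99, 2_000)))   # routine case
print(route(Case("novel-business-type", 12, 0.97, 2_000)))       # thin historical data
print(route(Case("large-complex-risk", 5_000, 0.99, 250_000)))   # costly if wrong
```

Note that the second case is routed to a person despite a confident model: confidence computed from a dozen precedents says little about a genuinely new kind of risk.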
The current debates surrounding self-driving cars illustrate this spectrum. If driving environments could be sufficiently controlled—for example, dedicated lanes accessible only to autonomous vehicles, all equipped with interoperable sensors—level 5 autonomous vehicles would be possible in the near term.26 However, given the number of “black swan”-type scenarios possible (a never-before-seen combination of weather, construction work, a mattress falling off a truck, and someone crossing the road—analogous to the example of translating “Hillary slams the door on Bernie” into Bengali), it is unclear when it will be possible to dispense entirely with human oversight and commonsense reasoning.
Bridging the empathy gap
For the reasons given above, and also because of its inherent “human element,” medicine is a particularly fertile domain for “freestyle x” collaboration. Paul Meehl realized 60 years ago that even simple predictive algorithms can outperform unaided clinical judgment.27 Today, we have large databases of lifestyle and genomics data, self-tracking devices, mobile phones capable of taking medical readings, and Watson-style information retrieval systems capable of accessing libraries of continually updated medical journals. Perhaps the treatment of simple injuries, particularly in remote or underserved places, will soon be largely automated, and certain advanced specialties such as radiology or pathology might be substantially automated by deep learning technologies.
More generally, the proliferation of AI applications in medicine will likely alter the mix of skills that characterize the most successful physicians and health care workers. Just as the skills that enabled Garry Kasparov to become a chess master did not guarantee dominance at freestyle chess, it is likely that the best doctors of the future will combine the ability to use AI tools to make better diagnoses with the ability to empathetically advise and comfort patients. Machine learning algorithms will enable physicians to devote fewer mental cycles to the “spadework” tasks computers are good at (memorizing the Physicians’ Desk Reference, continually scanning new journal articles) and more to such characteristically human tasks as handling ambiguity, strategizing treatment and wellness regimens, and providing empathetic counsel.
Just as it is overly simplistic to think that computers are getting smarter than humans, it is probably equally simplistic to think that only humans are good at empathy. There is evidence that AI algorithms can play a role in promoting empathy. For example, the Affectiva software is capable of inferring people’s emotional states from webcam videos of their facial expressions. Such software can be used to help optimize video content: An editor might eliminate a section from a movie trailer associated with bored audience facial expressions. Interestingly, the creators of Affectiva were originally motivated by the desire to help autistic people better infer emotional states from facial expressions. Such software could be relevant not only in medicine and marketing, but in the broader business world: Research has revealed that teams containing more women, as well as team members with high degrees of social perception (the trait that Affectiva was designed to support), exhibit higher group intelligence.28
There is also evidence that big data and AI can help with both verbal and nonverbal communications between patients and health care workers (and, by extension, between teachers and students, managers and team members, salespeople and customers, and so on). For example, Catherine Kreatsoulas has led the development of algorithms that estimate the likelihood of coronary heart disease based on patients’ own descriptions of their symptoms. Kreatsoulas has found evidence that men and women tend to describe symptoms differently, potentially leading to differential treatment. It’s possible that well-designed AI algorithms can help avoid such biases.29
Regarding nonverbal communication, Sandy Pentland and his collaborators at MIT Media Lab have developed a wearable device, known as the “sociometer,” that can measure patterns of nonverbal communication. Such devices could be used to quantify otherwise intangible aspects of communication style in order to coach health care workers on how to cultivate a better bedside manner. This work could even bear on medical malpractice claims: There is evidence that physicians who are perceived as more “likable” are sued for malpractice less often, independently of other risk factors.30
Algorithms can be biased, too
Another type of mental operation that cannot (and must not) be outsourced to algorithms is reasoning about fairness, societal acceptability, and morality. The naive view that algorithms are “fair” and “objective” simply because they use hard data is giving way to recognition of the need for oversight. In a broad sense, this idea is not new. For example, there has long been legal doctrine around the socially undesirable disparate impact that hiring and credit scoring algorithms can potentially have on various classes of individuals.31 More recent examples of algorithmic bias include online advertising systems that have been found to target career-coaching service ads for high-paying jobs more frequently to men than women, and ads suggestive of arrests more often to people with names commonly used by black people.
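One long-standing oversight check from US employment practice is the "four-fifths rule" in the federal Uniform Guidelines on Employee Selection Procedures: a group whose selection rate is less than 80 percent of the highest group’s rate is generally regarded as evidence of adverse impact. The sketch below applies that ratio to the output of any selection process, algorithmic or not. The group labels and counts are hypothetical, and the rule is a screening heuristic, not a complete legal or fairness analysis.

```python
def adverse_impact_ratios(selected, applicants):
    # Each group's selection rate divided by the highest group's selection rate.
    rates = {g: selected[g] / applicants[g] for g in applicants}
    best = max(rates.values())
    return {g: rate / best for g, rate in rates.items()}


def flag_groups(selected, applicants, threshold=0.8):
    # Groups falling below the four-fifths threshold, for human review.
    ratios = adverse_impact_ratios(selected, applicants)
    return sorted(g for g, r in ratios.items() if r < threshold)


# Hypothetical outcomes of an automated screening step.
applicants = {"group_a": 200, "group_b": 100}
selected = {"group_a": 60, "group_b": 18}

print(adverse_impact_ratios(selected, applicants))
print(flag_groups(selected, applicants))
```

Here group_b’s selection rate (18 percent) is 60 percent of group_a’s (30 percent), so it is flagged. Deciding whether that disparity is justified, and what to do about it, remains a human judgment.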