mosquito. Mission accomplished. Ouch.
What seems like the powerful radar of insects in the dark, with blood-seeking
intelligence inexplicable for such tiny brains, is actually just a sensitive nose with almost
no intelligence at all. Mosquitoes are closer to plants that follow the sun than to guided
missiles. Yet by applying this simple “follow your nose” rule quite literally, they can
travel through a house to find you, slip through cracks in a screen door, even zero in on
the tiny strip of skin you left exposed between hat and shirt collar. It’s just a random
walk, combined with flexible wings and legs that let the insect bounce off obstacles, and
an instinct to descend a chemical gradient.
But “gradient descent” is much more than bug navigation. Look around you and
you’ll find it everywhere, from the most basic physical rules of the universe to the most
advanced artificial intelligence.
The Universe
We live in a world of countless gradients, from light and heat to gravity and chemical
trails (chemtrails!). Water flows along a gravity gradient downhill, and your body lives
on chemical solutions flowing across cell membranes from high concentration to
low. Every action in the universe is driven by some gradient, from the movement
of the planets around gravity gradients to the joining of atoms along electric-charge
gradients to form molecules. Our own urges, such as hunger and sleepiness, are driven
by electro-chemical gradients in our bodies. And our brain’s functions, the electrical
signals moving along ion channels in the synapses between our neurons, are simply
atoms and electrons flowing “downhill” along yet more electrical and chemical gradients.
Forget clockwork analogies; our brains are closer to a system of canals and locks, with
signals traveling like water from one state to another.
As I sit here typing, I’m actually seeking equilibrium states in an n-dimensional
topology of gradients. Take just one: heat. My body temperature is higher than the air
temperature, so I radiate heat, which must be replenished in my core. Even the bacteria
in my digestive tract use sensors to measure sugar concentrations in the liquid around
them and whip their tail-like flagella to swim “upstream” where the sugar supply is
richest. The natural state of all systems is to flow to lower energy states, a process that is
broadly described by entropy (the tendency of things to go from ordered to disordered
states; all things will fall apart eventually, including the universe itself).
But how do you explain more complex behavior, such as our ability to make
decisions? The answer is just more gradient descent.
Our Brains
As miraculous and inscrutable as our human intelligence is, science is coming around to
the view that our brains operate the same way as any other complex system with layers
and feedback loops, all pursuing what we mathematically call “optimization functions”
but you could just as well call “flowing downhill” in some sense.
The essence of intelligence is learning, and we do that by correlating inputs with
positive or negative scores (rewards or punishments). So, for a baby, “this sound” (your
mother’s voice) is associated with other learned connections to your mother, such as food
or comfort. Likewise, “this muscle motion brings my thumb closer to my mouth.” Over
time and trial and error, the brain’s neural network reinforces those connections.
Meanwhile “this muscle motion does not bring my thumb close to my mouth” is a
negative correlation, and the brain will weaken those connections.
However, this is too simplistic. The limits of gradient descent constitute the so-called
local-minima problem (or local-maxima problem, if you’re doing a gradient
ascent). If you are walking in a mountainous region and want to get home, always
walking downhill will most likely get you to the next valley but not necessarily over the
other mountains that lie around it and between you and home. For that, you need either a
mental model (i.e., a map) of the topology so you know where to ascend to get out of the
valley, or you need to switch between gradient descent and random walks so you can
bounce your way out of the region.
Which is, in fact, exactly what the mosquito does in following my scent: It
descends when it’s in my plume and random-walks when it has lost the trail or hit an
obstacle.
AI
So that’s nature. What about computers? Traditional software doesn’t work that way—it
follows deterministic trees of hard logic: “If this, do that.” But software that interacts
with the physical world tends to work more like the physical world. That means dealing
with noisy inputs (sensors or human behavior) and providing probabilistic, not
deterministic, results. And that, in turn, means more gradient descent.
AI software is the best example of this, especially the kinds of AI that use
artificial neural-network models (including “deep” neural networks of many layers,
such as convolutional networks). In these, a typical process consists of “training” them by showing them
lots of examples of something you want them to learn (pictures of cats labeled “cat,” for
example), along with examples of other random data (pictures of other things). This is
called “supervised learning,” because the neural network is being taught by labeled example,
including negative examples: data that is not correlated to the desired
result.
These neural networks, like their biological models, consist of layers of thousands
of nodes (“neurons,” in the analogy), each of which is connected to all the nodes in the
layers above and below by connections that initially have random strength. The top layer
is presented with data, and the bottom layer is given the correct answer. Any series of
connections that happened to land on the right answer is made stronger (“rewarded”), and
those that were wrong are made weaker (“punished”). Repeat tens of thousands of times
and eventually you have a fully trained network for that kind of data.
You can think of all the possible combinations of connections as like the surface
of a planet, with hills and valleys. (Ignore for the moment that the surface is just 3D and
the actual topology is many-dimensional.) The optimization that the network goes
through as it learns is just a process of finding the deepest valley on the planet. This
consists of the following steps:
1. Define a “cost function” that determines how well the network solved the problem
2. Run the network once and see how it did at that cost function
3. Change the values of the connections and do it again. The difference between
those two results is the direction, or “slope,” in which the network moved
between the two trials.
4. If the slope is pointed “downhill,” change the connections more in that direction.
If it’s “uphill,” change them in the opposite direction.
5. Repeat until there is no improvement in any direction. That means that you’re in
a minimum.
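
The five steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how any real neural-network library trains: the network is reduced to a single adjustable “connection” `w`, the cost function `cost` is a made-up bowl whose lowest point sits at `w = 3`, and the names `descend`, `step`, `probe`, and `tol` are invented for the example.

```python
def cost(w):
    """Hypothetical cost function (step 1): a simple bowl with its minimum at w = 3."""
    return (w - 3.0) ** 2


def descend(cost_fn, w, step=0.1, probe=1e-4, tol=1e-8, max_iters=10_000):
    """Follow the estimated slope downhill until the cost stops improving."""
    current = cost_fn(w)  # step 2: run once and see how it did
    for _ in range(max_iters):
        # Step 3: nudge the connection slightly; the change in cost is the slope.
        slope = (cost_fn(w + probe) - current) / probe
        # Step 4: move against the slope, i.e. downhill.
        candidate = w - step * slope
        new_cost = cost_fn(candidate)
        # Step 5: stop once no meaningful improvement is left.
        if current - new_cost < tol:
            break
        w, current = candidate, new_cost
    return w, current


w_final, c_final = descend(cost, w=-5.0)
print(w_final, c_final)
```

On a bowl with a single valley the walk ends close to `w = 3`. On a bumpier surface the same loop would stop in whichever dip it reached first.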
Congrats! But it’s probably a local minimum, or a little dip in the mountains, so you’re
going to have to keep going if you want to do better. You can’t keep going downhill, and
you don’t know where the absolute lowest point is, so you’re going to have to somehow
find it. There are many ways to do that, but here are a few:
1. Try lots of times with different random settings and share learning from each trial;
essentially, you are shaking the system to see if it settles in a lower state. If one
of the other trials found a lower valley, start with those settings.
2. Don’t just go downhill but stumble around a bit like a drunk, too (this is called
“stochastic gradient descent”). If you do this long enough, you’ll eventually find
rock bottom. There’s a metaphor for life in that.
3. Just look for “interesting” features, which are defined by diversity (edges or color
changes, for example). Warning: This way can lead to madness—too much
“interestingness” draws the network to optical illusions. So keep it sane, and
emphasize the kinds of features that are likely to be real in nature, as opposed to
artifacts or errors. This is called “regularization,” and there are lots of techniques
for this, such as whether those kinds of features have been seen before (learned),
or are too “high frequency” (like static) rather than “low frequency” (more
continuous, like actual real-world features).
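
The first option in the list above, trying many random settings and keeping the best, might look something like the following sketch. Everything in it is an assumption made for illustration: `bumpy` is an invented one-dimensional surface with many small valleys, and `local_descent` and `best_of_restarts` are hypothetical helper names. Real networks have vastly more dimensions, so this shows the idea rather than a practical recipe.

```python
import math
import random


def bumpy(x):
    """Hypothetical cost surface: a shallow bowl with many small valleys."""
    return 0.1 * x * x + math.sin(3.0 * x)


def local_descent(cost_fn, x, step=0.05, probe=1e-5, iters=2000):
    """Plain downhill walk; it settles in whichever valley it starts above."""
    for _ in range(iters):
        slope = (cost_fn(x + probe) - cost_fn(x - probe)) / (2.0 * probe)
        x -= step * slope
    return x, cost_fn(x)


def best_of_restarts(cost_fn, starts):
    """Descend from every starting point and keep the lowest valley found."""
    results = [local_descent(cost_fn, s) for s in starts]
    return min(results, key=lambda pair: pair[1])


random.seed(0)  # fixed seed only so the sketch is repeatable
starts = [random.uniform(-10.0, 10.0) for _ in range(20)]
single = local_descent(bumpy, starts[0])
best = best_of_restarts(bumpy, starts)
print("one trial: ", single)
print("best of 20:", best)
```

The best of many trials can never be worse than the first trial alone, but nothing guarantees that any of them found the deepest valley on the planet.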
Just because AI systems sometimes end up in local minima, don’t conclude that this
makes them any less like life. Humans—indeed, probably all life-forms—are often stuck
in local minima.
Take our understanding of the game of Go, which was taught and learned and
optimized by humans for thousands of years. It took AIs less than three years to find out
that we’d been playing it wrong all along and that there were better, almost alien,
solutions to the game which we’d never considered—mostly because our brains don’t
have the processing power to consider so many moves ahead.
Even in chess, which is ten times easier and was thought to be understood, brute-force
machines could beat us at our own strategies. Chess, too, turned out, when
explored by superior neural-network AI systems, to have weird but superior strategies
we’d never considered, like sacrificing queens early to gain an obscure long-term
advantage. It’s as if we had been playing 2D versions of games that actually existed in
higher dimensions.
If any of this sounds familiar, it’s because physics has been wrestling with these
sorts of topological problems for decades. The notion of space being many-dimensional,
and math reducing to understanding the geometries and interactions of “membranes”
beyond the reach of our senses, is where Grand Unified Theorists go to die. But unlike
multidimensional theoretical physics, AI is something we can actually experiment with
and measure.
So that’s what we’re going to do. The next few decades will be an explosive
exploration of ways to think that 7 million years of evolution never found. We’re going
to rock ourselves out of local minima and find deeper minima, maybe even global
minima. And when we’re done, we may even have taught machines to seem as smart as
a mosquito, forever descending the cosmic gradients to an ultimate goal, whatever that
may be.
David Kaiser is a physicist atypically interested in the intersection of his science with
politics and culture, about which he has written widely.
In the first meeting (in Washington, Connecticut) that preceded the crafting of this
book, he commented on the change in how “information” is viewed since Wiener’s time:
the military-industrial, Cold War era. Back then, Wiener compared information,
metaphorically, to entropy, in that it could not be conserved—i.e., monopolized; thus, he
argued, our atomic secrets and other such classified matters would not remain secrets for
long. Today, whereas (as Wiener might have expected) information, fake or not, is
leaking all over the other Washington, information in the economic world has indeed
been stockpiled, commodified, and monetized.
This lockdown, David said, was “not all good, not all bad”—depending, I guess,
on whether you’re sick of being pestered by ads for socks or European river cruises
popping up in your browser minutes after you’ve bought them.
To say nothing of information’s proliferation. David complained to the rest of us
attending the meeting that in Wiener’s time, physicists could “take the entire Physical
Review. It would sit comfortably in front of us in a manageable pile. Now we’re awash
in fifty thousand open-source journals per minute,” full of god-knows-what. Neither of
these developments would Wiener have anticipated, said David, prompting him to ask,
“Do we need a new set of guiding metaphors?”
“INFORMATION” FOR WIENER, FOR SHANNON, AND FOR US
David Kaiser
David Kaiser is Germeshausen Professor of the History of Science and professor of
physics at MIT, and head of its Program in Science, Technology & Society. He is the
author of How the Hippies Saved Physics: Science, Counterculture, and the Quantum
Revival and American Physics and the Cold War Bubble (forthcoming).
In The Sleepwalkers, a sweeping history of scientific thought from ancient times through
the Renaissance, Arthur Koestler identified a tension that has marked the most dramatic
leaps of our cosmological imagination. In reading the great works of Nicolaus
Copernicus and Johannes Kepler today, Koestler argued, we are struck as much by their
strange unfamiliarity—their embeddedness in the magic or mysticism of an earlier age—
as by their modern-sounding insights.
I detect that same doubleness—the zig-zag origami folds of old and new—in
Norbert Wiener’s classic The Human Use of Human Beings. First published in 1950 and
revised in 1954, the book is in many ways extraordinarily prescient. Wiener, the MIT
polymath, recognized before most observers that “society can only be understood through
a study of the messages and the communication facilities which belong to it.” Wiener
argued that feedback loops, the central feature of his theory of cybernetics, would play a
determining role in social dynamics. Those loops would not only connect people with
one another but connect people with machines, and—crucially—machines with
machines.
Wiener glimpsed a world in which information could be separated from its
medium. People, or machines, could communicate patterns across vast distances and use
them to fashion new items at the endpoints, without “moving a…particle of matter from
one end of the line to the other,” a vision now realized in our world of networked 3D
printers. Wiener also imagined machine-to-machine feedback loops driving huge
advances in automation, even for tasks that had previously relied on human judgment.
“The machine plays no favorites between manual labor and white-collar labor,” he
observed.
For all that, many of the central arguments in The Human Use of Human Beings
seem closer to the 19th century than the 21st. In particular, although Wiener made
reference throughout to Claude Shannon’s then-new work on information theory, he
seems not to have fully embraced Shannon’s notion of information as consisting of
irreducible, meaning-free bits. Since Wiener’s day, Shannon’s theory has come to
undergird recent advances in “Big Data” and “deep learning,” which makes it all the
more interesting to revisit Wiener’s cybernetic imagination. How might tomorrow’s
artificial intelligence be different if practitioners were to re-invest in Wiener’s guiding
vision of “information”?
~ ~ ~
When Wiener wrote The Human Use of Human Beings, his experiences of war-related
research, and of what struck him as the moral ambiguities of intellectual life amid the
military-industrial complex, were still fresh. Just a few years earlier, he had announced
in the pages of The Atlantic Monthly that he would not “publish any future work of mine
which may do damage in the hands of irresponsible militarists.” 30 He remained
ambivalent about the transformative power of new technologies, indulging in neither the
boundless hype nor the digital utopianism of later pundits.
“Progress imposes not only new possibilities for the future but new restrictions,”
he wrote, in Human Use. He was concerned about human-made restrictions as well as
technological ones, especially Cold War restrictions that threatened the flow of
information so critical to cybernetic systems: “Under the impetus of Senator [Joseph]
McCarthy and his imitators, the blind and excessive classification of military
information” was driving political leaders in the United States to adopt a “secretive frame
of mind paralleled in history only in the Venice of the Renaissance.” Wiener, echoing
many outspoken veterans of the Manhattan Project, argued that the postwar obsession
with secrecy—especially around nuclear weapons—stemmed from a misunderstanding of
the scientific process. The only genuine secret about the production of nuclear weapons,
he wrote, was whether such bombs could be built. Once that secret had been revealed,
with the bombings of Hiroshima and Nagasaki, no amount of state-imposed secrecy
would stop others from puzzling through chains of reasoning like those the Manhattan
Project researchers had followed. As Wiener memorably put it, “There is no Maginot
Line of the brain.”
To drive this point home, Wiener borrowed Shannon’s fresh ideas about
information theory. In 1948, Shannon, a mathematician and engineer working at Bell
Labs, had published a pair of lengthy articles in the Bell System Technical Journal.
Introducing the new work to a broad readership in 1949, mathematician Warren Weaver
explained that in Shannon’s formulation, “the word information…is used in a special
sense that must not be confused with its ordinary usage. In particular, information must
not be confused with meaning.” 31 Linguists and poets might be concerned about the
“semantic” aspects of communication, Weaver continued, but not engineers like
Shannon. Rather, “this word ‘information’ in communication theory relates not so much
to what you do say, as to what you could say.” In Shannon’s now-famous formulation,
the information content of a string of symbols was given by the logarithm of the number
of possible symbols from which a given string was chosen. Shannon’s key insight was
that the information of a message was just like the entropy of a gas: a measure of the
system’s disorder.
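
A small Python sketch may make the logarithm concrete. It is an illustration, not Shannon’s own notation: the function names are invented here, the first function covers the equally-likely case described above, and the second uses the standard frequency-weighted entropy formula, with probabilities estimated (as an assumption) from the symbol counts in one sample message.

```python
import math
from collections import Counter


def bits_per_symbol_uniform(alphabet_size):
    """Information per symbol when every symbol is equally likely."""
    return math.log2(alphabet_size)


def shannon_entropy(message):
    """Average bits per symbol, estimated from symbol frequencies in the message."""
    counts = Counter(message)
    total = len(message)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


print(bits_per_symbol_uniform(2))    # a coin flip: two possible symbols
print(bits_per_symbol_uniform(26))   # one letter drawn evenly from 26
print(shannon_entropy("abab"))       # two symbols, evenly mixed
print(shannon_entropy("aaaa"))       # no surprise at all
```

Neither function looks at what the symbols mean. A line of poetry and a line of gibberish with the same symbol statistics score the same, which is exactly Weaver’s point.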
Wiener borrowed this insight when composing Human Use. If information was
like entropy, then it could not be conserved—or contained. Physicists in the 19th century
had demonstrated that the total energy of a physical system must always remain the same,
a perfect balance between the start and the end of a process. Not so for entropy, which
would inexorably increase over time, an imperative that came to be known as the second
law of thermodynamics. From that stark distinction—energy is conserved, whereas
entropy must grow—followed enormous cosmic consequences. Time must flow forward;
30 Norbert Wiener, “A Scientist Rebels,” The Atlantic Monthly, January 1947.
31 Warren Weaver, “Recent Contributions to the Mathematical Theory of Communication,” in Claude
Shannon & Warren Weaver, The Mathematical Theory of Communication (Urbana, IL: University of
Illinois Press, 1949), p. 8 (emphasis in original). Shannon’s 1948 papers were republished in the same
volume.
the future cannot be the same as the past. The universe could even be careening toward a
“heat death,” some far-off time when the total stock of energy had uniformly dispersed,
achieving a state of maximum entropy, after which no further change could occur.
If information qua entropy could not be conserved, then Wiener concluded it was
folly for military leaders to try to stockpile the “scientific know-how of the nation in
static libraries and laboratories.” Indeed, “no amount of scientific research, carefully
recorded in books and papers, and then put into our libraries with labels of secrecy, will
be adequate to protect us for any length of time in a world where the effective level of
information is perpetually advancing.” Any such efforts at secrecy, classification, or the
containment of information would fail, Wiener argued, just as surely as hucksters’
schemes for perpetual-motion machines faltered in the face of the second law of
thermodynamics.
Wiener criticized the American “orthodoxy” of free-market fundamentalism in
much the same way. For most Americans, “questions of information will be evaluated
according to a standard American criterion: a thing is valuable as a commodity for what it
will bring in the open market.” Indeed, “the fate of information in the typically American
world is to become something which can be bought or sold;” most people, he observed,
“cannot conceive of a piece of information without an owner.” Wiener considered this
view to be as wrong-headed as rampant military classification. Again he invoked
Shannon’s insight: Since “information and entropy are not conserved,” they are “equally
unsuited to being commodities.”
~ ~ ~
Information cannot be conserved—so far, so good. But did Wiener really have
Shannon’s “information” in mind? The crux of Shannon’s argument, as Weaver had
emphasized, was to distinguish a colloquial sense of “information,” as message with
meaning, from an abstracted, rarefied notion of strings of symbols arrayed with some
probability and selected from an enormous universe of gibberish. For Shannon,
“information” could be quantified because its fundamental unit, the bit, was a unit of
conveyance rather than understanding.
When Wiener characterized “information” throughout Human Use, on the other
hand, he tilted time and again to a classical, humanistic sense of the term. “A piece of
information,” he wrote—tellingly, not a “bit” of information—“in order to contribute to
the general information of the community, must say something substantially different
from the community’s previous common stock of information.” This was why
“schoolboys do not like Shakespeare,” he concluded: The Bard’s couplets may depart
starkly from random bitstreams, but they had nonetheless become all too familiar to the
sense-making public and “absorbed into the superficial clichés of the time.”
At least the information content of Shakespeare had once seemed fresh. During
the postwar boom years, Wiener fretted, the “enormous per capita bulk of
communication”—ranging across newspapers and movies to radio, television, and
books—had bred mediocrity, an informational reversion to the mean. “More and more
we must accept a standardized inoffensive and insignificant product which, like the white
bread of the bakeries, is made rather for its keeping and selling properties than for its
food value.” “Heaven save us,” he pleaded, “from the first novels which are written
because a young man desires the prestige of being a novelist rather than because he has
something to say! Heaven save us likewise from the mathematical papers which are
correct and elegant but without body or spirit.” Wiener’s treatment of “information”
sounded more like Matthew Arnold in 1869 32 than Claude Shannon in 1948—more
“body and spirit” than “bit.” Wiener shared Arnold’s Romantic view of the “content
producer” as well. “Properly speaking the artist, the writer, and the scientist should be
moved by such an irresistible impulse to create that, even if they were not being paid for
their work, they would be willing to pay to get the chance to do it.” L’art pour l’art, that
19th-century cry: Artists should suffer for their work; the quest for meaningful expression
should always trump lucre.
To Wiener, this was the proper measure of “information”: body, spirit, aspiration,
expression. Yet to argue against its commodification, Wiener reverted again to
Shannon’s mathematics of information-as-entropy.
~ ~ ~
Flash forward to our day. In many ways, Wiener has been proved right. His vision of
networked feedback loops driven by machine-to-machine communication has become a
mundane feature of everyday life. From the earliest stirrings of the Internet Age,
moreover, digital piracy has upended the view that “information”—in the form of songs,
movies, books, or code—could remain contained. Put up a paywall here, and the content
will diffuse over there, all so much informational entropy that cannot be conserved.
On the other hand, enormous multinational corporations—some of the largest and
most profitable in the world—now routinely disprove Wiener’s contention that
“information” cannot be stockpiled or monetized. Ironically, the “information” they
trade in is closer to Shannon’s definition than Wiener’s, Shannon’s mathematical proofs
notwithstanding.
While Google Books may help circulate hundreds of thousands of works of
literature for free, Google itself—like Facebook, Amazon, Twitter, and their many
imitators—has commandeered a baser form of “information” and exploited it for
extraordinary profit. Petabytes of Shannon-like information—a seemingly meaningless
stream of clicks, “likes,” and retweets, collected from virtually every person who has ever
touched a networked computer—are sifted through proprietary “deep-learning”
algorithms to micro-target everything from the advertisements we see to the news stories
(fake or otherwise) we encounter while browsing the Web.
Back in the early 1950s, Wiener had proposed that researchers study the
structures and limitations of ants—in contrast to humans—so that machines might one
day achieve the “almost indefinite intellectual expansion” that people (rather than insects)
can attain. He found solace in the notion that machines could come to dominate us only
“in the last stages of increasing entropy,” when “the statistical differences among
individuals are nil.” Today’s data-mining algorithms turn Wiener’s approach on its head.
They produce profit by exploiting our reptilian brains rather than imitating our cerebral
cortexes, harvesting information from all our late-night, blog-addled, pleasure-seeking
clickstreams—leveraging precisely the tiny, residual “statistical differences among
individuals.”
32 Matthew Arnold, Culture and Anarchy, Jane Garnett, ed. (Oxford, U.K.: Oxford University Press, 2006).
To be sure, some recent achievements in artificial intelligence have been
remarkably impressive. Computers can now produce visual artworks and musical
compositions akin to those of recognized masters, creating just the sort of “information”
that Wiener most prized. But by far the largest impact on society to date has come from
the collection and manipulation of Shannon-like information, which has reshaped our
shopping habits, political participation, personal relationships, expectations of privacy,
and more.
What might “deep learning” evolve into, if the fundamental currency becomes
“information” as Wiener defined it? How might the field shift if re-animated by
Wiener’s deep moral convictions, informed as they were by his prescient concerns about
rampant militarism, runaway corporate profit-seeking, the self-limiting features of
secrecy, and the reduction of human expression to interchangeable commodities?
Perhaps “deep learning” might then become the cultivation of meaningful information
rather than the relentless pursuit of potent, if meaningless, bits.
In the aforementioned Connecticut discussion on The Human Use of Human Beings, Neil
Gershenfeld provided some fresh air, of a kind, by professing that he hated the book,
which remark was met by universal laughter—as was his observation that computer
science was one of the worst things to happen to computers, or science. His overall
contention was that Wiener missed the implications of the digital revolution that was
happening around him—although some would say this charge can’t be leveled at
someone on the ground floor and lacking clairvoyance.
“The tail wagging the dog of my life,” he told us, “has been Fab Labs and the
maker movement, and [when] Wiener talks about the threat of automation he misses the
inverse, which is that access to the means for automation can empower people, and in
Fab Labs, the corner I’ve been involved in, that’s an exponential.”
In 2003, I visited Neil at MIT, where he runs the Center for Bits and Atoms.
Hours later, I emerged from what had been an exuberant display of very weird stuff. He
showed me the work of one student in his popular rapid-prototyping class (“How to
Make Almost Anything”), a sculptor with no engineering background, who had made a
portable personal space for screaming that saves up your screams and plays them back
later. Another student in the class had made a Web browser that lets parrots navigate
the Net. Neil himself was doing fundamental research on the roadmap to that sci-fi
staple, a “universal replicator.” It was a visit that took me a couple of years to get my
head around.
Neil manages a global network of Fab Labs—small-scale manufacturing systems,
enabled by digital technologies, which give people the wherewithal to build whatever
they’d like. As guru of the maker movement, which merges digital communication and
computation with fabrication, he sometimes feels outside the current heated debate on AI
safety. “My ability to do research rests on tools that augment my capabilities,” he says.
“Asking whether or not they are intelligent is as fruitful as asking how I know I exist—
amusing philosophically, but not testable empirically.” What interests him is “how bits
and atoms relate—the boundary between digital and physical. Scientifically, it’s the most
exciting thing I know.”
SCALING
Neil Gershenfeld
Neil Gershenfeld is a physicist and director of MIT’s Center for Bits and Atoms. He is
the author of FAB, co-author (with Alan Gershenfeld & Joel Cutcher-Gershenfeld) of
Designing Reality, and founder of the global fab lab network.
Discussions about artificial intelligence have been oddly ahistorical. They could better
be described as manic-depressive; depending on how you count, we’re now in the fifth
boom-bust cycle. Those swings mask the continuity in the underlying progress and the
implications for where it’s headed.
The cycles have come in roughly decade-long waves. First there were
mainframes, which by their very existence were going to automate away work. That ran
into the reality that it was hard to write programs to do tasks that were simple for people
to do. Then came expert systems, which were going to codify and then replace the
knowledge of experts. These ran into difficulty in assembling that knowledge and
reasoning about cases not already covered. Perceptrons sought to get around these
problems by modeling how the brain learns, but they were unable to do much of
anything. Multilayer perceptrons could handle test problems that had tripped up those
simpler networks, but their demonstrations did poorly on unstructured, real-world
problems. We’re now in the deep-learning era, which is delivering on many of the early
AI promises but in a way that’s considered hard to understand, with consequences
ranging from intellectual to existential threats.
Each of these stages was heralded as a revolutionary advance over the limitations
of its predecessors, yet all effectively do the same thing: They make inferences from
observations. How these approaches relate can be understood by how they scale—that is,
how their performance depends on the difficulty of the problem they’re addressing. Both
a light switch and a self-driving car must determine their operator’s intentions, but the
former has just two options to choose from, whereas the latter has many more. The AI-boom
phases have started with promising examples in limited domains; the bust phases
came with the failure of those demonstrations to handle the complexity of less-structured,
practical problems.
Less apparent is the steady progress we’ve made in mastering scaling. This
progress rests on the technological distinction between linear and exponential functions—
a distinction that was becoming evident at the dawn of AI but with implications for AI
that weren’t appreciated until many years later.
In one of the founding documents of the study of intelligent machines, The
Human Use of Human Beings, Norbert Wiener does a remarkable job of identifying many
of the most significant trends to arise since he wrote it, along with noting the people
responsible for them and then consistently failing to recognize why these people’s work
proved to be so important. Wiener is credited with creating the field of cybernetics; I’ve
never understood what that is, but what’s missing from the book is at the heart of how AI
has progressed. This history matters because of the echoes of it that persist to this day.
Claude Shannon makes a cameo appearance in the book, in the context of his
thoughts about the prospects for a chess-playing computer. Shannon was doing
something much more significant than speculating at the time: He was laying the
foundations for the digital revolution. As a graduate student at MIT, he worked for
Vannevar Bush on the Differential Analyzer. This was one of the last great analog
computers, a room full of gears and shafts. Shannon’s frustration with the difficulty of
solving problems this way led him in 1937 to write what might be the best master’s thesis
ever. In it, he showed how electrical circuits could be designed to evaluate arbitrary
logical expressions, introducing the basis for universal digital logic.
After MIT, Shannon studied communications at Bell Labs. Analog telephone
calls degraded with distance; the farther they traveled, the worse they sounded. Rather
than continue to improve them incrementally, Shannon showed in 1948 that by
communicating with symbols rather than continuous quantities, the behavior is very
different. Converting speech waveforms to the binary values of 1 and 0 is an example,
but many other sets of symbols can be (and are) used in digital communications. What
matters is not the particular symbols but rather the ability to detect and correct errors.
Shannon found that if the noise is above a threshold (which depends on the system
design), then there are certain to be errors. But if the noise is below a threshold, then a
linear increase in the physical resources representing the symbol results in an exponential
decrease in the likelihood of making an error in correctly receiving the symbol. This
relationship was the first of what we’d now call a threshold theorem.
Such scaling falls off so quickly that an error can become so improbable that it
effectively never happens. Each symbol sent multiplies rather than adds to the
certainty, so that the probability of a mistake can go from 0.1 to 0.01 to 0.001, and so
forth. This exponential decrease in communication errors made possible an exponential
increase in the capacity of communication networks. And that eventually solved the
problem of where the knowledge in an AI system came from.
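One way to see this threshold behavior in miniature is a repetition code: send each bit several times and take a majority vote. What follows is only an illustrative sketch, not the construction Shannon used; the function name majority_error, the flip probability p, and the choice of a repetition code are all assumptions made for the example.

```python
from math import comb


def majority_error(p: float, n: int) -> float:
    """Probability that a majority vote over n noisy copies of one bit is wrong.

    p is the (assumed) chance that any single copy is flipped by noise;
    n is the number of copies sent, i.e. the physical resources spent.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd so the vote cannot tie")
    # The vote fails when more than half of the copies are flipped.
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


# Below the threshold (p < 0.5 for this toy code), more copies help.
for n in (1, 3, 5, 7, 9):
    print(n, majority_error(0.1, n))
```

With p = 0.1 the error falls from 0.1 with one copy to about 0.028 with three, and keeps shrinking as copies are added. With p = 0.6, above the threshold for this toy code, adding copies makes the vote worse rather than better.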
For many years, the fastest way to speed up a computation was to do nothing—
just wait for computers to get faster. In the same way, there were years of AI projects
that aimed to accumulate everyday knowledge by laboriously entering pieces of
information. That didn’t scale; it could progress only as fast as the number of people
doing the entering. But when phone calls, newspaper stories, and mail messages all
moved onto the Internet, everyone doing any of those things became a data generator.
The result was an exponential rather than a linear rate of knowledge accumulation.
John von Neumann also has a cameo in The Human Use of Human Beings, for
game theory. What Wiener missed here was von Neumann’s seminal role in digitizing
computation. Whereas analog communication degraded with distance, analog computing
(like the Differential Analyzer) degraded with time, accumulating errors as it progressed.
Von Neumann presented in 1952 a result corresponding to Shannon’s for computation
(they had met at the Institute for Advanced Study, in Princeton), showing that it was
possible to compute reliably with an unreliable computing device by using symbols rather
than continuous quantities. This was, again, a scaling argument, with a linear increase in
the physical resources representing the symbol resulting in an exponential reduction in
the error rate as long as the noise was below a threshold. That’s what makes it possible
to have a billion transistors in a computer chip, with the last one as useful as the first one.
This relationship led to an exponential increase in computing performance, which solved
a second problem in AI: how to process exponentially increasing amounts of data.
The third problem that scaling solved for AI was coming up with the rules for
reasoning without having to hire a programmer for each problem. Wiener recognized the
role of feedback in machine learning, but he missed the key role of representation. It’s
not possible to store all possible images in a self-driving car, or all possible sounds in a
conversational computer; they have to be able to generalize from experience. The “deep”
part of deep learning refers not to the (hoped-for) depth of insight but to the depth of the
mathematical network layers used to make predictions. It turned out that a linear increase
in network complexity led to an exponential increase in the expressive power of the
network.
If you lose your keys in a room, you can search for them. If you’re not sure
which room they’re in, you have to search all the rooms in a building. If you’re not sure
which building they’re in, you have to search all the rooms in all the buildings in a city.
If you’re not sure which city they’re in, you have to search all the rooms in all the
buildings in all the cities. In AI, finding the keys corresponds to things like a car safely
following the road, or a computer correctly interpreting a spoken command, and the
rooms and buildings and cities correspond to all of the options that have to be considered.
This is called the curse of dimensionality.
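The arithmetic behind the lost keys can be sketched in a few lines. The sizes here are hypothetical (ten rooms per building, ten buildings per city, ten cities), as is the function name places_to_search; the point is only that each added level of uncertainty multiplies the work rather than adding to it.

```python
def places_to_search(options_per_level: int, levels: int) -> int:
    """Number of candidates an exhaustive search must visit.

    Each level of uncertainty (room, building, city, ...) multiplies
    the count by the number of options at that level.
    """
    return options_per_level**levels


# Hypothetical sizes: 10 options at every level.
for levels, name in enumerate(["rooms", "buildings", "cities"], start=1):
    print(name, places_to_search(10, levels))
```

A problem with a hundred such levels, each with only ten options, would already have more candidates than could ever be checked one by one.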
The solution to the curse of dimensionality came in using information about the
problem to constrain the search. The search algorithms themselves are not new. But
when applied to a deep-learning network, they adaptively build up representations of
where to search. The price of this is that it’s no longer possible to exactly solve for the
best answer to a problem, but typically all that’s needed is an answer that’s good enough.
It shouldn’t be surprising that, taken together, these scaling laws have allowed
machines to become effectively as capable as the corresponding stages of biological
complexity. Neural networks started out with a goal of modeling how the brain works.
That goal was abandoned as they evolved into mathematical abstractions unrelated to
how neurons actually function. But now there’s a kind of convergence that can be
thought of as forward- rather than reverse-engineering biology, as the results of deep
learning echo brain layers and regions.
One of the most difficult research projects I’ve managed paired what we’d now
call data scientists with AI pioneers. It was a miserable experience in moving goalposts.
As the former progressed in solving long-standing problems posed by the latter, this was
deemed to not count because it wasn’t accompanied by corresponding leaps in
understanding the solutions. What’s the value of a chess-playing computer if you can’t
explain how it plays chess?
The answer of course is that it can play chess. There is interesting emerging
research that is applying AI to AI—that is, training networks to explain how they operate.
But both brains and computer chips are hard to understand by watching their inner
workings; they’re easily interpreted only by observing their external interfaces. We come
to trust (or not) brains and computer chips alike based on experience that tests them
rather than on explanations for how they work.
Many branches of engineering are making a transition from what’s called
imperative to declarative or generative design. This means that instead of explicitly
designing a system with tools like CAD files, circuit schematics, and computer code, you
describe what you want the system to do and then an automated search is done for
designs that satisfy your goals and restrictions. This approach becomes necessary as
design complexity exceeds what can be understood by a human designer. While that
might sound like a risk, human understanding comes with its own limits; engineering
design is littered with what appeared to be good insights that have had bad consequences.
Declarative design rests on all the advances in AI, plus the improving fidelity of
simulations to virtually test designs.
The mother of all design problems is the one that resulted in us. The way we’re
designed resides in one of the oldest and most conserved parts of the genome, called the
Hox genes. These are genes that regulate genes, in what are called developmental
programs. Nothing in your genome stores the design of your body; your genome stores,
rather, a series of steps to follow that results in your body. This is an exact parallel to
how search is done in AI. There are too many possible body plans to search over, and
most modifications would be either inconsequential or fatal. The Hox genes are a
representation of a productive place for evolutionary search. It’s a kind of natural
intelligence at the molecular level.
AI has a mind-body problem, in that it has no body. Most work on AI is done in
the cloud, running on virtual machines in computer centers where data are funneled. Our
own intelligence is the result of a search algorithm (evolution) that was able to change
our physical form as well as our programming—those are inextricably linked. If the
history of AI can be understood as the working of scaling laws rather than a succession of
fashions, then its future can be seen in the same way. What’s now being digitized, after
communication and computation, is fabrication, bringing the programmability of bits to
the world of atoms. By digitizing not just designs but the construction of materials, the
same lessons that von Neumann and Shannon taught us apply to exponentially increasing
fabricational complexity.
I’ve defined digital materials to be those constructed from a discrete set of parts
reversibly joined with a discrete set of relative positions and orientations. These
attributes allow the global geometry to be determined from local constraints, assembly
errors to be detected and corrected, heterogeneous materials to be joined, and structures
to be disassembled rather than disposed of when they’re no longer needed. The amino
acids that are the foundation of life and the Lego bricks that are the foundation of play
share these properties.
What’s interesting about amino acids is that they’re not interesting. They have
attributes that are typical but not unusual, such as attracting or repelling water. But just
twenty types of them are enough to make you. In the same way, twenty or so types of
digital-material part types—conducting, insulating, rigid, flexible, magnetic, etc.—are
enough to assemble the range of functions that go into making modern technologies like
robots and computers.
The connection between computation and fabrication was foreshadowed by the
very pioneers whose work the edifice of computing is based on. Wiener hinted at this by
linking material transportation with message transportation. John von Neumann is
credited with modern computer architecture, something he actually wrote very little
about; the final thing he studied, and wrote about beautifully and at length, was self-reproducing
systems. As an abstraction of life, he modeled a machine that can
communicate a computation that constructs itself. And the final thing Alan Turing, who
is credited with the theoretical framework for computer science, studied was how the
instructions in genes can give rise to physical forms. These questions address a topic
absent from a typical computer-science education: the physical configuration of a
computation.
Von Neumann and Turing posed their questions as theoretical studies, because it
was beyond the technology of their day to realize them. But with the convergence of
communication and computation with fabrication, these investigations are now becoming
accessible experimentally. Making an assembler that can assemble itself from the parts
that it’s assembling is a focus of my lab, along with collaborations to develop synthetic
cells.
The prospect of physically self-reproducing automata is potentially much scarier
than fears of out-of-control AI, because it moves the intelligence out here to where we
live. It could be a roadmap leading to Terminator’s Skynet robotic overlords. But it’s
also a more hopeful prospect, because an ability to program atoms as well as bits enables
designs to be shared globally while locally producing things like energy, food, and
shelter—all of these are emerging as exciting early applications of digital fabrication.
Wiener worried about the future of work, but he didn’t question implicit assumptions
about the nature of work which are challenged when consumption can be replaced by
creation.
History suggests that neither utopian nor dystopian scenarios prevail; we
generally end up muddling along somewhere in between. But history also suggests that
we don’t have to wait on history. Gordon Moore in 1965 was able to use five years of the
doubling of the specifications of integrated circuits to project what turned out to be fifty
years of exponential improvements in digital technologies. We’ve spent many of those
years responding to, rather than anticipating, its implications. We have more data
available now than Gordon Moore did to project fifty years of doubling the performance
of digital fabrication. With the benefit of hindsight, it should be possible to avoid the
excesses of digital computing and communications this time around, and, from the outset,
address issues like access and literacy.
If the maker movement is the harbinger of a third digital revolution, the success of
AI in meeting many of its own early goals can be seen as the crowning achievement of
the first two digital revolutions. Although machine making and machine thinking might
appear to be unrelated trends, they lie in each other’s futures. The same scaling trends
that have made AI possible suggest that the current mania is a phase that will pass, to be
followed by something even more significant: the merging of artificial and natural
intelligence.
It was an advance for atoms to form molecules, molecules to form organelles,
organelles to form cells, cells to form organs, organs to form organisms, organisms to
form families, families to form societies, and societies to form civilizations. This grand
evolutionary loop can now be closed, with atoms arranging bits arranging atoms.
While Danny Hillis was an undergraduate at MIT, he built a computer out of Tinkertoys.
It has around 10,000 wooden parts, plays tic-tac-toe, and never loses; it’s now in the
Computer History Museum, in Mountain View, California.
As a graduate student at the MIT Computer Science and Artificial Intelligence
Laboratory in the early 1980s, Danny designed a massively parallel computer with
64,000 processors. He named it the Connection Machine and founded what may have
been the first AI company—Thinking Machines Corporation—to produce and market it.
This was despite a lunch he had with Richard Feynman, at which the celebrated physicist
remarked, “That is positively the dopiest idea I ever heard.” Maybe “despite” is the
wrong word, since Feynman had a well-known predilection for playing with dopey ideas.
In the event, he showed up on the day the company was incorporated and stayed on, for
summer jobs and special assignments, to make invaluable contributions to its work.
Danny has since established a number of technology companies, of which the
latest is Applied Invention, which partners with commercial enterprises to develop
technological solutions to their most intractable problems. He holds hundreds of U.S.
patents, covering parallel computers, touch interfaces, disk arrays, forgery prevention
methods, and a slew of electronic and mechanical devices. His imagination is apparently
boundless, and here he sketches some possible scenarios that will result from our pursuit
of a better and better AI.
“Our thinking machines are more than metaphors,” he says. “The question is not,
‘Will they be powerful enough to hurt us?’ (they will), or whether they will always act in
our best interests (they won’t), but whether over the long term they can help us find our
way—where we come out on the Panacea/Apocalypse continuum.”
THE FIRST MACHINE INTELLIGENCES
W. Daniel Hillis
W. Daniel “Danny” Hillis is an inventor, entrepreneur, and computer scientist, Judge
Widney Professor of Engineering and Medicine at USC, and author of The Pattern on the
Stone: The Simple Ideas That Make Computers Work.
I have spoken of machines, but not only of machines having brains of brass and thews of
iron. When human atoms are knit into an organization in which they are used, not in
their full right as responsible human beings, but as cogs and levers and rods, it matters
little that their raw material is flesh and blood. What is used as an element in a machine,
is in fact an element in the machine. Whether we entrust our decisions to machines of
metal, or to those machines of flesh and blood which are bureaus and vast laboratories
and armies and corporations, we shall never receive the right answers to our questions
unless we ask the right questions…. The hour is very late, and the choice of good and
evil knocks at our door.
—Norbert Wiener, The Human Use of Human Beings
Norbert Wiener was ahead of his time in recognizing the potential danger of emergent
intelligent machines. I believe he was even further ahead in recognizing that the first
artificial intelligences had already begun to emerge. He was correct in identifying the
corporations and bureaus that he called “machines of flesh and blood” as the first
intelligent machines. He anticipated the dangers of creating artificial superintelligences
with goals not necessarily aligned with our own.
What is now clear, whether or not it was apparent to Wiener, is that these
organizational superintelligences are not just made of humans, they are hybrids of
humans and the information technologies that allow them to coordinate. Even in
Wiener’s time, the “bureaus and vast laboratories and armies and corporations” could not
operate without telephones, telegraphs, radios, and tabulating machines. Today they
could not operate without networks of computers, databases, and decision support
systems. These hybrid intelligences are technologically augmented networks of humans.
These artificial intelligences have superhuman powers. They can know more than
individual humans; they can sense more; they can make more complicated analyses and
more complex plans. They can have vastly more resources and power than any single
individual.
Although we do not always perceive it, hybrid superintelligences such as nation
states and corporations have their own emergent goals. Although they are built by and
for humans, they often act like independent intelligent entities, and their actions are not
always aligned to the interests of the people who created them. The state is not always
for the citizen, nor the company for the shareholder. Nor do not-for-profits, religious
orders, or political parties always act in furtherance of their founding principles.
Intuitively, we recognize that their actions are guided by internal goals, which is why we
personify them, both legally and in our habits of thought. When talking about “what
China wants,” or “what General Motors is trying to do,” we are not speaking in
metaphors. These organizations act as intelligences that perceive, decide, and act. Like
the goals of individual humans, the goals of organizations are complex and often self-contradictory,
but they are true goals in the sense that they direct action. Those goals
depend somewhat on the goals of the people within the organization, but they are not
identical.
Any American knows how loose the tie is between the actions of the U.S.
government and the diverse and often contradictory aims of its citizens. That is also true
of corporations. For-profit corporations nominally serve multiple constituencies,
including shareholders, senior executives, employees, and customers. These corporations
differ in how they balance their loyalties and often behave in ways that serve none of
their constituents. The “neurons” that carry their corporate thought are not just the
human employees or the technologies that connect them; they are also coded into the
policies, incentive structures, culture, and procedural habits of the corporation. The
emergent corporate goals do not always reflect the values of the people who implement
them. For instance, an oil company led and staffed by people who care about the
environment may have incentive structures or policies that cause it to compromise
environmental safety for the sake of corporate earnings. The components’ good
intentions are not a guarantee of the emergent system’s good behavior.
Governments and corporations, both built partly of humans, are naturally
motivated to at least appear to share the goals of the humans they depend upon. They
could not function without the people, so they need to keep them cooperative. When
such organizations appear to behave altruistically, this is often part of their motive. I
once complimented the CEO of a large corporation on the contribution his company
made toward a humanitarian relief effort. The CEO responded, without a trace of irony,
“Yes. We have decided to do more things like that to make our brand more likeable.”
Individuals who compose a hybrid superintelligence may occasionally exert a
“humanizing” influence—for example, an employee may break company policies to
accommodate the needs of another human. The employee may act out of true human
empathy, but we should not attribute any such empathy to the superintelligence itself.
These hybrid machines have goals, and their citizens/customers/employees are some of
the resources they use to accomplish them.
We are close to being able to build superintelligences out of pure information
technology, without human components. This is what people normally refer to as
“artificial intelligence,” or AI. It is reasonable to ask what the attitudes of the
hypothetical machine superintelligences will be toward humans. Will they, too, see
humans as useful resources and a good relationship with us as worth preserving? Will
they be constructed to have goals that are aligned with our own? Will a superintelligence
even see these questions as important? What are the “right questions” that we should be
asking? I believe that one of the most important is this: What relationship will various
superintelligences have to one another?
It is interesting to consider how the hybrid superintelligences currently deal with
conflicts among themselves. Today, much of the ultimate power rests in the nation
states, which claim authority over a patch of ground. Whether they are optimized to act
in the interests of their citizens or those of a despotic ruler, nation states assert priority
over other intelligences’ desires or goals within their geographic dominion. They claim a
monopoly on the use of force and recognize only other nation states as peers. They are
willing, if necessary, to demand great sacrifices of their citizens to enforce their authority,
even to the point of sacrificing their citizens’ lives.
This geographical division of authority made logical sense when most of the
actors were humans who spent their lives within a single nation state, but now that the
actors of importance include geographically distributed hybrid intelligences such as
multinational corporations, that logic is less obvious. Today we live in a complex
transitional period, when distributed superintelligences still largely rely on the nation
states to settle the arguments arising among them. Often, those arguments are resolved
differently in different jurisdictions. It is becoming more difficult even to assign
individual humans to nation states: International travelers living and working outside
their native country, refugees, and immigrants (documented and not) are still dealt with
as awkward exceptions. Superintelligences built purely of information technology will
prove even more awkward for the territorial system of authority, since there is no reason
why they need to be tied to physical resources in a single country—or even to any
particular physical resources at all. An artificial intelligence might well exist “in the
cloud” rather than at any physical location.
I can imagine at least four scenarios for how machine superintelligences will
relate to hybrid superintelligences.
In one obvious scenario, multiple machine intelligences will ultimately be
controlled by, and allied with, individual nation states. In this state/AI scenario, one can
envision American and Chinese super-AIs wrestling each other for resources on behalf of
their state. In some sense, these AIs would be citizens of their nation state in the way that
many commercial corporations often act as “corporate citizens” today. In this scenario,
the host nation states would presumably give the machine superintelligences the
resources they needed to work for the state’s advantage. Or, to the degree that the
superintelligences can influence their state governments, they will presumably do so to
enhance their own power, for instance by garnering a larger share of the state’s resources.
Nation states’ AIs might not want competing AIs to grow up within their jurisdiction. In
this scenario, the superintelligences become an extension of the state, and vice versa.
The state/AI scenario seems plausible, but it is not our current course. Our most
powerful and rapidly improving artificial intelligences are controlled by for-profit
corporations. This is the corporate/AI scenario, in which the balance of power between
nation states and corporations becomes inverted. Today, the most powerful and
intelligent collections of machines are probably owned by Google, but companies like
Amazon, Baidu, Microsoft, Facebook, Apple, and IBM may not be far behind. These
companies all see a business imperative to build artificial intelligences of their own. It is
easy to imagine a future in which corporations independently build their own machine
intelligences, protected within firewalls preventing the machines from taking advantage
of one another’s knowledge. These machines will be designed to have goals aligned with
those of the corporation. If this alignment is effective, nation states may continue to lag
behind in developing their own artificial-intelligence capability and instead depend on
their “corporate citizens” to do it for them. To the extent that corporations successfully
control the goals, they will become more powerful and autonomous than nation states.
Another scenario, perhaps the one people fear the most, is that artificial
intelligences will not be aligned with either humans or hybrid superintelligences but will
act solely in their own interest. They might even merge into a single machine
superintelligence, since there may be no technical requirement for machine intelligences
to maintain distinct identities. The attitude of a self-interested super-AI toward hybrid
superintelligences is likely to be competitive. Humans might be seen as minor
annoyances, like ants at a picnic, but hybrid superintelligences—like corporations,
organized religions, and nation states—could be existential threats. Like hybrid
superintelligences, AIs might see humans mostly as useful tools to accomplish their
goals, as pawns in their competition with the other superintelligences. Or we might
simply be irrelevant. It is not impossible that a machine intelligence has already emerged
and we simply do not recognize it as such. It may not wish to be noticed, or it may be so
alien to us that we are incapable of perceiving it. This makes the self-interested AI
scenario the most difficult to imagine. I believe the easy-to-imagine versions, like the
humanoid intelligent robots of science fiction, are the least likely. Our most complex
machines, like the Internet, have already grown beyond the detailed understanding of a
single human, and their emergent behaviors may be well beyond our ken.
The final scenario is that machine intelligences will not be allied with one another
but instead will work to further the goals of humanity as a whole. In this optimistic
scenario, AI could help us restore the balance of power between the individual and the
corporation, between the citizen and the state. It could help us solve the problems that
have been created by hybrid superintelligences that subvert the goals of humans. In this
scenario, AIs will empower us by giving us access to processing capacity and knowledge
currently available only to corporations and states. In effect, they could become
extensions of our own individual intelligences, in furtherance of our human goals. They
could make our weak individual intelligences strong. This prospect is both exciting and
plausible. It is plausible because we have some choice in what we build, and we have a
history of using technology to expand and augment our human capacities. As airplanes
have given us wings and engines have given us muscles to move mountains, so our
network of computers may amplify and extend our minds. We may not fully understand
or control our destiny, but we have a chance to bend it in the direction of our values. The
future is not something that will happen to us; it is something that we will build.
Why Wiener Saw What Others Missed
There is in electrical engineering a split which is known in Germany as the split between
the technique of strong currents and the technique of weak currents, and which we know
as the distinction between power and communication engineering. It is this split which
separates the age just past from that in which we are now living.
—Norbert Wiener, Cybernetics, or Control and
Communication in the Animal and the Machine
Cybernetics is the study of how the weak can control the strong. Consider the
defining metaphor of the field: the helmsman guiding a ship with a tiller. The
helmsman’s goal is to control the heading of the ship, to keep it on the right course. The
information, the message that is sent to the helmsman, comes from the compass or the
stars, and the helmsman closes the feedback loop by sending the steering messages
through the gentle force of his hand on the tiller. In this picture, we see the ship tossing
in powerful wind and waves in the real world, controlled by the communication system
of messages in the world of information.
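The helmsman's loop can be caricatured as a proportional controller: read the heading, compare it with the desired course, and nudge the tiller in proportion to the error. This is a toy sketch and not a model of a real ship; the function name hold_course, the gain of 0.5, and the assumption that each nudge changes the heading directly are all invented for illustration.

```python
def hold_course(target: float, heading: float,
                gain: float = 0.5, steps: int = 20) -> list[float]:
    """Simulate a helmsman correcting the heading toward a target course.

    At each step the helmsman observes the error (the message from the
    compass) and applies a correction proportional to it (the gentle
    pressure on the tiller). Headings are in degrees.
    """
    history = []
    for _ in range(steps):
        error = target - heading      # information flowing in
        heading += gain * error       # small correction flowing out
        history.append(heading)
    return history


course = hold_course(target=90.0, heading=0.0)
print(course[0], course[-1])
```

Each pass around the loop halves the remaining error, so the heading closes in on the target even though no single correction is large.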
Yet the distinction between “real” and “information” is mostly a difference in
perspective. The signals that carry messages, like the light of the stars and pressure of the
hand on the tiller, exist in a world of energy and forces, as does the helmsman. The weak
forces that control the rudder are as real and physical as the strong forces that toss the
ship. If we shift our cybernetics perspective from the ship to the helmsman, the pressures
on the rudder become a strong force of muscles controlled by the weak signals in the
mind of the helmsman. These messages in the helmsman’s mind are amplified into a
physical force strong enough to steer the ship. Or instead, we can zoom out and take a
large cybernetics perspective. We might see the ship itself as part of a vast trade
network, part of a feedback loop that regulates the price of commodities through the flow
of goods. In this perspective, the tiny ship is merely a messenger. So, the distinction
between the physical world and the information world is a way to describe the
relationship between the weak and the strong.
Wiener chose to view the world from the vantage point and scale of the individual
human. As a cyberneticist, he took the perspective of the weak protagonist embedded
within a strong system, trying to make the best of limited powers. He incorporated this
perspective in his very definition of information. “Information,” he said, “is a name for
the content of what is exchanged with the outer world as we adjust to it, and make our
adjustment felt upon it.” In his words, information is what we use to “live effectively
within that environment.” 33 For Wiener, information is a way for the weak to effectively
cope with the strong. This viewpoint is also reflected in Gregory Bateson’s definition of
information as “a difference that makes a difference,” by which he meant the small
difference that makes a big difference.
The goal of cybernetics was to create a tiny model of the system using “weak
currents” to amplify and control “strong currents” of the real world. The central insight
was that a control problem could be solved by building an analogous system in the
information space of messages and then amplifying solutions into the larger world of
reality. Inherent in the notion of a control system is the concept of amplification, which
makes the small big and the weak strong. Amplification allows the difference that makes
a difference to make a difference.
In this way of looking at the world, a control system needed to be as complex as
the system it controlled. Cyberneticist W. Ross Ashby proved that this was true in a
precise mathematical sense, in what is now called Ashby’s Law of Requisite Variety, or
sometimes the First Law of Cybernetics. The law tells us that to control a system
completely, the controller must be as complex as the controlled. Thus cyberneticists
tended to see control systems as a kind of analog of the systems they governed, like the
homunculus—the hypothetical little person inside the brain who controls the actual
person.
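A toy calculation makes the counting argument behind the law concrete. The sketch below is an illustrative assumption, not Ashby's own formulation: disturbances and responses are simply numbered, the outcome rule `(d - r) % n` is chosen for convenience, and the names `outcome` and `min_outcome_variety` are hypothetical. It searches every possible strategy and reports the fewest distinct outcomes a controller with a given number of responses can hold the system to.

```python
from itertools import product
from math import ceil


def outcome(d: int, r: int, n: int) -> int:
    """Toy outcome rule (an assumption): disturbance d met by response r."""
    return (d - r) % n


def min_outcome_variety(n: int, k: int) -> int:
    """Fewest distinct outcomes achievable with k responses against n disturbances.

    Brute force: a strategy assigns one response to each disturbance,
    so there are k**n strategies to check.
    """
    best = n
    for strategy in product(range(k), repeat=n):
        outcomes = {outcome(d, r, n) for d, r in enumerate(strategy)}
        best = min(best, len(outcomes))
    return best


# Six disturbances, controllers of increasing variety.
results = {k: min_outcome_variety(6, k) for k in (1, 2, 3, 6)}
bounds = {k: ceil(6 / k) for k in (1, 2, 3, 6)}
print(results)
print(bounds)
```

In this toy setting the best achievable variety of outcomes matches the bound of n divided by k, rounded up. Only a controller with as many responses as there are disturbances can force a single outcome, which is the sense in which the controller must be as complex as the controlled.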
This notion of analogous structure is sometimes confused with the notion of
analog encoding of messages, but the two are logically distinct. Norbert Wiener was
much impressed with Vannevar Bush’s Digital Differential Analyzer, which could be
reconfigured to match the structure of whatever problem it was given to solve but used
digital signal encoding. Signals could be simplified to openly represent the relevant
distinctions, allowing them to be more accurately communicated and stored. In digital
signals, one needed only to preserve the difference in signals that made a difference. It is
this distinction in signal coding that we commonly use to distinguish “analog” versus
“digital.” Digital signal encoding was entirely compatible with cybernetic thinking—in
33 The Human Use of Human Beings (Boston: Houghton Mifflin, 1954), pp. 17-18.
fact, it enabled such thinking. What was constraining to cybernetics was the presumption of an
analogy of structure between the controller and the controlled. By the 1930s, Kurt Gödel,
Alonzo Church, and Alan Turing had all described universal systems of computation, in
which the computation required no structural analogy to functions that were computed.
These universal computers could also compute the functions of control.
The analogy of structure between the controller and the controlled was central to
the cybernetic perspective. Just as digital coding collapses the space of possible
messages into a simplified version that represents only the difference that makes a
difference, so the control system collapses the state space of a controlled system into a
simplified model that reflects only the goals of the controller. Ashby’s Law does not
imply that every controller must model every state of the system but only those states that
matter for advancing the controller’s goals. Thus, in cybernetics, the goal of the
controller becomes the perspective from which the world is viewed.
Norbert Wiener adopted the perspective of the individual human relating to vast
organizations and trying to “live effectively within that environment.” He took the
perspective of the weak trying to influence the strong. Perhaps this is why he was able to
notice the emergent goals of the “machines of flesh and blood” and anticipate some of the
human challenges posed by these new intelligences, hybrid machine intelligences with
goals of their own.
Venki Ramakrishnan is a Nobel Prize-winning biologist whose many scientific
contributions include his work on the atomic structure of the ribosome—in effect, a huge
molecular machine that reads our genes and makes proteins. His work would have been
impossible without powerful computers. The Internet made his own work a lot easier
and, he notes, acted as a leveler internationally: “When I grew up in India, if you wanted
to get a book, it would show up six months or a year after it had already come out in the
West. . . . Journals would arrive by surface mail a few months later. I didn’t have to
deal with it, because I left India when I was nineteen, but I know Indian scientists had to
deal with it. Today they have access to information at the click of a button. More
important, they have access to lectures. They can listen to Richard Feynman. That
would have been a dream of mine when I was growing up. They can just watch Richard
Feynman on the Web. That’s a big leveling in the field.” And yet. . . “Along with the
benefits [of the Web], there is now a huge amount of noise. You have all of these people
spouting pseudoscientific jargon and pushing their own ideas as if they were science.”
As president of the Royal Society, Venki worries, too, about the broader issue of
trust: public trust in evidence-based scientific findings, but also trust among scientists,
bolstered by rigorous checking of one another’s conclusions—trust that is in danger of
eroding because of the “black box” character of deep-learning computers. “This
[erosion] is going to happen more and more, as data sets get bigger, as we have genome-wide
studies, population studies, and all sorts of things,” he says. “How do we, as a
science community, grapple with this and communicate to the public a sense of what
science is about, what is reliable in science, what is uncertain in science, and what is just
plain wrong in science?”
WILL COMPUTERS BECOME OUR OVERLORDS?
Venki Ramakrishnan
Venki Ramakrishnan is a scientist at the Medical Research Council Laboratory of
Molecular Biology, Cambridge University; recipient of the Nobel Prize in Chemistry
(2009); current president of the Royal Society; and the author of Gene Machine: The
Race to Discover the Secrets of the Ribosome.
A former colleague of mine, Gérard Bricogne, used to joke that carbon-based intelligence
was simply a catalyst for the evolution of silicon-based intelligence. For quite a long
time, both Hollywood movies and scientific Jeremiahs have been predicting our eventual
capitulation to our computer overlords. We all await the singularity, which always seems
to be just over the horizon.
In a sense, computers have already taken over, facilitating virtually every aspect
of our lives—from banking, travel, and utilities to the most intimate personal
communication. I can see and talk to my grandson in New York for free. I remember
when I first saw the 1968 movie 2001: A Space Odyssey, the audience laughed at the
absurdly cheap cost of a picturephone call from space: $1.70, at a time when a long-distance
call within the U.S. was $3 per minute.
However, the convenience and power of computers is also something of a
Faustian bargain, for it comes with a loss of control. Computers prevent us from doing
things we want. Try getting on a flight if you arrive at the airport and the airline
computer systems are down, as happened not so long ago to British Airways at Heathrow.
The planes, pilots, and passengers were all there; even the air-traffic controls were
working. But no flights for that airline were allowed to take off. Computers also make
us do things we don’t want—by generating mailing lists and print labels to send us all
millions of pieces of unwanted mail, which we humans have to sort, deliver, and dispose
of.
But you ain’t seen nothing yet. In the past, we programmed computers using
algorithms we understood at least in principle. So when machines did amazing things
like beating world chess champion Garry Kasparov, we could say that the victorious
programs were designed with algorithms based on our own understanding—using, in this
instance, the experience and advice of top grandmasters. Machines were simply faster at
doing brute-force calculations, had prodigious amounts of memory, and were not prone to
errors. One article described Deep Blue’s victory not as that of a computer, which was
just a dumb machine, but as the victory of hundreds of programmers over Kasparov, a
single individual.
That way of programming is changing dramatically. After a long hiatus, the
power of machine learning has taken off. Much of the change came when programmers,
rather than trying to anticipate and code for every possible contingency, allowed
computers to train themselves on data, using deep neural networks based on models of
how our own brains learn. They use probabilistic methods to “learn” from large
quantities of data; computers can recognize patterns and come up with conclusions on
their own. A particularly powerful method is called reinforcement learning, by which the
computer learns, without prior input, which variables are important and how much to
weight them to reach a certain goal. This method in some sense mimics how we learn as
children. The results from these new approaches are amazing.
Such a deep-learning program was used to teach a computer to play Go, a game
that only a few years ago was thought to be beyond the reach of AI because it was so
hard to calculate how well you were doing. It seemed that top Go players relied a great
deal on intuition and a feel for position, so proficiency was thought to require a
particularly human kind of intelligence. But the AlphaGo program produced by
DeepMind, after being trained on thousands of high-level Go games played by humans
and then millions of games with itself, was able to beat the top human players in short
order. Even more amazingly, the related AlphaGo Zero program, which learned from
scratch by playing itself, was stronger than the version trained initially on human games!
It was as though the humans had been preventing the computer from reaching its true
potential. The same method has recently been generalized: Starting from scratch, within
just twenty-four hours, an equivalent AlphaZero chess program was able to beat today’s
top “conventional” chess programs, which in turn have beaten the best humans.
Progress has not been restricted to games. Computers are significantly better at
image and voice recognition and speech synthesis than they used to be. They can detect
tumors in radiographs earlier than most humans. Medical diagnostics and personalized
medicine will improve substantially. Transportation by self-driving cars will keep us all
safer, on average. My grandson may never have to acquire a driver’s license, because
driving a car will be like riding a horse today—a hobby for the few. Dangerous
activities, such as mining, and tedious repetitive work will be done by computers.
Governments will offer better targeted, more personalized and efficient public services.
AI could revolutionize education by analyzing an individual pupil’s needs and enabling
customized teaching, so that each student can advance at an optimal rate.
Along with these huge benefits, of course, will come alarming risks. With the
vast amounts of personal data, computers will learn more about us than we may know
about ourselves; the question of who owns data about us will be paramount. Moreover,
data-based decisions will undoubtedly reflect social biases: Even an allegedly neutral
intelligent system designed to predict loan risks, say, may conclude that mere
membership in a particular minority group makes you more likely to default on a loan.
While this is an obvious example that we could correct, the real danger is that we are not
always aware of biases in the data and may simply perpetuate them.
Machine learning may also perpetuate our own biases. When Netflix or Amazon
tries to tell you what you might want to watch or buy, this is an application of machine
learning. Currently such suggestions are sometimes laughable, but with time and more
data they will get increasingly accurate, reinforcing our prejudices and likes and dislikes.
Will we miss out on the random encounter that might persuade us to change our views by
exposing us to new and conflicting ideas? Social media, given its influence on elections,
is a particularly striking illustration of how the divide between people on different sides
of the political spectrum can be accentuated.
We may have already reached the stage where most governments are powerless to
resist the combined clout of a few powerful multinational companies that control us and
our digital future. The fight between dominant companies today is really a fight for
control over our data. They will use their enormous influence to prevent regulation of
data, because their interests lie in unfettered control of it. Moreover, they have the
financial resources to hire the most talented workers in the field, enhancing their power
even further. We have been giving away valuable data for the sake of freebies like Gmail
and Facebook, but as the journalist and author John Lanchester has pointed out in the
London Review of Books, if it is free, then you are the product. Their real customers are
the ones who pay them for access to knowledge about us, so that they can persuade us to
buy their products or otherwise influence us. One way around the monopolistic control
of data is to split the ownership of data away from firms that use them. Individuals
would instead own and control access to their personal data (a model that would
encourage competition, since people would be free to move their data to a company that
offered better services). Finally, abuse of data is not limited to corporations: In
totalitarian states, or even nominally democratic ones, governments know things about
their citizens that Orwell could not have imagined. The use they make of this
information may not always be transparent or possible to counter.
The prospect of AI for military purposes is frightening. One can imagine
intelligent systems being designed to act autonomously based on real-time data and able
to act faster than the enemy, starting catastrophic wars. Such wars may not necessarily
be conventional or even nuclear wars. Given how essential computer networks are to
modern society, it is much more likely that AI wars will be fought in cyberspace. The
consequences could be just as dire.
~ ~ ~
Despite this loss of control, we continue to march inexorably into a world in which AI
will be everywhere: Individuals won’t be able to resist its convenience and power, and
corporations and governments won’t be able to resist its competitive advantages. But
important questions arise about the future of work. Computers have been responsible for
considerable losses in blue-collar jobs in the last few decades, but until recently many
white-collar jobs—jobs that “only humans can do”—were thought to be safe. Suddenly
that no longer appears to be true. Accountants, many legal and medical professionals,
financial analysts and stockbrokers, travel agents—in fact, a large fraction of white-collar
jobs—will disappear as a result of sophisticated machine-learning programs. We face a
future in which factories churn out goods with very few employees and the movement of
goods is largely automated, as are many services. What’s left for humans to do?
In 1930—long before the advent of computers, let alone AI—John Maynard
Keynes wrote, in an essay called “Economic Possibilities for our Grandchildren,” that as
a result of improvements in productivity, society could produce all its needs with a
fifteen-hour work week. He also predicted, along with the growth of creative leisure, the
end of money and wealth as a goal:
We shall be able to afford to dare to assess the money-motive at its true value.
The love of money as a possession—as distinguished from the love of money as
a means to the enjoyments and realities of life—will be recognised for what it is,
a somewhat disgusting morbidity, one of those semi-criminal, semi-pathological
propensities which one hands over with a shudder to the specialists in mental
disease.
Sadly, Keynes’s predictions did not come true. Although productivity did indeed
increase, the system—possibly inherent in a market economy—did not result in humans
working much shorter hours. Rather, what happened is what the anthropologist and
anarchist David Graeber describes as the growth of “bullshit jobs.” 34 While jobs that
produce essentials like food, shelter, and goods have been largely automated away, we
have seen an enormous expansion of sectors like corporate law, academic and health
administration (as opposed to actual teaching, research, and the practice of medicine),
“human resources,” and public relations, not to mention new industries like financial
services and telemarketing and ancillary industries in the so-called gig economy which
serve those who are too busy doing all that additional work.
How will societies cope with technology’s increasingly rapid destruction of entire
professions, which throws large numbers of people out of work? Some argue that this
concern is based on a false premise, because new jobs spring up that didn’t exist before,
but as Graeber points out, these new jobs won’t necessarily be rewarding or fulfilling.
During the first industrial revolution, it took almost a century before most people were
better off. That revolution was possible only because the government of the time
ruthlessly favored property rights over labor, and most people (and all women) did not
have the vote. In today’s democratic societies, it is not clear that the population will
tolerate such a dramatic upheaval of society based on the promise that “eventually”
things will get better.
Even that rosy vision will depend on a radical shake-up of education and lifelong
learning. The Industrial Revolution did trigger enormous social change of this kind,
including a shift to universal education. But it will not happen unless we make it happen:
This is essentially about power, agency, and control. What’s next for, say, the forty-year-old
taxi driver or truck driver in an era of autonomous vehicles?
One idea that has been touted is that of a universal basic income, which will allow
citizens to pursue their interests, retrain for new occupations, and generally be free to live
a decent life. However, market economies, which are predicated on growing consumer
demand over all else, may not tolerate this innovation. There is also a feeling among
many that meaningful work is essential to human dignity and fulfillment. So another
possibility is that the enormous wealth generated by increased productivity due to
automation could be redistributed to jobs requiring human labor and creativity in fields
such as the arts, music, social work, and other worthwhile pursuits. Ultimately, which
jobs are rewarding or productive and which are “bullshit” is a matter of judgment and
may vary from society to society, as well as over time.
~ ~ ~
So far, I’ve focused on AI’s practical consequences. As a scientist, what bothers me is
our potential loss of understanding. We are now accumulating data at an incredible rate.
In my own lab, an experiment generates over a terabyte of data a day. These data are
massaged, analyzed, and reduced until there is an interpretable result. But in all of this
data analysis, we believe we know what’s happening. We know what the programs are
34 https://strikemag.org/bullshit-jobs/
doing because we designed the algorithms at their heart. So when our computers
generate a result, we feel that we intellectually grasp it.
The new machine-learning programs are different. Having recognized patterns
via deep neural networks, they come up with conclusions, and we have no idea exactly
how. When they uncover relationships, we don’t understand them in the same way as if we
had deduced those relationships ourselves using an underlying theoretical framework. As
data sets become larger, we won’t be able to analyze them ourselves even with the help
of computers; rather, we will rely entirely on computers to do the analysis for us. So if
someone asks us how we know something, we will simply say it is because the machine
analyzed the data and produced the conclusion.
One day a computer may well come up with an entirely new result—e.g., a
mathematical theorem whose proof, or even whose statement, no human can understand.
That is philosophically different from the way we have been doing science. Or at least
thought we had; some might argue that we don’t know how our own brains reach
conclusions either, and that these new methods are a way of mimicking learning by the
human brain. Nevertheless, I find this potential loss of understanding disturbing.
Despite the remarkable advances in computing, the hype about AGI—a general-intelligence
machine that will think like a human and possibly develop consciousness—
smacks of science fiction to me, partly because we don’t understand the brain at that level
of detail. Not only do we not understand what consciousness is, we don’t even
understand a relatively simple problem like how we remember a phone number. In just
that one question, there are all sorts of things to consider. How do we know it is a
number? How do we associate it with a person, a name, face, and other characteristics?
Even such seemingly trivial questions involve everything from high-level cognition and
memory to how a cell stores information and how neurons interact.
Moreover, that’s just one task among many that the brain does effortlessly.
Whereas machines will no doubt do ever more amazing things, they’re unlikely to be a
replacement for human thought and human creativity and vision. Eric Schmidt, former
chairman of Google’s parent company, said in a recent interview at the London Science
Museum that even designing a robot that would clear the table, wash the dishes, and put
them away was a huge challenge. The calculations involved in figuring out all the
movements the body has to make to throw a ball accurately or do slalom skiing are
prodigious. The brain can do all these and also do mathematics and music, and invent
games like chess and Go, not just play them. We tend to underestimate the complexity
and creativity of the human brain and how amazingly general it is.
If AI is to become more humanlike in its abilities, the machine-learning and
neuroscience communities need to interact closely, something that is happening already.
Some of today’s greatest exponents of machine learning—such as Geoffrey Hinton,
Zoubin Ghahramani, and Demis Hassabis—have backgrounds in cognitive neuroscience,
and their success has been at least in part due to attempts to model brainlike behavior in
their algorithms. At the same time, neurobiology has also flourished. All sorts of tools
have been developed to watch which neurons are firing and genetically manipulate them
and see what’s happening in real time with inputs. Several countries have launched
moon-shot neuroscience initiatives to see if we can crack the workings of the brain.
Advances in AI and neuroscience seem to go hand in hand; each field can propel the
other.
Many evolutionary scientists, and such philosophers as Daniel Dennett, have
pointed out that the human brain is the result of billions of years of evolution. 35 Human
intelligence is not the special characteristic we think it is, but just another survival
mechanism not unlike our digestive or immune systems, both of which are also
amazingly complex. Intelligence evolved because it allowed us to make sense of the
world around us, to plan ahead, and thus cope with all sorts of unexpected things in order
to survive. However, as Descartes stated, we humans define our very existence by our
ability to think. So it is not surprising that, in an anthropomorphic way, our fears about
AI reflect this belief that our intelligence is what makes us special.
But if we step back and look at life on Earth, we see that we are far from the most
resilient species. If we’re going to be taken over at some point, it will be by some of
Earth’s oldest life-forms, like bacteria, which can live anywhere from Antarctica to deep-sea
thermal vents hotter than boiling water, or in acid environments that would melt you
and me. So when people ask where we’re headed, we need to put the question in a
broader context. I don’t know what sort of future AI will bring: whether AI will make
humans subservient or obsolete or will be a useful and welcome enhancement of our
abilities which will enrich our lives. But I am reasonably certain that computers will
never be the overlords of bacteria.
35 See, for example, Dennett’s From Bacteria to Bach and Back: The Evolution of Minds (New York: W. W. Norton, 2017).
Alex “Sandy” Pentland, an exponent of what he has termed “social physics,” is
interested in building powerful human-AI ecologies. He is concerned at the same time
about the potential dangers of decision-making systems in which the data in effect take
over and human creativity is relegated to the background.
The advent of Big Data, he believes, has given us the opportunity to reinvent our
civilization: “We can now begin to actually look at the details of social interaction and
how those play out, and we’re no longer limited to averages like market indices or
election results. This is an astounding change. The ability to see the details of the
market, of political revolutions, and to be able to predict and control them is definitely a
case of Promethean fire—it could be used for good or for ill. Big Data brings us to
interesting times.”
At our group meeting in Washington, Connecticut, he confessed that reading
Norbert Wiener on the concept of feedback “felt like reading my own thoughts.”
“After Wiener, people discovered or focused on the fact that there are genuinely
chaotic systems that are just not predictable,” he said, “but if you look at human
socioeconomic systems, there is a large percentage of variance you can account for and
predict. . . . Today there is data from all sorts of digital devices, and from all of our
transactions. The fact that everything is datafied means you can measure things in real
time in most aspects of human life—and increasingly in every aspect of human life. The
fact that we have interesting computers and machine-learning techniques means that you
can build predictive models of human systems in ways you could never do before.”
THE HUMAN STRATEGY
Alex “Sandy” Pentland
Alex “Sandy” Pentland is Toshiba Professor and professor of media arts and sciences,
MIT; director of the Human Dynamics and Connection Science labs and the Media Lab
Entrepreneurship Program, and the author of Social Physics.
In the last half-century, the idea of AI and intelligent robots has dominated thinking about
the relationship between humans and computers. In part, this is because it’s easy to tell
the stories about AI and robots, and in part because of early successes (e.g., theorem
provers that reproduced most of Whitehead and Russell’s Principia Mathematica) and
massive military funding. The earlier and broader vision of cybernetics, which
considered the artificial as part of larger systems of feedback and mutual influence, faded
from public awareness.
However, in the intervening years the cybernetics vision has slowly grown and
quietly taken over—to the point where it is “in the air.” State-of-the-art research in most
engineering disciplines is now framed as feedback systems that are dynamic and driven
by energy flows. Even AI is being recast as human/machine “advisor” systems, and the
military is beginning large-scale funding in this area—something that should perhaps
worry us more than drones and independent humanoid robots.
But as science and engineering have adopted a more cybernetics-like stance, it has
become clear that even the vision of cybernetics is far too small. It was originally
centered on the embeddedness of the individual actor but not on the emergent properties
of a network of actors. This is unsurprising, because the mathematics of networks did not
exist until recently, so a quantitative science of how networks behave was impossible.
We now know that study of the individual does not produce understanding of the system
except in certain simple cases. Recent progress in this area was foreshadowed by
understanding that “chaos,” and later “complexity,” were the typical behavior of systems,
but we can now go far beyond these statistical understandings.
We’re beginning to be able to analyze, predict, and even design the emergent
behavior of complex heterogeneous networks. The cybernetics view of the connected
individual actor can now be expanded to cover complex systems of connected individuals
and machines, and the insights we obtain from this broader view are fundamentally
different from those obtained from the cybernetics view. Thinking about the network is
analogous to thinking about entire ecosystems. How would you guide ecosystems to
grow in a good direction? What do you even mean by “a good direction”? Questions
like this are beyond the boundary of traditional cybernetic thinking.
Perhaps the most stunning realization is that humans are already beginning to use
AI and machine learning to guide entire ecosystems, including ecosystems of people, thus
creating human-AI ecologies. Now that everything is becoming “datafied,” we can
measure most aspects of human life and, increasingly, aspects of all life. This, together
with new, powerful machine-learning techniques, means that we can build models of
these ecologies in ways we couldn’t before. Well-known examples are weather- and
traffic-prediction models, which are being extended to predict the global climate and plan
city growth and renewal. AI-aided engineering of the ecologies is already here.
Development of human-AI ecosystems is perhaps inevitable for a social species
such as ourselves. We became social early in our evolution, millions of years ago. We
began exchanging information with one another to stay alive, to increase our fitness. We
developed writing to share abstract and complex ideas, and most recently we’ve
developed computers to enhance our communication abilities. Now we’re developing AI
and machine-learning models of ecosystems and sharing the predictions of those models
to jointly shape our world through new laws and international agreements.
We live in an unprecedented historic moment, in which the availability of vast
amounts of human behavioral data and advances in machine learning enable us to tackle
complex social problems through algorithmic decision making. The opportunities for
such a human-AI ecology to have positive social impact through fairer and more
transparent decisions are obvious. But there are also risks of a “tyranny of algorithms,”
where unelected data experts are running the world. The choices we make now are
perhaps even more momentous than those we faced in the 1950s, when AI and
cybernetics were created. The issues look similar, but they’re not. We have moved down
the road, and now the scope is larger. It’s not just AI robots versus individuals. It’s AI
guiding entire ecologies.
~ ~ ~
How can we make a good human-artificial ecosystem, something that’s not a machine
society but a cyberculture in which we can all live as humans—a culture with a human
feel to it? We don’t want to think small—for example, to talk only of robots and self-driving
cars. We want this to be a global ecology. Think Skynet-size. But how would
you make Skynet something that’s about the human fabric?
The first thing to ask is: What’s the magic that makes the current AI work?
Where is it wrong and where is it right?
The good magic is that it has something called the credit-assignment function.
What that lets you do is take “stupid neurons”—little linear functions—and figure out, in
a big network, which ones are doing the work and strengthen them. It’s a way of taking a
random bunch of switches all hooked together in a network and making them smart by
giving them feedback about what works and what doesn’t. This sounds simple, but
there’s some complicated math around it. That’s the magic that makes current AI work.
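The feedback idea can be sketched in a few lines. What follows is a deliberately minimal assumption, not the method used in any real AI system: a single linear unit with three inputs, trained by plain gradient descent on squared error (the classic delta rule rather than full backpropagation through a deep network). The function name `train_linear_unit`, the data, and the learning rate are all hypothetical. The target depends on the first and third inputs only, so the feedback should strengthen those connections and leave the second near zero.

```python
from itertools import product


def train_linear_unit(samples, targets, steps=200, lr=0.5):
    """Gradient descent on mean squared error for y = sum(w_i * x_i)."""
    n_inputs = len(samples[0])
    weights = [0.0] * n_inputs
    for _ in range(steps):
        grads = [0.0] * n_inputs
        for x, t in zip(samples, targets):
            # Error of the current prediction on this example.
            error = sum(w * xi for w, xi in zip(weights, x)) - t
            # Each weight is blamed in proportion to its input's share.
            for i, xi in enumerate(x):
                grads[i] += 2.0 * error * xi / len(samples)
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights


# All eight on/off patterns of three inputs; the middle input is irrelevant.
samples = [tuple(float(b) for b in bits) for bits in product((0, 1), repeat=3)]
targets = [2.0 * x[0] - 1.0 * x[2] for x in samples]

weights = train_linear_unit(samples, targets)
print([round(w, 3) for w in weights])
```

The learned weights settle close to 2, 0, and -1: credit flows to the inputs that actually do the work, and the irrelevant one is switched off.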
The bad part of it is, because those little neurons are stupid, the things they learn
don’t generalize very well. If an AI sees something it hasn’t seen before, or if the world
changes a little bit, the AI is likely to make a horrible mistake. It has absolutely no sense
of context. In some ways, it’s as far from Norbert Wiener’s original notion of
cybernetics as you can get, because it isn’t contextualized; it’s a little idiot savant.
But imagine that you took away those limitations: Imagine that instead of using
dumb neurons, you used neurons in which real-world knowledge was embedded. Maybe
instead of linear neurons, you used neurons that were functions in physics, and then you
tried to fit physics data. Or maybe you put in a lot of knowledge about humans and how
they interact with one another—the statistics and characteristics of humans.
When you add this background knowledge and surround it with a good credit-assignment
function, then you can take observational data and use the credit-assignment
function to reinforce the functions that are producing good answers. The result is an AI
that works extremely well and can generalize. For instance, in solving physical
problems, it often takes only a couple of noisy data points to get something that’s a
beautiful description of a phenomenon, because you’re putting in knowledge about how
physics works. That’s in huge contrast to normal AI, which requires millions of training
examples and is very sensitive to noise. By adding the appropriate background
knowledge, you get much more intelligence.
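A small sketch can illustrate why embedded knowledge needs so little data. Assume, purely for illustration, that the "neuron" already knows the free-fall law d = ½·g·t², so the only thing left to learn is the single constant g. The three measurements below are made-up noisy values, and `fit_gravity` is a hypothetical helper, not a method described in the text.

```python
def fit_gravity(measurements):
    """Least-squares estimate of g, given the known form d = 0.5 * g * t**2."""
    # With the physics built in, the model is linear in one unknown:
    # d = g * x, where x = 0.5 * t**2.
    xs = [0.5 * t * t for t, _ in measurements]
    ds = [d for _, d in measurements]
    numerator = sum(x * d for x, d in zip(xs, ds))
    denominator = sum(x * x for x in xs)
    return numerator / denominator

# Hypothetical (time in seconds, distance in meters) readings with noise added.
measurements = [(0.5, 1.33), (1.0, 4.70), (1.5, 11.19)]

g_estimate = fit_gravity(measurements)
print(round(g_estimate, 2))
```

With only three noisy points, the estimate lands within a few percent of the textbook value of about 9.81 m/s². A generic model with no built-in structure would have to discover the quadratic form from the data as well, which is one way to read the contrast the passage draws.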
Similar to the physical-systems case, if we make neurons that know a lot about
how humans learn from each other, then we can detect human fads and predict human
behavior trends in surprisingly accurate and efficient ways. This “social physics” works
because human behavior is determined as much by the patterns of our culture as by
rational, individual thinking. These patterns can be described mathematically and
employed to make accurate predictions.
This idea of a credit-assignment function reinforcing connections between
neurons that are doing the best work is the core of current AI. If you make those little
neurons smarter, the AI gets smarter. So, what would happen if we replaced the neurons
with people? People have lots of capabilities. They know lots of things about the world;
they can perceive things in a broadly competent, human way. What would happen if you
had a network of people in which you could reinforce the connections that were helping
and minimize the connections that weren’t?
That begins to sound like a society, or a company. We all live in a human social
network. We’re reinforced for doing things that seem to help everybody and discouraged
from doing things that are not appreciated. Culture is the result of this sort of human AI
as applied to human problems; it is the process of building social structures by
reinforcing the good connections and penalizing the bad. Once you’ve realized you can
take this general AI framework and create a human AI, the question becomes, What’s the
right way to do that? Is it a safe idea? Is it completely crazy?
My students and I are looking at how people make decisions, on huge databases
of financial decisions, business decisions, and many other sorts of decisions. What we’ve
found is that humans often make decisions in a way that mimics AI credit-assignment
algorithms and works to make the community smarter. A particularly interesting feature
of this work is that it addresses a classic problem in evolution known as the group-selection
problem. The core of this problem is: How can we select for culture in
evolution, when it’s the individuals that reproduce? What you need is something that
selects for the best cultures and the best groups but also selects for the best individuals,
because they’re the units that transmit the genes.
When you frame the question this way and go through the mathematical literature,
you discover that there’s one generally best way to do this. It’s called “distributed
Thompson sampling,” a mathematical algorithm used in choosing, out of a set of possible
actions with unknown payoffs, the action that maximizes the expected reward in respect