Algorithms to Live By: The Computer Science of Human Decisions. Brian Christian
So, while we tend to commit to a new secretary too soon, it seems like we tend to stop trying new airlines too late. But just as there’s a cost to not having a secretary, there’s a cost to committing too soon to a particular airline: the world might change.
The standard multi-armed bandit problem assumes that the probabilities with which the arms pay off are fixed over time. But that’s not necessarily true of airlines, restaurants, or other contexts in which people have to make repeated choices. If the probabilities of a payoff on the different arms change over time—what has been termed a “restless bandit”—the problem becomes much harder. (So much harder, in fact, that there’s no tractable algorithm for completely solving it, and it’s believed there never will be.) Part of this difficulty is that it is no longer simply a matter of exploring for a while and then exploiting: when the world can change, continuing to explore can be the right choice. It might be worth going back to that disappointing restaurant you haven’t visited for a few years, just in case it’s under new management.
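One way to see why a changing world rewards continued exploration is to compare two ways of remembering an arm's history. The sketch below is an illustration only, not a method taken from the text: the function names, the `step` parameter, and the made-up visit history are all assumptions. A plain average treats a meal from years ago the same as last night's, while a recency-weighted estimate lets old evidence fade.

```python
def sample_average(rewards):
    """Estimate a payoff rate by weighting every past outcome equally."""
    estimate = 0.0
    for n, r in enumerate(rewards, start=1):
        # Incremental form of the ordinary mean.
        estimate += (r - estimate) / n
    return estimate


def recency_weighted(rewards, step=0.1, initial=0.0):
    """Estimate a payoff rate with exponentially fading memory.

    `step` is a hypothetical tuning knob: larger values forget faster.
    """
    estimate = initial
    for r in rewards:
        # Move a fixed fraction of the way toward the newest outcome.
        estimate += step * (r - estimate)
    return estimate


# Hypothetical history: 50 good visits (1), then 50 bad ones (0),
# as if the restaurant changed hands halfway through.
history = [1] * 50 + [0] * 50

print(sample_average(history))    # still remembers the good years
print(recency_weighted(history))  # has largely forgotten them
```

On this history the plain average stays at one half, while the recency-weighted estimate falls close to zero. Neither is a solution to the restless bandit; the fading estimate is only a rule of thumb for keeping beliefs current when payoffs drift.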
In his celebrated essay “Walking,” Henry David Thoreau reflected on how he preferred to do his traveling close to home, how he never tired of his surroundings and always found something new or surprising in the Massachusetts landscape. “There is in fact a sort of harmony discoverable between the capabilities of the landscape within a circle of ten miles’ radius, or the limits of an afternoon walk, and the threescore years and ten of human life,” he wrote. “It will never become quite familiar to you.”
To live in a restless world requires a certain restlessness in oneself. So long as things continue to change, you must never fully cease exploring.
Still, the algorithmic techniques honed for the standard version of the multi-armed bandit problem are useful even in a restless world. Strategies like the Gittins index and Upper Confidence Bound provide reasonably good approximate solutions and rules of thumb, particularly if payoffs don’t change very much over time. And many of the world’s payoffs are arguably more static today than they’ve ever been. A berry patch might be ripe one week and rotten the next, but as Andy Warhol put it, “A Coke is a Coke.” Having instincts tuned by evolution for a world in constant flux isn’t necessarily helpful in an era of industrial standardization.
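As a rough illustration of the Upper Confidence Bound idea, the sketch below uses the UCB1 rule, one common member of that family; the text does not say which variant it has in mind, so the specific formula is an assumption, as are the function name and the example numbers. Each arm is scored by its observed payoff rate plus a bonus that shrinks the more often the arm has been tried, and the highest score wins.

```python
import math


def ucb1_choice(successes, pulls):
    """Return the index of the arm with the highest UCB1 score.

    successes[i] / pulls[i] is the observed payoff rate of arm i.
    """
    # Try every arm once before comparing scores.
    for arm, n in enumerate(pulls):
        if n == 0:
            return arm

    total = sum(pulls)
    scores = [
        # Observed rate plus an optimism bonus for little-tried arms.
        s / n + math.sqrt(2 * math.log(total) / n)
        for s, n in zip(successes, pulls)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# A well-known favorite (90 wins in 100 tries) against a place
# visited only twice (1 win in 2 tries).
print(ucb1_choice([90, 1], [100, 2]))
```

Here the barely tried arm is chosen despite its lower observed rate, because so little is known about it that its upper bound is still high. Once both arms have been tried equally often, the bonus is the same for each and the better observed rate decides.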
Perhaps most importantly, thinking about versions of the multi-armed bandit problem that do have optimal solutions doesn’t just offer algorithms, it also offers insights. The conceptual vocabulary derived from the classical form of the problem—the tension of explore/exploit, the importance of the interval, the high value of the 0–0 option, the minimization of regret—gives us a new way of making sense not only of specific problems that come before us, but of the entire arc of human life.
Explore …
While laboratory studies can be illuminating, the interval of many of the most important problems people face is far too long to be studied in the lab. Learning the structure of the world around us and forming lasting social relationships are both lifelong tasks. So it’s instructive to see how the general pattern of early exploration and late exploitation appears over the course of a lifetime.
One of the curious things about human beings, which any developmental psychologist aspires to understand and explain, is that we take years to become competent and autonomous. Caribou and gazelles must be prepared to run from predators the day they’re born, but humans take more than a year to make their first steps. Alison Gopnik, professor of developmental psychology at UC Berkeley and author of The Scientist in the Crib, has an explanation for why human beings have such an extended period of dependence: “it gives you a developmental way of solving the exploration/exploitation tradeoff.” As we have seen, good algorithms for playing multi-armed bandits tend to explore more early on, exploiting the resulting knowledge later. But as Gopnik points out, “the disadvantage of that is that you don’t get good payoffs when you are in the exploration stage.” Hence childhood: “Childhood gives you a period in which you can just explore possibilities, and you don’t have to worry about payoffs because payoffs are being taken care of by the mamas and the papas and the grandmas and the babysitters.”
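The pattern of exploring early and exploiting later can be sketched with a decaying exploration rate. This is a toy illustration rather than anything proposed in the text: the `scale / t` schedule, the function names, and the example estimates are assumptions chosen for simplicity.

```python
import random


def exploration_rate(t, scale=10.0):
    """Probability of trying a random arm on decision number t (t >= 1)."""
    return min(1.0, scale / t)


def choose(estimates, t, rng, scale=10.0):
    """Pick an arm: explore at random with fading probability, else exploit."""
    if rng.random() < exploration_rate(t, scale):
        return rng.randrange(len(estimates))
    # Exploit: take the arm currently believed to be best.
    return max(range(len(estimates)), key=estimates.__getitem__)


rng = random.Random(0)
estimates = [0.2, 0.9, 0.5]  # hypothetical current beliefs about three arms

for t in (1, 10, 100, 1000):
    print(t, exploration_rate(t), choose(estimates, t, rng))
```

With these settings the first ten decisions are pure exploration, and by the thousandth decision a random choice is made only about one time in a hundred.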
Thinking about children as simply being at the transitory exploration stage of a lifelong algorithm might provide some solace for parents of preschoolers. (Tom has two highly exploratory preschool-age daughters, and hopes they are following an algorithm that has minimal regret.) But it also provides new insights about the rationality of children. Gopnik points out that “if you look at the history of the way that people have thought about children, they have typically argued that children are cognitively deficient in various ways—because if you look at their exploit capacities, they look terrible. They can’t tie their shoes, they’re not good at long-term planning, they’re not good at focused attention. Those are all things that kids are really awful at.” But pressing buttons at random, being very interested in new toys, and jumping quickly from one thing to another are all things that kids are really great at. And those are exactly what they should be doing if their goal is exploration. If you’re a baby, putting every object in the house into your mouth is like studiously pulling all the handles at the casino.
More generally, our intuitions about rationality are too often informed by exploitation rather than exploration. When we talk about decision-making, we usually focus just on the immediate payoff of a single decision—and if you treat every decision as if it were your last, then indeed only exploitation makes sense. But over a lifetime, you’re going to make a lot of decisions. And it’s actually rational to emphasize exploration—the new rather than the best, the exciting rather than the safe, the random rather than the considered—for many of those choices, particularly earlier in life.
What we take to be the caprice of children may be wiser than we know.
… And Exploit
I had reached a juncture in my reading life that is familiar to those who have been there: in the allotted time left to me on earth, should I read more and more new books, or should I cease with that vain consumption—vain because it is endless—and begin to reread those books that had given me the intensest pleasure in my past.
—LYDIA DAVIS
At the other extreme from toddlers we have the elderly. And thinking about aging from the perspective of the explore/exploit dilemma also provides some surprising insights into how we should expect our lives to change as time goes on.
Laura Carstensen, a professor of psychology at Stanford, has spent her career challenging our preconceptions about getting older. Particularly, she has investigated exactly how, and why, people’s social relationships change as they age. The basic pattern is clear: the size of people’s social networks (that is, the number of social relationships they engage in) almost invariably decreases over time. But Carstensen’s research has transformed how we should think about this phenomenon.
The traditional explanation for the elderly having smaller social networks is that it’s just one example of the decrease in quality of life that comes with aging—the result of diminished ability to contribute to social relationships, greater fragility, and general disengagement from society. But Carstensen has argued that, in fact, the elderly have fewer social relationships by choice. As she puts it, these decreases are “the result of lifelong selection processes by which people strategically and adaptively cultivate their social networks to maximize social and emotional gains and minimize social and emotional risks.”
What Carstensen and her colleagues found is that the shrinking of social networks with aging is due primarily to “pruning” peripheral relationships and