Algorithms to Live By: The Computer Science of Human Decisions. Brian Christian
From 1982 to 1984, Bartlett and his colleagues at the University of Michigan performed a study on newborns with respiratory failure. The team was clear that they wanted to address, as they put it, “the ethical issue of withholding an unproven but potentially lifesaving treatment,” and were “reluctant to withhold a lifesaving treatment from alternate patients simply to meet conventional random assignment technique.” Hence they turned to Zelen’s algorithm. The strategy resulted in one infant being assigned the “conventional” treatment and dying, and eleven infants in a row being assigned the experimental ECMO treatment, all of them surviving. Between April and November of 1984, after the end of the official study, ten additional infants met the criteria for ECMO treatment. Eight were treated with ECMO, and all eight survived. Two were treated conventionally, and both died.
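This excerpt names Zelen's algorithm without spelling it out. The sketch below assumes the randomized "play the winner" urn that the name usually refers to: the urn starts with one ball per treatment, each patient's treatment is drawn at random with replacement, a success adds a ball for the treatment used, and a failure adds a ball for the other one. The function name and the success probabilities are hypothetical illustrations, not the Michigan team's actual protocol or data.

```python
import random


def play_the_winner(success_prob, n_patients, rng):
    """Randomized play-the-winner urn (a sketch, not the Michigan study's exact protocol).

    success_prob maps each of two treatment names to a hypothetical success probability.
    Returns the list of assignments and the final ball counts in the urn.
    """
    treatments = list(success_prob)
    assert len(treatments) == 2, "this sketch handles exactly two treatments"
    urn = {t: 1 for t in treatments}  # start with one ball per treatment
    assignments = []
    for _ in range(n_patients):
        # Draw with replacement: a treatment's chance is proportional to its ball count.
        treatment = rng.choices(treatments, weights=[urn[t] for t in treatments])[0]
        assignments.append(treatment)
        if rng.random() < success_prob[treatment]:
            urn[treatment] += 1  # success: add a ball for the treatment just used
        else:
            other = treatments[1] if treatment == treatments[0] else treatments[0]
            urn[other] += 1  # failure: add a ball for the other treatment
    return assignments, urn


assignments, urn = play_the_winner({"ECMO": 0.9, "conventional": 0.5}, 12, random.Random(0))
```

Because every success of the better treatment and every failure of the worse one add a ball to the same side, a treatment that keeps working quickly comes to dominate the draws, which is how a run like eleven ECMO assignments in a row can arise.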
These are eye-catching numbers, yet shortly after the University of Michigan study on ECMO was completed it became mired in controversy. Having so few patients in a trial receive the conventional treatment deviated significantly from standard methodology, and the procedure itself was highly invasive and potentially risky. After the publication of the paper, Jim Ware, professor of biostatistics at the Harvard School of Public Health, and his medical colleagues examined the data carefully and concluded that they “did not justify routine use of ECMO without further study.” So Ware and his colleagues designed a second clinical trial, still trying to balance the acquisition of knowledge with the effective treatment of patients but using a less radical design. They would randomly assign patients to either ECMO or the conventional treatment until a prespecified number of deaths was observed in one of the groups. Then they would assign all remaining patients in the study to whichever of the two treatments had proven more effective.
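The two-phase design described above can be sketched as a small simulation. Assumptions, none of which come from the study itself: phase one uses a fair coin flip per patient, "more effective" means the arm that did not hit the death threshold, and the survival probabilities and threshold are hypothetical inputs.

```python
import random


def ware_two_phase(survival_prob, death_threshold, n_patients, rng):
    """Two-phase trial sketch: randomize until one arm reaches death_threshold deaths,
    then give every remaining patient the other arm.

    survival_prob (two arm names -> survival probability) and death_threshold are
    hypothetical inputs. Returns a list of (phase, arm, survived) tuples.
    """
    arms = list(survival_prob)
    assert len(arms) == 2, "this sketch handles exactly two arms"
    deaths = {arm: 0 for arm in arms}
    winner = None  # set once an arm hits the death threshold
    log = []
    for _ in range(n_patients):
        if winner is None:
            phase, arm = 1, rng.choice(arms)  # phase 1: fair coin-flip randomization
        else:
            phase, arm = 2, winner  # phase 2: everyone gets the surviving arm
        survived = rng.random() < survival_prob[arm]
        log.append((phase, arm, survived))
        if not survived:
            deaths[arm] += 1
            if winner is None and deaths[arm] >= death_threshold:
                winner = arms[1] if arm == arms[0] else arms[0]
    return log


log = ware_two_phase({"conventional": 0.6, "ECMO": 0.95}, 4, 29, random.Random(0))
```

Compared with the urn rule, this design keeps assignment balanced until the evidence crosses a fixed line, which is what made it more palatable to conventional statistical practice.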
In the first phase of Ware’s study, four of ten infants receiving conventional treatment died, and all nine infants receiving ECMO survived. The four deaths were enough to trigger a transition to the second phase, where all twenty patients were treated with ECMO and nineteen survived. Ware and colleagues were convinced, concluding that “it is difficult to defend further randomization ethically.”
But some had already concluded this before the Ware study, and were vocal about it. The critics included Don Berry, one of the world’s leading experts on multi-armed bandits. In a comment that was published alongside the Ware study in Statistical Science, Berry wrote that “randomizing patients to non-ECMO therapy as in the Ware study was unethical.… In my view, the Ware study should not have been conducted.”
And yet even the Ware study was not conclusive for all in the medical community. In the 1990s yet another study on ECMO was conducted, enrolling nearly two hundred infants in the United Kingdom. Instead of using adaptive algorithms, this study followed the traditional methods, splitting the infants randomly into two equal groups. The researchers justified the experiment by saying that ECMO’s usefulness “is controversial because of varying interpretation of the available evidence.” As it turned out, the difference between the treatments wasn’t as pronounced in the United Kingdom as it had been in the two American studies, but the results were nonetheless declared “in accord with the earlier preliminary findings that a policy of ECMO support reduces the risk of death.” The cost of that knowledge? Twenty-four more infants died in the “conventional” group than in the group receiving ECMO treatment.
The widespread difficulty with accepting results from adaptive clinical trials might seem incomprehensible. But consider that part of what the advent of statistics did for medicine, at the start of the twentieth century, was to transform it from a field in which doctors had to persuade each other in ad hoc ways about every new treatment into one where they had clear guidelines about what sorts of evidence were and were not persuasive. Changes to accepted standard statistical practice have the potential to upset this balance, at least temporarily.
After the controversy over ECMO, Don Berry moved from the statistics department at the University of Minnesota to the MD Anderson Cancer Center in Houston, where he has used methods developed by studying multi-armed bandits to design clinical trials for a variety of cancer treatments. While he remains one of the more vocal critics of randomized clinical trials, he is by no means the only one. In recent years, the ideas he’s been fighting for are finally beginning to come into the mainstream. In 2010 and 2015, the FDA released a pair of draft “guidance” documents on “Adaptive Design” clinical trials for drugs and medical devices, which suggests—despite a long history of sticking to an option they trust—that they might at last be willing to explore alternatives.
The Restless World
Once you become familiar with them, it’s easy to see multi-armed bandits just about everywhere we turn. It’s rare that we make an isolated decision, where the outcome doesn’t provide us with any information that we’ll use to make other decisions in the future. So it’s natural to ask, as we did with optimal stopping, how well people generally tend to solve these problems—a question that has been extensively explored in the laboratory by psychologists and behavioral economists.
In general, it seems that people tend to over-explore—to favor the new disproportionately over the best. In a simple demonstration of this phenomenon, published in 1966, Amos Tversky and Ward Edwards conducted experiments where people were shown a box with two lights on it and told that each light would turn on a fixed (but unknown) percentage of the time. They were then given 1,000 opportunities either to observe which light came on, or to place a bet on the outcome without getting to observe it. (Unlike a more traditional bandit problem setup, here one could not make a “pull” that was both wager and observation at once; participants would not learn whether their bets had paid off until the end.) This is pure exploration vs. exploitation, pitting the gaining of information squarely against the use of it. For the most part, people adopted a sensible strategy of observing for a while, then placing bets on what seemed like the best outcome—but they consistently spent a lot more time observing than they should have. How much more time? In one experiment, one light came on 60% of the time and the other 40% of the time, a difference neither particularly blatant nor particularly subtle. In that case, people chose to observe 505 times, on average, placing bets the other 495 times. But the math says they should have started to bet after just 38 observations—leaving 962 chances to cash in.
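To get a feel for how costly 505 observations are, here is a toy model, not the analysis behind the figure of 38 in the text. It assumes that on each trial exactly one of the two lights comes on (the better one 60% of the time), that the participant watches n times and then always bets on whichever light they saw more often (ties broken by a coin flip), and that each bet wins at the chosen light's true rate. The function names are illustrative.

```python
import math


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)


def expected_correct_bets(n_observe, p_best=0.6, trials=1000):
    """Expected winning bets if you observe n_observe times, then always back
    the light you saw more often (ties broken by a fair coin)."""
    # q = probability that the observations point to the genuinely better light
    q = 0.0
    for k in range(n_observe + 1):  # k = number of times the better light came on
        if 2 * k > n_observe:
            q += binom_pmf(k, n_observe, p_best)
        elif 2 * k == n_observe:
            q += 0.5 * binom_pmf(k, n_observe, p_best)
    win_rate = q * p_best + (1 - q) * (1 - p_best)
    return (trials - n_observe) * win_rate


def best_observation_count(p_best=0.6, trials=1000):
    """Brute-force the n that maximizes expected correct bets in this toy model."""
    return max(range(trials + 1), key=lambda n: expected_correct_bets(n, p_best, trials))
```

With 505 observations the better light is identified almost surely, but only 495 bets remain, so the expectation is about 0.6 × 495 = 297 wins; stopping after 38 observations leaves a small chance of backing the wrong light yet yields well over 500 expected wins.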
Other studies have produced similar conclusions. In the 1990s, Robert Meyer and Yong Shi, researchers at Wharton, ran a study where people were given a choice between two options, one with a known payoff chance and one unknown—specifically two airlines, an established carrier with a known on-time rate and a new company without a track record yet. Given the goal of maximizing the number of on-time arrivals over some period of time, the mathematically optimal strategy is to initially only fly the new airline, as long as the established one isn’t clearly better. If at any point it’s apparent that the well-known carrier is better—that is, if the Gittins index of the new option falls below the on-time rate of the familiar carrier—then you should switch hard to the familiar one and never look back. (Since in this setup you can’t get any more information about the new company once you stop flying it, there is no opportunity for it to redeem itself.) But in the experiment, people tended to use the untried airline too little when it was good and too much when it was bad. They also didn’t make clean breaks away from it, often continuing to alternate, particularly when neither airline was departing on time. All of this is consistent with tending to over-explore.
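The "switch hard" rule can be sketched by computing the Gittins index of the new airline and comparing it with the established carrier's known on-time rate. The excerpt gives neither a discount rate nor a prior, so this sketch assumes a uniform prior (a Beta(wins + 1, losses + 1) posterior), a 0.9 discount per flight, and a dynamic program truncated after 100 more flights; the index is found by bisection on the "retirement" rate at which flying the new airline once more and settling for the known rate forever are equally good. The function names are hypothetical.

```python
from functools import lru_cache


def gittins_index(wins, losses, gamma=0.9, horizon=100, tol=1e-6):
    """Approximate Gittins index of an arm with a Beta(wins + 1, losses + 1) posterior.

    The index is the known per-flight rate at which you are indifferent between
    retiring to that rate forever and trying the uncertain option once more.
    """
    a, b = wins + 1, losses + 1  # uniform prior
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = (lo + hi) / 2
        retire = lam / (1 - gamma)  # discounted value of taking lam forever

        @lru_cache(maxsize=None)
        def value(s, f):
            # Best achievable value after s further successes and f further failures.
            p = (a + s) / (a + b + s + f)  # posterior mean on-time rate
            if s + f >= horizon:  # truncation: treat the mean as known from here on
                return max(retire, p / (1 - gamma))
            pull = p * (1 + gamma * value(s + 1, f)) + (1 - p) * gamma * value(s, f + 1)
            return max(retire, pull)

        p0 = a / (a + b)
        pull_now = p0 * (1 + gamma * value(1, 0)) + (1 - p0) * gamma * value(0, 1)
        if pull_now > retire:
            lo = lam  # trying the new option still beats lam, so the index is higher
        else:
            hi = lam
    return (lo + hi) / 2


def fly_new_airline(new_wins, new_losses, known_rate, gamma=0.9):
    """Keep flying the untried airline while its Gittins index is at least the known rate."""
    return gittins_index(new_wins, new_losses, gamma) >= known_rate
```

Once `fly_new_airline` returns `False`, the record of the new airline stops changing, so its index never rises again and the switch is permanent, matching the clean break described above.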
Finally, psychologists Mark Steyvers, Michael Lee, and E.-J. Wagenmakers have run an experiment with a four-armed bandit, asking a group of people to choose which arm to play over a sequence of fifteen opportunities. They then classified the strategies that