Business Experiments with R. B. D. McCullough
1.7.2 What are the four key steps in any experiment?
1.7.3 Ice cream sales are positively correlated with shark attacks in the Eastern United States. What is the lurking variable?
1.7.4 What is necessary to show causation?
1.8 Learning More
In this chapter we have twice quoted Stefan Thomke's book, Experimentation Works: The Surprising Power of Business Experiments, and we highly recommend it as an introduction to how experimental methods are used in business. It describes many, many business experiments and how to adopt an experimentation culture in a business. Thomke, a professor at Harvard Business School, has been conducting research in this area for decades. His previous book, published in 2003, was entitled Experimentation Matters.
Section 1.1 “Life Expectancy and Newspapers”
• The life expectancy example is based on Zaman (2010). The associated data are from the World Bank Indicator Tables for the year 2010 and include only those observations that have no missing values.
• The smoother used by R in Figure 1.1 is called “lowess,” which stands for LOcally WEighted Scatterplot Smoother. It's used to detect nonlinearities in a scatterplot.
• The classic reference on eliciting causal information from observational data is Rosenbaum (2010), but it is a technical book written for people who have taken advanced courses in statistics. A much more accessible book that covers both experimental and observational methods is Rosenbaum (2017), which is written almost entirely in words, with practically no equations; someone who wants to understand the topic but hasn't taken advanced courses in statistics can do no better than this book. It also contains a very good example of how observational data can be used to inform causal questions; see the section entitled “A Simple Example: Does a Parent's Occupation Put Children at Risk?” (pp. 119‐124).
• Looking only at correlations (or the lack thereof) can make it impossible to uncover causal relations. Suppose a car travels at a constant speed over hilly roads. To maintain that speed, the driver has to accelerate on the inclines and brake on the declines. Because the speed stays the same while the pedals move, the data show no relation between them, and a person who knows nothing of automobiles might conclude that depressing the accelerator or the brake has nothing to do with the car's speed.
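To make the lowess idea concrete, here is a minimal Python sketch of its core step (the book itself works in R, where `lowess()` does this for you). For each point, it fits a straight line by weighted least squares to the nearest fraction `frac` of the data, using tricube weights that fall to zero at the edge of that neighborhood. This is a simplification under our own assumptions: the function names are hypothetical, and it omits the robustness iterations that Cleveland's full procedure and R's `lowess()` apply by default.

```python
def tricube(u):
    """Tricube weight (1 - |u|^3)^3 inside the window, zero outside it."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def lowess_fit(x, y, frac=2 / 3):
    """Fitted lowess values at each x (single pass, no robustness iterations)."""
    n = len(x)
    k = max(2, round(frac * n))  # number of neighbours in each local window
    fitted = []
    for x0 in x:
        # Window half-width: distance from x0 to its k-th nearest neighbour
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12
        w = [tricube((xi - x0) / h) for xi in x]
        sw = sum(w)
        xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        # Evaluate the local weighted least-squares line at x0
        fitted.append(ybar + slope * (x0 - xbar))
    return fitted
```

Because each local fit is a straight line, the smoother reproduces exactly linear data, while on curved data the fitted values bend with the points; that bending is what reveals a nonlinearity in the scatterplot.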
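A short simulation can make the car example concrete. In this toy model (our own assumption, not taken from the source), the road grade follows a sine wave of rolling hills, the next speed equals the current speed plus pedal input minus grade plus a little noise, and the driver sets the pedal to offset the grade and correct any speed error. The pedal clearly causes the speed, yet the two come out almost uncorrelated; only the hands-off run, in which speed swings with the hills, reveals how much work the pedal was doing.

```python
import math
import random
import statistics


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def simulate(steps=1000, target=60.0, noise_sd=0.05, driver=True, seed=1):
    """Toy model (hypothetical): next speed = speed + pedal - grade + noise."""
    rng = random.Random(seed)
    speed = target
    speeds, pedals, grades = [], [], []
    for t in range(steps):
        grade = math.sin(2 * math.pi * t / 50)  # rolling hills (+ uphill, - downhill)
        # The driver offsets the grade and corrects any speed error;
        # positive pedal = accelerate, negative = brake. Hands-off: pedal stays 0.
        pedal = grade + (target - speed) if driver else 0.0
        speeds.append(speed)
        pedals.append(pedal)
        grades.append(grade)
        speed = speed + pedal - grade + rng.gauss(0, noise_sd)
    return speeds, pedals, grades


speeds, pedals, grades = simulate()
print(round(pearson(pedals[:-1], speeds[1:]), 2))  # pedal vs. next speed: near zero
print(round(pearson(pedals, grades), 2))           # the pedal tracks the hills closely
hands_off, _, _ = simulate(driver=False)
print(round(statistics.pstdev(speeds), 2), round(statistics.pstdev(hands_off), 2))
```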
Section 1.2 “Case: Credit Card Defaults”
• The credit card data set is the “default of credit card clients Data Set” from https://archive.ics.uci.edu/ml/index.html. For the education variable, the values 4, 5, and 6 were recoded as “other”; likewise, the values 0 and 4 of the marriage variable were recoded as “other.”
• It is not a good idea to run a linear regression with the variable default on the left‐hand side because default is binary (takes on only the values zero and one) and linear regression is for continuous dependent variables. There is a special method for binary dependent variables called “logistic regression,” but that's something for an advanced statistics course.
• The idea of the garden of forking paths is discussed clearly and nontechnically in Gelman and Loken (2014), an article that was included in Best Math Writing of 2015; although nontechnical, it's an excellent read for the statistically inclined person, too.
• In general, lurking variables affect observational data and confounding variables affect designed experiments. A lurking variable connects two otherwise unconnected variables, creating the appearance of a causal relation between them. Consider the firefighter example, where the number of firefighters is highly correlated with the damage caused by the fire. Adding more firefighters doesn't increase the amount of damage (the two variables are not causally connected). Rather, the lurking variable “intensity of the fire” connects them. A lurking variable (say,
• Variables are confounded when we cannot separate their respective effects on the response. A confounding variable
Confounding can occur in a poorly designed experiment. Suppose you wish to determine the effects of fees and interest rates on credit card use, and you offer a low fee and a low interest rate to one group and a high fee and a high interest rate to the other group. The first group will have more credit card use and the second group less, but you won't be able to tell whether the low fee or the low rate caused the greater use in the first group, or whether the high fee or the high rate caused the lesser use in the second group. The rate and the fee are confounded.
On the other hand, we could isolate the effect of the rate by offering a low fee and a high rate to the first group and a low fee and a low rate to the second group; because both groups get the same fee, any difference in use can be attributed to the rate. In more advanced designs, we sometimes will have many effects and be unable to isolate them all. In such a situation, we will deliberately confound the effects we care less about so that we can isolate the effects we do care about. We will address this in Chapter 8.
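The recoding of the credit card data described in the Section 1.2 notes can be sketched as a simple lookup. This Python version is only an illustration (the book does its data preparation in R); it assumes each record is a dictionary whose `EDUCATION` and `MARRIAGE` keys hold the raw integer codes, and the helper names are hypothetical.

```python
EDUCATION_OTHER = {4, 5, 6}  # education codes collapsed into "other"
MARRIAGE_OTHER = {0, 4}      # marriage codes collapsed into "other"


def recode(value, other_codes):
    """Return "other" if value is one of other_codes, else the value unchanged."""
    return "other" if value in other_codes else value


def recode_record(record):
    """Copy a record, collapsing the listed education and marriage codes."""
    out = dict(record)
    out["EDUCATION"] = recode(record["EDUCATION"], EDUCATION_OTHER)
    out["MARRIAGE"] = recode(record["MARRIAGE"], MARRIAGE_OTHER)
    return out


records = [{"EDUCATION": 2, "MARRIAGE": 1}, {"EDUCATION": 5, "MARRIAGE": 0}]
print([recode_record(r) for r in records])
# [{'EDUCATION': 2, 'MARRIAGE': 1}, {'EDUCATION': 'other', 'MARRIAGE': 'other'}]
```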
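To see why linear regression is a poor fit for a binary variable such as default, compare a straight-line fit with a logistic fit on a small made-up data set (hypothetical values, not the credit card data). This sketch fits the logistic model by Newton-Raphson, one standard way to compute its maximum likelihood estimates; it is an illustration under those assumptions, not a description of how any particular R function works internally.

```python
import math
import statistics


def sigmoid(z):
    """Logistic function, computed so that large |z| cannot overflow."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def fit_linear(x, y):
    """Ordinary least squares intercept and slope."""
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def fit_logistic(x, y, iters=25):
    """Logistic regression intercept and slope by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix [[h00, h01], [h01, h11]] (the negative Hessian)
        w = [pi * (1 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: add the inverse information matrix times the gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


x = list(range(20))
y = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1]

a0, a1 = fit_linear(x, y)
c0, c1 = fit_logistic(x, y)
linear = [a0 + a1 * xi for xi in x]
logistic = [sigmoid(c0 + c1 * xi) for xi in x]
print(min(linear), max(linear))      # the straight line escapes [0, 1]
print(min(logistic), max(logistic))  # the logistic curve stays inside (0, 1)
```

The linear fit predicts a negative "probability" of default at the low end and one above 1 at the high end, which is the practical reason the note recommends logistic regression for a zero/one response.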
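The firefighter example can be simulated to show how a lurking variable manufactures a correlation. In this sketch the numbers are hypothetical: fire intensity is drawn uniformly between 1 and 10, and both crew size and damage depend on intensity alone, with no link between them. Their raw correlation is nonetheless very high, and it essentially vanishes once intensity is regressed out of both (a partial correlation, which is our choice of adjustment for this illustration).

```python
import math
import random
import statistics


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def residuals(y, z):
    """Residuals from a least-squares regression of y on z (removes z's influence)."""
    zbar, ybar = statistics.mean(z), statistics.mean(y)
    szz = sum((zi - zbar) ** 2 for zi in z)
    b = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y)) / szz
    return [yi - ybar - b * (zi - zbar) for zi, yi in zip(z, y)]


rng = random.Random(42)
n = 2000
intensity = [rng.uniform(1, 10) for _ in range(n)]           # the lurking variable
firefighters = [3 * z + rng.gauss(0, 1) for z in intensity]  # driven only by intensity
damage = [50 * z + rng.gauss(0, 20) for z in intensity]      # driven only by intensity

raw = pearson(firefighters, damage)
adjusted = pearson(residuals(firefighters, intensity), residuals(damage, intensity))
print(round(raw, 2), round(adjusted, 2))
```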
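The fee-and-rate example reduces to a tiny calculation. Assuming, for illustration only, that card use is additive in the fee and rate effects, with made-up effect sizes, two very different explanations produce identical group means under the confounded design, while the alternative design recovers the rate effect no matter what the fee effect is. The function and design names are hypothetical.

```python
def group_means(design, base, fee_effect, rate_effect):
    """Expected card use per group under an additive model (our assumption)."""
    return [base + fee_effect * fee + rate_effect * rate for fee, rate in design]


# Each group is (fee, rate), coded 0 = low and 1 = high.
CONFOUNDED = [(0, 0), (1, 1)]  # low fee/low rate vs. high fee/high rate
ISOLATED = [(0, 1), (0, 0)]    # low fee/high rate vs. low fee/low rate

# Two different "truths": the fee does all the work, or the rate does.
fee_story = group_means(CONFOUNDED, 100, -30, 0)
rate_story = group_means(CONFOUNDED, 100, 0, -30)
print(fee_story, rate_story)  # [100, 70] [100, 70]: the data cannot tell them apart

# In the isolating design, the group difference equals the rate effect,
# whatever the fee effect happens to be.
for fee_effect in (-30, 0, 10):
    g1, g2 = group_means(ISOLATED, 100, fee_effect, -20)
    print(g1 - g2)  # -20 each time
```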
Section 1.3 “Case: Salk Polio Vaccine”
• A layman's overview of the Salk trials is given in Meier (1989).
• The source for the polio data is http://www.post-polio.org/ir-usa.html. The source for the “under 18” US population is https://www.census.gov/data/tables/time-series/demo/popest/pre-1980-national.html. The source for the US population data is US Current Population Reports Series P25.
Section 1.4 “What Is a Business Experiment?”
• The financial services example is based on Watson‐Hemphill and Kastle (2012).
• The Progressive Insurance example comes from Chapter 22 of Holland and Cochran (2005).