Business Experiments with R. B. D. McCullough


sample

observational data / experimental data

      1.7.2 What are the four key steps in any experiment?

      1.7.3 Ice cream sales are positively correlated with shark attacks in the Eastern United States. What is the lurking variable?

      1.7.4 What is necessary to show causation?

      In this chapter we have twice quoted Stefan Thomke's book, Experimentation Works: The Surprising Power of Business Experiments, and we highly recommend it as an introduction to how experimental methods are used in business. It describes a large number of business experiments and explains how to build an experimentation culture in a business. Thomke, a professor at Harvard Business School, has been conducting research in this area for decades. His previous book, published in 2003, was titled Experimentation Matters.

      Section 1.1 “Life Expectancy and Newspapers”

      • The life expectancy example is based on Zaman (2010). The associated data are from the World Bank Indicator Tables for the year 2010, including only observations that have no missing values.

      • The smoother used by R in Figure 1.1 is called “lowess,” which stands for LOcally WEighted Scatterplot Smoother. It's used to detect nonlinearities in a scatterplot.

      • Looking only at correlations (or lack thereof) can make it impossible to uncover causal relations. Suppose a car travels at a constant speed over hilly roads. The driver will have to accelerate on the inclines and brake on the declines to maintain a constant speed. A person who knows nothing of automobiles might observe these data and conclude that depressing the accelerator or the brake has nothing to do with the speed at which the car travels.
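
      The car example can be sketched as a small simulation. This is only an illustration in Python (standard library only), not code from the book: the function names, the baseline speed of 60, the effect size of 10 per unit of pedal or slope, and the noise level are all assumptions chosen for the demonstration. When the driver compensates for the slope, pedal position and speed are nearly uncorrelated; when pedal position is set at random, as in an experiment, the causal effect becomes visible.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n, driver_compensates, seed):
    """Return (pedal, speed) lists for n moments on a hilly road.

    All numbers are hypothetical: speed rises by 10 per unit of pedal
    (positive = accelerator, negative = brake) and falls by 10 per unit
    of slope (positive = uphill).
    """
    rng = random.Random(seed)
    pedal, speed = [], []
    for _ in range(n):
        slope = rng.uniform(-1, 1)
        if driver_compensates:
            p = slope                # accelerate uphill, brake downhill
        else:
            p = rng.uniform(-1, 1)   # pedal set at random, ignoring the road
        v = 60 + 10 * p - 10 * slope + rng.gauss(0, 0.5)
        pedal.append(p)
        speed.append(v)
    return pedal, speed


# Observational data: the driver holds speed constant.
r_observed = pearson(*simulate(2000, True, seed=1))
# Experimental data: pedal position is randomized.
r_randomized = pearson(*simulate(2000, False, seed=1))

print("correlation, driver compensates:", round(r_observed, 3))
print("correlation, pedal randomized:  ", round(r_randomized, 3))
```

      In the first case the pedal's effect is exactly cancelled by the slope, so the correlation is close to zero even though the pedal plainly affects speed in the model. Randomizing the pedal breaks its link to the slope, and the correlation becomes clearly positive.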

      Section 1.2 “Case: Credit Card Defaults”

      • The credit card data set is the “default of credit card clients Data Set” from https://archive.ics.uci.edu/ml/index.html. For the education variable, values 4, 5, and 6 were converted to “other”; values 0 and 4 of the marriage variable were treated the same way.

      • It is not a good idea to run a linear regression with the variable default on the left‐hand side because default is binary (takes on only the values zero and one) and linear regression is for continuous dependent variables. There is a special method for binary dependent variables called “logistic regression,” but that's something for an advanced statistics course.

      • The idea of the garden of forking paths is discussed clearly and nontechnically in Gelman and Loken (2014), an article that was included in The Best Writing on Mathematics 2015. Although nontechnical, it's an excellent read for the statistically inclined person, too.

      • In general, lurking variables affect observational data and confounding variables affect designed experiments. A lurking variable connects two otherwise unconnected variables, creating the appearance of a causal relation between them. Consider the firefighter example, where the number of firefighters is highly correlated with the damage caused by the fire. Adding more firefighters doesn't increase the amount of damage (the variables are really unconnected). Rather, the lurking variable “intensity of the fire” connects them. A lurking variable (say, $z$) creates the illusion of a causal relationship between two other variables, $x$ and $y$. A good article on how to detect lurking variables is Joiner (1981).

      • Variables are confounded when we cannot separate their respective effects on the response. A confounding variable $x_1$ has an effect on the response $y$, but another variable $x_2$ also has an effect on $y$, and we are unable to separate the effects of $x_1$ and $x_2$. For example, $y$ might be store sales and $x_1$ is a store promotion, while $x_2$ is bad weather. We cannot determine the true effect of the promotion on sales because it is confounded with the weather.

      On the other hand, we could isolate the effect of rate by offering low fee and high rate to the first group and low fee and low rate to the second group. In more advanced designs, we sometimes will have many effects and be unable to isolate them all. In such a situation, we will deliberately confound the effects that we don't care about so much so that we can isolate the effects that we do care about. We will address this in Chapter 8.
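
      The difference between a confounded comparison and one that isolates the effect of rate can be sketched with a little arithmetic. This is only an illustration in Python, not code from the book: the baseline response rate and the two effect sizes are hypothetical numbers, and the confounded pair of offers (low fee and high rate versus high fee and low rate) is assumed here purely as a contrast case.

```python
# Hypothetical true effects on the response rate (assumed for illustration).
BASE = 0.10          # response rate with low fee and low rate
FEE_EFFECT = -0.03   # change in response rate from raising the fee
RATE_EFFECT = -0.02  # change in response rate from raising the rate


def expected_response(high_fee, high_rate):
    """Expected response rate for an offer; arguments are coded 1 = high, 0 = low."""
    return BASE + FEE_EFFECT * high_fee + RATE_EFFECT * high_rate


def contrast(group_1, group_2):
    """Difference in expected response, group 1 minus group 2."""
    return expected_response(*group_1) - expected_response(*group_2)


# Each offer is written as (high_fee, high_rate).
# Isolating design: fee is held low in both groups, so only rate differs.
isolating = contrast((0, 1), (0, 0))
# Confounded design: fee and rate both differ between the groups.
confounded = contrast((0, 1), (1, 0))

print("isolating design difference: ", round(isolating, 4))
print("confounded design difference:", round(confounded, 4))
```

      In the isolating design the difference between the groups equals the rate effect alone. In the confounded design the difference equals the rate effect minus the fee effect, so the two effects cannot be separated from that comparison.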

      Section 1.3 “Case: Salk Polio Vaccine”

      • A layman's overview of the Salk trials is given in Meier (1989).

      • The source for the polio data is http://www.post-polio.org/ir-usa.html. The source for the “under 18” US population is https://www.census.gov/data/tables/time-series/demo/popest/pre-1980-national.html. The source for the US population data is US Current Population Reports Series P25.

      Section 1.4 “What Is a Business Experiment?”

      • The financial services example is based on Watson‐Hemphill and Kastle (2012).

      • The Progressive Insurance example comes from Chapter 22 of Holland and Cochran (2005).
