Business Experiments with R. B. D. McCullough
1.2.1 Lurking Variables
It is not uncommon for an analyst to reach mistaken conclusions from observational data because of lurking variables.
1 During WWII, an analysis of the accuracy of strategic bombing runs showed that Allied bombers were more accurate at lower altitudes than at higher altitudes (this makes sense). The analysis also showed that Allied bombers were more accurate when opposed by enemy fighters than when enemy fighters were not present. Explain.
2 A scatterplot shows a strong relationship between the number of firefighters at a fire and the dollar amount of the damage caused by the fire. While this relationship may be predictive, it is not causal: it is not true that if fewer firefighters are sent to a fire, the dollar amount of the damage will decrease. What is the missing causal variable?
3 On a daily basis in a coastal town, there is a positive relationship between ice cream sales and drowning deaths. What is the missing causal variable?
4 The observational data repeatedly say that persons who eat five fruits and veggies per day have a lower cancer rate than those who don't eat fruits and veggies. The experimental results find no difference in cancer rates. Explain the discrepancy.
5 A large, expensive observational study by the National Institutes of Health concluded that hormone replacement therapy (HRT) prevents heart disease in postmenopausal women. Consequently many women were placed on HRT. Later, an experiment showed that HRT does not prevent heart disease in postmenopausal women. Explain the discrepancy.
The resolutions of the above dilemmas are given below:
1 Cloud cover. Planes couldn't fly in the clouds and had to fly above the clouds. If the weather was cloudy, the enemy wouldn't bother to send up fighters, and accuracy was terrible because in that era, bombing depended on sighting landmarks on the ground.
2 There is a third variable in the background – the seriousness of the fire – that is responsible for the observed relationship. More serious fires require more firefighters and also cause more damage.
3 The lurking variable that causes both ice cream sales and an increase in drowning deaths is season of the year, i.e. summer.
4 Of course, persons who eat five fruits and veggies per day are different from those who do not. How, precisely, they are different we do not know. Just because there is a lurking variable does not mean that we can identify it.
5 The women who chose HRT were different from other women in ways for which the observational study could not control. Again, just because we can deduce the existence of a lurking variable, it does not follow that we can say what the variable is.
The above “experiments” (the word is in quotes because they really aren't experiments) are just observational data masquerading as experiments. The way to see this is to perform a thought experiment: imagine manipulating one of the variables as it would be manipulated in a true experiment. In the fire example above, imagine that firefighters had responded to a fire and we then ordered 100 more firefighters to show up. Would we expect more damage simply because more firefighters were present? Of course not. As will be seen, designed experiments eliminate the effect of lurking variables.
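The thought experiment can be mimicked with a small simulation. The following is a minimal sketch in Python (standard library only), not taken from the book. Every number in it is an assumption chosen for illustration: fire severity is uniform on 0 to 10, damage rises by 5 per unit of severity, and each extra firefighter reduces damage by 1. In the observational setting the number of firefighters tracks severity; in the experimental setting it is assigned at random.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_fires(n, randomized, rng):
    """Return (firefighters, damage) for n hypothetical fires."""
    firefighters, damage = [], []
    for _ in range(n):
        severity = rng.uniform(0, 10)  # the lurking variable
        if randomized:
            # Experiment: crew size is assigned independently of severity.
            crew = rng.uniform(0, 10)
        else:
            # Observation: dispatchers send more firefighters to worse fires.
            crew = severity + rng.gauss(0, 1)
        # Assumed truth: severity raises damage, firefighters reduce it.
        loss = 5 * severity - 1 * crew + rng.gauss(0, 5)
        firefighters.append(crew)
        damage.append(loss)
    return firefighters, damage


rng = random.Random(1)  # fixed seed so the run is repeatable
r_observational = pearson(*simulate_fires(5000, False, rng))
r_randomized = pearson(*simulate_fires(5000, True, rng))

print("correlation, observational data:", round(r_observational, 2))
print("correlation, randomized assignment:", round(r_randomized, 2))
```

Under these assumptions the observational correlation between firefighters and damage is strongly positive even though firefighters reduce damage by construction. Once crew size is assigned at random, severity no longer moves with it and the correlation turns negative.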
Here we mention that many authors conflate the concepts of “lurking variable” and “confounding variable,” treating them as one and the same, but this is a mistake. Though they both make it difficult for the analyst to interpret results, they do so through different mechanisms. A lurking variable affects observational data, while a confounding variable affects experimental data. In this chapter we only encounter lurking variables. In later chapters we will encounter confounding. The “Learning More” section for this chapter describes the differences in detail.
1.2.2 Sample Selection Bias
Sample selection bias plagues nonexperimental, i.e. observational, data. Its effects are especially pernicious when selection is based on the dependent variable (the effects are not so bad when selection is based on an independent variable). To motivate this important idea, we generated some linear data with a zero intercept and a slope of unity.
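A setup of this kind can be sketched as follows. This is an illustrative Python simulation (standard library only), not the book's data or code. The sample size, the range of x, the noise standard deviation, and the selection cutoff of 5 are all hypothetical values chosen for the sketch. It fits a least-squares line to the full sample, to the observations selected on the dependent variable, and to the observations selected on the independent variable.

```python
import random


def ols(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


rng = random.Random(42)  # fixed seed so the run is repeatable
N = 10_000               # hypothetical sample size
CUTOFF = 5.0             # hypothetical selection threshold

# True model: intercept 0, slope 1, plus normal noise (assumed sd of 2).
x = [rng.uniform(0, 10) for _ in range(N)]
y = [xi + rng.gauss(0, 2) for xi in x]

# Full sample.
full_fit = ols(x, y)

# Selection on the dependent variable: keep only observations with y > CUTOFF.
kept_y = [(xi, yi) for xi, yi in zip(x, y) if yi > CUTOFF]
y_selected_fit = ols([p[0] for p in kept_y], [p[1] for p in kept_y])

# Selection on the independent variable: keep only observations with x > CUTOFF.
kept_x = [(xi, yi) for xi, yi in zip(x, y) if xi > CUTOFF]
x_selected_fit = ols([p[0] for p in kept_x], [p[1] for p in kept_x])

for label, (intercept, slope) in [
    ("full sample", full_fit),
    ("selected on y", y_selected_fit),
    ("selected on x", x_selected_fit),
]:
    print(f"{label}: intercept = {intercept:.2f}, slope = {slope:.2f}")
```

In this sketch the full-sample fit and the fit selected on x should both land near the true intercept and slope, while selecting on y pushes the intercept up and flattens the slope. The reason is that, at low values of x, only observations with unusually large positive errors survive the cutoff.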
Try it!
Use the data in the file SampleSelection.csv to repeat the above analysis by running the regression for the full sample and again only for those observations for which
These results are consistent with the true intercept of zero and the true slope of unity. Suppose that we only get to observe observations when