Applied Univariate, Bivariate, and Multivariate Statistics Using Python. Daniel J. Denis
What is the probability of observing a difference such as we have observed in our sample if the true difference in the population is equal to 0?
The above is the key question that repeats itself in one form or another in virtually every evaluation of a null hypothesis. That is, state a value for a parameter, then evaluate the probability of the sample result obtained in light of the null hypothesis. You might see where the argument goes from here. If the probability of the sample result is relatively high under the null, then we have no reason to reject the null hypothesis in favor of the statistical alternative. However, if the probability of the sample result is low under the null, then we take this as evidence that the null hypothesis may be false. We do not know if it is false, but we reject it because of the implausibility of the data in light of it. A rejection of the null hypothesis does not necessarily mean the null is false. What it does mean is that we will act as though it is false or potentially make scientific decisions based on its presumed falsity. Whether it is actually false usually remains unknown.
For our example, if the number of people surviving in each group in our sample were exactly 50, then we definitely would not have evidence to reject the null hypothesis. Why not? Because a sample result of 50 and 50 lines up exactly with what we would expect under the null model. However, if the numbers turned up as they did earlier, 50 vs. 20, and we found the probability of this result to be rather small under the null, then it could be taken as evidence to reject the null hypothesis and infer the alternative that the survival rates in the two groups are not the same. This is where the substantive or research alternative hypothesis comes in. Why were the survival rates found to be different? For our example, this is an easy one. If we did our experiment properly, it is hopefully due to the treatment. However, had we not used a rigorous experimental design, then concluding the substantive or research hypothesis becomes much more difficult. That is, simply because you are able to reject a null hypothesis does not in itself lend credence to the substantive alternative hypothesis of your wishes and dreams. The substantive alternative hypothesis should be a natural consequence of the rigorous approach and controls implemented for the experiment. If it is not, then drawing a substantive conclusion becomes much more difficult, if not impossible. This is one reason why drawing conclusions from correlational research can be exceedingly difficult. Without a bullet-proof experimental design, it becomes logically nearly impossible to know why the null was rejected. Such conclusions are difficult even with a strong experimental design, so without this level of rigor, you are in hot water when it comes to drawing strong conclusions.
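To make the 50 vs. 20 comparison concrete, the following is a minimal sketch of how the probability of such a split could be computed under the null. It assumes 100 people per group, a figure not given in this section, and it uses a one-sided exact test based on the hypergeometric distribution (the idea behind Fisher's exact test), which is a choice made here for illustration and not a method named in the text. The function name `prob_at_least` is hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_at_least(survived_a, survived_b, n_a=100, n_b=100):
    """One-sided exact probability, under the null of equal survival rates,
    that group A holds at least `survived_a` of the total survivors.

    The group sizes n_a and n_b (100 each) are assumptions for illustration.
    """
    total_survived = survived_a + survived_b
    # Number of ways to choose which people survive, ignoring group labels.
    denom = comb(n_a + n_b, total_survived)
    # Largest count group A could contribute given the fixed totals.
    k_max = min(n_a, total_survived)
    numer = sum(
        comb(n_a, k) * comb(n_b, total_survived - k)
        for k in range(survived_a, k_max + 1)
    )
    return float(Fraction(numer, denom))


p_equal = prob_at_least(50, 50)    # sample matches expectation under the null
p_unequal = prob_at_least(50, 20)  # the split discussed in the text

print(f"50 vs. 50: p = {p_equal:.4f}")
print(f"50 vs. 20: p = {p_unequal:.2e}")
```

Under these assumed group sizes, the 50 vs. 50 split gives a large probability and the 50 vs. 20 split a very small one. A small probability here speaks only to the statistical alternative; it says nothing about why the rates differ.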
Many published research papers offer very little scientific support for their purported scientific claims beyond a rejection of a null hypothesis. This is due to many researchers not understanding or appreciating what a rejection of the null means (and what it does not mean). As we will discuss later in the book, rejecting a null hypothesis is usually, by itself, no big deal at all.
1.2 Statistics and Decision-Making
We have discussed thus far that a null hypothesis is typically rejected when the probability of the observed data in the sample is relatively small under the posited null. For instance, with a simple example of 100 flips of a presumably fair coin, we would certainly reject the null hypothesis of fairness if we observed, for example, 98 heads. That is, the probability of observing 98 heads on 100 flips of a fair coin is very small. However, when we reject the null, we could be wrong. That is, rejecting fairness could be a mistake. Now, there is a very important distinction to make here. Rejecting the null hypothesis itself in this situation is likely to be a good decision. We have every reason to reject it based on the number of heads out of 100 flips. Obtaining 98 heads is more than enough statistical evidence in the sample to reject the null. However, as mentioned, a rejection of the null hypothesis does not necessarily mean the null hypothesis is false. All we have done is reject it. In other words, it is entirely possible that the coin is fair, but we simply observed an unlikely result. This is the problem with statistical inference: there is always a chance of being wrong in our decision to reject a null hypothesis and infer an alternative. That does not mean the decision to reject was unsound. It means simply that our decision may not turn out to be in our favor. In other words, we may not get a “lucky outcome.” We have to live with that risk of being wrong if we are to make virtually any decisions (such as leaving the house and crossing the street or going shopping during a pandemic).
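As a rough illustration of how small that probability is, the sketch below computes the exact chance of 98 or more heads in 100 flips of a fair coin. Using the upper tail ("98 or more") instead of exactly 98 is a choice made here, as is the hypothetical helper name `binom_upper_tail`.

```python
from fractions import Fraction
from math import comb


def binom_upper_tail(k, n):
    """Exact P(X >= k) for X ~ Binomial(n, 1/2), i.e. a fair coin."""
    # Count the head/tail sequences with at least k heads ...
    numer = sum(comb(n, i) for i in range(k, n + 1))
    # ... and divide by the 2**n equally likely sequences.
    return Fraction(numer, 2 ** n)


p_tail = binom_upper_tail(98, 100)
print(f"P(at least 98 heads in 100 fair flips) = {float(p_tail):.3e}")
```

The result is extremely small but not zero, which is the sense in which a fair coin could still have produced the observed flips even though rejecting fairness is the sensible decision.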
The above is an extremely important distinction and cannot be emphasized enough. Many times, researchers (and others, especially media) evaluate decisions based not on the logic that went into them, but rather on outcomes. This is a philosophically faulty way of assessing the goodness of a decision, however. The goodness of a decision should be judged by whether it rested on solid and efficient decision-making principles that a rational agent would apply under similar circumstances, not by whether the outcome happened to accord with what we hoped to see. Again, sometimes we experience lucky outcomes, sometimes we do not, even when our decision-making criteria are “spot on” in both cases. This is what the art of decision-making is all about. The following are some examples of popular decision-making events and the actual outcome of the given decision:
The Iraq war beginning in 2003. Politics aside, a motivator for invasion was presumably whether or not Saddam Hussein possessed weapons of mass destruction. We know now that he apparently did not, and hence many have argued that