Applied Univariate, Bivariate, and Multivariate Statistics Using Python. Daniel J. Denis
Consider the decision in 2020 to essentially shut down the US economy in March due to the spread of COVID-19. Was it a good decision? It should not be evaluated based on the outcome of the spread or the degree to which it affected people’s lives. It should be evaluated on the principles and logic that went into the decision beforehand. Whether or not a lucky outcome was achieved is a separate matter from the decision itself. Likewise, the decision to purchase a stock and then lose one’s entire investment cannot be judged by the outcome of oil dropping to negative prices during the pandemic. It must instead be evaluated on the decision-making criteria that went into it. You may have purchased a great stock prior to the pandemic, but obtained an extremely unlucky and improbable outcome when the oil crash hit.
Consider also the SpaceX launch in May of 2020, which returned Americans to space. On the day of the launch, there was a slight chance of lightning in the area, but the risk was deemed low enough to go ahead with the launch. Had lightning occurred and adversely affected the mission, it would not have meant a poor decision was made. Above all else, it would have indicated that an unlucky outcome occurred. There is always a measure of risk tolerance in any event such as this. The goal in decision-making is generally to calibrate such risk and minimize it to an acceptable degree.
1.3 Quantifying Error Rates in Decision-Making: Type I and Type II Errors
As discussed thus far, decision-making is risky business. Virtually all decisions are made with at least some risk of being wrong. How that risk is distributed and calibrated, and the costs of making the wrong decision, are the components that must be considered before making the decision. For example, returning to the coin, if we start out assuming the coin is fair (the null hypothesis), then reject that hypothesis after obtaining a large number of heads out of 100 flips, the decision is logical, yet reality may not agree with it. That is, the coin may, in reality, be fair; we merely observed a string of heads due to chance fluctuation. Now, how are we ever to know whether the coin is fair or not? That is a difficult question, since according to frequentist probabilists, we would literally need to flip the coin forever to obtain the true probability of heads. Since we cannot study an infinite population of coin flips, we are always restricted to betting based on the sample, and hoping our bet gets us a lucky outcome.
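The coin example can be made concrete with a short Python sketch. The observed count of 62 heads in 100 flips is a hypothetical illustration, not a figure from the text; the function below computes an exact two-sided binomial p-value using only the standard library, summing the probabilities of all outcomes at least as improbable as the one observed.

```python
from math import comb

def two_sided_binomial_pvalue(k, n, p=0.5):
    """Exact two-sided binomial test: sum the probabilities of all
    outcomes whose probability is no greater than that of the
    observed count k, under Binomial(n, p)."""
    probs = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small tolerance guards against floating-point ties
    return sum(pr for pr in probs if pr <= observed + 1e-12)

# Hypothetical data: 62 heads in 100 flips of a coin assumed fair
# under the null hypothesis (p = 0.5).
pval = two_sided_binomial_pvalue(62, 100)
print(round(pval, 4))  # roughly 0.02, below the conventional 0.05
```

A p-value near 0.02 would lead most researchers to reject the null of fairness, yet as the text emphasizes, the coin may in reality still be fair; the rejection is a calibrated bet, not a certainty.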
What may be most surprising to those unfamiliar with statistical inference is that, quite remarkably, statistical inference in science operates on the same philosophical principles as games of chance in Vegas! Science is a gamble and all decisions have error rates. Again, consider the idea of a potential treatment being advanced for COVID-19 in 2020, the year of the pandemic. Does the treatment work? We hope so, but if it does not, what are the risks of it not working? With every decision there are error rates, and error rates also imply potential opportunity costs. Good decisions are made with an awareness of both the benefits of being correct and the costs of being wrong. Beyond that, we roll the proverbial dice and see what happens.
If we set up a null hypothesis and then reject it, we risk a false rejection of the null. That is, perhaps in truth the null hypothesis should not have been rejected. This type of error, a false rejection of the null, is what is known as a type I error. The probability of making a type I error is typically set at the level of significance of the statistical test. Scientists usually like to limit the type I error rate, keeping it at a nominal level such as 0.05. This is the infamous p < 0.05 level. However, this level is set arbitrarily, and there is no logical or scientific reason to use 0.05 for every experiment you run. How the level is set should be governed by, you guessed it, your tolerance for the risk of making a wrong decision. Why, then, is minimizing the type I error rate usually preferred? Consider the COVID-19 treatment. If the null hypothesis is that it does not work, and we reject that null hypothesis, we probably want a relatively small chance of being wrong. That is, you probably do not want to be taking medication that is promised to work when it does not, nor does the scientific community want to fill its publication space with presumed treatments that are, in actuality, not effective. Hence, we usually wish to keep type I error rates quite low. It was R.A. Fisher, pioneer of modern-day NHST, who suggested 0.05 as a convenient level of significance, and scientists afterward adopted it as “gospel” without giving it further thought (Denis, 2004). As historians of statistics have argued, adopting “p < 0.05” was more of a social and historical phenomenon than a rational and scientific one.
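To see why 0.05 behaves as an error rate, one can simulate many experiments in which the null hypothesis is actually true, so that every rejection is, by construction, a type I error. The rejection region below (heads ≤ 40 or ≥ 60 out of 100) is an illustrative cutoff derived from a normal approximation at alpha = 0.05, not a rule from the text:

```python
import random

random.seed(1)

# The coin really is fair here, so every rejection is a type I error.
# With alpha = 0.05 we expect roughly 5% false rejections.
N_FLIPS = 100
N_EXPERIMENTS = 10_000

# Normal approximation to Binomial(100, 0.5): mean 50, sd 5, so a
# two-sided test at alpha = 0.05 rejects when |heads - 50| > 1.96 * 5,
# i.e. heads <= 40 or heads >= 60.
rejections = 0
for _ in range(N_EXPERIMENTS):
    heads = sum(random.random() < 0.5 for _ in range(N_FLIPS))
    if abs(heads - 50) > 9.8:
        rejections += 1

type_i_rate = rejections / N_EXPERIMENTS
print(type_i_rate)  # near the nominal 0.05
```

The simulated rate lands near 0.05 (slightly above it, because the binomial is discrete and the cutoff is an approximation), illustrating that the significance level is literally the long-run frequency of false rejections when the null is true.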
However, error rates go both ways. Researchers often wish to minimize the risk of a type I error, often ignoring the type II error rate. A type II error is failing to reject a false null hypothesis. For our COVID-19 example, this would essentially mean failing to detect that a treatment is effective when in fact it is effective and could potentially save lives. If in reality the null hypothesis is false, yet through our statistical test we fail to detect its falsity, then we could potentially be missing out on a treatment that is effective. So-called “experimental treatments” for a disease (i.e. the “right to try”) are often well-attuned to the risk of making type II errors. That is, the risk of not acting, even on something that has a relatively small probability of working out, may be high, because if it does work out, then the benefits could be substantial.
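A companion simulation illustrates the type II side. Here the null hypothesis (a fair coin) is false: the true probability of heads is set to 0.6, a value assumed purely for illustration. Every failure to reject is now a type II error, and one minus that rate is the test's power:

```python
import random

random.seed(2)

# The null (p = 0.5) is false: the coin's true probability of heads
# is 0.6. Failing to reject this false null is a type II error.
TRUE_P = 0.6
N_FLIPS = 100
N_EXPERIMENTS = 10_000

misses = 0
for _ in range(N_EXPERIMENTS):
    heads = sum(random.random() < TRUE_P for _ in range(N_FLIPS))
    # Same illustrative two-sided rejection region as before:
    # reject when heads <= 40 or heads >= 60 (alpha = 0.05).
    if 40 < heads < 60:  # inside the non-rejection region
        misses += 1

type_ii_rate = misses / N_EXPERIMENTS
print(f"type II rate ~ {type_ii_rate:.3f}, power ~ {1 - type_ii_rate:.3f}")
```

Under these assumptions the test misses the bias nearly half the time: even 100 flips give only modest power against a coin biased at 0.6. This is exactly the asymmetry the paragraph warns about, since a researcher fixated solely on keeping type I error at 0.05 may be running a test with a very large type II error rate.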
1.4 Estimation of Parameters
As has undoubtedly become clear at this point, statistical inference is about estimating parameters. If we regularly dealt with population data, we would have no need for estimation. We would already know the parameters of the population and could simply describe its features using, aptly named, descriptive statistics. Referring again to our COVID-19 example, if researchers actually knew the true proportion of those suffering from the virus in the population, then we would know the parameter and could simply report the population proportion. However, as discussed, we rarely if ever know the true parameters, because our populations are usually quite large and in some cases, as with the coin, may actually be infinite in size. Hence, since we typically do not know the actual population parameters, we resort to inferential statistics to estimate them. That is, we compute something on a sample and use it as an estimate of the population parameter. It should be remarked that the distinction between descriptive and inferential statistics does not mean the two are mutually exclusive. When we compute a statistic on a sample, we can call it both a descriptive statistic and an inferential one, so long as we are using it for both purposes. Hence, we may compute the proportion of cases suffering from COVID-19 to be 0.01 in our sample and refer to that as a descriptive statistic because we are “describing the sample,” yet when we use that statistic to infer the true proportion in the population, refer to it as an inferential statistic. Hence, it is best not to get too stuck on the meaning of “descriptive” in this case. An inferential statistic, however, almost always implies we are making an educated guess or inference toward