Statistical Analysis with Excel For Dummies. Joseph Schmuller
So, what is probability? The best way to attack this is with a few examples. If you toss a coin, what's the probability that it comes up heads? Intuitively, you know that if the coin is fair, you have a 50-50 chance of heads and a 50-50 chance of tails. In terms of the kinds of numbers associated with probability, that’s ½.
How about rolling a die? (That’s one member of a pair of dice.) What’s the probability that you roll a 3? Hmm… A die has six faces and one of them is 3, so that ought to be 1⁄6, right? Right.
Here’s one more. You have a standard deck of playing cards. You select one card at random. What’s the probability that it’s a club? Well, a deck of cards has four suits, so that answer is ¼.
I think you’re getting the picture. If you want to know the probability that an event occurs, figure out how many ways that event can happen and divide by the total number of events that can happen. In each of the three examples, the event we’re interested in (heads, 3, or club) happens only one way.
Things can get a bit more complicated. When you toss a die, what’s the probability you roll a 3 or a 4? Now you’re talking about two ways the event you’re interested in can occur, so that’s 2⁄6, or 1⁄3. What about the probability of rolling an even number? That has to be 2, 4, or 6, and the probability is 3⁄6, or 1⁄2.

On to another kind of probability question. Suppose you roll a die and toss a coin at the same time. What’s the probability you roll a 3 and the coin comes up heads? Consider all the possible events that can occur when you roll a die and toss a coin at the same time. The outcome can be a head and 1-6 or a tail and 1-6. That’s a total of 12 possibilities. The head-and-3 combination can happen only one way, so the answer is 1⁄12.

In general, the formula for the probability that a particular event occurs is

Probability of an event = (Number of ways the event can occur) / (Total number of possible events)
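Although this book works in Excel, the ways-over-total rule is easy to check for yourself. Here’s a quick sketch in Python (not part of the book’s Excel workflow) that computes the probabilities from the examples as exact fractions:

```python
from fractions import Fraction

def probability(favorable, total):
    """Probability = (ways the event can occur) / (total possible events)."""
    return Fraction(favorable, total)

# Heads: one favorable face out of two equally likely coin faces
print(probability(1, 2))    # 1/2

# Rolling a 3 or a 4: two favorable faces out of six
print(probability(2, 6))    # 1/3

# Head-and-3 when tossing a coin and rolling a die: 1 of 12 outcomes
print(probability(1, 12))   # 1/12
```

Using `Fraction` keeps the answers as exact fractions like 1⁄3 rather than decimals like 0.3333.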
I begin this section by saying that statisticians express their confidence about their decisions in terms of probability, which is really why I brought up this topic in the first place. This line of thinking leads me to conditional probability — the probability that an event occurs given that some other event occurs. For example, suppose I roll a die, take a look at it (so that you can't see it), and tell you I’ve rolled an even number. What’s the probability that I've rolled a 2? Ordinarily, the probability of a 2 is 1⁄6, but I’ve narrowed the field. I’ve eliminated the three odd numbers (1, 3, and 5) as possibilities. In this case, only the three even numbers (2, 4, and 6) are possible, so now the probability of rolling a 2 is 1⁄3.
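The even-number example works by shrinking the list of possible outcomes and then applying the same ways-over-total rule to what’s left. Here’s a short Python sketch of that reasoning (again, a check outside Excel, not the book’s method):

```python
from fractions import Fraction

# All six equally likely die faces
faces = [1, 2, 3, 4, 5, 6]

# Condition on the extra information: the roll is even,
# so only three faces remain possible
even_faces = [f for f in faces if f % 2 == 0]

# P(roll is 2 | roll is even): one favorable face out of the three that remain
p_two_given_even = Fraction(even_faces.count(2), len(even_faces))
print(p_two_given_even)   # 1/3
```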
Exactly how does conditional probability play into statistical analysis? Read on.
Inferential Statistics: Testing Hypotheses
In advance of doing a study, a statistician draws up a tentative explanation — a hypothesis — of why the data might come out a certain way. After the study is complete and the sample data are all tabulated, the statistician faces the essential decision every statistician has to make: whether or not to reject the hypothesis.
That decision is wrapped in a conditional probability question — what’s the probability of obtaining the sample data, given that this hypothesis is correct? Statistical analysis provides tools to calculate the probability. If the probability turns out to be low, the statistician rejects the hypothesis.
Suppose you’re interested in whether or not a particular coin is fair — whether it has an equal chance of coming up heads or tails. To study this issue, you'd take the coin and toss it a number of times — say, 100. These 100 tosses make up your sample data. Starting from the hypothesis that the coin is fair, you'd expect that the data in your sample of 100 tosses would show around 50 heads and 50 tails.
If it turns out to be 99 heads and 1 tail, you’d undoubtedly reject the fair coin hypothesis. Why? The conditional probability of getting 99 heads and 1 tail given a fair coin is very low. Wait a second. The coin could still be fair and you just happened to get a 99-1 split, right? Absolutely. In fact, you never really know. You have to gather the sample data (the results from 100 tosses) and make a decision. Your decision might be right, or it might not.
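To see just how low that conditional probability is, you can compute it directly. A fair coin’s tosses follow what statisticians call a binomial distribution, and a few lines of Python (an illustration, not something you need for the Excel material) give the exact numbers:

```python
from math import comb

def binom_prob(k, n, p=0.5):
    """Probability of exactly k heads in n tosses, given P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# P(99 heads and 1 tail | fair coin): vanishingly small
print(binom_prob(99, 100))

# P(exactly 50 heads | fair coin): the single most likely split, roughly 0.08
print(binom_prob(50, 100))
```

The 99-heads probability comes out to about 8 × 10⁻²⁹, which is why you’d reject the fair-coin hypothesis without losing sleep over it.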
Juries face this dilemma all the time. They have to decide among competing hypotheses that explain the evidence in a trial. (Think of the evidence as data.) One hypothesis is that the defendant is guilty. The other is that the defendant is not guilty. Jury members have to consider the evidence and, in effect, answer a conditional probability question: What’s the probability of the evidence given that the defendant is not guilty? The answer to this question determines the verdict.
Null and alternative hypotheses
Consider once again the coin tossing study I mention in the preceding section. The sample data are the results from the 100 tosses. Before tossing the coin, you might start with the hypothesis that the coin is a fair one so that you expect an equal number of heads and tails. This starting point is called the null hypothesis. The statistical notation for the null hypothesis is H0. According to this hypothesis, any heads-tails split in the data is consistent with a fair coin. Think of it as the idea that nothing in the results of the study is out of the ordinary.
An alternative hypothesis is possible: The coin isn’t a fair one, and it’s loaded to produce an unequal number of heads and tails. According to this hypothesis, any heads-tails split is consistent with an unfair coin. This hypothesis is called, believe it or not, the alternative hypothesis. The statistical notation for the alternative hypothesis is H1.
With the hypotheses in place, toss the coin 100 times and note the number of heads and tails. If the results are something like 90 heads and 10 tails, it's a good idea to reject H0. If the results are around 50 heads and 50 tails, don't reject H0.

Similar ideas apply to the reading speed example I give earlier, in the section “Samples and populations.” One sample of children receives reading instruction under a new method designed to increase reading speed, and the other learns via a traditional method. Measure the children's reading speeds before and after instruction and tabulate the improvement for each child. The null hypothesis, H0, is that one method isn't different from the other. If the improvements are greater with the new method than with the traditional method — so much greater that it's unlikely that the methods aren't different from one another — reject H0. If they're not greater, don't reject H0.
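The reject/don't-reject decision for the coin can be sketched as a simple rule: compute how likely a split at least that lopsided would be if H0 (fair coin) were true, and reject H0 when that probability falls below a cutoff. The sketch below is a simplified one-sided version in Python; the 0.05 cutoff is a conventional choice, not something derived here:

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """P(k or more heads in n tosses), given P(heads) = p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

ALPHA = 0.05  # a conventional cutoff (an assumption here, not from the text)

def decide(heads, n=100):
    """Reject H0 (fair coin) if this many heads is very unlikely under H0."""
    if prob_at_least(heads, n) < ALPHA:
        return "reject H0"
    return "do not reject H0"

print(decide(90))  # 90 heads out of 100: reject H0
print(decide(52))  # 52 heads out of 100: do not reject H0
```

Note that the second case says "do not reject H0," not "accept H0" — a distinction the next paragraph explains.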
Notice that I did not say “accept H0.” The way the logic works, you never accept a hypothesis. You either reject H0 or don't reject H0.
Here’s a real-world example to help you understand this idea. Whenever a defendant goes on trial, that person is presumed innocent until proven guilty. Think of innocent as H0. The prosecutor’s job is to convince the jury to reject H0. If the jurors reject, the verdict is guilty.