Applied Univariate, Bivariate, and Multivariate Statistics. Daniel J. Denis


      The area we are interested in is that at or above 2.5 (the area where the arrow is pointing). Since we know the total area under the normal density is equal to 1, we can subtract pnorm(2.5, 0, 1) from 1:

      > 1 - pnorm(2.5, 0, 1)
      [1] 0.006209665

Graph depicts the shaded area under the standard normal distribution at or above a z‐score of 2.5 standard deviations.

      We see, then, that the percentage of students scoring higher than Mary is approximately 0.6% (i.e., the proportion multiplied by 100). What proportion of students scored better than John in his class? Recall that his z‐score was equal to −2.5. Because the normal distribution is symmetric, we already know the area lying below −2.5 is the same as that lying above 2.5. This means that approximately 99.38% of students scored higher than John. Hence, Mary drastically outperformed her colleague when we consider their scores relative to their respective classes. Be careful to note that in drawing these conclusions, we had to assume each score (John's and Mary's) came from a normal distribution. The mere fact that we transformed their raw scores to z‐scores in no way normalizes their raw distributions. Standardization standardizes, but it does not normalize.
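      We can verify this symmetry numerically in R; the area below −2.5 matches the area above 2.5 computed earlier, and its complement gives the proportion scoring higher than John:

      > pnorm(-2.5, 0, 1)
      [1] 0.006209665
      > 1 - pnorm(-2.5, 0, 1)
      [1] 0.9937903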

      One can also easily verify that approximately 68% of cases in a normal distribution lie within −1 and +1 standard deviations of the mean, while approximately 95% of cases lie within −2 and +2 standard deviations.
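      These figures are easily confirmed in R by differencing the relevant cumulative probabilities:

      > pnorm(1) - pnorm(-1)
      [1] 0.6826895
      > pnorm(2) - pnorm(-2)
      [1] 0.9544997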

      2.1.1 Plotting Normal Distributions

      We can plot normal densities in R by generating a sequence of values along the abscissa and evaluating the normal density at each value:

      > x <- seq(from = -3, to = +3, length.out = 100)
      > plot(x, dnorm(x))

      Graph depicts the plot of the normal density.
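      To compare several normal densities on the same axes, one option is base R's curve function; the following is a minimal sketch, with means and standard deviations chosen arbitrarily for illustration:

      > curve(dnorm(x, mean = 0, sd = 1), from = -6, to = 6, ylab = "density")  # standard normal
      > curve(dnorm(x, mean = 0, sd = 2), add = TRUE, lty = 2)  # larger standard deviation
      > curve(dnorm(x, mean = 2, sd = 1), add = TRUE, lty = 3)  # shifted mean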

      Distributions (and densities) of a single variable typically go by the name of univariate distributions to distinguish them from distributions of two (bivariate) or more variables (multivariate).

      > install.packages("HistData")
      > library(HistData)
      > attach(Galton)
      > Galton
         parent child
      1    70.5  61.7
      2    68.5  61.7
      3    65.5  61.7
      4    64.5  61.7
      5    64.0  61.7
      6    67.5  62.2
      7    67.5  62.2
      8    67.5  62.2
      9    66.5  62.2
      10   66.5  62.2

      We first install the package using the install.packages function. The library statement loads the package HistData into R's search path. From there, we attach the Galton data to insert the object (dataframe) into the search list. We generate a histogram of parent height:

      > hist(parent, main = "Histogram of Parent Height")

      Histogram depicting parent height versus frequency.
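      In the spirit of Fisher's classic overlay of a normal density on empirical observations (a figure reproduced later in this section), we might rescale the histogram to the density scale and superimpose a normal curve. A minimal sketch, assuming the Galton data are still attached and using the sample mean and standard deviation of parent:

      > hist(parent, freq = FALSE, main = "Histogram of Parent Height")
      > curve(dnorm(x, mean = mean(parent), sd = sd(parent)), add = TRUE)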

      2.1.2 Binomial Distributions

      The binomial distribution is given by:

$$p(r) = \binom{n}{r} p^r (1 - p)^{n - r} = \frac{n!}{r!\,(n - r)!}\, p^r (1 - p)^{n - r}$$

      where,

       p(r) is the probability of observing r occurrences out of n possible occurrences,2

       p is the probability of a “success” on any given trial, and

       1 − p is the probability of a failure on any given trial, often simply referred to by “q” (i.e., q = 1 − p).
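      For example, with r = 2 successes in n = 5 trials and p = 0.5 (the case computed with dbinom below), the formula gives:

$$p(2) = \binom{5}{2} (0.5)^2 (1 - 0.5)^{5 - 2} = 10 \times 0.25 \times 0.125 = 0.3125$$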

      The binomial setting provides an ideal context to demonstrate the essentials of hypothesis‐testing logic, as we will soon see. In a binomial setting, the following conditions must hold:

       The variable under study must be binary in nature. That is, the outcome of the experiment can result in only one category or another. That is, the outcome categories are mutually exclusive. For instance, the flipping of a coin has this characteristic, because the coin can either come up “head” or “tail” and nothing else (yes, we are ruling out the possibility that it lands on its side, and I think it is safe to do so).

       The probability of a “success” on each trial remains constant (or stationary) from trial to trial. For example, if the probability of head is equal to 0.5 on our first flip, we assume it is also equal to 0.5 on the second, third, fourth flips, and so on.

       Each trial is independent of each other trial. That is, the fact that we get a head on our first flip of the coin in no way changes the probability of getting a head or tail on the next flip, and so on for the other flips (i.e., no outcome is ever “due” to occur, as the gambler sometimes believes).

      We can easily demonstrate hypothesis testing in a binomial setting using R. For instance, let us return to the coin‐flipping experiment. Suppose you would like to know the probability of obtaining two heads on five flips of a fair coin, where each flip is assumed to have a probability of heads equal to 0.5. In R, we can compute this as follows:

      > dbinom(2, size = 5, prob = 0.5)
      [1] 0.3125

      where dbinom calls the “density for the binomial,”

      Histogram depicting Fisher's overlay of normal density on empirical observations. Source: Fisher (1925, 1934).
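      As a quick consistency check, we can reproduce this value from the binomial formula directly, using choose for the binomial coefficient, and we can also request the entire distribution of 0 through 5 heads at once:

      > choose(5, 2) * 0.5^2 * (1 - 0.5)^(5 - 2)
      [1] 0.3125
      > dbinom(0:5, size = 5, prob = 0.5)
      [1] 0.03125 0.15625 0.31250 0.31250 0.15625 0.03125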
