Read online - Applied Regression Modeling. Iain Pardoe. Mathematics. LiveLib

Applied Regression Modeling - Iain Pardoe

The normal density curve is sometimes called the “bell curve” since its shape resembles that of a bell. It is a slightly odd bell, however, since its sides never quite reach the ground (although the ends of the curve in Figure 1.3 are quite close to zero on the vertical axis, they would never actually quite reach there, even if the graph were extended a very long way on either side).

Figure 1.3 Standard normal density curve together with a shaded area of 0.475 between a = 0 and b = 1.96, which represents the probability that a standard normal random variable lies between 0 and 1.96.

The key feature of the normal density curve that allows us to make statistical inferences is that areas under the curve represent probabilities. The entire area under the curve is one, while the area under the curve between one point on the horizontal axis (a, say) and another point (b, say) represents the probability that a random variable that follows a standard normal distribution is between a and b. So, for example, Figure 1.3 shows there is a probability of 0.475 that a standard normal random variable lies between a = 0 and b = 1.96, since the area under the curve between a = 0 and b = 1.96 is 0.475.
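The shaded area in Figure 1.3 can be checked numerically. The following is a minimal sketch, assuming Python 3.8 or later and its standard-library `statistics.NormalDist` class; the helper name `area_between` is introduced here for illustration and does not come from the book.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
Z = NormalDist(mu=0.0, sigma=1.0)

def area_between(a: float, b: float) -> float:
    """Area under the standard normal density curve between a and b (a <= b)."""
    # cdf(x) is the area to the left of x, so the difference is the area between.
    return Z.cdf(b) - Z.cdf(a)

print(round(area_between(0, 1.96), 3))  # 0.475, the area shaded in Figure 1.3
print(round(area_between(-10, 10), 3))  # 1.0, essentially the entire area
```

The second call illustrates that the total area under the curve is one: the tails beyond 10 in either direction contribute a negligible amount.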

We can obtain values for these areas or probabilities from a variety of sources: tables of numbers, calculators, spreadsheet or statistical software, Internet websites, and so on. In this book, we print only a few select values since most of the later calculations use a generalization of the normal distribution called the “t‐distribution.” Also, rather than areas such as that shaded in Figure 1.3, it will become more useful to consider “tail areas” (e.g., to the right of point b), and so for consistency with later tables of numbers, the following table allows calculation of such tail areas:

Normal distribution probabilities (tail areas) and percentiles (horizontal axis values)

Upper‐tail area	0.1	0.05	0.025	0.01	0.005	0.001
Horizontal axis value	1.282	1.645	1.960	2.326	2.576	3.090
Two‐tail area	0.2	0.1	0.05	0.02	0.01	0.002
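The table values can be reproduced with statistical software. As one possible sketch, again assuming Python's standard-library `statistics.NormalDist` (the function name `percentile_for_upper_tail` is hypothetical), each horizontal axis value is the point with the stated area to its right, and each two-tail area is twice the upper-tail area.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

upper_tail_areas = [0.1, 0.05, 0.025, 0.01, 0.005, 0.001]

def percentile_for_upper_tail(area: float) -> float:
    """Horizontal axis value that has the given area to its right."""
    # inv_cdf takes the area to the LEFT, so convert the upper-tail area first.
    return Z.inv_cdf(1 - area)

# Each row: (upper-tail area, horizontal axis value, two-tail area).
rows = [
    (p, round(percentile_for_upper_tail(p), 3), round(2 * p, 3))
    for p in upper_tail_areas
]

for upper, value, two_tail in rows:
    print(f"{upper:<8} {value:<8.3f} {two_tail}")
```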

In particular, the upper‐tail area to the right of 1.960 is 0.025; this is equivalent to saying that the area between 0 and 1.960 is 0.475 (since the entire area under the curve is 1 and the area to the right of 0 is 0.5). Similarly, the two‐tail area, which is the sum of the areas to the right of 1.960 and to the left of −1.960, is two times 0.025, or 0.05.
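Written out as arithmetic, and using Z to denote a standard normal random variable (a notation assumed here for illustration), the two calculations in the preceding paragraph are:

```latex
\begin{align*}
P(0 < Z < 1.960) &= P(Z > 0) - P(Z > 1.960) = 0.5 - 0.025 = 0.475, \\
P(Z > 1.960) + P(Z < -1.960) &= 2 \times 0.025 = 0.05.
\end{align*}
```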

How does all this help us to make statistical inferences about populations such as that in our home prices example? The essential idea is that we fit a normal distribution model to our sample data and then use this model to make inferences about the corresponding population. For example, we can use probability calculations for a normal distribution (as shown in Figure 1.3) to make probability statements about a population modeled using that normal distribution—we will show exactly how to do this in Section 1.3. Before we do that, however, we pause to consider an aspect of this inferential sequence that can make or break the process. Does the model provide a close enough approximation to the pattern of sample values that we can be confident the model adequately represents the population values? The better the approximation, the more reliable our inferential statements will be.

We saw in Figure 1.2 how a density curve can be thought of as a histogram with a very large sample size. So one way to assess whether our population follows a normal distribution model is to construct a histogram from our sample data and visually determine whether it “looks normal,” that is, approximately symmetric and bell‐shaped. This is a somewhat subjective decision, but with experience you should find that it becomes easier to discern clearly nonnormal histograms from those that are reasonably normal. For example, while the histogram in Figure 1.2 clearly looks like a normal density curve, the normality of the histogram of 30 sample sale prices in Figure 1.1 is less certain. A reasonable conclusion in this case would be that while this sample histogram is not perfectly symmetric and bell‐shaped, it is close enough that the corresponding (hypothetical) population histogram could well be normal.
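To see what such a visual check involves, here is a rough sketch that bins a sample and prints a text histogram. It uses simulated data, not the book's 30 sale prices; the mean of 280, standard deviation of 50, bin width of 25, and the helper name `text_histogram` are all arbitrary choices made for illustration.

```python
import random
from collections import Counter

def text_histogram(values, bin_width):
    """Count values per bin and return {bin_start: count}, ordered by bin_start."""
    # Floor-divide each value by the bin width to find the bin it falls in.
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Simulated data only (NOT the book's sale prices): 30 draws from a normal
# distribution with an arbitrary mean of 280 and standard deviation of 50.
rng = random.Random(1)
sample = [rng.gauss(280, 50) for _ in range(30)]

bins = text_histogram(sample, bin_width=25)
for start, count in bins.items():
    print(f"{start:5.0f} to {start + 25:5.0f} | {'*' * count}")
```

With only 30 values the bars will usually look somewhat ragged even though the data were generated from a normal distribution, which is the point made above about small samples.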

An alternative way to assess normality is to construct a QQ‐plot (quantile–quantile plot), also known as a normal probability plot, as shown in Figure 1.4 (see computer help #22 in the software information files available from the book website). If the points in the QQ‐plot lie close to the diagonal line, then the corresponding population values could well be normal. If the points generally lie far from the line, then normality is in question. Again, this is a somewhat subjective decision that becomes easier to make with experience. In this case, given the fairly small sample size, the points are probably close enough to the line that it is reasonable to conclude that the population values could be normal.

Figure 1.4 QQ‐plot for the home prices example.
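The coordinates behind a QQ-plot can be computed directly. The sketch below is one common construction, not necessarily the one used to draw Figure 1.4: it pairs each ordered sample value with a standard normal quantile at plotting position (i - 0.5)/n, which is an assumed convention (software packages differ), and it uses the sample mean and standard deviation for the reference line. The data are simulated, and the function name `qq_points` is hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev

def qq_points(values):
    """Return (theoretical quantile, ordered sample value) pairs for a normal QQ-plot."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n are an assumed convention.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Simulated data only (NOT the book's sale prices).
rng = random.Random(1)
sample = [rng.gauss(280, 50) for _ in range(30)]

m, s = mean(sample), stdev(sample)
for q, y in qq_points(sample):
    # Reference line: the value a normal model with the sample mean and
    # standard deviation would place at this quantile.
    print(f"{q:6.2f}  {y:7.1f}  line: {m + s * q:7.1f}")
```

Points whose sample value stays close to the corresponding reference-line value are the ones that would lie near the diagonal in a plot.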

There are also a variety of quantitative methods for assessing normality—brief details and references are provided in
