Applied Regression Modeling. Iain Pardoe

error happening. The default significance level of 5% tends to balance both goals reasonably well in many applications. However, other factors also affect the chance of a type 2 error for a given significance level. For example, the chance of a type 2 error tends to decrease as the sample size increases.
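
      As a rough illustration of the last point, the sketch below computes the probability of a type 2 error for an upper-tail test of a population mean at a fixed significance level, for several sample sizes. To keep it short, it assumes a z-test with a known population standard deviation rather than a t-test. The function name `type2_error_prob` and the numbers in the usage loop are hypothetical.

```python
from statistics import NormalDist


def type2_error_prob(mu0, mu1, sigma, n, alpha=0.05):
    """Chance of a type 2 error for an upper-tail z-test of H0: E(Y) = mu0
    when the true mean is mu1 > mu0 (sigma is assumed known, a simplification)."""
    std_normal = NormalDist()                   # standard normal, mean 0 and sd 1
    z_crit = std_normal.inv_cdf(1 - alpha)      # reject H0 if the test statistic exceeds this
    shift = (mu1 - mu0) / (sigma / n ** 0.5)    # distance of the true mean from mu0, in standard errors
    return std_normal.cdf(z_crit - shift)       # chance the statistic still falls below z_crit


# Hypothetical setting: null mean 100, true mean 110, population sd 30.
for n in (10, 30, 100):
    print(n, round(type2_error_prob(100, 110, 30, n), 3))
```

      The printed probabilities shrink as `n` grows. Raising `alpha` (for example, to 0.10) also lowers the type 2 error probability, at the cost of a higher type 1 error probability, which is the trade-off the significance level is meant to balance.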

      So far, we have focused on estimating a univariate population mean, $E(Y)$, and quantifying our uncertainty about the estimate via confidence intervals or hypothesis tests. In this section, we consider a different problem, that of “prediction.” In particular, rather than estimating the mean of a population of $Y$-values based on a sample, $Y_1, \ldots, Y_n$, we consider predicting an individual $Y$-value picked at random from the population.

      Intuitively, this sounds like a more difficult problem. Imagine that rather than just estimating the mean sale price of single‐family homes in the housing market based on our sample of 30 homes, we have to predict the sale price of an individual single‐family home that has just come onto the market. Presumably, we will be less certain about our prediction than we were about our estimate of the population mean, since our prediction could plausibly end up further from the truth than our estimate of the mean; for example, there is a chance that the new home could be a real bargain or totally overpriced. Statistically speaking, Figure 1.5 illustrates this “extra uncertainty” that arises with prediction: the population distribution of data values, $Y$ (more relevant to prediction problems), is much more variable than the sampling distribution of sample means, $m_Y$ (more relevant to mean estimation problems).
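
      The contrast between the two distributions can be reproduced with a small simulation. This is a minimal sketch assuming a hypothetical normal population (mean 280, standard deviation 50) and samples of size 30; these values are illustrative only and are not the housing data.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

POP_MEAN, POP_SD, N = 280.0, 50.0, 30  # hypothetical population and sample size
REPS = 5000                            # number of simulated draws of each kind

# Individual data values drawn from the (assumed normal) population.
values = [random.gauss(POP_MEAN, POP_SD) for _ in range(REPS)]

# Sample means, each computed from a fresh random sample of size N.
sample_means = [
    statistics.fmean([random.gauss(POP_MEAN, POP_SD) for _ in range(N)])
    for _ in range(REPS)
]

sd_values = statistics.stdev(values)
sd_means = statistics.stdev(sample_means)
print(f"sd of individual values: {sd_values:.1f}")
print(f"sd of sample means:      {sd_means:.1f} (theory: {POP_SD / N ** 0.5:.1f})")
```

      The spread of individual values stays near the population standard deviation, while the spread of sample means is smaller by a factor of roughly $\sqrt{30}$.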

      We can tackle prediction problems with a process similar to that of using a confidence interval to estimate a population mean. In particular, we can calculate a prediction interval of the form “point estimate ± uncertainty” or “(point estimate − uncertainty, point estimate + uncertainty).” The point estimate is the same one that we used for estimating the population mean, that is, the observed sample mean, $m_Y$. This is because $m_Y$ is an unbiased estimate of the population mean, $E(Y)$, and we assume that the individual $Y$-value we are predicting is a member of this population. As discussed in the preceding paragraph, however, the “uncertainty” is larger for prediction intervals than for confidence intervals. To see how much larger, we need to return to the notion of a model that we introduced in Section 1.2.

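      We can express the model we have been using to estimate the population mean, $E(Y)$, as

$$Y = E(Y) + e,$$

      where $e$ represents random error, the difference between an individual $Y$-value and the population mean.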

      In particular, write the $Y$-value to be predicted as $Y^*$, and decompose this into two pieces as above:

$$Y^* = E(Y) + e^*.$$

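      Then subtract $m_Y$, which represents potential values of repeated sample means, from both sides of this equation:

$$Y^* - m_Y = \big(E(Y) - m_Y\big) + e^*,$$

      that is, prediction error = estimation error + random error.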

      Thus, in estimating the population mean, the only error we have to worry about is estimation error, whereas in predicting an individual $Y$-value, we have to worry about both estimation error and random error.
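
      To check this numerically, the sketch below repeatedly draws a sample, computes its mean $m_Y$, and draws one new value $Y^*$ from the same hypothetical normal population. It records the estimation error $E(Y) - m_Y$, the random error $e^*$, and the prediction error $Y^* - m_Y$. Because $e^*$ is independent of the sample, the variance of the prediction error should come out close to $\sigma^2/n + \sigma^2$, noticeably larger than the variance $\sigma^2/n$ of the estimation error. The population parameters, and the symbol $\sigma$ for the population standard deviation, are assumptions of this sketch.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible
MU, SIGMA, N, REPS = 280.0, 50.0, 30, 20000  # hypothetical E(Y), sigma, sample size, repetitions

estimation_errors, random_errors, prediction_errors = [], [], []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    m_y = statistics.fmean(sample)          # sample mean, the point estimate
    y_star = random.gauss(MU, SIGMA)        # new individual value, Y* = E(Y) + e*
    estimation_errors.append(MU - m_y)      # E(Y) - m_Y
    random_errors.append(y_star - MU)       # e*
    prediction_errors.append(y_star - m_y)  # Y* - m_Y = (E(Y) - m_Y) + e*

var_est = statistics.pvariance(estimation_errors)
var_pred = statistics.pvariance(prediction_errors)
print(f"variance of estimation error: {var_est:.1f} (theory sigma^2/n = {SIGMA**2 / N:.1f})")
print(f"variance of prediction error: {var_pred:.1f} (theory sigma^2(1 + 1/n) = {SIGMA**2 * (1 + 1 / N):.1f})")
```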

      Recall from Section 1.5 that the form of a confidence interval for the population mean is

$$m_Y \pm t\text{-percentile}\,\big(s_Y/\sqrt{n}\big).$$
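
      A minimal sketch of this formula in code, assuming the t-percentile has already been looked up for $n - 1$ degrees of freedom (here from a standard t-table). The helper name `confidence_interval` and the five data values are hypothetical, chosen only to show the arithmetic.

```python
import math
import statistics


def confidence_interval(values, t_percentile):
    """Return (lower, upper) for m_Y +/- t_percentile * (s_Y / sqrt(n)).

    t_percentile must correspond to n - 1 degrees of freedom.
    """
    n = len(values)
    m_y = statistics.fmean(values)        # sample mean, m_Y
    s_y = statistics.stdev(values)        # sample standard deviation, s_Y (n - 1 divisor)
    std_error = s_y / math.sqrt(n)        # standard error of estimation, s_Y / sqrt(n)
    return m_y - t_percentile * std_error, m_y + t_percentile * std_error


# Hypothetical sample of five values; 2.776 is the 97.5th percentile of t with 4 df (95% interval).
sample = [255.0, 270.0, 285.0, 290.0, 300.0]
lower, upper = confidence_interval(sample, t_percentile=2.776)
print(f"95% CI: ({lower:.1f}, {upper:.1f})")
```

      Passing the t-percentile in as an argument keeps the sketch dependency-free; in practice it would come from a t-table or a statistics library for the appropriate degrees of freedom and confidence level.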

      The term $s_Y/\sqrt{n}$ in this formula is an estimate of the standard deviation of the sampling distribution of sample means, $m_Y$, and is called the standard error of estimation. The square of this quantity, $s_Y^2/n$, is the estimated variance of the sampling distribution of sample means, $m_Y$. Then, thinking of $E(Y)$ as some fixed, unknown constant,
