estimated variance of the estimation error, $E(Y) - m_Y$, in expression (1.1).
The estimated variance of the random error, $e^*$, in expression (1.1) is $s_Y^2$. It can then be shown that the estimated variance of the prediction error, $Y^* - m_Y$, in expression (1.1) is $s_Y^2/n + s_Y^2 = s_Y^2(1 + 1/n)$. Then, $s_Y\sqrt{1 + 1/n}$ is called the standard error of prediction.
Thus, in general, we can write a prediction interval for an individual $Y$‐value as
$$m_Y \pm \text{t-percentile} \left( s_Y \sqrt{1 + 1/n} \right),$$
where $m_Y$ is the sample mean, $s_Y$ is the sample standard deviation, $n$ is the sample size, and the t‐percentile comes from a t‐distribution with $n - 1$ degrees of freedom.
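As a minimal sketch of this calculation: the interval is the sample mean plus or minus a t‐percentile times the standard error of prediction (the sample standard deviation multiplied by $\sqrt{1 + 1/n}$). The function name `prediction_interval` and the summary statistics below are hypothetical and chosen only for illustration; they are not the home prices data. The t‐percentile is passed in as an argument, as if read from a table (2.045 is the 97.5th percentile of a t‐distribution with 29 degrees of freedom).

```python
import math


def prediction_interval(sample_mean, sample_sd, n, t_percentile):
    """Return (lower, upper) for an individual value:
    sample_mean +/- t_percentile * sample_sd * sqrt(1 + 1/n).

    t_percentile must come from a t-distribution with n - 1 degrees of freedom.
    """
    if n < 2:
        raise ValueError("need at least two observations")
    # Standard error of prediction: combines estimation error and random error
    se_prediction = sample_sd * math.sqrt(1 + 1 / n)
    half_width = t_percentile * se_prediction
    return sample_mean - half_width, sample_mean + half_width


# Hypothetical summary statistics (assumed for illustration only)
lower, upper = prediction_interval(
    sample_mean=100.0, sample_sd=15.0, n=30, t_percentile=2.045
)
print(f"95% prediction interval: ({lower:.2f}, {upper:.2f})")
```

Taking the percentile as an argument keeps the sketch free of third‐party libraries; with statistical software available, the percentile could instead be computed from the t‐distribution directly.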
For example, for a 95% interval (i.e., with 2.5% in each tail), the 97.5th percentile would be needed, whereas for a 90% interval (i.e., with 5% in each tail), the 95th percentile would be needed. These percentiles can be obtained from Table C.1. For example, the 95% prediction interval for an individual value of $Y$ picked at random from the population of single‐family home sale prices is calculated as
What about the interpretation of a prediction interval? Well, for the home prices example, loosely speaking, we can say that “we are 95% confident that the sale price for an individual home picked at random from all single‐family homes in this housing market will be between and .” More precisely, if we were to take a large number of random samples of size 30 from our population of sale prices and calculate a 95% prediction interval for each, then 95% of those prediction intervals would contain the (unknown) sale price for an individual home picked at random from the population.
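The repeated‐sampling interpretation can be checked with a small simulation. The sketch below assumes a hypothetical normally distributed population (mean 100, standard deviation 15, not the home prices data), repeatedly draws samples of size 30, computes a 95% prediction interval from each, and records how often a new individual value drawn from the same population falls inside. The function name `simulate_coverage` and all parameter values are illustrative assumptions.

```python
import math
import random
import statistics


def simulate_coverage(n=30, t_percentile=2.045, reps=5000, seed=1):
    """Estimate how often a 95% prediction interval contains a new value.

    Assumes a hypothetical normal population (mean 100, sd 15).
    t_percentile=2.045 is the 97.5th percentile of t with 29 degrees of freedom.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(100.0, 15.0) for _ in range(n)]
        m = statistics.fmean(sample)
        s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
        half_width = t_percentile * s * math.sqrt(1 + 1 / n)
        new_value = rng.gauss(100.0, 15.0)  # individual picked at random
        if m - half_width <= new_value <= m + half_width:
            hits += 1
    return hits / reps


coverage = simulate_coverage()
print(f"Proportion of intervals containing the new value: {coverage:.3f}")
```

Under these assumptions the printed proportion should be close to 0.95, with some simulation noise.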
Interpretation of a prediction interval for an individual $Y$‐value:
Suppose we have calculated a 95% prediction interval for an individual $Y$‐value to be ($a$, $b$). Then we can say that we are 95% confident that the individual $Y$‐value is between $a$ and $b$.
As discussed at the beginning of this section, the 95% prediction interval for an individual value of $Y$, , is much wider than the 95% confidence interval for the population mean single‐family home sale price, which was calculated as
Unlike for confidence intervals for the population mean, statistical software does not generally provide an automated method to calculate prediction intervals for an individual $Y$‐value. Thus, they have to be calculated by hand using the sample mean, $m_Y$, and the sample standard deviation, $s_Y$. However, there is a trick that can get around this (although it makes use of simple linear regression, which we cover in Chapter 2). First, create a variable that consists only of the value 1 for all observations. Then, fit a simple linear regression model using this variable as the predictor variable and $Y$ as the response variable, and restrict the model to fit without an intercept (see computer help #25 in the software information files available from the book website). The estimated regression equation for this model will be a constant value equal to the sample mean of the response variable. Prediction intervals for this model will be the same for each value of the predictor variable (see computer help #30), and will be the same as a prediction interval for an individual $Y$‐value. As further practice, calculate a 90% prediction interval for an individual sale price (see Problem 1.10). Calculate it by hand or using the trick just described. You should find that the interval is (, ).
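To see why the trick works, the sketch below fits a no‐intercept regression on a predictor of all 1s by hand and compares its prediction interval with the direct calculation from the sample mean and standard deviation. It uses the standard no‐intercept prediction formula (fitted value plus or minus the t‐percentile times $s\sqrt{1 + x_0^2/\sum x_i^2}$, with $n - 1$ residual degrees of freedom) rather than any particular software package. The data values and the function name `no_intercept_prediction_interval` are hypothetical.

```python
import math
import statistics


def no_intercept_prediction_interval(x, y, x_new, t_percentile):
    """Prediction interval at x_new for the regression y = b * x (no intercept)."""
    sxx = sum(xi * xi for xi in x)
    slope = sum(xi * yi for xi, yi in zip(x, y)) / sxx  # least squares estimate
    residuals = [yi - slope * xi for xi, yi in zip(x, y)]
    df = len(y) - 1  # one estimated parameter
    s = math.sqrt(sum(r * r for r in residuals) / df)  # residual standard error
    half_width = t_percentile * s * math.sqrt(1 + x_new ** 2 / sxx)
    fitted = slope * x_new
    return fitted - half_width, fitted + half_width


# Hypothetical response values (assumed for illustration only)
y = [12.0, 15.5, 9.8, 14.1, 11.3, 13.7]
x = [1.0] * len(y)   # predictor consisting only of the value 1
t_975_df5 = 2.571    # 97.5th percentile of t with 5 degrees of freedom

trick = no_intercept_prediction_interval(x, y, 1.0, t_975_df5)

# Direct calculation from the sample statistics
m, s, n = statistics.fmean(y), statistics.stdev(y), len(y)
half_width = t_975_df5 * s * math.sqrt(1 + 1 / n)
direct = (m - half_width, m + half_width)

print("regression trick:", trick)
print("direct formula:  ", direct)
```

With a predictor of all 1s, the sum of squared predictor values equals $n$, the slope equals the sample mean, and the residual standard error equals the sample standard deviation, so the two intervals agree up to rounding.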
We derived the formula for a confidence interval for a univariate population mean from the t‐version of the central limit theorem, which does not require the data $Y$‐values to be normally distributed. However, the formula for a prediction interval for an individual univariate $Y$‐value tends to work better for datasets in which the $Y$‐values are at least approximately normally distributed (see Problem 1.12).
1.8 Chapter Summary
We spent some time in this chapter coming to grips with summarizing data (graphically and numerically) and understanding sampling distributions, but the four major concepts that will carry us through the rest of the book are as follows:
1 Statistical thinking is the process of analyzing quantitative information about a random sample of observations and drawing conclusions (statistical inferences) about the population from which the sample was drawn. An example is using a univariate sample mean, $m_Y$, as an estimate of the corresponding population mean and calculating the sample standard deviation, $s_Y$, to evaluate the precision of this estimate.