Applied Regression Modeling. Iain Pardoe
3 Hypothesis testing provides another means of making decisions about the likely values of a population parameter. An example is hypothesis testing for a univariate population mean, whereby the magnitude of a calculated sample test statistic indicates which of two hypotheses (about likely values for the population mean) we should favor.
4 Prediction intervals, while similar in spirit to confidence intervals, tackle the different problem of predicting the value of an individual observation picked at random from the population. An example is the prediction interval for an individual univariate $Y$‐value.
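For reference, the test statistic and the prediction interval named in the last two items have standard forms. The block below is a sketch in assumed notation: $m_Y$ is the sample mean, $s_Y$ the sample standard deviation, $n$ the sample size, $\mathrm{E}(Y)$ the population mean value stated in the null hypothesis, and the t‐percentile comes from a t‐distribution with $n-1$ degrees of freedom.

```latex
\[
  \text{t-statistic} = \frac{m_Y - \mathrm{E}(Y)}{s_Y / \sqrt{n}},
  \qquad
  \text{prediction interval: } m_Y \pm \text{t-percentile} \times s_Y \sqrt{1 + 1/n}.
\]
```

The extra 1 under the square root is what separates the prediction interval from the confidence interval for the mean, $m_Y \pm \text{t-percentile} \times s_Y / \sqrt{n}$: it accounts for the variation of a single new observation around the population mean.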
Problems
“Computer help” refers to the numbered items in the software information files available from the book website. There are brief answers to the even‐numbered problems in Appendix F (www.wiley.com/go/pardoe/AppliedRegressionModeling3e).
1.1 Assume that weekly orders of a popular mobile phone at a local store follow a normal distribution with mean $\mathrm{E}(Y)$ and standard deviation $\mathrm{SD}(Y)$. Find the scores, $Y$, that correspond to the:
(a) 95th percentile (i.e., find $Y_{95}$ such that $\Pr(Y < Y_{95}) = 0.95$);
(b) 50th percentile (i.e., find $Y_{50}$ such that $\Pr(Y < Y_{50}) = 0.50$);
(c) 2.5th percentile (i.e., find $Y_{2.5}$ such that $\Pr(Y < Y_{2.5}) = 0.025$).
Suppose $M_Y$ represents potential values of repeated sample means from this population for samples of size $n$. Use the normal version of the central limit theorem to find the mean scores, $M_Y$, that correspond to the:
(d) 95th percentile (i.e., find $M_{95}$ such that $\Pr(M_Y < M_{95}) = 0.95$);
(e) 50th percentile (i.e., find $M_{50}$ such that $\Pr(M_Y < M_{50}) = 0.50$);
(f) 2.5th percentile (i.e., find $M_{2.5}$ such that $\Pr(M_Y < M_{2.5}) = 0.025$).
(g) How many phones should the store order to be 95% confident they can meet demand for a particular week?
1.2 Assume that final scores in a statistics course follow a normal distribution with mean $\mathrm{E}(Y)$ and standard deviation $\mathrm{SD}(Y)$. Find the scores, $Y$, that correspond to the:
(a) 90th percentile (i.e., find $Y_{90}$ such that $\Pr(Y < Y_{90}) = 0.90$);
(b) 99th percentile (i.e., find $Y_{99}$ such that $\Pr(Y < Y_{99}) = 0.99$);
(c) 5th percentile (i.e., find $Y_{5}$ such that $\Pr(Y < Y_{5}) = 0.05$).
Suppose $M_Y$ represents potential values of repeated sample means from this population for samples of size $n$ (e.g., average class scores). Use the normal version of the central limit theorem to find the mean scores, $M_Y$, that correspond to the:
(d) 90th percentile (i.e., find $M_{90}$ such that $\Pr(M_Y < M_{90}) = 0.90$);
(e) 99th percentile (i.e., find $M_{99}$ such that $\Pr(M_Y < M_{99}) = 0.99$);
(f) 5th percentile (i.e., find $M_{5}$ such that $\Pr(M_Y < M_{5}) = 0.05$).
(g) If the bottom 5% of the class fail, what is the cut‐off percentage to pass the class?
(h) The university requires the long‐term average class score for this course to be no higher than 75%. Does this requirement seem feasible?
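The percentile calculations in Problems 1.1 and 1.2 can be sketched in Python with the standard library's `statistics.NormalDist`. This is a minimal sketch rather than the book's software instructions: the helper names and the inputs (mean 100, standard deviation 15, sample size 25) are hypothetical placeholders, not values taken from the problems.

```python
from math import sqrt
from statistics import NormalDist


def population_percentile(mean, sd, p):
    """Return y such that Pr(Y < y) = p for Y ~ Normal(mean, sd)."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(p)


def sample_mean_percentile(mean, sd, n, p):
    """Return m such that Pr(M_Y < m) = p for the mean of n observations.

    Normal version of the central limit theorem:
    M_Y ~ Normal(mean, sd / sqrt(n)).
    """
    return NormalDist(mu=mean, sigma=sd / sqrt(n)).inv_cdf(p)


# Hypothetical inputs, chosen only for illustration.
mean, sd, n = 100.0, 15.0, 25

for p in (0.95, 0.50, 0.025):
    y = population_percentile(mean, sd, p)
    m = sample_mean_percentile(mean, sd, n, p)
    print(f"p = {p:5.3f}: individual value {y:7.2f}, sample mean {m:7.2f}")
```

For any percentile other than the 50th, the sample-mean value sits closer to the population mean than the individual value does, because the standard deviation of sample means shrinks by a factor of $\sqrt{n}$.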
1.3 The NBASALARY data file contains salary information for 214 guards in the National Basketball Association (NBA) for 2009–2010 (obtained from the online USA Today NBA Salaries Database).
(a) Construct a histogram of the salary variable, representing 2009–2010 salaries in thousands of dollars [computer help #14].
(b) What would we expect the histogram to look like if the data were normal?
(c) Construct a QQ‐plot of the salary variable [computer help #22].
(d) What would we expect the QQ‐plot to look like if the data were normal?
(e) Compute the natural logarithm of guard salaries (storing the result as a new log‐salary variable) [computer help #6], and construct a histogram of this variable [computer help #14]. Hint: The “natural logarithm” transformation (also known as “log to base‐e,” or by the symbols $\log_e$ or ln) is a way to transform (rescale) skewed data to make them more symmetric and normal.
(f) Construct a QQ‐plot of the log‐salary variable [computer help #22].
(g) Based on the plots in parts (a), (c), (e), and (f), say whether salaries or log‐salaries more closely follow a normal curve, and justify your response.
1.4 A company's pension plan includes 50 mutual funds, with each fund expected to earn a mean, $\mathrm{E}(Y)$, of 3% over the risk‐free rate with a standard deviation of $\mathrm{SD}(Y)$. Based on the assumption that the funds are randomly selected from a population of funds with normally distributed returns in excess of the risk‐free rate, find the probability that an individual fund's return in excess of the risk‐free rate is, respectively, greater than 34.1%, greater than 15.7%, or less than […]%. In other words, if $Y$ represents potential values of individual fund returns, find:
(a) $\Pr(Y > 34.1\%)$;
(b) $\Pr(Y > 15.7\%)$;
(c) the lower‐tail probability for the third threshold.
Use the normal version of the central limit theorem to approximate the probability that the pension plan's overall mean return in excess of the risk‐free rate is, respectively, greater than 7.4%, greater than 4.8%, or less than 0.7%. In other words, if $M_Y$ represents potential values of repeated sample means, find:
(d) $\Pr(M_Y > 7.4\%)$;
(e) $\Pr(M_Y > 4.8\%)$;
(f) $\Pr(M_Y < 0.7\%)$.
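Tail probabilities of the kind asked for in Problem 1.4 can be sketched with `statistics.NormalDist`, using the same function for an individual value ($n = 1$) and for a sample mean ($n > 1$). The function names are hypothetical, and the standard deviation of 10 used below is an assumption made purely for illustration; only the mean of 3% and the 50 funds come from the problem statement.

```python
from math import sqrt
from statistics import NormalDist


def prob_greater(mean, sd, x, n=1):
    """Pr(value > x): n=1 for one observation, n>1 for a mean of n observations."""
    return 1.0 - NormalDist(mu=mean, sigma=sd / sqrt(n)).cdf(x)


def prob_less(mean, sd, x, n=1):
    """Pr(value < x): n=1 for one observation, n>1 for a mean of n observations."""
    return NormalDist(mu=mean, sigma=sd / sqrt(n)).cdf(x)


mean = 3.0      # mean excess return in %, from the problem
sd = 10.0       # ASSUMED standard deviation in %, for illustration only
n_funds = 50    # number of funds in the plan, from the problem

# Individual fund returns.
print("Pr(Y > 34.1)   =", prob_greater(mean, sd, 34.1))
print("Pr(Y > 15.7)   =", prob_greater(mean, sd, 15.7))

# Plan-wide mean return, via the normal version of the central limit theorem.
print("Pr(M_Y > 7.4)  =", prob_greater(mean, sd, 7.4, n=n_funds))
print("Pr(M_Y > 4.8)  =", prob_greater(mean, sd, 4.8, n=n_funds))
print("Pr(M_Y < 0.7)  =", prob_less(mean, sd, 0.7, n=n_funds))
```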
1.5 Consider the data on 2009–2010 salaries of 214 NBA guards from Problem 1.3.
(a) Calculate a 95% confidence interval for the population mean salary in thousands of dollars [computer help #23]. Hint: Calculate by hand (using the fact that the sample mean of salary is 3980.318, the sample standard deviation is 4525.378, and the 97.5th percentile of the t‐distribution with 213 degrees of freedom is approximately 1.971) and check your answer using statistical software.
(b) Consider the natural logarithms of the salaries. The sample mean of log‐salary is 7.664386. Re‐express this number in thousands of dollars (the original units of salary). Hint: To back‐transform a number in natural logarithms to its original scale, use the “exponentiation” function on a calculator [denoted exp(X) or $e^X$, where X is the variable expressed in natural logarithms]. This is because $\exp(\log_e(Y)) = Y$.
(c) Compute a 95% confidence interval for the population mean log‐salary in natural logarithms of thousands of dollars [computer help #23]. Hint: Calculate by hand (using the fact that the sample mean of log‐salary is 7.664386, the sample standard deviation of log‐salary is 1.197118, and the 97.5th percentile of the t‐distribution with 213 degrees of freedom is approximately 1.971) and check your answer using statistical software.
(d) Re‐express each interval endpoint of your 95% confidence interval computed in part (c) in thousands of dollars and say what this interval means in words.
(e) The confidence interval computed in part (a) is exactly symmetric about the sample mean of salary. Is the confidence interval computed in part (d) exactly symmetric about the sample mean of log‐salary back‐transformed to thousands of dollars that you computed in part (b)? How does this relate to quantifying our uncertainty about the population mean salary? Hint: Looking at the histogram from Problem 1.3 part (a), if someone asked you to give lower and upper bounds on the population mean salary using your intuition rather than statistics, would you give a symmetric or an asymmetric interval?
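The by‐hand calculations suggested in the hints to Problem 1.5 can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the function and variable names are hypothetical, and the summary statistics and t‐percentile are the ones quoted in the hints.

```python
from math import exp, sqrt


def t_confidence_interval(sample_mean, sample_sd, n, t_percentile):
    """Interval: sample mean +/- t-percentile * (sample sd / sqrt(n))."""
    half_width = t_percentile * sample_sd / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


n = 214        # number of guards
t_975 = 1.971  # 97.5th percentile of t with 213 degrees of freedom (from the hint)

# Interval for mean salary, in thousands of dollars.
lo, hi = t_confidence_interval(3980.318, 4525.378, n, t_975)

# Interval for mean log-salary, then back-transformed with exp().
log_mean = 7.664386
log_lo, log_hi = t_confidence_interval(log_mean, 1.197118, n, t_975)
back_mean = exp(log_mean)
back_lo, back_hi = exp(log_lo), exp(log_hi)

print(f"salary scale:     ({lo:.1f}, {hi:.1f})")
print(f"log scale:        ({log_lo:.4f}, {log_hi:.4f})")
print(f"back-transformed: ({back_lo:.1f}, {back_hi:.1f}) around {back_mean:.1f}")
print("upper arm longer than lower arm:",
      (back_hi - back_mean) > (back_mean - back_lo))
```

Because exponentiation is a convex function, an interval that is symmetric on the log scale always back‐transforms to one whose upper arm is longer than its lower arm.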
1.6 The FINALSCORES data file contains values of a variable that measures final scores in a statistics course.
(a) Calculate the sample mean and sample standard deviation of the final scores [computer help #10].
(b) Calculate a 90% confidence interval for the population mean of the final scores [computer help #23]. Hint: Calculate by hand (using the sample mean and sample standard deviation from part (a), and the 95th percentile of the t‐distribution with 99 degrees of freedom, which is approximately 1.660) and check your answer using statistical software.
1.7 Gapminder is a “non‐profit venture promoting sustainable global development and achievement of the United Nations Millennium Development Goals.” It provides related time series data for all countries in the world at the website www.gapminder.org. For example, the COUNTRIES data file contains the 2010 population count (a variable measured in millions) of the 55 most populous countries together with 2010 life expectancy at birth (a variable measured in years).
(a) Calculate the sample mean and sample standard deviation of the population count variable [computer help #10].
(b) Briefly say why calculating a confidence interval for the population mean of this variable would not be useful for understanding mean population counts for all countries in the world.
(c) Consider the life expectancy variable, which represents the average number of years a newborn child would live if current mortality patterns were to stay the same. Suppose that for this variable, these 55 countries could be considered a random sample from the population of all countries in the world. Calculate a 95% confidence interval for the population mean of life expectancy [computer help #23]. Hint: Calculate by hand (using the fact that the sample mean of life expectancy is 69.787, the sample standard deviation is 9.2504, and the 97.5th percentile of the t‐distribution with 54 degrees of freedom is approximately 2.005) and check your answer using statistical software.
1.8 Consider the FINALSCORES data file from Problem 1.6.
(a) Do a hypothesis test to determine whether there is sufficient evidence at a significance level of 5% to conclude that the population mean of the final scores is greater than 66 [computer help #24].
(b) Repeat part (a) but test whether the population mean of the final scores is less than 73.
(c) Repeat part (a) but test whether the population mean of the final scores is not equal to 66.
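The three tests in Problem 1.8 differ only in the direction of the alternative hypothesis. A minimal rejection‐region sketch follows; it is not the book's software procedure. The function names are hypothetical, and the sample mean (70) and sample standard deviation (10) are made‐up placeholders because the FINALSCORES data are not reproduced here. The sample size of 100 follows from the 99 degrees of freedom quoted in Problem 1.6, the value 1.660 is the 95th percentile quoted there, and 1.984 is an approximate 97.5th percentile for 99 degrees of freedom supplied as an assumption.

```python
from math import sqrt


def t_statistic(sample_mean, sample_sd, n, null_value):
    """Univariate t-statistic: (sample mean - null value) / (sample sd / sqrt(n))."""
    return (sample_mean - null_value) / (sample_sd / sqrt(n))


def reject_null(t_stat, critical_value, alternative):
    """Rejection-region decision for a positive critical t-percentile.

    For 'greater' or 'less' pass the (1 - alpha) percentile;
    for 'two-sided' pass the (1 - alpha / 2) percentile.
    """
    if alternative == "greater":
        return t_stat > critical_value
    if alternative == "less":
        return t_stat < -critical_value
    if alternative == "two-sided":
        return abs(t_stat) > critical_value
    raise ValueError(f"unknown alternative: {alternative!r}")


# Hypothetical summary statistics (placeholders, not the FINALSCORES values).
sample_mean, sample_sd, n = 70.0, 10.0, 100

t_95 = 1.660    # 95th percentile, 99 degrees of freedom (quoted in Problem 1.6)
t_975 = 1.984   # approximate 97.5th percentile, 99 degrees of freedom (assumed)

tests = [
    ("mean > 66", 66.0, t_95, "greater"),
    ("mean < 73", 73.0, t_95, "less"),
    ("mean != 66", 66.0, t_975, "two-sided"),
]
for label, null_value, critical, alternative in tests:
    t_stat = t_statistic(sample_mean, sample_sd, n, null_value)
    decision = reject_null(t_stat, critical, alternative)
    print(f"alternative {label}: t = {t_stat:.3f}, reject null: {decision}")
```

Statistical software usually reports a p‐value instead of a critical value; at a 5% significance level the two approaches lead to the same decision.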
9 1.9