Computational Statistics in Data Science. Group of authors


$\mathcal{X}$, and suppose that $\theta \in \mathbb{R}^p$ are features of interest of $F$. Specifically, $\theta$ may be a combination of quantiles, means, and variances associated with $F$. Samples $X_1, \ldots, X_n$ are obtained via simulation, either approximately or exactly, from $F$, and a consistent estimator of $\theta$, $\hat{\theta}$, is constructed so that, as $n \to \infty$,

$$\hat{\theta} \to \theta.$$

      Thus, even when $F$ is a complicated distribution, Monte Carlo simulation allows for estimation of features of $F$. Throughout, we assume that either independent and identically distributed (IID) samples or MCMC samples from $F$ can be obtained efficiently; see Refs [1–5] for various techniques.
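As a minimal sketch of this idea, the following Python snippet estimates a mean, a variance, and a quantile (the median) from IID draws. The choice of $F$ as an Exponential(1) distribution, the function name `estimate_features`, and the seed are assumptions made purely for illustration; they are not taken from the text. For this particular $F$ the true values are known (mean 1, variance 1, median $\log 2$), so the estimates can be compared against them as $n$ grows.

```python
import math
import random
import statistics


def estimate_features(n, seed=0):
    """Estimate the mean, variance, and median of F from n IID draws.

    F is assumed to be Exponential(1) here, only for illustration.
    """
    rng = random.Random(seed)
    draws = [rng.expovariate(1.0) for _ in range(n)]
    return {
        "mean": statistics.fmean(draws),
        "variance": statistics.variance(draws),
        "median": statistics.median(draws),
    }


if __name__ == "__main__":
    # Known features of Exponential(1), used only as a reference point.
    truth = {"mean": 1.0, "variance": 1.0, "median": math.log(2)}
    for n in (100, 10_000, 100_000):
        est = estimate_features(n)
        errors = {k: round(abs(est[k] - truth[k]), 4) for k in truth}
        print(n, errors)
```

The absolute errors should tend to shrink as `n` increases, which is what consistency of the estimator suggests; how large `n` must be for a given accuracy is the question that stopping rules address.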

      Sequential stopping rules rely on estimating the limiting Monte Carlo variance–covariance matrix (when $p = 1$, this is the standard error of $\hat{\theta}$). This is a particularly challenging problem in MCMC due to serial correlation in the samples. We discuss these challenges in Section 4 and present estimators appropriate for large simulation sizes.
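To see why serial correlation matters, consider a rough sketch in which a stationary Gaussian AR(1) process, $X_t = \rho X_{t-1} + \varepsilon_t$ with $\varepsilon_t \sim N(0, 1)$, stands in for MCMC output. This model, the function names, and all settings below are assumptions chosen for illustration, not something the text prescribes. For this process the variance of the sample mean exceeds the naive IID formula $s^2/n$ by a factor of roughly $(1 + \rho)/(1 - \rho)$, which is 19 when $\rho = 0.9$.

```python
import random
import statistics


def ar1_chain(n, rho, rng):
    """Simulate n steps of a stationary AR(1) chain with N(0, 1) innovations."""
    # Start from the stationary distribution N(0, 1 / (1 - rho^2)).
    x = rng.gauss(0.0, (1.0 - rho**2) ** -0.5)
    chain = []
    for _ in range(n):
        chain.append(x)
        x = rho * x + rng.gauss(0.0, 1.0)
    return chain


def variance_ratio(rho, n=1000, reps=200, seed=1):
    """Ratio of the actual variance of the chain mean to the naive s^2 / n."""
    rng = random.Random(seed)
    means, naive = [], []
    for _ in range(reps):
        chain = ar1_chain(n, rho, rng)
        means.append(statistics.fmean(chain))
        # Naive estimate that ignores serial correlation.
        naive.append(statistics.variance(chain) / n)
    # Variance of the mean across independent replications vs. naive average.
    return statistics.variance(means) / statistics.fmean(naive)


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        theory = (1 + rho) / (1 - rho)
        print(f"rho={rho}: empirical ratio {variance_ratio(rho):.2f}, "
              f"theory approx {theory:.2f}")
```

With independent draws ($\rho = 0$) the ratio should be close to 1, while for strongly correlated chains the naive standard error can be badly optimistic.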

      Over a variety of examples in Section 7, we conclude that the simulation size required for reliable estimation is often higher than what is commonly used by practitioners (see also Refs [6, 7]). Given modern computational power, the recommended strategies can easily be adopted in most estimation problems. We conclude the introduction with an example illustrating the need for careful sample size calculations.

      Example 1. Consider IID draws $X_1, \ldots, X_m \sim N(\theta, \sigma^2)$. An estimate of $\theta$ is $\bar{X} = m^{-1} \sum_{i=1}^{m} X_i$, and $\sigma^2$ is estimated with the sample variance, $s^2$. Let $z_u$ be the $u$th quantile of a standard normal distribution, for $0 < u < 1$. A large-sample $(1 - \alpha)100\%$ confidence interval for $\theta$ is

$$\bar{X} \pm z_{1 - \alpha/2} \frac{s}{\sqrt{m}}.$$
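The interval above can be computed directly; the following is a small sketch using only the Python standard library. The function name `normal_mean_ci` and the values $\theta = 2$, $\sigma = 3$, $m = 50$ in the usage example are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def normal_mean_ci(sample, alpha=0.05):
    """Large-sample (1 - alpha) confidence interval for the mean."""
    m = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # square root of the sample variance s^2
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)  # z_{1 - alpha/2}
    half_width = z * s / math.sqrt(m)
    return xbar - half_width, xbar + half_width


if __name__ == "__main__":
    rng = random.Random(42)
    theta, sigma, m = 2.0, 3.0, 50  # hypothetical values for illustration
    sample = [rng.gauss(theta, sigma) for _ in range(m)]
    lower, upper = normal_mean_ci(sample)
    print(f"95% CI: ({lower:.3f}, {upper:.3f}); contains theta: {lower < theta < upper}")
```

A single interval either contains $\theta$ or it does not; the $(1 - \alpha)$ level describes how often intervals constructed this way cover $\theta$ over repeated samples.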

      Confidence intervals are notoriously difficult to understand at first, and thus a standard Monte Carlo experiment in an introductory statistics course is to repeat the above experiment multiple times and illustrate that, on average, a proportion of about $1 - \alpha$ of such confidence intervals will contain the true mean. That is, for $t = 1, \ldots, n$, we generate $X_{t1}, \ldots, X_{tm} \sim N(\theta, \sigma^2)$, calculate the mean $\bar{X}_t$ and the sample variance $s_t^2$, and define $p_t$ to be

$$p_t = I\left\{ \bar{X}_t - z_{1 - \alpha/2} \frac{s_t}{\sqrt{m}} < \theta < \bar{X}_t + z_{1 - \alpha/2} \frac{s_t}{\sqrt{m}} \right\},$$

      where
