Read the book online: Computational Statistics in Data Science. Group of authors. Mathematics. LiveLib

… $\mathcal{X}$, and suppose that $\theta \in \mathbb{R}^p$ are features of interest of the distribution $F$. Specifically, $\theta$ may be a combination of quantiles, means, and variances associated with $F$. Samples $X_1, \dots, X_n$ are obtained via simulation either approximately or exactly from $F$, and a consistent estimator of $\theta$, $\hat{\theta}_n$, is constructed so that, as $n \to \infty$,

$$\hat{\theta}_n \to \theta. \tag{1}$$
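As a rough illustration of the convergence in (1), the sketch below estimates a mean, a variance, and a quantile (the median) from IID draws. The Exponential(1) target, the function name `monte_carlo_features`, and the sample sizes are assumptions chosen only because the true features are known in closed form; they do not come from the text.

```python
import math
import random
import statistics


def monte_carlo_features(n, seed=0):
    """Estimate (mean, variance, median) of an assumed Exponential(1) target
    from n IID draws."""
    rng = random.Random(seed)
    draws = [rng.expovariate(1.0) for _ in range(n)]
    mean_hat = statistics.fmean(draws)
    # Unbiased sample variance, computed directly for speed.
    var_hat = sum((x - mean_hat) ** 2 for x in draws) / (n - 1)
    median_hat = statistics.median(draws)
    return mean_hat, var_hat, median_hat


# True features of Exponential(1): mean 1, variance 1, median log(2).
TRUE_FEATURES = (1.0, 1.0, math.log(2.0))

if __name__ == "__main__":
    for n in (100, 10_000, 200_000):
        estimates = monte_carlo_features(n)
        errors = [abs(e - t) for e, t in zip(estimates, TRUE_FEATURES)]
        print(n, [round(err, 4) for err in errors])
```

The absolute errors should tend to shrink as the simulation size grows, although any single run can fluctuate.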

Thus, even when $F$ is a complicated distribution, Monte Carlo simulation allows for estimation of features of $F$. Throughout, we assume that either independent and identically distributed (IID) samples or Markov chain Monte Carlo (MCMC) samples from $F$ can be obtained efficiently; see Refs [1–5] for various techniques.

The foundation of Monte Carlo simulation methods rests on asymptotic convergence as indicated by (1). When enough samples are obtained, $\hat{\theta}_n \approx \theta$, and simulation can be terminated with reasonable confidence. For many estimators, an asymptotic sampling distribution is available, via a central limit theorem (CLT) or the delta method applied to a CLT, with which to ascertain the variability in estimation. Section 2 introduces estimators of $\theta$, while Section 3 discusses sampling distributions of these estimators for IID and MCMC sampling.
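For orientation, the generic form of such a result can be written as follows, with $\hat{\theta}_n$ denoting the estimator based on $n$ samples. The symbols $\Sigma$ (the limiting variance–covariance matrix) and $g$ (a real-valued function differentiable at $\theta$) are notation assumed here for illustration and may differ from the notation used in later sections.

```latex
% CLT for the estimator, followed by the delta method for g(\hat{\theta}_n)
\sqrt{n}\,\bigl(\hat{\theta}_n - \theta\bigr) \xrightarrow{d} N_p(0, \Sigma)
\quad\Longrightarrow\quad
\sqrt{n}\,\bigl(g(\hat{\theta}_n) - g(\theta)\bigr) \xrightarrow{d}
N\bigl(0,\ \nabla g(\theta)^{\top}\, \Sigma\, \nabla g(\theta)\bigr)
```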

Although Monte Carlo simulation relies on large‐sample frequentist statistics, it is fundamentally different in two ways. First, data is generated by a computer, and so often there is little cost to obtaining further samples. Thus, the reliance on asymptotics is reasonable. Second, data is obtained sequentially, so determining when to terminate the simulation can be based on the samples already obtained. As this implies a random simulation time, additional safeguards are necessary to ensure asymptotic validity. This has led to the study of sequential stopping rules, which we present in Section 5.
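To make the idea of a data-driven termination time concrete, here is a minimal sketch of a fixed-width stopping rule for a scalar mean under IID sampling: simulation continues until the half-width of a large-sample confidence interval falls below a tolerance. The function name `run_until_precise`, the tolerance `eps`, the minimum simulation size `n_min`, and the checking interval `check_every` are all illustrative assumptions, not the rules presented in Section 5.

```python
import math
import random
from statistics import NormalDist


def run_until_precise(draw, eps=0.01, alpha=0.05, n_min=1000,
                      check_every=1000, n_max=10_000_000):
    """Draw samples one at a time until the large-sample CI half-width for
    the mean is at most eps. Returns (mean, half_width, n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    n, mean, m2 = 0, 0.0, 0.0  # Welford running mean and sum of squares
    half_width = math.inf
    while n < n_max:
        x = draw()
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        # Check only after a minimum size, and only every check_every draws,
        # a crude guard against stopping too early on a lucky small sample.
        if n >= n_min and n % check_every == 0:
            half_width = z * math.sqrt(m2 / (n - 1) / n)
            if half_width <= eps:
                break
    return mean, half_width, n


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical target: Uniform(0, 1), whose true mean is 0.5.
    print(run_until_precise(rng.random, eps=0.01))
```

Because the stopping time depends on the samples, the returned `n` is itself random, which is the reason the text calls for additional safeguards.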

Sequential stopping rules rely on estimating the limiting Monte Carlo variance–covariance matrix (when $p = 1$, this is the standard error of $\hat{\theta}_n$). This is a particularly challenging problem in MCMC due to serial correlation in the samples. We discuss these challenges in Section 4 and present estimators appropriate for large simulation sizes.
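The effect of serial correlation can be seen in a small sketch. It uses a batch means estimator, one standard choice for this problem and not necessarily the estimator of Section 4, on a simulated AR(1) process that stands in for MCMC output. The AR(1) model, the correlation `rho = 0.9`, the square-root batch size, and all function names are assumptions made for illustration.

```python
import math
import random


def ar1_chain(n, rho=0.9, seed=0):
    """Simulate X_t = rho * X_{t-1} + e_t with e_t ~ N(0, 1), started from
    its stationary distribution. A stand-in for correlated MCMC output."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - rho ** 2))
    chain = []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        chain.append(x)
    return chain


def naive_variance(chain):
    """Sample variance, which ignores serial correlation."""
    n = len(chain)
    mean = sum(chain) / n
    return sum((x - mean) ** 2 for x in chain) / (n - 1)


def batch_means_variance(chain, batch_size=None):
    """Batch means estimate of the limiting variance of sqrt(n) * (mean - mu)."""
    n = len(chain)
    b = batch_size or math.isqrt(n)   # assumed batch size: floor(sqrt(n))
    a = n // b                        # number of full batches
    used = chain[n - a * b:]          # drop the leading remainder
    overall = sum(used) / (a * b)
    means = [sum(used[k * b:(k + 1) * b]) / b for k in range(a)]
    return b * sum((m - overall) ** 2 for m in means) / (a - 1)


if __name__ == "__main__":
    chain = ar1_chain(100_000)
    # For this AR(1) model the limiting variance is 1 / (1 - rho)^2 = 100,
    # while the marginal variance is 1 / (1 - rho^2), roughly 5.26.
    print(naive_variance(chain), batch_means_variance(chain))
```

Dividing the batch means estimate by the simulation size and taking the square root gives a standard error for the sample mean; using the naive variance instead would understate it considerably for a strongly correlated chain.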

Over a variety of examples in Section 7, we conclude that the simulation size required for reliable estimation is often higher than what is commonly used by practitioners (see also Refs [6, 7]). Given modern computational power, the recommended strategies can easily be adopted in most estimation problems. We conclude the introduction with an example illustrating the need for careful sample size calculations.

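Example 1. Consider IID draws $X_1, \dots, X_m$ with mean $\theta$ and variance $\sigma^2$. An estimate of $\theta$ is the sample mean $\bar{X}$, and $\sigma^2$ is estimated with the sample variance, $s^2$. Let $z_{1-\alpha/2}$ be the $(1-\alpha/2)$th quantile of a standard normal distribution, for $\alpha \in (0, 1)$. A large‐sample confidence interval for $\theta$ is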

$$\bar{X} \pm z_{1-\alpha/2} \frac{s}{\sqrt{m}}.$$

Confidence intervals are notoriously difficult to understand at first, and thus a standard Monte Carlo experiment in an introductory statistics course is to repeat the above experiment multiple times and illustrate that, on average, a proportion of about $1 - \alpha$ of such confidence intervals will contain the true mean. That is, for each replication $t$, we generate $m$ new IID draws, calculate the mean $\bar{X}_t$ and the sample variance $s_t^2$, and define $p_t$ to be

$$p_t = I\left\{ \bar{X}_t - z_{1-\alpha/2} \frac{s_t}{\sqrt{m}} < \theta < \bar{X}_t + z_{1-\alpha/2} \frac{s_t}{\sqrt{m}} \right\},$$

where $I\{\cdot\}$ denotes the indicator function.
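The coverage experiment of Example 1 can be sketched in a few lines of Python. The normal sampling distribution, the values of `theta` and `sigma`, the number of replications `reps`, and the function name `coverage_experiment` are assumptions made for illustration. The sketch also reports a Monte Carlo standard error for the estimated coverage, since that estimate is itself subject to simulation error.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def coverage_experiment(m, reps, theta=0.0, sigma=1.0, alpha=0.05, seed=0):
    """Repeat the confidence-interval experiment reps times and return
    (estimated coverage, Monte Carlo standard error of that estimate)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(reps):
        # Assumed sampling model: m IID N(theta, sigma^2) draws.
        x = [rng.gauss(theta, sigma) for _ in range(m)]
        xbar = fmean(x)
        half = z * stdev(x) / math.sqrt(m)
        # Indicator that the interval contains the true mean.
        hits += (xbar - half < theta < xbar + half)
    p_hat = hits / reps
    mc_se = math.sqrt(p_hat * (1 - p_hat) / reps)
    return p_hat, mc_se


if __name__ == "__main__":
    for reps in (100, 1_000, 10_000):
        p_hat, mc_se = coverage_experiment(m=50, reps=reps)
        print(reps, round(p_hat, 4), round(mc_se, 4))
```

With few replications, the standard error of the estimated coverage is large enough that the result may sit visibly away from $1 - \alpha$, which is one way to see why the number of replications deserves as much care as $m$.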
