Read the book online - Computational Statistics in Data Science. Group of authors. Mathematics. LiveLib


$\mathcal{O}(N)$, but there are many situations in which they scale $\mathcal{O}(N^2)$ [21, 22] or worse. Indeed, when $N$ is large, it is often advantageous to use more advanced MCMC algorithms that use the gradient of the log‐posterior to generate better proposals. In this situation, the log‐likelihood gradient may also become a computational bottleneck [21].

2.2 Big P

One of the simplest models for big $P$ problems is ridge regression [23], but computing can become expensive even in this classical setting. Ridge regression estimates the coefficient $\boldsymbol{\theta}$ by minimizing the distance between the observed and predicted values $\mathbf{y}$ and $\mathbf{X}\boldsymbol{\theta}$, along with a weighted square norm of $\boldsymbol{\theta}$:

$$\hat{\boldsymbol{\theta}} = \operatorname{argmin} \left\{ \lVert \mathbf{y} - \mathbf{X}\boldsymbol{\theta} \rVert^2 + \lVert \boldsymbol{\Phi}^{1/2} \boldsymbol{\theta} \rVert^2 \right\} = \left( \mathbf{X}^\intercal \mathbf{X} + \boldsymbol{\Phi} \right)^{-1} \mathbf{X}^\intercal \mathbf{y}$$

For illustrative purposes, we consider the following direct method for computing $\hat{\boldsymbol{\theta}}$.⁴ We can first multiply the $N \times P$ design matrix $\mathbf{X}$ by its transpose at the cost of $\mathcal{O}(NP^2)$ and subsequently invert the $P \times P$ matrix $(\mathbf{X}^\intercal \mathbf{X} + \boldsymbol{\Phi})$ at the cost of $\mathcal{O}(P^3)$. The total $\mathcal{O}(NP^2 + P^3)$ complexity shows that (i) a large number of parameters is often sufficient for making even the simplest of tasks infeasible and (ii) a moderate number of parameters can render a task impractical when there are a large number of observations. These two insights extend to more complicated models: the same complexity analysis holds for the fitting of generalized linear models (GLMs) as described in McCullagh and Nelder [12].
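The direct method described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `ridge_direct`, the simulated data, and the choice $\boldsymbol{\Phi} = 2\mathbf{I}$ are hypothetical, and the sketch solves the $P \times P$ linear system rather than forming the explicit inverse, which has the same cubic cost in $P$.

```python
import numpy as np

def ridge_direct(X, y, Phi):
    """Ridge estimate (X^T X + Phi)^{-1} X^T y via a direct solve."""
    gram = X.T @ X   # P x P Gram matrix: O(N P^2) operations
    rhs = X.T @ y    # length-P vector: O(N P) operations
    # Solving the P x P system costs O(P^3), like an explicit inverse,
    # but avoids forming the inverse matrix.
    return np.linalg.solve(gram + Phi, rhs)

# Simulated data (hypothetical sizes and penalty, for illustration only).
rng = np.random.default_rng(0)
N, P = 200, 5
X = rng.normal(size=(N, P))
y = X @ rng.normal(size=P) + 0.1 * rng.normal(size=N)
Phi = 2.0 * np.eye(P)

theta_hat = ridge_direct(X, y, Phi)

# Gradient of ||y - X theta||^2 + theta^T Phi theta at theta_hat.
# It should vanish (up to rounding) at the minimizer.
grad = -2.0 * X.T @ (y - X @ theta_hat) + 2.0 * Phi @ theta_hat
print(np.max(np.abs(grad)))
```

The Gram matrix product dominates when $N$ is much larger than $P$, and the solve dominates when $P$ is large, which mirrors points (i) and (ii) above.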

In the context of Bayesian inference, the length $P$ of the vector $\boldsymbol{\theta}$ dictates the dimension of the MCMC state space. For the M‐H algorithm (Section 2.1) with $P$‐dimensional Gaussian target and proposal, Gelman et al. [25] show that the proposal distribution's covariance should be scaled by a factor inversely proportional to $P$. Hence, as the dimension of the state space grows, it behooves one to propose states that are closer to the current state of the Markov chain, and one must greatly increase the number of MCMC iterations. At the same time, an increasing $P$ often slows down rate‐limiting likelihood calculations (Section 2.1). Taken together, one must generate many more, much slower MCMC iterations. The wide applicability of latent variable models [26] (Sections 3.1 and 3.2), for which each observation has its own parameter set (e.g., $P \propto N$), means M‐H simply does not work for a huge class of models popular with practitioners.
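The effect of this scaling can be seen in a minimal random-walk Metropolis sketch. The assumptions here are not from the text: the target is a standard Gaussian, the proposal covariance is $(c/P)\,\mathbf{I}$, and the constant $c = 2.38^2$ is the value commonly quoted in the optimal scaling literature. The function names are hypothetical.

```python
import numpy as np

def rw_metropolis(log_target, theta0, n_iter, scale, rng):
    """Random-walk Metropolis with proposal covariance (scale / P) * I."""
    P = theta0.size
    step_sd = np.sqrt(scale / P)  # proposal standard deviation shrinks with P
    theta = theta0.copy()
    logp = log_target(theta)
    samples = np.empty((n_iter, P))
    n_accept = 0
    for i in range(n_iter):
        proposal = theta + step_sd * rng.normal(size=P)
        logp_prop = log_target(proposal)
        # Symmetric proposal, so the acceptance ratio is the target ratio.
        if np.log(rng.uniform()) < logp_prop - logp:
            theta, logp = proposal, logp_prop
            n_accept += 1
        samples[i] = theta
    return samples, n_accept / n_iter

def log_std_normal(theta):
    """Log-density of a standard Gaussian, up to an additive constant."""
    return -0.5 * theta @ theta

rng = np.random.default_rng(1)
c = 2.38 ** 2  # assumed scaling constant, not taken from the text
for P in (2, 20, 200):
    samples, acc = rw_metropolis(log_std_normal, np.zeros(P), 2000, c, rng)
    # Print dimension, per-coordinate step size, and acceptance rate.
    print(P, round(float(np.sqrt(c / P)), 3), round(acc, 2))
```

As $P$ grows, the per-coordinate step size falls like $1/\sqrt{P}$, so each accepted move travels a shorter distance and more iterations are needed to explore the target.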

For these reasons, Hamiltonian Monte Carlo (HMC) [27] has become a popular algorithm for fitting Bayesian models with large numbers of parameters. Like M‐H, HMC uses an accept step (Equation 2). Unlike M‐H, HMC takes advantage of additional information about the target distribution in the form of the log‐posterior gradient. HMC works by doubling the state space dimension with an auxiliary Gaussian “momentum” variable $\mathbf{p} \sim \mathrm{Normal}_P(\mathbf{0}, \mathbf{M})$ that is independent of the “position” variable $\boldsymbol{\theta}$. The constructed Hamiltonian system has energy function given by the negative logarithm of the joint distribution

$$H(\boldsymbol{\theta}, \mathbf{p}) \propto -\log\left( \pi(\boldsymbol{\theta} \mid \mathbf{X}) \times \exp\left( -\mathbf{p}^T \mathbf{M}^{-1} \mathbf{p} / 2 \right) \right) \propto -\log \pi(\boldsymbol{\theta} \mid \mathbf{X}) + \mathbf{p}^T \mathbf{M}^{-1} \mathbf{p} / 2$$

and we produce proposals by simulating the system according to Hamilton's equations

$$\begin{aligned} \dot{\boldsymbol{\theta}} &= \frac{\partial}{\partial \mathbf{p}} H(\boldsymbol{\theta}, \mathbf{p}) = \mathbf{M}^{-1} \mathbf{p} \\ \dot{\mathbf{p}} &= -\frac{\partial}{\partial \boldsymbol{\theta}} H(\boldsymbol{\theta}, \mathbf{p}) = \nabla \log \pi(\boldsymbol{\theta} \mid \mathbf{X}) \end{aligned}$$
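One common way to simulate Hamilton's equations numerically is the leapfrog integrator. The excerpt above does not name a discretization, so the following is a sketch under that assumption, using a toy standard Gaussian posterior and an identity mass matrix. The names `leapfrog` and `hamiltonian`, the step size, and the number of steps are all hypothetical choices.

```python
import numpy as np

def leapfrog(theta, p, grad_log_post, M_inv, step, n_steps):
    """Approximate the Hamiltonian flow with n_steps leapfrog updates."""
    theta, p = theta.copy(), p.copy()
    p = p + 0.5 * step * grad_log_post(theta)    # initial half step for momentum
    for _ in range(n_steps - 1):
        theta = theta + step * (M_inv @ p)       # d(theta)/dt = M^{-1} p
        p = p + step * grad_log_post(theta)      # dp/dt = grad log posterior
    theta = theta + step * (M_inv @ p)
    p = p + 0.5 * step * grad_log_post(theta)    # final half step for momentum
    return theta, p

def hamiltonian(theta, p, log_post, M_inv):
    """Energy: negative log-posterior plus quadratic kinetic term."""
    return -log_post(theta) + 0.5 * p @ M_inv @ p

# Toy target (assumption): standard Gaussian posterior in P dimensions.
def log_post(theta):
    return -0.5 * theta @ theta

def grad_log_post(theta):
    return -theta

rng = np.random.default_rng(2)
P = 10
M_inv = np.eye(P)            # identity mass matrix M
theta0 = rng.normal(size=P)
p0 = rng.normal(size=P)      # momentum drawn from Normal(0, M) with M = I

theta1, p1 = leapfrog(theta0, p0, grad_log_post, M_inv, step=0.05, n_steps=20)
energy_error = (hamiltonian(theta1, p1, log_post, M_inv)
                - hamiltonian(theta0, p0, log_post, M_inv))
print(energy_error)
```

A small energy error means the end point of the trajectory would be accepted with high probability in the accept step, even though it can lie far from the starting position.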

Thus,
