Computational Statistics in Data Science. Group of authors

[...] $\mathcal{O}(N)$, but there are many situations in which they scale as $\mathcal{O}(N^2)$ [21, 22] or worse. Indeed, when $P$ is large, it is often advantageous to use more advanced MCMC algorithms that use the gradient of the log‐posterior to generate better proposals. In this situation, the log‐likelihood gradient may also become a computational bottleneck [21].

      2.2 Big P

      One of the simplest models for big $P$ problems is ridge regression [23], but computing can become expensive even in this classical setting. Ridge regression estimates the coefficient $\boldsymbol{\theta}$ by minimizing the distance between the observed and predicted values $\mathbf{y}$ and $\mathbf{X}\boldsymbol{\theta}$ along with a weighted square norm of $\boldsymbol{\theta}$:

\[
\hat{\boldsymbol{\theta}} = \operatorname{argmin} \left\{ \|\mathbf{y} - \mathbf{X}\boldsymbol{\theta}\|^{2} + \|\boldsymbol{\Phi}^{1/2}\boldsymbol{\theta}\|^{2} \right\} = \left( \mathbf{X}^{\intercal}\mathbf{X} + \boldsymbol{\Phi} \right)^{-1} \mathbf{X}^{\intercal}\mathbf{y}
\]
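As a minimal sketch of the closed-form ridge estimate above, the following NumPy snippet solves the linear system $(\mathbf{X}^{\intercal}\mathbf{X} + \boldsymbol{\Phi})\boldsymbol{\theta} = \mathbf{X}^{\intercal}\mathbf{y}$ rather than forming an explicit inverse. The function name `ridge_estimate`, the toy sizes, the simulated data, and the penalty matrix `Phi = 2 I` are all assumptions made for illustration and do not come from the text.

```python
import numpy as np

def ridge_estimate(X, y, Phi):
    """Solve (X^T X + Phi) theta = X^T y without forming an explicit inverse."""
    gram = X.T @ X + Phi   # P x P matrix; forming X^T X costs O(N P^2)
    rhs = X.T @ y          # length-P vector
    return np.linalg.solve(gram, rhs)

# Toy data (hypothetical sizes and values, for illustration only).
rng = np.random.default_rng(0)
N, P = 50, 5
X = rng.normal(size=(N, P))
theta_true = rng.normal(size=P)
y = X @ theta_true + 0.1 * rng.normal(size=N)
Phi = 2.0 * np.eye(P)      # assumed penalty matrix

theta_hat = ridge_estimate(X, y, Phi)

# Gradient of the ridge objective at theta_hat; it should vanish at the minimizer.
grad = -2 * X.T @ (y - X @ theta_hat) + 2 * Phi @ theta_hat
print("max |gradient| at theta_hat:", np.max(np.abs(grad)))
```

Checking that the gradient of the objective vanishes at `theta_hat` is one simple way to confirm that the argmin and the closed-form expression agree.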

      In the context of Bayesian inference, the length $P$ of the vector $\boldsymbol{\theta}$ dictates the dimension of the MCMC state space. For the M‐H algorithm (Section 2.1) with $P$‐dimensional Gaussian target and proposal, Gelman et al. [25] show that the proposal distribution's covariance should be scaled by a factor inversely proportional to $P$. Hence, as the dimension of the state space grows, it behooves one to propose states $\boldsymbol{\theta}^{*}$ that are closer to the current state of the Markov chain, and one must greatly increase the number $S$ of MCMC iterations. At the same time, an increasing $P$ often slows down rate‐limiting likelihood calculations (Section 2.1). Taken together, one must generate many more, much slower MCMC iterations. The wide applicability of latent variable models [26] (Sections 3.1 and 3.2) for which each observation has its own parameter set (e.g., $P \propto N$) means M‐H simply does not work for a huge class of models popular with practitioners.
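The scaling argument above can be illustrated with a small random-walk Metropolis sketch in which the proposal covariance is $(c/P)\,\mathbf{I}$, so that proposals move closer to the current state as $P$ grows. Several choices here are assumptions rather than anything stated in the text: the standard Gaussian target is a toy stand-in for a posterior, the constant $c = 2.38^2$ is the value commonly quoted in the optimal-scaling literature (the text only states inverse proportionality to $P$), and the function names are hypothetical.

```python
import numpy as np

def log_target(theta):
    """Log-density of a standard P-dimensional Gaussian, up to a constant."""
    return -0.5 * theta @ theta

def random_walk_metropolis(P, S, scale_const=2.38**2, seed=0):
    """Run S Metropolis iterations with proposal covariance (scale_const / P) * I."""
    rng = np.random.default_rng(seed)
    step_sd = np.sqrt(scale_const / P)  # proposal sd shrinks like 1 / sqrt(P)
    theta = rng.normal(size=P)          # start from a draw of the toy target
    log_p = log_target(theta)
    accepted = 0
    for _ in range(S):
        proposal = theta + step_sd * rng.normal(size=P)
        log_p_new = log_target(proposal)
        # Symmetric proposal, so the M-H ratio reduces to the target ratio.
        if np.log(rng.uniform()) < log_p_new - log_p:
            theta, log_p = proposal, log_p_new
            accepted += 1
    return theta, accepted / S, step_sd

for P in (2, 20, 200):
    _, acc_rate, step_sd = random_walk_metropolis(P, S=2000)
    print(f"P={P:4d}  proposal sd={step_sd:.3f}  acceptance rate={acc_rate:.2f}")
```

The per-coordinate step size shrinks as $P$ increases, which is why the chain needs more iterations $S$ to traverse the state space.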

\[
H(\boldsymbol{\theta}, \mathbf{p}) \propto -\log\left( \pi(\boldsymbol{\theta} \mid \mathbf{X}) \times \exp\left( -\mathbf{p}^{T}\mathbf{M}^{-1}\mathbf{p}/2 \right) \right) \propto -\log \pi(\boldsymbol{\theta} \mid \mathbf{X}) + \mathbf{p}^{T}\mathbf{M}^{-1}\mathbf{p}/2
\]

      and we produce proposals by simulating the system according to Hamilton's equations

\[
\begin{aligned}
\dot{\boldsymbol{\theta}} &= \frac{\partial}{\partial \mathbf{p}} H(\boldsymbol{\theta}, \mathbf{p}) = \mathbf{M}^{-1}\mathbf{p} \\
\dot{\mathbf{p}} &= -\frac{\partial}{\partial \boldsymbol{\theta}} H(\boldsymbol{\theta}, \mathbf{p}) = \nabla \log \pi(\boldsymbol{\theta} \mid \mathbf{X})
\end{aligned}
\]
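One common way to simulate Hamilton's equations numerically is the leapfrog (Störmer-Verlet) integrator. The excerpt here does not name an integrator, so treat the choice as an assumption. The sketch below alternates momentum updates driven by $\nabla \log \pi(\boldsymbol{\theta} \mid \mathbf{X})$ with position updates driven by $\mathbf{M}^{-1}\mathbf{p}$, then computes the change in $H$ that an accept-reject step would use. The standard Gaussian stand-in for the posterior, the identity mass matrix, the step size, the number of steps, and all function names are hypothetical.

```python
import numpy as np

def leapfrog(theta, p, grad_log_post, M_inv, step_size, n_steps):
    """Approximate Hamilton's equations with the leapfrog scheme."""
    theta, p = theta.copy(), p.copy()
    p += 0.5 * step_size * grad_log_post(theta)   # opening half step for the momentum
    for i in range(n_steps):
        theta += step_size * (M_inv @ p)          # full position step: theta_dot = M^{-1} p
        if i < n_steps - 1:
            p += step_size * grad_log_post(theta) # full momentum step: p_dot = grad log pi
    p += 0.5 * step_size * grad_log_post(theta)   # closing half step for the momentum
    return theta, p

def hamiltonian(theta, p, log_post, M_inv):
    """H(theta, p) = -log pi(theta | X) + p^T M^{-1} p / 2, up to a constant."""
    return -log_post(theta) + 0.5 * p @ M_inv @ p

# Stand-in target (an assumption): a standard Gaussian in P dimensions.
def log_post(theta):
    return -0.5 * theta @ theta

def grad_log_post(theta):
    return -theta

P = 10
rng = np.random.default_rng(1)
M_inv = np.eye(P)              # identity mass matrix, so M^{-1} = I
theta0 = rng.normal(size=P)
p0 = rng.normal(size=P)        # momentum drawn from N(0, M)

theta1, p1 = leapfrog(theta0, p0, grad_log_post, M_inv, step_size=0.05, n_steps=20)

h0 = hamiltonian(theta0, p0, log_post, M_inv)
h1 = hamiltonian(theta1, p1, log_post, M_inv)
# Accept-reject probability that corrects for the discretization error in H.
accept_prob = min(1.0, np.exp(h0 - h1))
print("energy error:", h1 - h0, "acceptance probability:", accept_prob)
```

Because the leapfrog scheme nearly conserves $H$ for small step sizes, proposals far from the current state can still be accepted with high probability, in contrast to random-walk proposals.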
