ascent for the log-posterior, forming an analogy with first-order optimization. The cost is repeated gradient evaluations that may comprise a new computational bottleneck, but the result is effective MCMC for tens of thousands of parameters [21, 28]. The success of HMC has inspired research into other methods leveraging gradient information to generate better MCMC proposals when P is large [29].
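To make the analogy with first-order optimization concrete, the following minimal sketch implements a single HMC transition. The log_post and grad_log_post callables stand in for a user-supplied log-posterior and its gradient (our placeholder names, not the book's); note that every leapfrog step consumes one gradient evaluation, which is exactly the cost highlighted above.

```python
import numpy as np

def leapfrog(theta, momentum, grad_log_post, step_size, n_steps):
    """Simulate Hamiltonian dynamics with the leapfrog integrator.

    Every step calls grad_log_post once, so a single proposal costs
    roughly n_steps gradient evaluations -- the bottleneck noted above.
    """
    theta, momentum = theta.copy(), momentum.copy()
    momentum += 0.5 * step_size * grad_log_post(theta)      # initial half step
    for _ in range(n_steps - 1):
        theta += step_size * momentum                        # full position step
        momentum += step_size * grad_log_post(theta)         # full momentum step
    theta += step_size * momentum
    momentum += 0.5 * step_size * grad_log_post(theta)       # final half step
    return theta, momentum

def hmc_step(theta, log_post, grad_log_post, step_size=0.01, n_steps=20, rng=None):
    """One HMC transition: propose via leapfrog, then Metropolis accept/reject."""
    rng = np.random.default_rng() if rng is None else rng
    momentum = rng.standard_normal(theta.shape)
    theta_new, momentum_new = leapfrog(theta, momentum, grad_log_post, step_size, n_steps)
    # Hamiltonian = negative log-posterior (potential energy) + kinetic energy
    current_h = -log_post(theta) + 0.5 * momentum @ momentum
    proposed_h = -log_post(theta_new) + 0.5 * momentum_new @ momentum_new
    return theta_new if np.log(rng.uniform()) < current_h - proposed_h else theta
```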

      2.3 Big M

      Global optimization, or the problem of finding the minimum of a function with arbitrarily many local minima, is NP-complete in general [30], meaning, in layman's terms, that it is impossibly hard. In the absence of a tractable theory by which one might prove that a global optimization procedure works, brute-force grid and random searches and heuristic methods such as particle swarm optimization [31] and genetic algorithms [32] have been popular. Due to the overwhelming difficulty of global optimization, a large portion of the optimization literature has focused on the particularly well-behaved class of convex functions [33, 34], which do not admit multiple local minima. Since Fisher introduced his "maximum likelihood" in 1922 [35], statisticians have thought in terms of maximization, but convexity theory still applies by a trivial negation of the objective function. Nonetheless, most statisticians could safely ignore questions of concavity during the twentieth century: exponential family log-likelihoods are concave, so Newton–Raphson and Fisher scoring are guaranteed to find the global optimum in the context of GLMs [12, 34].
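To illustrate why this well-behaved setting is so forgiving, the sketch below fits a logistic regression (a canonical GLM) by Fisher scoring, which coincides with Newton–Raphson under the logit link; the function name and the simulated data are our own illustrative assumptions rather than material from the text.

```python
import numpy as np

def fisher_scoring_logistic(X, y, n_iter=25, tol=1e-10):
    """Fit a logistic regression by Fisher scoring.

    Under the canonical (logit) link, Fisher scoring coincides with
    Newton-Raphson, and concavity of the log-likelihood means the
    iterates head toward the unique global maximum.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ theta)))      # fitted probabilities
        weights = mu * (1.0 - mu)                    # GLM variance function
        score = X.T @ (y - mu)                       # gradient of the log-likelihood
        information = X.T @ (X * weights[:, None])   # expected Fisher information
        step = np.linalg.solve(information, score)
        theta += step
        if np.linalg.norm(step) < tol:
            break
    return theta

# Tiny simulated check: the estimate should approach the true coefficients.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3))
true_theta = np.array([1.0, -2.0, 0.5])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(X @ true_theta))))
print(fisher_scoring_logistic(X, y))
```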

      Nearing the end of the twentieth century, multimodality and nonconvexity became more important for statisticians considering high-dimensional regression, that is, regression with many covariates (big P). Here, for purposes of interpretability and variance reduction, one would like to induce sparsity on the estimated weights vector $\hat{\boldsymbol{\theta}}$ by performing best subset selection [36, 37]:

$$
\hat{\boldsymbol{\theta}}_k = \underset{\boldsymbol{\theta}}{\arg\min}\; \lVert \mathbf{y} - \mathbf{X}\boldsymbol{\theta} \rVert_2^2 \quad \text{subject to} \quad \lVert \boldsymbol{\theta} \rVert_0 \le k
$$

      where $0 < k \le P$, and $\lVert \cdot \rVert_0$ denotes the $\ell_0$-norm, that is, the number of nonzero elements. Because best subset selection requires an immensely difficult nonconvex optimization, Tibshirani [38] famously replaces the $\ell_0$-norm with the $\ell_1$-norm, thereby providing sparsity while nonetheless maintaining convexity.
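The computational payoff of the $\ell_1$ relaxation can be seen in a few lines of code: the convex lasso objective yields to simple cyclic coordinate descent with soft-thresholding, whereas exact $\ell_0$ selection would require a combinatorial search over subsets. The sketch below is a generic illustration under a squared-error loss; the function names and tuning values are our assumptions, not the book's.

```python
import numpy as np

def soft_threshold(z, lam):
    """Proximal operator of the l1-norm (the source of exact zeros)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2N)||y - X theta||^2 + lam * ||theta||_1 by cyclic coordinate descent.

    Convexity guarantees convergence to a global minimum, in contrast to
    the combinatorial l0-constrained best subset problem above.
    """
    n, p = X.shape
    theta = np.zeros(p)
    col_scale = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            partial_residual = y - X @ theta + X[:, j] * theta[j]  # leave coordinate j out
            rho = X[:, j] @ partial_residual / n
            theta[j] = soft_threshold(rho, lam) / col_scale[j]
    return theta

# Small example: P = 10 covariates, only the first three truly nonzero.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
theta_true = np.zeros(10)
theta_true[:3] = [3.0, -2.0, 1.5]
y = X @ theta_true + rng.standard_normal(200)
print(lasso_coordinate_descent(X, y, lam=0.1).round(2))
```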

      In the Bayesian setting, the traditional route to sparsity augments the linear model with spike-and-slab inclusion indicators:

$$
\mathbf{y} \sim \text{Normal}_N\!\left(\mathbf{X}\boldsymbol{\Gamma}\boldsymbol{\theta},\, \sigma^2 \mathbf{I}_N\right) \quad \text{for} \quad [\boldsymbol{\Gamma}]_{pp'} =
\begin{cases}
\gamma_p \sim \text{Bernoulli}(\pi) & p = p' \\
0 & p \neq p'
\end{cases}
\quad \text{and} \quad \pi \in (0, 1)
$$
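A short simulation from this generative model, sketched below, shows how the Bernoulli indicators $\gamma_p$ switch entire columns of $\mathbf{X}$ in or out of the mean and thereby induce exact sparsity; the function name and parameter settings are illustrative assumptions only.

```python
import numpy as np

def simulate_spike_and_slab(N, P, pi=0.1, sigma=1.0, rng=None):
    """Draw (y, X, theta, gamma) from the spike-and-slab regression model above.

    Gamma is diagonal with Bernoulli(pi) entries, so X @ Gamma @ theta keeps
    only the 'included' covariates; a posterior over gamma must therefore
    weigh 2**P inclusion patterns -- the combinatorial difficulty in question.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    X = rng.standard_normal((N, P))
    theta = rng.standard_normal(P)            # slab values for the regression weights
    gamma = rng.binomial(1, pi, size=P)       # spike-and-slab inclusion indicators
    y = X @ (gamma * theta) + sigma * rng.standard_normal(N)
    return y, X, theta, gamma

y, X, theta, gamma = simulate_spike_and_slab(N=100, P=20)
print("included covariates:", np.flatnonzero(gamma))
```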

      In the following section, we present an alternative Bayesian sparse regression approach that mitigates the combinatorial problem along with a state-of-the-art computational technique that scales well in both N and P.

      These challenges will remain throughout the twenty-first century, but it is possible to make significant advances for specific statistical tasks or classes of models. Section 3.1 considers Bayesian sparse regression based on continuous shrinkage priors, designed to alleviate the heavy multimodality (big M) of the more traditional spike-and-slab approach. This model presents a major computational challenge as N and P grow, but a recent computational advance makes the posterior inference feasible for many modern large-scale applications.

      And because of the rise of data science, there are increasing opportunities for computational statistics to grow by enabling and extending statistical inference for scientific applications previously outside of mainstream statistics. Here, the science may dictate the development of structured models with complexity possibly growing in N and P. Section 3.2 presents a method for fast phylogenetic inference, where the primary structure of interest is a "family tree" describing a biological evolutionary history.

      3.1 Bayesian Sparse Regression in the Age of Big N and Big P

      With the goal of identifying a small subset of relevant features among a large number of
