featured in a range of statistical and data science applications [46]. Traditionally, such techniques were commonly applied in the "small $n$, large $p$" setting, and computational algorithms correspondingly focused on this situation [47], especially within the Bayesian literature [48].
Due to a growing number of initiatives for large-scale data collection and new types of scientific inquiry made possible by emerging technologies, however, datasets that are "big $n$" and "big $p$" at the same time are increasingly common. For example, modern observational studies using health-care databases routinely involve large numbers of both patients and clinical covariates [49]. The UK Biobank provides brain imaging data on a large cohort of patients, with the relevant $p$ depending on the scientific question of interest [50]. Single-cell RNA sequencing can generate datasets with $n$ (the number of cells) in the millions and $p$ (the number of genes) in the tens of thousands, with the trend indicating further growth in data size to come [51].
3.1.1 Continuous shrinkage: alleviating big $M$
Bayesian sparse regression, despite its desirable theoretical properties and flexibility to serve as a building block for richer statistical models, has always been relatively computationally intensive, even before the advent of "big $n$ and big $p$" data [45, 52, 53]. A major source of its computational burden is severe posterior multimodality (big $M$) induced by the discrete binary nature of spike-and-slab priors (Section 2.3). The class of global–local continuous shrinkage priors is a more recent alternative that shrinks the $\beta_j$'s in a more continuous manner, thereby alleviating (if not eliminating) the multimodality issue [54, 55]. This class of priors is represented as a scale mixture of Gaussians:

$$\beta_j \mid \lambda_j, \tau \,\sim\, \mathcal{N}(0, \tau^2 \lambda_j^2), \qquad \lambda_j \sim \pi_{\mathrm{local}}(\cdot), \qquad \tau \sim \pi_{\mathrm{global}}(\cdot).$$
The idea is that the global scale parameter $\tau$ would shrink most $\beta_j$'s toward zero, while the local scales $\lambda_j$, with their heavy-tailed prior $\pi_{\mathrm{local}}(\cdot)$, allow a small number of $\lambda_j$'s, and hence $\beta_j$'s, to be estimated away from zero. While motivated by two different conceptual frameworks, the spike-and-slab can be viewed as a subset of global–local priors in which $\pi_{\mathrm{local}}(\cdot)$ is chosen as a mixture of delta masses placed at $\lambda_j = 0$ and $\lambda_j = 1$. Continuous shrinkage mitigates the multimodality of spike-and-slab by smoothly bridging small and large values of $\lambda_j$.
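To make the scale-mixture representation concrete, the following minimal sketch draws coefficients from one well-known member of this class, the horseshoe prior, which places half-Cauchy priors on both the global and local scales. The sketch is our own illustration of the general construction above, not code from the cited references.

```python
import numpy as np

# Draw p regression coefficients from a global-local shrinkage prior
# (horseshoe flavor: half-Cauchy global and local scales).
rng = np.random.default_rng(seed=1)
p = 1000

tau = abs(rng.standard_cauchy())             # global scale: pulls most beta_j toward zero
lam = np.abs(rng.standard_cauchy(size=p))    # heavy-tailed local scales: let a few beta_j escape
beta = rng.normal(loc=0.0, scale=tau * lam)  # beta_j | lambda_j, tau ~ N(0, tau^2 lambda_j^2)

# Most |beta_j| are tiny, but the heavy tails of pi_local produce a few large
# coefficients -- a continuous analogue of spike-and-slab behavior.
print("fraction near zero:", np.mean(np.abs(beta) < 0.01 * np.abs(beta).max()))
print("largest |beta_j|:  ", np.abs(beta).max())
```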
On the other hand, the use of continuous shrinkage priors does not address the increasing computational burden from growing $n$ and $p$ in modern applications. Sparse regression posteriors under global–local priors are amenable to an effective Gibbs sampler, a popular class of MCMC we describe further in Section 4.1. Under the linear and logistic models, the computational bottleneck of this Gibbs sampler stems from the need for repeated updates of $\beta$ from its conditional distribution

$$\beta \mid \tau, \lambda, \Omega, y \,\sim\, \mathcal{N}\!\left(\Sigma X^\top \Omega z, \, \Sigma\right), \qquad \Sigma = \left(X^\top \Omega X + \tau^{-2} \Lambda^{-2}\right)^{-1},$$
where $\Omega$ is an additional diagonal-matrix parameter and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$; under the linear model, for instance, $\Omega = \sigma^{-2} I$ and $z = y$. Sampling from this high-dimensional Gaussian distribution requires $O(np^2 + p^3)$ operations with the standard approach [58]: $O(np^2)$ for computing the term $X^\top \Omega X$ and $O(p^3)$ for the Cholesky factorization of $\Sigma^{-1}$. While an alternative approach by Bhattacharya et al. [48] provides the complexity of $O(n^2 p)$, which is advantageous when $p \gg n$, neither approach scales well when $n$ and $p$ are both large.
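As a concrete illustration of these two complexity regimes, the sketch below implements both the standard Cholesky-based draw and the fast sampler of Bhattacharya et al. [48] for this conditional update. It is a minimal sketch under our own assumptions: the function names are ours, `omega` stores the diagonal of $\Omega$ as a vector, and `prior_sd` plays the role of the elementwise product $\tau \lambda$.

```python
import numpy as np

def sample_beta_standard(X, omega, z, prior_sd, rng):
    """Draw beta ~ N(Sigma X^T Omega z, Sigma), Sigma^{-1} = X^T Omega X + diag(prior_sd)^{-2}.
    Cost: O(n p^2) to form X^T Omega X, plus O(p^3) for the Cholesky factorization."""
    XtOmega = X.T * omega                           # X^T Omega (diagonal Omega stored as a vector)
    prec = XtOmega @ X + np.diag(prior_sd ** -2.0)  # Sigma^{-1}
    L = np.linalg.cholesky(prec)                    # Sigma^{-1} = L L^T: the O(p^3) step
    mu = np.linalg.solve(L.T, np.linalg.solve(L, XtOmega @ z))
    eps = rng.standard_normal(X.shape[1])
    return mu + np.linalg.solve(L.T, eps)           # L^{-T} eps has covariance Sigma

def sample_beta_fast(X, omega, z, prior_sd, rng):
    """Same distribution via the O(n^2 p) algorithm of Bhattacharya et al. [48],
    written in terms of Phi = Omega^{1/2} X, alpha = Omega^{1/2} z, D = diag(prior_sd^2)."""
    n = X.shape[0]
    d = prior_sd ** 2
    sqrt_omega = np.sqrt(omega)
    Phi = sqrt_omega[:, None] * X
    alpha = sqrt_omega * z
    u = prior_sd * rng.standard_normal(X.shape[1])  # u ~ N(0, D)
    v = Phi @ u + rng.standard_normal(n)            # v = Phi u + delta, delta ~ N(0, I_n)
    M = (Phi * d) @ Phi.T + np.eye(n)               # Phi D Phi^T + I_n: the O(n^2 p) step
    w = np.linalg.solve(M, alpha - v)               # an n x n solve instead of a p x p one
    return u + d * (Phi.T @ w)                      # beta = u + D Phi^T w

# Quick demonstration on simulated data (linear model with sigma = 1, so Omega = I):
rng = np.random.default_rng(0)
n, p = 50, 500
X = rng.standard_normal((n, p))
z = X[:, :5] @ np.ones(5) + rng.standard_normal(n)
prior_sd = np.abs(rng.standard_cauchy(size=p))      # tau * lambda_j under a horseshoe-type prior
beta_draw = sample_beta_fast(X, np.ones(n), z, prior_sd, rng)
```

The fast sampler replaces the $p \times p$ factorization with an $n \times n$ linear solve, which is why its advantage disappears once $n$ itself grows large.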