4, the creation of fast, flexible, and friendly statistical algo‐ware.

      Section 4.1 presents Core Challenge 4, achieving “algo‐ware” (a neologism suggesting an equal emphasis on the statistical algorithm and its implementation) that is sufficiently efficient, broad, and user‐friendly to empower everyday statisticians and data scientists. Core Challenge 5 (Section 4.2) explores the mapping of these algorithms to computational hardware for optimal performance. Hardware‐optimized implementations often exploit model‐specific structures, but good, general‐purpose software should also optimize common routines.

      4.1 Fast, Flexible, and Friendly Statistical Algo‐Ware

      To accommodate the greatest range of models while remaining simple enough to encourage easy implementation, inference methods should rely solely on the quantities that can be computed algorithmically for any given model. The log‐likelihood (or log‐density in the Bayesian setting) is one such quantity, and one can employ the computational graph framework [77, 78] to evaluate conditional log‐likelihoods for any subset of model parameters as well as their gradients via backpropagation [79]. Beyond being efficient in terms of the first three Core Challenges, an algorithm should demonstrate robust performance on a reasonably wide range of problems without extensive tuning if it is to lend itself to successful software deployment.
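
      As a minimal sketch of this idea (not drawn from the text), the gradient of a log-density can be obtained by backpropagation through its computational graph using an automatic-differentiation library such as JAX; the normal model, data, and parameter names below are purely illustrative.

```python
# Minimal sketch: a log-density evaluated through a computational graph,
# with its gradient obtained by backpropagation (automatic differentiation).
# The i.i.d. normal model and the parameterization (mu, log_sigma) are
# illustrative assumptions, not a model from the text.
import jax.numpy as jnp
from jax import grad

def log_density(params, y):
    """Log-likelihood of i.i.d. normal data with mean mu and log-std log_sigma."""
    mu, log_sigma = params
    sigma = jnp.exp(log_sigma)
    return jnp.sum(-0.5 * ((y - mu) / sigma) ** 2
                   - jnp.log(sigma) - 0.5 * jnp.log(2.0 * jnp.pi))

# Differentiate with respect to the first argument (the parameter vector).
grad_log_density = grad(log_density)

y = jnp.array([0.3, -1.2, 0.8, 2.1])
theta = jnp.array([0.0, 0.0])            # (mu, log_sigma)
print(log_density(theta, y), grad_log_density(theta, y))
```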

      HMC (Section 2.2) is a prominent example of a general‐purpose algorithm for Bayesian inference, requiring only the log‐density and its gradient. The generic nature of HMC has opened up possibilities for complex Bayesian modeling since the early work of Neal [80], but its performance is highly sensitive to model parameterization and to its three tuning parameters, commonly referred to as trajectory length, step size, and mass matrix [27]. Tuning issues constitute a major obstacle to the wider adoption of the algorithm, as evidenced by the development history of the popular HMC‐based probabilistic programming software Stan [81], which employs the No‐U‐Turn sampler (NUTS) of Hoffman and Gelman [82] to make HMC user‐friendly by obviating the need to tune its trajectory length. Bayesian software packages such as Stan empirically adapt the remaining step size and mass matrix [83]; this approach helps make the use of HMC automatic, though it is not without issues [84] and comes at the cost of significant computational overhead.
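
      To make concrete how little HMC demands of the model, the following sketch (an assumption-laden illustration, not the implementation used by any particular package) performs a single HMC transition that touches the target only through two callables, log_density and grad_log_density, for example the functions above with the data held fixed. The step size, number of leapfrog steps, and identity mass matrix are arbitrary illustrative choices rather than recommended settings.

```python
# Sketch of one HMC transition driven solely by the log-density and its
# gradient; all tuning values here are illustrative, not recommendations.
import numpy as np

def hmc_step(theta, log_density, grad_log_density,
             step_size=0.1, n_leapfrog=20, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta, dtype=float)
    momentum = rng.standard_normal(theta.shape)      # identity mass matrix
    theta_new, p = theta.copy(), momentum.copy()

    # Leapfrog integration of the Hamiltonian dynamics.
    p = p + 0.5 * step_size * np.asarray(grad_log_density(theta_new))
    for _ in range(n_leapfrog - 1):
        theta_new = theta_new + step_size * p
        p = p + step_size * np.asarray(grad_log_density(theta_new))
    theta_new = theta_new + step_size * p
    p = p + 0.5 * step_size * np.asarray(grad_log_density(theta_new))

    # Metropolis correction based on the change in total energy.
    current_h = -log_density(theta) + 0.5 * momentum @ momentum
    proposed_h = -log_density(theta_new) + 0.5 * p @ p
    accept = np.log(rng.uniform()) < current_h - proposed_h
    return theta_new if accept else theta
```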

      Although HMC is a powerful algorithm that has played a critical role in the emergence of general‐purpose Bayesian inference software, the challenges involved in its practical deployment also demonstrate how an algorithm – no matter how versatile and efficient at its best – is not necessarily useful unless it can be made easy for practitioners to use. It is also unlikely that one algorithm works well in all situations. In fact, there are many distributions on which HMC performs poorly [83, 85, 86]. Additionally, HMC is incapable of handling discrete distributions in a fully general manner, despite the progress made in extending HMC to such situations [87, 88].

      On the other hand, Gibbs samplers often require little tuning and can take advantage of highly optimized algorithms for each conditional update, as done in the examples of Section 3. A clear advantage of the Gibbs sampler is that it tends to make software implementation quite modular; for example, each conditional update can be replaced with the latest state‐of‐the‐art samplers as they appear [92], and adding a new feature may amount to no more than adding a single conditional update [75]. In this way, an algorithm may not work in a completely model‐agnostic manner but with a broad enough scope can serve as a valuable recipe or meta‐algorithm for building model‐specific algorithms and software. The same is true for optimization methods. Even though its “E”‐step requires a derivation (by hand) for each new model, the EM algorithm [93] enables maximum‐likelihood estimation for a wide range of models. Similarly, variational inference (VI) for approximate Bayes requires manual derivations but provides a general framework to turn posterior computation into an optimization problem [94]. As meta‐algorithms, both EM and VI expand their breadth of use by replacing analytical derivations with Monte Carlo estimators but suffer losses in statistical and computational efficiency [95, 96]. Indeed, such trade‐offs will continue to haunt the creation of fast, flexible, and friendly statistical algo‐ware well into the twenty‐first century.
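
      The modularity argument can be sketched in code (the conjugate normal model, priors, and update functions below are illustrative assumptions, not taken from the text): a Gibbs sampler reduces to a loop over interchangeable conditional‐update routines, so that replacing one block with a state‐of‐the‐art sampler, or appending a new block, leaves the rest of the program untouched.

```python
# Illustrative Gibbs sampler for a normal model with unknown mean and
# variance, organized as interchangeable conditional updates.
# Assumed conjugate priors: mu ~ N(0, 100), 1/sigma2 ~ Gamma(1, 1).
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.5, size=100)      # synthetic data for illustration
n, ybar = len(y), y.mean()

def update_mu(state):
    # Conditional of mu given sigma2 is normal (conjugate normal prior).
    prec = n / state["sigma2"] + 1.0 / 100.0
    mean = (n * ybar / state["sigma2"]) / prec
    state["mu"] = rng.normal(mean, np.sqrt(1.0 / prec))

def update_sigma2(state):
    # Conditional of 1/sigma2 given mu is gamma (conjugate gamma prior).
    shape = 1.0 + n / 2.0
    rate = 1.0 + 0.5 * np.sum((y - state["mu"]) ** 2)
    state["sigma2"] = 1.0 / rng.gamma(shape, 1.0 / rate)

# Swap in a better sampler for a block, or add a new block, without
# touching the outer loop.
updates = [update_mu, update_sigma2]
state = {"mu": 0.0, "sigma2": 1.0}
samples = []
for _ in range(2000):
    for update in updates:
        update(state)
    samples.append(dict(state))
```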

      4.2 Hardware‐Optimized Inference

      But successful statistical inference software must also interact with computational hardware in an optimal manner. Growing datasets require the computational statistician to give more and more thought to how the computer implements any statistical algorithm. To effectively leverage computational resources, the statistician must (i) identify the routine's computational bottleneck (Section
