4, the creation of fast, flexible, and friendly statistical algo‐ware.

      Section 4.1 presents Core Challenge 4, achieving “algo‐ware” (a neologism suggesting an equal emphasis on the statistical algorithm and its implementation) that is sufficiently efficient, broad, and user‐friendly to empower everyday statisticians and data scientists. Core Challenge 5 (Section 4.2) explores the mapping of these algorithms to computational hardware for optimal performance. Hardware‐optimized implementations often exploit model‐specific structures, but good, general‐purpose software should also optimize common routines.

      4.1 Fast, Flexible, and Friendly Statistical Algo‐Ware

      To accommodate the greatest range of models while remaining simple enough to encourage easy implementation, inference methods should rely solely on the quantities that can be computed algorithmically for any given model. The log‐likelihood (or log‐density in the Bayesian setting) is one such quantity, and one can employ the computational graph framework [77, 78] to evaluate conditional log‐likelihoods for any subset of model parameters as well as their gradients via backpropagation [79]. Beyond being efficient in terms of the first three Core Challenges, an algorithm should demonstrate robust performance on a reasonably wide range of problems without extensive tuning if it is to lend itself to successful software deployment.
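
      As a minimal sketch of this idea (not drawn from the text), the gradient of a log-density can be obtained by backpropagation through its computational graph using an automatic-differentiation library such as JAX; the normal model, data, and parameter names below are purely illustrative.

```python
# Minimal sketch: a log-density evaluated through a computational graph,
# with its gradient obtained by backpropagation (automatic differentiation).
# The i.i.d. normal model and the parameterization (mu, log_sigma) are
# illustrative assumptions, not a model from the text.
import jax.numpy as jnp
from jax import grad

def log_density(params, y):
    """Log-likelihood of i.i.d. normal data with mean mu and log-std log_sigma."""
    mu, log_sigma = params
    sigma = jnp.exp(log_sigma)
    return jnp.sum(-0.5 * ((y - mu) / sigma) ** 2
                   - jnp.log(sigma) - 0.5 * jnp.log(2.0 * jnp.pi))

# Differentiate with respect to the first argument (the parameter vector).
grad_log_density = grad(log_density)

y = jnp.array([0.3, -1.2, 0.8, 2.1])
theta = jnp.array([0.0, 0.0])            # (mu, log_sigma)
print(log_density(theta, y), grad_log_density(theta, y))
```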

      HMC (Section 2.2) is a prominent example of a general‐purpose algorithm for Bayesian inference, requiring only the log‐density and its gradient. The generic nature of HMC has opened up possibilities for complex Bayesian modeling since the early work of Neal [80], but its performance is highly sensitive to model parameterization and to its three tuning parameters, commonly referred to as trajectory length, step size, and mass matrix [27]. Tuning issues constitute a major obstacle to the wider adoption of the algorithm, as evidenced by the development history of the popular HMC‐based probabilistic programming software Stan [81], which employs the No‐U‐Turn sampler (NUTS) of Hoffman and Gelman [82] to make HMC user‐friendly by obviating the need to tune its trajectory length. Bayesian software packages such as Stan empirically adapt the remaining step size and mass matrix [83]; this approach helps make the use of HMC automatic, though it is not without issues [84] and comes at the cost of significant computational overhead.
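
      To make concrete how little HMC demands of the model, the following sketch (an assumption-laden illustration, not the implementation used by any particular package) performs a single HMC transition that touches the target only through two callables, log_density and grad_log_density, for example the functions above with the data held fixed. The step size, number of leapfrog steps, and identity mass matrix are arbitrary illustrative choices rather than recommended settings.

```python
# Sketch of one HMC transition driven solely by the log-density and its
# gradient; all tuning values here are illustrative, not recommendations.
import numpy as np

def hmc_step(theta, log_density, grad_log_density,
             step_size=0.1, n_leapfrog=20, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta, dtype=float)
    momentum = rng.standard_normal(theta.shape)      # identity mass matrix
    theta_new, p = theta.copy(), momentum.copy()

    # Leapfrog integration of the Hamiltonian dynamics.
    p = p + 0.5 * step_size * np.asarray(grad_log_density(theta_new))
    for _ in range(n_leapfrog - 1):
        theta_new = theta_new + step_size * p
        p = p + step_size * np.asarray(grad_log_density(theta_new))
    theta_new = theta_new + step_size * p
    p = p + 0.5 * step_size * np.asarray(grad_log_density(theta_new))

    # Metropolis correction based on the change in total energy.
    current_h = -log_density(theta) + 0.5 * momentum @ momentum
    proposed_h = -log_density(theta_new) + 0.5 * p @ p
    accept = np.log(rng.uniform()) < current_h - proposed_h
    return theta_new if accept else theta
```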

      Although HMC is a powerful algorithm that has played a critical role in the emergence of general‐purpose Bayesian inference software, the challenges involved in its practical deployment also demonstrate how an algorithm – no matter how versatile and efficient at its best – is not necessarily useful unless it can be made easy for practitioners to use. It is also unlikely that one algorithm works well in all situations. In fact, there are many distributions on which HMC performs poorly [83, 85, 86]. Additionally, HMC is incapable of handling discrete distributions in a fully general manner, despite the progress made in extending HMC to such situations [87, 88].

      On the other hand, Gibbs samplers often require little tuning and can take advantage of highly optimized algorithms for each conditional update, as done in the examples of Section 3. A clear advantage of the Gibbs sampler is that it tends to make software implementation quite modular; for example, each conditional update can be replaced with the latest state‐of‐the‐art samplers as they appear [92], and adding a new feature may amount to no more than adding a single conditional update [75]. In this way, an algorithm may not work in a completely model‐agnostic manner but with a broad enough scope can serve as a valuable recipe or meta‐algorithm for building model‐specific algorithms and software. The same is true for optimization methods. Even though its “E”‐step requires a derivation (by hand) for each new model, the EM algorithm [93] enables maximum‐likelihood estimation for a wide range of models. Similarly, variational inference (VI) for approximate Bayes requires manual derivations but provides a general framework to turn posterior computation into an optimization problem [94]. As meta‐algorithms, both EM and VI expand their breadth of use by replacing analytical derivations with Monte Carlo estimators but suffer losses in statistical and computational efficiency [95, 96]. Indeed, such trade‐offs will continue to haunt the creation of fast, flexible, and friendly statistical algo‐ware well into the twenty‐first century.
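
      The modularity argument can be sketched in code (the conjugate normal model, priors, and update functions below are illustrative assumptions, not taken from the text): a Gibbs sampler reduces to a loop over interchangeable conditional‐update routines, so that replacing one block with a state‐of‐the‐art sampler, or appending a new block, leaves the rest of the program untouched.

```python
# Illustrative Gibbs sampler for a normal model with unknown mean and
# variance, organized as interchangeable conditional updates.
# Assumed conjugate priors: mu ~ N(0, 100), 1/sigma2 ~ Gamma(1, 1).
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.5, size=100)      # synthetic data for illustration
n, ybar = len(y), y.mean()

def update_mu(state):
    # Conditional of mu given sigma2 is normal (conjugate normal prior).
    prec = n / state["sigma2"] + 1.0 / 100.0
    mean = (n * ybar / state["sigma2"]) / prec
    state["mu"] = rng.normal(mean, np.sqrt(1.0 / prec))

def update_sigma2(state):
    # Conditional of 1/sigma2 given mu is gamma (conjugate gamma prior).
    shape = 1.0 + n / 2.0
    rate = 1.0 + 0.5 * np.sum((y - state["mu"]) ** 2)
    state["sigma2"] = 1.0 / rng.gamma(shape, 1.0 / rate)

# Swap in a better sampler for a block, or add a new block, without
# touching the outer loop.
updates = [update_mu, update_sigma2]
state = {"mu": 0.0, "sigma2": 1.0}
samples = []
for _ in range(2000):
    for update in updates:
        update(state)
    samples.append(dict(state))
```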

      4.2 Hardware‐Optimized Inference

      But successful statistical inference software must also interact with computational hardware in an optimal manner. Growing datasets require the computational statistician to give more and more thought to how the computer implements any statistical algorithm. To effectively leverage computational resources, the statistician must (i) identify the routine's computational bottleneck (Section
