Computational Statistics in Data Science. Group of authors
2 Statistical Software
Alfred G. Schissler and Alexander D. Knudson
The University of Nevada, Reno, NV, USA
This chapter discusses selected statistical software for users transitioning from basic applications to more advanced ones, including elaborate statistical modeling and machine learning (ML), simulation design, and big data settings. We begin with the most popular statistical software. For each language, we provide some historical context for the computing environment, discuss the foundational principles behind its development (its purpose), describe user environments and workflows, and analyze its strengths and shortcomings relative to other popular or notable statistical software, along with language support and other software features.
Next, we briefly mention an array of software used for statistical applications. We discuss the specific purpose of each tool and how it fills a need for data scientists. The aim is to be fairly complete, giving a comprehensive view of the statistical software ecosystem and leaving readers with some familiarity with the most prevalent languages and software.
After presenting the noteworthy software, we describe a handful of emerging and promising statistical computing technologies. Our goal in these sections is to guide users who wish to be early adopters of a software application, as well as readers whose current statistical programming language limits the scale of their work. Some of the latest tools for big data statistical applications are discussed in these sections.
To orient the reader to the discussion below, two tables are provided. Table 1 lists the software described in the chapter. Throughout, we discuss user environments and workflow considerations to provide practical guidance, aiming to increase efficiency and describe typical use cases. Table 2 summarizes the environments covered in the sections that follow.
1 User Development Environments
We begin by discussing user environments rather than focusing on specific statistical programming languages. The subsections below contain descriptions of some selected user development environments and related tools. This introductory material may be omitted if desired, and one can safely proceed to Section 2 for descriptions of the most popular statistical software.
Table 1 Summary of selected statistical software.
Software | Open source | Classification | Style | Notes |
---|---|---|---|---|
Python | Y | Popular | Programming | Versatile, popular |
R | Y | Popular | Programming | Academia/Industry, active community |
SAS | N | Popular | Programming | Strong historical following |
SPSS | N | Popular | GUI: menu, dialogs | Popular in scholarly work |
C++ | Y | Notable | Programming | Fast, low‐level |
Excel | N | Notable | GUI: menu, dialogs | Simple, works well for rectangular data |
GNU Octave | Y | Notable | Mixed | Open source counterpart to MATLAB |
Java | Y | Notable | Programming | Cross‐platform, portable |
JavaScript, TypeScript | Y | Notable | Programming | Popular, cross‐platform |
Maple | N | Notable | Mixed | Academia, algebraic manipulation |
MATLAB | N | Notable | Mixed | Speedy, popular among engineers |
Minitab | N | Notable | GUI: menu, dialogs | Suitable for teaching and simple analysis |
SQL | Y | Notable | Programming | Necessary tool for databases |
Stata | N | Notable | GUI: menu, dialogs | Popular in scholarly work |
Tableau | N | Notable | GUI: menu, dialogs | Popular for business analytics |
Julia | Y | Promising | Programming | Speedy, underdeveloped |
Scala | Y | Promising | Programming | Typed version of Java, less boilerplate code |
Table 2 Summary of selected user environments/workflows.
Software | Virtual environment | Multiple languages | Remote integration | Notes |
---|---|---|---|---|
Emacs, Vim | N | Y | Y | Extensible, steep learning curve |