The Big R-Book. Philippe J. S. De Brouwer

… Chapter 17 “Data Wrangling in the tidyverse” on page 265.

       ggplot2 is a system for creating graphics with a philosophy: it adheres to a “Grammar of Graphics” and can produce truly stunning results at a reasonable price (it is a notch more abstract to use than the core-R functionality). For both reasons, we will discuss it further in the sections about reporting: see Chapter 31 “A Grammar of Graphics with ggplot2” on page 687.
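
A minimal sketch of the “Grammar of Graphics” idea (our own illustration, not taken from Chapter 31): a plot is assembled from data, an aesthetic mapping of columns to axes, and one or more geometric layers added with +. The data frame df and its columns are hypothetical.

```r
library(ggplot2)

# Hypothetical data: one period of a sine wave
df <- data.frame(x = seq(0, 2 * pi, length.out = 100))
df$y <- sin(df$x)

# Grammar of Graphics: data + aesthetic mapping + geometric layer(s)
p <- ggplot(df, aes(x = x, y = y)) +
  geom_line(colour = "red") +
  labs(title = "sin(x)", x = "x", y = "y")

print(p)  # renders the plot on the active graphics device
```

Adding a further layer (for example geom_point()) or changing the mapping only requires editing one component, which is where the extra abstraction pays off.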

       readr expands R's standard functionality to read in rectangular data. It is more robust, knows more data types, and is faster than the core-R functionality. For more information, see Chapter 17.1.2 “Importing Flat Files in the Tidyverse” on page 267 and its subsections.
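
As an illustration, the sketch below writes a small CSV file to a temporary location and reads it back with read_csv(), declaring the column types so that the last column is parsed as a Date rather than as text. The file contents and the column names id, name, and joined are made up for the example.

```r
library(readr)

# Write a small, hypothetical CSV file to a temporary location
path <- tempfile(fileext = ".csv")
writeLines(c("id,name,joined",
             "1,Alice,2020-01-15",
             "2,Bob,2021-06-30"), path)

# read_csv() returns a tibble; col_types fixes the type of each column
people <- read_csv(path,
                   col_types = cols(id     = col_integer(),
                                    name   = col_character(),
                                    joined = col_date()))
print(people)
```

Without col_types, read_csv() guesses the types itself and reports the column specification it used; spelling the types out makes the import reproducible.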

       purrr is briefly mentioned in the section about the OO model in R (see Chapter 6 on page 87) and is used extensively in Chapter 25.1 “Model Quality Measures” on page 476. It is a fairly complete and consistent set of tools for working with functions and vectors. With purrr, it should be possible to replace most loops with calls to purrr functions, which makes the code more concise and, in some cases, faster.
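
The following sketch contrasts an explicit for loop with purrr's map_dbl(), which applies a function to every element and returns a double vector; the input vector x is arbitrary.

```r
library(purrr)

x <- c(1, 4, 9, 16)

# Loop version: pre-allocate the result and fill it element by element
roots_loop <- numeric(length(x))
for (i in seq_along(x)) {
  roots_loop[i] <- sqrt(x[i])
}

# purrr version: map_dbl() applies sqrt() to every element
# and guarantees that the result is a double vector
roots_map <- map_dbl(x, sqrt)

# Anonymous functions can be written with the formula shorthand ~ .x
squares <- map_dbl(x, ~ .x^2)   # 1 16 81 256
```

Besides map_dbl(), the family includes map() (returns a list), map_chr(), map_int(), and map_lgl(), so the type of the result is stated in the function name.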

       tibble is a new take on the data frame of core R. It provides a new data structure: tibbles. Tibbles are, in essence, data frames that do a little less (so there is less clutter on the screen and fewer unexpected things happen) but give more feedback (they show what went wrong rather than assuming that you have read all the manuals and remember everything). Tibbles are introduced in the next section.
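
Two small experiments, sketched below with arbitrary column names, illustrate the extra feedback: a data frame silently completes a partial column name, whereas a tibble warns about it, and a tibble refuses to recycle a vector whose length does not fit.

```r
library(tibble)

df <- data.frame(value = 1:3)
tb <- tibble(value = 1:3)

# A data frame silently completes the partial column name "val" ...
df$val          # 1 2 3
# ... whereas a tibble returns NULL and warns about an unknown column
tb$val

# A tibble only recycles vectors of length 1; other mismatches are errors
res <- tryCatch(tibble(x = 1:4, y = 1:2),
                error = function(e) conditionMessage(e))
res             # the error message, as a character string
```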

       stringr expands the standard functions for working with strings and provides a coherent set of functions whose names all start with str_. The package is built on top of stringi, which uses the ICU library (written in C and C++), so it is fast too. For more information, see Chapter 17.5 “String Manipulation in the tidyverse” on page 299.
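
A few of the str_ functions, applied to a made-up character vector, give a feel for the consistent naming (the expected results are given in the comments):

```r
library(stringr)

fruit_names <- c("apple", "banana", "cherry")

n_chars  <- str_length(fruit_names)             # 5 6 6
has_an   <- str_detect(fruit_names, "an")       # FALSE TRUE FALSE
replaced <- str_replace_all("a-b-c", "-", "_")  # "a_b_c"
upper    <- str_to_upper("tidyverse")           # "TIDYVERSE"
prefix   <- str_sub(fruit_names, 1, 3)          # "app" "ban" "che"
```

Because every function takes the string as its first argument, these calls also chain naturally with the pipe.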

       forcats provides tools to address common problems when working with categorical variables (factors).
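
The sketch below applies some forcats functions to a hypothetical factor of clothing sizes; each function returns a new factor and leaves the data values themselves unchanged.

```r
library(forcats)

sizes <- factor(c("medium", "small", "medium", "large", "medium", "small"))
levels(sizes)                  # alphabetical: "large" "medium" "small"

# Order the levels by frequency, most frequent first
by_freq <- fct_infreq(sizes)
levels(by_freq)                # "medium" "small" "large"

# Put the levels in an explicitly chosen order
by_size <- fct_relevel(sizes, "small", "medium", "large")
levels(by_size)                # "small" "medium" "large"

# Rename a level (new name = old name)
renamed <- fct_recode(sizes, M = "medium")
levels(renamed)                # "large" "M" "small"
```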

      7.2.2 The Non-core Tidyverse

      Besides the core tidyverse packages, which are loaded with the command library(tidyverse), there are many other packages that are part of the tidyverse. In this section, we briefly describe the most important ones.
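
To see the difference between core and non-core packages in practice, the sketch below uses tidyverse_packages() from the tidyverse package to list everything that belongs to the tidyverse, and then checks whether readxl (a non-core package) has been attached.

```r
library(tidyverse)   # attaches the core packages only

# All packages that belong to the tidyverse (core and non-core)
all_pkgs <- tidyverse_packages()

in_tidyverse <- "readxl" %in% all_pkgs     # TRUE: readxl is part of it ...
attached     <- "readxl" %in% .packages()  # FALSE in a fresh session:
                                           # ... but it is not attached
in_tidyverse
attached

# Non-core packages have to be attached explicitly
library(readxl)
```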

       Importing data: readxl for .xls and .xlsx files and haven for SPSS, Stata, and SAS data.

       Wrangling data: lubridate for dates and date-times, hms for time-of-day values, and blob for storing binary data. lubridate, for example, is discussed in Chapter 17.6 “Dates with lubridate” on page 314.

       Programming: purrr for iterating within R objects; magrittr, which provides the famous pipe %>% plus some more specialised piping operators (such as %$% and %<>%); and glue, which provides an enhancement to the paste() function.

       Modelling: this part is not quite mature yet, though recipes and rsample are already operational and show the direction it is taking. The aim is to replace modelr. Note that there is also the package broom, which turns models into tidy data.
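
The following sketch, with made-up values, exercises a few of the non-core packages listed above: the magrittr operators %>%, %$% and %<>%, glue() for string interpolation, and lubridate for dates.

```r
library(magrittr)
library(glue)
library(lubridate)

x <- c(1, 4, 9)

# %>% passes the left-hand side as the first argument of the next call
total <- x %>% sqrt() %>% sum()          # sqrt gives 1 2 3, so 6

# %$% exposes the columns of a data frame to a function that has
# no data argument (the data frame is hypothetical)
cars_df <- data.frame(speed = c(4, 7, 8), dist = c(2, 4, 16))
r <- cars_df %$% cor(speed, dist)

# %<>% pipes a value and assigns the result back to the same name
y <- c(16, 25)
y %<>% sqrt()                            # y is now 4 5

# glue() interpolates R expressions written between braces
msg <- glue("The total is {total} and y has {length(y)} elements.")

# lubridate parses dates from strings and does date arithmetic
d <- ymd("2020-03-15")
month(d)                                 # 3
d + days(20)                             # "2020-04-04"
```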

      Warning – Work in progress

      While the core tidyverse is stable, the packages outside the core still tend to change and improve. Check their online documentation when using them.

      7.3.1 Tibbles

      x <- seq(from = 0, to = 2 * pi, length.out = 100)
      s <- sin(x)
      c <- cos(x)
      z <- s + c
      # the sum as a thick red line, then sine and cosine as thin lines
      plot(x, z, type = "l", col = "red", lwd = 7)
      lines(x, c, col = "blue", lwd = 1.5)
      lines(x, s, col = "darkolivegreen", lwd = 1.5)

      Figure: The sum of sine and cosine.

      Imagine further that our purpose is not only to plot these functions but also to use them in other applications. Then it makes sense to put them in a data frame. The following code does the same using a data frame.

      x <- seq(from = 0, to = 2 * pi, length.out = 100)
      # combine x and the derived columns side by side into one data frame
      df <- cbind(as.data.frame(x), cos(x), sin(x), cos(x) + sin(x))
      # plot etc.

      This is already more concise. With the tidyverse, it would look as follows (still without using the pipe):

      library(tidyverse)
      x  <- seq(from = 0, to = 2 * pi, length.out = 100)
      tb <- tibble(x, sin(x), cos(x), cos(x) + sin(x))

      Figure: A tibble plots itself like a data frame.

      The code with a tibble is just a notch shorter, but that is not the point here. The main advantage of using a tibble is that it usually does things that make more sense for the modern R user. For example, consider how a tibble prints itself (compared to what a data frame does).

      # Note how concise and relevant the output is:
      print(tb)
      ## # A tibble: 100 x 4
      ##         x `sin(x)` `cos(x)` `cos(x) + sin(x)`
      ##     <dbl>    <dbl>    <dbl>             <dbl>
      ##  1 0        0        1                   1
      ##  2 0.0635   0.0634   0.998               1.06
      ##  3 0.127    0.127    0.992               1.12
      ##  4 0.190    0.189    0.982               1.17
      ##  5 0.254    0.251    0.968               1.22
      ##  6 0.317    0.312    0.950               1.26
      ##  7 0.381    0.372    0.928               1.30
      ##  8 0.444    0.430    0.903               1.33
      ##  9 0.508    0.486    0.874               1.36
      ## 10 0.571    0.541    0.841               1.38
      ## # … with 90 more rows
      # This does the same as for a data-frame: