The Big R-Book - Philippe J. S. De Brouwer

<- 2 * x + 4 + rnorm(10, mean=0, sd=0.5)) %>%
  lm(y ~ x)
## Error in as.data.frame.default(data): cannot coerce class '"formula"' to a data.frame

The code above fails. R does not automatically add something like data = t, using “t” as defined up to the previous line. The function lm() expects the formula as its first argument, whereas the pipe places the data in the first argument. For this situation, magrittr provides a special pipe operator, %$%, that exposes the variables of the data frame on its left-hand side so that they can be addressed directly by name.

# The Tidyverse only makes the %>% pipe available. So, to use the
# special pipes, we need to load magrittr
library(magrittr)
##
## Attaching package: ‘magrittr’
## The following object is masked from ‘package:purrr’:
##
##     set_names
## The following object is masked from ‘package:tidyr’:
##
##     extract
lm2 <- tibble("x" = runif(10)) %>%
  within(y <- 2 * x + 4 + rnorm(10, mean=0, sd=0.5)) %$%
  lm(y ~ x)
summary(lm2)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -0.6101 -0.3534 -0.1390  0.2685  0.8798
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   4.0770     0.3109  13.115 1.09e-06 ***
## x             2.2068     0.5308   4.158  0.00317 **
## ---
## Signif. codes:
## 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
##
## Residual standard error: 0.5171 on 8 degrees of freedom
## Multiple R-squared: 0.6836, Adjusted R-squared: 0.6441
## F-statistic: 17.29 on 1 and 8 DF, p-value: 0.003174

This can be elaborated further:

coeff <- tibble("x" = runif(10)) %>%
  within(y <- 2 * x + 4 + rnorm(10, mean=0, sd=0.5)) %$%
  lm(y ~ x) %>%
  summary %>%
  coefficients
coeff
##             Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 4.131934  0.2077024 19.893534 4.248422e-08
## x           1.743997  0.3390430  5.143882 8.809194e-04

Note – Using functions without brackets

Note how we can omit the brackets for functions that take no arguments other than the value that is piped in.
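The failing lm() call at the start of this section can be avoided in two ways. The minimal sketch below is not taken from the book: the data frame d, the model names and the fixed seed are assumptions made here for illustration. It compares %$% with the dot placeholder of the standard pipe, which states explicitly which argument receives the data.

```r
library(magrittr)  # provides %>% and %$%

set.seed(42)  # arbitrary seed, only to make the sketch reproducible

# Illustrative data (the name d is an assumption, not from the text)
d <- data.frame(x = runif(10))
d$y <- 2 * d$x + 4 + rnorm(10, mean = 0, sd = 0.5)

# Option 1: expose the columns of d to lm() with the exposition pipe
m_expo <- d %$% lm(y ~ x)

# Option 2: keep %>% and use the dot to say where the data goes
m_dot <- d %>% lm(y ~ x, data = .)

# Both routes fit the same model, so the coefficients should agree
all.equal(coef(m_expo), coef(m_dot))
```

Which form to prefer is largely a matter of taste: the dot keeps everything within the standard pipe, while %$% avoids repeating the data argument.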

7.3.4.2 The T-Pipe

This works nicely, but now imagine that we want to keep “t” as the tibble and still perform some operation on it, for example plot it. For that case there is the special %T>% operator, the “T-pipe”, which passes on its left-hand side rather than the result of its right-hand side. The output of the code below is the plot in Figure 7.3 on page 136.

Figure 7.3: A linear model fit on generated data to illustrate the piping command.

library(magrittr)
t <- tibble("x" = runif(100)) %>%
  within(y <- 2 * x + 4 + rnorm(100, mean=0, sd=0.5)) %T>%
  plot(col="red")   # The function plot does not return anything
                    # so we used the %T>% pipe.
lm3 <- t %$%
  lm(y ~ x) %T>%    # pass on the linear model
  summary %T>%      # further pass on the linear model
  coefficients
tcoef <- lm3 %>% coefficients   # we anyhow need the coefficients
# Add the model (the solid line) to the previous plot:
abline(a = tcoef[1], b=tcoef[2], col="blue", lwd=3)
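The behaviour of the T-pipe can also be seen without a plot. The following is a small sketch rather than code from the book; report_length() is a hypothetical helper introduced here only because it is called for its side effect and returns NULL.

```r
library(magrittr)  # provides %>% and %T>%

# Hypothetical helper used only for its side effect; it returns NULL
report_length <- function(v) {
  message("number of values: ", length(v))
  invisible(NULL)
}

# %T>% calls report_length() but passes the original vector on to sqrt()
roots <- c(1, 4, 9) %T>% report_length() %>% sqrt()
roots
```

With %>% in place of %T>%, sqrt() would receive the NULL returned by report_length() and fail.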

7.3.4.3 The Assignment Pipe

This last variation of the pipe operator combines piping with assignment: it pipes the variable on its left-hand side into the expression on the right and then stores the result back in that same variable.

x <- c(1,2,3)
# The following line:
x <- x %>% mean
# is equivalent to the following:
x %<>% mean
# Show x:
x
## [1] 2

Note that the original value of “x” has been overwritten.
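The assignment pipe also accepts functions with further arguments. A minimal sketch, where the vector v is an illustrative assumption and not part of the original text:

```r
library(magrittr)  # provides %<>%

v <- c(3, 1, 2)

# Pipe v into sort() and store the result back in v
v %<>% sort(decreasing = TRUE)
v
```

After this line, v holds the sorted values and the original ordering is no longer available.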

Warning – Assignment pipe

We recommend using this pipe operator only when no confusion is possible. We also argue that this pipe operator makes code less readable, while not really making it shorter.

7.3.5 Conclusion

When you come from a background of compiled languages that provide fine-grained control over memory management (such as C or C++), you might not immediately see the need for pipes. However, they do reduce the amount of text that needs to be typed and make the code more readable.

Indeed, the piping operator provides neither a speed increase nor a memory advantage compared with creating a new variable on every line. R has rather good memory management and only copies data when they are actually modified. For example, have a careful look at the following:

library(pryr)
x <- runif(100)
object_size(x)
## 840 B
y <- x
# x and y together do not take more memory than only x.
object_size(x,y)
## 840 B
y <- y * 2
# Now, they are different and are stored separately in memory.
object_size(x,y)
## 1.68 kB

The piping operator can be confusing at first and is not strictly necessary (except to read code that uses it). However, once one is used to it, it makes code more readable, and it also makes code shorter. Finally, it allows the reader of the code to focus on what is going on: the actions rather than the data, since the data is passed along invisibly.

Hint – Use pipes sparingly

Pipes are like spices in the kitchen. Use them, but in moderation. A good rule of thumb is that a pipeline of five lines is long enough, and simple one-line commands do not need to be split over several lines just to use a pipe.

Notes

1 According to the TIOBE index (see https://www.tiobe.com/tiobe-index), R is the 14th most popular programming language and still on the rise.

2 More information can be found in this article by Hadley Wickham: https://tidyverse.tidyverse.org/articles/manifesto.html.