The Big R-Book. Philippe J. S. De Brouwer
Digression – Special characters in column names
Note the back-ticks in `sin(x)` when the tibble reports on itself. That is because R does not allow parentheses in variable names. A tibble, however, does allow column names that are not syntactically valid R names. To address such a column by name, we need to wrap the name in back-ticks; alternatively, we can refer to the column by its number.
tb$`sin(x)`[1]
## [1] 0
This convention is not specific to tibbles; it is used throughout R (e.g. the same back-ticks are needed in ggplot2, tidyr, dplyr, etc.).
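To see that the back-tick convention is indeed a feature of R itself, here is a minimal sketch that uses only base R. The data frame `df` and the variable `my var` are made up for illustration; `check.names = FALSE` is the base-R switch that prevents `data.frame()` from altering non-syntactic names.

```r
# A base-R data frame; check.names = FALSE keeps names exactly as given.
df <- data.frame(x = c(0, pi / 2), check.names = FALSE)
df[["sin(x)"]] <- sin(df$x)             # add a column with a non-syntactic name

first_by_backtick <- df$`sin(x)`[1]     # back-ticks quote the name after $
first_by_string   <- df[["sin(x)"]][1]  # a character string works with [[ ]]
first_by_position <- df[[2]][1]         # or address the column by its number

# Back-ticks also work for ordinary variables with non-syntactic names.
`my var` <- 42
doubled <- `my var` * 2
```

All three ways of addressing the column return the same value, so the choice is mostly a matter of readability.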
Be aware of the saying “They have to recognize that great responsibility is an inevitable consequence of great power.”10 That you can do something does not mean that you must. Indeed, you can use numeric column names in a tibble, and the following is valid code.
tb <- tibble(`1` = 1:3, `2` = sin(`1`), `1`*pi, 1*pi)
tb
## # A tibble: 3 x 4
##     `1`   `2` `\`1\` * pi` `1 * pi`
##   <int> <dbl>        <dbl>    <dbl>
## 1     1 0.841         3.14     3.14
## 2     2 0.909         6.28     3.14
## 3     3 0.141         9.42     3.14
However, is this good practice?
So, why use a tibble instead of a data frame?
1 It does fewer things behind your back: it does not change strings into factors, create row names, or change the names of variables, and it does no partial matching but issues a warning when you try to access a column that does not exist.
2 A tibble reports more errors instead of doing something silently (data type conversions, import, etc.), so tibbles are safer to use.
3 The print method for tibbles will not overrun your screen with thousands of lines: it reports only on the first ten rows. The traditional head(tibble) still works, and to see more rows or all columns you can tweak the behaviour of the print function via the function options().
4 The name of the class itself is not confusing. The function print.data.frame() could be the print method for a data.frame object, but it could equally be the print.data method for a frame object. The class name tibble contains no dot and hence cannot cause this confusion.
To illustrate some of these differences, consider the following code:
# -- data frame --
# (output from R < 4.0, where stringsAsFactors defaulted to TRUE)
df <- data.frame("value" = pi, "name" = "pi")
df$na                      # partial matching of column names
## [1] pi
## Levels: pi

# automatic conversion to factor, plus data frame
# accepts strings:
df[,"name"]
## [1] pi
## Levels: pi
df[,c("name", "value")]
##   name    value
## 1   pi 3.141593

# -- tibble --
df <- tibble("value" = pi, "name" = "pi")
df$name                    # column name
## [1] "pi"
df$nam                     # no partial matching, but a warning
## Warning: Unknown or uninitialised column: 'nam'.
## NULL
df[,"name"]                # this returns a tibble (no simplification)
## # A tibble: 1 x 1
##   name
##   <chr>
## 1 pi
df[,c("name", "value")]    # no conversion to factor
## # A tibble: 1 x 2
##   name  value
##   <chr> <dbl>
## 1 pi     3.14
This partial matching is one of the nicer features of R, and it certainly was an advantage for interactive use. However, when using R in batch mode, this might be dangerous. Partial matching is especially dangerous in a corporate environment: datasets can have hundreds of columns and many names look alike, e.g. BAL180801, BAL180802, and BAL180803. Up to a certain point it is safe to use partial matching, since it only works when R can identify the variable uniquely. But it is bound to happen that you create new columns and suddenly someone else's code stops working (because the abbreviation has become ambiguous).
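The failure mode described above can be reproduced in a few lines of base R. This is a minimal sketch: the data frame `accounts` and its values are made up, and only the column names follow the BAL... pattern from the text.

```r
# One column starting with "BAL": the abbreviation is unique.
accounts <- data.frame(id = 1:3, BAL180801 = c(100, 200, 300))
before <- accounts$BAL        # partial match returns the column BAL180801

# A colleague adds a second column with a similar name ...
accounts$BAL180802 <- c(110, 190, 310)
after <- accounts$BAL         # ... and the same code now returns NULL

# [[ ]] matches exactly by default, so it never guessed in the first place.
strict <- accounts[["BAL"]]   # NULL
```

Note that the ambiguous `$` access does not stop the script; it returns `NULL`, so the problem may only surface much later in the code.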
Digression – Changing how a tibble is printed
To adjust the default behaviour of print on a tibble, run the function options() as follows:
options(
  tibble.print_max = n,  # If there are more than n
  tibble.print_min = m,  # rows, only print the m first
                         # (set n to Inf to show all)
  tibble.width = l       # max width of the printed output
                         # (set to Inf to show all columns)
  )
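Since options() changes the behaviour for the rest of the session, it can be useful to restore the previous settings afterwards. The following sketch relies only on base R: options() returns the old values of the options it changes, and passing that list back restores them. The values 20 and 5 are arbitrary choices for illustration.

```r
# Change two tibble printing options and keep the previous values.
old <- options(tibble.print_max = 20, tibble.print_min = 5)
current_min <- getOption("tibble.print_min")   # the value just set

# ... print tibbles here ...

# Restore the settings that were in place before.
options(old)
restored_min <- getOption("tibble.print_min")
```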
Tibbles are also data frames, and most older functions, which are unaware of tibbles, will work just fine. However, it may happen that some function does not work. If that happens, it is possible to coerce the tibble back into a data frame with the function as.data.frame().
tb <- tibble(c("a", "b", "c"), c(1,2,3), 9L,9)
is.data.frame(tb)
## [1] TRUE

# Note also that tibble did no conversion to factors, and
# note that the tibble also recycles the scalars:
tb
## # A tibble: 3 x 4
##   `c("a", "b", "c")` `c(1, 2, 3)`  `9L`   `9`
##   <chr>                     <dbl> <int> <dbl>
## 1 a                             1     9     9
## 2 b                             2     9     9
## 3 c                             3     9     9

# Coerce the tibble to data-frame:
as.data.frame(tb)
##   c("a", "b", "c") c(1, 2, 3) 9L 9
## 1                a          1  9 9
## 2                b          2  9 9
## 3                c          3  9 9

# A tibble does not recycle shorter vectors, so this fails:
fail <- tibble(c("a", "b", "c"), c(1,2))
## Error: Tibble columns must have consistent lengths, only values of length one are recycled:
## * Length 2: Column 'c(1, 2)'
## * Length 3: Column 'c("a", "b", "c")'
# That is a major advantage and will save many programming errors.
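The value of this strictness is easier to see next to base R, where data.frame() silently recycles a shorter vector whenever its length divides the length of the longer one. A minimal sketch follows. The column names `id` and `flag` are made up for illustration, and the tibble call is only attempted when the package is installed.

```r
# Base R: the vector 1:2 is silently recycled to length 4.
df <- data.frame(id = 1:4, flag = 1:2)
recycled <- df$flag   # 1 2 1 2

# The same call with tibble() is rejected, because only vectors of
# length one are recycled. tryCatch() turns the error into a marker string.
tibble_result <- if (requireNamespace("tibble", quietly = TRUE)) {
  tryCatch(tibble::tibble(id = 1:4, flag = 1:2),
           error = function(e) "error")
} else {
  "tibble not installed"
}
```

In the base-R case nothing signals that the second column was shorter than intended, which is exactly the kind of silent behaviour a tibble refuses to perform.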
The function view(tibble) works as expected and is most useful when working with RStudio, where it will open the tibble in a special tab.
While on the surface a tibble does the same as a data.frame, tibbles have some crucial advantages and we warmly recommend using them.
7.3.2 Piping with R
This section is not about creating beautiful music; it explains an argument-passing system in R. Similar to the pipe operator | in Linux, the operator %>% from the package magrittr passes the output of one line to the first argument of the function on the next line.11
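As a small illustration of this argument passing, the sketch below computes the same value twice: once with nested function calls and once with the pipe. It assumes that the package magrittr is installed. The input vector is arbitrary.

```r
library(magrittr)   # provides the operator %>%

# Nested calls are read from the inside out.
nested <- round(sqrt(sum(c(1, 4, 9))), 2)

# With the pipe, each result becomes the first argument of the next call,
# so the steps read from top to bottom.
piped <- c(1, 4, 9) %>%
  sum() %>%
  sqrt() %>%
  round(2)
```

Both expressions yield the same result. The piped version lists the operations in the order in which they are executed.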