The Big R-Book. Philippe J. S. De Brouwer


TRUE

      Digression – Special characters in column names

      tb$`sin(x)`[1]
      ## [1] 0

      This convention is not specific to tibbles; it is used throughout R (e.g. the same back-ticks are needed in ggplot2, tidyr, dplyr, etc.).
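      As a minimal sketch of the same convention in base R (the data frame df and its column names are made up for illustration): a classic data frame only keeps a non-syntactic name such as sin(x) when check.names = FALSE is passed, and the column is then reached with back-ticks after $ or with a plain string inside [[ ]].

```r
# Hypothetical data frame with a non-syntactic column name.
# check.names = FALSE stops data.frame() from rewriting the name.
df <- data.frame(x = 0:2, `sin(x)` = sin(0:2), check.names = FALSE)

first_value <- df$`sin(x)`[1]     # back-ticks are required after $
same_value  <- df[["sin(x)"]][1]  # a string works without back-ticks

# With the default check.names = TRUE the name is silently altered:
mangled <- names(data.frame(`sin(x)` = sin(0:2)))
mangled
```

      The last line illustrates one of the silent name changes that tibbles avoid.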

      Hint

      tb <- tibble(`1` = 1:3, `2` = sin(`1`), `1`*pi, 1*pi)
      tb
      ## # A tibble: 3 x 4
      ##     `1`   `2` `\`1\` * pi` `1 * pi`
      ##   <int> <dbl>        <dbl>    <dbl>
      ## 1     1 0.841         3.14     3.14
      ## 2     2 0.909         6.28     3.14
      ## 3     3 0.141         9.42     3.14

      However, is this good practice?

      So, why use a tibble instead of a data frame?

      1 It does fewer things behind your back: it does not change strings into factors, create row names, or change the names of variables, and it does no partial matching but issues a warning when you try to access a column that does not exist.

      2 A tibble reports more errors instead of doing something silently (data type conversions, import, etc.), so tibbles are safer to use.

      3 The specific print method for tibbles will not overrun your screen with thousands of lines; it reports only the first ten rows. If you need to see more, the traditional head(tibble) still works, or you can tweak the behaviour of the print function via the function options().

      4 The name of the class itself is not confusing. The function print.data.frame() could be the print method for a data.frame object, but it could equally be the print.data method for a frame object. The class name tibble contains no dot and hence cannot be confusing.
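      The naming ambiguity in point 4 can be made concrete with a small S3 sketch. The generics describe() and describe.data() and the class frame are hypothetical and exist only for this illustration; the point is that one function name serves as a method for two different generic/class pairs.

```r
# Two hypothetical S3 generics whose names differ only by ".data"
describe      <- function(x, ...) UseMethod("describe")
describe.data <- function(x, ...) UseMethod("describe.data")

# One function whose name can be read in two ways:
#   generic "describe"      + class "data.frame"
#   generic "describe.data" + class "frame"
describe.data.frame <- function(x, ...) "describe.data.frame was called"

a <- data.frame(x = 1)                    # class "data.frame"
b <- structure(list(), class = "frame")   # hypothetical class "frame"

res_a <- describe(a)        # dispatches on class "data.frame"
res_b <- describe.data(b)   # dispatches on class "frame"
identical(res_a, res_b)     # both calls reach the same function
```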

      To illustrate some of these differences, consider the following code:

      # -- data frame --
      df <- data.frame("value" = pi, "name" = "pi")
      df$na                     # partial matching of column names
      ## [1] pi
      ## Levels: pi

      # automatic conversion to factor, plus data frame
      # accepts strings:
      df[,"name"]
      ## [1] pi
      ## Levels: pi

      df[,c("name", "value")]
      ##   name    value
      ## 1   pi 3.141593

      # -- tibble --
      df <- tibble("value" = pi, "name" = "pi")
      df$name                   # column name
      ## [1] "pi"

      df$nam                    # no partial matching but error msg.
      ## Warning: Unknown or uninitialised column: 'nam'.
      ## NULL

      df[,"name"]               # this returns a tibble (no simplification)
      ## # A tibble: 1 x 1
      ##   name
      ##   <chr>
      ## 1 pi

      df[,c("name", "value")]   # no conversion to factor
      ## # A tibble: 1 x 2
      ##   name  value
      ##   <chr> <dbl>
      ## 1 pi     3.14

      Partial matching is one of the nicer features of R, and it certainly was an advantage for interactive use. However, when using R in batch mode, this might be dangerous. Partial matching is especially dangerous in a corporate environment: datasets can have hundreds of columns and many names look alike, e.g. BAL180801, BAL180802, and BAL180803. Up to a certain point it is safe to use partial matching, since it only works when R can identify the variable uniquely. But it is bound to happen that someone adds new columns and suddenly someone else's code stops working (because the abbreviation has become ambiguous).
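      A minimal base-R sketch of that failure mode, using a made-up balance table: the abbreviation df$BAL works while only one column starts with BAL, and silently returns NULL as soon as a second such column is added.

```r
# Hypothetical balance table with a single BAL... column
df <- data.frame(BAL180801 = c(100, 200, 300))
before <- df$BAL            # unique partial match: returns BAL180801

# Someone adds the next day's balances ...
df$BAL180802 <- c(110, 210, 310)
after <- df$BAL             # ... and the abbreviation is now ambiguous
is.null(after)              # no error is raised, the result is just NULL

# Spelling out the full name with [[ ]] is not affected
exact <- df[["BAL180801"]]
```

      Base R can also be asked to warn on every partial match by setting options(warnPartialMatchDollar = TRUE).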

      Digression – Changing how a tibble is printed

       options(
         tibble.print_max = n,  # if there are more than n rows,
         tibble.print_min = m,  # only print the first m
                                # (set n to Inf to show all rows)
         tibble.width = l       # max width of the output, in characters
                                # (set to Inf to show all columns)
       )
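      As a hedged illustration with concrete numbers (20, 5 and Inf are arbitrary choices, not recommended defaults): options() returns the previous values, so the settings can be changed for one piece of work and restored afterwards.

```r
# Change the tibble print options and remember the old values.
old <- options(
  tibble.print_max = 20,   # if there are more than 20 rows ...
  tibble.print_min = 5,    # ... print only the first 5
  tibble.width     = Inf   # print all columns, whatever the console width
)

current_max <- getOption("tibble.print_max")
current_min <- getOption("tibble.print_min")

# ... print tibbles here ...

options(old)               # restore the previous settings
```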

      Tibbles are also data frames, and most older functions (that are unaware of tibbles) will work just fine. However, it may happen that some function does not work. If that happens, it is possible to coerce the tibble back into a data frame with the function as.data.frame().

      tb <- tibble(c("a", "b", "c"), c(1,2,3), 9L,9)
      is.data.frame(tb)
      ## [1] TRUE

      # Note also that tibble did no conversion to factors, and
      # note that the tibble also recycles the scalars:
      tb
      ## # A tibble: 3 x 4
      ##   `c("a", "b", "c")` `c(1, 2, 3)`  `9L`   `9`
      ##   <chr>                     <dbl> <int> <dbl>
      ## 1 a                             1     9     9
      ## 2 b                             2     9     9
      ## 3 c                             3     9     9

      # Coerce the tibble to data-frame:
      as.data.frame(tb)
      ##   c("a", "b", "c") c(1, 2, 3) 9L 9
      ## 1                a          1  9 9
      ## 2                b          2  9 9
      ## 3                c          3  9 9

      # A tibble does not recycle shorter vectors, so this fails:
      fail <- tibble(c("a", "b", "c"), c(1,2))
      ## Error: Tibble columns must have consistent lengths, only values of length one are recycled:
      ## * Length 2: Column 'c(1, 2)'
      ## * Length 3: Column 'c("a", "b", "c")'

      # That is a major advantage and will save many programming errors.
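      For contrast, here is a small sketch of what a classic data frame does with a shorter vector whose length divides the number of rows (the column names id and flag are invented for this example, and the tibble package is assumed to be installed): data.frame() recycles it without any message, while tibble() refuses.

```r
library(tibble)

# data.frame() silently recycles the length-2 vector to 4 rows
df <- data.frame(id = 1:4, flag = c(TRUE, FALSE))
df$flag

# tibble() only recycles vectors of length one, so this is an error;
# tryCatch() turns the error into a marker value for inspection
res <- tryCatch(
  tibble(id = 1:4, flag = c(TRUE, FALSE)),
  error = function(e) "tibble refused to recycle"
)
res
```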

      Hint – Viewing the content of a tibble

      The function view(tibble) works as expected and is most useful when working with RStudio where it will open the tibble in a special tab.

      While on the surface a tibble does the same as a data.frame, tibbles have some crucial advantages and we warmly recommend using them.

      7.3.2 Piping with R

      
