The Big R-Book. Philippe J. S. De Brouwer

S3 class types; this means that, in general, functions will accept data frames (or tibbles), while lower-level functions will work with the base R vector types.

       Reuse data structures in your code. The idea here is that there is a better option than always overwriting a variable or creating a new one in every line: pass the output of one line on to the next with a “pipe”: %>%. To be accepted in the tidyverse, the functions in a package need to work with this pipe, which in practice means that they take the data as their first argument.
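To make the difference concrete, here is a minimal sketch that performs the same three steps twice: once by overwriting an intermediate variable, once with the pipe. The data set mtcars (shipped with R) and the dplyr verbs filter(), select() and arrange() are our own choice for illustration and are not discussed in the text at this point; we assume dplyr is installed (it is part of the tidyverse).

```r
library(dplyr)  # attaches dplyr, which also re-exports the pipe %>%

# Without a pipe: every line overwrites the same intermediate variable
x <- mtcars
x <- filter(x, cyl == 4)
x <- select(x, mpg, hp)
x <- arrange(x, desc(mpg))

# With the pipe: the output of each step becomes the first argument of the next
y <- mtcars %>%
  filter(cyl == 4) %>%
  select(mpg, hp) %>%
  arrange(desc(mpg))

identical(x, y)  # both approaches give the same result
```

The piped version reads from top to bottom as a sequence of verbs and never exposes a half-finished intermediate object.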

       Keep functions concise and clear. For example, do not mix side-effects and transformations, use verbs as function names wherever possible (unless they become too generic or meaningless, of course), and keep functions short (they do only one thing, but do it well).
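A small sketch of what "do not mix side-effects and transformations" can look like in practice. The two function names standardise_column() and report_column() are hypothetical and only serve as an illustration: the first one only transforms data, the second one only produces output. Both take the data as their first argument, so they would also fit in a pipe.

```r
# Hypothetical helper: a pure transformation that returns a modified copy
standardise_column <- function(df, col) {
  df[[col]] <- (df[[col]] - mean(df[[col]])) / sd(df[[col]])
  df
}

# Hypothetical helper: only a side effect (printing); returns its input invisibly
report_column <- function(df, col) {
  cat(col, ": mean = ", round(mean(df[[col]]), 3),
      ", sd = ", round(sd(df[[col]]), 3), "\n", sep = "")
  invisible(df)
}

out <- report_column(standardise_column(mtcars, "mpg"), "mpg")
```

Because R copies on modify, standardise_column() leaves mtcars itself untouched; only out contains the standardised column.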

       Embrace R as a functional programming language. This means that reflexes you might have from, say, C++, C#, Python or PHP will have to be adjusted. For example, it is best to use immutable objects and copy-on-modify semantics, and to avoid the refclass model (see Section 6.4 “The Reference Class, refclass, RC or R5 Model” on page 113). Where possible, use the generic functions provided by S3 and S4. Avoid writing loops (such as repeat and for); use the apply family of functions instead (or the package purrr).
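The following sketch contrasts a for loop with the functional alternatives mentioned above, and shows copy-on-modify in action. The choice of sapply() and purrr::map_dbl() is ours; we assume purrr is installed (it is part of the core tidyverse). The :: operator is explained in the digression on calling functions of packages that are not loaded.

```r
# Loop style: pre-allocate a vector and fill it element by element
squares_loop <- numeric(5)
for (i in 1:5) {
  squares_loop[i] <- i^2
}

# Functional style with the apply family: no index bookkeeping
squares_apply <- sapply(1:5, function(i) i^2)

# Functional style with purrr: map_dbl() always returns a double vector
squares_purrr <- purrr::map_dbl(1:5, function(i) i^2)

# Copy-on-modify: changing y makes a copy, so x is left untouched
x <- c(1, 2, 3)
y <- x
y[1] <- 100
x
```

All three approaches give c(1, 4, 9, 16, 25); the functional versions state what is computed rather than how to step through the indices.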

       Keep code clean and readable for humans. For example, prefer meaningful but long variable names over short but meaningless ones, and be considerate towards people using auto-complete in RStudio: put the part of the name that identifies a family of functions at the start rather than at the end (as stringr does with its prefix str_).
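As an illustration of the prefix convention, consider a few functions from stringr (part of the core tidyverse, assumed to be installed). The example vector fruits is made up for this sketch.

```r
library(stringr)

fruits <- c("apple", "banana", "cherry")

# Most stringr functions start with str_, so typing "str_" in RStudio
# lists the whole family in the auto-complete menu
str_detect(fruits, "an")   # FALSE TRUE FALSE
str_to_upper(fruits)       # "APPLE" "BANANA" "CHERRY"
str_length(fruits)         # 5 6 6
```

Had the family marker been a suffix instead, these related functions would be scattered throughout the auto-complete list.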

      The tidyverse is under permanent development, as are core R itself and many other packages. For further and most up-to-date information, we refer to the website of the tidyverse: http://tidyverse.tidyverse.org.

      Tidy Data

      Tidy data is, in essence, data that is easy for people to understand and that is formatted and structured with the following rules in mind:

      1 a tibble/data-frame for each dataset,

      2 a column for each variable,

      3 a row for each observation,

      4 a value (or NA) in each cell (a “cell” is the intersection between row and column).

      The concept of tidy data is so important that we will devote a whole section to tidy data (Section 17.2 “Tidy Data” on page 275) and how to make data tidy (Chapter 17 “Data Wrangling in the tidyverse” on page 265). For now, it is sufficient to have the previous rules in mind. This will allow us to introduce the tools of the tidyverse first and then later come back to making data tidy by using these tools.
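As a small preview, the following sketch shows a data set that breaks these rules and its tidy counterpart. The data (countries “A” and “B” and their sales) are made up for illustration, and pivot_longer() from tidyr is only one of several ways to get there; we assume tidyr is installed.

```r
library(tidyr)

# Made-up untidy data: the variable "year" is hidden in the column names,
# so each row holds two observations (this breaks rules 2 and 3)
untidy <- data.frame(
  country = c("A", "B"),
  `2019`  = c(10, 20),
  `2020`  = c(11, 22),
  check.names = FALSE
)

# Tidy version: one column per variable (country, year, sales)
# and one row per observation (one country in one year)
tidy <- pivot_longer(untidy, cols = c(`2019`, `2020`),
                     names_to = "year", values_to = "sales")
tidy
```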

      Tidy Conventions

      Recall from Chapter 6.2 “S3 Objects” on page 91 the convention that R uses to implement its S3 object-oriented programming framework. In that section we explained how R finds the right method (function) to use when printing an object via the generic dispatcher function print(). When an object of class “glm” is passed to print(), the function dispatches the handling to the function print.glm().

      The same is true for data frames: the handling is dispatched to print.data.frame(). This example illustrates the problem: from the name alone, it is unclear whether print.data.frame() is the special case of the print() function for a data.frame, or the special case of a function print.data() for an object of class “frame.” Therefore, the tidyverse recommends naming conventions that avoid the dot (.) and use snake_case or UpperCamelCase instead.
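The sketch below makes the dispatch mechanism and the naming problem tangible. The class my_class, the method print.my_class() and the function get_value() are hypothetical names invented for this illustration; utils::getS3method() is a base R function that looks up the method registered for a given generic and class.

```r
# A hypothetical S3 class, used only for illustration
obj <- structure(list(value = 42), class = "my_class")

# An S3 method is named <generic>.<class>: this is print() for "my_class"
print.my_class <- function(x, ...) {
  cat("my_class object with value ", x$value, "\n", sep = "")
  invisible(x)
}
print(obj)  # print() dispatches to print.my_class()

# For a data frame, print() dispatches to print.data.frame(); from the name
# alone, one could also read it as print.data() for a class "frame"
m <- utils::getS3method("print", "data.frame")

# A snake_case name contains no dot, so it cannot be mistaken for a method
get_value <- function(x) x$value
get_value(obj)
```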

      Further information – Tidyverse philosophy

      More about programming style in the tidyverse can be found in the online manifesto of the tidyverse website: https://tidyverse.tidyverse.org/articles/manifesto.html.

      # we assume that you installed the package before:
      # install.packages("tidyverse")
      # so load it:
      library(tidyverse)
      ## - Attaching packages ----------- tidyverse 1.3.0 -
      ## v ggplot2 3.2.1     v purrr   0.3.3
      ## v tibble  2.1.3     v dplyr   0.8.3
      ## v tidyr   1.0.0     v stringr 1.4.0
      ## v readr   1.3.1     v forcats 0.4.0
      ## - Conflicts ------------- tidyverse_conflicts() -
      ## x purrr::compose() masks pryr::compose()
      ## x dplyr::filter()  masks stats::filter()
      ## x dplyr::lag()     masks stats::lag()
      ## x purrr::partial() masks pryr::partial()

      So, loading the library tidyverse actually loads a series of other packages. This collection of packages is called the “core tidyverse.”

      Digression – Calling functions of packages that are not loaded

      When a package is installed but not loaded, it is still possible to call its exported functions. To call a function from a certain package, we can use the :: operator, as in package::function().

      In other words, when we use the :: operator, we specify in which package the function should be found. Therefore, it is possible to use a function from a package that is not loaded, or one that is masked by a function with the same name from a package that was loaded later.
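The conflict list printed when loading the tidyverse shows that dplyr::filter() masks stats::filter(). A minimal sketch of how :: lets us choose explicitly between the two; the input vector x is made up, and we assume dplyr is installed (it does not need to be attached with library()).

```r
# stats::filter() applies a linear filter to a series; with these weights it
# computes a centred moving average over three points (first and last are NA)
x  <- c(1, 2, 3, 4, 5)
ma <- stats::filter(x, rep(1 / 3, 3))

# dplyr::filter() keeps the rows of a data frame that meet a condition;
# :: loads dplyr's namespace without attaching the package
six_cyl <- dplyr::filter(mtcars, cyl == 6)
```

Both calls keep working regardless of which package was attached last, which makes code that uses :: robust against masking.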

      R allows you to stand on the shoulders of giants: when making your analysis, you can rely on existing packages. Whenever there is a choice, it is best to use packages that are part of the tidyverse. Doing so makes your code more consistent and readable, and makes working with R a more satisfying experience overall.

      7.2.1 The Core Tidyverse

      The core tidyverse includes some packages that are commonly used in data wrangling and modelling. Here is a short explanation of each; later we will explore some of these packages in more detail.

       tidyr provides a set of functions that help you tidy up data and make it easier to adhere to the rules of tidy data. The idea of tidy data is really simple: it is data where every variable has its own column, and every column is a variable. For more information, see Chapter 17.3 “Tidying Up Data with tidyr” on page 277.
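A first taste of tidyr, as a sketch: the example data set table3 ships with tidyr and stores two variables (cases and population) in one column. Using separate() here is our choice; newer tidyr versions also offer alternatives for splitting columns.

```r
library(tidyr)

# In table3, the column "rate" squeezes two variables into one string,
# for example "745/19987071" (cases/population)
table3

# separate() splits that column into one column per variable;
# convert = TRUE turns the resulting strings into numbers
tidy3 <- separate(table3, rate, into = c("cases", "population"),
                  sep = "/", convert = TRUE)
tidy3
```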

       dplyr provides a grammar of data manipulation, providing a consistent set of verbs that solve the most common data manipulation