The Big R-Book - Philippe J. S. De Brouwer

Data frames are the prototype of all two-dimensional data (also known as “rectangular data”). For statistical analysis this is obviously an important data type.

Data frames are very useful for statistical modelling; they are objects that contain data in tabular form. Unlike in a matrix, each column of a data frame can contain a different type of data. For example, the first column can be a factor, the second logical, and the third numeric. A data frame is a composite data type consisting of a list of vectors of equal length.
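The "list of vectors of equal length" description can be checked directly. The following is a minimal sketch with hypothetical values (the names df, grade, passed, points and m are not from the book); it also contrasts the data frame with a matrix, which coerces everything to one type.

```r
# A small data frame with hypothetical values: three columns of different types.
df <- data.frame(
  grade  = factor(c("A", "B", "A")),  # factor
  passed = c(TRUE, FALSE, TRUE),      # logical
  points = c(9.5, 4.0, 8.5)           # numeric
)

# Internally a data frame is a list with one vector per column.
is.list(df)         # TRUE
sapply(df, class)   # class of each column
lengths(df)         # every column has the same length

# A matrix, by contrast, holds a single type: here everything becomes character.
m <- cbind(c("A", "B", "A"), c(TRUE, FALSE, TRUE), c(9.5, 4.0, 8.5))
typeof(m)           # "character"
```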

Data frames are created using the data.frame() function.

# Create the data frame.
data_test <- data.frame(
  Name   = c("Piotr", "Pawel", "Paula", "Lisa", "Laura"),
  Gender = c("Male", "Male", "Female", "Female", "Female"),
  Score  = c(78, 88, 92, 89, 84),
  Age    = c(42, 38, 26, 30, 35)
  )
print(data_test)
##    Name Gender Score Age
## 1 Piotr   Male    78  42
## 2 Pawel   Male    88  38
## 3 Paula Female    92  26
## 4  Lisa Female    89  30
## 5 Laura Female    84  35

# The standard plot function on a data-frame (Figure 4.3)
# with the pairs() function:
plot(data_test)

Figure 4.3: The standard plot for a data frame in R shows each column plotted against every other column. This is useful for spotting correlations or for seeing how the data is structured in general.
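The correlations that the figure hints at can also be computed numerically. This is a small sketch, not part of the book's listing: it rebuilds only the two numeric columns of the example data under the assumed name scores and passes them to cor().

```r
# The numeric columns of the example data (the name `scores` is an assumption).
scores <- data.frame(
  Score = c(78, 88, 92, 89, 84),
  Age   = c(42, 38, 26, 30, 35)
)

# Correlation matrix of the numeric columns: 2 x 2, with ones on the diagonal.
cor_mat <- cor(scores)
print(round(cor_mat, 2))

# In this small sample Score and Age are negatively correlated.
cor_mat["Score", "Age"] < 0

# pairs() can be called directly for the same kind of figure as plot():
# pairs(scores)   # uncomment in an interactive session
```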

4.3.8.2 Accessing Information from a Data Frame

Most data is rectangular, and in almost any analysis we will encounter data that is structured in a data frame. The following functions can be helpful to extract information from the data frame, investigate its structure and study the content.

# Get the structure of the data frame:
str(data_test)
## 'data.frame':	5 obs. of  4 variables:
##  $ Name  : Factor w/ 5 levels "Laura","Lisa",..: 5 4 3 2 1
##  $ Gender: Factor w/ 2 levels "Female","Male": 2 2 1 1 1
##  $ Score : num  78 88 92 89 84
##  $ Age   : num  42 38 26 30 35
# Note that the names became factors (see warning below)

# Get the summary of the data frame:
summary(data_test)
##     Name      Gender      Score           Age
##  Laura:1   Female:3   Min.   :78.0   Min.   :26.0
##  Lisa :1   Male  :2   1st Qu.:84.0   1st Qu.:30.0
##  Paula:1              Median :88.0   Median :35.0
##  Pawel:1              Mean   :86.2   Mean   :34.2
##  Piotr:1              3rd Qu.:89.0   3rd Qu.:38.0
##                       Max.   :92.0   Max.   :42.0

# Get the first rows:
head(data_test)
##    Name Gender Score Age
## 1 Piotr   Male    78  42
## 2 Pawel   Male    88  38
## 3 Paula Female    92  26
## 4  Lisa Female    89  30
## 5 Laura Female    84  35

# Get the last rows:
tail(data_test)
##    Name Gender Score Age
## 1 Piotr   Male    78  42
## 2 Pawel   Male    88  38
## 3 Paula Female    92  26
## 4  Lisa Female    89  30
## 5 Laura Female    84  35

# Extract the column 2 and 4 and keep all rows
data_test.1 <- data_test[,c(2,4)]
print(data_test.1)
##   Gender Age
## 1   Male  42
## 2   Male  38
## 3 Female  26
## 4 Female  30
## 5 Female  35

# Extract columns 2 and 4 and keep only rows 2 to 4
data_test[c(2:4),c(2,4)]
##   Gender Age
## 2   Male  38
## 3 Female  26
## 4 Female  30
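The listing above selects rows and columns by position. As a complementary sketch (the variable names by_name and high are assumptions, not from the book), columns can also be selected by name and rows by a logical condition, and head() and tail() accept an n argument to limit the number of rows.

```r
data_test <- data.frame(
  Name   = c("Piotr", "Pawel", "Paula", "Lisa", "Laura"),
  Gender = c("Male", "Male", "Female", "Female", "Female"),
  Score  = c(78, 88, 92, 89, 84),
  Age    = c(42, 38, 26, 30, 35)
)

dim(data_test)           # number of rows and columns
names(data_test)         # column names
head(data_test, n = 2)   # first two rows only
tail(data_test, n = 2)   # last two rows only

# Select columns by name instead of by position.
by_name <- data_test[, c("Gender", "Age")]

# Keep only the rows that satisfy a condition.
high <- data_test[data_test$Score > 85, c("Name", "Score")]
print(high)
```

Selecting by name tends to be more robust than selecting by position, because it keeps working when columns are added or reordered.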

Warning – Avoiding conversion to factors

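Before version 4.0.0, the default behaviour of R was to convert strings to factors when a data frame is created, and the output above reflects that default. Decades ago this was useful for performance reasons; now it is usually unwanted. Since R 4.0.0 the default is stringsAsFactors = FALSE. To avoid the conversion regardless of the R version, put stringsAsFactors = FALSE in the data.frame() function.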

d <- data.frame(
  Name   = c("Piotr", "Pawel", "Paula", "Lisa", "Laura"),
  Gender = c("Male", "Male", "Female", "Female", "Female"),
  Score  = c(78, 88, 92, 89, 84),
  Age    = c(42, 38, 26, 30, 35),
  stringsAsFactors = FALSE
  )
d$Gender <- factor(d$Gender)  # manually factorize gender
str(d)
## 'data.frame':	5 obs. of  4 variables:
##  $ Name  : chr  "Piotr" "Pawel" "Paula" "Lisa" ...
##  $ Gender: Factor w/ 2 levels "Female","Male": 2 2 1 1 1
##  $ Score : num  78 88 92 89 84
##  $ Age   : num  42 38 26 30 35
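To see the effect of the argument in isolation, the following sketch builds the same column twice, once with each setting, and then converts a factor back to character. The names names_vec, as_chr and as_fct are hypothetical.

```r
names_vec <- c("Piotr", "Pawel", "Paula")

# Stating the argument explicitly gives the same result in every R version.
as_chr <- data.frame(Name = names_vec, stringsAsFactors = FALSE)
as_fct <- data.frame(Name = names_vec, stringsAsFactors = TRUE)

class(as_chr$Name)   # "character"
class(as_fct$Name)   # "factor"

# A factor column can be converted back to plain strings afterwards.
as_fct$Name <- as.character(as_fct$Name)
class(as_fct$Name)   # "character"
```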

4.3.8.3 Editing Data in a Data Frame

One usually reads in large amounts of data and uses an IDE such as RStudio, which facilitates the visualization and manual modification of data frames. Still, it is useful to know how this is done when no such interface is available: the following functions are part of R itself, so they are also at hand when working on a server.

de(x)                # fails if x is not defined
de(x <- c(NA))       # works
x <- de(x <- c(NA))  # will also save the changes
data.entry(x)        # de is short for data.entry
x <- edit(x)         # use the standard editor (vi in *nix)

Of course, there are also multiple ways to address data directly in R.

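# The following lines each address a single cell directly.
data_test$Score[1] <- 80   # row 1 of the column Score
data_test[3,1] <- 80       # row 3, column 1: the Name in row 3, not a Score
# The index form that does the same as the first line is data_test[1,3] <- 80.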
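Because the [row, column] order is easy to swap, a short sketch of the equivalent ways to address a cell may help. It uses a reduced, hypothetical copy of the data (d and f are assumed names) and also shows what happens when a value that is not an existing level is assigned to a factor.

```r
d <- data.frame(
  Name  = c("Piotr", "Pawel", "Paula"),
  Score = c(78, 88, 92),
  stringsAsFactors = FALSE
)

d$Score[1]    <- 80   # column by name, then element 1
d[2, "Score"] <- 90   # row 2, column "Score"
d[3, 2]       <- 95   # row 3, column 2 (Score again)

# Conditional update: cap every score above 90 at 90.
d$Score[d$Score > 90] <- 90
print(d)

# Pitfall: a value that is not an existing level of a factor becomes NA.
# R issues a warning here; it is suppressed only to keep the sketch quiet.
f <- factor(c("Paula", "Lisa"))
suppressWarnings(f[1] <- "80")
is.na(f[1])           # TRUE
```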

4.3.8.4 Modifying Data Frames

Add Columns to a Data-frame

Typically, the variables are in the columns, and adding a column corresponds to adding a new, observed variable. This is done by simply defining the additional column or via the function cbind().

# Expand the data frame, simply define the additional column:
data_test$End_date <- as.Date(c("2014-03-01", "2017-02-13",
                                "2014-10-10", "2015-05-10", "2010-08-25"))
print(data_test)
##    Name Gender Score Age   End_date
## 1 Piotr   Male    80  42 2014-03-01
## 2 Pawel   Male    88  38 2017-02-13
## 3  <NA> Female    92  26 2014-10-10
## 4  Lisa Female    89  30 2015-05-10
## 5 Laura Female    84  35 2010-08-25
# The <NA> in row 3 stems from the earlier assignment data_test[3,1] <- 80:
# 80 is not a valid level of the factor Name.

# Or use the function
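The excerpt breaks off before the cbind() variant is shown. The following is therefore only a sketch of how cbind() is commonly used to add columns, not the book's own listing; the data and the names d, passed, Passed, extra, d2 and d3 are hypothetical.

```r
d <- data.frame(
  Name  = c("Piotr", "Pawel", "Paula"),
  Score = c(78, 88, 92),
  stringsAsFactors = FALSE
)

# Hypothetical extra variable, one value per row.
passed <- d$Score >= 80

# cbind() returns a new data frame; the argument name becomes the column name.
d2 <- cbind(d, Passed = passed)
print(d2)

# Another data frame with the same number of rows can be bound in the same way.
extra <- data.frame(Age = c(42, 38, 26))
d3 <- cbind(d2, extra)
names(d3)
```

Note that cbind() does not modify d in place; the result has to be assigned, whereas the `$` form changes the existing data frame directly.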
