The Big R-Book. Philippe J. S. De Brouwer
Apart from performance considerations, it might also be necessary to convert parts of a list to a vector, because some functions will expect vectors and will not work on lists.
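As a minimal sketch of that conversion, the base-R function unlist() flattens a list into a vector. The list my_list below is a made-up example, not data from the book.

```r
# A hypothetical list holding three numbers
my_list <- list(a = 1, b = 2, c = 3)

# mean() expects a numeric vector, so we first flatten the list
v <- unlist(my_list)

is.numeric(v)   # the result is a (named) numeric vector
mean(v)
## [1] 2
```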
4.3.7 Factors
Factors are objects that hold a series of labels. They store the vector together with its distinct values, which serve as the labels (levels). Factors are in many ways similar to the enum data type in C, C++ or Java; here they are mainly used to store named constants. The labels are always of the character type, irrespective of the data type of the elements in the input vector.
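The following sketch illustrates this with a numeric input vector (the values are invented for illustration). The levels come out as character strings, while the factor itself stores integer codes that point to those levels.

```r
# A hypothetical numeric vector turned into a factor
x <- factor(c(10, 5, 10, 20))

levels(x)                 # the labels are character strings
## [1] "5"  "10" "20"

as.integer(x)             # the integer codes, not the original numbers
## [1] 2 1 2 3

# To recover the original numbers, go via the labels:
as.numeric(as.character(x))
## [1] 10  5 10 20
```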
4.3.7.1 Creating Factors
Factors are created using the function factor().
# Create a vector containing all your observations:
feedback <- c('Good', 'Good', 'Bad', 'Average', 'Bad', 'Good')
# Create a factor object:
factor_feedback <- factor(feedback)
# Print the factor object:
print(factor_feedback)
## [1] Good    Good    Bad     Average Bad     Good
## Levels: Average Bad Good
From this example it is clear that the factor-object “is aware” of the labels of all observations as well as of the different levels (the distinct labels) that exist. The next code fragment makes clear that some functions – such as plot() – will recognize the factor-object and produce results that make sense for this type of object. The following line of code is enough to produce the output that is shown in Figure 4.1.
# Plot the bar chart -- note that the default order is alphabetical
plot(factor_feedback)
Figure 4.1: The plot-function will result in a bar-chart for a factor-object.
There are a few specific functions for the factor-object. For example, the function nlevels() returns the number of levels in the factor object.
# The nlevels function returns the number of levels:
print(nlevels(factor_feedback))
## [1] 3
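Two other base-R functions that are often useful with factors are levels(), which returns the labels themselves, and table(), which counts the observations per level. The sketch below rebuilds the feedback factor so that it can be run on its own.

```r
# Recreate the factor from the example above
feedback <- c("Good", "Good", "Bad", "Average", "Bad", "Good")
factor_feedback <- factor(feedback)

# The labels, in their (alphabetical) default order:
levels(factor_feedback)
## [1] "Average" "Bad"     "Good"

# Number of observations per level (Average: 1, Bad: 2, Good: 3):
counts <- table(factor_feedback)
print(counts)
```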
Digression – The reduced importance of factors
When R was in its infancy, computing power and memory were far more limited than they are today, and in most cases it made sense to coerce strings to factors. For example, in versions of R before 4.0.0 the base-R functions that load data into a data frame (i.e. two-dimensional data) silently convert strings to factors. Today, that is most probably not what you need. Therefore, we recommend making it a habit to use the functions from the tidyverse (see Chapter 7 “Tidy R with the Tidyverse” on page 161).
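In base R this behaviour is governed by the argument stringsAsFactors of data.frame() (and of read.csv() and related functions). Its default was TRUE before R 4.0.0 and is FALSE since. A minimal sketch with invented data, setting the argument explicitly so that the result does not depend on the R version:

```r
# Hypothetical data; stringsAsFactors is set explicitly in both cases
df_chr <- data.frame(name  = c("a", "b"),
                     score = c(1, 2),
                     stringsAsFactors = FALSE)
df_fct <- data.frame(name  = c("a", "b"),
                     score = c(1, 2),
                     stringsAsFactors = TRUE)

class(df_chr$name)
## [1] "character"
class(df_fct$name)
## [1] "factor"
```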
4.3.7.2 Ordering Factors
In the example about creating a factor-object for feedback, one will have noticed that the plot function shows the labels in alphabetical order and not in an order that would be logical for us humans. It is possible to impose a certain order on the labels by providing the levels – in the correct order – while creating the factor-object.
feedback <- c('Good', 'Good', 'Bad', 'Average', 'Bad', 'Good')
factor_feedback <- factor(feedback, levels = c("Bad", "Average", "Good"))
plot(factor_feedback)
In Figure 4.2 on page 63 we notice that the order is now as desired (it is the order that we have provided via the argument levels in the function factor()).
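Providing the levels only fixes the order in which they are stored and displayed. If the levels should also be comparable (Bad < Average < Good), factor() accepts the additional argument ordered = TRUE. The following is a small sketch of that variant, using the same feedback data; the object name ordered_feedback is chosen here for illustration.

```r
feedback <- c("Good", "Good", "Bad", "Average", "Bad", "Good")

# Same levels as before, but now declared as ordered
ordered_feedback <- factor(feedback,
                           levels  = c("Bad", "Average", "Good"),
                           ordered = TRUE)

# Comparisons follow the order of the levels:
ordered_feedback < "Good"
## [1] FALSE FALSE  TRUE  TRUE  TRUE FALSE

# max() and min() are defined for ordered factors:
as.character(max(ordered_feedback))
## [1] "Good"
```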
Generate Factors with the Function gl()
Function use for gl()
gl(n, k, length = n*k, labels = seq_len(n), ordered = FALSE) with
n: The number of levels
k: The number of replications (for each level)
length (optional): An integer giving the length of the result
labels (optional): A vector with the labels
ordered: A boolean variable indicating whether the results should be ordered.
gl(3, 2, , c("bad", "average", "good"), TRUE)
## [1] bad     bad     average average good    good
## Levels: bad < average < good
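The optional argument length, left empty in the call above, makes gl() repeat the pattern until the requested length is reached. A short sketch with made-up labels:

```r
# Two levels, one replication each, recycled to a total length of 6
g <- gl(2, 1, length = 6, labels = c("control", "treatment"))

as.character(g)
## [1] "control"   "treatment" "control"   "treatment" "control"   "treatment"
nlevels(g)
## [1] 2
```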
Figure 4.2: The factor objects appear now in a logical order.
Use the dataset mtcars (from the package datasets, which is loaded by default) and explore the distribution of the number of gears. Then explore the correlation between gears and transmission.
Then focus on the transmission and create a factor-object with the words “automatic” and “manual” instead of the numbers 0 and 1.
Use ?mtcars to find out the exact definition of the data.
Use the dataset mtcars (from the package datasets) and explore the distribution of the horsepower (hp). How would you proceed to create a factor (e.g. Low, Medium, High) for this attribute? Hint: Use the function cut().
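As an illustration of how cut() works, without solving the exercise, the sketch below bins an invented numeric vector into three labelled intervals. The values and break points are assumptions chosen for the example only.

```r
# Hypothetical measurements (not taken from mtcars)
x <- c(3, 12, 25, 47, 58, 71)

# Intervals are (0,20], (20,50] and (50,100]; each gets one label
groups <- cut(x,
              breaks = c(0, 20, 50, 100),
              labels = c("Low", "Medium", "High"))

as.character(groups)
## [1] "Low"    "Low"    "Medium" "Medium" "High"   "High"
levels(groups)
## [1] "Low"    "Medium" "High"
```

By default the intervals are closed on the right, so a value exactly equal to a break point falls in the lower bin.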
4.3.8 Data Frames
4.3.8.1