Probability with R. Jane M. Horgan
demo()
which gives a list of all available demonstrations.
Demonstrations on specific topics can be obtained by inserting an argument. For example,
demo(plotmath)
gives some examples of the use of mathematical notation.
A more specific way of getting help, when working in the R environment, is to call help with the name of the function you require. For example,
help(read.table)
will provide details on the exact syntactic structure of the instruction read.table.
An alternative is
?read.table
To obtain all that is available on a particular topic, use apropos.
apropos ("boxplot")
returns
"boxplot", "boxplot.default", "boxplot.stats"
which are all of the objects that contain the word “boxplot.”
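As a small sketch of these help facilities, apropos also accepts a regular expression, so a search can be narrowed to names that begin with a given prefix. The pattern used here is an illustrative choice, and the exact list returned depends on the R version and on which packages are attached.

```r
# Names on the search path that start with "read." (illustrative pattern)
hits <- apropos("^read\\.")
print(hits)

# read.table lives in the utils package, which is attached by default,
# so it should appear among the matches
"read.table" %in% hits
```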
1.6 Data Entry
Before carrying out a statistical analysis, it is necessary to get the data into the computer. How you do this varies depending on the amount of data involved.
1.6.1 Reading and Displaying Data on Screen
A small data set, for example, a small set of repeated measurements on a single variable, may be entered directly from the screen. It is usually stored as a vector, which is essentially a list of numbers.
Example 1.1 Entering data from the screen to a vector
The total downtime occurring in the last month of 23 workstations in a computer laboratory was observed (in minutes) as follows:
0, 1, 2, 12, 12, 14, 18, 21, 21, 23, 24, 25, 28, 29, 30, 30, 30, 33, 36, 44, 45, 47, 51
To input these data from the screen environment of R, write
downtime <- c(0, 1, 2, 12, 12, 14, 18, 21, 21, 23, 24, 25, 28, 29, 30, 30, 30, 33, 36, 44, 45, 47, 51)
The construct c(...) combines the 23 values into a vector, which is then assigned to the name downtime.
To view the contents of the vector, type
downtime
which will display all the values in the vector.
R handles a vector as a single object. Calculations can be done with vectors like ordinary numbers provided they are the same length.
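A minimal sketch of such vector calculations, using the downtime data from Example 1.1. The maintenance vector is a hypothetical addition for illustration only and is not part of the data set.

```r
downtime <- c(0, 1, 2, 12, 12, 14, 18, 21, 21, 23, 24, 25,
              28, 29, 30, 30, 30, 33, 36, 44, 45, 47, 51)

length(downtime)                   # number of workstations: 23

# Arithmetic with a single number is applied to every element
downtime_hours <- downtime / 60    # downtime expressed in hours

# Hypothetical second vector of the same length: 5 minutes of
# scheduled maintenance per workstation (not in the original data)
maintenance <- rep(5, length(downtime))

# Two vectors of equal length are combined element by element
total <- downtime + maintenance
total[1:3]                         # 5 6 7
```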
1.6.2 Reading Data from a File to a Data Frame
When the data set is large, it is better to set up a text file to store the data than to enter them directly from the screen.
A large data set is usually stored as a matrix, which consists of columns and rows. The columns denote the variables, while the rows are the observations on the variables. In R, this type of data set is stored in what is referred to as a data frame.
Definition 1.1 Data frame
A data frame is an object with rows and columns or equivalently it is a list of vectors of the same length. Each vector consists of repeated observations of some variable. The variables may be numbers, strings or factors.
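To illustrate the definition, a data frame can be assembled directly from vectors of equal length. The sketch below uses three rows taken from the listing in Example 1.2 (rows 1, 2, and 8); the object name marks is chosen here for illustration.

```r
# Three vectors of the same length, one per variable
gender <- factor(c("m", "m", "f"))
arch1  <- c(99, NA, 86)    # NA marks a missing value
prog1  <- c(98, NA, 82)

# Combine them into a data frame: columns are variables, rows are students
marks <- data.frame(gender, arch1, prog1)

str(marks)        # structure: one factor and two numeric columns
is.list(marks)    # a data frame is also a list of its columns
nrow(marks)       # number of observations: 3
```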
Example 1.2 Reading data from a file into a data frame
The examination results for a class of 119 students pursuing a computing degree are given on our companion website (www.wiley.com/go/Horgan/probabilitywithr2e) as a text file called results.txt. The first few lines of the file are:
gender arch1 prog1 arch2 prog2
m         99    98    83    94
m         NA    NA    86    77
m         97    97    92    93
m         99    97    95    96
m         89    92    86    94
m         91    97    91    97
m        100    88    96    85
f         86    82    89    87
m         89    88    65    84
m         85    90    83    85
m         50    91    84    93
m         96    71    56    83
f         98    80    81    94
m         96    76    59    84
....
The first row of the file contains the headings: gender, followed by arch1, prog1, arch2, and prog2, which are abbreviations for Architecture and Programming in Semester 1 and Semester 2, respectively. The remaining rows are the marks (%) obtained by each student. NA denotes that the marks are not available in this particular case.
The construct for reading this type of data into a data frame is read.table.
results <- read.table ("F:/data/results.txt", header = T)
assuming that your data file results.txt is stored in the data folder on the F drive. The argument header = T, or equivalently header = TRUE, specifies that the first line is a header, in this case containing the names of the variables. Notice that the forward slash (/) is used in the file name, not the backslash (\) which would be expected in the Windows environment. The backslash has a meaning of its own within R, and cannot be used in this context: / or \\ are used instead. Thus, we could have written
results <- read.table ("F:\\data\\results.txt", header = TRUE)
with the same effect.
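The read.table call can be tried without the companion file. The sketch below writes the first four lines of the listing above to a temporary file, which stands in for results.txt, and reads it back; the temporary path is an assumption made so that the example runs on any machine.

```r
# Hypothetical stand-in for results.txt: header plus the first three rows
path <- tempfile(fileext = ".txt")
writeLines(c(
  "gender arch1 prog1 arch2 prog2",
  "m 99 98 83 94",
  "m NA NA 86 77",
  "m 97 97 92 93"
), path)

# header = TRUE takes the variable names from the first line
results <- read.table(path, header = TRUE)

dim(results)           # 3 rows, 5 columns
names(results)         # the five headings
sum(is.na(results))    # the two NA entries are read as missing values

unlink(path)           # remove the temporary file
```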
The contents of the data frame results may be listed on screen by typing
results
which gives
   gender arch1 prog1 arch2 prog2
1       m    99    98    83    94
2       m    NA    NA    86    77
3       m    97    97    92    93
4       m    99    97    95    96
5       m    89    92    86    94
6       m    91    97    91    97
7       m   100    88    96    85
8       f    86    82    89    87
9       m    89    88    65    84
10      m    85    90    83    85
11      m    50    91    84    93
12      m    96    71    56    83
13      f    98    80    81    94
14      m    96    76    59    84
....
Notice that the gender variable is a factor with two levels, “f” and “m,” while the remaining four variables are numeric. The figures in the first column on the left are the row numbers, and allow us to access individual elements in the data frame.
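The column types and the use of row numbers can be checked with a short sketch. It reads three rows of the listing (rows 1, 2, and 8) from a character string through the text argument of read.table, so no file is needed. Note that since R 4.0.0, read.table keeps character columns as strings by default; stringsAsFactors = TRUE is passed here so that gender is read as a factor, as described above.

```r
sample_lines <- "gender arch1 prog1 arch2 prog2
m 99 98 83 94
m NA NA 86 77
f 86 82 89 87"

# Read from the string instead of a file; convert strings to factors
results <- read.table(text = sample_lines, header = TRUE,
                      stringsAsFactors = TRUE)

str(results)              # one factor column and four numeric columns
levels(results$gender)    # "f" "m"

# Row number and column name together pick out a single element
results[2, "arch2"]       # 86

# A row number on its own selects a whole observation
results[3, ]
```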
While we could list the entire data frame on the screen, this is inconvenient