Probability with R. Jane M. Horgan
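to get Fig. 3.10. The argument ylim = c(0, 35) ensures that the y-axis runs from 0 to 35 in every histogram, so that all of them are drawn on the same vertical scale and can be compared directly.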
Figure 3.10 Histogram of Each Subject in Each Semester
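The common ylim is what makes the panels comparable. The sketch below is a hypothetical reconstruction, not the book's own code: it assumes four subjects stored in a data frame called marks, with invented column names and simulated marks, and uses par(mfrow = c(2, 2)) to lay the histograms out on a 2 x 2 grid.

```r
# Simulated marks for four hypothetical subjects (not the book's data)
set.seed(42)
marks <- data.frame(
  subject1 = sample(0:100, 100, replace = TRUE),
  subject2 = sample(0:100, 100, replace = TRUE),
  subject3 = sample(0:100, 100, replace = TRUE),
  subject4 = sample(0:100, 100, replace = TRUE)
)

old_par <- par(mfrow = c(2, 2))   # 2 x 2 grid of plotting panels
for (s in names(marks)) {
  # The same ylim in every panel puts all four y-axes on one scale
  hist(marks[[s]], main = s, xlab = "Marks (%)", ylim = c(0, 35))
}
par(old_par)                      # restore the previous layout
```

Any bar taller than the upper limit would be cut off, so the limit should be at least the largest bin count in any of the panels.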
Up until now, we have used the default parameters of the histogram: the bins are of equal width, and the frequency in each bin is plotted. These parameters may be changed as appropriate. For example, you may want to specify the bin break-points so that the bins represent the failures and the various classes of passes and honors.
bins <- c(0, 40, 60, 80, 100)
hist(prog1, xlab = "Marks (%)", main = "Programming Semester 1", breaks = bins)
yields Fig. 3.11.
Figure 3.11 A Histogram with Breaks of a Specified Width
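In Fig. 3.11, observe that the y-axis shows density rather than frequency. Because the bins are of unequal width (the first spans 40 marks, the others 20), R draws a density histogram by default, in which the area of each bar, rather than its height, is proportional to the number of marks in that bin.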
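With unequally spaced break-points, hist() reports densities: each bin's count divided by the total number of marks times the bin width. A minimal sketch on simulated marks (the vector x is hypothetical, standing in for prog1) confirms this relationship and checks that the bar areas sum to 1.

```r
set.seed(1)
x <- sample(0:100, 120, replace = TRUE)   # hypothetical marks

bins <- c(0, 40, 60, 80, 100)
h <- hist(x, breaks = bins, plot = FALSE) # compute the bins, do not draw

widths <- diff(h$breaks)                  # bin widths: 40, 20, 20, 20
# density = count / (number of marks * bin width)
manual_density <- h$counts / (sum(h$counts) * widths)

print(h$counts)
print(h$density)
print(h$equidist)                         # FALSE: the breaks are unequally spaced
```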
To get a histogram of percentages, write in R
h <- hist(prog1, plot = FALSE, breaks = 5)  # compute the histogram without drawing it
h$density <- h$counts/sum(h$counts)*100     # replace the densities with percentages
plot(h, xlab = "Marks (%)", freq = FALSE, ylab = "Percentage", main = "Programming Semester 1")
The output is given in Fig. 3.12. The # symbol starts a comment: anything written after # on the same line is ignored by R.
Figure 3.12 Histogram with Percentages
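The percentages plotted in Fig. 3.12 can also be inspected as numbers. A minimal sketch on simulated data (the vector x is hypothetical). Note that breaks = 5 is only a suggestion: R rounds to "pretty" break-points, so the number of bins may differ from 5.

```r
set.seed(2)
x <- sample(0:100, 80, replace = TRUE)   # hypothetical marks

h <- hist(x, breaks = 5, plot = FALSE)   # compute the bins, do not draw
pct <- h$counts / sum(h$counts) * 100    # percentage of marks in each bin

# Label each bin by its lower and upper break-point
names(pct) <- paste(head(h$breaks, -1), tail(h$breaks, -1), sep = "-")
print(round(pct, 1))
```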
3.3 STEM AND LEAF
The stem and leaf diagram is a more modern way of displaying data than the histogram. It is a depiction of the shape of the data using the actual numbers observed. Similar to the histogram, the stem and leaf gives the frequencies of categories of the variable, but it goes further than that and gives the actual values in each category.
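To see how such a diagram is built, here is a minimal sketch that splits each mark into a stem (the tens digit) and a leaf (the units digit) and gives each stem two rows, one for leaves 0 to 4 and one for leaves 5 to 9, as in Fig. 3.13. The marks are made up for illustration and are assumed to be whole numbers; R's own stem() chooses its layout automatically, and its scale argument changes how long the display is.

```r
# Hypothetical whole-number marks (not the book's prog1 data)
marks <- sort(c(12, 13, 14, 14, 15, 19, 36, 40, 40, 40, 46, 47, 97, 97, 98))

stems  <- marks %/% 10           # tens digit
leaves <- marks %% 10            # units digit

# Two rows per stem: leaves 0-4 go in the first row, 5-9 in the second
row_id <- stems * 2 + (leaves >= 5)
rows   <- split(leaves, row_id)  # numeric key, so rows come out in increasing order

display <- sprintf("%2d | %s",
                   as.integer(names(rows)) %/% 2L,
                   vapply(rows, paste, character(1), collapse = ""))
writeLines(display)
```

This prints the rows " 1 | 2344", " 1 | 59", " 3 | 6", " 4 | 000", " 4 | 67" and " 9 | 778". The sketch simply skips rows that have no leaves, so there is no row for stem 2.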
The marks obtained in Programming in Semester 1 are depicted as a stem and leaf diagram using
stem(prog1)
which yields Fig. 3.13.
The decimal point is 1 digit(s) to the right of the |

  1 | 2344
  1 | 59
  2 | 11
  2 | 5556777889999
  3 | 0113
  3 | 6
  4 | 00000000
  4 | 6779
  5 | 12223344
  5 | 56679
  6 | 0011123444
  6 | 566777888999
  7 | 0112344
  7 | 5666666899
  8 | 001112222334
  8 | 5678899
  9 | 0122
  9 | 7778

Figure 3.13 A Stem and Leaf Diagram
From Fig. 3.13, we are able to see the individual observations, as well as the shape of the data as a whole. Notice that there are many marks of exactly 40, whereas just one student obtains a mark between 35 and 40. One wonders if this has anything to do with the fact that 40 is a pass, and that the examiner has been generous to borderline students. This point would go unnoticed with a histogram.
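The observation about marks of exactly 40 can also be checked by counting directly. A minimal sketch, using a made-up vector x in place of prog1:

```r
# Hypothetical marks; in the text the vector would be prog1
x <- c(36, 40, 40, 40, 40, 40, 41, 43, 46, 47, 47, 52, 55, 58)

just_below <- sum(x >= 35 & x < 40)   # borderline fails
exactly_40 <- sum(x == 40)            # bare passes

print(c(just_below = just_below, exactly_40 = exactly_40))
print(table(x[x >= 35 & x < 45]))     # frequency of each mark from 35 to 44
```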
3.4 SCATTER PLOTS
Plots of data are useful to investigate relationships between variables. To examine, for example, the relationship between the performance of students in Programming in Semesters 1 and 2, we could write
plot(prog1, prog2, xlab = "Programming Semester 1", ylab = "Programming Semester 2")
to obtain Fig. 3.14.
Figure 3.14 A Scatter Plot
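Each point in a scatter plot is one student, so the two vectors must be paired: the i-th element of the first vector and the i-th element of the second belong to the same student. A minimal sketch with simulated marks for hypothetical students (sem1 and sem2 are invented names). The formula form plot(y ~ x) draws the same plot as plot(x, y).

```r
set.seed(3)
# Simulated paired marks for 50 hypothetical students, kept within 0-100
sem1 <- sample(20:100, 50, replace = TRUE)
sem2 <- pmin(100, pmax(0, sem1 + round(rnorm(50, sd = 10))))

stopifnot(length(sem1) == length(sem2))   # one pair per student

plot(sem1, sem2,
     xlab = "Programming Semester 1", ylab = "Programming Semester 2")
# Equivalent formula interface: y-axis variable on the left, x-axis on the right
plot(sem2 ~ sem1,
     xlab = "Programming Semester 1", ylab = "Programming Semester 2")
```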
When more than two variables are involved, R provides a facility for producing scatter plots of all possible pairs.
To do this, first create a data frame of all the variables that you want to compare.
courses <- results[2:5]
This creates a data frame, courses, consisting of columns 2 to 5 of results. Then
pairs(courses)
or equivalently
pairs(results[2:5])
will generate Fig. 3.15, which, as you can see, gives scatter plots for all possible pairs.
Figure 3.15 Use of the pairs Function
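The expression results[2:5] works because a data frame is a list of columns, so single-bracket indexing with a vector of column positions returns a smaller data frame. A minimal sketch with a hypothetical results data frame (its column names and values are invented for illustration):

```r
set.seed(4)
n <- 30
# Hypothetical data frame: an ID column followed by four subjects
results <- data.frame(
  id    = 1:n,
  subj1 = sample(0:100, n, replace = TRUE),
  subj2 = sample(0:100, n, replace = TRUE),
  subj3 = sample(0:100, n, replace = TRUE),
  subj4 = sample(0:100, n, replace = TRUE)
)

courses <- results[2:5]      # columns 2 to 5, still a data frame
print(class(courses))
print(names(courses))

pairs(courses)               # scatter plots of every pair of columns
```

With four variables, pairs() draws 12 scatter plots by default, because each pair appears twice, once above and once below the diagonal, with the axes swapped.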
3.5 THE LINE OF BEST FIT
Returning to Fig. 3.14, we can see that there is a roughly linear relationship between the marks in Programming in Semesters 1 and 2.
In the case of the Programming subjects, we have a set of points (