Probability with R. Jane M. Horgan
skew(downtime)
gives
[1] -0.04818095
which indicates that the
Looking again at the data given in Example 2.1, let us calculate the skewness coefficient:
skew(usage)
[1] 1.322147
which illustrates that the data is highly skewed. Recall that the first two values are outliers in the sense that they are very much larger than the other values in the data set. If we calculate the skewness with those values removed, we get
skew(usage[3:9])
[1] 0.4651059
which is very much smaller than that obtained with the full set.
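The skew function used above is not part of base R; it is the user-written program referred to in the text. The following is a minimal sketch, assuming the usual moment-based coefficient (the third central moment divided by the second central moment raised to the power 1.5); the exact form of Equation 2.1 may differ slightly. The vector x is made-up illustration data, not the usage data of Example 2.1.

```r
# Sketch of a skewness function (assumed form: m3 / m2^1.5,
# where m2 and m3 are the second and third central moments).
skew <- function(x) {
  x <- x[!is.na(x)]            # drop missing values
  xbar <- mean(x)
  m2 <- mean((x - xbar)^2)     # second central moment (divisor n)
  m3 <- mean((x - xbar)^3)     # third central moment (divisor n)
  m3 / m2^1.5
}

# Hypothetical data with two unusually large values at the start
x <- c(95, 80, 12, 15, 11, 14, 13, 16, 12)
skew(x)        # full data: strongly positive
skew(x[3:9])   # large values removed: much closer to zero
```

As in the example above, dropping the two large values reduces the coefficient considerably, because cubed deviations give extreme observations a large weight.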
2.4.2 Scripts
There are various ways of developing programs in R.
The most useful way of writing programs is by means of R's own built‐in editor called
When you want to execute a line or group of lines, highlight them and press Ctrl/R, that is, Ctrl and the letter R simultaneously. The commands are then transferred to the control window and executed.
Alternatively, if the program is short, it may be developed interactively while working at your computer.
Programs may also be developed in a text editor, like Notepad, saved with the .R extension and retrieved using the source statement.
source("C:\\test.R")
retrieves the program named test.R from the C directory. Another way of doing this, while working in R, is to click on
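As a self-contained illustration of the source statement, the sketch below writes a two-line script to a temporary file and then runs it. The variable name total and the script contents are made up for the example; tempfile and writeLines are used only so that the example does not depend on a particular directory.

```r
# Write a small script to a temporary file with the .R extension
script <- tempfile(fileext = ".R")
writeLines(c("total <- sum(1:10)",
             "print(total)"), script)

# source() reads the file and executes its commands in order;
# the full file name, including the extension, must be given
source(script)
```

After the call, any objects created by the script (here, total) are available in the workspace, just as if the commands had been typed at the prompt.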
Exercises 2.1
1 For the class of 50 students of computing detailed in Exercise 1.1, use R to:
(a) obtain the summary statistics for each gender, and for the entire class;
(b) calculate the deciles for each gender and for the entire class;
(c) obtain the skewness coefficient for the females and for the males.
2 It is required to estimate the number of message buffers in use in the main memory of the computer system at Power Products Ltd. To do this, 20 programs were run, and the number of message buffers in use were found to be
Calculate the average number of buffers used. What is the standard deviation? Would you say these data are skewed?
3 To get an idea of the runtime of a particular server, 20 jobs were processed and their execution times (in seconds) were observed as follows:
Examine these data and calculate appropriate measures of central tendency and dispersion.
4 Ten data sets were used to run a program and measure the execution time. The results (in milliseconds) were observed as follows:
Use appropriate measures of central tendency and dispersion to describe these data.
5 The following data give the amount of time (in minutes) in one day spent on Facebook by each of 15 students.
Obtain appropriate measures of central tendency and measures of dispersion for these data.
2.5 Project
Write the skewness program, and use it to calculate the skewness coefficient of the four examination subjects in results.txt. What can you say about these data?
Pearson has given an approximate formula for the skewness that is easier to calculate than the exact formula given in Equation 2.1.
Write a program to calculate this, and apply it to the data in results.txt. Is it a reasonable approximation?
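Pearson's approximation is commonly stated as three times the difference between the mean and the median, divided by the standard deviation. Assuming that is the formula intended here (the formula itself is not reproduced in this text), a starting point might look like the sketch below. The function name pearson_skew and the vector marks are hypothetical; the data in results.txt would take the place of marks.

```r
# Pearson's approximate skewness: 3 * (mean - median) / sd
# (assumed form; check it against the formula given in the book)
pearson_skew <- function(x) {
  x <- x[!is.na(x)]                      # drop missing values
  3 * (mean(x) - median(x)) / sd(x)
}

marks <- c(35, 42, 48, 51, 55, 58, 60, 64, 71, 90)  # made-up marks
pearson_skew(marks)
```

A positive value means the mean lies above the median, which is what a long right tail produces; comparing the result with the exact coefficient for each subject is the point of the project.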
3 Graphical Displays
In addition to numerical summaries of statistical data, there are various pictorial representations and graphical displays available in R that have a more dramatic impact and make for a better understanding of the data. The ease and speed with which graphical displays can be produced is one of the important features of R. By writing
demo(graphics)
you will see examples of the many graphical procedures of R, along with the code needed to implement them. In this chapter, we will examine some of the most common of these.
3.1 BOXPLOTS
A boxplot is a graphical summary based on the median, quartiles, and extreme values. To display the downtime data given in Example 1.1 using a boxplot, write
boxplot(downtime)
which gives Fig. 3.1. A boxplot is often called a Box and Whiskers Plot. The box represents the interquartile range, which contains the middle 50% of cases. The whiskers are the lines that extend from the box to the highest and lowest values that are not outliers. The line across the box indicates the median.
Figure 3.1 A Simple Boxplot
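The numbers behind a boxplot can be inspected without drawing it. In the sketch below, the vector x is made-up data standing in for downtime. With R's default settings, the whiskers extend to the most extreme points lying within 1.5 times the interquartile range of the box, and any values beyond that are drawn as separate points.

```r
x <- c(3, 7, 8, 10, 12, 13, 15, 18, 40)   # hypothetical values

# plot = FALSE returns the summary instead of drawing the graph
b <- boxplot(x, plot = FALSE)
b$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
b$out     # points beyond the whiskers, plotted individually

fivenum(x)  # minimum, lower hinge, median, upper hinge, maximum
```

Comparing b$stats with fivenum(x) shows where the two differ: the upper whisker stops at the largest value that is not flagged as an outlier, while fivenum reports the overall maximum.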
To improve the look of the graph, we could label the axes as follows:
boxplot(downtime, xlab = "Downtime", ylab = "Minutes")
which gives Fig. 3.2.
Figure 3.2 A Boxplot with Axis Labels
Multiple boxplots can be displayed on the same axes by adding extra arguments to the boxplot function. For example,
boxplot(results$arch1, results$arch2, xlab = "Architecture Semesters 1 and 2")
or simply
boxplot(arch1,