Read online - Probability with R. Jane M. Horgan. Mathematics. LiveLib


An entire data frame may be summarized by using the summary command. Let us do this in the data frame results. First, it is wise to make a declaration about the categorical variable gender.

results$gender <- factor(results$gender)

designates the variable gender in results as a factor, and ensures that it is treated as such in the summary function.

summary(results)
 gender      arch1            prog1           arch2           prog2
 f: 19   Min.   :  3.00   Min.   :12.00   Min.   : 6.00   Min.   : 5.00
 m:100   1st Qu.: 46.75   1st Qu.:40.00   1st Qu.:40.00   1st Qu.:30.00
         Median : 68.50   Median :64.00   Median :48.00   Median :57.00
         Mean   : 63.57   Mean   :59.02   Mean   :51.97   Mean   :53.78
         3rd Qu.: 83.25   3rd Qu.:78.00   3rd Qu.:61.00   3rd Qu.:76.50
         Max.   :100.00   Max.   :98.00   Max.   :98.00   Max.   :97.00
         NA's   :  3.00   NA's   : 2.00   NA's   : 4.00   NA's   : 8.00

Notice how the display for gender is different than that for the other variables; we are simply given the frequency for each gender.
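The difference in display can be seen on a much smaller example. The following is a minimal sketch on a made-up data frame; the name toy and its values are illustrative assumptions and are not the book's results data.

```r
# Hypothetical data frame, invented for illustration (not the book's data)
toy <- data.frame(
  gender = c("f", "m", "m", "f", "m"),
  arch1  = c(55, 72, NA, 90, 38),
  stringsAsFactors = FALSE
)

# As a character column, gender is summarized only by length, class and mode
summary(toy$gender)

# Declare the column a factor inside the data frame itself
toy$gender <- factor(toy$gender)

# summary() now reports a frequency per level for gender,
# and quartiles, mean and an NA count for the numeric column
summary(toy)
```

Assigning the factor back into the data frame, rather than to a free-standing variable, is what makes the change visible to summary when it is called on the whole data frame.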

2.4 Programming in R

One of the great benefits of R is that it is possible to write your own programs and use them as functions in your analysis. Programming is extremely simple in R because of the way it handles vectors and data frames. To illustrate, let us write a program to calculate the mean of downtime. The formula for the mean of a variable $x$ with $n$ values is given by

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

In standard programming languages, implementing this formula would necessitate initialization and loops, but with R, statistical calculations such as these are much easier to implement. For example,

sum(downtime)

gives

[1] 576

which is the sum of the elements in downtime, and

length(downtime)

gives

[1] 23

which is the number of elements in downtime.

To calculate the mean, write

meandown <- sum(downtime)/length(downtime)
meandown
[1] 25.04348
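To make the contrast with loop-based languages concrete, the sketch below computes the same mean twice: once with explicit initialization and a loop, and once with the vectorized expression. The downtime values are an assumption: they are reconstructed by adding the mean back to the deviations printed later in this section. The names total, n and loop_mean are illustrative.

```r
# downtime values reconstructed from the deviations printed in the text (assumption)
downtime <- c(0, 1, 2, 12, 12, 14, 18, 21, 21, 23, 24, 25,
              29, 28, 30, 30, 30, 33, 36, 44, 45, 47, 51)

# Loop version: initialize accumulators, then visit each element
total <- 0
n <- 0
for (value in downtime) {
  total <- total + value
  n <- n + 1
}
loop_mean <- total / n

# Vectorized version: a single expression, no loop or initialization
meandown <- sum(downtime) / length(downtime)

# Both versions agree with each other and with the built-in mean()
all.equal(loop_mean, meandown)
all.equal(meandown, mean(downtime))
```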

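Let us also look at how to calculate the standard deviation of the data in downtime. The formula for the standard deviation of $n$ data points stored in an $x$ vector is

$$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$$

where $\bar{x}$ is the mean of $x$. We illustrate step by step how this is calculated for downtime.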

First, subtract the mean from each data point.

downtime - meandown
 [1] -25.04347826 -24.04347826 -23.04347826 -13.04347826 -13.04347826
 [6] -11.04347826  -7.04347826  -4.04347826  -4.04347826  -2.04347826
[11]  -1.04347826  -0.04347826   3.95652174   2.95652174   4.95652174
[16]   4.95652174   4.95652174   7.95652174  10.95652174  18.95652174
[21]  19.95652174  21.95652174  25.95652174

Then, obtain the squares of these differences.

(downtime - meandown)^2
 [1] 6.271758e+02 5.780888e+02 5.310019e+02 1.701323e+02 1.701323e+02
 [6] 1.219584e+02 4.961059e+01 1.634972e+01 1.634972e+01 4.175803e+00
[11] 1.088847e+00 1.890359e-03 1.565406e+01 8.741021e+00 2.456711e+01
[16] 2.456711e+01 2.456711e+01 6.330624e+01 1.200454e+02 3.593497e+02
[21] 3.982628e+02 4.820888e+02 6.737410e+02

Sum the squared differences.

sum((downtime - meandown)^2)
[1] 4480.957

Finally, divide this sum by length(downtime)-1 and take the square root.

sqrt(sum((downtime - meandown)^2)/(length(downtime)-1))
[1] 14.27164

You will recall that R has built-in functions to calculate the most commonly used statistical measures. In particular, the mean and the standard deviation can be obtained directly with

mean(downtime)
[1] 25.04348
sd(downtime)
[1] 14.27164

We took you through the calculations to illustrate how easy it is to program in R.
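The four steps can also be collected into a single function. This is a sketch rather than anything from the book: the name sd_by_hand is hypothetical, and the downtime values are an assumption, reconstructed from the deviations printed in the step-by-step calculation.

```r
# downtime values reconstructed from the printed deviations (assumption)
downtime <- c(0, 1, 2, 12, 12, 14, 18, 21, 21, 23, 24, 25,
              29, 28, 30, 30, 30, 33, 36, 44, 45, 47, 51)

# Hypothetical helper that mirrors the four steps in the text
sd_by_hand <- function(x) {
  deviations <- x - mean(x)        # step 1: subtract the mean from each point
  squares <- deviations^2          # step 2: square the differences
  total <- sum(squares)            # step 3: sum the squared differences
  sqrt(total / (length(x) - 1))    # step 4: divide by n - 1, take the root
}

sd_by_hand(downtime)

# Agrees with the built-in sd()
all.equal(sd_by_hand(downtime), sd(downtime))
```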

2.4.1 Creating Functions

Occasionally, you might require some statistical functions that are not available in R. You will need to create your own function. Let us take, as an example, the skewness coefficient, which measures how much the data differ from symmetry.

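The skewness coefficient is defined as

$$\text{skew} = \frac{\sqrt{n}\,\sum_{i=1}^{n}(x_i - \bar{x})^3}{\left(\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^{3/2}} \tag{2.1}$$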

A perfectly symmetrical set of data will have a skewness of 0; when the skewness coefficient is substantially greater than 0, the data are positively asymmetric with a long tail to the right, and a negative skewness coefficient means that data are negatively asymmetric with a long tail to the left. As a rule of thumb, if the skewness is outside the interval $(-1, 1)$, the data are considered to be highly skewed. If it is between $-1$ and $-0.5$ or 0.5 and 1, the data are moderately skewed.
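The rule of thumb can be written as a small helper. This is only a sketch: the function name describe_skew is hypothetical, the label for values closer to 0 than 0.5 is an assumption (the text does not name that case), and the treatment of values exactly on a boundary is also an assumption.

```r
# Hypothetical helper that labels a single skewness value
describe_skew <- function(s) {
  if (abs(s) > 1) {
    "highly skewed"
  } else if (abs(s) >= 0.5) {
    "moderately skewed"
  } else {
    "approximately symmetric"  # label assumed; not given in the text
  }
}

describe_skew(1.4)   # beyond 1 in absolute value
describe_skew(-0.7)  # between -1 and -0.5
describe_skew(0.1)   # close to 0
```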

Example 2.2 A program to calculate skewness

The following syntax calculates the skewness coefficient of a set of data and assigns it to a function called skew that has one argument x.

skew <- function(x) {
  x <- x[!is.na(x)]            # drop missing values so that n matches the sums
  xbar <- mean(x)
  sum2 <- sum((x - xbar)^2)    # sum of squared deviations
  sum3 <- sum((x - xbar)^3)    # sum of cubed deviations
  (sqrt(length(x)) * sum3) / (sum2^1.5)
}

You will agree that the conventions of vector calculations make it very easy to calculate statistical functions.
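To see how such a function behaves, the sketch below applies it to three made-up vectors. The function is restated here in a form that drops missing values before any calculation, which is an assumption about the intended handling of missing data; the vectors symmetric, right_tail and left_tail are invented for illustration.

```r
# Skewness coefficient; missing values are removed first (assumed behaviour)
skew <- function(x) {
  x <- x[!is.na(x)]
  xbar <- mean(x)
  sum2 <- sum((x - xbar)^2)    # sum of squared deviations
  sum3 <- sum((x - xbar)^3)    # sum of cubed deviations
  (sqrt(length(x)) * sum3) / (sum2^1.5)
}

# Invented example vectors
symmetric  <- c(1, 2, 3, 4, 5)
right_tail <- c(1, 1, 2, 2, 3, 15)
left_tail  <- -right_tail

skew(symmetric)         # 0: the cubed deviations cancel exactly
skew(right_tail)        # positive: long tail to the right
skew(left_tail)         # negative: long tail to the left
skew(c(symmetric, NA))  # the missing value is ignored
```

Mirroring a vector around 0 flips the sign of the coefficient but leaves its size unchanged, which is a quick sanity check on any skewness function.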

When skew has been defined, you can calculate the skewness on any data set. For example,
