The Big R-Book. Philippe J. S. De Brouwer


44.192661 4700.86694

cor(d)
##            mpg         wt         hp
## mpg  1.0000000 -0.8676594 -0.7761684
## wt  -0.8676594  1.0000000  0.6587479
## hp  -0.7761684  0.6587479  1.0000000


cov2cor(cov(d))
##            mpg         wt         hp
## mpg  1.0000000 -0.8676594 -0.7761684
## wt  -0.8676594  1.0000000  0.6587479
## hp  -0.7761684  0.6587479  1.0000000
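cov2cor() rescales a covariance matrix by the standard deviations, because cor(X, Y) = cov(X, Y) / (sd(X) sd(Y)). The sketch below does that rescaling by hand. It assumes that d holds the columns mpg, wt and hp of mtcars, as the output above suggests; the names S, s and R are only illustrative.

```r
# assumption: d contains the mpg, wt and hp columns of mtcars
d <- mtcars[, c("mpg", "wt", "hp")]

S <- cov(d)            # covariance matrix
s <- sqrt(diag(S))     # standard deviations = square roots of the variances
R <- S / outer(s, s)   # divide each cov(i, j) by sd(i) * sd(j)

print(round(R, 7))
all.equal(R, cor(d))       # TRUE
all.equal(R, cov2cor(S))   # TRUE
```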


      8.3.2 The Spearman Correlation

x <- -10:10
df <- data.frame(x = x, x_sq = x^2, x_abs = abs(x), x_exp = exp(x))
cor(df)
##              x      x_sq     x_abs     x_exp
## x     1.000000 0.0000000 0.0000000 0.5271730
## x_sq  0.000000 1.0000000 0.9671773 0.5491490
## x_abs 0.000000 0.9671773 1.0000000 0.4663645
## x_exp 0.527173 0.5491490 0.4663645 1.0000000

      The correlation between x and x^2 is zero, even though x^2 is completely determined by x, and the correlation between x and exp(x) is a meagre 0.527173.
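Why is the Pearson correlation between x and x^2 exactly zero here? A short derivation, using only the fact that x = -10, ..., 10 is symmetric around zero, so that the mean of x is zero and every sum of odd powers of x vanishes:

```latex
\operatorname{cov}(x, x^2)
  = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})\left(x_i^2 - \overline{x^2}\right)
  \overset{\bar{x}=0}{=} \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^3 - \overline{x^2}\sum_{i=1}^{n} x_i\right)
  = 0
```

Because the correlation is this covariance divided by the product of the two standard deviations, it is zero as well: the Pearson correlation only measures the linear part of a relationship.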


      The Spearman correlation is the (Pearson) correlation applied to the ranks of the data. It equals one if an increase in the variable X is always accompanied by an increase in the variable Y.

cor(rank(df$x), rank(df$x_exp))
## [1] 1
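Computing ranks by hand is not necessary: cor() has a method argument that accepts "spearman". The following sketch checks that both routes give the same number for x and exp(x). The variable names rho_ranks and rho_spearman are only illustrative.

```r
x <- -10:10
y <- exp(x)

# Spearman correlation by hand: Pearson correlation of the ranks
rho_ranks <- cor(rank(x), rank(y))

# the same quantity via the method argument of cor()
rho_spearman <- cor(x, y, method = "spearman")

c(ranks = rho_ranks, spearman = rho_spearman)   # both equal 1
```

The same argument works on a whole data frame, for example cor(df, method = "spearman").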

      The Spearman correlation therefore detects relationships that can be more general than linear ones: it equals one whenever Y always increases when X increases, whatever the shape of that increase.
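A quick way to see this is to apply a few strictly monotone transformations to the same x. This is a minimal sketch. A strictly increasing transformation gives a Spearman correlation of 1, and a strictly decreasing one gives -1. The Pearson correlation, by contrast, drops below 1 as soon as the relation is not a straight line.

```r
x <- 1:20

# strictly increasing transformations: Spearman correlation is 1
cor(x, log(x), method = "spearman")
cor(x, x^3,    method = "spearman")

# a strictly decreasing transformation: Spearman correlation is -1
cor(x, -sqrt(x), method = "spearman")

# the Pearson correlation of the non-linear log(x) is below 1
cor(x, log(x))
```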

      Question #10

      Consider the vectors

      1. x = c(1, 2, 33, 44) and y = c(22, 23, 100, 200),

      2. x = c(1:10) and y = 2 * x,

      3. x = c(1:10) and y = exp(x).

      Plot y as a function of x. What is their Pearson correlation? What is their Spearman correlation? How do you interpret the results?

      Warning – Correlation is more specific than relation

      Not even the Spearman correlation will discover all types of dependency. Consider again the example with x^2, which first decreases and then increases as x grows.

x <- -10:10
cor(rank(x), rank(x^2))
## [1] 0
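The dependency is still there. It is only hidden because the relation is not monotone over the whole range. As an illustration only, the sketch below splits x at the turning point, which it assumes we already know to be 0. On each half, the Spearman correlation reveals a perfect monotone relation. This splitting is not a general-purpose test, because in practice the turning point is usually unknown.

```r
x <- -10:10

# over the whole range, the Spearman correlation with x^2 vanishes
cor(x, x^2, method = "spearman")

# on each side of the (assumed known) turning point the relation is monotone
pos <- x >= 0
neg <- x <= 0
cor(x[pos], x[pos]^2, method = "spearman")   # 1: increasing branch
cor(x[neg], x[neg]^2, method = "spearman")   # -1: decreasing branch
```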

      8.3.3 Chi-square Tests


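      The chi-square test of independence checks whether two categorical variables are related. It compares the observed counts for each combination of categories with the counts that would be expected if the variables were independent. For example, we can build a dataset with observations on people's ice-cream buying patterns and try to relate the gender of a person to the flavour of ice-cream they prefer. If a relationship is found, we can plan an appropriate stock of flavours from the gender mix of the people visiting.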

      Chi-Square test in R

      The function chisq.test() is used as follows:

      chisq.test(data)

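      where data is a contingency table (or matrix) containing the counts for each combination of the categories of the two variables. Alternatively, the two categorical variables can be passed as separate vectors, as in chisq.test(x, y).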
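The two input forms give the same test. Below is a sketch using the built-in mtcars data. A data frame is not cross-tabulated: chisq.test() converts it to a matrix and treats every cell as a count. So two raw columns must be passed either as a table or as two separate vectors. The warnings are suppressed here only because some expected counts are small. The names res_table and res_vectors are illustrative.

```r
# form 1: a contingency table of counts
tbl <- table(mtcars$cyl, mtcars$am)
print(tbl)
res_table <- suppressWarnings(chisq.test(tbl))

# form 2: the two categorical vectors; chisq.test() coerces them
# to factors and builds the contingency table itself
res_vectors <- suppressWarnings(chisq.test(mtcars$cyl, mtcars$am))

c(table = unname(res_table$statistic), vectors = unname(res_vectors$statistic))
```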

      For example, we can use the mtcars dataset, which is part of the datasets package that R attaches by default.

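# mtcars comes with R (package datasets)
# cross-tabulate the number of cylinders against the transmission type
tbl <- table(mtcars$cyl, mtcars$am)
chisq.test(tbl)
## Warning in chisq.test(tbl): Chi-squared approximation may be incorrect
##
##  Pearson's Chi-squared test
##
## data:  tbl
## X-squared = 8.7407, df = 2, p-value = 0.01265

      The chi-square test reports a p-value. This is the probability of obtaining a test statistic at least as large as the one observed if the two variables were in fact independent. In practice, a p-value below 0.05 is commonly taken as evidence of a significant relationship. In this example, the p-value is lower than 0.05, so the number of cylinders and the type of transmission appear to be related. The warning signals that some expected counts are small (below 5). The chi-square approximation, and hence the p-value, should therefore be read with some caution.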
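To see what chisq.test() computes, the sketch below rebuilds Pearson's statistic by hand for the cross-tabulation of mtcars$cyl against mtcars$am. The expected count of each cell is (row total × column total) / n. The statistic is the sum of (O − E)² / E. The p-value is the upper tail of the chi-square distribution with (rows − 1)(columns − 1) degrees of freedom, obtained with pchisq(). The names E, x2 and dof are only illustrative.

```r
tbl <- table(mtcars$cyl, mtcars$am)   # observed counts O
n   <- sum(tbl)

# expected counts under independence: row total * column total / n
E <- outer(rowSums(tbl), colSums(tbl)) / n

# Pearson's statistic and its degrees of freedom
x2  <- sum((tbl - E)^2 / E)
dof <- (nrow(tbl) - 1) * (ncol(tbl) - 1)

# p-value: upper-tail probability of the chi-square distribution
p <- pchisq(x2, df = dof, lower.tail = FALSE)

# compare with the built-in test (warning about small counts suppressed)
ref <- suppressWarnings(chisq.test(tbl))
c(statistic = x2, df = dof, p.value = p)
c(ref$statistic, ref$parameter, p.value = ref$p.value)
```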

      The names of the functions related to statistical distributions in R consist of two parts: the first letter indicates the type of function (see the list below) and the remainder is the name of the distribution. For example, dnorm() is the density of the normal distribution.

       d: the pdf (probability density function); for discrete distributions, the probability mass function

       p: the cdf (cumulative distribution function)

       q: the quantile function (the inverse of the cdf)

       r: the random number generator
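The prefixes combine with a distribution name in the same way for every distribution. Here is a minimal sketch with the normal distribution (suffix norm) and the exponential distribution (suffix exp). The seed value is arbitrary and only makes the random draws reproducible.

```r
# normal distribution: suffix "norm" with each prefix
dnorm(0)            # density at 0, equals 1 / sqrt(2 * pi)
pnorm(1.96)         # P(X <= 1.96) for a standard normal
qnorm(0.975)        # quantile function, approximately 1.959964
qnorm(pnorm(1.3))   # the quantile function inverts the cdf: gives back 1.3

set.seed(42)        # arbitrary seed, for reproducible draws
rnorm(5, mean = 10, sd = 2)

# the same pattern for the exponential distribution (suffix "exp")
pexp(1, rate = 1)   # equals 1 - exp(-1)
```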

