Read the book online - The Big R-Book. Philippe J. S. De Brouwer. Mathematics. LiveLib

The Big R-Book - Philippe J. S. De Brouwer


44.192661 4700.86694
cor(d)
##            mpg         wt         hp
## mpg  1.0000000 -0.8676594 -0.7761684
## wt  -0.8676594  1.0000000  0.6587479
## hp  -0.7761684  0.6587479  1.0000000


cov2cor(cov(d))
##            mpg         wt         hp
## mpg  1.0000000 -0.8676594 -0.7761684
## wt  -0.8676594  1.0000000  0.6587479
## hp  -0.7761684  0.6587479  1.0000000
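The relation between the covariance and correlation matrices can be checked by hand. The sketch below assumes that d holds the mpg, wt and hp columns of mtcars (the definition of d is not shown in this excerpt, although the values printed above are consistent with that choice); the names cov_d, sds and cor_manual are illustrative only.

```r
# Assumption: d is the mpg, wt and hp columns of mtcars
d <- mtcars[, c("mpg", "wt", "hp")]

cov_d      <- cov(d)          # variance-covariance matrix
cor_direct <- cor(d)          # Pearson correlation matrix
cor_scaled <- cov2cor(cov_d)  # correlation matrix derived from the covariances

# By hand: divide each covariance by the product of the two standard deviations
sds        <- sqrt(diag(cov_d))
cor_manual <- cov_d / outer(sds, sds)

all.equal(cor_direct, cor_scaled)
all.equal(cor_direct, cor_manual)
```

Both comparisons should report that the matrices agree up to numerical tolerance.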


8.3.2 The Spearman Correlation

The measure for correlation defined in the previous section tests only for a linear relation. This means that even a strong non-linear relationship can go undetected.

x <- c(-10:10)
df <- data.frame(x = x, x_sq = x^2, x_abs = abs(x), x_exp = exp(x))
cor(df)
##              x      x_sq     x_abs     x_exp
## x     1.000000 0.0000000 0.0000000 0.5271730
## x_sq  0.000000 1.0000000 0.9671773 0.5491490
## x_abs 0.000000 0.9671773 1.0000000 0.4663645
## x_exp 0.527173 0.5491490 0.4663645 1.0000000

The correlation between x and x² is zero, and the correlation between x and exp(x) is a meagre 0.527173.


The Spearman correlation is the correlation applied to the ranks of the data. It is one if an increase in the variable X is always accompanied by an increase in the variable Y.

cor(rank(df$x), rank(df$x_exp))
## [1] 1

The Spearman correlation checks for a relationship that can be more general than only linear. It will be one whenever Y increases as X increases, that is, for any strictly increasing relationship.
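The rank-based calculation does not have to be written out: base R's cor() accepts a method argument. The following is a minimal sketch comparing the two routes on the exponential example; the variable names are illustrative.

```r
x <- -10:10
y <- exp(x)

pearson  <- cor(x, y)                       # measures linear association only
spearman <- cor(x, y, method = "spearman")  # Pearson correlation of the ranks
by_ranks <- cor(rank(x), rank(y))           # the same value, computed by hand

c(pearson = pearson, spearman = spearman, by_ranks = by_ranks)
```

Because exp() is strictly increasing, the ranks of x and y coincide, so both rank-based values equal one while the Pearson value stays well below it.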

Question #10

Consider the vectors

1 x = c(1, 2, 33, 44) and y = c(22, 23, 100, 200),

2 x = c(1 : 10) and y = 2 * x,

3 x = c(1 : 10) and y = exp(x),

Plot y as a function of x. What is their Pearson correlation? What is their Spearman correlation? How do you interpret the difference?

Warning – Correlation is more specific than relation

Not even the Spearman correlation will discover all types of dependencies. Consider the example above with x².

x <- c(-10:10)
cor(rank(x), rank(x^2))
## [1] 0

8.3.3 Chi-square Tests

The chi-square test is a statistical method to determine whether two categorical variables have a significant correlation between them. Both variables should come from the same population, and they should be categorical, like “Yes/No,” “Male/Female,” “Red/Amber/Green,” etc.


For example, we can build a dataset with observations on people’s ice-cream buying patterns and try to correlate the gender of a person with the flavour of ice-cream they prefer. If a correlation is found, we can plan an appropriate stock of flavours by knowing the number of visitors of each gender.

Chi-Square test in R

Function use for chisq.test()

chisq.test(data)

where data is a table containing the counts for each combination of the values of the variables.

For example, we can use the mtcars dataset, which is normally available as soon as R has started.

# we use the dataset mtcars, which ships with R in the datasets package
df <- data.frame(mtcars$cyl, mtcars$am)
chisq.test(df)
## Warning in chisq.test(df): Chi-squared approximation may be incorrect
##
##  Pearson’s Chi-squared test
##
## data:  df
## X-squared = 25.077, df = 31, p-value = 0.7643


The chi-square test reports a p-value. This p-value is the probability of observing a test statistic at least as extreme as the one obtained if the two variables were in fact independent. In practice, a p-value below 5% is usually taken as evidence of a significant correlation. In this example, the p-value is higher than 0.05, so the test finds no significant correlation.
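Note that chisq.test() reads a matrix or data frame as a table of counts, so a data frame of 32 raw observations in two columns is treated as a 32 x 2 table, which explains the 31 degrees of freedom reported above. A common alternative, sketched here as an assumption about the intended analysis rather than as the author's method, is to cross-tabulate the two variables first with table(); the names counts and res are illustrative.

```r
# Cross-tabulate the two categorical variables: 3 cylinder classes x 2 transmissions
counts <- table(cyl = mtcars$cyl, am = mtcars$am)
counts

# Some expected counts are small, so R warns that the approximation may be
# inaccurate; the warning is silenced here only to keep the output short
res <- suppressWarnings(chisq.test(counts))

res$statistic  # chi-squared test statistic
res$parameter  # degrees of freedom: (3 - 1) * (2 - 1) = 2
res$p.value    # p-value of the test of independence
```

With the contingency table as input, the test has two degrees of freedom, and its p-value need not agree with the one obtained from the raw data frame.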

8.4 Distributions

R is a statistical language and most of the work in R will include statistics. Therefore we introduce the reader to how statistical distributions are implemented in R and how they can be used.

The names of the functions related to statistical distributions in R are composed of two parts: the first letter identifies the type of function (listed below) and the remainder is the name of the distribution.

d: The pdf (probability density function)

p: The cdf (cumulative distribution function)

q: The quantile function

r: The random number generator.
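The naming scheme above can be illustrated with the normal distribution (suffix norm) and the exponential distribution (suffix exp). This is a minimal sketch; the seed and the parameter values are arbitrary choices for illustration.

```r
# Normal distribution: prefix + "norm"
dnorm(0)       # density at 0, equal to 1 / sqrt(2 * pi)
pnorm(1.96)    # P(X <= 1.96) for a standard normal, roughly 0.975
qnorm(0.975)   # quantile function, the inverse of pnorm: roughly 1.96

set.seed(1)    # arbitrary seed, used only for reproducibility
draws <- rnorm(1000, mean = 10, sd = 2)  # 1000 random draws
c(mean = mean(draws), sd = sd(draws))    # should be close to 10 and 2

# The same prefixes apply to other distributions, e.g. the exponential
pexp(1, rate = 2)                  # P(X <= 1) = 1 - exp(-2)
qexp(pexp(1, rate = 2), rate = 2)  # the quantile function recovers 1
```

The q and p functions are inverses of each other, which is why the last line returns the value that was passed to pexp().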


distribution – normal

distribution – exponential

distribution – log-normal

distribution – logistic

distribution – geometric

distribution – Poisson

distribution – t

distribution –
