The Big R-Book. Philippe J. S. De Brouwer

      "N/A", "undesired", "great", "great")
      my_mode(x1)
      ## [1] "great"

      # text from https://www.r-project.org/about.html
      t <- "R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS."
      v <- unlist(strsplit(t, split = " "))
      my_mode(v)
      ## [1] "and"

      While this function works fine on the examples provided, it only returns the first mode encountered. In general, however, the mode is not necessarily unique and it might make sense to return them all. This can be done by modifying the code as follows:

      # my_mode
      # Finds the mode(s) of a vector v
      # Arguments:
      #   v          -- numeric vector or factor
      #   return.all -- boolean -- set to true to return all modes
      # Returns:
      #   the modal elements
      my_mode <- function(v, return.all = FALSE) {
        uniqv <- unique(v)
        tabv  <- tabulate(match(v, uniqv))
        if (return.all) {
          uniqv[tabv == max(tabv)]
        } else {
          uniqv[which.max(tabv)]
        }
      }

      # example:
      x <- c(1, 2, 2, 3, 3, 4, 5)
      my_mode(x)
      ## [1] 2
      my_mode(x, return.all = TRUE)
      ## [1] 2 3

      Hint – Use default values to keep code backwards compatible
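The hint can be illustrated with a minimal sketch. Here `my_mode_old` is a hypothetical name for a single-argument version of the function, assumed to return only the first mode. Because the new argument `return.all` has a default value, calls written for the older signature give the same result with the extended function.

```r
# Hypothetical earlier version: one argument, returns the first mode only.
my_mode_old <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Extended version: the new argument has a default, so old calls still work.
my_mode <- function(v, return.all = FALSE) {
  uniqv <- unique(v)
  tabv  <- tabulate(match(v, uniqv))
  if (return.all) uniqv[tabv == max(tabv)] else uniqv[which.max(tabv)]
}

x <- c(1, 2, 2, 3, 3, 4, 5)
same_result <- identical(my_mode_old(x), my_mode(x))  # old-style call
all_modes   <- my_mode(x, return.all = TRUE)          # new behaviour is opt-in
```

Code that never mentions `return.all` does not need to change, which is the point of giving the new argument a default.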

       measures of spread

      Definition: Variance

$$ \operatorname{VAR}(X) = E\left[(X - E[X])^2\right] $$

      8.2.1 Standard Deviation

      Definition: Standard deviation

$$ \operatorname{SD}(X) = \sqrt{\operatorname{VAR}(X)} $$

      The estimator for standard deviation is:

$$ s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2} $$

      t <- rnorm(100, mean=, sd=20)
      var(t)
      ## [1] 248.2647
      sd(t)
      ## [1] 15.75642
      sqrt(var(t))
      ## [1] 15.75642
      sqrt(sum((t - mean(t))^2)/(length(t) - 1))
      ## [1] 15.75642

      8.2.2 Median absolute deviation

      Definition: mad

$$ \operatorname{mad}(X) = \operatorname{median}\left(\left|X - \operatorname{median}(X)\right|\right) $$

      mad(t)
      ## [1] 14.54922
      mad(t, constant = 1)
      ## [1] 9.813314

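      By default, mad() multiplies the median absolute deviation by constant = 1.4826, so that

$$ E\left[\operatorname{mad}(X_1, \ldots, X_n)\right] = \sigma $$

      for $X_i$ distributed as $N(\mu, \sigma^2)$ and large $n$.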
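As a minimal sketch of what mad() computes, the median absolute deviation can be reproduced by hand. The data vector and the names `mad_raw` and `mad_scaled` are illustrative assumptions. A fixed vector is used instead of random numbers so that the result is reproducible.

```r
# Fixed sample (illustrative), including one outlier.
x <- c(2, 4, 4, 5, 7, 9, 30)

# Raw MAD: median of the absolute deviations from the median.
mad_raw <- median(abs(x - median(x)))

# Scaled MAD: multiply by the constant; 1.4826 is the default used by mad().
mad_scaled <- 1.4826 * mad_raw

# Compare with the built-in function.
raw_matches    <- isTRUE(all.equal(mad_raw, mad(x, constant = 1)))
scaled_matches <- isTRUE(all.equal(mad_scaled, mad(x)))
```

Because it is built from medians, the MAD is barely affected by the single outlier in `x`, whereas sd(x) is pulled up considerably by it.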

       covariation

      The basic measure for linear interdependence is covariance, defined as

$$ \operatorname{COV}(X, Y) = E\left[(X - E[X])(Y - E[Y])\right] $$
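The sample counterpart of the covariance can be sketched by hand and compared with cov(). The variable names below are illustrative, and the example assumes the built-in `mtcars` data set and the usual sample estimator with n - 1 in the denominator.

```r
x <- mtcars$hp
y <- mtcars$wt
n <- length(x)

# Sample covariance: sum of the products of deviations, divided by n - 1.
cov_manual <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# Compare with the built-in function.
cov_matches <- isTRUE(all.equal(cov_manual, cov(x, y)))
```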

      8.3.1 The Pearson Correlation

      An important metric for linear relationship is the Pearson correlation coefficient ρ.

       Definition: Pearson Correlation Coefficient

$$ \rho_{X,Y} = \frac{\operatorname{COV}(X, Y)}{\sigma_X \, \sigma_Y} $$

      cor(mtcars$hp, mtcars$wt)
      ## [1] 0.6587479
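As a quick check of the definition, the same coefficient can be obtained by dividing the covariance by the product of the standard deviations. This is a sketch with illustrative variable names, not a replacement for cor().

```r
x <- mtcars$hp
y <- mtcars$wt

# Pearson correlation as covariance scaled by both standard deviations.
rho_manual <- cov(x, y) / (sd(x) * sd(y))

# Compare with the built-in function.
rho_matches <- isTRUE(all.equal(rho_manual, cor(x, y)))
```

The scaling makes the coefficient unit-free and bounded between -1 and 1, unlike the covariance, whose magnitude depends on the units of both variables.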

      Of course, we also have functions that provide the covariance matrix and functions that convert one into the other.
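One way to do the conversion, sketched here under the assumption that base R's cov2cor() is the intended tool, is shown below. The names `dd`, `S`, `R`, `s` and `S_back` are illustrative. Going back from a correlation matrix to a covariance matrix additionally requires the standard deviations.

```r
dd <- mtcars[, c("mpg", "wt", "hp")]

S <- cov(dd)       # covariance matrix
R <- cov2cor(S)    # covariance matrix -> correlation matrix

# The result agrees with computing the correlations directly.
cor_matches <- isTRUE(all.equal(R, cor(dd)))

# Correlation -> covariance: rescale by the standard deviations.
s      <- sapply(dd, sd)
S_back <- R * outer(s, s)
cov_recovered <- isTRUE(all.equal(S_back, S, check.attributes = FALSE))
```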

      d <- data.frame(mpg = mtcars$mpg, wt = mtcars$wt, hp = mtcars$hp)

      # Note that we can feed a whole data-frame in the functions.
      var(d)
      ##             mpg        wt         hp
      ## mpg   36.324103 -5.116685 -320.73206
      ## wt    -5.116685  0.957379   44.19266
      ## hp  -320.732056 44.192661 4700.86694

      cov(d)
      ##             mpg        wt         hp
      ## mpg   36.324103 -5.116685 -320.73206
      ##