The Big R-Book. Philippe J. S. De Brouwer


      Hint – Outliers

      The mean is highly sensitive to outliers. To mitigate this to some extent, the parameter trim removes the tails: the values are sorted and a fraction trim (between 0 and 0.5) of the observations is dropped from each end before the mean is computed.

      v <- c(1, 2, 3, 4, 5, 6000)
      mean(v)
      ## [1] 1002.5
      mean(v, trim = 0.2)
      ## [1] 3.5
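To see what trim does, the trimmed mean can be reproduced by hand. The sketch below assumes the behaviour documented for mean(): the number of observations dropped at each end is floor(n * trim). The helper name manual_trimmed_mean is hypothetical and not part of R.

```r
# Hypothetical helper: recompute a trimmed mean step by step.
# Assumptions: 0 <= trim < 0.5 and v contains no NA.
manual_trimmed_mean <- function(v, trim) {
  n      <- length(v)
  n_drop <- floor(n * trim)            # observations dropped at each end
  sorted <- sort(v)
  kept   <- sorted[(n_drop + 1):(n - n_drop)]
  mean(kept)
}

v <- c(1, 2, 3, 4, 5, 6000)
manual_trimmed_mean(v, trim = 0.2)     # keeps 2, 3, 4, 5
## [1] 3.5
mean(v, trim = 0.2)
## [1] 3.5
```

With six observations and trim = 0.2, floor(6 * 0.2) = 1 value is removed from each tail, which is enough to discard the outlier 6000.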

      8.1.1.2 Generalised Means

      More generally, a mean can be defined as follows:

      Definition: f-mean

      $$\bar{x} = f^{-1}\left(\frac{1}{K}\sum_{k=1}^{K} f(x_k)\right)$$

      f(x) = x: arithmetic mean,

      f(x) = 1/x: harmonic mean,

      f(x) = x^m: power mean,

      f(x) = ln x: geometric mean, $\left(\prod_{k=1}^{K} x_k\right)^{1/K}$
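The f-mean translates almost literally into R: apply f, take the arithmetic mean, and map back with the inverse of f. The following is a minimal sketch; the names f_mean and f_inv are hypothetical, and the caller is assumed to supply a correct inverse.

```r
# Hypothetical helper: f-mean of x for a function f with inverse f_inv.
f_mean <- function(x, f, f_inv) {
  f_inv(mean(f(x)))
}

x <- c(1, 2, 4)
f_mean(x, identity, identity)                     # arithmetic mean: 7/3
f_mean(x, function(t) 1 / t, function(t) 1 / t)   # harmonic mean: 12/7
f_mean(x, log, exp)                               # geometric mean: (1 * 2 * 4)^(1/3) = 2
```

Note that 1/x is its own inverse, which is why the same function is passed twice for the harmonic mean.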

      The Power Mean

      One particular generalized mean is the power mean or Hölder mean. It is defined for a set of K positive numbers x_k by

      $$M_m(x_1, \ldots, x_K) = \left(\frac{1}{K}\sum_{k=1}^{K} x_k^m\right)^{1/m}$$

      By choosing particular values for m, one obtains the quadratic, arithmetic, geometric and harmonic means, with the maximum and minimum as limiting cases:

      m → ∞: maximum of x_k

      m = 2: quadratic mean

      m = 1: arithmetic mean

      m → 0: geometric mean

      m = −1: harmonic mean

      m → −∞: minimum of x_k
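These special cases can be checked numerically. The sketch below is one possible implementation, not a function from base R: the name power_mean is hypothetical, and the limits m → 0 and m → ±∞ are handled explicitly by returning the geometric mean, the maximum and the minimum.

```r
# Hypothetical helper: power mean of positive numbers x with exponent m.
power_mean <- function(x, m) {
  stopifnot(all(x > 0))
  if (m == 0) return(exp(mean(log(x))))                       # limit m -> 0
  if (is.infinite(m)) return(if (m > 0) max(x) else min(x))   # limits m -> +/- Inf
  mean(x^m)^(1 / m)
}

x <- c(1, 2, 4)
power_mean(x,  Inf)   # maximum: 4
power_mean(x,  2)     # quadratic mean: sqrt(7)
power_mean(x,  1)     # arithmetic mean: 7/3
power_mean(x,  0)     # geometric mean: 2
power_mean(x, -1)     # harmonic mean: 12/7
power_mean(x, -Inf)   # minimum: 1
```

For this x the results decrease from 4 to 1 as m decreases, which illustrates that the power mean is non-decreasing in m.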

      Example: Which mean makes most sense?

      returns <- c(0.5, -0.5, 0.5, -0.5)

      # Arithmetic mean:
      aritmean <- mean(returns)

      # The ln-mean:
      log_returns <- returns
      for(k in 1:length(returns)) {
        log_returns[k] <- log(returns[k] + 1)
      }
      logmean <- mean(log_returns)
      exp(logmean) - 1
      ## [1] -0.1339746

      # What is the value of the investment after these returns:
      V_0 <- 1
      V_T <- V_0
      for(k in 1:length(returns)) {
        V_T <- V_T * (returns[k] + 1)
      }
      V_T
      ## [1] 0.5625

      # Compare this to our predictions:
      ## mean of log-returns
      V_0 * (exp(logmean) - 1)
      ## [1] -0.1339746
      ## mean of returns
      V_0 * (aritmean + 1)
      ## [1] 1
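The two predictions become directly comparable once each per-period return is compounded over all four periods. A minimal sketch, reusing the returns vector from the example; the names r_arith, r_geom and n_periods are introduced here for illustration only:

```r
returns   <- c(0.5, -0.5, 0.5, -0.5)
n_periods <- length(returns)

# Per-period return implied by each mean
r_arith <- mean(returns)                      # 0
r_geom  <- exp(mean(log(1 + returns))) - 1    # about -0.134

# Compound each per-period return over all periods, starting from V_0 = 1
(1 + r_arith)^n_periods    # predicts 1: overstates the final value
(1 + r_geom)^n_periods     # predicts 0.5625, up to rounding
prod(1 + returns)          # actual final value
## [1] 0.5625
```

Only the mean of the log-returns reproduces the actual final value of the investment, so for returns that compound it is the more sensible average.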

      8.1.2 The Median

      While the mean (and the average in particular) is widely used, it is quite vulnerable to outliers. It therefore makes sense to have a measure that is less influenced by outliers and instead answers the question: what does a typical observation look like? The median is such a measure.

      The median is the middle value: 50% of the observations are lower and 50% are higher.

      x <- c(1:5, 5e10, NA)
      x
      ## [1] 1e+00 2e+00 3e+00 4e+00 5e+00 5e+10 NA
      median(x)                # no meaningful result with NAs
      ## [1] NA
      median(x, na.rm = TRUE)  # ignore the NA
      ## [1] 3.5
      # Note how the median is not impacted by the outlier,
      # but the outlier dominates the mean:
      mean(x, na.rm = TRUE)
      ## [1] 8333333336
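The result 3.5 above is not itself an observation: with an even number of values there is no single middle value, and median() returns the mean of the two middle ones. A short sketch with made-up vectors (odd and even are illustrative names):

```r
# Odd number of observations: the middle value of the sorted data
odd <- c(7, 1, 5)
sort(odd)[2]
## [1] 5
median(odd)
## [1] 5

# Even number of observations: the mean of the two middle values
even <- c(7, 1, 5, 3)
mean(sort(even)[2:3])
## [1] 4
median(even)
## [1] 4
```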

      8.1.3 The Mode

      In R, the functions mode() and storage.mode() return a character string describing how a variable is stored; they do not compute the statistical mode. In fact, base R has no standard function to calculate the mode of a sample, so let us create our own:
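The first point is easy to check at the console. The sketch below uses an integer vector chosen here purely for illustration:

```r
x <- c(1L, 2L, 2L, 3L)   # an integer vector; 2 is its most frequent value
mode(x)
## [1] "numeric"
storage.mode(x)
## [1] "integer"
mode("a")
## [1] "character"
```

Neither call says anything about the most frequent value, which is what the function defined next computes.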

      # my_mode
      # Finds the first mode (only one)
      # Arguments:
      #   v -- numeric vector or factor
      # Returns:
      #   the first mode
      my_mode <- function(v) {
        uniqv <- unique(v)
        tabv  <- tabulate(match(v, uniqv))
        uniqv[which.max(tabv)]
      }

      # now test this function
      x <- c(1, 2, 3, 3, 4, 5, 60, NA)
      my_mode(x)
      ## [1] 3
      x1 <-
