

The Big R-Book - Philippe J. S. De Brouwer


3 A notable exception here is ggplot2. This package uses operator overloading instead of piping (overloading of the + operator).

4 Here we use the notation package1::function1() to make clear that function1 is the one defined in package1.

5 The standard functions to read in data are covered in Section 4.8 “Selected Data Interfaces” on page 75.

6 Rectangular data is data that, when printed, looks like a rectangle. For example, movies and pictures are not rectangular data, while a CSV file or a database table is.

7 Categorical variables are variables that have a fixed and known set of possible values. These values might or might not have a (strict) order relation. For example, “sex” (M or F) would not have an order, but salary brackets might have one.

8 Of course, if you need something else, you will want to use the package that does exactly what you want. Here are some good ones that adhere largely to the tidyverse philosophy: jsonlite for JSON, xml2 for XML, httr for web APIs, rvest for web scraping, and DBI for relational databases (a good resource is http://db.rstudio.com).

9 The lack of coherent support for the modelling and reporting area makes clear that the tidyverse is not yet a candidate to service the whole development cycle of the company. Modelling departments might want to have a look at the tidymodels package.

10 This quote is generally attributed to Voltaire (pen name of François-Marie Arouet; 1694–1778) and was published in the French National Convention of 8 May 1793 (see con (1793), page 72). Since then, many leaders and writers of comic books have used variants of this phrase.

11 R's piping operator is very similar to the piping command that you might know from most of the CLI shells of popular *nix systems, where commands like the following can go a long way: dmesg | grep "Bluetooth". Differences will appear in more complicated commands, though.

12 The function lm() generates a linear model in R. More information can be found in Section 21.1 “Linear Regression” on page 375. The functions summary() and coefficients() that are used on the following pages are also explained there.

8 Elements of Descriptive Statistics


8.1 Measures of Central Tendency

A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set. As such, measures of central tendency are sometimes called measures of central location. They are also classed as summary statistics. The mean (often called the average) is most likely the measure of central tendency that you are most familiar with, but there are others, such as the median and the mode.


The mean, median, and mode are all valid measures of central tendency, but under different conditions, some measures of central tendency become more appropriate to use than others. In the following sections, we will look at the mean, mode, and median, and learn how to calculate them and under what conditions they are most appropriate to be used.
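As a rough illustration of why the choice of measure matters, the sketch below compares the three measures on a small made-up vector that contains one extreme value. The data and the helper stat_mode() are hypothetical and only serve this example; note that base R's mode() returns the storage mode of an object, not the statistical mode.

```r
# Hypothetical data: five small values and one outlier
x <- c(1, 2, 2, 3, 4, 60)

# Hypothetical helper: the most frequent value
# (the smallest one in case of ties, because table() sorts the values)
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])
}

mean(x)       # pulled upwards by the outlier
median(x)     # barely affected by the outlier
stat_mode(x)  # the most frequent value
```

Here the mean lands far from most of the observations, while the median and the mode stay close to them, which is one situation in which the mean is the less informative choice.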

8.1.1 Mean


Probably the most used measure of central tendency is the “mean.” In this section we will start with the arithmetic mean and then illustrate some other concepts that might be better suited to some situations.


8.1.1.1 The Arithmetic Mean


The most popular type of mean is the “arithmetic mean.” It is the average of a set of numerical values, calculated by adding those values together and then dividing the sum by the number of values in the set.
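The two steps of this calculation can be written out by hand. The following is a minimal sketch with made-up values, using only sum() and length():

```r
# Hypothetical example values
x <- c(2, 4, 6, 8)

total <- sum(x)     # step 1: add the values together
n     <- length(x)  # step 2: count the values
total / n           # the arithmetic mean: 20 / 4 = 5
```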


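Definition: Arithmetic mean

$$\mu = \sum_{x} x \, P(x)$$ (for discrete distributions)

$$\mu = \int_{-\infty}^{+\infty} x \, f(x) \, dx$$ (for continuous distributions)

Here P(x) is the probability of the value x and f(x) is the probability density function.

The unbiased estimator of the mean for K observations x_k is:

$$\bar{x} = \frac{1}{K} \sum_{k=1}^{K} x_k$$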
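The definition distinguishes between discrete and continuous distributions. The sketch below evaluates both cases numerically under stated assumptions: a fair six-sided die for the discrete case and a normal density with mean 2 and standard deviation 1 for the continuous case. Both distributions are hypothetical examples chosen here, not taken from the text.

```r
# Discrete case: a fair six-sided die (hypothetical example)
values <- 1:6
probs  <- rep(1 / 6, 6)
mean_discrete <- sum(values * probs)   # probability-weighted sum of the values

# Continuous case: normal density with mean 2 and sd 1 (hypothetical example)
integrand <- function(x) x * dnorm(x, mean = 2, sd = 1)
mean_continuous <- integrate(integrand, lower = -Inf, upper = Inf)$value

mean_discrete     # 3.5
mean_continuous   # numerically close to 2
```

The continuous result comes from numerical integration with integrate(), so it agrees with the theoretical mean only up to a small numerical error.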

Not surprisingly, the arithmetic mean in R is calculated by the function mean().


This is a dispatcher function¹ and it will work in a meaningful way for a variety of objects, such as vectors, matrices, etc.
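To make the dispatching a little more concrete, here is a minimal sketch that calls mean() on an object that is not a plain numeric vector. The dates are made-up values; the point is only that mean() selects a method based on the class of its argument.

```r
# A vector of dates (hypothetical values)
d <- as.Date(c("2020-01-01", "2020-01-03"))

m <- mean(d)   # dispatches to the Date method of mean()
class(m)       # the result is again of class "Date"
m

# List the methods of mean() that are available in the current session
methods(mean)
```

The list printed by methods(mean) depends on which packages are loaded, so it will differ between sessions.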

# The mean of a vector:
x <- c(1,2,3,4,5,60)
mean(x)
## [1] 12.5

# Missing values will block the result:
x <- c(1,2,3,4,5,60,NA)
mean(x)
## [1] NA

# Missing values can be ignored with na.rm = TRUE:
mean(x, na.rm = TRUE)
## [1] 12.5

# This works also for a matrix:
M <- matrix(c(1,2,3,4,5,60), nrow=3)