Read online: Probability with R. Jane M. Horgan. Mathematics. LiveLib


lm(prog2 ~ prog1)

calculates what is referred to as the linear model (lm) of prog2 on prog1, or simply the line $y = a + bx$ (here with $y$ = prog2 and $x$ = prog1) that best fits the data.

The output is

    Call:
    lm(formula = prog2 ~ prog1)

    Coefficients:
    (Intercept)        prog1
         -5.455        0.960

Therefore, the line that best fits these data is

$$\text{prog2} = -5.455 + 0.960 \times \text{prog1}$$

To draw this line on the scatter diagram, write

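plot(prog1, prog2)          # scatter diagram with prog1 on the x-axis, prog2 on the y-axis
abline(lm(prog2 ~ prog1))   # add the fitted line of prog2 on prog1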

which gives Fig. 3.16.

Figure 3.16 The Line of Best Fit

The line of best fit may be used to make predictions. For example, we might be able to predict how students will do in Semester 2 from the results they obtained in Semester 1. If a particular student's mark in Programming 1 is 70, that student would be expected to do well in Programming 2 also, with an estimated mark of $-5.455 + 0.960 \times 70 \approx 61.7$. A student doing badly in Programming 1, with a mark of 30 say, would also be expected to do badly in Programming 2, with an estimated mark of $-5.455 + 0.960 \times 30 \approx 23.3$. These predictions may not be exact but, if the linear trend is strong and past trends continue, they will be reasonably close.
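The prediction arithmetic can be sketched in R as a small function built from the rounded coefficients printed by `lm(prog2 ~ prog1)`. The helper name `predict_prog2` is hypothetical, introduced only for illustration, and the default values of `a` and `b` are simply the printed intercept and slope.

```r
# Hypothetical helper: line of best fit prog2 = a + b * prog1,
# using the rounded coefficients printed by lm(prog2 ~ prog1)
predict_prog2 <- function(prog1_mark, a = -5.455, b = 0.960) {
  a + b * prog1_mark  # intercept plus slope times the Programming 1 mark
}

predict_prog2(c(70, 30))  # approximately 61.7 and 23.3
```

If the original `prog1` and `prog2` vectors are available, `predict(lm(prog2 ~ prog1), newdata = data.frame(prog1 = c(70, 30)))` should give essentially the same values, differing only slightly because it uses the unrounded coefficients.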

A word of warning is appropriate here. The estimated values are based on the assumption that the past trend continues. This may not always be the case. For example, students who do badly in Semester 1 may get such a shock that they work harder in Semester 2, thereby changing the pattern. Similarly, students getting high marks in Semester 1 may be lulled into a false sense of security and take it easy in Semester 2. Consequently, they may not do as well as expected. Hence, the Semester 1 trends may not continue, and the model may no longer be valid.

3.6 MACHINE LEARNING AND THE LINE OF BEST FIT

Machine learning is the science of getting computer systems to use algorithms and statistical models to study patterns and learn from data. Supervised learning is the machine learning task of using past data to learn a function in order to predict a future output.

The line of best fit is one of the many techniques that machine learning has borrowed from the field of Probability and Statistics to "train" the machine to make predictions. In this case, the line of best fit, also known in statistics as the simple linear regression line, is obtained from a set of pairs $(x_i, y_i)$ of data, where $x$ is referred to as the independent variable and $y$ as the dependent variable. The objective is to estimate $y$ from $x$. The line of best fit, $\hat{y} = a + bx$, is obtained by choosing the intercept $a$ and slope $b$ so that the sum of the squared distances from the observed $y_i$ to the estimated $\hat{y}_i$, that is $\sum_i (y_i - \hat{y}_i)^2$, is minimized. The algebraic details of the derivations of $a$ and $b$ are given in Appendix B.
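As a sketch of what minimizing the sum of squared distances produces, the R code below implements the standard closed-form least-squares solution, $b = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / \sum_i (x_i - \bar{x})^2$ and $a = \bar{y} - b\bar{x}$, and checks it against `lm()`. The function name `least_squares` is hypothetical, and the data are simulated purely for illustration; they are not from the book.

```r
# Hypothetical helper: closed-form least-squares intercept a and slope b
least_squares <- function(x, y) {
  xbar <- mean(x)
  ybar <- mean(y)
  b <- sum((x - xbar) * (y - ybar)) / sum((x - xbar)^2)  # slope
  a <- ybar - b * xbar                                    # intercept
  c(a = a, b = b)
}

# Simulated (hypothetical) data, used only to illustrate the formulas
set.seed(1)
x <- runif(30, min = 20, max = 90)
y <- 5 + 0.9 * x + rnorm(30, sd = 8)

est <- least_squares(x, y)
fit <- lm(y ~ x)

est        # closed-form estimates of a and b
coef(fit)  # lm's estimates, which should agree up to floating-point error
```

Writing the formulas out makes it clear that `lm()` is not doing anything mysterious for a single predictor: the slope is the ratio of the co-variation of $x$ and $y$ to the variation of $x$, and the fitted line always passes through the point $(\bar{x}, \bar{y})$.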

Often, the data for supervised learning are randomly divided into two parts, one for training and the other for testing. In machine learning, we derive the line of best fit from the training set.

The testing set is used to see how well the line actually fits. Usually, an 80:20 breakdown of the data is made: 80% of the data is used for "training," that is, to obtain the line, and the remaining 20% is used to decide whether the line really fits the data and to ascertain whether the model is appropriate for future predictions. The model is updated as new data become available.
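A minimal sketch of the 80:20 procedure in R follows, assuming the pairs are held in a data frame with columns `x` and `y`. The data are simulated and hypothetical, and the names `dat`, `train_idx` and `rmse` are illustrative. Measuring the fit on the testing set by the root mean squared prediction error is one common choice; the text does not prescribe a particular measure.

```r
# Simulated (hypothetical) data standing in for 50 observed (x, y) pairs
set.seed(42)
n <- 50
x <- runif(n, min = 5, max = 20)
y <- 4 * x + rnorm(n, sd = 10)
dat <- data.frame(x = x, y = y)

# Randomly assign 80% of the rows to training and the rest to testing
train_idx <- sample(seq_len(n), size = round(0.8 * n))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Fit the line of best fit on the training set only
fit <- lm(y ~ x, data = train)

# Predict the held-out y values and measure how far off the line is
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$y - pred)^2))
rmse
```

With 50 pairs, `round(0.8 * n)` gives 40 training and 10 testing observations. Because the test rows play no part in fitting the line, their prediction error gives a more honest indication of how the line will perform on future data than the error on the training rows does.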

Example 3.1

Suppose there are 50 pairs $(x, y)$ of observations available for obtaining the line that best fits the data in order to predict $y$ from $x$. The data are randomly divided into a training set and a testing set, using 40 observations for training (Table 3.1) and 10 for testing (Table 3.2).

TABLE 3.1 The Training Set

Observation Number	x	y	Observation Number	x	y
1	11.8	31.3	21	15.1	80.1
2	10.8	59.9	22	14.7	66.9
3	8.6	27.6	23	10.5	42.0
