Probability with R. Jane M. Horgan


∼ x_train))

[Figure 3.18]

      The coefficients of the line are obtained in R with

      lm(formula = y_train ~ x_train)

      Coefficients:
      (Intercept)      x_train
          -0.9764       4.9959

      The estimated values, ŷ, are calculated in R as follows:

      y_est <- -0.9764 + 4.9959 * x_test
      round(y_est, 1)

      which gives

      y_est 41.5 46.0 26.0 57.5 31.5 50.5 62.5 54.0 76.0 13.0
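Copying the coefficients by hand invites rounding and transcription slips. As a minimal sketch, the same estimates can be read from the fitted object with coef() or produced directly with predict(). The training and testing values are not listed in this section, so the data frames `train` and `test` below are simulated stand-ins, and the names `fit`, `by_hand`, and `by_predict` are illustrative only.

```r
# Hypothetical stand-in data: the book's x_train, y_train and x_test
# are not listed in this section, so something similar is simulated.
set.seed(1)
train <- data.frame(x = runif(40, min = 2, max = 16))
train$y <- -1 + 5 * train$x + rnorm(40, sd = 4)
test <- data.frame(x = runif(10, min = 2, max = 16))

fit <- lm(y ~ x, data = train)   # least-squares line
b <- coef(fit)                   # named vector: (Intercept), x

# Estimate by hand from the stored coefficients ...
by_hand <- b[["(Intercept)"]] + b[["x"]] * test$x

# ... or let predict() do the same arithmetic.
by_predict <- predict(fit, newdata = test)

round(by_predict, 1)
```

Note that predict() looks up the predictor by name, so the data frame passed as `newdata` must contain a column with the same name as the one used in the formula (here `x`).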

      We now compare these estimated values with the observed values.

      y_test 49.4 43.0 19.3 56.4 28.7 53.7 58.1 54.0 80.7 13.6

      plot(x_test, y_test, main = "Testing Data", font.main = 1)
      abline(lm(y_train ~ x_train))  # plot the line of best fit
      segments(x_test, y_test, x_test, y_est)

      Figure 3.19 shows the observed values, y, along with the values estimated from the line, ŷ. The vertical lines illustrate the differences between them. A decision then has to be made as to whether the line is a “good fit” or whether an alternative model should be investigated.
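One way to put numbers on the vertical differences is to summarize the prediction errors. The sketch below uses the ten estimated and observed values printed above; the mean absolute error and root mean squared error are common summaries chosen here for illustration, not measures the text prescribes, and the names `pred_error`, `mae`, and `rmse` are hypothetical.

```r
# Values copied from the R output shown in the text
y_est  <- c(41.5, 46.0, 26.0, 57.5, 31.5, 50.5, 62.5, 54.0, 76.0, 13.0)
y_test <- c(49.4, 43.0, 19.3, 56.4, 28.7, 53.7, 58.1, 54.0, 80.7, 13.6)

pred_error <- y_test - y_est       # signed lengths of the vertical segments
mae  <- mean(abs(pred_error))      # mean absolute error
rmse <- sqrt(mean(pred_error^2))   # root mean squared error

round(pred_error, 1)
round(c(MAE = mae, RMSE = rmse), 2)
```

For these ten points the mean absolute error is 3.44 and the root mean squared error is about 4.21, against observed values that range from 13.6 to 80.7. Whether errors of that size are acceptable depends on the application.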

Figure 3.19: Values in the Testing Set

       Determine if there is a relationship between the dependent variable and the independent variables;

       Fit the model to the training data;

       Test the suitability of the model by predicting the y‐values in the testing data from the model and by comparing the observed and predicted y‐values.
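The three steps above can be sketched end to end as follows. Everything here is an assumption made for illustration: the data frame `dat` is simulated, the 40/10 split is arbitrary, and the correlation coefficient is only one possible check for a relationship (a scatter plot of the training data would serve the same purpose).

```r
# Hypothetical data set: 50 (x, y) pairs with a roughly linear trend
set.seed(42)
dat <- data.frame(x = runif(50, min = 2, max = 16))
dat$y <- -1 + 5 * dat$x + rnorm(50, sd = 4)

# Hold back 10 randomly chosen rows for testing
test_rows <- sample(nrow(dat), size = 10)
train <- dat[-test_rows, ]
test  <- dat[test_rows, ]

# Step 1: is there a relationship between y and x in the training data?
r <- cor(train$x, train$y)

# Step 2: fit the model to the training data.
fit <- lm(y ~ x, data = train)

# Step 3: predict the testing y-values and compare with the observed ones.
test$y_est <- predict(fit, newdata = test)
comparison <- round(test[, c("y", "y_est")], 1)
comparison
```

Keeping the testing rows out of the fit is what makes the comparison in the last step informative: the model has not seen those y-values.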

      The predictions from these models assume that the trend found in the data analyzed continues to hold. Should the trend change, for example, when a house pricing model is estimated from data collected before an economic crash, the predictions will no longer be valid.
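A small simulation can make this point concrete. In the sketch below, which is entirely hypothetical, the slope of the relationship changes after x = 10; a line fitted only to the earlier data predicts the later values poorly. The break point, slopes, and noise level are arbitrary choices.

```r
# Hypothetical illustration: the trend changes after x = 10
set.seed(7)
x_before <- 1:10
x_after  <- 11:20
y_before <- 2 * x_before + rnorm(10, sd = 1)
y_after  <- 20 + 0.5 * (x_after - 10) + rnorm(10, sd = 1)

# Fit using only the data observed before the change
before <- data.frame(x = x_before, y = y_before)
fit <- lm(y ~ x, data = before)

# Mean absolute error within the fitted period and after the change
mae_before <- mean(abs(y_before - fitted(fit)))
y_after_est <- predict(fit, newdata = data.frame(x = x_after))
mae_after <- mean(abs(y_after - y_after_est))

c(before = mae_before, after = mae_after)
```

The error after the change is much larger than the error within the fitted period, because the fitted line keeps extrapolating the old slope.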

      Regression analysis is just one of the many techniques from the area of Probability and Statistics that machine learning invokes. We will encounter more in later chapters. Should you wish to go into this topic more deeply, we recommend the book, A First Course in Machine Learning by Girolami (2015).


Data Set 1   Data Set 2   Data Set 3   Data Set 4
x1      y1   x2      y2   x3      y3   x4      y4
10    8.04   10    9.14   10    7.46    8    6.58
 8    6.95    8    8.14    8    6.77    8    5.76
13    7.58   13    8.74   13   12.74    8    7.71
 9    8.81    9    8.77    9    7.11    8    8.84
11    8.33   11    9.26   11    7.81    8    8.47
14    9.96   14    8.10   14    8.84    8    7.04
 6    7.24    6    6.13    6    6.08    8    5.25
 4    4.26    4    3.10    4    5.39   19   12.50
12   10.84