Data Science For Dummies. Lillian Pierson

Чтение книги онлайн.

Читать онлайн книгу Data Science For Dummies - Lillian Pierson страница 29

Data Science For Dummies - Lillian Pierson

Скачать книгу

Any values that lie beyond these whiskers are outliers. Figure 4-7 shows outliers as they appear within a Tukey boxplot that was generated in Python.

Schematic illustration of spotting outliers with a Tukey boxplot.

      Credit: Python for Data Science Essential Training Part 1, LinkedIn.com

      Detecting outliers with multivariate analysis

      Sometimes outliers show up only within combinations of data points from disparate variables. These outliers wreak havoc on machine learning algorithms, so it’s important to detect and remove them. You can use multivariate analysis of outliers to do this. A multivariate approach to outlier detection involves considering two or more variables at a time and inspecting them together for outliers. You can use one of several methods, including:

       A scatter-plot matrix

       Boxplotting

       Density-based spatial clustering of applications with noise (DBScan) — as discussed in Chapter 5

       Principal component analysis (PCA, as shown in Figure 4-8)

Schematic illustration of using PCA to spot outliers.

      Credit: Python for Data Science Essential Training Part 2, LinkedIn.com

      FIGURE 4-8: Using PCA to spot outliers.

      A time series is just a collection of data on attribute values over time. Time series analysis is performed to predict future instances of the measure based on the past observational data. To forecast or predict future values from data in your dataset, use time series techniques.

      Identifying patterns in time series

Schematic illustration of a comparison of patterns exhibited by time series.

      FIGURE 4-9: A comparison of patterns exhibited by time series.

      Take a look at the solid lines shown earlier, in Figure 4-9. These represent the mathematical models used to forecast points in the time series. The mathematical models shown represent good, precise forecasts because they’re a close fit to the actual data. The actual data contains some random error, thus making it impossible to forecast perfectly.

      

For help getting started with time series within the context of the R programming language, be sure to visit the companion website to this book (http://businessgrowth.ai/), where you’ll find a free training and coding demonstration of time series data visualization in R.

      Modeling univariate time series data

      Similar to how multivariate analysis is the analysis of relationships between multiple variables, univariate analysis is the quantitative analysis of only one variable at a time. When you model univariate time series, you’re modeling time series changes that represent changes in a single variable over time.

      

To use the ARMA model for reliable results, you need to have at least 50 observations.

Schematic illustration of an example of an ARMA forecast model.

      FIGURE 4-10 An example of an ARMA forecast model.

      Конец ознакомительного фрагмента.

      Текст предоставлен ООО «ЛитРес».

      Прочитайте эту книгу целиком, купив полную легальную версию на ЛитРес.

      Безопасно оплатить книгу можно банковской картой Visa, MasterCard, Maestro, со счета мобильного телефона, с платежного терминала, в салоне МТС или Связной, через PayPal, WebMoney, Яндекс.Деньги, QIWI Кошелек, бонусными картами или другим удобным Вам способом.

/9j/4AAQSkZJRgABAQEBLAEsAAD/7SVgUGhvdG9zaG9wIDMuMAA4QklNBAQAAAAAAAccAgAAAgAA ADhCSU0EJQAAAAAAEOjxXPMvwRihontnrcVk1bo4QklNBDoAAAAAAS8AAAAQAAAAAQAAAAAAC3By aW50T3V0cHV0AAAABQAAAABQc3RTYm9vbAEAAAAASW50ZWVudW0AAAAASW50ZQAAAABDbHJtAAAA D3ByaW50U2l4dGVlbkJpdGJvb2wAAAAAC3ByaW50ZXJOYW1lVEVYVAAAACYATQBpAGMAcgBvAHMA b

Скачать книгу