Machine Learning for Time Series Forecasting with Python. Francesca Lazzeri
Figure 1.1: Example of time series forecasting applied to the energy load use case
This first chapter of the book is a conceptual introduction to time series, with some practical examples, in which you can learn the essential aspects of time series representation, modeling, and forecasting.
Specifically, we will discuss the following:
Flavors of Machine Learning for Time Series Forecasting – In this section, you will learn a few standard definitions of important concepts, such as time series, time series analysis, and time series forecasting, and discover why time series forecasting is a fundamental cross-industry research area.
Supervised Learning for Time Series Forecasting – Why would you want to reframe a time series forecasting problem as a supervised learning problem? In this section you will learn how to reshape your forecasting scenario as a supervised learning problem and, as a consequence, get access to a large portfolio of linear and nonlinear machine learning algorithms.
Python for Time Series Forecasting – In this section we will look at different Python libraries for time series data and how libraries such as pandas, statsmodels, and scikit-learn can help you with data handling, time series modeling, and machine learning, respectively.
Experimental Setup for Time Series Forecasting – This section will provide you with general advice for setting up your Python environment for time series forecasting.
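As a preview of the supervised learning reframing mentioned above, the following is a minimal sketch, assuming a sliding-window approach built with pandas. The function name to_supervised, the column names, and the load values are hypothetical and chosen only for illustration; they do not come from the book.

```python
import pandas as pd


def to_supervised(series: pd.Series, n_lags: int = 3) -> pd.DataFrame:
    """Turn a univariate series into a table of lag features plus a target."""
    frame = pd.DataFrame({"y": series})
    for lag in range(1, n_lags + 1):
        # Column lag_k holds the value observed k steps before y.
        frame[f"lag_{lag}"] = series.shift(lag)
    # The first n_lags rows have no complete history, so drop them.
    return frame.dropna()


# Made-up daily load values, for illustration only.
load = pd.Series(
    [10.0, 12.0, 11.0, 13.0, 15.0, 14.0],
    index=pd.date_range("2021-01-01", periods=6),
)
table = to_supervised(load, n_lags=2)
print(table)
```

Each row of the resulting table pairs a target value with the values that preceded it, which is the input and output layout that standard regression algorithms expect.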
Let's get started and learn some important elements that we must consider when describing and modeling a time series.
Flavors of Machine Learning for Time Series Forecasting
In this first section of Chapter 1, we will discover together why time series forecasting is a fundamental cross-industry research area. Moreover, you will learn a few important concepts to deal with time series data, perform time series analysis, and build your time series forecasting solutions.
One example of the use of time series forecasting solutions would be the simple extrapolation of a past trend to predict next week's hourly temperatures. Another example would be the development of a complex linear stochastic model for predicting the movement of short-term interest rates. Time series models have also been used to forecast the demand for airline capacity, seasonal energy demand, and future online sales.
In time series forecasting, data scientists assume that no causal relationship with other variables has to be modeled for the variable they are trying to forecast. Instead, they analyze the historical values of a time series data set in order to understand and predict its future values. The method used to produce a time series forecasting model may involve the use of a simple deterministic model, such as a linear extrapolation, or the use of more complex deep learning approaches.
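To make the idea of a simple deterministic model concrete, here is a minimal sketch of a linear extrapolation, assuming NumPy and a made-up temperature series. The values and variable names are illustrative assumptions, not data from the book.

```python
import numpy as np

# Made-up temperature readings at equally spaced time steps (illustrative only).
history = np.array([20.0, 20.5, 21.0, 21.5, 22.0])
t = np.arange(len(history))

# Fit a straight line y = slope * t + intercept to the observed history.
slope, intercept = np.polyfit(t, history, deg=1)

# Extend the fitted line three steps past the last observation.
future_t = np.arange(len(history), len(history) + 3)
forecast = slope * future_t + intercept
print(forecast)
```

The forecast uses nothing but the past values of the series itself, which is the assumption described in the paragraph above.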
Due to their applicability to many real-life problems, such as fraud detection, spam email filtering, finance, and medical diagnosis, and their ability to produce actionable results, machine learning and deep learning algorithms have gained a lot of attention in recent years. Generally, deep learning methods have been developed and applied to univariate time series forecasting scenarios, where the time series consists of single observations recorded sequentially over equal time increments (Lazzeri 2019a).
For this reason, they have often performed worse than naïve and classical forecasting methods, such as exponential smoothing and autoregressive integrated moving average (ARIMA). This has led to a general misconception that deep learning models are inefficient in time series forecasting scenarios, and many data scientists wonder whether it's really necessary to add another class of methods, such as convolutional neural networks (CNNs) or recurrent neural networks (RNNs), to their time series toolkit (we will discuss this in more detail in Chapter 5, “Introduction to Neural Networks for Time Series Forecasting”) (Lazzeri 2019a).
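For readers unfamiliar with the baselines named above, the following sketch shows a naïve forecast and a hand-written simple exponential smoothing forecast. It is an illustration under stated assumptions: the function names are hypothetical, the level is initialised to the first observation, and the smoothing factor alpha is fixed at 0.5 rather than estimated from data.

```python
import numpy as np


def naive_forecast(history):
    """Naive method: the next value equals the last observed value."""
    return history[-1]


def simple_exp_smoothing_forecast(history, alpha=0.5):
    """One-step-ahead forecast from simple exponential smoothing.

    The level starts at the first observation (an assumption; other
    initialisations are possible) and is updated as
    level = alpha * y + (1 - alpha) * level.
    """
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


# Made-up observations, for illustration only.
series = np.array([10.0, 12.0, 11.0, 13.0])
naive = naive_forecast(series)
ses = simple_exp_smoothing_forecast(series, alpha=0.5)
print(naive, ses)
```

Baselines like these are cheap to compute, which is why more complex models are usually compared against them first.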
In time series, the chronological arrangement of data is captured in a specific column that is often denoted as time stamp, date, or simply time. As illustrated in Figure 1.2, a machine learning data set is usually a list of data points that are treated equally from a time perspective and are used as input to generate an output, which represents our predictions. In contrast, a time series data set has an added time structure, and each data point takes a specific value that is tied to that temporal dimension.
Figure 1.2: Machine learning data set versus time series data set
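The role of the time stamp column can be sketched with pandas as follows. The column names timestamp and load and the values are assumptions made for this example.

```python
import pandas as pd

# Made-up records with a time stamp column, deliberately out of order.
records = pd.DataFrame(
    {
        "timestamp": ["2021-01-03", "2021-01-01", "2021-01-02", "2021-01-04"],
        "load": [13.0, 10.0, 12.0, 11.0],
    }
)

# Parse the time stamps and use them as a sorted index, so that the
# position of each observation reflects when it was recorded.
records["timestamp"] = pd.to_datetime(records["timestamp"])
ts = records.set_index("timestamp").sort_index()["load"]

# Order-dependent operations now make sense, e.g. change from the previous day.
daily_change = ts.diff()
print(daily_change)
```

In an ordinary tabular data set the row order could be shuffled without loss of information, whereas here the result of diff depends entirely on the chronological order.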
Now that you have a better understanding of time series data, it is also important to understand the difference between time series analysis and time series forecasting. These two domains are tightly related, but they serve different purposes: time series analysis is about identifying the intrinsic structure and extrapolating the hidden traits of your time series data in order to get helpful information from it (like trend or seasonal variation, concepts that we will discuss later on in the chapter).
Data scientists usually leverage time series analysis for the following reasons:
Acquire clear insights into the underlying structures of historical time series data.
Increase the quality of the interpretation of time series features to better inform the problem domain.
Preprocess and perform high-quality feature engineering to get a richer and deeper historical data set.
Time series analysis is used for many applications such as process and quality control, utility studies, and census analysis. It is usually considered the first step to analyze and prepare your time series data for the modeling step, which is properly called time series forecasting.
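As a small illustration of this kind of analysis, the sketch below estimates a trend with a centered rolling mean and a rough weekly profile from a synthetic series. The data, the 7-day window, and the variable names are assumptions made for the example; this is one simple technique, not the book's prescribed method.

```python
import numpy as np
import pandas as pd

# Synthetic daily series: upward trend plus a weekly pattern (illustrative only).
index = pd.date_range("2021-01-04", periods=28, freq="D")  # starts on a Monday
trend = np.arange(28) * 0.5
weekly = np.tile([0.0, 1.0, 2.0, 1.0, 0.0, -2.0, -2.0], 4)
ts = pd.Series(trend + weekly, index=index)

# A centered 7-day rolling mean averages out the weekly pattern and
# leaves an estimate of the trend (undefined for the first and last 3 days).
trend_estimate = ts.rolling(window=7, center=True).mean()

# Averaging the detrended values by day of week (0 = Monday) gives a
# rough seasonal profile.
detrended = (ts - trend_estimate).dropna()
seasonal_profile = detrended.groupby(detrended.index.dayofweek).mean()
print(seasonal_profile)
```

Quantities such as trend_estimate and seasonal_profile can also be used as engineered features for the forecasting step.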
Time series forecasting involves taking machine learning models, training them on historical time series data, and using them to predict future values. As illustrated in Figure 1.3, in time series forecasting the future output is unknown, and the prediction depends on how the machine learning model was trained on the historical input data.
Figure 1.3: Difference between time series analysis historical input data and time series forecasting output data
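The train-then-forecast workflow can be sketched as follows, assuming a synthetic series and an ordinary least squares model fitted with NumPy on a single lag. The series, the split point, and the model choice are illustrative assumptions; a library such as scikit-learn could be used in place of the manual fit.

```python
import numpy as np

# Synthetic series (illustrative only): each value is 0.8 * previous + 2.
values = [20.0]
for _ in range(25):
    values.append(0.8 * values[-1] + 2.0)
series = np.array(values)

# Chronological split: the first 21 points are history, the next 5 are
# treated as the unknown future and used only to check the forecast.
history, future = series[:21], series[21:26]

# Supervised framing on the history only: predict y[t] from y[t-1].
X_train, y_train = history[:-1], history[1:]

# Ordinary least squares fit of y = a * x + b.
design = np.column_stack([X_train, np.ones_like(X_train)])
(a, b), *_ = np.linalg.lstsq(design, y_train, rcond=None)

# Recursive forecast: each prediction becomes the input for the next step.
forecast = []
last = history[-1]
for _ in range(len(future)):
    last = a * last + b
    forecast.append(last)
forecast = np.array(forecast)

mae = np.mean(np.abs(forecast - future))
print(a, b, mae)
```

Splitting by time rather than at random keeps future observations out of the training data, which mirrors the situation in which the future output is genuinely unknown.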
Different historical and current phenomena may affect the values of your data in a time series, and these events are diagnosed as components of a time series. It is very important to recognize these