Machine Learning for Time Series Forecasting with Python. Francesca Lazzeri
Чтение книги онлайн.
Читать онлайн книгу Machine Learning for Time Series Forecasting with Python - Francesca Lazzeri страница 7
As illustrated in Figure 1.4, there are four main categories of components in time series analysis: long-term movement or trend, seasonal short-term movements, cyclic short-term movements, and random or irregular fluctuations.
Figure 1.4: Components of time series
Let's have a closer look at these four components:
Long-term movement or trend refers to the overall movement of time series values to increase or decrease during a prolonged time interval. It is common to observe trends changing direction throughout the course of your time series data set: they may increase, decrease, or remain stable at different moments. However, overall you will see one primary trend. Population counts, agricultural production, and items manufactured are just some examples of when trends may come into play.
There are two different types of short-term movements:Seasonal variations are periodic temporal fluctuations that show the same variation and usually recur over a period of less than a year. Seasonality is always of a fixed and known period. Most of the time, this variation will be present in a time series if the data is recorded hourly, daily, weekly, quarterly, or monthly. Different social conventions (such as holidays and festivities), weather seasons, and climatic conditions play an important role in seasonal variations, like for example the sale of umbrellas and raincoats in the rainy season and the sale of air conditioners in summer seasons.Cyclic variations, on the other side, are recurrent patterns that exist when data exhibits rises and falls that are not of a fixed period. One complete period is a cycle, but a cycle will not have a specific predetermined length of time, even if the duration of these temporal fluctuations is usually longer than a year. A classic example of cyclic variation is a business cycle, which is the downward and upward movement of gross domestic product around its long-term growth trend: it usually can last several years, but the duration of the current business cycle is unknown in advance.As illustrated in Figure 1.5, cyclic variations and seasonal variations are part of the same short-term movements in time series forecasting, but they present differences that data scientists need to identify and leverage in order to build accurate forecasting models:Figure 1.5: Differences between cyclic variations versus seasonal variations
Random or irregular fluctuations are the last element to cause variations in our time series data. These fluctuations are uncontrollable, unpredictable, and erratic, such as earthquakes, wars, flood, and any other natural disasters.
Data scientists often refer to the first three components (long-term movements, seasonal short-term movements, and cyclic short-term movements) as signals in time series data because they actually are deterministic indicators that can be derived from the data itself. On the other hand, the last component (random or irregular fluctuations) is an arbitrary variation of the values in your data that you cannot really predict, because each data point of these random fluctuations is independent of the other signals above, such as long-term and short-term movements. For this reason, data scientists often refer to it as noise, because it is triggered by latent variables difficult to observe, as illustrated in Figure 1.6.
Figure 1.6: Actual representation of time series components
Data scientists need to carefully identify to what extent each component is present in the time series data to be able to build an accurate machine learning forecasting solution. In order to recognize and measure these four components, it is recommended to first perform a decomposition process to remove the component effects from the data. After these components are identified and measured, and eventually utilized to build additional features to improve the forecast accuracy, data scientists can leverage different methods to recompose and add back the components on forecasted results.
Understanding these four time series components and how to identify and remove them represents a strategic first step for building any time series forecasting solution because they can help with another important concept in time series that may help increase the predictive power of your machine learning algorithms: stationarity. Stationarity means that statistical parameters of a time series do not change over time. In other words, basic properties of the time series data distribution, like the mean and variance, remain constant over time. Therefore, stationary time series processes are easier to analyze and model because the basic assumption is that their properties are not dependent on time and will be the same in the future as they have been in the previous historical period of time. Classically, you should make your time series stationary.
There are two important forms of stationarity: strong stationarity and weak stationarity. A time series is defined as having a strong stationarity when all its statistical parameters do not change over time. A time series is defined as having a weak stationarity when its mean and auto-covariance functions do not change over time.
Alternatively, time series that exhibit changes in the values of their data, such as a trend or seasonality, are clearly not stationary, and as a consequence, they are more difficult to predict and model. For accurate and consistent forecasted results to be received, the nonstationary data needs to be transformed into stationary data. Another important reason for trying to render a time series stationary is to be able to obtain meaningful sample statistics such as means, variances, and correlations with other variables that can be used to get more insights and better understand your data and can be included as additional features in your time series data set.
However, there are cases where unknown nonlinear relationships cannot be determined by classical methods, such as autoregression, moving average, and autoregressive integrated moving average methods. This information can be very helpful when building machine learning models, and it can be used in feature engineering and feature selection processes. In reality, many economic time series are far from stationary when visualized in their original units of measurement, and even after seasonal adjustment they will typically still exhibit trends, cycles, and other nonstationary characteristics.
Time series forecasting involves developing and using a predictive model on data where there is an ordered relationship between observations. Before data scientists get started with building their forecasting solution, it is highly recommended to define the following forecasting aspects:
The inputs and outputs of your forecasting model – For data scientists who are about to build a forecasting solution, it is critical to think about the data they have available to make the forecast and what they want to forecast about the future. Inputs are historical time series data provided to feed the model in order to make a forecast about future values. Outputs are the prediction results for a future time step. For example, the last seven days of energy consumption data collected by sensors in an electrical grid is considered input data, while the predicted values of energy consumption to forecast for the next day are defined as output data.
Granularity