Machine Learning for Time Series Forecasting with Python. Francesca Lazzeri
Чтение книги онлайн.
Читать онлайн книгу Machine Learning for Time Series Forecasting with Python - Francesca Lazzeri страница 11
Previously and in Figure 1.8, I used examples of univariate time series: these are data sets where only a single variable is observed at each time, such as energy load at each hour. However, the sliding window method can be applied to a time series data set that includes more than one historical variable observed at each time step and when the goal is to predict more than one variable in the future: this type of time series data set is called multivariate time series (I will discuss this concept in more detail later in this book).
We can reframe this time series data set as a supervised learning problem with a window width of one. This means that we will use the prior time step values of Value 1 and Value 2. We will also have available the next time step value for Value 1. We will then predict the next time step value of Value 2. As illustrated in Figure 1.9, this will give us three input features and one output value to predict for each training pattern.
Figure 1.9: Multivariate time series as supervised learning problem
In the example of Figure 1.9, we were predicting two different output variables (Value 1 and Value 2), but very often data scientists need to predict multiple time steps ahead for one output variable. This is called multi-step forecasting. In multi-step forecasting, data scientists need to specify the number of time steps ahead to be forecasted, also called forecast horizon in time series. Multi-step forecasting usually presents two different formats:
One-step forecast: When data scientists need to predict the next time step (t + 1)
Multi-step forecast: When data scientists need to predict two or more (n) future time steps (t + n)
For example, demand forecasting models predict the quantity of an item that will be sold the following week and the following two weeks given the sales up until the current week. In the stock market, given the stock prices up until today one can predict the stock prices for the next 24 hours and 48 hours. Using a weather forecasting engine, one can predict the weather for the next day and for the entire week (Brownlee 2017).
The sliding window method can be applied also on a multi-step forecasting solution to transform it into a supervised learning problem. As illustrated in Figure 1.10, we can use the same univariate time series data set from Figure 1.8 as an example, and we can structure it as a two-step forecasting data set for supervised learning with a window width of one (Brownlee 2017).
Figure 1.10: Univariate time series as multi-step supervised learning
As illustrated in Figure 1.10, data scientists cannot use the first row (time stamp 01/01/2020) and the second to last row (time stamp 01/04/2020) of this sample data set to train their supervised model; hence we suggest removing it. Moreover, this new version of this supervised data set only has one variable Value x that data scientists can exploit to predict the last row (time stamp 01/05/2020) of both Value y and Value y2.
In the next section you will learn about different Python libraries for time series data and how libraries such as pandas, statsmodels, and scikit-learn can help you with data handling, time series modeling, and machine learning, respectively.
Originally developed for financial time series such as daily stock market prices, the robust and flexible data structures in different Python libraries can be applied to time series data in any domain, including marketing, health care, engineering, and many others. With these tools you can easily organize, transform, analyze, and visualize your data at any level of granularity—examining details during specific time periods of interest and zooming out to explore variations on different time scales, such as monthly or annual aggregations, recurring patterns, and long-term trends.
Python for Time Series Forecasting
In this section, we will look at different Python libraries for time series data and how libraries such as pandas, statsmodels, and scikit-learn can help you with data handling, time series modeling, and machine learning, respectively. The Python ecosystem is the dominant platform for applied machine learning.
The primary rationale for adopting Python for time series forecasting is that it is a general-purpose programming language that you can use both for experimentation and in production. It is easy to learn and use, primarily because the language focuses on readability. Python is a dynamic language and very suited to interactive development and quick prototyping with the power to support the development of large applications.
Python is also widely used for machine learning and data science because of the excellent library support, and it has a few libraries for time series, such as NumPy, pandas, SciPy, scikit-learn, statsmodels, Matplotlib, datetime, Keras, and many more. Below we will have a closer look at the time series libraries in Python that we will use in this book:
SciPy: SciPy is a Python-based ecosystem of open-source software for mathematics, science, and engineering. Some of the core packages are NumPy (a base n-dimensional array package), SciPy library (a fundamental library for scientific computing), Matplotlib (a comprehensive library for 2-D plotting), IPython (an enhanced interactive console), SymPy (a library for symbolic mathematics), and pandas (a library for data structure and analysis). Two SciPy libraries that provide a foundation for most others are NumPy and Matplotlib:NumPy is the fundamental package for scientific computing with Python. It contains, among other things, the following:A powerful n-dimensional array objectSophisticated (broadcasting) functionsTools for integrating C/C++ and Fortran codeUseful linear algebra, Fourier transform, and random number capabilitiesThe most up-to-date NumPy documentation can be found at numpy.org/devdocs /. It includes a user guide, full reference documentation, a developer guide, meta information, and “NumPy Enhancement Proposals” (which include the NumPy Roadmap and detailed plans for major new features).Matplotlib: Matplotlib is a Python plotting library that produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. Matplotlib can be used in Python scripts, the Python and IPython shells, the Jupyter notebook, web application servers, and four graphical user interface toolkits.Matplotlib is useful to generate plots, histograms, power spectra, bar charts, error charts, scatterplots, and so on with just a few lines of code. The most up-to-date Matplotlib documentation can be found in the Matplotlib user's guide (matplotlib.org/3.1.1/users/index.html).Moreover, there are three higher-level SciPy libraries that provide the key features for time series forecasting in Python, they are pandas, statsmodels, and scikit-learn for data handling, time series modeling, and machine learning, respectively:Pandas: Pandas is an open-source, BSD-licensed library