Machine Learning for Time Series Forecasting with Python. Francesca Lazzeri

       Granularity of your forecasting model – Granularity in time series forecasting represents the lowest level of detail captured for each time stamp. Granularity is related to the frequency at which time series values are collected: in Internet of Things (IoT) scenarios, for example, data scientists often need to handle time series data collected by sensors every few seconds. IoT is typically defined as a group of devices that are connected to the Internet, all collecting, sharing, and storing data. Examples of IoT devices are temperature sensors in an air-conditioning unit and pressure sensors installed on a remote oil pump. Aggregating your time series data can sometimes be an important step in building and optimizing your time series model: time aggregation is the combination of all data points for a single resource over a specified period (for example, daily, weekly, or monthly). With aggregation, the data points collected during each granularity period are reduced to a single statistical value, such as the average or the sum of all the collected data points.
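
The aggregation step described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, assuming readings arrive as (timestamp, value) pairs; the function name `aggregate` and the sample readings are hypothetical. In pandas the same operation is typically expressed with `DataFrame.resample` followed by `mean()` or `sum()`.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def aggregate(readings, period="hour", how="mean"):
    """Collapse (timestamp, value) readings into one value per period.

    period: "hour" or "day" (the target granularity).
    how: "mean" or "sum" (the statistic computed for each period).
    """
    buckets = defaultdict(list)
    for ts, value in readings:
        # Truncate the timestamp to the start of the period it falls in.
        if period == "hour":
            key = ts.replace(minute=0, second=0, microsecond=0)
        elif period == "day":
            key = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        else:
            raise ValueError(f"unsupported period: {period!r}")
        buckets[key].append(value)

    if how == "mean":
        reducer = mean
    elif how == "sum":
        reducer = sum
    else:
        raise ValueError(f"unsupported statistic: {how!r}")
    # One aggregated value per period, in chronological order.
    return {key: reducer(values) for key, values in sorted(buckets.items())}


# Hypothetical sensor: one temperature reading every 10 seconds for two hours.
start = datetime(2024, 1, 1)
readings = [(start + timedelta(seconds=10 * i), 20.0 + (i % 6)) for i in range(720)]

hourly_mean = aggregate(readings, period="hour", how="mean")
daily_sum = aggregate(readings, period="day", how="sum")
print(hourly_mean)
print(daily_sum)
```

Choosing the statistic is part of the modeling decision: a mean suits level-type signals such as temperature, while a sum suits additive quantities such as energy consumed.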

       Horizon of your forecasting model – The horizon of your forecasting model is the length of time into the future for which forecasts are to be prepared. Horizons generally range from short-term (less than three months) to long-term (more than two years). Short-term forecasting is usually used for short-term objectives such as material requirement planning, scheduling, and budgeting; long-term forecasting, on the other hand, is usually used for long-term objectives that can span five years or more, such as product diversification, sales, and advertising.
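
Horizon and granularity together determine how many values a model has to produce: a three-month horizon is roughly 90 steps at daily granularity but roughly 2,160 steps at hourly granularity. The sketch below, a hypothetical helper rather than a library function, lists the future time stamps to be forecast for a given horizon, assuming evenly spaced observations.

```python
from datetime import datetime, timedelta


def forecast_timestamps(last_observed, granularity, horizon):
    """Return the future time stamps a model must predict.

    last_observed: time stamp of the most recent observation.
    granularity: spacing between consecutive observations.
    horizon: how far into the future forecasts are needed.
    """
    if granularity <= timedelta(0):
        raise ValueError("granularity must be positive")
    # timedelta // timedelta gives the whole number of steps in the horizon.
    steps = horizon // granularity
    return [last_observed + granularity * k for k in range(1, steps + 1)]


# Hypothetical case: hourly data, forecasts needed for the next two days.
future = forecast_timestamps(datetime(2024, 1, 1), timedelta(hours=1), timedelta(days=2))
print(len(future), future[0], future[-1])
```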

       The endogenous and exogenous features of your forecasting model – Endogenous and exogenous are economic terms that describe internal and external factors, respectively, affecting business production, efficiency, growth, and profitability. Endogenous features are input variables whose values are determined by other variables in the system and on which the output variable depends. For example, if data scientists need to build a forecasting model to predict weekly gas prices, they can consider including major travel holidays as endogenous variables, as prices may go up when cyclical demand rises.
       On the other hand, exogenous features are input variables that are not influenced by other variables in the system but on which the output variable depends. Exogenous variables share some common characteristics (Glen 2014):
         - They are fixed when they enter the model.
         - They are taken as a given in the model.
         - They influence endogenous variables in the model.
         - They are not determined by the model.
         - They are not explained by the model.
       In the example above of predicting weekly gas prices, while the holiday travel schedule increases demand based on cyclical trends, the overall cost of gasoline could be affected by oil reserve prices, sociopolitical conflicts, or disasters such as oil tanker accidents.
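
A practical consequence of treating a variable as exogenous ("taken as a given in the model") is that its values have to be supplied from outside the model, including for the periods being forecast. The following sketch assembles input rows for the weekly gas price example; the function name, column names, week labels, and prices are all hypothetical, and the holiday flag is classed as endogenous only to follow the text above.

```python
def build_feature_rows(weeks, holiday_weeks, oil_price_by_week):
    """Assemble one input row per week for a weekly gas price model.

    holiday_weeks: set of weeks containing a major travel holiday
        (treated as endogenous, following the example in the text).
    oil_price_by_week: dict mapping week -> oil reserve price. This is the
        exogenous input: the model does not produce it, so a value must be
        provided for every week, including the weeks being forecast.
    """
    rows = []
    for week in weeks:
        if week not in oil_price_by_week:
            raise KeyError(f"missing exogenous value for week {week!r}")
        rows.append({
            "week": week,
            "holiday_travel": 1 if week in holiday_weeks else 0,
            "oil_price": oil_price_by_week[week],
        })
    return rows


# Hypothetical inputs for two weeks.
rows = build_feature_rows(
    weeks=["2024-W01", "2024-W02"],
    holiday_weeks={"2024-W01"},
    oil_price_by_week={"2024-W01": 78.0, "2024-W02": 80.5},
)
for row in rows:
    print(row)
```

Raising an error on a missing exogenous value, rather than silently filling it, reflects that the model cannot explain or generate that input itself.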

       The structured or unstructured features of your forecasting model – Structured data comprises clearly defined data types whose pattern makes them easily searchable, while unstructured data comprises data that is usually not as easily searchable, including formats like audio, video, and social media postings. Structured data usually resides in relational databases, whose fields store length-delineated data such as phone numbers, Social Security numbers, or ZIP codes. Even variable-length text strings such as names are contained in records, which makes them simple to search (Taylor 2018).
       Unstructured data has internal structure but is not organized via predefined data models or schemas. It may be textual or non-textual, and human- or machine-generated. Typical human-generated unstructured data includes spreadsheets, presentations, email, and logs. Typical machine-generated unstructured data includes satellite imagery, weather data, landforms, and military movements.
       In a time series context, unstructured data doesn't present systematic time-dependent patterns, while structured data shows systematic time-dependent patterns, such as trend and seasonality.
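
One simple way to check whether a series shows the systematic time-dependent patterns mentioned above is its sample autocorrelation: a seasonal series correlates strongly with itself at the seasonal lag. This is a common diagnostic, not a definition taken from the text, and the helper and sample series below are hypothetical; statsmodels offers a full implementation as `statsmodels.tsa.stattools.acf`.

```python
from statistics import mean


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    if not 0 < lag < len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    m = mean(series)
    denom = sum((x - m) ** 2 for x in series)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    # Covariance between the series and a copy of itself shifted by `lag`.
    num = sum((series[t] - m) * (series[t + lag] - m) for t in range(len(series) - lag))
    return num / denom


# Hypothetical daily series with a weekly cycle (period 7) over ten weeks.
weekly = [10 + (day % 7) for day in range(70)]
print(round(autocorrelation(weekly, 7), 3))  # prints 0.9
print(round(autocorrelation(weekly, 3), 3))  # much lower at a non-seasonal lag
```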

       The univariate or multivariate nature of your forecasting model – Univariate data is characterized by a single variable. It does not deal with causes or relationships. Its descriptive properties can be summarized by estimates of central tendency (mean, mode, median), dispersion (range, variance, maximum, minimum, quartiles, and standard deviation), and frequency distributions. Univariate analysis is limited when it comes to determining relationships between two or more variables: correlations, comparisons, causes, explanations, and contingency between variables. Generally, it does not supply further information on dependent and independent variables and, as such, is insufficient for any analysis involving more than one variable.
       To obtain results for such multiple-indicator problems, data scientists usually use multivariate data analysis. This multivariate approach not only takes several characteristics into account in a model but also brings to light the effect of the external variables.
       Time series forecasting can be either univariate or multivariate. The term univariate time series refers to a series that consists of single observations recorded sequentially over equal time increments. Unlike in other areas of statistics, a univariate time series model uses lagged values of the series itself as independent variables (itl.nist.gov/div898/handbook/pmc/section4/pmc44.htm). These lag variables can play the role of independent variables as in multiple regression. The multivariate time series model is an extension of the univariate case and involves two or more input variables. It does not limit itself to the past of the target series but also incorporates the past of other variables. Multivariate processes arise when several related time series are observed simultaneously over time, instead of a single series as in the univariate case.
       Examples of univariate time series models are the ARIMA models that we will discuss in Chapter 4, “Introduction to Some Classical Methods for Time Series Forecasting.” Considering this question with regard to inputs and outputs may add a further distinction. The number of variables may differ between the inputs and outputs; for example, the data may not be symmetrical. You may have multiple variables as input to the model and only be interested in predicting one of them as output. In this case, the model assumes that the multiple input variables help, and are required, to predict the single output variable.
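
The role of lag values as independent variables can be made concrete by reshaping a series into a supervised learning table. In the univariate case each row holds past values of the target only; in the multivariate case lagged values of other series are added. This is a minimal sketch: `make_supervised`, the `y_lag`/`<name>_lag` column names, and the sample series are assumptions for illustration.

```python
def make_supervised(target, n_lags, exog=None):
    """Reshape a series into (features, label) pairs built from lagged values.

    target: list of numbers, the series to predict.
    n_lags: number of past time steps used as inputs.
    exog: optional dict mapping a name to a list aligned with `target`;
          lagged values of these series are added as extra inputs.
    """
    exog = exog or {}
    for name, values in exog.items():
        if len(values) != len(target):
            raise ValueError(f"series {name!r} is not aligned with the target")

    rows = []
    for t in range(n_lags, len(target)):
        # Univariate part: the target's own past values.
        features = {f"y_lag{k}": target[t - k] for k in range(1, n_lags + 1)}
        # Multivariate part: the past values of the other series.
        for name, values in exog.items():
            for k in range(1, n_lags + 1):
                features[f"{name}_lag{k}"] = values[t - k]
        rows.append((features, target[t]))
    return rows


# Hypothetical hourly energy consumption and outside temperature.
consumption = [1, 2, 3, 4, 5]
temperature = [10, 20, 30, 40, 50]

univariate = make_supervised(consumption, n_lags=2)
multivariate = make_supervised(consumption, n_lags=2, exog={"temp": temperature})
print(univariate[0])  # ({'y_lag1': 2, 'y_lag2': 1}, 3)
print(multivariate[0])
```

Both tables predict the same single output, which matches the asymmetric case described above: several inputs, one variable of interest.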

       Single-step or multi-step structure of your forecasting model – Time series forecasting is often framed as predicting the observation at the next time step. This is called a one-step forecast, as only one time step is to be predicted. In contrast, multiple-step or multi-step time series forecasting problems aim to predict a sequence of values in a time series. Many time series problems involve predicting a sequence of values using only the values observed in the past (Cheng et al. 2006). Examples include predicting crop yield, stock prices, traffic volume, and electrical power consumption. There are at least four commonly used strategies for making multi-step forecasts (Brownlee 2017):
         - Direct multi-step forecast: The direct method requires creating a separate model for each forecast time stamp. For example, to predict energy consumption for the next two hours, we would need to develop one model for forecasting energy consumption in the first hour and a separate model for forecasting energy consumption in the second hour.
         - Recursive multi-step forecast: Multi-step-ahead forecasting can be handled recursively: a single time series model is created to forecast the next time stamp, and the following forecasts are computed using previous forecasts. For example, to forecast energy consumption for the next two hours, we would develop a one-step forecasting model. This model would be used to predict the next hour's energy consumption, and that prediction would then be used as input to predict the energy consumption in the second hour.
         - Direct-recursive hybrid multi-step forecast: The direct and recursive strategies can be combined to offer the benefits of both methods (Brownlee 2017). For example, a distinct model can be built for each future time stamp; however, each model may use the forecasts made by the models at prior time steps as input values. To predict energy consumption for the next two hours, two models can be built, and the output from the first model is used as an input to the second model.
         - Multiple output forecast: The multiple output strategy requires developing one model that is capable of predicting the entire forecast sequence. For example, to predict energy consumption for the next two hours, we would develop one model and apply it to predict both hours in a single computation (Brownlee 2017).
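
The direct and recursive strategies differ in how many models are fit and what each one receives as input. The sketch below contrasts them for the two-step (two-hour) case, using a deliberately simple one-parameter linear model y[t + h] ≈ a * y[t] fit by least squares; that model choice and the function names are assumptions made for brevity, and any regressor could take its place. The hybrid and multiple output strategies are not shown.

```python
def fit_lag_coefficient(series, h):
    """Least-squares slope a (no intercept) for the model y[t + h] ≈ a * y[t]."""
    pairs = [(series[t], series[t + h]) for t in range(len(series) - h)]
    den = sum(x * x for x, _ in pairs)
    if den == 0:
        raise ValueError("not enough non-zero history to fit this step")
    return sum(x * y for x, y in pairs) / den


def recursive_forecast(series, horizon):
    """One one-step model, applied repeatedly to its own forecasts."""
    a = fit_lag_coefficient(series, 1)
    last = series[-1]
    preds = []
    for _ in range(horizon):
        last = a * last  # the previous forecast becomes the next input
        preds.append(last)
    return preds


def direct_forecast(series, horizon):
    """A separate model for each step ahead, each fed only observed data."""
    preds = []
    for h in range(1, horizon + 1):
        a_h = fit_lag_coefficient(series, h)  # model dedicated to step h
        preds.append(a_h * series[-1])
    return preds


# Hypothetical hourly consumption that exactly doubles every hour.
series = [2 ** k for k in range(1, 11)]
print(recursive_forecast(series, horizon=2))  # [2048.0, 4096.0]
print(direct_forecast(series, horizon=2))     # [2048.0, 4096.0]
```

On this perfectly geometric series both strategies return the same forecasts. On noisy real data they generally diverge, because recursive forecasts feed their own errors back in as inputs, while each direct model is trained on observed values but requires fitting one model per step.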

       Contiguous or noncontiguous time series values of your forecasting model – A time series that
