Applied Data Mining for Forecasting Using SAS. Tim Rey


the ease-of-use of the corresponding packages. Most of the time the tools based on building blocks (such as SAS Enterprise Miner) or the high-performance forecasting tools (such as SAS Forecast Server) cost more. However, the increased productivity they deliver is significant. An additional advantage is the shorter learning and product adaptation time, which lowers the total cost.

       Functionality: it is strongly recommended that you carefully check whether the necessary technical functionality, as described in the previous sections, is available, and that you avoid any compromises. The capability to add new methods is also recommended.

       Ease-of-use is enhanced by programming based on building blocks, a high level of automation for data pre-processing and model generation, an interactive graphic interface, and minimal programming necessary to deploy models (all features of SAS Enterprise Guide, for example).

       Report generation is a significant step during model development as well as during model deployment and when transferring ownership to clients. During the model building phase many detailed reports with time series analysis, model diagnostics, or variable selection results are needed for successful decision-making. For model deployment, good reporting capabilities for model performance and value tracking are critical in order to keep the client happy.

       The learning effort required depends on the software's ease-of-use, users' experience in statistics and forecasting within the organization, and the training courses and materials offered by the vendor. Products with a steep learning curve can significantly delay implementation efforts and reduce the impact of the technology for data mining in forecasting.

       Global support 24/7: a fast, professional response to model development and implementation issues, available globally, is critical for the success of data mining for forecasting in industry. This is one of the key factors to consider when selecting a software vendor. Very few vendors have the capacity to provide this type of service.

      Developing and maintaining a data infrastructure that can reliably supply the data to the developed and applied forecasting models is critical for final success. The data infrastructure for data mining in forecasting consists of two key parts: internal data from the business and external data from various sources, such as Global Insight, Bloomberg, CMAI, and so on. Both parts are described briefly in the following sections.

      Very often, creating an internal data infrastructure for data mining in forecasting is the key bottleneck of the whole effort. Several issues contribute to this situation. The first issue is the diverse nature of data sources in different parts of the business. This issue is especially difficult to resolve during the transition period after mergers and acquisitions, when various types of databases need to be integrated. The second issue is the different time intervals and durations with which historical data are kept in the system. Very often the time interval (week, month, or quarter) is inconsistent across the historical periods of interest. A similar situation is observed with the duration of historical data: in many cases the time history is too short to represent the patterns necessary to build and validate a good forecasting model. The third issue is structural change in the business, since the corresponding models need to be rebuilt with revised history after each significant change.
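The second issue can be checked mechanically before any modeling starts. The following is a minimal Python sketch, not a method from the book: the function name `describe_history`, the day-gap thresholds, and the `min_obs` default are all assumptions chosen for illustration.

```python
from collections import Counter
from datetime import date


def describe_history(dates, min_obs=24):
    """Summarize the spacing and length of a series' observation dates.

    min_obs is an arbitrary illustrative threshold, not a recommendation.
    """
    ordered = sorted(dates)
    # Gaps in days between consecutive observations.
    gaps = [(b - a).days for a, b in zip(ordered, ordered[1:])]

    def label(gap):
        # Rough classification of a gap into a named interval.
        if gap == 7:
            return "week"
        if 28 <= gap <= 31:
            return "month"
        if 89 <= gap <= 92:
            return "quarter"
        return "other"

    counts = Counter(label(g) for g in gaps)
    return {
        "n_obs": len(ordered),
        "intervals": dict(counts),
        "consistent": len(counts) == 1,
        "long_enough": len(ordered) >= min_obs,
    }


# Hypothetical history: kept quarterly in 2019, then monthly in 2020.
history = [
    date(2019, 1, 1), date(2019, 4, 1), date(2019, 7, 1), date(2019, 10, 1),
    date(2020, 1, 1), date(2020, 2, 1), date(2020, 3, 1),
]
print(describe_history(history))
```

A series that mixes intervals, as in this example, would need to be brought to a single interval before it can be used for model building.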

      The internal data infrastructure depends on the corporate data infrastructure. One option to communicate and synchronize the extracts is by using a separate server. (See the example in Figure 3.1.) At the basis of data infrastructure design is the metadata (the data about the data) definition. The cost for maintenance and support of the internal data infrastructure depends on the internal cost structure derived by corporate IT.

      Usually, the data about potential economic drivers are not available internally and need to be delivered by external sources. Examples of such sources are the Bloomberg services [5], with various types of financial data, such as equities, commodities, and foreign exchange rates, and the Global Insight services [6], with more than 30 million time series of different kinds from across the globe, such as prices, economic indicators, and labor costs. The external data are generally consistent and collected in a timely manner, and some series include forecast values for a given forecasting horizon. This last feature is very beneficial when the data are used as inputs to multivariate-in-X forecasting models, that is, models with several input (X) variables.
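Forecast values in the external data matter because a model driven by input variables needs a value of each input for every period it forecasts. A small Python sketch of that check follows; the function name `missing_driver_periods`, the `(year, month)` keys, and the sample numbers are hypothetical.

```python
def missing_driver_periods(driver, last_history_period, horizon):
    """Return the future months for which the driver has no value.

    driver: dict mapping (year, month) to a value (history plus vendor forecasts)
    last_history_period: (year, month) of the last observed target value
    horizon: number of months to forecast
    """
    year, month = last_history_period
    missing = []
    for _ in range(horizon):
        # Step to the next calendar month.
        month += 1
        if month > 12:
            year, month = year + 1, 1
        if (year, month) not in driver:
            missing.append((year, month))
    return missing


# Hypothetical driver with vendor forecasts through February 2024.
price_index = {(2023, 11): 1.0, (2023, 12): 1.1, (2024, 1): 1.2, (2024, 2): 1.3}
print(missing_driver_periods(price_index, (2023, 11), horizon=4))
```

An empty result means the driver covers the whole horizon; otherwise the listed periods would have to be forecast separately before the driver can be used as an input.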

      There are two options for delivering external data. The first is based on accessing the necessary data by direct extracts from the key sources. The second is based on building an internal database of the most frequently used external data. The advantages of the second approach are the synchronized update of all needed external data, fast search for specific economic drivers, and more reliable maintenance of deployed models. However, this option requires allocating internal resources for the design and maintenance of the database and for training potential users.

      An example of integrating different external and internal data sets into a single data set that is appropriate for data mining in forecasting is shown in Figure 3.2. It includes three external data sets (Bloomberg, Global Insight, and CMAI) and two internal data sets. The different data are integrated into the forecasting data set based on a selected starting time and time interval (month, quarter, or year). Time series with different time intervals are expanded or contracted to the selected interval in a previous step, as described in Chapter 6.
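The integration step can be illustrated with a minimal Python sketch. It is not the procedure from Chapter 6: averaging weekly values to contract them and repeating quarterly values to expand them are assumptions made for illustration (sums or interpolation may suit other variables better), and all function names and sample numbers are hypothetical.

```python
from collections import defaultdict
from datetime import date


def contract_to_monthly(weekly):
    """Average weekly observations keyed by date into (year, month) values."""
    buckets = defaultdict(list)
    for day, value in weekly.items():
        buckets[(day.year, day.month)].append(value)
    return {period: sum(v) / len(v) for period, v in buckets.items()}


def expand_to_monthly(quarterly):
    """Repeat each value keyed by (year, quarter) for the quarter's three months."""
    monthly = {}
    for (year, quarter), value in quarterly.items():
        first_month = 3 * (quarter - 1) + 1
        for month in range(first_month, first_month + 3):
            monthly[(year, month)] = value
    return monthly


def integrate(series_by_name, start):
    """Join monthly series on the periods >= start that every series covers."""
    common = set.intersection(*(set(s) for s in series_by_name.values()))
    rows = []
    for period in sorted(p for p in common if p >= start):
        row = {"period": period}
        for name, series in series_by_name.items():
            row[name] = series[period]
        rows.append(row)
    return rows


# Hypothetical internal weekly sales and external quarterly index.
sales_weekly = {date(2024, 1, 1): 10.0, date(2024, 1, 8): 14.0, date(2024, 2, 5): 20.0}
index_quarterly = {(2024, 1): 101.5}

forecasting_data = integrate(
    {
        "sales": contract_to_monthly(sales_weekly),
        "index": expand_to_monthly(index_quarterly),
    },
    start=(2024, 1),
)
for row in forecasting_data:
    print(row)
```

Keeping only the periods that every series covers is a conservative choice; it avoids missing values in the forecasting data set at the cost of a shorter history.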


      The cost of maintaining and supporting the external data infrastructure depends on the subscription services cost, the cost of developing and maintaining an internal database, and the internal cost of corporate IT.

      The objective of this section is to outline possible ways to build an organizational infrastructure for data mining in forecasting in a business. We briefly discuss organizing model developers and forecasting users, selecting a proper work process, and integrating everything into the corporate IT environment.

      A key strategic business decision related to a forecasting organization is how much to invest in people who can develop forecasting models. The type and size of the forecasting development effort depend on the projected demand for forecasting projects in the organization. Other factors that have to be taken into account are as follows:

       the available internal personnel in corporate IT who can support forecasting models by managing the data, infrastructure, and operations

       the strategic commitment of key users for time and resources

       the available internal skills in the area of modeling, statistics, data mining, and forecasting

       the level of experience in carrying out forecasting projects

      Below we briefly discuss three ways to organize developers: (1) external consultant services, (2) distributed developers in organizations (key users of forecasting services), and (3) a centralized group of developers.

      External consultant services

      This is the minimum-investment solution for when you have low expected demand, no strategic commitment,
