Applied Data Mining for Forecasting Using SAS. Tim Rey


      Chapter 3: Data Mining for Forecasting Infrastructure

       3.1 Introduction

       3.2 Hardware Infrastructure

       3.2.1 Personal Computers Network Infrastructure

       3.2.2 Client/Server Infrastructure

       3.2.3 Cloud Computing Infrastructure

       3.3 Software Infrastructure

       3.3.1 Data Collection Software

       3.3.2 Data Preparation Software

       3.3.3 Data Mining Software

       3.3.4 Forecasting Software

       3.3.5 Software Selection Criteria

       3.4 Data Infrastructure

       3.4.1 Internal Data Infrastructure

       3.4.2 External Data Infrastructure

       3.5 Organizational Infrastructure

       3.5.1 Developers Infrastructure

       3.5.2 Users Infrastructure

       3.5.3 Work Process Implementation

       3.5.4 Integration with IT

      Applying data mining for forecasting in a business requires serious investment in hardware, software, and training, and it also requires a cultural change. It is very important to estimate the size of the investment based on the technical requirements and the products available on the market. The four main components of any forecasting infrastructure are hardware, software, data, and organization. The first three components form the technical basis that supports applied data mining for forecasting, and the fourth is critical to effectively changing the culture of the organization. This chapter focuses on an enterprise-wide implementation strategy for data mining for forecasting. The importance of integrating the selected options into the existing corporate infrastructure is discussed at the end of the chapter.

      The objective of this section is to give the reader a condensed overview of the potential hardware architectures for implementing data mining for forecasting systems in an industrial setting. Three options are discussed briefly below: (1) PC network, (2) client/server, and (3) cloud computing infrastructures. However, because technology changes rapidly, today's recommendations can easily become obsolete tomorrow.

      The least expensive hardware solution for implementing data mining for forecasting systems in an industrial setting is to avoid any additional hardware expenses and use the existing information system infrastructure. Usually, this is based on a PC network. The key advantages of this option are as follows:

       low cost

       easy integration in the existing information system infrastructure

       minimal installation and maintenance efforts

       robust performance due to the decentralized architecture

      The main limitations of the PC network infrastructure solution for implementing data mining for forecasting systems are as follows:

       limited capacity for processing large data sets

       slower processing speed relative to servers

       limited operating system options

      The client/server model assumes a division of the computing resources between clients or workstations with local processing capabilities and servers with large memory and disk space and more powerful processors. The clients request services such as data, and the servers retrieve resources and deliver the requested information. The number of servers required depends on the number of clients, network speed and capacity, global and local operation, reliability, and so on.
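      The request/response division described above can be illustrated with a minimal sketch in Python. It is not taken from the book and does not represent the SAS architecture: the data set name, its values, and the one-line text protocol are all hypothetical, chosen only to show a client asking for data and a server retrieving and delivering it.

```python
import json
import socket
import socketserver
import threading

# Hypothetical in-memory "corporate data" held on the server side.
# The name and values are illustrative assumptions, not real data.
DATA_SETS = {
    "monthly_sales": [120, 135, 150, 160],
}


class DataRequestHandler(socketserver.StreamRequestHandler):
    """Server side: read one request line, look up the data, send a reply."""

    def handle(self):
        name = self.rfile.readline().decode("utf-8").strip()
        if name in DATA_SETS:
            reply = {"status": "ok", "data": DATA_SETS[name]}
        else:
            reply = {"status": "not_found", "data": None}
        self.wfile.write((json.dumps(reply) + "\n").encode("utf-8"))


def request_data(host, port, name):
    """Client side: ask the server for a named data set and return the reply."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall((name + "\n").encode("utf-8"))
        with conn.makefile("r", encoding="utf-8") as reader:
            return json.loads(reader.readline())


def run_demo():
    """Start a local server, issue two client requests, then shut down."""
    # Port 0 lets the operating system pick a free port.
    with socketserver.ThreadingTCPServer(
        ("127.0.0.1", 0), DataRequestHandler
    ) as server:
        host, port = server.server_address
        worker = threading.Thread(target=server.serve_forever, daemon=True)
        worker.start()
        try:
            found = request_data(host, port, "monthly_sales")
            missing = request_data(host, port, "no_such_table")
        finally:
            server.shutdown()
        return found, missing


if __name__ == "__main__":
    print(run_demo())
```

      In this sketch the server handles each connection in its own thread, so several clients could be served at once. A production deployment would add authentication, error handling, and a real data store behind the server.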

      An example of a minimal client/server infrastructure based on SAS is shown in Figure 3.1. The example includes four types of servers and two types of clients: modeler PCs and final user PCs. One server is allocated to handle metadata. A data mart server, based on Oracle, interacts with the large database cluster that contains the corporate data. The third server runs the SAS server and is devoted to computationally intensive tasks. Several clients can share the server resources, either to develop new models or to run developed models as stored processes.

      The key advantages of the client/server infrastructure for implementing data mining for forecasting are given below:

       very powerful processing capabilities

       large memory and high-throughput disk

       the use of different operating systems

       capacity to process large data sets


      The disadvantages of this option are as follows:

       high cost

       more complex maintenance and support

       lower reliability if servers are
