Applied Data Mining for Forecasting Using SAS. Tim Rey
Chapter 3: Data Mining for Forecasting Infrastructure
3.2.1 Personal Computers Network Infrastructure
3.2.2 Client/Server Infrastructure
3.2.3 Cloud Computing Infrastructure
3.3.1 Data Collection Software
3.3.2 Data Preparation Software
3.3.5 Software Selection Criteria
3.4.1 Internal Data Infrastructure
3.4.2 External Data Infrastructure
3.5 Organizational Infrastructure
3.5.1 Developers Infrastructure
3.5.3 Work Process Implementation
3.1 Introduction
Applying data mining for forecasting in a business requires serious investment in hardware, software, and training, as well as a cultural change. The size of the investment should be estimated from the technical requirements and the products available on the market. The four main components of any forecasting infrastructure are hardware, software, data, and organization. The first three form the technical basis for applied data mining for forecasting; the fourth is critical to changing the culture of the organization effectively. This chapter focuses on an enterprise-wide implementation strategy for data mining for forecasting. The importance of integrating the selected options into the existing corporate infrastructure is discussed at the end of the chapter.
3.2 Hardware Infrastructure
The objective of this section is to give the reader a condensed overview of the potential hardware architectures for implementing data mining for forecasting systems in an industrial setting. Three options are discussed briefly below: (1) PC network, (2) client/server, and (3) cloud computing infrastructures. However, because technology changes rapidly, today's recommendations can easily become obsolete tomorrow.
3.2.1 Personal Computers Network Infrastructure
The least expensive hardware solution for implementing data mining for forecasting systems in an industrial setting is to avoid any additional hardware expenses and use the existing information system infrastructure. Usually, this is based on a PC network. The key advantages of this option are as follows:
low cost
easy integration in the existing information system infrastructure
minimal installation and maintenance efforts
robust performance due to the decentralized architecture
The main limitations of the PC network infrastructure solution for implementing data mining for forecasting systems are as follows:
limitations for large data set processing
slower processing speed relative to servers
limited operating system options
3.2.2 Client/Server Infrastructure
The client/server model assumes a division of the computing resources between clients or workstations with local processing capabilities and servers with large memory and disk space and more powerful processors. The clients request services such as data, and the servers retrieve resources and deliver the requested information. The number of servers required depends on the number of clients, network speed and capacity, global and local operation, reliability, and so on.
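The request/response division described above can be illustrated with a minimal, in-process Python sketch. It is not a networked implementation and is not taken from the book: the class names, the `monthly_sales` table, and its values are hypothetical, chosen only to show a server that retrieves data and clients that do light local processing.

```python
from dataclasses import dataclass, field


@dataclass
class DataServer:
    """Server side: owns the data and does the retrieval work."""
    tables: dict = field(default_factory=dict)

    def handle(self, request: dict) -> dict:
        # Look up the requested resource and return it,
        # or an error response if it does not exist.
        name = request.get("table")
        if name not in self.tables:
            return {"status": "error", "reason": f"unknown table: {name}"}
        return {"status": "ok", "rows": list(self.tables[name])}


@dataclass
class Client:
    """Client side: sends requests and processes the answer locally."""
    server: DataServer

    def mean_of(self, table: str) -> float:
        response = self.server.handle({"table": table})
        if response["status"] != "ok":
            raise LookupError(response["reason"])
        rows = response["rows"]
        if not rows:
            raise ValueError(f"table {table!r} is empty")
        # Local processing on the client: a simple average.
        return sum(rows) / len(rows)


# Hypothetical data; two clients share one server.
server = DataServer(tables={"monthly_sales": [120.0, 135.0, 150.0]})
client_a = Client(server)
client_b = Client(server)

print(client_a.mean_of("monthly_sales"))
```

In a real deployment the call to `handle` would travel over the network, which is why network speed and capacity appear among the factors that determine how many servers are needed.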
An example of a minimal client/server infrastructure based on SAS is shown in Figure 3.1. The example includes four types of servers and two types of clients—modeler PC and final user PC. One server is allocated to handle metadata. A data mart server, based on Oracle, interacts with the large database cluster containing the corporate data. The third server includes the SAS server and is devoted to intensive computing tasks. Several clients can share the server resources either for developing new models or running developed models as stored processes.
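As a rough sketch, the roles named in this example can be written down as a small Python data structure. All identifiers are hypothetical, only the components explicitly described in the text are listed, and the mapping of client types to tasks is an assumption made for illustration (the text says only that clients either develop new models or run developed models as stored processes).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str   # hypothetical identifier
    role: str   # role as described in the text


# Server-side components described in the example.
SERVERS = [
    Node("metadata_server", "handles metadata"),
    Node("data_mart_server", "Oracle-based data mart in front of the corporate database cluster"),
    Node("sas_server", "intensive computing tasks"),
]

# Assumed mapping: modelers develop and run models, final users only run them.
CLIENT_TASKS = {
    "modeler_pc": {"develop_model", "run_stored_process"},
    "final_user_pc": {"run_stored_process"},
}


def can_run(client: str, task: str) -> bool:
    """Return True if the given client type is allowed to perform the task."""
    return task in CLIENT_TASKS.get(client, set())


for node in SERVERS:
    print(f"{node.name}: {node.role}")
print(can_run("final_user_pc", "develop_model"))
```

Keeping the topology in one declarative structure makes it easy to see which clients share the compute server and for what purpose.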
The key advantages of the client/server infrastructure for implementing data mining for forecasting are given below:
very powerful processing capabilities
large memory and high-throughput disk
the use of different operating systems
capacity to process large data sets
Figure 3.1: An example of client/server infrastructure based on SAS
The disadvantages of this option are as follows:
high cost
more complex maintenance and support
lower reliability if servers are