Profit Driven Business Analytics. Baesens Bart
In the analytics step, an analytical model will be estimated on the preprocessed and transformed data. Depending on the business objective and the exact task at hand, a particular analytical technique will be selected and implemented by the data scientist. In Table 1.1, an overview was provided of various tasks and types of analytics. Alternatively, one may consider the various types of analytics listed in Table 1.1 to be the basic building blocks or solution components that a data scientist employs to solve the problem at hand. In other words, the business problem needs to be reformulated in terms of the available tools enumerated in Table 1.1.
Finally, once the results are obtained, they will be interpreted and evaluated by the business experts. Results may be clusters, rules, patterns, or relations, among others, all of which will be called analytical models resulting from applying analytics. Trivial patterns that the analytical model may detect (e.g., an association rule stating that spaghetti and spaghetti sauce are often purchased together) are interesting because they help to validate the model. The key issue, however, is to find the unknown yet interesting and actionable patterns (sometimes also referred to as knowledge diamonds) that provide new insights into the data and can then be translated into new profit opportunities. Before putting the resulting model or patterns into operation, an important evaluation step is to consider the actual returns or profits that will be generated, and to compare these to a relevant base scenario such as a do-nothing decision or a change-nothing decision. The next section provides an overview of various evaluation criteria for validating analytical models.
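The comparison against a base scenario can be illustrated with a minimal Python sketch. The `Scenario` class, the `incremental_profit` helper, and all monetary figures are hypothetical and serve only to show the arithmetic; they are not taken from the book.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """A decision scenario with its (hypothetical) revenue and cost."""
    name: str
    revenue: float
    cost: float

    @property
    def profit(self) -> float:
        # Profit is simply revenue minus cost for the scenario.
        return self.revenue - self.cost


def incremental_profit(candidate: Scenario, baseline: Scenario) -> float:
    """Profit of the candidate scenario over and above the base scenario."""
    return candidate.profit - baseline.profit


# Illustrative figures only: a do-nothing base scenario versus deploying the model.
do_nothing = Scenario("do nothing", revenue=100_000.0, cost=60_000.0)
with_model = Scenario("deploy model", revenue=130_000.0, cost=75_000.0)

gain = incremental_profit(with_model, do_nothing)
print(f"Incremental profit versus '{do_nothing.name}': {gain:,.0f}")
```

In this sketch, the model is worth deploying only if `gain` is positive, that is, if it beats the base scenario after accounting for its own costs.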
Once the analytical model has been appropriately validated and approved, it can be put into production as an analytics application (e.g., decision support system, scoring engine). Important considerations here are how to represent the model output in a user-friendly way, how to integrate it with other applications (e.g., marketing campaign management tools, risk engines), and how to make sure the analytical model can be appropriately monitored and backtested on an ongoing basis.
It is important to note that the process model outlined in Figure 1.1 is iterative in nature, in the sense that one may have to return to previous steps during the exercise. For instance, during the analytics step, a need for additional data may be identified that will necessitate additional data selection, cleaning, and transformation. The most time-consuming step is typically the data selection and preprocessing step, which usually takes around 80% of the total effort needed to build an analytical model.
ANALYTICAL MODEL EVALUATION
Before adopting an analytical model and making operational decisions based on the obtained clusters, rules, patterns, relations, or predictions, the model needs to be thoroughly evaluated. Depending on the exact type of output, the setting or business environment, and the particular usage characteristics, different aspects may need to be assessed during evaluation in order to ensure the model is acceptable for implementation.
A number of key characteristics of successful analytical models are defined and explained in Table 1.7. These broadly defined evaluation criteria may or may not apply, depending on the exact application setting, and will have to be further specified in practice.
Table 1.7 Key Characteristics of Successful Business Analytics Models
Various challenges may occur when developing and implementing analytical models, possibly leading to difficulties in meeting the objectives expressed by the key characteristics of successful analytical models listed in Table 1.7. One such challenge concerns the dynamic nature of the relations or patterns retrieved from the data, which affects the usability and lifetime of the model. For instance, in a fraud detection setting, fraudsters constantly try to outsmart detection and prevention systems by developing new strategies and methods (Baesens et al. 2015). Therefore, adaptive analytical models and detection and prevention systems are required in order to detect and resolve fraud as soon as possible. Closely monitoring the performance of the model in such a setting is an absolute must.
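As a rough illustration of such monitoring, the sketch below flags periods in which a model's detection rate falls noticeably below a reference level. The function name `periods_needing_review`, the period labels, the rates, and the tolerance are all assumptions chosen for illustration; a real monitoring setup would use whichever performance measure and alerting rule suit the application.

```python
def periods_needing_review(rates, reference, tolerance):
    """Return the periods whose detection rate fell below reference - tolerance.

    rates: dict mapping a period label to the observed detection rate
    reference: detection rate measured when the model was validated
    tolerance: acceptable drop before the model is flagged for review
    """
    floor = reference - tolerance
    return [period for period, rate in rates.items() if rate < floor]


# Hypothetical monthly detection rates of a fraud model after deployment.
observed = {"month 1": 0.82, "month 2": 0.80, "month 3": 0.65}

flagged = periods_needing_review(observed, reference=0.80, tolerance=0.05)
print("Periods to investigate:", flagged)
```

A drop such as the one in the third period would be a signal to investigate whether the underlying patterns have changed and whether the model needs to be re-estimated.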
Another common challenge in a binary classification setting, such as predicting customer churn, concerns the imbalanced class distribution, meaning that one class or type of entity is much more prevalent than the other. When developing a customer churn prediction model, the historical dataset typically contains many more nonchurners than churners. Furthermore, the costs and benefits related to detecting or missing either class are often strongly imbalanced and may need to be accounted for to optimize decision making in the particular business context. In this book, various approaches are discussed for dealing with these specific challenges. Other issues may arise as well, often requiring ingenuity and creativity to be solved. Hence, both are key characteristics of a good data scientist, as is discussed in the following section.
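To make the two kinds of imbalance concrete, the following minimal sketch measures the share of churners in a small dataset and then picks the score cutoff that yields the highest campaign profit when the benefit of targeting a true churner differs from the cost of targeting a nonchurner. The labels, scores, benefit, cost, and function names are all hypothetical, and this is only one simple way to account for unequal costs and benefits, not necessarily the approach developed in the book.

```python
def class_share(labels):
    """Fraction of positive (churner) cases in the dataset."""
    return sum(labels) / len(labels)


def campaign_profit(labels, scores, threshold, benefit_tp, cost_fp):
    """Profit of targeting every customer whose score is at or above the threshold."""
    profit = 0.0
    for is_churner, score in zip(labels, scores):
        if score >= threshold:
            # A targeted churner yields a benefit; a targeted nonchurner a cost.
            profit += benefit_tp if is_churner == 1 else -cost_fp
    return profit


def best_threshold(labels, scores, candidates, benefit_tp, cost_fp):
    """Candidate threshold with the highest profit on the given data."""
    return max(
        candidates,
        key=lambda t: campaign_profit(labels, scores, t, benefit_tp, cost_fp),
    )


# Hypothetical data: 8 nonchurners (0) and 2 churners (1) with model scores.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.45, 0.60, 0.40, 0.80]

# Assumed economics: retaining a churner is worth 50, a wasted offer costs 10.
chosen = best_threshold(labels, scores, [0.3, 0.5, 0.7], benefit_tp=50.0, cost_fp=10.0)

print("Share of churners:", class_share(labels))
print("Most profitable threshold:", chosen)
```

With these assumed figures, the lowest cutoff is the most profitable one even though it targets several nonchurners, because missing a churner is far more expensive than a wasted offer.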
ANALYTICS TEAM
The analytics process is essentially a multidisciplinary exercise where many different job profiles need to collaborate. First of all, there is the database or data warehouse administrator (DBA). The DBA ideally is aware of all the data available within the firm, the storage details and the data definitions. Hence, the DBA plays a crucial role in feeding the analytical modeling exercise with its key ingredient, which is data. Since analytics is an iterative exercise, the DBA may continue to play an important role as the modeling exercise proceeds.
Another very important profile is the business expert. This could, for instance, be a credit portfolio manager, brand manager, fraud investigator, or e-commerce manager. The business expert has extensive business experience and business common sense, which usually proves very valuable and crucial for success. It is precisely this knowledge that will help to steer the analytical modeling exercise and interpret its key findings. A key challenge here is that much of the expert knowledge is tacit and may be hard to elicit at the start of the modeling exercise.
Legal experts are gaining in importance since not all data can be used in an analytical model because of factors such as privacy and discrimination. For instance, in credit risk modeling, one typically cannot discriminate between good and bad customers based on gender, beliefs, ethnic origin, or religion. In Web analytics, information is typically gathered by means of cookies, which are files stored on the user's browsing computer. However, when gathering information using cookies, users should be appropriately informed. This is subject to regulation at various levels (regional, national, and supranational, e.g., at the European level). A key challenge here is that privacy and other regulatory issues vary highly depending on the geographical region. Hence, the legal expert should have good knowledge about which data can be used when, and which regulation applies in which location.
The software tool vendors should also be mentioned as an important part of the analytics team. Different types of tool vendors can be distinguished here. Some vendors only provide tools to automate specific steps of the analytical modeling process (e.g., data preprocessing). Others sell software that covers the entire analytical modeling process. Some vendors also provide analytics-based solutions for specific application areas, such as risk management, marketing analytics, or campaign management.
The data scientist, modeler, or analyst is the person responsible for doing the actual analytics. The data scientist