Profit Driven Business Analytics. Baesens Bart
1. To establish the relation or detect dependencies between characteristics or independent variables and an observed dependent target variable(s) or outcome value.
2. To estimate or predict the unobserved or future value of the target variable as a function of the independent variables.
For instance, in a medical setting, the purpose of analyzing data may be to establish the impact of smoking behavior on the life expectancy of an individual. A regression model may be estimated that explains the observed age at death of a number of subjects in terms of characteristics such as gender and the number of years that the subject smoked. Such a model establishes or quantifies the relation between each characteristic and the observed outcome, and allows testing the statistical significance of each effect and measuring the uncertainty of the result (Cao 2016; Peto, Whitlock, and Jha 2010).
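To make the explanatory use of regression concrete, the sketch below fits a simple linear regression of age at death on years smoked and reports the estimated effect together with its standard error and t-statistic. It is a minimal sketch under stated assumptions: the data are synthetic (generated with an assumed effect of -0.3 years of life per year smoked), only one characteristic is used instead of a multivariate model with gender as well, and the function name `fit_simple_ols` is hypothetical. It is not the analysis of the cited studies.

```python
import math
import random


def fit_simple_ols(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, se_b, t_b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual variance with n - 2 degrees of freedom (two estimated parameters)
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)
    se_b = math.sqrt(sigma2 / sxx)  # uncertainty of the estimated effect
    return a, b, se_b, b / se_b     # t-statistic tests H0: b = 0


# Hypothetical synthetic data: age at death drops 0.3 years per year smoked, plus noise
rng = random.Random(42)
years_smoked = [rng.uniform(0, 40) for _ in range(200)]
age_at_death = [80 - 0.3 * s + rng.gauss(0, 3) for s in years_smoked]

a, b, se_b, t_b = fit_simple_ols(years_smoked, age_at_death)
print(f"effect per year smoked: {b:.3f} (SE {se_b:.3f}, t = {t_b:.1f})")
```

The focus here is on the estimated coefficient and its uncertainty, not on how well individual ages are predicted.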
Estimating a regression model for software effort prediction, as introduced in Table 1.5, is clearly different. In such applications, where the aim is mainly to predict, we are essentially not interested in which drivers explain how much effort it will take to develop new software, although this may be a useful side result. Instead, we mainly wish to predict as accurately as possible the effort required to complete a project. Since the model's main use is to produce an estimate that allows cost projection and planning, what matters is the accuracy of the prediction and the size of the errors, rather than the exact relation between the effort and the characteristics of the project.
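Because a predictive model is judged on the size of its errors, a natural check is to compare its estimates with the actual effort on held-out projects. The sketch below computes two common error measures, mean absolute error (MAE) and root mean squared error (RMSE), for two hypothetical models; the project figures are invented for illustration.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: the average size of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: penalizes large individual errors more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical hold-out projects: actual effort in person-days
actual = [120, 80, 200, 150]
model_a = [110, 90, 190, 160]  # every estimate off by 10
model_b = [120, 80, 200, 110]  # three exact estimates, one off by 40

print(mae(actual, model_a), rmse(actual, model_a))  # 10.0 10.0
print(mae(actual, model_b), rmse(actual, model_b))  # 10.0 20.0
```

Both models have the same MAE, but RMSE flags the single large miss of the second model, which matters when large overruns are especially costly for planning.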
Typically, in a business setting, the aim is to predict in order to facilitate improved or automated decision making. As indicated for the case of software effort prediction, explaining may be useful as well, since it can yield valuable insights. For instance, the predictive model may reveal the exact impact on the required effort of including more or fewer senior and junior programmers in the project team, allowing the team composition to be optimized as a function of the project characteristics.
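The sketch below illustrates how such a model could be used to optimize team composition. It assumes a hypothetical fitted effort model with an interaction term (so that the effort saved per senior programmer grows with project size), hypothetical daily rates, and a fixed team of four; none of these figures come from the text. Enumerating the senior/junior splits and pricing the predicted effort shows that the cheapest composition can differ between a small and a large project.

```python
# Hypothetical fitted effort model in person-days; coefficients are illustrative only.
# The interaction term makes the saving per senior programmer grow with project size.
def predicted_effort(project_size, n_senior):
    return 20 + 2.0 * project_size - 0.35 * project_size * n_senior


SENIOR_RATE, JUNIOR_RATE = 800, 400  # hypothetical daily rates
TEAM_SIZE = 4                        # hypothetical fixed team size


def best_team(project_size):
    """Return (predicted cost, seniors, juniors) for the cheapest split of the team.

    Assumes the predicted person-days are spread evenly over the team members,
    so cost = effort * blended (average) daily rate.
    """
    options = []
    for n_senior in range(TEAM_SIZE + 1):
        n_junior = TEAM_SIZE - n_senior
        blended_rate = (SENIOR_RATE * n_senior + JUNIOR_RATE * n_junior) / TEAM_SIZE
        cost = predicted_effort(project_size, n_senior) * blended_rate
        options.append((cost, n_senior, n_junior))
    return min(options)


print(best_team(20))   # (24000.0, 0, 4): small project, all juniors
print(best_team(400))  # (208000.0, 4, 0): large project, all seniors
```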
In this book, several versatile and powerful profit-driven approaches are discussed. These approaches facilitate the adoption of a value-centric business perspective toward analytics in order to boost the returns. Table 1.6 provides an overview of the structure of the book. First, we lay the foundation by providing a general introduction to analytics in Chapter 2, and by discussing the most important and popular business applications in detail in Chapter 3.
Table 1.6 Outline of the Book
Chapter 4 discusses approaches to uplift modeling, which in essence is about distilling or estimating the net effect of a decision and then contrasting the expected results of alternative scenarios. This allows, for instance, marketing efforts to be optimized by customizing the contact channel and the format of the incentive so that the returns generated by the campaign are maximized. Standard analytical approaches may be adopted to develop uplift models. However, specialized approaches tuned to the particular characteristics of the uplift modeling problem have also been developed, and these are discussed in Chapter 4 as well.
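A very simple way to estimate uplift, shown here purely as an illustration, is to compare the response rate of a treated group with that of a control group within each customer segment. The sketch below does this for two hypothetical contact channels and picks, per segment, the action with the highest expected net return per customer. All counts, the margin per response, and the contact costs are invented, and the function `best_action` is hypothetical; dedicated uplift modeling approaches go well beyond this segment-level comparison.

```python
# Hypothetical aggregated campaign results per segment: (customers, responders)
results = {
    "young":  {"control": (1000, 20), "email": (1000, 50), "phone": (1000, 60)},
    "middle": {"control": (1000, 40), "email": (1000, 42), "phone": (1000, 60)},
    "senior": {"control": (1000, 30), "email": (1000, 32), "phone": (1000, 90)},
}
MARGIN_PER_RESPONSE = 100.0                  # hypothetical profit per extra response
CONTACT_COST = {"email": 0.5, "phone": 5.0}  # hypothetical cost per customer contacted


def best_action(segment):
    """Return (action, expected net value per customer): best channel, or no contact."""
    n_control, r_control = results[segment]["control"]
    base_rate = r_control / n_control  # response rate without any treatment
    best = ("no contact", 0.0)
    for channel, (n_treated, r_treated) in results[segment].items():
        if channel == "control":
            continue
        uplift = r_treated / n_treated - base_rate  # net effect of this treatment
        net_value = uplift * MARGIN_PER_RESPONSE - CONTACT_COST[channel]
        if net_value > best[1]:
            best = (channel, net_value)
    return best


for segment in results:
    action, value = best_action(segment)
    print(f"{segment}: {action} ({value:.2f} per customer)")
```

Note that the "middle" segment is best left alone: neither channel raises the response rate enough to cover its contact cost, even though the phone channel has the highest raw response rate.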
In this sense, Chapter 4 forms a bridge to Chapter 5, which concentrates on various advanced analytical approaches for developing profit-driven models, that is, approaches that allow us to account for profit when learning or applying a predictive or descriptive model. Profit-driven predictive analytics for classification and regression are discussed in the first part of Chapter 5, whereas the second part focuses on descriptive analytics and introduces profit-oriented segmentation and association analysis.
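One simple way of accounting for profit when applying a classification model, given here only as an illustration of the idea, is to choose the cut-off on the model's scores that maximizes profit on validation data rather than using a default cut-off such as 0.5. The sketch below assumes a hypothetical benefit per correctly targeted positive case and a fixed cost per contacted case; the scores and labels are invented.

```python
# Hypothetical validation data: model scores and actual outcomes (1 = positive)
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
BENEFIT_TP = 50.0    # hypothetical benefit of targeting a true positive
COST_CONTACT = 10.0  # hypothetical cost of targeting any case


def profit_at_threshold(scores, labels, threshold, benefit_tp, cost_contact):
    """Profit of targeting every case whose score is at or above the threshold."""
    profit = 0.0
    for s, y in zip(scores, labels):
        if s >= threshold:
            profit += (benefit_tp if y == 1 else 0.0) - cost_contact
    return profit


def best_threshold(scores, labels, benefit_tp, cost_contact):
    """Try each observed score as cut-off and keep the most profitable one."""
    candidates = sorted(set(scores))
    return max(candidates,
               key=lambda t: profit_at_threshold(scores, labels, t, benefit_tp, cost_contact))


t = best_threshold(scores, labels, BENEFIT_TP, COST_CONTACT)
print(t, profit_at_threshold(scores, labels, t, BENEFIT_TP, COST_CONTACT))  # 0.2 130.0
```

On these invented data the profit-maximizing cut-off is 0.2, well below 0.5, because each detected positive case is worth five times the cost of a contact.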
Chapter 6 subsequently focuses on approaches tuned toward a business-oriented evaluation of predictive models, for example, in terms of profit. Note that traditional statistical measures, when applied to customer churn prediction models, for instance, do not differentiate among incorrectly predicted or classified customers, whereas from a business point of view it clearly makes sense to account for the value of the customers when evaluating a model. For instance, failing to detect a high-value customer who is about to churn represents a higher loss or cost than failing to detect a low-value customer who is about to churn. Both errors, however, are weighted equally by non-business and, more specifically, non-profit-oriented evaluation measures. Chapters 4 and 6 both allow using standard analytical approaches, as discussed in Chapter 2, with the aim of maximizing profitability by adopting, respectively, a profit-centric setup or a profit-driven evaluation. The particular business application of the model will prove to be an important factor to account for in maximizing profitability.
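The difference between a statistical and a value-based evaluation can be shown with a small example. The two hypothetical churn models below make the same number of errors and thus have the same accuracy, but one of them misses a high-value churner. Weighting each undetected churner by customer value, as in the sketch, separates the two models. The customer values and predictions are invented, and a complete profit-based measure would also account for costs such as retention actions aimed at customers incorrectly flagged as churners.

```python
# Hypothetical hold-out customers: (actually churned?, customer value)
customers = [(1, 500.0), (1, 20.0), (0, 300.0), (0, 40.0), (1, 60.0), (0, 80.0)]
pred_a = [1, 0, 0, 0, 1, 0]  # misses the low-value churner
pred_b = [0, 1, 0, 0, 1, 0]  # misses the high-value churner


def accuracy(customers, preds):
    """Share of correctly classified customers; every error counts the same."""
    return sum(int(y == p) for (y, _), p in zip(customers, preds)) / len(preds)


def missed_churn_value(customers, preds):
    """Total value of churners the model failed to detect (false negatives)."""
    return sum(v for (y, v), p in zip(customers, preds) if y == 1 and p == 0)


for name, preds in [("A", pred_a), ("B", pred_b)]:
    print(name, round(accuracy(customers, preds), 3), missed_churn_value(customers, preds))
# A 0.833 20.0
# B 0.833 500.0
```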
Finally, Chapter 7 concludes the book by adopting a broader perspective toward the use of analytics in an organization by looking into the economic impact, as well as by zooming into some practical concerns related to the development, implementation, and operation of analytics within an organization.
ANALYTICS PROCESS MODEL
Figure 1.1 provides a high-level overview of the analytics process model (Hand, Mannila, and Smyth 2001; Tan, Steinbach, and Kumar 2005; Han and Kamber 2011; Baesens 2014). This model defines the subsequent steps in the development, implementation, and operation of analytics within an organization.
Figure 1.1 The analytics process model.
(Baesens 2014)
As a first step, the business problem to be addressed must be thoroughly defined. The objective of applying analytics needs to be unambiguously defined. Some examples are customer segmentation of a mortgage portfolio, retention modeling for a postpaid telco subscription, or fraud detection for credit cards. Defining the perimeter of the analytical modeling exercise requires close collaboration between the data scientists and business experts. Both parties need to agree on a set of key concepts, which may include how a customer, transaction, churn, or fraud is defined. Although this may seem self-evident, it is a crucial success factor to make sure that all involved stakeholders share a common understanding of the goal and agree on these key concepts.
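Why agreeing on such definitions matters can be seen by writing one down explicitly. The sketch below encodes a hypothetical churn definition (no activity for at least a given number of days before a reference date) and shows that changing the inactivity window from 30 to 60 days changes which customers are labeled as churners. The customer IDs, dates, and windows are invented; an actual definition, for a postpaid subscription for example, would be agreed on with the business experts.

```python
from datetime import date

# Hypothetical last-activity dates per customer
last_activity = {
    "c1": date(2024, 1, 5),
    "c2": date(2024, 2, 20),
    "c3": date(2024, 3, 25),
}
REFERENCE_DATE = date(2024, 3, 31)


def churners(inactivity_days):
    """Hypothetical churn definition: inactive for at least `inactivity_days` days."""
    return sorted(
        cid for cid, last in last_activity.items()
        if (REFERENCE_DATE - last).days >= inactivity_days
    )


print(churners(30))  # ['c1', 'c2']
print(churners(60))  # ['c1']
```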
Next, all source data that could be of potential interest need to be identified. This is a very important step, as data are the key ingredient of any analytical exercise and the selection of data will have a decisive impact on the analytical models built in a subsequent step. The golden rule here is: the more data, the better! The analytical model itself will later decide which data are relevant and which are not for the task at hand. All data will then be gathered and consolidated in a staging area, which could be, for example, a data warehouse, a data mart, or even a simple spreadsheet file. Some basic exploratory data analysis can then be performed, for instance using OLAP facilities for multidimensional analysis (e.g., roll-up, drill-down, slicing and dicing). This will be followed by a