Profit Driven Business Analytics. Baesens Bart
To achieve the stated objectives, we have chosen to adopt a pragmatic approach in explaining techniques and concepts. We do not focus on providing extensive mathematical proofs or detailed algorithms. Instead, we pinpoint the crucial insights and underlying reasoning, as well as the advantages and disadvantages, related to the practical use of the discussed approaches in a business setting. For this, we ground our discourse in solid academic research expertise as well as in many years of practical experience in carrying out industrial analytics projects in close collaboration with data science professionals. Throughout the book, a plethora of illustrative examples and case studies are discussed. Example datasets, code, and implementations are provided on the book's companion website, www.profit-analytics.com, to further support the adoption of the discussed approaches.
In this chapter, we first introduce business analytics. Next, we present the profit-driven perspective toward business analytics that is elaborated in this book. We then outline the subsequent chapters of this book and explain how the approaches they introduce allow us to adopt a value-centric approach for maximizing profitability and, as such, to increase the return on investment of big data and analytics. After that, the analytics process model is discussed, detailing the successive steps in carrying out an analytics project within an organization. Finally, the chapter concludes by characterizing the ideal profile of a business data scientist.
"Data is the new oil" is a popular quote that pinpoints the increasing value of data and, to our liking, accurately characterizes data as a raw material. Data are to be seen as an input or basic resource that needs further processing before actually being of use. In a subsequent section of this chapter, we introduce the analytics process model, which describes the iterative chain of processing steps involved in turning data into information or decisions and which is actually quite similar to an oil refinery process. Note the subtle but significant difference between the words "data" and "information" in the sentence above. Whereas data can fundamentally be defined as a sequence of zeroes and ones, information is essentially the same but additionally implies a certain utility or value to the end user or recipient. So, whether data are information depends on whether the data have utility to the recipient. Typically, for raw data to become information, the data first need to be processed, aggregated, summarized, and compared. In summary, data typically need to be analyzed, and insight, understanding, or knowledge should be added for data to become useful.
Applying basic operations on a dataset may already provide useful insight and support the end user or recipient in decision making. These basic operations mainly involve selection and aggregation. Both selection and aggregation may be performed in many ways, leading to a multitude of indicators or statistics that can be distilled from raw data. The following illustration elaborates a number of sales indicators in a retail setting.
Providing insight by customized reporting is exactly what the field of business intelligence (BI) is about. Typically, visualizations are also adopted to represent indicators and their evolution over time in easy-to-interpret ways. Visualizations provide support by facilitating the user's ability to acquire understanding and insight in the blink of an eye. Personalized dashboards, for instance, are widely adopted in industry and are very popular with managers to monitor and keep track of business performance. A formal definition of business intelligence is provided by Gartner (http://www.gartner.com/it-glossary).
Example
For managerial purposes, a retailer requires the development of real-time sales reports. Such a report may include a wide variety of indicators that summarize raw sales data. Raw sales data are in fact transactional data that can be extracted from the online transaction processing (OLTP) system operated by the retailer. Some example indicators, together with the selection and aggregation operations required to calculate them, are:
◼ Total revenue generated over the last 24 hours: Select all transactions over the last 24 hours and sum the paid amounts, with paid meaning the price net of promotional offers.
◼ Average paid amount in the online store over the last seven days: Select all online transactions over the last seven days and calculate the average paid amount.
◼ Fraction of returning customers within one month: Select all transactions over the last month and select the customer IDs that appear more than once; count these IDs and divide by the number of distinct customer IDs in that month.
Note that calculating these indicators involves basic selection operations on characteristics or dimensions of the transactions stored in the database, as well as basic aggregation operations such as sum, count, and average, among others.
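The selection and aggregation operations in the retail example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the book's implementation: the `Transaction` record and its fields (`customer_id`, `timestamp`, `channel`, `paid`) are hypothetical stand-ins for rows extracted from the retailer's OLTP system, the function names are invented for illustration, and "within one month" is approximated as the last 30 days.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Transaction:
    # Hypothetical record layout for one row extracted from the OLTP system.
    customer_id: str
    timestamp: datetime
    channel: str  # assumed values: "online" or "store"
    paid: float   # price net of promotional offers


def select(transactions, since, channel=None):
    """Selection: keep transactions at or after `since`, optionally for one channel."""
    return [t for t in transactions
            if t.timestamp >= since and (channel is None or t.channel == channel)]


def total_revenue_24h(transactions, now):
    # Selection (last 24 hours) followed by aggregation (sum of paid amounts).
    recent = select(transactions, now - timedelta(hours=24))
    return sum(t.paid for t in recent)


def average_online_paid_7d(transactions, now):
    # Selection (online channel, last 7 days) followed by aggregation (average).
    recent = select(transactions, now - timedelta(days=7), channel="online")
    return sum(t.paid for t in recent) / len(recent) if recent else 0.0


def returning_customer_fraction_30d(transactions, now):
    # Selection (last 30 days, assumed to approximate "one month"),
    # then aggregation: count transactions per customer ID.
    recent = select(transactions, now - timedelta(days=30))
    per_customer = Counter(t.customer_id for t in recent)
    if not per_customer:
        return 0.0
    returning = sum(1 for n in per_customer.values() if n > 1)
    # Repeat customers divided by all distinct customers in the window.
    return returning / len(per_customer)


if __name__ == "__main__":
    now = datetime(2024, 1, 31, 12, 0)
    sample = [  # hypothetical transactions
        Transaction("c1", now - timedelta(hours=2), "online", 40.0),
        Transaction("c2", now - timedelta(hours=5), "store", 25.0),
        Transaction("c1", now - timedelta(days=3), "online", 60.0),
        Transaction("c3", now - timedelta(days=10), "store", 15.0),
    ]
    print(total_revenue_24h(sample, now))                           # 65.0
    print(average_online_paid_7d(sample, now))                      # 50.0
    print(round(returning_customer_fraction_30d(sample, now), 3))   # 0.333
```

Each indicator is a selection (a filter on time window and, optionally, channel) followed by an aggregation (sum, average, or count); the returning-customer fraction divides the number of repeat customer IDs by the number of distinct IDs in the window.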
"Business intelligence is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance."
Note that this definition explicitly mentions the required infrastructure and best practices as essential components of BI, which are typically also provided as part of the package or solution offered by BI vendors and consultants. More advanced analysis of data may further support users and optimize decision making. This is exactly where analytics comes into play. Analytics is a catch-all term covering a wide variety of what are essentially data-processing techniques. In its broadest sense, analytics strongly overlaps with data science, statistics, and related fields such as artificial intelligence (AI) and machine learning. To us, analytics is a toolbox containing a variety of instruments and methodologies that allow users to analyze data for a diverse range of well-specified purposes. Table 1.1 identifies a number of categories of analytical tools that cover diverse intended uses or, in other words, allow users to complete a diverse range of tasks.
Table 1.1 Categories of Analytics from a Task-Oriented Perspective
A first main group of tasks identified in Table 1.1 concerns prediction. Based on observed variables, the aim is to accurately estimate or predict an unobserved value. The applicable subtype of predictive analytics depends on the type of target variable, which we intend to model as a function of a set of predictor variables. When the target variable is categorical in nature, meaning the variable can only take a limited number of possible values (e.g., churner or not, fraudster or not, defaulter or not), we have a classification problem. When the task concerns the estimation of a continuous target variable (e.g., sales amount, customer lifetime value, credit loss), which can take any value over a certain range, we are dealing with regression. Survival analysis and forecasting explicitly account for the time dimension by predicting either the timing of events (e.g., churn, fraud, default) or the evolution of a target variable over time (e.g., churn rates, fraud rates, default rates). Table 1.2 provides simplified example datasets and analytical models for each type of predictive analytics for illustrative purposes.
Table 1.2 Example Datasets and Predictive Analytical Models
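To make the distinction between classification and regression concrete, the sketch below fits one model of each type on tiny datasets with a single predictor: ordinary least squares for a continuous target (a sales amount) and logistic regression, trained by plain batch gradient descent, for a binary target (churner or not). It is a minimal illustration only; the datasets, function names, learning rate, and number of epochs are hypothetical assumptions and are not the datasets or models of Table 1.2.

```python
import math


def fit_linear_regression(xs, ys):
    """Regression: ordinary least squares for y = b0 + b1 * x (continuous target)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(xs, ys, learning_rate=0.1, epochs=5000):
    """Classification: P(y = 1 | x) = sigmoid(b0 + b1 * x), fitted by
    batch gradient descent on the average log-loss (categorical 0/1 target)."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad0 = grad1 = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(b0 + b1 * x) - y  # predicted probability minus label
            grad0 += error
            grad1 += error * x
        b0 -= learning_rate * grad0 / n
        b1 -= learning_rate * grad1 / n
    return b0, b1


# Hypothetical regression data: x = advertising spend, y = sales amount.
spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_linear_regression(spend, sales)
print(f"fitted line: sales = {b0:.2f} + {b1:.2f} * spend")  # 1.00 and 2.00

# Hypothetical classification data: x = number of complaints, y = churned (1) or not (0).
complaints = [0, 1, 2, 3, 4, 5]
churned = [0, 0, 0, 1, 1, 1]
c0, c1 = fit_logistic_regression(complaints, churned)
print(f"estimated P(churn | 4 complaints) = {sigmoid(c0 + c1 * 4):.2f}")
```

The predictor stays the same in both cases; only the nature of the target variable, continuous versus categorical, determines whether the task is regression or classification.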
The second main group of analytics comprises descriptive analytics, which, rather than predicting a target variable, aim at identifying specific types of patterns. Clustering or segmentation aims at grouping entities (e.g., customers, transactions, employees) that are similar in nature. The objective of association analysis is to find groups of events that frequently co-occur and therefore appear to be associated. The basic observations that are being analyzed in this problem setting consist of variable groups of events; for instance, transactions involving various products that are being