Profit Driven Business Analytics. Baesens Bart
Whereas a previous section discussed the characteristics of a good analytical model, this section elaborates on the key characteristics of a good data scientist from the perspective of the hiring manager. The discussion is based on our consulting and research experience, having collaborated with many companies worldwide on the topic of big data and analytics.
A Data Scientist Should Have Solid Quantitative Skills
Obviously, a data scientist should have a thorough background in statistics, machine learning, and/or data mining. The distinction between these disciplines is becoming more and more blurred and is actually no longer that relevant. They all provide a set of quantitative techniques to analyze data and find business-relevant patterns within a particular context such as fraud detection or credit risk management. A data scientist should be aware of which technique can be applied, when, and how, and should not focus too much on the underlying mathematical (e.g., optimization) details but, rather, have a good understanding of what analytical problem a technique solves and how its results should be interpreted. In this context, the education of engineers in computer science and/or business/industrial engineering should aim at an integrated, multidisciplinary view, with graduates trained both in the use of the techniques and in the business acumen necessary to bring new endeavors to fruition. It is also important to spend enough time validating the analytical results obtained, so as to avoid situations often referred to as data massage and/or data torture, whereby data are (intentionally) misrepresented and/or too much time is spent discussing spurious correlations. When selecting the optimal quantitative technique, the data scientist should consider the specificities of the context and the business problem at hand. Key requirements for business models have been discussed in the previous section, and the data scientist should have a basic understanding of, and intuition for, all of those. Based on a combination of these requirements, the data scientist should be capable of selecting the best analytical technique to solve the particular business problem.
A Data Scientist Should Be a Good Programmer
By definition, data scientists work with data. This involves plenty of activities such as sampling and preprocessing of data, model estimation, and post-processing (e.g., sensitivity analysis, model deployment, backtesting, model validation). Although many user-friendly software tools are on the market nowadays to automate and support these tasks, every analytical exercise requires tailored steps to tackle the specificities of a particular business problem and setting. Performing these steps successfully requires programming. Hence, a good data scientist should possess sound programming skills in, for example, SAS, R, or Python, among others. The programming language itself is not that important, as long as the data scientist is familiar with the basic concepts of programming and knows how to use these to automate repetitive tasks or perform specific routines.
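To illustrate what automating a repetitive task can look like, the following is a minimal Python sketch, not taken from the book. It applies one imputation routine to every column of a small, made-up dataset and then draws a reproducible train/test sample. The function names (`impute_median`, `train_test_split`), the column names, and the values are all hypothetical and chosen for illustration only.

```python
import random
from statistics import median


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and split it into a train and a test part."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy data: column name -> raw values with gaps (None = missing)
columns = {
    "income": [2500, None, 3100, 4000, None],
    "age": [34, 51, None, 29, 42],
}

# The same routine is applied to every column instead of being repeated by hand.
cleaned = {name: impute_median(vals) for name, vals in columns.items()}

# Reassemble the columns into rows and sample a train and a test set.
rows = list(zip(cleaned["income"], cleaned["age"]))
train, test = train_test_split(rows)
```

The point of the sketch is the loop over columns: once a preprocessing step is written as a function, it can be reused across variables, datasets, and projects, whichever language is used.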
A Data Scientist Should Excel in Communication and Visualization Skills
Like it or not, analytics is a technical exercise. At this moment, there is a huge gap between the analytical models and the business users. To bridge this gap, communication and visualization facilities are key! Hence, a data scientist should know how to represent analytical models and their accompanying statistics and reports in user-friendly ways by using, for example, traffic light approaches, OLAP (online analytical processing) facilities, or if-then business rules, among others. A data scientist should be capable of communicating the right amount of information without getting lost in complex (e.g., statistical) details, since such details inhibit a model's successful deployment. By doing so, business users will better understand the characteristics and behavior in their (big) data, which will improve their attitude toward and acceptance of the resulting analytical models. Educational institutions must learn to balance theory and practice, since many academic degrees produce students who are skewed toward either too much analytical or too much practical knowledge.
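As an illustration of the traffic light approach and if-then business rules mentioned above, here is a minimal Python sketch, not taken from the book. It maps a model score to a color and a recommended action. The thresholds (0.10 and 0.30), the action texts, and the function names `traffic_light` and `explain` are assumptions made for this example; in practice they would be set together with the business users.

```python
def traffic_light(score, amber_from=0.10, red_from=0.30):
    """Map a model score (e.g., a predicted probability) to a traffic-light color.

    The thresholds are hypothetical defaults, not values from the book.
    """
    if score >= red_from:
        return "red"
    if score >= amber_from:
        return "amber"
    return "green"


def explain(customer_id, score):
    """Turn a score into a one-line, if-then style message for business users."""
    color = traffic_light(score)
    actions = {
        "green": "approve automatically",
        "amber": "send for manual review",
        "red": "decline or escalate",
    }
    return f"Customer {customer_id}: {color.upper()} (score {score:.0%}) -> {actions[color]}"


# Hypothetical scores produced by some analytical model
for customer_id, score in [("C-001", 0.05), ("C-002", 0.18), ("C-003", 0.42)]:
    print(explain(customer_id, score))
```

A business user sees only a color and an action, while the statistical detail behind the score stays available for those who need it.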
A Data Scientist Should Have a Solid Business Understanding
While this might seem obvious, we have witnessed (too) many data science projects that failed because the data scientist involved did not understand the business problem at hand. By business we refer to the respective application area. Several examples of such application areas have been introduced in Table 1.5. Each of those fields has its own particularities that are important for a data scientist to know and understand in order to be able to design and implement a customized solution. The more aligned the solution is with the environment, the better its performance will be, as evaluated according to each of the dimensions or criteria discussed in Table 1.7.
A Data Scientist Should Be Creative!
A data scientist needs creativity on at least two levels. First, on a technical level, it is important to be creative with regard to feature selection, data transformation, and cleaning. These steps of the standard analytics process have to be adapted to each particular application, and often the right guess can make a big difference. Second, big data and analytics is a fast-evolving field. New problems, technologies, and corresponding challenges pop up on an ongoing basis. Therefore, it is crucial that a data scientist keeps up with these new evolutions and technologies and has enough creativity to see how they can create new opportunities. Figure 1.2 summarizes the key characteristics and strengths constituting the ideal data scientist profile.
Figure 1.2 Profile of a data scientist.
CONCLUSION
Profit-driven business analytics is about analyzing data for making optimized operational business decisions. In this first chapter, we discussed how adopting a business perspective toward analytics diverges from a purely technical or statistical perspective. Adopting such a business perspective leads to a real need for approaches that allow data scientists to take into account the specificities of the business context. The objective of this book therefore is to provide an in-depth overview of selected sets of such approaches, which may serve a wide and diverse range of business purposes. The book adopts a practitioner's perspective in detailing how to practically apply and implement these approaches, with example datasets, code, and implementations provided on the book's companion website, www.profit-analytics.com.
REVIEW QUESTIONS
Question 1
Which is not a possible evaluation criterion for assessing an analytical model?
a. Interpretability
b. Economical cost
c. Operational efficiency
d. All of the above are possible evaluation criteria.
Question 2
Which statement is false?
a. Clustering is a type of predictive analytics.
b. Forecasting in essence concerns regression as a function of time.
c. Association analysis is a type of descriptive analytics.
d. Survival analysis in essence concerns predicting the timing of an event.
Question