Big Data MBA. Schmarzo Bill
• Business Intelligence operates with schema on load in which you have to pre-build the data schema before you can load the data to generate your BI queries and reports. Data science deals with schema on query in which the data scientists custom design the data schema based on the hypothesis they want to test or the prediction that they want to make.
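To make the schema-on-load versus schema-on-query distinction concrete, here is a minimal sketch in Python. It is an illustration only, not a method from this book: the sample events, the field names (customer, amount, channel, coupon), and both function names are hypothetical. The first function pre-builds a table and loads only the fields that fit it; the second keeps the raw records and shapes them around a hypothesis (here, whether coupon orders differ in average size).

```python
import json
import sqlite3

# Hypothetical raw events, one JSON document per line, as they might be stored untouched.
RAW_EVENTS = [
    '{"customer": "a", "amount": 20.0, "channel": "web"}',
    '{"customer": "b", "amount": 35.5, "channel": "store", "coupon": "SPRING"}',
    '{"customer": "a", "amount": 12.5, "channel": "web", "coupon": "SPRING"}',
]


def schema_on_load(raw_events):
    """BI style: define the table first, then load only the fields that fit it."""
    conn = sqlite3.connect(":memory:")
    # The schema is fixed before any data arrives; "channel" and "coupon" are dropped.
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    for line in raw_events:
        event = json.loads(line)
        conn.execute(
            "INSERT INTO sales VALUES (?, ?)",
            (event["customer"], event["amount"]),
        )
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
    ).fetchall()
    conn.close()
    return rows


def schema_on_query(raw_events):
    """Data science style: keep the raw data and shape it for the hypothesis at hand."""
    totals = {True: 0.0, False: 0.0}
    counts = {True: 0, False: 0}
    for line in raw_events:
        event = json.loads(line)
        # The "schema" (did the order use a coupon?) is decided at query time.
        used_coupon = "coupon" in event
        totals[used_coupon] += event["amount"]
        counts[used_coupon] += 1
    return {key: totals[key] / counts[key] for key in totals if counts[key]}


print(schema_on_load(RAW_EVENTS))
print(schema_on_query(RAW_EVENTS))
```

Note that the coupon question cannot be answered from the pre-built sales table at all, because that field was never part of the schema. That is the practical cost of deciding the schema before you know the question.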
Organizations that try to “extend” their Business Intelligence capabilities to encompass big data will fail. That's like stating that you're going to the moon, then climbing a tree and declaring that you are closer. Unfortunately, you can't get to the moon from the top of a tree. Data science is a new discipline that offers compelling, business-differentiating capabilities, especially when coupled with Business Intelligence.
CROSS-REFERENCE
Chapter 5 (“Differences Between Business Intelligence and Data Science”) discusses the differences between Business Intelligence and data science and how data science can complement your Business Intelligence organization. Chapter 6 (“Data Science 101”) reviews several different analytic algorithms that your data science team might use and discusses the business situations in which the different algorithms might be most appropriate.
Don't Think Data Warehouse, Think Data Lake
In the world of big data, Hadoop and HDFS are game changers; they are fundamentally changing the way organizations think about storing, managing, and analyzing data. And I don't mean Hadoop as yet another data source for your data warehouse. I'm talking about Hadoop and HDFS as the foundation for your data and analytics environments – to take advantage of a massively parallel, cheap, scale-out data architecture that can run hundreds, thousands, or even tens of thousands of Hadoop nodes.
We are witnessing the dawn of the age of the data lake. The data lake enables organizations to gather, manage, enrich, and analyze many new sources of data, whether structured or unstructured. The data lake enables organizations to treat data as an organizational asset to be gathered and nurtured versus a cost to be minimized.
Organizations need to treat their reporting environments (traditional BI and data warehousing) and their analytics environments (data science) differently. These two environments have very different characteristics and serve different purposes. The data lake can make both the BI and data science environments more agile and more productive (Figure 1.2).
Figure 1.2 Modern data/analytics environment
CROSS-REFERENCE
Chapter 7 (“The Data Lake”) introduces the concept of a data lake and the role the data lake plays in supporting your existing data warehouse and Business Intelligence investments while providing the foundation for your data science environment. Chapter 7 discusses how the data lake can un-cuff your data scientists from the data warehouse to uncover those variables and metrics that might be better predictors of business performance. It also discusses how the data lake can free up expensive data warehouse resources, especially those resources associated with Extract, Transform, and Load (ETL) data processes.
Don't Think “What Happened,” Think “What Will Happen”
Business users have been trained to contemplate business questions that monitor the current state of the business and to focus on retrospective reporting on what happened. Business users have become conditioned by their BI and data warehouse environments to only consider questions that report on current business performance, such as “How many widgets did I sell last month?” and “What were my gross sales last quarter?”
Unfortunately, this retrospective view of the business doesn't help when trying to make decisions and take action about future situations. We need to get business users to “think differently” about the types of questions they can ask. We need to move the business investigation process beyond the performance monitoring questions to the predictive (e.g., What will likely happen?) and prescriptive (e.g., What should I do?) questions that organizations need to address in order to optimize key business processes and uncover new monetization opportunities (see Table 1.2).
Table 1.2 Evolution of the Business Questions
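The shift from “What happened?” to “What will happen?” and “What should I do?” can be sketched with a few lines of Python. This is a toy illustration under stated assumptions, not an analytic technique prescribed by this book: the sales figures are made up, the function names are hypothetical, the forecast is a plain least-squares straight line, and the recommendation is a simple reorder rule.

```python
from statistics import mean

# Hypothetical monthly widget sales, oldest month first.
MONTHLY_SALES = [100, 110, 120, 130]


def what_happened(sales):
    """Descriptive: report last month's sales."""
    return sales[-1]


def what_will_happen(sales):
    """Predictive: extend a least-squares straight line one month ahead."""
    xs = range(len(sales))
    x_bar, y_bar = mean(xs), mean(sales)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, sales)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    # The next month has index len(sales).
    return intercept + slope * len(sales)


def what_should_i_do(sales, stock_on_hand):
    """Prescriptive: recommend how many units to order to cover the forecast."""
    return max(0, round(what_will_happen(sales)) - stock_on_hand)


print(what_happened(MONTHLY_SALES))
print(what_will_happen(MONTHLY_SALES))
print(what_should_i_do(MONTHLY_SALES, stock_on_hand=90))
```

The same data answers all three questions; what changes is whether you stop at reporting the number, project it forward, or turn the projection into a recommended action.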
CROSS-REFERENCE
Chapter 8 (“Thinking Like a Data Scientist”) differentiates between descriptive analytics, predictive analytics, and prescriptive analytics. Chapters 9, 10, and 11 then introduce several techniques to help your business users identify the predictive (“What will happen?”) and prescriptive (“What should I do?”) questions that they need to more effectively drive the business. Yeah, this will mean lots of Post-it notes and whiteboards, my favorite tools.
Don't Think HIPPO, Think Collaboration
Unfortunately, today it is still the HIPPO – the Highest Paid Person's Opinion – that determines most business decisions. Justifications such as “We've always done things that way” or “My years of experience tell me …” or “This is what the CEO wants …” are still offered for why the HIPPO needs to drive the important business decisions.
Unfortunately, that type of thinking has led to siloed data fiefdoms, siloed decisions, and an un-empowered and frustrated business team. Organizations need to think differently about how they empower all of their employees. Organizations need to find a way to promote and nurture creative thinking and groundbreaking ideas across all levels of the organization. There is no edict that states that the best ideas only come from senior management.
The key to big data success is empowering cross-functional collaboration and exploratory thinking to challenge long-held organizational rules of thumb, heuristics, and “gut” decision making. The business needs an approach that is inclusive of all the key stakeholders – IT, business users, business management, channel partners, and ultimately customers. The business potential of big data is only limited by the creative thinking of the organization.
CROSS-REFERENCE
Chapter 13 (“Power of Envisioning”) discusses how the BI and data science teams can collaborate to brainstorm, test, and refine new variables that might be better predictors of business performance. We will introduce several techniques and concepts that can be used to drive collaboration between the business and IT stakeholders and ultimately help your data science team uncover new customer, product, and operational insights that lead to better business performance. Chapter 14 (“Organizational Ramifications”) introduces organizational ramifications, especially the role of Chief Data Monetization Officer (CDMO).
Summary
Big