Data Lakes For Dummies. Alan R. Simon
What happened?
Why did it happen?
What’s happening right now?
What’s likely to happen?
What’s something interesting and important out of this mountain of data?
What are our options?
What should we do?
Your data lake needs to support the entire analytics continuum in all corners of your organization.
Suppose that Jan, your company’s CPO, is incredibly pleased with the work that Raul’s team did to have your data lake support machine learning models for the evaluation cycle. So, she asks Raul to expand the HR organization’s usage of analytics that are enabled by the data lake. Raul sits down with his analysts, Julia and Dhiraj, to create a master list of analytical questions that should be considered for implementation.
Raul’s team has the easiest time with “What happened?” types of questions, because these are what your company’s data warehouse and data marts have been producing for years. Now, though, your data marts and data warehouse will either be retired or incorporated into the data lake environment, so your data lake can take over this mission and serve up the data to answer questions along the lines of:
Which employees have consistently been rated in the top quintile in each department during the past three years?
Which employees have received the largest percentage salary increases during all evaluation periods during the past five years?
How many new employees were hired in each of the past three years?
How many employees left during each of those years? How many of those resigned? How many were involuntarily terminated? How many retired?
Because your company’s executives are somewhat on the formal side, your list of “What happened?” questions will be categorized under the label descriptive analytics. In other words, your data lake will be producing analytics that describe something that happened in the past (which might be the very recent past, several years ago, or perhaps even further back). Your data lake takes over this job from your existing data warehouse and data marts, which have mostly been producing descriptive analytics all along.
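As a rough illustration of descriptive analytics, the hiring and departure questions above come down to counting records by year and by reason. The sketch below uses only the Python standard library; the record layout, field names, and every value in it are hypothetical and are not taken from the book.

```python
from collections import Counter

# Hypothetical HR records; field names and values are illustrative only.
employees = [
    {"id": 1, "hired": 2021, "left": None, "reason": None},
    {"id": 2, "hired": 2021, "left": 2023, "reason": "resigned"},
    {"id": 3, "hired": 2022, "left": 2023, "reason": "retired"},
    {"id": 4, "hired": 2022, "left": None, "reason": None},
    {"id": 5, "hired": 2023, "left": 2023, "reason": "terminated"},
    {"id": 6, "hired": 2023, "left": None, "reason": None},
]

# "How many new employees were hired in each year?"
hires_per_year = Counter(e["hired"] for e in employees)

# "How many employees left each year, and for what reason?"
departures = Counter(
    (e["left"], e["reason"]) for e in employees if e["left"] is not None
)

for year in sorted(hires_per_year):
    print(year, "hired:", hires_per_year[year])
for (year, reason), count in sorted(departures.items()):
    print(year, reason, count)
```

In a real data lake the same counts would more likely come from a query engine running over much larger tables, but the shape of the question is the same: group past events and count them.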
You also need the data lake to help you dig into the reasons something happened. For example, your descriptive analytics tell you that the number of employees who voluntarily resigned from the company last year was 25 percent above the yearly average for the previous five years. Inquiring minds want to know why!
Diagnostic analytics help you dig into the “why” factor for what your descriptive analytics tell you, and — congratulations! — your data lake will take on another assignment. In this case, you can be sure that Jan, your CPO, will be digging for answers now that she’s clued in to the increase in employee turnover.
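The "25 percent above the yearly average" comparison, and one possible first diagnostic step after it, can be sketched in a few lines of Python. All figures and department names below are made up for illustration. Breaking the increase down by department is just one assumed way to begin looking for the "why", not a method the book prescribes.

```python
# Hypothetical resignation counts; none of these figures come from the book.
previous_five_years = [40, 38, 42, 41, 39]
last_year = 50

# Descriptive step: compare last year with the five-year average.
baseline = sum(previous_five_years) / len(previous_five_years)
pct_above = (last_year - baseline) / baseline * 100
print(f"Last year was {pct_above:.0f} percent above the five-year average")

# Diagnostic step (assumed approach): see where the increase is concentrated.
baseline_by_dept = {"Sales": 10.0, "Engineering": 12.0, "Support": 18.0}
last_year_by_dept = {"Sales": 11, "Engineering": 12, "Support": 27}

change_by_dept = {
    dept: last_year_by_dept[dept] - baseline_by_dept[dept]
    for dept in baseline_by_dept
}
largest = max(change_by_dept, key=change_by_dept.get)
print("Largest increase in resignations:", largest)
```

A breakdown like this does not explain the turnover by itself. It only narrows down where to look next, for example at manager changes or compensation in the department that stands out.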
Raul is well aware that, although insight into past results is an important part of your company’s analytics continuum, Jan and the other executives — as well as many others at all levels of your organization — also need deep insights into what’s happening right now. Before working in HR, Raul used to be in the supply chain organization. His specialty there was providing up-to-the-minute, near-real-time reports and visualizations for logistics and transportation throughout the entire supply chain.
This special variation of descriptive analytics — basically, factually describing what’s happening right now — may have some applicability to HR, though probably less so than over in the supply chain organization. Still, Raul makes a note to dive into these types of questions.
Jan, Raul, Julia, Dhiraj, Tamara, and most everyone else in HR know with absolute certainty that predictive analytics need to be a critical capability when the data lake functionality is built out. Even though predictive analytics aren’t exactly a crystal ball with guaranteed predictions, the sophisticated models can ingest data and tell the HR team and others what’s likely to happen. This way, the data lake can help provide insights such as the following:
Which employees are most at risk of resigning in the next year?
Which employees with less than three years of experience are most likely to become top performers in their next jobs?
Which employees with between 10 and 15 years of experience are most likely to underperform during the rest of this fiscal year?
Who are the top 50 nonmanagerial employees most likely to succeed as managers?
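To make "what's likely to happen" a little more concrete, here is a toy sketch of how a model's output might be used to rank employees by resignation risk. The feature names, weights, and employee IDs are all hypothetical. The weights are set by hand purely for illustration, whereas a real predictive model would learn them from historical data in the data lake.

```python
import math

# Hand-picked, hypothetical weights. This is NOT a trained model.
WEIGHTS = {"months_since_raise": 0.05, "low_rating": 1.2, "manager_changes": 0.4}
BIAS = -3.0


def resignation_risk(features):
    """Return a score between 0 and 1 using a logistic function."""
    z = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical employees and feature values.
employees = {
    "E100": {"months_since_raise": 30, "low_rating": 1, "manager_changes": 2},
    "E200": {"months_since_raise": 6, "low_rating": 0, "manager_changes": 0},
    "E300": {"months_since_raise": 18, "low_rating": 0, "manager_changes": 1},
}

# Rank employees from highest to lowest estimated risk.
ranked = sorted(
    employees, key=lambda eid: resignation_risk(employees[eid]), reverse=True
)
for eid in ranked:
    print(eid, round(resignation_risk(employees[eid]), 2))
```

The scores are likelihood estimates, not guarantees, which matches the "not exactly a crystal ball" caveat above.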
Predictive analytics generally falls under the category of data mining. Another form of data mining is digging into mountains of data, seeking interesting and important patterns and other insights that otherwise may remain hidden. Discovery analytics helps you mine your data to see the following:
Have any of your employees exhibited behavior that may indicate inappropriate or illegal activities, such as expense account fraud?
Is there anything going on in the company that could expose the company to legal risk?
Overall, are employees happy working here?
Descriptive, diagnostic, predictive, and discovery analytics all help you gain valuable insights into different aspects of your organization, its performance, possible risks, and much more. However, you need more than insights! You need to drive those insights into decisions and actions.
Prescriptive analytics is a relative newcomer to the overall analytics continuum. “Wait a minute!” you may be thinking. “I’ve been making decisions and taking actions for a long time!” The “secret sauce” of prescriptive analytics, however, is making those decisions and taking those actions with a healthy assist from your organization’s data being fed into increasingly sophisticated analytics. And yes, you guessed it: Your data lake will play a starring role in driving prescriptive analytics. So, your data lake will help you with the following scenarios:
Based on market forecasts and the overall economy, you need to cut approximately 10 percent of your headcount. What are your options? How do you get the work done? Can you shift some of the work to lower-cost contractors? Should you try a voluntary early retirement program to reduce the number of involuntary terminations? Name four or five scenarios with all the data and all the trimmings!
Then, out of those four or five scenarios, which one is “best” and why? Are there any downside surprise risks you should be aware of?
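One simple way to picture the "which scenario is best and why" step is a weighted comparison of the candidate scenarios. The sketch below is only an assumed illustration: the scenario names echo the options mentioned above, but the criteria, the 0 to 10 scores, and the weights are all invented. Real prescriptive analytics would derive such inputs from data and far richer models.

```python
# Hypothetical scenarios scored 0-10 per criterion (higher is better).
scenarios = {
    "Involuntary terminations only": {
        "cost_savings": 9, "work_coverage": 4, "morale": 2,
    },
    "Voluntary early retirement": {
        "cost_savings": 6, "work_coverage": 6, "morale": 7,
    },
    "Shift work to contractors": {
        "cost_savings": 7, "work_coverage": 7, "morale": 5,
    },
}

# Hypothetical weights expressing how much each criterion matters.
weights = {"cost_savings": 0.5, "work_coverage": 0.3, "morale": 0.2}


def weighted_score(scores):
    """Combine the per-criterion scores into a single number."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


# Rank the scenarios from best to worst under these weights.
ranking = sorted(
    scenarios, key=lambda name: weighted_score(scenarios[name]), reverse=True
)
best = ranking[0]

for name in ranking:
    print(f"{name}: {weighted_score(scenarios[name]):.1f}")
print("Best under these weights:", best)
```

Changing the weights can change the winner, which is one reason the "why" behind a recommendation matters as much as the recommendation itself.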
Table 2-1 shows you the relationship between the easy-to-understand questions and the more formal names you’ll use as you plan your data lake.
TABLE 2-1 Matching Analytics and Business Questions
Question | Type of Analytics |
---|---|
What happened?
|