Data Lakes For Dummies. Alan R. Simon
You have three main options for how to deal with your proliferation of independent data marts as part of your data lake initiative:
Retire some or all of the data marts, and replace them with data lake functionality.
Isolate some of the data marts, and leave them in place alongside your new data lake.
Incorporate some of your data marts as components of your data lake.
Data mart retirement
If your existing data marts are creaking and groaning and are now coming up short even for the analytical needs of their respective users, here’s a great idea: Get rid of them!
Figure 2-3 shows how your new data lake gives you the perfect opportunity to not only get your data mart proliferation under control, but also upgrade your overall analytics.
FIGURE 2-3: Using a data lake to retire data marts.
Chances are, most of your data marts, especially those that have been around for a while, support descriptive analytics (basic business intelligence functions such as drilling deeper into summarized data to gain additional insights from lower levels of your data). But what about advanced analytical needs such as machine learning, data mining, and other artificial intelligence–enabled analytics? Probably not so much!
So, why keep those aging data marts around? Redirect the data feeds from your source systems into your new data lake, and rebuild your analytics for accounting, your human resources (HR) organization, sales and marketing, and other parts of your enterprise within the data lake environment.
Data mart isolation
What if one of your existing data marts is an absolute work of genius? Suppose that three or four years ago, your company built a data mart to support your annual strategic planning cycle. Your strategic planning data mart has data feeds from numerous applications and systems around your enterprise. Do you really want to reinvent the wheel just because you’re now building a data lake?
Great news: You don’t have to throw away your data mart baby along with the data lake water! (Okay, maybe not the best metaphor, but you get the idea.)
Figure 2-4 shows how you can leave that strategic planning data mart in place alongside the new data lake. You’re essentially isolating that data mart from the new epicenter of your enterprise analytics. True, some data feeds will be duplicated between the strategic planning data mart and the data lake. But that’s okay! And over time, maybe you’ll decide to incorporate the strategic planning data mart into the data lake itself.
FIGURE 2-4: Leaving a data mart intact and alongside your data lake.
Data mart incorporation
The primary difference between isolating an existing data mart (refer to Figure 2-4) and incorporating that data mart into the data lake (see Figure 2-5) is that you eliminate the duplicate data feeds between the two.
FIGURE 2-5: Incorporating a data mart into your data lake.
Suppose your data feeds for your strategic planning data mart are exceptionally well architected. Why not move them over to bring data into the data lake? Chances are, other analytical needs for accounting, finance, HR, marketing, and other organizations and functions within your enterprise can also leverage that data. At the same time, all the great work that your organization did to consolidate and organize data for your annual strategic planning can become part of your overall data lake.
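To make the difference between isolation and incorporation concrete, here is a minimal Python sketch that models each data feed as a source-to-target pair. Everything in it is an assumption for illustration only: the Feed class, the two helper functions, and the system names ("erp", "crm", "hr_system", "strategic_planning_mart", "data_lake") are hypothetical and don't come from any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feed:
    source: str  # hypothetical source system name
    target: str  # where the feed lands: a data mart or the data lake


def duplicated_sources(feeds, mart, lake="data_lake"):
    """Return the sources that feed both the mart and the lake (the isolation case)."""
    to_mart = {f.source for f in feeds if f.target == mart}
    to_lake = {f.source for f in feeds if f.target == lake}
    return sorted(to_mart & to_lake)


def incorporate(feeds, mart, lake="data_lake"):
    """Re-point the mart's feeds at the lake; the set drops any resulting duplicates."""
    redirected = {Feed(f.source, lake) if f.target == mart else f for f in feeds}
    return sorted(redirected, key=lambda f: (f.source, f.target))


# Illustrative feeds only: the mart sits alongside the lake (isolation).
feeds = [
    Feed("erp", "strategic_planning_mart"),
    Feed("crm", "strategic_planning_mart"),
    Feed("erp", "data_lake"),
    Feed("hr_system", "data_lake"),
]

isolated_duplicates = duplicated_sources(feeds, "strategic_planning_mart")
incorporated_feeds = incorporate(feeds, "strategic_planning_mart")
```

With these sample feeds, isolation leaves "erp" feeding both places, while incorporation collapses the four feeds into three that all land in the data lake. A real migration would also have to reconcile schemas, schedules, and transformations, which this sketch ignores.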
Eliminating Future Stand-Alone Data Marts
Even after getting your data mart proliferation under control as part of your data lake efforts, beware: History can easily repeat itself!
Make no mistake about it: Just because you’re now in the data lake era rather than the earlier data warehouse era, business organizations will still likely want to create their own smaller-scale data marts for their specific analytics needs.
Your data lake gives you a carrot-and-stick, one-two punch to help prevent the proliferation of future data marts.
First the stick, and then the carrot.
Establishing a blockade
Your company’s top leadership needs to help you establish a blockade against new data marts springing into existence. Your chief information officer (CIO) needs to make this policy crystal clear, in concert with their counterparts on the business side: the chief operating officer (COO), chief financial officer (CFO), and others in your company’s executive ranks.
Ideally, even your chief executive officer (CEO) should sign a declaration that another round of data mart proliferation won’t be tolerated.
Should a “no proliferation” edict be written in stone? Probably not. Some departments within your company will inevitably come up with some unique, time-is-of-the-essence analytical need that is better met through a stand-alone data mart than through the data lake.
However, the proponents of a new data mart should be required to prove their case and have their data mart project approved as an exception to the “no proliferation” rule. They need to declare the following:
What the business imperative is for building a new stand-alone data mart (for example, to address some sort of business crisis or to take advantage of a market opportunity that must be addressed immediately)
Why their analytical needs can’t be met using the data lake in the same time frame that it would take to build their new data mart
Whether their planned data mart will be used only for a short period of time and be retired or if it will subsequently be incorporated into the data lake
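If you track exception requests in a tool or script, the three declarations above could be captured as a simple checklist. The following Python sketch is one possible shape, not a prescribed process: the DataMartExceptionRequest class, its field names, and the "retire" and "incorporate" labels are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass

# Hypothetical labels for the two end states named in the third declaration.
ALLOWED_END_STATES = {"retire", "incorporate"}


@dataclass
class DataMartExceptionRequest:
    business_imperative: str  # the crisis or market opportunity behind the request
    why_not_data_lake: str    # why the data lake can't deliver in the same time frame
    end_state: str            # "retire" or "incorporate"

    def missing_declarations(self):
        """Return the names of declarations that are blank or not recognized."""
        problems = []
        if not self.business_imperative.strip():
            problems.append("business_imperative")
        if not self.why_not_data_lake.strip():
            problems.append("why_not_data_lake")
        if self.end_state not in ALLOWED_END_STATES:
            problems.append("end_state")
        return problems

    def ready_for_review(self):
        """A request is ready for review only when all three declarations are present."""
        return not self.missing_declarations()


# Illustrative requests only.
complete = DataMartExceptionRequest(
    business_imperative="Respond to a competitor's pricing change this quarter",
    why_not_data_lake="Required source feeds are not yet available in the data lake",
    end_state="incorporate",
)
incomplete = DataMartExceptionRequest(
    business_imperative="",
    why_not_data_lake="Data lake onboarding takes longer than the project allows",
    end_state="keep forever",
)
```

The check only confirms that each declaration was made. Whether the business case is convincing enough to earn an exception remains a judgment call for the people who approve it.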
Providing a path of least resistance
Business users around your organization build new stand-alone data marts because that’s what they’ve done for a long, long time. They realize that the best way to bring data-driven insights into