Data Lakes For Dummies. Alan R. Simon
However, a B-52 today bears only a slight resemblance to one made in the ’50s or ’60s. Sure, if you were to put one of the original B-52s side by side with one of today’s planes, the two aircraft would look nearly identical. But the engines, the avionics, the flight controls … pretty much every major subsystem has been significantly upgraded or replaced in each operational B-52 at least a couple of times over the years.
Better yet, a B-52 isn’t just some old plane that you may see flying at an airshow but that otherwise doesn’t have much purpose due to the passage of time. Not only is the B-52 still a viable, operational plane, but its mission has continually expanded over the years thanks to new technologies and capabilities.
In fact, you can think of a B-52 as sort of a bionic airplane. Its components and subsystems have been — and will continue to be — swapped out and substantially upgraded on a regular basis, giving the plane a planned life span of almost four times the normal longevity of the typical Air Force plane. Talk about an awe-inspiring feat of engineering!
However, all those enhancements and modifications to the B-52 happened gradually over time, not all at once. Plus, the changes were all carefully planned and implemented with longevity and continued viability top of mind.
Your data lake should follow the same model: a “bionic” enterprise-scale analytical data environment that regularly incorporates new and improved technologies to replace older ones, as well as enhancing overall function. You almost certainly won’t get an entire century’s usage out of a data lake that you build today, but if you do a good job with your planning and implementation, 10 or even 20 years of value from your data lake is certainly achievable.
More important, your data lake won’t be just another aging system hanging around long past when it should’ve been retired. You almost certainly have plenty of those antiquated systems stashed in your company’s overall IT portfolio. That’s why the B-52 is the perfect analogy for the data lake, with a “bionic” approach to regularly replacing major subsystems helping to keep your data lake viable for years to come.
Strengthening the analytics relationship between IT and the business
If a tree falls in a forest, but nobody is around to hear it fall, does it make a sound?
Or how about this one: If you build a system to support analytics across your organization and load it with tons of data, but nobody really uses it, does your organization really have analytical data?
Don’t worry, you didn’t go back in time to a college philosophy class — you won’t be graded on your responses to either of these questions.
You can think of a data warehouse as a direct ancestor of a data lake. Data warehousing came onto the scene around 1990, and it has been the primary go-to approach for enterprise analytics in the decades since.
Far too many of today’s data warehouses are like that tree falling in a forest. The IT side of your company originally set out to build an enterprise-wide home for analytical data that would support reporting, business intelligence, data visualization, and other analytical needs from every corner of your organization.
Alas, that data warehouse, like so many others, came up short. Maybe the data warehouse doesn’t contain certain sets of data that are needed for critical analytics. Perhaps the data warehouse contents aren’t properly organized and structured and are difficult to access with the business intelligence tools available. Whatever the reason may be, your organization’s business users finally said, “To heck with it!” and built their own smaller-scale data marts to satisfy their own departmental or functional analytical needs.
Along the way, a sense of distrust built up — at least when it came to analytics and data — between your IT organization and the business users who are supposed to be their customers. Not good!
The data lake presents your organization with an opportunity for a fresh start. You can apply many of the best practices and also the painful lessons from 30-plus years of data warehousing to your data lake efforts and avoid repeating the mistakes and shortcomings of the past. As your data lake gets built, no matter if you’re on the IT side or the business side of your company, you can help rebuild that essential trust, especially when it comes to all-important analytics and the resulting data-driven insights.
Reducing Existing Stand-Alone Data Marts
You really can’t argue with the original concept of an enterprise data warehouse! Figure 2-1 illustrates the basic idea of a single home for most or all of the data needed to support a broad range of analytics across the entire enterprise.
Sounds like a great idea, right?
FIGURE 2-1: The vision of an enterprise data warehouse.
Dealing with the data fragmentation problem
A lofty vision is one thing; reality is often something else. Figure 2-2 illustrates how almost every organization’s idea of centralized, enterprise-scale data warehousing eventually surrendered to a landscape littered with numerous stand-alone, nonintegrated data marts.
Okay, so maybe the idea of “Do your own thing, and build your own data mart” got out of control. Now that you can see what a mess that approach created, why not just retire those data marts and fold them into your enterprise data warehouse that’s probably underutilized?
A collection of independent data marts is almost always hampered by a lack of common master data (for example, a “customer” may mean something different to sales than it does to your marketing team), different software packages and technologies across the data marts, and other challenges. Taken together, these challenges make it almost impossible to consolidate separate, independent data marts back into a single data warehouse. Most organizations instead throw their hands up in the air and say that they’re following a federated data warehouse approach. You “create” a federated data warehouse by simply declaring that some or all of your data marts are part of a “federation” whose members, when considered together, are sort of like a data warehouse. “Um … yeah, that’s our story, and we’re sticking to it. It’s magic!” (Not really … and not all that valuable from an enterprise-wide perspective.)
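To see why mismatched master data makes consolidation so hard, consider a tiny Python sketch of two data marts that each count “customers” by their own rule. Everything here is a made-up illustration rather than anything from a real system: the records, the field names (`orders`, `opted_in`), and both business rules are assumptions chosen only to show the mismatch.

```python
# Hypothetical rows from two stand-alone data marts.
# All IDs, field names, and rules are illustrative assumptions.
sales_mart = [
    {"customer_id": "C1", "orders": 3},
    {"customer_id": "C2", "orders": 1},
]

marketing_mart = [
    {"customer_id": "C1", "opted_in": True},
    {"customer_id": "C2", "opted_in": True},
    {"customer_id": "C3", "opted_in": True},   # a prospect who never ordered
    {"customer_id": "C4", "opted_in": False},
]


def sales_customers(rows):
    """Assumed sales rule: a customer is anyone with at least one order."""
    return {row["customer_id"] for row in rows if row["orders"] > 0}


def marketing_customers(rows):
    """Assumed marketing rule: a customer is any contact who opted in."""
    return {row["customer_id"] for row in rows if row["opted_in"]}


sales = sales_customers(sales_mart)
marketing = marketing_customers(marketing_mart)

# IDs that marketing calls customers but sales does not.
only_marketing = marketing - sales

print("Sales customer count:", len(sales))
print("Marketing customer count:", len(marketing))
print("Counted by marketing only:", sorted(only_marketing))
```

Both marts answer the question “How many customers do we have?” with a straight face, and they give different numbers. Until someone settles on one shared definition, merging the two marts only moves the disagreement into a single place.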
FIGURE 2-2: The reality of numerous stand-alone data marts.
Decision point: Retire, isolate, or incorporate?
What should you do about your proliferation of data marts now that your organization