Data Lakes For Dummies. Alan R. Simon


fate and build an end-to-end solution. Old habits are extremely difficult to break!

      

Beyond a blockade on new data mart development, your data lake can give these business users a path of least resistance. Make it easier for them to go to the data lake for the data they need instead of doing everything on their own.

      Suppose that a new chief people officer (CPO) is hired to lead your company’s HR organization. Jan, the new CPO, is a big believer in applying super-advanced analytics, such as machine learning and artificial intelligence, to numerous HR functions: employee evaluations, salary adjustments and promotions, succession planning, and more.

      Jan appoints an analytics team within HR and tells them that, within the next three months, they need to have some initial machine learning models built in time for the semiannual employee evaluation cycle. Raul, the analytics team leader, has been with your company for 15 years and has built several HR-specific data marts in the past for similar needs.

      Raul assigns two of the team members, Julia and Dhiraj, to analyze the HR data in Workday (a cloud-based HR and financial management system) to figure out what data needs to be brought into the machine learning model. Raul also assigns another team member, Tamara, to start designing an Amazon Redshift database to store the HR data and support the machine learning algorithms.

      Not so fast, Raul!

      “Hmm … a data lake,” Raul thinks. “I wonder if the data we need is already in there?”

      Sure enough, Raul goes browsing through the data lake catalog and finds that the data lake already has a ton of HR data from Workday that is regularly refreshed. He asks Julia and Dhiraj to match up the work that they’ve done so far with what the data lake catalog shows. Within two hours, they report back with the fantastic news: “Everything we need is in the data lake already!”
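      The kind of check Julia and Dhiraj performed can be pictured as a simple comparison between the fields a model needs and the fields the catalog lists. The sketch below is only an illustration: the catalog contents, the dataset names, the field names, and the find_coverage helper are all hypothetical, and a real data lake catalog would be queried through whatever tool your organization uses.

```python
# Hypothetical catalog: dataset name -> set of fields it exposes.
# These names are made up for illustration; they are not real Workday fields.
CATALOG = {
    "workday_employees": {"employee_id", "hire_date", "job_level", "manager_id"},
    "workday_compensation": {"employee_id", "base_salary", "last_raise_date"},
    "workday_reviews": {"employee_id", "review_period", "rating"},
}


def find_coverage(required_fields, catalog):
    """Return (covered, missing).

    covered maps each required field to the sorted list of datasets that
    contain it; missing is the sorted list of fields found in no dataset.
    """
    covered = {}
    for field in required_fields:
        sources = sorted(name for name, fields in catalog.items() if field in fields)
        if sources:
            covered[field] = sources
    missing = sorted(set(required_fields) - set(covered))
    return covered, missing


# Fields the (hypothetical) machine learning model needs.
required = ["employee_id", "rating", "base_salary", "job_level"]
covered, missing = find_coverage(required, CATALOG)

for field, sources in covered.items():
    print(f"{field}: available in {', '.join(sources)}")
print("missing:", missing)
```

      An empty missing list is the "everything we need is in the data lake already" outcome; anything left in that list is what would still have to be sourced.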

      A well-constructed data lake offers business users a path of least resistance when it comes to gathering the data they need for their analytics. Raul’s team will still need to build the machine learning models to produce the analytics that Jan, your CPO, wants to apply to the next evaluation cycle. But they no longer need to proceed with analytics on a business-as-usual basis, constantly acquiring and storing the same data over and over in different data marts.

      Over time, as familiarity with the data lake spreads throughout your organization, fewer unnecessary data mart requests such as Raul’s will need to be redirected back to the data lake. Raul wasn’t deliberately trying to do everything on his own; he just wasn’t familiar enough with what the data lake provided, not only to HR but to your company as a whole.

      Data warehousing has been on the scene since around 1990, which means that thousands of enterprise-wide data warehouses have been built and deployed over the years. In fact, looking back at the B-52 analogy earlier in this chapter, you can think of a data warehouse as the equivalent of a propeller-driven airplane that preceded the jet aircraft era, which, of course, makes the data lake the equivalent of that technology-leaping jet.

Some ultramodern, large-scale enterprise data warehouses have been built in the past several years, using relatively new technologies such as the SAP HANA in-memory database management system. Many others, however, were built on older relational databases and are still chugging along. They still work okay, for the most part. But in this new era of data lakes, it’s time to decide what to do about the old-timers.

      Sending a faithful data warehouse off to a well-deserved retirement

      If your data warehouse is really showing its age, your best bet is to hold a nice retirement party in the company cafeteria with cake and ice cream for everyone and with a few speeches about how wonderfully the data warehouse has served the company’s enterprise-wide reporting and business intelligence mission over the years. (Okay, you can probably skip the cake and ice cream, as well as the cafeteria party itself.)

      FIGURE 2-6: Migrating your data warehouse into your new data lake.

      

Your old data warehouse contents were likely stored in a dimensional model such as a star schema or a snowflake schema. Inside a data lake, the equivalent models might also be dimensional. Alternatively, you could be using a columnar database such as Amazon Redshift. You can still use a visualization tool such as Tableau or a classic business intelligence tool such as MicroStrategy, but your database design will differ from your old data warehouse.
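      For readers who have not worked with a dimensional model, the following minimal sketch shows the basic shape of a star schema: one fact table of measurements surrounded by dimension tables that describe them. It uses Python's built-in sqlite3 module purely for convenience; the table names, column names, and sample rows are invented for illustration and do not come from any particular data warehouse.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# A tiny, hypothetical star schema: one fact table, two dimension tables.
conn.executescript("""
CREATE TABLE dim_department (
    department_key  INTEGER PRIMARY KEY,
    department_name TEXT
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    year     INTEGER,
    quarter  INTEGER
);
CREATE TABLE fact_headcount (
    department_key INTEGER REFERENCES dim_department(department_key),
    date_key       INTEGER REFERENCES dim_date(date_key),
    headcount      INTEGER
);
INSERT INTO dim_department VALUES (1, 'HR'), (2, 'Engineering');
INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240401, 2024, 2);
INSERT INTO fact_headcount VALUES
    (1, 20240101, 12), (2, 20240101, 80),
    (1, 20240401, 14), (2, 20240401, 85);
""")

# A typical dimensional query: join the fact table to its dimensions,
# then group by the descriptive attributes.
rows = conn.execute("""
SELECT d.department_name, t.quarter, SUM(f.headcount)
FROM fact_headcount AS f
JOIN dim_department AS d ON d.department_key = f.department_key
JOIN dim_date       AS t ON t.date_key = f.date_key
GROUP BY d.department_name, t.quarter
ORDER BY d.department_name, t.quarter
""").fetchall()

for department, quarter, headcount in rows:
    print(department, f"Q{quarter}", headcount)
```

      A snowflake schema follows the same idea but splits the dimension tables further into related subtables.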

      Resettling a data warehouse into your data lake environment

      Suppose you and your team actually did a fantastic job architecting and building your data warehouse. You did your work and deployed the data warehouse only a few years ago, using fairly modern technology. To put it simply, your data warehouse just isn’t ready for retirement. But you still want to build a data lake to take advantage of modern big data technology. What should you do in this case?

      Just as with a solidly built data mart, you can sort of “forklift” a well-architected data warehouse into your data lake environment. You’ll still have to do some rewiring of data feeds, and you’ll be adding complexity to your overall analytical data architecture. But there’s no sense in exiling a solidly built data warehouse into oblivion if it can still deliver value for you for a while to come.

      

You don’t set out to build a data lake just to stuff tons of data into a modern big data environment. You build a data lake to support analytics throughout your enterprise. And the reason for your organization’s analytics is to deliver data-driven insights, with the emphasis on the term data-driven.

      For better or for worse, the term analytics means different things to different people. As you set out to build your data lake, you need to understand what analytics means to your organization.

      Deciding what your organization wants out of analytics

      You should think of
