Data Lakes For Dummies. Alan R. Simon


Blockading new stand-alone data marts

      

Deciding what to do about your data warehouses

      

Aligning your data lake plans with your organization’s analytical needs

      

Setting your data velocity speed limits

      

Getting a handle on your analytical costs

      Suppose that you and about 15 other family members or friends all head to your favorite lake for a weeklong summer vacation.

      A data lake is very much like that weeklong trip to your favorite lake. Because a data lake is an enterprise-scale effort, spanning numerous organizations and departments, as well as many different business functions, you and your coworkers will likely seek a wide variety of benefits and outcomes from all that hard work.

      The best data lakes are those that satisfy the needs of a broad range of constituencies — basically, something for everyone to make the results well worth the effort.

      Maybe your organization has been dabbling in the world of big data for a while, going back to when Hadoop was one of the hottest new technologies. You’ve built some pretty nifty predictive analytics models, and now you’re fairly adept at discovering important patterns buried in mountains of data.

      So far, though, your AAA — adventures in advanced analytics — have been highly fragmented. In fact, your analytical data is all over the place. You don’t have consistent approaches to cleansing and refining raw data to get the data ready for analytics; different groups do their own thing. It’s like the Wild West out there!

      The concept of a data lake helps you harness the power of big data technology to the benefit of your entire organization. By following emerging best practices, avoiding traps and pitfalls, and building a solidly architected data lake, you can seize the day and help take your organization to new heights when it comes to analytics and data-driven insights.

      

You’ll achieve economies of scale for the data side of analytics throughout your organization, which means that you’ll get “more bang for your buck” when it comes to acquiring, consolidating, preparing, and storing your analytical data on behalf of your enterprise as a whole rather than repeatedly doing so for numerous smaller groups.

      Your data lake’s big data foundation presents you with an opportunity that, not too long ago, was out of reach for most organizations. You can store, manage, and analyze all three types of data — structured, unstructured, and semi-structured — within a single environment, and without having to jump through hoops to do so!

      Many of the business questions you ask of your data will require only structured data. Suppose you work in the supply chain organization within your company. You’ll definitely want your data lake to provide insight into the following:

       Who among your strategic suppliers has the best combination of on-time component production and very low problem rates?

       Which third-party logistics firms have the best — or worst — on-time shipping performance?

       What’s the percentage of product spoilage among all internal and third-party warehouses during the past six months?
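      Questions like these boil down to straightforward aggregations over structured records. The following is a minimal sketch, not a data lake implementation: it assumes hypothetical Shipment and WarehouseRecord records (the field names are illustrative inventions) and shows how the on-time shipping and spoilage questions might be answered once the data sits in one place.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Shipment:
    carrier: str   # hypothetical name of a third-party logistics firm
    on_time: bool  # True if delivered by the promised date


@dataclass
class WarehouseRecord:
    warehouse: str       # hypothetical warehouse identifier
    received: date       # date the goods arrived
    units: int           # units received
    units_spoiled: int   # units later written off as spoiled


def on_time_rates(shipments):
    """Return {carrier: percentage of that carrier's shipments delivered on time}."""
    totals = defaultdict(int)
    on_time = defaultdict(int)
    for s in shipments:
        totals[s.carrier] += 1
        if s.on_time:
            on_time[s.carrier] += 1
    return {c: 100.0 * on_time[c] / totals[c] for c in totals}


def rank_carriers(shipments):
    """Carriers sorted from best to worst on-time percentage."""
    return sorted(on_time_rates(shipments).items(), key=lambda kv: kv[1], reverse=True)


def spoilage_pct(records, since):
    """Percentage of units spoiled across all warehouses for goods received on or after `since`."""
    recent = [r for r in records if r.received >= since]
    total = sum(r.units for r in recent)
    spoiled = sum(r.units_spoiled for r in recent)
    return 100.0 * spoiled / total if total else 0.0


if __name__ == "__main__":
    shipments = [
        Shipment("CarrierA", True), Shipment("CarrierA", True), Shipment("CarrierA", False),
        Shipment("CarrierB", True), Shipment("CarrierB", False), Shipment("CarrierB", False),
    ]
    for carrier, pct in rank_carriers(shipments):
        print(f"{carrier}: {pct:.1f}% on time")
```

      The point is less the code than where the data lives: when shipment and warehouse records from every group land in the same environment, each question becomes one query rather than a reconciliation project across separate data marts.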

      Other critical business analytics may involve unstructured or semi-structured data. You’ll want to know the following:

       What percentage of tweets from your customers represent a positive sentiment about your product quality? Negative sentiment? What “hot spots” are showing up in blogs, tweets, and other social media posts, as well as YouTube videos, that can mean profitability and market share problems for you down the road?

       Your reports show a dramatic increase in breakage in Warehouse #2. You have surveillance cameras in all your facilities. Is there anything that shows up on video that could indicate one or more root causes for this breakage that you can address through procedural changes?
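      To make the tweet question concrete, here is a deliberately naive sketch. It assumes each tweet arrives as one JSON object with a "text" field (an assumption, not a specific platform's format), and it swaps in a tiny hypothetical keyword list for real sentiment analysis, which would normally rely on a trained model. What it does show is the semi-structured part: the code reads only the field it needs and tolerates records with extra fields.

```python
import json

# Hypothetical, tiny word lists used only for illustration; production
# sentiment analysis would use a trained model instead of keyword matching.
POSITIVE = {"love", "great", "excellent", "reliable"}
NEGATIVE = {"broken", "terrible", "defective", "awful"}


def score(text: str) -> int:
    """Number of distinct positive words minus distinct negative words in the text."""
    words = {w.strip(".,!?;:").lower() for w in text.split()}
    return len(words & POSITIVE) - len(words & NEGATIVE)


def sentiment_shares(json_lines):
    """Percentage of tweets classed as positive, negative, or neutral.

    Each line is one JSON object; only its (assumed) "text" field is read,
    so extra or missing fields in the semi-structured records are tolerated.
    """
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for line in json_lines:
        s = score(json.loads(line).get("text", ""))
        label = "positive" if s > 0 else "negative" if s < 0 else "neutral"
        counts[label] += 1
    total = sum(counts.values())
    return {k: (100.0 * v / total if total else 0.0) for k, v in counts.items()}


if __name__ == "__main__":
    tweets = [
        '{"user": "a", "text": "I love this product, great quality!"}',
        '{"user": "b", "text": "Arrived broken. Terrible."}',
        '{"user": "c", "text": "Shipped today", "geo": null}',
        '{"user": "d", "text": "Reliable so far"}',
    ]
    print(sentiment_shares(tweets))  # {'positive': 50.0, 'negative': 25.0, 'neutral': 25.0}
```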

      

Your data lake gives you one-stop shopping for structured, unstructured, and semi-structured data in a logically centralized, cohesive environment.

      BACK TO THE FUTURE, PART 2

      In the first edition of Data Warehousing For Dummies (Wiley), back in 1996, I included a chapter about the future directions of data warehousing. One of the forecasts I made was that the first-generation data warehousing of that time would eventually evolve into what I called “multimedia data warehousing” and would include not only structured data but also video and audio content. I made this prediction on the basis that “not all of the business questions we need to ask out of a data warehouse will come from numbers, dates, and character strings; sometimes we need information from images and other multimedia content as well.”

      Building an all-new analytical data environment around big data technology sounds like a great idea, right? You may be worried, though, that your organization could invest a ton of money over the next couple of years, only to find that your data lake is obsolete because of an entirely new generation of technology.

      In other words, can your data lake be not just today’s but also tomorrow’s go-to platform for more and more analytical data and data-driven insights? Absolutely!

      Constructing a bionic data environment

      Maybe you’ve heard of a B-52. No, not a member of the American new wave music group (so don’t start singing “Love Shack”) but rather the U.S. Air Force plane.

      The
