The Informed Company. Dave Fowler


Information Dashboard Design by Stephen Few (my review here): https://www.amazon.com/Information-Dashboard-Design-At-Glance/dp/1938377001/

      [Image] Source: The Data School

      Few are complete “experts” in all of the areas of modern data governance, and the landscape is changing all of the time. If you have a story to share, a chapter you think is missing, or a new idea, email us. Even if you don't know specifically what to share but don't mind sharing your story, please reach out; we are particularly interested in adding real‐world experiences and insights.

      There is already too much jargon in the data world, often created by talented vendor marketing teams. We try to stick with the most common and straightforward words that are already in use. For any jargon we do find necessary, we include a definition.

      There are many books for the old ways of working with data. We're highlighting current best practices here, so we ignore outdated terminology and techniques. In a few cases where it is beneficial to talk about industry evolution—like the change from ETL to ELT—we teach ELT and discuss the choice in a separate chapter.

      Almost every part of this book could be contentious to someone, in some use case or to some vendor. In writing this book, it is tempting to bring up the caveats everywhere and write what would ultimately be a very defensive and overly explained book. We believe this type of book is way less useful for people seeking straightforward advice. Where we have a strong opinion, we don't argue it; we just go with it. Where we think the user has a legitimate choice to make, we pose those options.

      This book aims to provide a broad overview and general guidelines on how to set up a data stack. We intentionally gloss over the details of launching a Redshift instance, writing SQL, or using various BI products. That would clutter the text, repeat what's already on the internet, and make the read quite stale.

      Not every company needs the entirety of this book. As a growing company's data needs expand, more and more of the book becomes valuable. Note, though, that many best practices are introduced at the stage where they first become relevant. To avoid redundancy, we assume they remain useful from that point onward. So it may benefit you to at least skim the earlier stages even if you and your company are further ahead.

      At the end of the book we have a section where we describe what has changed in the data world that makes this new architecture relevant and performant. We avoid explaining how our recommendations differ from previous practices like Kimball dimensional modeling so as not to clutter the experience. Such discussions are necessary, however, and we've put them in this last section of the book.

      Lastly, throughout the book you will see the following icons:

       [Icon] Definitions

      Definitions explain a term found on the same page. For example, on this page, the term “data lake” is mentioned. A data lake is a staging area for several data sources.

       [Icon] Protips

      Protips expand on an idea or provide additional information about a topic related to what you read within a given chapter.

      In 2015, I used a product called Amazon Redshift. At the time, I had spent the prior 15 years of my career in a variety of roles, all centered on the use of data, from analytics to marketing to operations. And while I considered my data competency my biggest professional differentiator, I had also become deeply frustrated. For all of the supposed progress in the data ecosystem, it was still slow, hard, and expensive to get insights out of data.

      But my first experience with Redshift is where that all changed for me. I have such a visceral memory of the first hour I spent with the product: queries I ran returned so fast that it seemed like absolute magic. I had spent years and years of my career writing queries and waiting for the macOS “spinner” icon to stop spinning. Now, all of a sudden, these same queries weren’t 20% faster…they were 10 to 100 to 1000x faster. I felt like I had superpowers.

      I'll let Dave and Matt actually explain how the modern data warehouse can achieve these types of performance results, but for now, just trust me that it can and does. Given that, the fascinating question is actually: what does this mean for people like you and me?

      What I've come to realize in the years since my initial experience with Redshift is that the modern data stack has exactly two very important impacts for us:

      1. With this far better tooling, we can grow our impact on the organizations we work for dramatically. I'm going to say something really stupid and obvious, but: if your queries return 100x faster, you can know 100x more stuff. And that means that you'll just be tremendously more valuable in the insights you're able to provide and the decisions you can make.

      2. As a result of #1, our career options are just far, far greater. When I started my career, “data analyst” was a junior position that you attempted to graduate out of as quickly as possible to move on to other things. Now data analysts are high‐leverage, strategic employees with earning potential that mirrors that of software engineers.

      There really has been a change in what a single analyst is able to accomplish with some fairly simple tooling and some very accessible knowledge and skills. That single individual can now construct an entire sophisticated data
