The Informed Company. Dave Fowler
How This Book Was Written
This book originates in part from a project within The Data School (Figure A.2), a collection of free online books and interactive tutorials on managing and leveraging data (see dataschool.com). These resources are always expanding, much like the articles of Wikipedia: each round of updates sees our ebooks cover additional topics, go deeper on established ideas, share more real‐world examples, and better deliver that content. Our goal is to maintain and improve these resources and keep them modern.
Source: The Data School
Few are complete “experts” in all of the areas of modern data governance, and the landscape is changing all the time. If you have a story to share, a chapter you think is missing, or a new idea, email us. Even if you don't know specifically what to share but don't mind sharing your story, please reach out; we are particularly interested in adding real‐world experiences and insights.
There is already too much jargon in the data world, often created by talented vendor marketing teams. We try to stick with the most common and straightforward words that are already in use. For any jargon we do find necessary, we include a definition.
There are many books for the old ways of working with data. We're highlighting current best practices here, so we ignore outdated terminology and techniques. In a few cases where it is beneficial to talk about industry evolution—like the change from ETL to ELT—we teach ELT and discuss the choice in a separate chapter.
Almost every part of this book could be contentious to someone, in some use case or to some vendor. In writing this book, it is tempting to bring up the caveats everywhere and write what would ultimately be a very defensive and overly explained book. We believe this type of book is way less useful for people seeking straightforward advice. Where we have a strong opinion, we don't argue it; we just go with it. Where we think the user has a legitimate choice to make, we pose those options.
This book aims to provide a broad overview and general guidelines on how to set up a data stack. We intentionally gloss over the details of launching a Redshift instance, writing SQL, or using various BI products. Covering them would clutter the text, repeat what's already on the internet, and make the read quite stale.
How to Read This Book
The book starts with a quick overview and decision charts that explain what the stages are and which stage is appropriate for you. The book is structured with a section for each of the four stages, and if you'd like, you can jump ahead to the stage you're at.
Not every company needs the entirety of this book. As a growing company's data needs expand, more and more of the book becomes valuable. Note, though, that each best practice is introduced at the stage where it first becomes relevant. To avoid redundancy, we assume it remains useful from that point in the book onward. So it may benefit you to at least skim the earlier stages even if you and your company are further ahead.
At the end of the book we have a section where we describe what has changed in the data world that makes this new architecture relevant and performant. In the main text, we avoid explaining how our recommendations differ from previous practices like Kimball dimensional modeling so as not to clutter the experience. Such discussions are necessary, however, and we've put them in this last section of the book.
Lastly, throughout the book you will see the following icons:
Definitions relate to a term found on the same page. For example, on this page, the term “data lake” is mentioned. A data lake is a staging area for several data sources.
Protips expand on an idea or provide additional information about a topic related to what you read within a given chapter.
Foreword
In 2015, I used a product called Amazon Redshift. At the time, I had spent the prior 15 years of my career in a variety of roles all centered on the use of data, from analytics to marketing to operations. And while I considered my data competency my biggest professional differentiator, I had also become deeply frustrated. For all of the supposed progress in the data ecosystem, it was still slow, hard, and expensive to get insights out of data.
But my first experience with Redshift is where that all changed for me. I have such a visceral memory of the first hour I spent with the product: queries I ran returned so fast that it seemed like absolute magic. I had spent years and years of my career writing queries and waiting for the macOS “spinner” icon to stop spinning. Now, all of a sudden, these same queries weren’t 20% faster…they were 10 to 100 to 1000x faster. I felt like I had superpowers.
I'll let Dave and Matt actually explain how the modern data warehouse can achieve these types of performance results, but for now, just trust me that it can and does. Given that, the fascinating question is actually: what does this mean for people like you and me?
What kind of “people” do I mean? You know—people who are involved in making decisions at companies and want to use data to help us. People who likely over the years have acquired a variety of data skills, whether that's Excel VLOOKUPs, Google Analytics dashboards, SQL, or any of a thousand other options. People who have always felt like it shouldn’t be so hard to do basic stuff when it comes to data (but of course, for some reason, it always has been).
What I've come to realize in the years since my initial experience with Redshift is that the modern data stack has exactly two very important impacts for us:
1. With this far better tooling, we can grow our impact on the organizations we work for dramatically. I'm going to say something really stupid and obvious, but: if your queries return 100x faster, you can know 100x more stuff. And that means that you'll just be tremendously more valuable in the insights you're able to provide and the decisions you can make.
2. As a result of #1, our career options are just far, far greater. When I started my career, “data analyst” was a junior position that you attempted to graduate out of as quickly as possible to move on to other things. Now data analysts are high‐leverage, strategic employees with earning potential that mirrors that of software engineers.
There really has been a change in what a single analyst is able to accomplish with some fairly simple tooling and some very accessible knowledge and skills. That single individual can now construct an entire sophisticated data