Data Management: a gentle introduction. Bas van Gils


Doe, John H. Doe, John Howard Doe). Can we reconcile this? Can we figure out with any degree of certainty who is who and which products were purchased when Mr. Doe calls with a complaint?

      Most transaction systems only hold the latest version of data. This means that when a customer moves from A to B, the fact that they used to live at A is often lost. Transaction systems typically also store data at the finest level of granularity. For (historic) reporting and (predictive) analytics this may not always be the best solution. This is where the category of business intelligence (BI) data comes into play. The idea is to create data sets that combine transaction data and master data. Such a data set retains historic data for timeline analysis, and it is structured in such a way that data can be easily aggregated and summarized for reporting and analysis purposes. Chapter 18 will discuss BI in more detail. Example 18 illustrates this point.

       Example 18. BI data

      Suppose your company has a number of product lines: the CoolX line of products as well as several others. In addition, the company offers various services to customers. Separate systems keep track of all purchases, service requests, payments and so on.

      From a reporting perspective, management may be interested in questions such as: how many products of a certain type did we sell per store and how does that deviate from previous quarters? Do we retain our customers when they move? A similar line of reasoning applies to analytics questions such as: what would be a good service to cross-sell with our CoolX product to a specific group of customers? To be able to answer these questions, data must be consolidated. Often this means looking at the data from a historic perspective. Moreover, individual records are less important in this situation than the patterns that are present in the data.
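      The first reporting question above (units sold per store, compared with the previous quarter) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the store names, quarters, and unit counts are invented, and the function names are hypothetical. A real BI environment would typically do this in a data warehouse rather than in application code.

```python
from collections import defaultdict

# Hypothetical purchase records, consolidated from separate transaction systems.
# Each tuple: (store, product_line, quarter, units_sold). All values are invented.
purchases = [
    ("Store A", "CoolX", "2023-Q1", 120),
    ("Store A", "CoolX", "2023-Q1", 30),
    ("Store A", "CoolX", "2023-Q2", 180),
    ("Store B", "CoolX", "2023-Q1", 90),
    ("Store B", "CoolX", "2023-Q2", 75),
]


def units_per_store_and_quarter(records, product_line):
    """Sum units sold per (store, quarter) for one product line."""
    totals = defaultdict(int)
    for store, line, quarter, units in records:
        if line == product_line:
            totals[(store, quarter)] += units
    return dict(totals)


def quarter_over_quarter_change(totals, store, previous, current):
    """Difference in units sold between two quarters for one store."""
    return totals.get((store, current), 0) - totals.get((store, previous), 0)


totals = units_per_store_and_quarter(purchases, "CoolX")
change_a = quarter_over_quarter_change(totals, "Store A", "2023-Q1", "2023-Q2")
change_b = quarter_over_quarter_change(totals, "Store B", "2023-Q1", "2023-Q2")
print(totals)
print(change_a, change_b)
```

      Note that the individual purchase records disappear in the result: only the aggregated pattern per store and quarter remains, which is what this kind of reporting is after.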

      Whether this type of data is updated frequently depends on the architecture of your information systems landscape. In some cases, updates in data from transaction systems and master data systems are pushed to the BI environment once or twice a day. In other situations, this is done in (near) real-time.

      The fourth type of data is reference data, which is perhaps the most elusive of all. Reference data is used to make sense of other data, often through codes or hierarchies of codes. The idea is that by using a code, you give a very precise meaning to something that is potentially very complex. Example 19 gives two simple examples.

       Example 19. Reference data

      As a more complex example, consider the use of industry classification codes to label organizations you do business with. For example, code 440000 covers all retail traders, and 445000 is a child of 440000 and is the code for food and beverage stores. Code 445200 is a child of 445000 and signifies specialty food stores such as 445210 (meat markets), 445220 (fish and seafood markets), and 445290 (other specialty stores). Using such codes consistently allows us to easily find all specialty food stores by looking for all stores that are labelled 445200 or one of its sub-codes.
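      The lookup described in this example can be sketched as follows. The codes and their meanings are the ones listed above; the organization names are invented, and the helper `is_same_or_sub_code` is a hypothetical name. The sketch assumes that trailing zeros mark unused levels of the hierarchy, so that a parent code can be matched by its non-zero prefix. Real classification schemes may need an explicit parent-child table instead.

```python
# Industry codes taken from the example above.
INDUSTRY_CODES = {
    "440000": "retail traders",
    "445000": "food and beverage stores",
    "445200": "specialty food stores",
    "445210": "meat markets",
    "445220": "fish and seafood markets",
    "445290": "other specialty stores",
}


def is_same_or_sub_code(code, parent):
    """Return True if `code` equals `parent` or sits below it in the hierarchy.

    Assumption: trailing zeros mark unused hierarchy levels, so 445200
    covers every code that starts with 4452.
    """
    prefix = parent.rstrip("0")
    return code.startswith(prefix)


# Invented organizations, each labelled with one of the codes above.
organizations = [
    ("Butcher Bob", "445210"),
    ("Fresh Catch", "445220"),
    ("Corner Grocery", "445000"),
    ("Shoe Shack", "440000"),
]

# Find all specialty food stores: code 445200 or one of its sub-codes.
specialty = [
    name for name, code in organizations
    if is_same_or_sub_code(code, "445200")
]
print(specialty)
```

      Because every organization carries a code from the same list, one prefix test is enough to select a whole branch of the hierarchy. That only works when the codes are applied consistently.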

      Reference data may seem like a really simple and straightforward concept, yet in practice this is hardly the case. In chapter 14, I will discuss the relevant theory in more detail. Also note that reference data tends to be static. Using reference data in real-world situations will be discussed in more detail in the examples in part II of this book.

      The fifth and last type of data that I will discuss is metadata. Loosely defined, metadata is “data about data”. Anything you can know about your data is metadata. Through metadata you can answer questions such as: what is the definition of “customer”? In which processes do we create customer data? How does customer data flow through our information systems? The list goes on and on. As an organization you can (and perhaps should) collect metadata about all other types of data. Having a good set of metadata available is foundational for managing and governing your data. Metadata is discussed in more detail in chapter 10.


       Synopsis - In this chapter, I introduce the topic of data governance. Data governance is the capability that deals with accountability for data. I will first position data governance in relation to (other) data management (activities). Then I will provide an overview of key data governance themes based on the Data Management Body of Knowledge (DMBOK) [Hen17]. Last but not least, I will give an overview of a modern approach to data governance based on three key roles: data owners, data users, and data stewards.

      The word governance, or its associated verb to govern, has many definitions and interpretations, depending on the context in which it is used. Many people seem to associate this word with (the use of) power; with laying down and enforcing the law. This view is indeed close to the Merriam-Webster Dictionary definition which uses phrases such as “to exercise continuous sovereign authority over” and “to control, direct, or strongly influence the actions and conduct of”. The DMBOK defines data governance as follows [Hen17]:

       Data governance is defined as the exercise of authority and control (planning, monitoring, and enforcement) over the management of data assets.

      This definition screams a command-and-control, top-down approach to governance: make plans, define rules, implement, enforce, and punish when the rules are not followed. This isn’t the only way to implement data governance, though. In this chapter, I will first show how to position data governance in relation to data management. I will then follow up with a discussion of the data governance activities as listed in the DMBOK and a discussion of a modern approach to data governance through data stewards, data owners, and data users. I will end the chapter with a brief discussion of the relationship between data governance and other governance processes that may be followed in the organization.
