Data Management: a gentle introduction. Bas van Gils


is used by a local factory in its production process, polluting the water somewhat. As long as the factories are downstream, most of the upstream citizens won’t mind too much. This is the equivalent of data that “moves” through the organization, from system to system, to be used in various processes. There is a high risk of introducing “pollution” in the form of problems with the data quality. Here too, if you are upstream then the problems with data quality from downstream won’t affect you too much.

      The water keeps flowing and a bit downstream there is a big dam and power plant. Here the speed of the flow of water is controlled and it is used to generate power for local towns. After the dam, the river forks. One of the streams flows into a water cleaning facility, after which it continues on to the next village. The other stream flows to what used to be a cool beach but is now deserted because of the polluted water. In data terms, this flow is called data movement, and the techniques related to it stem from an area called data integration. The equivalent of the dam/power plant is a system that controls the flow of data. After that comes the fork in the data river. In one data stream there is a data quality solution that cleanses the data, making it usable for local users. In the other stream there is no such solution.

      Last but not least, police boats have started patrolling the river to make sure no other illegal dumps of waste take place. The equivalent of the police boat is a governance structure where data management professionals check for misuse of data or prevent the introduction of errors into the data.

      Note how, in both cases, the emphasis is on flow. Note also that a local solution (e.g. cleaning the water or the data) helps at one point in the flow but not at the others, so it may be a better idea to fix issues upstream. Finally, in both cases governance structures are in place to make sure things run smoothly. The point is to make sure that certain things are in place (“grip”) such that value can be derived from the asset, be it water or data. The comparison can probably be extended further, but this gives a fair indication of how water and data are similar.
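      The difference between a local fix and an upstream fix can be sketched in a few lines of Python. This is only an illustration of the river analogy, not a description of any real system: the `pollute` and `cleanse` steps, the record layout, and the stream names are all assumptions made up for this example.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[str]]
Step = Callable[[List[Record]], List[Record]]


def pollute(records: List[Record]) -> List[Record]:
    """Hypothetical 'factory': loses the email of every second record."""
    return [
        {**r, "email": None} if i % 2 else dict(r)
        for i, r in enumerate(records)
    ]


def cleanse(records: List[Record]) -> List[Record]:
    """Hypothetical data quality step: drop records without an email."""
    return [r for r in records if r["email"] is not None]


def run(records: List[Record], steps: List[Step]) -> List[Record]:
    """Let the records 'flow' through a sequence of steps."""
    for step in steps:
        records = step(records)
    return records


def count_bad(records: List[Record]) -> int:
    return sum(1 for r in records if r["email"] is None)


source = [{"id": str(i), "email": f"user{i}@example.com"} for i in range(4)]

# Local fix: the data is polluted, the river forks, and only stream A
# has a cleansing step after the fork.
before_fork = run(source, [pollute])
local_a = run(before_fork, [cleanse])
local_b = before_fork

# Upstream fix: the cleansing step sits before the fork,
# so both streams receive clean data.
fixed_before_fork = run(source, [pollute, cleanse])
upstream_a = fixed_before_fork
upstream_b = fixed_before_fork

print(count_bad(local_a), count_bad(local_b))        # local fix
print(count_bad(upstream_a), count_bad(upstream_b))  # upstream fix
```

      With the local fix, only stream A is free of bad records; with the upstream fix, both streams are.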

       The analogy of the data river was invented by Luuk Spronk-van Lieshout and Emine Ozturk, two data stewards at PGGM – a Dutch pension provider.

      In this book, I will take the point of view that DM is an organizational capability. The capability of the organization depends on certain resources (people, systems) being in place. These resources together make sure that there is enough grip on data and enable the organization to use data and get value from its data assets.

      Considering that DM is about balancing between “grip” and “value creation”, a notion that needs careful exploration is the data lifecycle: the process of creation, use, and archiving/destruction of data. Example 14 briefly hinted at it already: data is created somewhere (presumably in a process, leading to an update of systems) and is subsequently used in many places in the organization. The one point that is missing from this exploration is that data should eventually be archived or destroyed. In many industries, there are regulations in place that stipulate when and how data should be archived or destroyed.
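      As a minimal sketch, the lifecycle can be modeled as a small state machine with a retention rule. The stage names follow the creation, use, archive/destruction sequence described above; the `DataAsset` class, the `retention_days` field, and the example values are hypothetical and stand in for whatever a real regulation or policy would prescribe.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Stage(Enum):
    CREATED = "created"
    IN_USE = "in use"
    ARCHIVED = "archived"
    DESTROYED = "destroyed"


# Allowed transitions: creation -> use -> archive and/or destruction.
ALLOWED = {
    Stage.CREATED: {Stage.IN_USE},
    Stage.IN_USE: {Stage.ARCHIVED, Stage.DESTROYED},
    Stage.ARCHIVED: {Stage.DESTROYED},
    Stage.DESTROYED: set(),
}


@dataclass
class DataAsset:
    name: str
    created_on: date
    retention_days: int  # assumed to come from a regulation or policy
    stage: Stage = Stage.CREATED

    def move_to(self, new_stage: Stage) -> None:
        """Advance the lifecycle, refusing transitions that skip or reverse stages."""
        if new_stage not in ALLOWED[self.stage]:
            raise ValueError(
                f"{self.name}: cannot go from {self.stage.value} to {new_stage.value}"
            )
        self.stage = new_stage

    def retention_expired(self, today: date) -> bool:
        """True once the retention period has passed."""
        return today >= self.created_on + timedelta(days=self.retention_days)


claims = DataAsset("claims", created_on=date(2020, 1, 1), retention_days=365)
claims.move_to(Stage.IN_USE)
if claims.retention_expired(date(2021, 1, 1)):
    claims.move_to(Stage.ARCHIVED)
print(claims.stage.value)
```

      Encoding the allowed transitions explicitly is one way to make the "grip" side visible: data cannot silently return to use after it has been destroyed.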

      What makes it hard to manage data along its lifecycle is that it never gets used up: you can make as many copies as you like without impacting the “original”. If you are not careful, these copies float around the organization and there is no telling what they will be used for. To manage data successfully across its lifecycle, organizations should at least keep careful track of where data goes, and they should have governance structures in place to make sure that this tracking actually happens.
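      Keeping track of where data goes can be illustrated with a very small registry of copies. This is a sketch under simplifying assumptions: the `CopyRegistry` class and the system names (`crm`, `warehouse`, and so on) are invented for the example, and a real organization would typically rely on metadata or lineage tooling instead.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class CopyRegistry:
    """Records which copies were made of which data, and for what purpose."""

    def __init__(self) -> None:
        # source location -> list of (target location, purpose)
        self._copies: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)

    def register_copy(self, source: str, target: str, purpose: str) -> None:
        self._copies[source].append((target, purpose))

    def downstream(self, source: str) -> List[str]:
        """All locations that directly or indirectly hold a copy of `source`."""
        found: List[str] = []
        to_visit = [source]
        while to_visit:
            current = to_visit.pop()
            for target, _purpose in self._copies.get(current, []):
                if target != source and target not in found:
                    found.append(target)
                    to_visit.append(target)
        return found


registry = CopyRegistry()
registry.register_copy("crm", "warehouse", "regulatory reporting")
registry.register_copy("warehouse", "spreadsheet", "ad hoc analysis")
registry.register_copy("crm", "marketing_db", "campaigns")

# If the customer data in "crm" must be destroyed, these copies are affected too.
print(sorted(registry.downstream("crm")))
```

      The registry only helps if every copy is actually registered, which is where the governance structures come in.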

      Balancing between the two goals of DM is a big task and many things have to be in place to make that happen. One of the strong points of the DMBOK, which I have mentioned several times so far, is that it breaks down the field of DM into smaller pieces called functional areas. For my purposes, it makes more sense to call them (sub)capabilities, signifying that together they contribute to the overall DM capability. Figure 7.1 shows what this partitioning looks like. This visual is often called “the DMBOK wheel.”

      For each of these areas, the DMBOK attempts to give a broad overview of what its objectives are, which activities are part of it, which inputs/outputs can be expected, and what type of tooling is required for support. It also describes good practices. The book is written by many authors, each taking care of a particular area. Unfortunately, this means that not all chapters are equally well aligned and that there are several small inconsistencies in the book. All in all, it is an impressive work which offers a great introduction to, and guidance for, the field of DM.

      Looking at the wheel, note how some areas appear to consist of two topics. For example, at the bottom there is an area that covers both reference data management and master data management. In this book, I will take a slightly different approach that is mostly in line with the wheel, and make sure that each of the chapters to come covers a single topic. I have deliberately left out certain topics such as database operations management (which is, in my view, mostly an IT capability dealing with how data technology should be run and operated) and document & content management (which deals with unstructured data: whilst this is important, it is not the focus of this book). I will cover the topics listed in table 7.1. Example 15 illustrates that in practical settings, many of the DM capabilities are required together to achieve success.


      Figure 7.1 The DMBOK wheel

       Example 15. Data management example

      This example is based on a real-world case at a Dutch governmental agency in the mid-1990s. One of the challenges this organization faced was a large backlog of reports that had to be completed for regulatory reasons (business intelligence, reporting). Creating these was far from easy because data was dispersed over many systems across the organization, and there was no standard environment (e.g. a data warehouse) to bring it together (integration). To make matters worse, different departments and professionals disagreed about key aspects such as data definitions, ownership of data, and quality of the data (governance, quality).

      Ultimately this was, of course, resolved. It took years of debate and several reorganizations to solve these problems. One of the key success factors in the end was that the organization leveraged processes, systems, policies, and procedures that were already in place and extended them one step at a time.

Chapter | Topic | Short introduction
9 | Data governance | Data governance is the enterprise discipline concerned with starting, managing, and sustaining the DM program. Key topics are accountability, decision-making, and supporting the program.
10 | Metadata | Metadata is, loosely defined, data about data. Anything you know about your data is metadata. This is a foundational thing for all the other capabilities: it is crucial to know definition, location, etc. of your data.
11 | Modeling | Modeling is all about “making sense of data through boxes and arrows”. I have already shown some examples in chapter 6. This area is closely related to Architecture, and focuses on (data) modeling techniques.
12 | Architecture | Architecture is about “fundamental properties of a system, and the principles guiding design and evolution” [ISO11]. The key challenge relates to getting to
