Data Management: a gentle introduction. Bas van Gils


In chapter 2, I briefly discussed the data/information dichotomy and claimed that, at least for purposes of this book, there is little difference between these two terms. In my view, data codifies what we know about the world in the form of text, numbers, graphs, images and so on. In more technical terms, this means that data can be either structured (which typically means it is in tabular form), unstructured (such as a random piece of text), or semi-structured (such as an e-mail, which consists of a header and main body).

      Linguistically, the term data is the plural form of the term datum, which at least suggests that there is such a thing as a “basic building block for data”. I’ll call this a data point. I’ll use the term record to signify a (semi-)structured group of data points that belong together, similar to paper records in physical catalogs used prior to today’s digital data storage. Note that records typically consist of several standardized fields that are filled in with actual data points. The easiest way to understand what a field is, is to think of a record as a form with predefined fields that can be filled in. Last but not least, a group of records together forms a data set. Example 10 illustrates these definitions.

       Example 10. Data, data point, record, field and data set

      The diagram shows data that is stored in a system (outer box). The small inner boxes signify the records in this system. Each record has three fields: the name, birthday and birth city of a person. In this example there are six records in total, each having three data points matching the three fields that make up a typical record. The top row is grouped (dashed box): this signifies the data set with records about people that were born in Tilburg. Another potential data set would be: the group of all records for people born before 1960.

Illustration
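      The definitions from Example 10 can also be sketched in code. The following Python fragment is only an illustration: the class name PersonRecord, the six names and birthdays, and every city other than Tilburg are made up for this sketch and are not taken from the diagram.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PersonRecord:
    """One record with the three fields from Example 10 (hypothetical class name)."""
    name: str
    birthday: date
    birth_city: str


# Six hypothetical records; every field value is a single data point.
records = [
    PersonRecord("Anna", date(1955, 3, 14), "Tilburg"),
    PersonRecord("Bram", date(1972, 11, 2), "Tilburg"),
    PersonRecord("Carla", date(1988, 6, 30), "Tilburg"),
    PersonRecord("Daan", date(1949, 1, 9), "Utrecht"),
    PersonRecord("Eva", date(1963, 8, 21), "Breda"),
    PersonRecord("Finn", date(1991, 12, 5), "Utrecht"),
]

# A data set is a group of records selected by some criterion.
born_in_tilburg = [r for r in records if r.birth_city == "Tilburg"]
born_before_1960 = [r for r in records if r.birthday < date(1960, 1, 1)]

print([r.name for r in born_in_tilburg])
print([r.name for r in born_before_1960])
```

      Both data sets are drawn from the same six records, which shows that a record can belong to more than one data set at the same time.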

      Data is stored in systems in various ways. By far the most common way to structure data in systems is through tables such that each row of the table maps to a record (see also section 4.5). More precisely put: the column headings of the table match the names of the fields in the record, and the intersections of rows and columns (the “cells” of the table) contain the individual data points. Example 11 builds on the previous example and illustrates these definitions.

      The diagram uses a model fragment to show how tables in a data store are defined. Modeling is an important part of DM. Data models – as well as other types of models – are explained in more detail in chapter 11.

       Example 11. Storing data in tables

      The lower part of the diagram is taken from the previous example and shows three person-records. However, this time each record also has a unique ID. The top part of the diagram shows the definition of what a typical record looks like. It shows that each record has four fields and gives the data type of each field. Last but not least, it shows whether a field is automatically generated or not.

      The example has two tables that are related through a dependency. These links between tables make it possible to answer questions such as “show me all orders where the customer was born before 1960”.

       Illustration
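      For readers who prefer code over diagrams, the following is a minimal sketch of Example 11 using Python's built-in sqlite3 module. The table names person and customer_order, the column names, and all sample rows are assumptions made for this sketch; the book's diagram may define them differently. Birthdays are stored as ISO 8601 text so that they can be compared as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Table definitions: field names, data types, and an automatically generated ID.
conn.executescript("""
    CREATE TABLE person (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,  -- generated automatically
        name       TEXT NOT NULL,
        birthday   TEXT NOT NULL,                      -- ISO 8601 date
        birth_city TEXT NOT NULL
    );
    CREATE TABLE customer_order (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,
        customer_id INTEGER NOT NULL REFERENCES person(id),  -- link to person
        description TEXT NOT NULL
    );
""")

# Three hypothetical person records and three hypothetical orders.
conn.executemany(
    "INSERT INTO person (name, birthday, birth_city) VALUES (?, ?, ?)",
    [
        ("Anna", "1955-03-14", "Tilburg"),
        ("Bram", "1972-11-02", "Tilburg"),
        ("Carla", "1988-06-30", "Tilburg"),
    ],
)
conn.executemany(
    "INSERT INTO customer_order (customer_id, description) VALUES (?, ?)",
    [(1, "bicycle"), (2, "laptop"), (1, "helmet")],
)

# "Show me all orders where the customer was born before 1960."
rows = conn.execute("""
    SELECT o.id, p.name, o.description
    FROM customer_order AS o
    JOIN person AS p ON p.id = o.customer_id
    WHERE p.birthday < '1960-01-01'
    ORDER BY o.id
""").fetchall()

print(rows)
```

      The query can only be answered because customer_order.customer_id refers to person.id: the link between the two tables is what connects an order to the birthday of the customer who placed it.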

      The previous section discussed data from an IT perspective. In this section, I will switch gears and discuss data from a business (process) perspective. This is a major shift to another level of abstraction: rather than considering exactly how data is structured and stored in systems, this perspective is all about understanding which type of data is required to make processes run.

      One of the keys to good data management is that these business concepts are clearly defined. This often leads to the creation of a (business) glossary. The glossary is discussed in further detail in chapters 10 and 28. By studying these definitions, it often becomes clear which business concepts are related. These relationships can be documented in a conceptual data model, which will be discussed in chapter 11 (see also section 4.4 on information/data analysis).

      Example 12 illustrates the main points from this discussion.

       Example 12. Data in processes

      The diagram shows a single invoicing process which has an order as input and an invoice as output. These business concepts are related to each other, as well as to other business concepts. The solid arrows indicate these relationships. The labels on these relationships give an indication of how to interpret them.

Illustration

      The questions that remain are: how are business concepts stored in systems? How are the business and IT perspectives connected? When database systems became popular in the 1970s, a technique was developed to analyze and “normalize” data structures in an effective manner: the relational model [Cod70, Cod79, Dat12] (see also section 4.5). Around the same time, various modeling approaches were developed to visualize what these data structures should look like. Chief among them was the Entity Relationship Model [Che76]. The main idea behind this type of modeling approach is to analyze how business concepts should be structured in such a way that they can efficiently be stored in database systems. This level of analysis straddles the business and IT perspectives. Models at this level of abstraction are often called logical data models, something which will be discussed in more detail in chapter 11.
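      To give a feel for what “normalizing” a data structure means, here is a deliberately simplified Python sketch. It is not a formal normal-form procedure from the cited literature. The sample rows, the field names, and the assumption that a name together with a birth city identifies a customer are all hypothetical.

```python
# Hypothetical flat rows: customer details are repeated on every order.
flat_orders = [
    {"order_id": 1, "customer_name": "Anna", "birth_city": "Tilburg", "item": "bicycle"},
    {"order_id": 2, "customer_name": "Bram", "birth_city": "Tilburg", "item": "laptop"},
    {"order_id": 3, "customer_name": "Anna", "birth_city": "Tilburg", "item": "helmet"},
]

customers = {}  # (name, birth_city) -> customer_id; each customer stored once
orders = []     # each order refers to its customer by ID only

for row in flat_orders:
    key = (row["customer_name"], row["birth_city"])
    # Reuse the existing ID for a known customer, otherwise assign the next one.
    customer_id = customers.setdefault(key, len(customers) + 1)
    orders.append(
        {"order_id": row["order_id"], "customer_id": customer_id, "item": row["item"]}
    )

print(customers)
print(orders)
```

      After the split, a change to a customer's details has to be made in one place only, which is one of the practical benefits of storing each fact once.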

      What is relevant for purposes
