Managing Data Quality. Tim King


the actual costs and benefits end up at the pessimistic end of the scale, then you might have a hard time from the project sponsors.

      In the second case, the variance due to data quality issues is smaller (and has been declared), so you could both gain approval for the project and avoid any unpleasant surprises later in the project.

      In the final case, the variance due to data quality issues indicates a large range for the potential payback period. The outcome could be that you get approval for a limited initial investment, say £25k, to correct or gather data and perhaps to run a proof-of-concept study. This is likely to help reduce the level of risk and uncertainty associated with the project and potentially avoid the organisation committing funds to a non-viable project. The results of this limited phase should then show whether the whole project can proceed, with a reduced risk of cost over-runs and implementation issues.
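To see how uncertainty in the inputs widens the payback range, a minimal sketch follows. It assumes a simple payback calculation (investment divided by annual net benefit); the function name and all figures are hypothetical illustrations, not values from the book.

```python
def payback_range(cost_low, cost_high, benefit_low, benefit_high):
    """Return (best, worst) payback period in years.

    Assumes simple payback = investment / annual net benefit.
    All arguments are hypothetical estimates in the same currency.
    """
    if cost_low <= 0 or benefit_low <= 0:
        raise ValueError("costs and benefits must be positive")
    if cost_low > cost_high or benefit_low > benefit_high:
        raise ValueError("low estimates must not exceed high estimates")
    best = cost_low / benefit_high    # cheapest project, largest benefit
    worst = cost_high / benefit_low   # dearest project, smallest benefit
    return best, worst


# Hypothetical figures: poor data quality gives wide cost and benefit ranges.
best, worst = payback_range(400_000, 600_000, 100_000, 300_000)
print(f"Payback between {best:.1f} and {worst:.1f} years")
```

Narrowing either input range, for example through a limited data-correction phase, narrows the gap between the best and worst payback periods.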

      Challenges when exploiting and managing data


      The data triangle

      To support effective data exploitation, it is useful to consider the ‘data triangle’, shown in Figure 2.2. This illustrates three factors that need to be present to ensure effective data exploitation; similar to a tripod, if any one of the elements is not present, then the overall approach will not stand up. The three factors are as follows:

      Analytic and processing tools: You need to ensure that you have both an effective tool set and the individuals with the skills to use it efficiently and effectively.

      Subject matter expertise: The world’s best experts in your chosen tool set will only be able to provide minimal benefits unless they are supported by suitable subject matter expertise. This will help to ensure that they are able to understand the context of the particular exploitation challenge, to interpret the meaning of the data, to spot obvious data issues and to provide validation that outputs are sensible.

      Data of known quality: It is important not only that you have sufficient data to be able to undertake data exploitation tasks, but also that the quality of these data is known. Clearly, it would be great to have ‘perfect’ data. In real business situations, however, there will be data quality issues that cannot be corrected in a timely or cost-effective manner. The previous discussion on complex decisions showed the importance of knowing and stating the quality of your data and the impact they have on extracting appropriate insight and foresight.

      Figure 2.2 The data triangle


      Data as a raw material

      When using data, many individuals assume that they can use all the data ‘as is’. This assumption is flawed, as explained by the analogy of how artisans making furniture vary their approach according to the raw materials they are using:

      For a raw material such as newly manufactured metal or plastic, the product will be highly consistent with conformance or test certificates proving the quality of the material. This consistency of product means that all the material could be used, with the main challenge being how to minimise wastage.

      For a raw material of seasoned wood, the product is likely to have some pieces with knots in them, be slightly warped and/or include areas of woodworm or rot. In this case, the artisan assesses each piece of wood to determine the best way to use it to make furniture, perhaps rejecting pieces that are too rotten or warped, or choosing to use the more knotty pieces for the back of the furniture, where they are less visible or critical.

      When undertaking data exploitation, data are the raw material, but a raw material that can vary from being like metal through to being like wood. Before undertaking any analysis, it is important to consider whether the data are like wood, in which case you need to understand their quality in order to know which data should perhaps be ignored, which may need cleansing and which seem reliable. The aggregate assessment of the quality of the input data will help inform how you describe the confidence levels of the outcomes of the analysis.
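The sorting of data into 'ignore', 'cleanse' and 'reliable' can be sketched in code. This is only an illustration under stated assumptions: the record fields (asset_id, install_year), the rules and the year limits are hypothetical, and real rules would come from subject matter experts.

```python
from collections import Counter


def triage(record, earliest_year=1900, latest_year=2025):
    """Classify one record as 'reliable', 'cleanse' or 'ignore'.

    Hypothetical rules: a record with no identifier cannot be used at all;
    a record with a missing or implausible year needs cleansing first.
    """
    if record.get("asset_id") is None:
        return "ignore"
    year = record.get("install_year")
    if year is None or not (earliest_year <= year <= latest_year):
        return "cleanse"
    return "reliable"


# Hypothetical sample records.
records = [
    {"asset_id": "A1", "install_year": 1998},
    {"asset_id": "A2", "install_year": None},
    {"asset_id": None, "install_year": 2005},
    {"asset_id": "A4", "install_year": 3005},
]

counts = Counter(triage(r) for r in records)
# Aggregate view of input quality, to be stated alongside any analysis results.
reliable_share = counts["reliable"] / len(records)
print(dict(counts), f"reliable share: {reliable_share:.0%}")
```

The reliable share is one crude aggregate measure; it could be reported with the analysis so that readers can judge how much confidence to place in the results.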

      Ideally, organisations will improve the sourcing of their data in the longer term, and look to implement the quality control and certification that ensures the data become like the metals and plastics that enable consistent manufacturing processes in this analogy.

      The data machine: expectations vs reality

      Acquiring, storing, managing and exploiting data within an organisation involves many activities and processes. These activities could be thought of as a machine powering the organisation. If you had to visualise the ‘data machine’ that represents your organisation, what sort of machine is it?

      In a quarry, an excavator is a large (and expensive) piece of equipment that is essential for loading rock blasted from the rock face into dumper trucks for transporting to the processing plant. These are sophisticated and powerful machines that, if maintained correctly, should load rock effectively and efficiently for many years.

      Implementation of an enterprise software solution can be likened to this excavator – it can provide a powerful and, arguably, efficient means to run business processes and acquire and analyse data. Similar to a major piece of equipment in a quarry, this will be a large investment for the organisation.

      If you were to visualise the data processing activities of your organisation as a machine, would it be similar to this major item and be a single, efficient and effective entity? In reality, perhaps, there could be a few manual work-arounds to overcome deficiencies in certain areas, the use of some spreadsheet-based tools for manipulating data and interfaces with other corporate systems that do not function perfectly. In such cases, the machine that you imagine to represent the way that your organisation exploits data will be more like the types of machine imagined by the artist William Heath Robinson in the early 20th century: complicated, inefficient, tenuously held together and lacking proper documentation.

      Whilst you perhaps visualise your data machine as being closer to the latter, how would the senior leaders in your organisation view it? If they have recently invested in enterprise software tools, then they could believe that the ‘data machine’ is a single, efficient entity, when in reality there are a number of data quality issues arising from data migration problems, interfaces that do not function correctly and so on.

      This analogy illustrates how organisations can have an unrealistic perception of the way that they process and exploit data, believing their data processes to be far more coherent and efficient than they actually are. Later in this book we define approaches to help provide structure and definition to your ‘data machine’.

      Do your data trust you?

      You could ask (or be asked) the relevant question: Do you trust your data? What if you were to turn this question on its head? This leads us to think about the issues from the perspective of the data, that is: if I am a data set, can I trust the people who handle me to do the right thing with me or to me? Do your data trust you? This can be another way of illustrating the importance of managing data correctly.

      The data set’s view
