Managing Data Quality. Tim King
2 Challenges when exploiting and managing data
Complex decisions
Decision making in most organisational contexts can range from very simple through to extremely complex. Simple decisions, such as where to send a technician, will possibly not require much input data and will only have limited consequences if the wrong decision is taken. In contrast, more complex decision making, such as strategic planning, life cycle costing and project planning, is likely to have more extensive decision logic, more significant consequences and a greater reliance on the quality of input data.
To look at a slightly different context, in a biology or physics experiment, it is likely to be understood that there are factors that cannot be fully controlled in the experiment and that the accuracy of measurements is not perfect. Therefore, results are often expressed as a range, for example 254 ± 10. In the case of many complex business decisions, the quality of input data will be as variable as that encountered in such an experiment, yet typically the outputs are not likely to include any expression of sensitivity. This could easily lead to incorrect assumptions about the certainty of the decision.
In summary, you must understand the decisions that your data support so that you can determine the extent to which data quality will influence the reliability of those decisions.
Virtuous circle or downward spiral?
In general, the decision-making process will be influenced by data quality. What you should be trying to avoid is a downward spiral where poor data quality leads to poorer information quality. In turn, this will tend to lead to incorrect business decisions and hence worse results. A poorly thought-out project, decision or activity is likely to lead to worse data being generated as a result of staff being demoralised by the inappropriate project, thereby continuing the downward spiral of ever-decreasing data quality.
What you should be aiming for is a ‘virtuous circle’ (see Figure 2.1) whereby improvements in data quality deliver improvements in information quality. These in turn should improve the quality of the decision making of the organisation and, in turn, should lead to better results or outcomes. With better decisions and results, it is likely that the data arising from these activities will also be better quality, particularly in more ‘data aware’ organisations.
Figure 2.1 The virtuous circle of data quality
Unclear data ownership
Some people talk about ‘data owners’ and organisations often worry about assigning ownership to particular data sets. Since many business processes can create, use and amend similar data (e.g. customer data being updated as part of many processes), an assigned owner of data will struggle to retain influence over the activities that are changing the data and contributing to poor quality.
By assigning empowered process owners and maintaining explicit specifications for the data being created by a process, organisations can establish a more solid foundation for the control of data quality. If such data specifications have been defined, agreed and published to reflect the decisions being supported by the data, it becomes possible to assess compliance objectively. Many individual sources (e.g. different suppliers or departments) can then contribute to the data, but always in accordance with the single, definitive data specification.
Managing data storage is not ownership; effective owners are those who can determine the extent to which data are appropriate for the needs of the business.
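The idea of assessing compliance objectively against a published data specification can be illustrated with a short sketch. Everything named here is a hypothetical assumption for illustration: the `CUSTOMER_SPEC` fields, their patterns and the helper functions are not taken from the book, and a real specification would be agreed with the owners of the processes that create the data.

```python
import re

# Hypothetical specification for customer records created by a process.
# Field names and rules are illustrative assumptions only.
CUSTOMER_SPEC = {
    "customer_id": {"required": True, "pattern": r"C\d{6}"},
    "postcode": {"required": True, "pattern": r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}"},
    "email": {"required": False, "pattern": r"[^@\s]+@[^@\s]+\.[^@\s]+"},
}


def check_record(record, spec):
    """Return a list of human-readable violations for one record."""
    violations = []
    for field, rule in spec.items():
        value = record.get(field)
        if value is None or value == "":
            # Empty values only matter when the specification requires the field.
            if rule["required"]:
                violations.append(f"{field}: missing required value")
            continue
        if not re.fullmatch(rule["pattern"], str(value)):
            violations.append(f"{field}: value {value!r} does not match specification")
    return violations


def compliance_rate(records, spec):
    """Fraction of records that have no violations against the specification."""
    if not records:
        return 1.0
    compliant = sum(1 for record in records if not check_record(record, spec))
    return compliant / len(records)


# Records from different sources are all checked against the same specification.
records = [
    {"customer_id": "C000123", "postcode": "SW1A 1AA", "email": "a@example.com"},
    {"customer_id": "123", "postcode": "SW1A 1AA"},
    {"customer_id": "C000124", "postcode": ""},
]
for record in records:
    print(record.get("customer_id"), check_record(record, CUSTOMER_SPEC))
```

Because every source is measured against one definitive specification, the resulting compliance rate can be compared across suppliers or departments without debating what ‘good’ means for each of them.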
Backups and data quality
All well-run information technology (IT) systems will have an agreed backup regime in place, ensuring the organisation is able to restore a full or partial system in the event of major hardware or software issues. This will typically include a range of daily, weekly and monthly backups, plus off-site storage and perhaps standby ‘failover’ systems, to ensure that if, for example, there were to be a fire on site, software systems and services could still be restored from scratch without undue delay or data loss. Backups can vary from a ‘full’ backup (copying all data and information to backup media) through to an ‘incremental’ backup that only copies data and information that have changed since the last backup was run.
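The difference between a full and an incremental backup can be sketched in a few lines. This is a minimal illustration only, assuming that file modification times are a reliable sign of change and that the backup target lies outside the source directory; the function names are hypothetical, and production backup tools handle deletions, open files and verification, which this sketch does not.

```python
import shutil
from pathlib import Path


def files_to_back_up(source_dir, last_backup_time=None):
    """List the files to copy from source_dir.

    last_backup_time is a POSIX timestamp. None means a full backup,
    so every file is selected; otherwise only files modified after
    that time are selected (an incremental backup).
    """
    source = Path(source_dir)
    selected = []
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        if last_backup_time is None or path.stat().st_mtime > last_backup_time:
            selected.append(path)
    return selected


def run_backup(source_dir, target_dir, last_backup_time=None):
    """Copy the selected files into target_dir, preserving relative paths."""
    source, target = Path(source_dir), Path(target_dir)
    copied = []
    for path in files_to_back_up(source, last_backup_time):
        destination = target / path.relative_to(source)
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, destination)  # copy2 also preserves timestamps
        copied.append(destination)
    return copied
```

Note that the selection depends only on when a file changed, not on whether its contents are correct: a backup faithfully preserves whatever quality of data was present at the time.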
Depending on the backup service level agreement, the data backups can usually give a view of the data from one or two months ago. However, if you were to try to restore your data from a backup taken some time ago, you would almost certainly ‘lose’ all the data updates that have been made since the backup was created. Such a scenario requires both the application of incremental restores and the analysis of which updates to ignore and overwrite.
If staff have been entering poor quality data over a significant period of time (perhaps over many years), then it will be extremely difficult, if not impossible, to go back to a point where the data were ‘good’ and find the correct data. This is particularly challenging when there are some teams who are very diligent (whose data you do not want to correct or change) and other teams who are more careless (whose data you will need to correct).
The reality is that system and data backups provide little or no value when trying to resolve data quality problems.
Data quality and lack of transparency in business cases
In an organisational context, it is rare for a business case to be expressed along the lines of ‘based on the quality of input data, we believe project costs are likely to be between £240k and £320k, with benefits in the range of £80k to £160k per annum’. In this example, the worst-case forecast would give a payback period of four years, whereas the best case would suggest that the project covered its costs in only 18 months. Clearly, this represents a large range of outputs, and, depending on how the costs, benefits and payback are presented, there will be very different perceptions of the level of risk presented by the project. Based on this example, consider the impact if the results were presented as follows:
1. ‘The project will cost £280k, deliver benefits of £120k per annum and therefore achieve a payback of 2.3 years’ (these are based on the mid-points of the above scenario).
2. ‘Based on the assessed quality of input data, we forecast the project costs to be between £260k and £300k with benefits in the range of £100k to £140k per annum’ (these are based on the mid-points of the above scenario, but with narrower ranges for both costs and benefits).
3. ‘Based on the assessed quality of input data, we forecast the project costs to be between £240k and £320k with benefits in the range of £80k to £160k per annum.’
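The payback figures behind these three presentations can be reproduced with a short sketch. It assumes simple, undiscounted payback (cost divided by annual benefit), which matches the four-year and 18-month figures quoted above; the function names are illustrative and amounts are in thousands of pounds.

```python
def payback_years(cost, annual_benefit):
    """Simple payback: years for cumulative benefits to equal the cost."""
    if annual_benefit <= 0:
        raise ValueError("annual_benefit must be positive")
    return cost / annual_benefit


def payback_range(cost_low, cost_high, benefit_low, benefit_high):
    """Return (best, worst) payback in years for ranges of cost and benefit."""
    best = payback_years(cost_low, benefit_high)    # lowest cost, highest benefit
    worst = payback_years(cost_high, benefit_low)   # highest cost, lowest benefit
    return best, worst


# (cost_low, cost_high, benefit_low, benefit_high) for each presentation.
presentations = {
    "1. point estimate": (280, 280, 120, 120),
    "2. narrower ranges": (260, 300, 100, 140),
    "3. full ranges": (240, 320, 80, 160),
}

for label, bounds in presentations.items():
    best, worst = payback_range(*bounds)
    print(f"{label}: payback between {best:.1f} and {worst:.1f} years")
```

The same underlying project therefore appears as a single figure of about 2.3 years, a range of roughly 1.9 to 3.0 years, or a range of 1.5 to 4.0 years, depending only on how much of the input uncertainty is carried through to the output.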
In the first case, the business case sounds attractive, so you will probably be able to gain project