Global Data Strategy

Data Quality – a Multidimensional Approach

August 2020, Nigel Turner, Principal Consultant, EMEA

In my blogs and articles over the lockdown period I’ve avoided talking about the impact of the Covid-19 pandemic and the heavy reliance on good quality data to support the models needed to combat and mitigate its effects. I have decided to break my silence in this blog because a major data story recently hit the headlines in my part of the world, Wales in the United Kingdom. This story was so close to home that I felt compelled to highlight and comment on it, and to use it to stress why good data quality is more important than ever.

As the lockdown was imposed, the Welsh Government compiled a list of 75,000 people living in Wales who were classified as ‘vulnerable’. These were mainly older people or those with existing health conditions that would put them at particular risk should they contract the virus. Letters were duly sent out advising them to stay at home for 12 weeks. It soon became evident, however, that all was not well. It was reported that 13,000 letters were sent to the wrong addresses (that’s over 17% of all letters sent), with the outcome that 13,000 vulnerable people were not advised to shield, and others not in the vulnerable category were told to stay at home. I don’t need to spell out the implications of this; high-risk people may well have become severely ill or worse as a result. It also damaged trust in the Government at a time when combating Covid relied heavily on the population complying with its instructions.

The Welsh Government blamed these problems on a ‘processing error’ (a standard non-explanation when these things happen). They duly re-sent 13,000 letters, but even today there are vulnerable people still reporting that they have not received any advice, and healthy people being told to shield when they don’t need to. The problems rumble on.

A more plausible explanation for the problem is rooted in underlying data quality issues. The Government assembled its list from several health and social services data sources across Wales, and it emerged that this data had a plethora of problems. These included incomplete and missing data, duplicate data records, out-of-date addresses and contact numbers, and an inability to merge sources effectively into a definitive list because of a lack of data standards and the resulting inconsistencies in format and content.
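Problems of this kind can often be surfaced with a simple profiling pass before sources are merged. The following is a minimal Python sketch, and not a description of how the Welsh list was actually built: the field names (`name`, `postcode`), the sample rows and the helpers `normalise` and `profile_records` are all hypothetical, chosen only to show how missing values and duplicates might be counted once inconsistent formats have been normalised.

```python
from collections import Counter

# Hypothetical records merged from two sources with inconsistent formatting.
records = [
    {"name": "Ann Jones", "postcode": "CF10 1AA"},
    {"name": "ann jones ", "postcode": "cf101aa"},  # same person, different format
    {"name": "Dai Evans", "postcode": ""},           # missing postcode
    {"name": "Mair Rees", "postcode": "SA1 2BB"},
]


def normalise(record):
    """Reduce a record to a comparable key (assumed matching rule:
    case-insensitive name, postcode compared without spaces)."""
    name = " ".join(record["name"].lower().split())
    postcode = record["postcode"].replace(" ", "").upper()
    return (name, postcode)


def profile_records(rows):
    """Count rows with a missing postcode and rows that repeat an earlier key."""
    keys = [normalise(r) for r in rows]
    missing = sum(1 for _, postcode in keys if not postcode)
    counts = Counter(keys)
    duplicates = sum(n - 1 for n in counts.values())
    return {"total": len(rows), "missing_postcode": missing, "duplicates": duplicates}


report = profile_records(records)
print(report)
```

In practice the matching rule would need to be agreed as a data standard across the contributing sources; the exact-match key used here is only the simplest possible choice.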

This tale illustrates all too well why data quality matters. It matters because all other data disciplines rely so heavily on it. Predictive modelling, analytics, business intelligence and the like can only produce reliable results and models if the underlying data that feeds them can itself be relied on. Clearly, in this case, it couldn’t.

So how can you ensure that data quality is fit for purpose, whether it’s for combating Covid, estimating product sales or marketing to potential customers? Creating and maintaining good quality data depends on five basic activities:

  1. Understand what data is stored and processed and how it is used within an organisation
  2. Baseline the current quality of the data and assess how well it is meeting business needs and uses
  3. Where the data is not fit for purpose, set quality improvement targets to guide remedial activities
  4. Undertake improvement initiatives (encompassing people, process, technology and the data itself) and measure the impact against targets
  5. Regularly measure the data and continue to improve and maintain it so that it meets current and future business needs
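Steps 2 to 5 can be sketched as a small measure-and-compare loop. This is an illustrative Python sketch under stated assumptions, not a definitive implementation: the two measures shown (completeness and uniqueness), the field names, the sample rows and the target values are all hypothetical, and a real programme would measure whichever dimensions the organisation has agreed on.

```python
def completeness(rows, field):
    """Share of rows where `field` is present and non-empty."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(field))
    return filled / len(rows)


def uniqueness(rows, field):
    """Share of rows whose `field` value is distinct."""
    if not rows:
        return 0.0
    values = [r.get(field) for r in rows]
    return len(set(values)) / len(values)


def assess(rows, targets):
    """Baseline each measure and flag whether its target is met."""
    scores = {
        "address_completeness": completeness(rows, "address"),
        "id_uniqueness": uniqueness(rows, "id"),
    }
    return {
        name: {"score": score, "target": targets[name], "met": score >= targets[name]}
        for name, score in scores.items()
    }


# Hypothetical data set and improvement targets.
rows = [
    {"id": 1, "address": "1 High Street"},
    {"id": 2, "address": ""},
    {"id": 3, "address": "3 Mill Lane"},
    {"id": 4, "address": "4 Chapel Road"},
]
targets = {"address_completeness": 0.95, "id_uniqueness": 1.0}

result = assess(rows, targets)
for name, outcome in result.items():
    print(name, outcome)
```

Re-running the same assessment after each improvement initiative, and on a regular schedule thereafter, gives a like-for-like view of progress against the targets.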

In this process, a critical activity is to measure the data and set improvement targets. To do this it’s important to recognise that one data quality measure is never enough; data quality consists of several dimensions, and any data set needs to be evaluated against each of them. The bad news is that there is no industry-wide agreement on what these dimensions are, and there are many variants. I’ve personally always favoured the following seven dimensions, split into two categories: Content (focused on the data itself) and Context (focused on the use of the data). The five Content dimensions are:

The Context dimensions are:

Data can then be baselined against these dimensions. Targets can be set and reported on regularly in support of a programme of continuous data quality improvement. I end with a few final tips to make this approach work in practice:

Taking this multi-dimensional approach to tracking and improving data quality is essential. If these disciplines had been adopted on health and social care data in Wales before the onset of the pandemic, many of the problems which emerged would have been avoided or at least reduced. Good data quality not only enhances trust and reputation; it can even save lives.

 
