Data Quality Shortcomings: Another Case for Data Governance

April 2021, Nigel Turner, Principal Consultant, EMEA

Data quality has again made headline news in the UK, and as usual for all the wrong reasons.  In February 2021 UK TV and press extensively covered a story that emerged in the city of Liverpool, a place best known as the home of the Beatles.  The story, however, was centred not on ‘Yesterday’ and the rest of their back catalogue but on today’s most pressing priority.

The UK has embarked on an ambitious vaccine roll-out programme to combat Covid-19, as Britain has been one of the hardest-hit countries on the planet, with a death toll exceeding 120,000.  Already nearly 18 million people have received their first dose of the vaccine, around one third of the adult UK population.

The roll-out has been targeted to prioritise the most vulnerable groups, as defined by age, underlying health conditions, other risk factors and so on.  It was therefore surprising when a fit and healthy 32-year-old Liverpudlian man with no underlying health issues received a text invitation to attend a vaccination session.  When he queried why he had been selected, he was told by his doctor that he had been classed as ‘morbidly obese’ by the authorities, which placed him in a high-priority vaccination category.  This puzzled him further, as he is of average weight for his height.

Further investigations revealed the truth, and it didn’t take Sherlock Holmes to figure it out.  When his height had been registered in his medical record it had been input not as his actual height of 6 feet 2 inches but as 6.2 centimetres.  If he were indeed 6.2 centimetres tall he would be far more likely to die by being savaged by his neighbour’s cat than as a result of the pandemic.  On the other hand, he would make a great choice as lead for the next Ant-Man movie, saving the studio a fortune in special effects.

There was nevertheless some good news in this story: his weight had been correctly recorded.  The bad news is that an algorithm had then calculated, on the basis of his weight and his erroneous height, that his Body Mass Index (BMI) was 28,000, more than a little above the BMI threshold of 40 which defines ‘morbid obesity’.  Needless to say, the error has now been rectified and he is back in his rightful place lower down the vaccine queue.
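The arithmetic behind the absurd result is easy to reproduce.  BMI is weight in kilograms divided by the square of height in metres, so shrinking the height by a factor of roughly 30 inflates the BMI by a factor of nearly 1,000.  His actual weight was never reported, so the 85 kg below is a purely illustrative assumption:

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2

# Correct height: 6 feet 2 inches, roughly 1.88 m.
# Mis-entered height: 6.2 cm = 0.062 m.
# 85 kg is an assumed, illustrative weight; the real figure was not published.
weight = 85.0
print(round(bmi(weight, 1.88), 1))   # around 24: a perfectly ordinary BMI
print(round(bmi(weight, 0.062)))     # a five-figure BMI, clearly impossible
```

Whatever his real weight, any plausible figure combined with a height of 0.062 m produces a BMI in the tens of thousands, which is how the algorithm arrived at a number like 28,000.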

This story made many people in the UK smile at a time when the seemingly endless lockdown, combined with our habitual dark, dank, damp winter, was depressing people’s spirits.  But what was an amusing tale also exposed a more serious data problem, a fact acknowledged by the chair of the Liverpool Clinical Commissioning Group (CCG), which is leading the vaccination roll-out in the city.  She was quoted in the press as saying, ‘I can see the funny side of this story but also recognise there is an important issue for us to address’.  She is of course referring to the accuracy of clinical data at a time when reliance on good quality data is critical in the world’s battle against Covid-19, and when it is vital for the authorities to maintain the trust and cooperation of the public.

Although this problem was brought to light and corrected, how common are data quality problems in clinical and patient data generally?  It’s clear from her remarks that the chair of the CCG suspected there may be more nasties lurking in the data the vaccine programme depends on.  And many industry and health surveys bear this suspicion out.  For example, one cross-industry survey in 2017 concluded that only 3% of all data records examined were free of inaccuracies and errors; the other 97% contained at least one, a concerning conclusion. [1]

So what has all this got to do with data governance?  This rapidly expanding discipline places responsibility on accountable individuals to ensure that the data they are answerable for meets both the needs of the organisation which collects and processes it and the needs of the people that organisation serves.  A primary reason why so many organisations today are implementing data governance programmes is that they have recognised that poor data quality is often the norm, not the exception.  They see data governance as key to solving this problem by ensuring a continuous focus on making data fit for its intended purposes.

How would data governance have prevented the shortcoming (in more senses than one) exposed in this Liverpool case?  In a governance framework, accountable individuals would have been responsible for monitoring the accuracy of the data throughout its lifecycle, from creation to deletion.  Part of this role would be to define and enforce the so-called business rules which are used both to evaluate the quality of the data and to drive improvement.  In this case it’s self-evident that a recorded adult height of 6.2 cm is just plain silly.  Almost certainly the error was caused by a data entry mistake made when the record was created or updated.  A simple business rule could have been devised and implemented to prevent and highlight such errors, for example by specifying a minimum and maximum allowable height for an adult, say between 4 and 7 feet.
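Such a rule is simple to express in code.  The sketch below shows one way a range check like this might look at the point of data entry; the function name, field, and exact limits are illustrative assumptions, not drawn from any real clinical system:

```python
# Plausible adult height range, roughly 4 feet to 7 feet, expressed in cm.
# These limits are illustrative; a real system would set them with clinicians.
MIN_HEIGHT_CM = 122
MAX_HEIGHT_CM = 213

def validate_height_cm(height_cm: float) -> list:
    """Return a list of rule violations; an empty list means the value passes."""
    errors = []
    if not MIN_HEIGHT_CM <= height_cm <= MAX_HEIGHT_CM:
        errors.append(
            f"height {height_cm} cm is outside the plausible adult range "
            f"[{MIN_HEIGHT_CM}, {MAX_HEIGHT_CM}] cm"
        )
    return errors

print(validate_height_cm(188))   # empty list: 6 ft 2 in passes the rule
print(validate_height_cm(6.2))   # the unit error is flagged at source
```

Rejecting or querying the value of 6.2 at the moment it was typed would have stopped the faulty record ever reaching the BMI calculation.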

Implementing this rule when the data input process was designed would have prevented the problem at source.  This clearly didn’t happen here, probably because no accountable data owner or data steward was in place to specify the rules when the application requirements were drawn up and implemented.  Catching it later by retrospectively applying the rule is a second-best option, but it would nevertheless have rapidly highlighted the error.  It could then have been investigated and put right before this story ever saw the light of day, reinforcing a truism that in data governance no news is usually good news.
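The retrospective option described above amounts to scanning the existing records against the same rule.  A minimal sketch, with an invented record layout purely for illustration:

```python
# Hypothetical patient records; the layout and IDs are invented for this sketch.
records = [
    {"patient_id": "A001", "height_cm": 175.0},
    {"patient_id": "A002", "height_cm": 6.2},    # the unit-error record
    {"patient_id": "A003", "height_cm": 188.0},
]

# Same illustrative plausible-height rule, roughly 4 feet to 7 feet.
MIN_HEIGHT_CM, MAX_HEIGHT_CM = 122, 213

# Sweep the data set and flag anything outside the plausible range for review.
suspect = [r for r in records
           if not MIN_HEIGHT_CM <= r["height_cm"] <= MAX_HEIGHT_CM]
for r in suspect:
    print(f"Flag for review: {r['patient_id']}, height {r['height_cm']} cm")
```

A periodic sweep like this is no substitute for validation at entry, but it surfaces errors long before an algorithm, or a newspaper, does.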

The days when data quality inadequacies could be ignored by organisations are at an end.  In the world of social media and 24/7 news, problems can be exposed and made public in an instant.  Does any organisation really want to risk becoming the next headline maker for the wrong reasons?  If not, data governance may be the answer.  If you haven’t tried it yet, I strongly recommend looking into it.

[1] ‘Only 3% of Companies’ Data Meets Basic Data Quality Standards’, Tadhg Nagle, Thomas C. Redman & David Sammon, Harvard Business Review, September 11 2017.