Data Quality – An Architecture for Success

October 2018, Nigel Turner, Principal Consultant, EMEA

I started working in data management in the dim and distant days of the mid 1990s when I was part of a small team in the Business Strategy function of a global telecommunications company.  The team had been tasked with exposing the root causes of a problem that the CEO at that time had homed in on.  Although many millions had been spent on acquiring and deploying large scale IT solutions to support the company’s operations, this investment was considered to be at least in part a failure as there was growing evidence that operational inefficiencies and process breakdowns were commonplace, despite the fact that the business cases for these investments had promised new IT would improve them.  Worse still, all this investment had, in some cases, made the problems worse and not better.

After many interviews with key people across the company, two main themes consistently emerged.  The first was poor management of IT requirements, which led to purchases of software that did not align with operational and process needs.  The second was that although many new databases and data warehouses had been deployed to store data and make it more accessible to the business processes that depended on it, the data within them was overall not fit for purpose.   Uncontrolled data duplication, missing data, inaccurate data etc. were recurring issues raised time and again.  We soon gathered a compelling body of evidence to show that our processes were failing all too often because the data they relied on was not of the quality expected.  Poor data quality, and how to fix it, became the predominant challenge.  I was the (un)lucky team member given the task of doing something about it.

Over the course of the next decade, data quality became my consuming passion.  A business-wide data quality programme was created, many improvement projects were completed, and many benefits accrued, including a decrease in revenue losses, cost reduction, better customer management, reduced legal and regulatory risks and so on.  But though it was an acknowledged success both within and outside the company, it all took a long time, and consumed a lot of resource and effort.

But what has this piece of 20th century history got to do with today’s data management challenges?  In my view, it is as relevant as ever.  The data quality problems outlined above persist today in the great majority of organisations we in Global Data Strategy work with and talk to.  On the positive side, many things have improved since those early days of data quality.  The drive to automate processes as businesses seek to become increasingly digital has elevated data quality up the corporate agenda of many organisations, so that anyone promoting the data quality cause today is no longer viewed as a strange, swivel-eyed eccentric, as many of us were in the pioneering days.  My first lesson in data quality was that you cannot automate or digitise a process unless the data supporting it is accurate and complete.  This is as true today as ever, and a growing number of companies recognise it.  Moreover, the software tools available to profile, analyse and enhance data quality have improved beyond all recognition.  Today there are few large organisations that do not have at least one dedicated data quality tool somewhere in their armoury.

Despite this, the problems continue.  There are many reasons for this.  The main one of course is that today’s data challenges make those of the 1990s look like child’s play.  The volume, complexity and speed of data processing have exploded.  The range and scope of data platforms have expanded, now embracing both long established data sources (data warehouses, operational data stores, CRM etc.) and newer ones, including master data platforms, big data lakes, product lifecycle management tools, analytics platforms and so on.  So although approaches, techniques and tools have made a giant leap forward, the chasm between data quality needs and the capacity to deliver them is as large as ever.  And a giant leap forward is only worth taking if you jump far enough to bridge the gap; if not, you fall and you fail.

So what’s needed to ensure a successful data quality strategy and approach, given today’s formidable challenges?  Reflecting on my early experiences, there were two things that could and would have helped us to deliver better data quality more quickly and efficiently.  The first was a more rigorous method for prioritising data quality problems, so we could focus our resources more effectively.  The second, and related, missing element was a means to identify which data mattered most, either because it was most crucial in business operations or because it was used across multiple processes and platforms.  As we did not have a clear view of this in the early days, we tended to address issues bottom up, i.e. we identified a specific problem, then put a team together to analyse the root causes, derive solutions and deliver them.  Sometimes, more by luck than judgement, we hit upon an issue that would benefit other areas of the business, and so duly gave it a higher priority.  But it could be a hit and miss process.

Today every organisation has data quality problems, and the scope and scale of data are such that not all of them can realistically be tackled.  So the same issues arise.  Where should you start?  What data is highest priority and why?  In addition, what would ‘good’ look like, how do we define that, and how do we know when we achieve it?

Another data management discipline has a great deal to contribute if these questions are going to be answered, and it is an area of data management all too often neglected in many companies.  I am referring to business architecture generally and data architecture in particular.  In my early telecommunications days we had no data models to refer to (other than detailed technical physical data models associated with specific platforms and systems), no methodical way of identifying the interrelationship between data and business processes, and no formal, agreed business definitions of key data entities and their attributes. These are all things that a sound, dynamic data architecture provides.

To tackle data quality more strategically (top down rather than bottom up), linking it closely with data architecture has huge value.  To list some of the main benefits:

·        Conceptual and logical data models highlight the most important data domains and entities and so provide an ideal starting point for focusing data quality endeavours.

·        Other architectural models (e.g. dataflow diagrams, process ‘swim lanes’ and so on) highlight the interdependencies between business processes and data.  They help to identify which processes would most benefit from specific data quality improvements, and so inform prioritisation and focus.

·        Attributes identified in the data models help to define the data standards specific data fields need to adhere to.  This can help to specify data quality improvement targets and thresholds required (in terms of both format and content), help to quantify the gap between desired adherence and actual, and form the foundation of data quality business rules needed to clean up the data and maintain its quality.

·        Effective metadata management is an essential component of data quality improvement.  Architectural artefacts are the starting point to provide this.
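
As a rough illustration of the third point above, the sketch below shows how attribute-level standards taken from a data model might be turned into simple data quality business rules, each with a target threshold, so that the gap between desired and actual adherence can be quantified.  It is a minimal sketch under stated assumptions, not a recommended implementation: the attribute names (customer_name, postcode), the simplified UK-style postcode pattern, the target values and the sample records are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AttributeRule:
    """A data quality business rule for one attribute (hypothetical structure)."""
    attribute: str                             # attribute name from the data model
    check: Callable[[Optional[object]], bool]  # returns True if a value conforms
    target: float                              # desired share of conforming records (0..1)


def is_present(value: Optional[object]) -> bool:
    """Content rule: the value must not be missing or blank."""
    return value is not None and str(value).strip() != ""


# Simplified, illustrative postcode format; real standards are more involved.
POSTCODE_PATTERN = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$")


def is_valid_postcode(value: Optional[object]) -> bool:
    """Format rule: the value must be present and match the assumed pattern."""
    return is_present(value) and bool(
        POSTCODE_PATTERN.match(str(value).strip().upper())
    )


# Assumed rules and targets, standing in for standards agreed with the business.
rules = [
    AttributeRule("customer_name", is_present, target=1.0),
    AttributeRule("postcode", is_valid_postcode, target=0.95),
]


def adherence_report(records, rules):
    """Measure actual adherence per attribute and the gap to its target."""
    report = []
    for rule in rules:
        passed = sum(1 for rec in records if rule.check(rec.get(rule.attribute)))
        actual = passed / len(records) if records else 0.0
        report.append({
            "attribute": rule.attribute,
            "target": rule.target,
            "actual": actual,
            "gap": max(0.0, rule.target - actual),  # 0.0 when the target is met
        })
    return report


# Made-up sample records, for illustration only.
records = [
    {"customer_name": "A. Jones", "postcode": "SW1A 1AA"},
    {"customer_name": "", "postcode": "12345"},
    {"customer_name": "B. Patel", "postcode": "m1 1ae"},
    {"customer_name": "C. Wong", "postcode": None},
]

report = adherence_report(records, rules)
for row in report:
    print(f"{row['attribute']}: actual {row['actual']:.0%}, "
          f"target {row['target']:.0%}, gap {row['gap']:.0%}")
```

Sorting such a report by gap, or weighting each gap by how many business processes depend on the attribute, is one possible way to feed the prioritisation questions raised earlier.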

To conclude, tackling data quality problems in today’s enterprises requires a much more strategic and architecture-driven approach than was the norm at the dawn of data quality initiatives.  Using architecture to frame and focus data quality efforts is essential.  I wish I had known that back in the 1990s, but as in all things in life, it’s better late than never.