Architecting Accountability – Making Data Governance & Data Architecture Work Together

January 2020, Nigel Turner, Principal Consultant, EMEA

In my last blog I explored the connections between the disciplines of data architecture and data quality. In particular, I suggested how they can work in tandem to design and deliver a more strategic approach to tackling data quality issues by combining the tools and techniques of both disciplines.

Writing that piece led me to think about how data architecture and data governance can similarly be combined. How can their potential synergies help resolve some of the problems many organisations face when trying to implement a working data architecture and establish business accountability and responsibility for data through governance? In this blog I’ll briefly explore this relationship.

Over many years I have helped many organisations design and implement data governance programmes. Delivering a workable and sustainable data governance framework requires the business and IT to work together to answer two fundamental governance questions:

What data domains, types, sources etc. should be within the scope of formal data governance and how should this be determined?
How should responsibility and accountability for this data be allocated?

At the heart of data governance is the requirement to establish business and IT accountability for data. This implies the need for data owners and (usually) data stewards. Owners should generally be fairly senior business managers, the stewards either business or IT people who manage the data day to day on behalf of the data owners and the wider community. Owners and stewards become the leaders of any data governance programme, assembling and coordinating cross-organisational teams to tackle data problems and drive continuous data improvement.

However, these two questions can often be difficult to answer in large organisations, particularly those with global reach. These companies usually have many data sources held on a variety of platforms and applications, including data warehouses, CRM systems, operational databases, data lakes, ERP systems and so on. Moreover, related data is often held in multiple forms across these platforms. For example, in one major UK-based company I was involved with, an inventory of customer data showed that it was held in over 400 core systems and applications, some in different parts of the world. And these were just the sources that were known; there was a strong suspicion that many more shadow IT applications (Excel spreadsheets and the like) also held customer data. To make matters even messier, literally hundreds of business and IT stakeholders had an interest in this data, so all might justifiably contend that they should have a greater or lesser say in how the data should be governed and improved.

So how can you find answers to the two basic questions above when dealing with data estates of this scale and complexity? At first sight, it’s a daunting prospect, a bit akin to solving a Rubik’s cube when blindfolded, with one hand tied behind your back.

So where can you start determining which data types and sources should be within the scope of a formal data governance programme, and which to exclude, at least at the outset? This is the first challenge where data architecture can be used to good effect. A good way to start identifying key data types is to use two of the core artefacts of data architecture, namely conceptual and logical data models.

When I talked about data quality in my last blog, data models were seen as invaluable tools to help focus improvement efforts. They are equally great tools to scope core data domains and data types for governance. In addition, physical data models can help to identify the key data sources where the data is held and can also point to those sources where the data is mastered. As a golden rule, always put a strong initial governance focus on the master sources (which of course might include Master Data Management (MDM) platforms and reference data). This invariably helps to generate better data in both directions: downstream, as the data flows to recipient data sources, and upstream, by updating and setting rules for data entry systems.
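As a rough illustration of the golden rule above, the sketch below takes a small data source inventory and a list of data flows, picks out the master sources for a domain, and lists the recipient systems that sit downstream of each one. All system names, the inventory layout and the function names are assumptions made up for this example; a real inventory would come from your physical models or data catalogue.

```python
from collections import deque

# Hypothetical source inventory: system name -> (data domain, is it a master source?)
SOURCES = {
    "mdm_hub": ("customer", True),
    "crm": ("customer", False),
    "billing": ("customer", False),
    "warehouse": ("customer", False),
    "web_signup": ("customer", False),
}

# Hypothetical data flows, each written as (from_system, to_system)
FLOWS = [
    ("web_signup", "mdm_hub"),
    ("mdm_hub", "crm"),
    ("mdm_hub", "billing"),
    ("crm", "warehouse"),
    ("billing", "warehouse"),
]


def downstream(start, flows):
    """Breadth-first walk returning every system reachable from start."""
    children = {}
    for src, dst in flows:
        children.setdefault(src, []).append(dst)
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in children.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def initial_scope(sources, flows, domain):
    """Map each master source for a domain to its sorted downstream recipients."""
    return {
        name: sorted(downstream(name, flows))
        for name, (dom, is_master) in sources.items()
        if dom == domain and is_master
    }


scope = initial_scope(SOURCES, FLOWS, "customer")
print(scope)
```

In this made-up estate, governing the single master source first would influence three recipient systems, which is the kind of argument that can help justify where a governance programme starts.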

Data models can also help to answer the second question, namely how the most appropriate data owners and stewards can be identified. By exposing the high level data domains (for example product, location and supplier data), conceptual models can help to designate the senior business executives who should own these domains on behalf of the organisation. By documenting key data entities and attributes, logical models can help to refine this strategic view and identify specific entity and / or attribute owners. Logical and physical models can also suggest data stewards by linking both business and IT data subject matter experts to entities, attributes and data sources.

Finally, other data architecture products, including dataflow diagrams, process / data maps, data catalogues, data source inventories and so on, can help to identify the creators, modifiers and consumers of the data. This is valuable in pinpointing both the right owners and stewards and the key stakeholders who have a direct interest in the data. For example, the attributes of a Product entity could be governed so that Product Number is owned by one or more Operations managers (maybe in different parts of the world), Product Description is owned by Global Marketing, and Product Price by regional Finance teams. This mapping can then be used to ensure that, in any work to improve the quality of product information, the right people are in the room and are active cross-business collaborators in the work.
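The Product example above can be sketched as a small ownership register. This is only an illustration: the class, the function name and the regional team names (such as "Operations EMEA") are hypothetical, and in practice this information would more likely live in a data catalogue or governance tool than in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeOwnership:
    """One row of a hypothetical attribute-level ownership register."""
    entity: str
    attribute: str
    owners: tuple
    stewards: tuple = ()


# Illustrative register based on the Product example; team names are assumptions
REGISTER = [
    AttributeOwnership("Product", "Product Number", ("Operations EMEA", "Operations APAC")),
    AttributeOwnership("Product", "Product Description", ("Global Marketing",)),
    AttributeOwnership("Product", "Product Price", ("Finance EMEA", "Finance Americas")),
]


def people_in_the_room(register, entity, attributes=None):
    """Return the sorted owners and stewards for an entity.

    If attributes is given, only those attributes are considered;
    otherwise every attribute of the entity is included.
    """
    invitees = set()
    for row in register:
        if row.entity != entity:
            continue
        if attributes is not None and row.attribute not in attributes:
            continue
        invitees.update(row.owners)
        invitees.update(row.stewards)
    return sorted(invitees)


# Who should attend a workshop on product pricing quality?
print(people_in_the_room(REGISTER, "Product", {"Product Price"}))
```

Calling the function without the attribute filter returns everyone with a stake in the Product entity, which is a quick way to check that no owning team has been left off an improvement initiative.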

Overall, therefore, there are several ways in which data architecture can help and support data governance. If an organisation already has a well-defined and maintained data architecture, with linked conceptual, logical and physical models, this can significantly improve the chances of data governance success. If these are missing or disjointed, ensure that any data governance initiative includes a workstream to generate at least the first cut of the models as an early deliverable.

While data architecture certainly supports data governance, it’s also important to realise it’s not a one-way relationship. In lots of organisations, data architectures have ended up as shelfware, never implemented in the real world. There are many reasons why this can happen, but a key recurring one is that many data architecture endeavours lack business support and involvement. Business people all too often view data architecture as an academic, abstract, technical activity of little or no relevance to them. Getting business people actively involved in developing and implementing the architecture will massively enhance the chances of a successful implementation. Data governance is the best way of making this happen, as business data owners and stewards will have a strong stake in the game if they are accountable for improving the data. In this case, as in the wider context, data governance is the bridge between the world of business and the world of data.

Like all data disciplines, data governance and data architecture may have a different emphasis and focus, but they are mutually reinforcing. Combining them will generate synergy and rigour, and together they can drive a cycle of continuous improvement. An architected approach to data governance works, so if you haven’t already reaped the benefits, why not try it in your organisation?