Data Governance

Data is an increasingly important strategic asset for an organisation and, like all important assets, it should be managed within a formal framework.  Doing so reduces risk, increases opportunities to add value, and gives a clear picture of an otherwise hidden investment.  This management of data as an asset is called Data Governance, and it is made up of the following components:

Data Asset Inventory

This is a simple register of every dataset that is an asset to the organisation.  A dataset belongs in the register if it matches any of the following general indicators:

  • The dataset is used for decision making.
  • The dataset is held in an application.
  • The dataset is regularly updated.
  • Loss of the dataset would be problematic.

The details of each dataset held in the register (the metadata) are similar to those held for any other asset and include:

  • System of record (where the data is held)
  • Owner
  • Maintainer(s)
  • Keyword(s)
  • Theme (see Taxonomy below)
  • Format
  • Size
  • Update frequency
  • Language
  • Classification (see Classification below)
  • Details of any Personally Identifiable Information
  • Details of any publication
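
The indicators and metadata fields listed above can be sketched as a small data structure. This is a minimal illustration rather than a prescribed schema: the class names, field names, the extra `name` field and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Indicators:
    """The four general indicators that a dataset is an asset."""
    used_for_decision_making: bool = False
    held_in_application: bool = False
    regularly_updated: bool = False
    loss_would_be_problematic: bool = False

    def is_data_asset(self) -> bool:
        # Matching ANY one indicator is enough to qualify for the register.
        return any((
            self.used_for_decision_making,
            self.held_in_application,
            self.regularly_updated,
            self.loss_would_be_problematic,
        ))


@dataclass
class InventoryEntry:
    """One register entry; fields mirror the metadata list."""
    name: str  # assumption: a label for the dataset, not in the list above
    system_of_record: str
    owner: str
    maintainers: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)
    theme: str = ""
    data_format: str = ""
    size: str = ""
    update_frequency: str = ""
    language: str = ""
    classification: str = ""
    pii_details: str = ""
    publication_details: str = ""


# Made-up example values, for illustration only.
indicators = Indicators(regularly_updated=True)
entry = InventoryEntry(
    name="Customer survey results",
    system_of_record="CRM",
    owner="Head of Marketing",
    maintainers=["Marketing analyst"],
    update_frequency="quarterly",
)

if indicators.is_data_asset():
    print(f"{entry.name}: register with owner {entry.owner}")
```

In practice the register is often kept in a spreadsheet or catalogue tool; the point of the sketch is only that each entry carries the same fixed set of fields.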

Creating an initial data asset inventory is often the hardest part of data governance.  The whole concept is new to most people, so the process requires a set of interviews with data owners that are part investigation and part training.

Responsibilities and Training

In the same way that managers in an organisation have specific responsibilities for the physical assets in their care, such as laptops or even desks, managers will need a specific set of responsibilities around data assets. These include:

  • Identifying data assets and the owner of each asset
  • Ensuring the data is updated and meets any defined quality standards
  • Updating and maintaining the data asset inventory
  • Correct classification and description of the data
  • Publication (where a decision is made to publish)

As this is a new area for most managers, the introduction of these responsibilities is normally accompanied by a training programme to introduce the concepts and explain the details.

Standards

Unlike the financial industry, which has long-established industry bodies whose standards are used throughout the world, the data industry is still in its infancy, so each organisation or group of organisations must develop and maintain its own set of standards.  These standards cover the following:

  • Metadata.  This is both the list of details that are recorded for each dataset and the format in which metadata is published and shared.  There are a number of published government and industry standards in this area, but some are quite complex to use.  As a result there is no single standard, and organisations create their own based around a published standard.
  • Classification.  This is the security level of the data and the process for determining it.  A classification is often expressed as a set of levels describing what must be done before the data can be shared, such as:
    • Shareable as is
    • Shareable after automated redaction
    • Shareable after manual redaction
    • Shareable only in aggregated form
    • Shareable only with access control
    • Not shareable
  • Taxonomy. A limited set of subjects that are relevant to the organisation and which can be used to categorise the datasets.  This is normally part of the information management strategy of an organisation.
  • Conceptual Data Model (optional).  It’s likely that your IT team has some parts of this from IT projects that restructured your data.  A full model for your entire organisation is exceptionally valuable in resolving long-standing data problems, planning changes, and understanding the potential of your data.
  • Formats (optional).  Some organisations choose to be quite rigid about the formats that data can be held in, in order to ensure interoperability, control the cost of managing data, and avoid being locked into certain tools or locked out of data.
  • Licensing (optional).  When data is published it should be licensed (or explicitly unlicensed) and that decision is best made on an organisation-wide basis.
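
The example classification levels listed under Standards can be sketched as an ordered enumeration. This is a minimal illustration under two assumptions that an organisation would need to confirm for itself: that the levels run from least to most restrictive in the order given, and that a combination of datasets takes the most restrictive level among them. All names and the wording of the steps are hypothetical.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Example sharing levels, assumed to run from least to most restrictive."""
    SHAREABLE_AS_IS = 0
    AUTOMATED_REDACTION = 1
    MANUAL_REDACTION = 2
    AGGREGATED_ONLY = 3
    ACCESS_CONTROLLED = 4
    NOT_SHAREABLE = 5


# Hypothetical wording for what must happen before data at each level is shared.
REQUIRED_STEP = {
    Classification.SHAREABLE_AS_IS: "share as is",
    Classification.AUTOMATED_REDACTION: "run automated redaction, then share",
    Classification.MANUAL_REDACTION: "redact manually, then share",
    Classification.AGGREGATED_ONLY: "aggregate, then share",
    Classification.ACCESS_CONTROLLED: "share only behind access control",
    Classification.NOT_SHAREABLE: "do not share",
}


def required_step(level: Classification) -> str:
    """Look up the step required before sharing data at the given level."""
    return REQUIRED_STEP[level]


def most_restrictive(levels: list[Classification]) -> Classification:
    """Assumed rule: combined data inherits the strictest level present."""
    return max(levels)


combined = most_restrictive(
    [Classification.SHAREABLE_AS_IS, Classification.MANUAL_REDACTION]
)
print(combined.name, "->", required_step(combined))
```

Using an ordered type makes the levels comparable, which is convenient when a report or service draws on several datasets at once.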

Models for governance

There are multiple models for Data Governance, and an organisation should choose the one that suits it best.  These models include:

  • A cross-organisation team headed by a senior executive.  The main advantage of this model is that it allows multiple departments and key people to be involved in the development of this new function and the identification of training needs.
  • Within the IT department.  This is a natural choice where the IT department already has a strong focus on formal information management and a strong service ethos.
  • Within the product management team.  If the primary reason for embarking on data governance is the creation of data-driven services, then the product team is sometimes the best place to start up data governance, even if it is limited to the datasets they initially manage under this framework.

Finally, whatever model is chosen, ongoing engagement both within the organisation and with the customer/stakeholder community is vital to ensure that data governance meets their needs.  This engagement should be formalised as part of the responsibilities of the chosen model.

How we can help

We can help in multiple ways:

  • Develop a detailed long-term data strategy covering governance, data services, data science and open data.
  • Introduce a full or partial data governance framework.
  • Generate a comprehensive data asset inventory, data classification standard, metadata strategy and data governance roles.
  • Develop staff support resources and deliver the cultural change needed to support formalised data governance.
  • Specify measurements and metrics to ensure ongoing compliance with the data governance framework and minimise issues from non-compliance.