Four Areas of Focus in the Data Governance Space
Today’s high velocity, volume and variety of data have made data governance a business requirement. Given the massive amounts of data generated internally and easily available externally, a disciplined approach to managing all this information is the need of the hour.
A data governance strategy helps maintain data privacy and meet regulatory requirements. Such a strategy consists of policies, standards, roles, and processes that ensure the proper use of data, along with its availability, integrity, usability and security.
Large-scale enterprises, with their several departments, more often than not see the same data being collected in different ways and formats. Sometimes, even the values captured for a particular data item differ from department to department.
For example, one business function might record an item as ‘missing’ by not giving it a value, while another might use a specific value to denote that an item is ‘missing’. This inconsistency extends across the organization and impacts processes and analytics down the line, for instance when the data is used to build a predictive model.
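To make the ‘missing’ example concrete, here is a minimal Python sketch of how a shared convention could reconcile the two encodings. Everything in it is an assumption made for illustration: the sentinel values, the record layouts and the function names are hypothetical, and a real list of sentinels would come from the organization's own governance policy.

```python
from typing import Any, Dict, List, Optional

# Hypothetical encodings of 'missing' that different departments might use.
MISSING_TEXT = {"", "n/a", "missing"}   # compared case-insensitively
MISSING_NUMBERS = {-999}                # an assumed numeric sentinel


def normalize_missing(value: Any) -> Optional[Any]:
    """Map every agreed 'missing' encoding to None; pass other values through."""
    if value is None:
        return None
    if isinstance(value, str):
        return None if value.strip().lower() in MISSING_TEXT else value
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if is_number and value in MISSING_NUMBERS:
        return None
    return value


def normalize_records(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Apply the shared convention to every field of every record."""
    return [{key: normalize_missing(val) for key, val in rec.items()} for rec in records]


# One department leaves the value empty, another uses a sentinel.
sales_records = [{"item": "A", "price": None}, {"item": "B", "price": 12.5}]
warehouse_records = [{"item": "A", "price": -999}, {"item": "B", "price": 12.5}]

clean_sales = normalize_records(sales_records)
clean_warehouse = normalize_records(warehouse_records)
```

Once both sources pass through the same function, a downstream model sees one representation of ‘missing’ rather than mistaking a sentinel such as -999 for a real price.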
Data governance ensures consistency across departments and processes and does away with the organization’s silos. Thoughtworks Looking Glass 2022 discusses how leaders should consider where data ownership sits within their organization, as data quality problems tend to emerge from organizational structures and architectures that don’t incentivize teams to produce and share the data resources they have.
Here are some key areas that I expect will get more attention in the coming year:
Source of truth: data governance aims to understand and streamline the process of data collection and integration, and to identify who owns the data and what the source of truth for that data is. This approach will ensure both the consistency of the data and the accuracy of the analysis and modeling based on it. However, despite being an area of focus, this is easier said than done.
Data governance in AI and ML: today, the meaning of data governance has come to include all areas of data and AI, and I find this area extremely interesting. It requires a lot of thought, and the solution is not as straightforward as it is for data governance in relation to data management, ownership, access, compliance and privacy.
Data has biases, and biased data leads to biased models. I foresee the design and adoption of concrete methods and frameworks that deal with such biases. Explainable AI is already seeing traction in the market. I expect more focus on the ‘explainability’ of AI and ML, especially where decisions may have a big impact on people, as is the case in the banking and insurance sectors.
Aside from regulations focused on data collection, storage and access, I expect more regulations centered on AI this year. An example is the EU AI Act 2021, a regulatory and legal framework that applies to all types of AI. Its objective is to ensure that AI systems are safe, lawful and trustworthy, and that they respect fundamental rights and values. I expect to see other regions working along the same lines.
Applicability to small and medium-sized businesses: large enterprises are already seeing value from investing in people with capabilities that align with data governance objectives. Small and medium-sized businesses will also begin making such investments and seeing value from them, both in ensuring legal compliance and in maintaining trust in their products and services.
Increased awareness across the organization: no matter what business an organization is in, it is likely to handle some form of data. A well-rounded awareness of data governance across roles and departments will nurture data quality, integrity, and compliance. This year, I expect organizations to make small, short-term investments targeted at raising awareness of data governance.
Data governance is everyone’s responsibility, from data professionals to business leaders. I foresee more ‘data governance committees’ emerging that seek active participation from multiple stakeholders.
(The author, Shraddha Surana, is Lead Data Scientist and Global Data Community Co-Lead at Thoughtworks. The views expressed in this article are her own.)