By all accounts, possibly the least exciting aspect of modern data analytics is the governance of the practice itself. Phrases like big data, AI, machine learning, data stories and augmented reality are among the more exciting in the realm of data & analytics; however, they remain only buzzwords if not administered correctly & thoughtfully under a solid data governance strategy.
As the old adage goes, “rubbish in, rubbish out”: no matter how well defined & robust your data governance strategy, it can still go drastically off course if the raw data generated and collected at source is inaccurate or inconsistent. Enter the tracking plan. We’ve written several articles about the emergence of the modern data tracking plan, and here you will see how it has evolved into a critical part of data governance.
What is Data Governance?
It helps to start with what data governance is not. Data governance is most commonly confused with data management or data stewardship, two key disciplines concerned with managing the data lifecycle & the activities that ensure data accuracy & discoverability.
Data governance is an overarching data management concept concerning the capability that enables an organization to ensure that high data quality exists throughout the complete lifecycle of the data, and data controls are implemented that support business objectives. (as defined on Wikipedia)
For further reading, Talend.com describes the nuances of data governance and its surrounding concepts in more detail.
Challenges of modern day data governance
The growth in the volume of data being processed each day is astounding. By one widely cited estimate, 2.5 quintillion bytes of data are produced by humans every day. This comes with a few complications for any business trying to ensure ongoing data accuracy.
Data Ownership: With so much effort across the industry to ‘break down silos’ amongst business units and build cross-functional teams across an organisation, we’re often left with a problem of ownership for particular parts of the data lifecycle. Defining data structures & mapping behavioural event data often ends up in the wrong hands, and often without any software or platform to enforce rules or conventions.
Consistent Single Source of Truth: It sounds simple. One place that an entire business, from CEO to data scientist, can rely on for a real-time view of the most meaningful data. This would need to include the current & historical status of each and every data structure & its associated metadata, along with meaningful & precise definitions of each element.
Well Defined Conventions: Naming conventions are one thing, but what about the conventions imposed by the numerous tools & platforms in the modern cloud-based stack? Your single source of truth needs to understand the limitations, or ‘rules’, imposed by the various data sources and destinations, and alert end users to them at the right time, so that potential discrepancies are caught before they can move further down the data pipeline.
Data Lineage: With so much data flowing across such a variety of sources & destinations, it is very difficult to know exactly where any one piece of data originated & where it is going. If this is not defined and mapped, you’re left with the blind leading the blind through the organisation as time goes by.
Lack of Data Documentation: This is linked to the centralised source of truth. Along with listing, tracking and mapping the data, it is even more important to document it, give it context & define exactly why it is valuable.
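To make the Well Defined Conventions challenge above more concrete, here is a minimal sketch in Python of checking an event name against both a naming convention and the limits set by destination tools. Everything in it is an assumption made for illustration: the snake_case convention, the DestinationRule type, the check_event_name function, and the destination names and character limits are hypothetical and do not describe any real tool.

```python
import re
from dataclasses import dataclass

# Hypothetical house convention: event names are snake_case.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


@dataclass(frozen=True)
class DestinationRule:
    """A limit imposed by one destination tool (names and values are made up)."""
    destination: str
    max_event_name_length: int


def check_event_name(name: str, rules: list[DestinationRule]) -> list[str]:
    """Return a human-readable problem for each convention the name breaks."""
    problems = []
    if not SNAKE_CASE.match(name):
        problems.append(f"'{name}' is not snake_case")
    for rule in rules:
        if len(name) > rule.max_event_name_length:
            problems.append(
                f"'{name}' exceeds {rule.max_event_name_length} characters "
                f"allowed by {rule.destination}"
            )
    return problems


# Illustrative destinations only; real tools publish their own limits.
rules = [
    DestinationRule("analytics_tool_a", 40),
    DestinationRule("warehouse_b", 64),
]

print(check_event_name("checkout_completed", rules))
print(check_event_name("Checkout Completed", rules))
```

Running a check like this at the moment an event is defined, rather than after the data has landed, is one way to catch a discrepancy before it moves further down the pipeline.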
So Why a Tracking Plan?
A tracking plan in the traditional sense probably won’t go a long way towards supporting a solid data governance strategy. We’ve written here about the traditional tracking plan and how it’s beginning to transform to meet the requirements of the industry.
We are leading the transformation of the traditional tracking plan.
At Trackplan we are working to build the schema management platform for behavioural data tracking. A tracking plan will become your agnostic single source of truth, integrated with your data tracking tools, updated in real time with versioning & synced with the relevant conventions and limitations set by the tools in your cloud-based stack.
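As a rough illustration of what treating a tracking plan as a schema could look like, the Python sketch below describes one documented, versioned event and validates a tracked payload against it. This is not Trackplan's product or API: the EventSpec type, the validate_event function, the event name and its properties are all hypothetical, chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventSpec:
    """One documented event in a hypothetical tracking plan."""
    name: str
    version: int
    description: str
    properties: dict[str, type]  # required property name -> expected type


def validate_event(spec: EventSpec, payload: dict) -> list[str]:
    """Compare a tracked payload with its spec and list any mismatches."""
    errors = []
    for prop, expected in spec.properties.items():
        if prop not in payload:
            errors.append(f"missing property '{prop}'")
        elif not isinstance(payload[prop], expected):
            errors.append(
                f"'{prop}' should be {expected.__name__}, "
                f"got {type(payload[prop]).__name__}"
            )
    for prop in payload:
        if prop not in spec.properties:
            errors.append(f"undocumented property '{prop}'")
    return errors


# Made-up example entry: the description gives the event its context.
spec = EventSpec(
    name="checkout_completed",
    version=2,
    description="Fired when an order is paid for.",
    properties={"order_id": str, "total": float},
)

print(validate_event(spec, {"order_id": "A-1001", "total": 59.9}))
print(validate_event(spec, {"order_id": 1001, "coupon": "SPRING"}))
```

Keeping the description and version alongside the property types means the same record can serve as documentation and as the rule that incoming data is checked against.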