Moving beyond the broad definition of a DMP (data management platform), Snowplow positions itself specifically as a behavioural data management platform. Snowplow Analytics focuses on empowering businesses to collect, process, and model high-quality behavioural data.
This is where Trackplan comes in. We integrate directly with Snowplow’s schema registry (also known as Iglu) to provide a collaborative schema management solution. Trackplan helps your data team design an intuitive tracking plan, manage your data structures, and promote versions and environments, while giving your business a single source of truth.
As the tracking plan evolves, changes to the data structures are published to your Snowplow schema registry, where schemas are used to enforce validation and rules on incoming behavioural data to improve your data quality.
To best understand how we do this, let’s first define and describe a few key concepts.
What is a schema?
A schema is like a blueprint of the data that you collect: a definition of its structure. A schema also defines the structure of the entities attached to events, and schemas are important for ensuring data is validated and processed correctly. Snowplow describes this well:
“Schemas define the structure of the data that you collect. Each schema defines what fields are recorded with each event that is captured, and provides validation criteria for each field. Schemas are also used to describe the structure of entities that are attached to events.”
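To make that concrete, here is a minimal sketch of what a self-describing JSON schema looks like in the shape Snowplow's Iglu registry expects. The vendor (`com.example`), event name, and fields are purely illustrative, not taken from a real tracking plan:

```python
import json

# A hypothetical self-describing JSON schema for a "checkout_started" event.
# The "self" block carries the Iglu metadata: vendor, name, format, version.
checkout_started_schema = {
    "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
    "self": {
        "vendor": "com.example",      # reverse-domain vendor identifier
        "name": "checkout_started",   # the event this schema describes
        "format": "jsonschema",
        "version": "1-0-0",           # SchemaVer: MODEL-REVISION-ADDITION
    },
    "type": "object",
    "properties": {
        "cart_value": {"type": "number", "minimum": 0},
        "currency": {"type": "string", "maxLength": 3},
        "item_count": {"type": "integer", "minimum": 1},
    },
    "required": ["cart_value", "currency"],
    "additionalProperties": False,
}

print(json.dumps(checkout_started_schema["self"], indent=2))
```

Note how the schema carries both the "what" (the fields under `properties`) and the validation criteria for each field (`minimum`, `maxLength`, `required`).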
A Schema Registry
A schema registry is effectively a storage layer where schemas can be maintained and evolved over time. This includes managing versioning and environments, e.g. development and production.
“Schema registry provides a serving layer for your metadata. It provides a RESTful interface for storing and retrieving schemas. It stores a versioned history of all schemas and allows evolution of schemas.”
As changes are published to the schema registry, new definitions and validations can be quickly deployed to the relevant environment to ensure these rules are enforced by your data management platform.
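The lookup step can be sketched in a few lines. A real Iglu registry is a RESTful service; in this toy version a dict keyed by Iglu URI stands in for it, and `resolve()` (a hypothetical helper, not part of any Snowplow API) mimics fetching a schema by vendor, name, and version:

```python
# In-memory stand-in for a schema registry, keyed by Iglu URI.
registry = {
    "iglu:com.example/checkout_started/jsonschema/1-0-0": {
        "type": "object",
        "properties": {"cart_value": {"type": "number"}},
        "required": ["cart_value"],
    },
    "iglu:com.example/checkout_started/jsonschema/1-0-1": {
        "type": "object",
        "properties": {
            "cart_value": {"type": "number"},
            "coupon_code": {"type": "string"},  # optional field added in 1-0-1
        },
        "required": ["cart_value"],
    },
}

def resolve(vendor: str, name: str, version: str) -> dict:
    """Return the schema for an Iglu URI, raising if it is unknown."""
    uri = f"iglu:{vendor}/{name}/jsonschema/{version}"
    if uri not in registry:
        raise LookupError(f"Schema not found: {uri}")
    return registry[uri]

schema = resolve("com.example", "checkout_started", "1-0-1")
print(sorted(schema["properties"]))  # → ['cart_value', 'coupon_code']
```

Because every version lives side by side in the registry, old events validated against 1-0-0 and new events validated against 1-0-1 can coexist in the pipeline.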
What is a data structure?
You can think of a data structure as an event or entity together with its associated metadata, such as properties, data types, and validations. The schema, then, can be thought of as the set of rules that inform the validation engine about the behavioural data being collected, and how to handle it, e.g. when a rogue or malformed event is tracked.
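The validation engine's job can be illustrated with a deliberately simplified pass (real pipelines use a full JSON Schema validator; field names here are illustrative). It checks required fields and basic types, and returns errors instead of silently dropping the event, the way a pipeline routes failed events to a "bad rows" stream:

```python
# Map JSON Schema primitive types to Python types for a toy type check.
TYPE_MAP = {"string": str, "number": (int, float), "integer": int, "boolean": bool}

def validate(event: dict, schema: dict) -> list:
    """Return a list of validation errors; an empty list means the event is good."""
    errors = []
    for field in schema.get("required", []):
        if field not in event:
            errors.append(f"missing required field: {field}")
    for field, rules in schema.get("properties", {}).items():
        if field in event and not isinstance(event[field], TYPE_MAP[rules["type"]]):
            errors.append(f"wrong type for {field}: expected {rules['type']}")
    return errors

schema = {
    "type": "object",
    "properties": {"cart_value": {"type": "number"}, "currency": {"type": "string"}},
    "required": ["cart_value", "currency"],
}

good = {"cart_value": 42.5, "currency": "EUR"}
rogue = {"cart_value": "oops"}  # wrong type, and "currency" is missing

print(validate(good, schema))   # → []
print(validate(rogue, schema))  # two errors: missing field, wrong type
```

A rogue event like the one above is flagged with explicit errors rather than polluting downstream tables, which is exactly the data-quality guarantee the schema provides.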
So what does it all mean?
To summarise the above and give some context: each data structure has its own schema that describes the “what” and the “how” of a particular event or entity. All these schemas are then stored in a schema registry, and each schema also contains the validation criteria for its data structure.
Your tracking plan becomes the foundation for changes to your behavioural data tracking: your team first plans, creates, or updates data structures, publishes them to your development environment for testing, and lastly deploys them to production.
This lifecycle ensures your data tracking strategy begins with the end goal in mind: data quality.
If you are interested in testing or finding out more about our integration with Snowplow, email us here, or start a conversation on our live chat.