Introducing Block Data: Diagnose, quarantine, fix, and backfill bad data
Cross-functional teams, including product and marketing teams, need real-time access to high-quality customer data to improve customer experiences and accelerate growth. According to Harvard Business Review, it costs ten times as much to complete a unit of work when the data is flawed in any way. Yet, most teams have neither the means nor the resources to address their data quality problems proactively in real time.
Earlier this year, we introduced Data Plans, a suite of tools that gives developers, product managers, and growth marketers a shared understanding of what data is available, in what format, and how it is consumed. Teams that implement Data Plans can identify and flag issues such as incorrectly formatted, incomplete, missing, or messy data. Diagnosing and fixing those issues quickly, before bad data reaches core systems, can still be a challenge.
Today, we are excited to announce Block Data, a new early access feature that helps teams automatically identify and drop unplanned data before it’s forwarded downstream, review and quarantine suspected bad data for investigation, and replay quarantined data once it’s been inspected and modified. With Block Data and Data Plans, teams can feel secure that their data quality is protected consistently and continuously across their entire data pipeline.
Automatically filter bad data across your data pipelines
Block Data lets you whitelist data points in flight, in real time, so that unplanned data is dropped before it is forwarded downstream. You can optionally quarantine unplanned or unwanted data and backfill it once you have inspected and fixed it.
With mParticle, teams can create Data Plans programmatically using an API, or through our UI, to validate and enforce data collection based on a shared agreement across data stakeholders. Developers can easily configure mParticle’s SDKs for Web, iOS, and Android to validate events as they are collected against a specific plan. Data Plans support iterative development of plans for data currently in production, while still allowing you to activate and deactivate new versions easily.
Block Data requires an existing Data Plan. Block Settings and Quarantine Connections can then be set up for that specific Data Plan.
Once block settings are activated, blocked events and attributes never reach downstream integrations, which prevents bad data from polluting your production systems. Today, this feature supports only server-side integrations. We will be adding support for client-side integrations (Kits) in the near future.
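To make the idea concrete, here is a minimal sketch of plan-based blocking in Python. It is not mParticle's implementation: the PlannedEvent class, the apply_block_settings function, and the event dictionary shape are all hypothetical, chosen only to illustrate how unplanned events and attributes might be separated from planned ones.

```python
from dataclasses import dataclass, field


@dataclass
class PlannedEvent:
    """One entry of a hypothetical data plan: an event name and its allowed attributes."""
    name: str
    attributes: set = field(default_factory=set)


def apply_block_settings(events, plan, block_events=True, block_attributes=True):
    """Split incoming events into (forwarded, quarantined) lists.

    Assumption: each event is a dict with a "name" and an optional
    "attributes" dict. This shape is illustrative, not mParticle's schema.
    """
    planned = {entry.name: entry for entry in plan}
    forwarded, quarantined = [], []
    for event in events:
        spec = planned.get(event["name"])
        if spec is None:
            # The event is not in the plan at all.
            (quarantined if block_events else forwarded).append(event)
            continue
        attrs = event.get("attributes", {})
        if block_attributes:
            kept = {k: v for k, v in attrs.items() if k in spec.attributes}
            dropped = {k: v for k, v in attrs.items() if k not in spec.attributes}
        else:
            kept, dropped = dict(attrs), {}
        # Planned events are forwarded with only their planned attributes.
        forwarded.append({**event, "attributes": kept})
        if dropped:
            # Unplanned attributes are set aside for later inspection.
            quarantined.append({"name": event["name"], "attributes": dropped})
    return forwarded, quarantined
```

In this sketch, a planned event that carries one unplanned attribute is still forwarded, minus that attribute, while an event missing from the plan is held back entirely.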
Naturally, the next step for any blocked data is to inspect it. We provide an interactive summary report that shows what data was blocked and when.
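For teams that also want to run their own tallies, the kind of aggregation behind such a report can be sketched in a few lines of Python. The record shape here (a "name" and a Unix "blocked_at" timestamp) and the summarize_blocked function are assumptions made for illustration, not the format mParticle uses.

```python
from collections import Counter
from datetime import datetime, timezone


def summarize_blocked(records):
    """Count blocked records per (UTC day, event name).

    Assumption: each record is a dict with "name" and "blocked_at"
    (seconds since the Unix epoch). Both field names are hypothetical.
    """
    counts = Counter()
    for record in records:
        day = datetime.fromtimestamp(record["blocked_at"], tz=timezone.utc).date().isoformat()
        counts[(day, record["name"])] += 1
    return counts
```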
Fix bad data in-flight
More often than not, product managers and marketers need to involve developers to diagnose and fix bad data. To expedite this common workflow, a Quarantine Connection can be set up to export bad data in JSON format to Amazon S3, or to a destination of your choice via a Webhook.
Backfilling is the process of importing previously blocked data from a Quarantine Output, optionally transforming it, and then replaying it back into mParticle. We provide helper scripts for developers to backfill blocked data using mParticle's Events API, so teams never miss the data they need to make critical business decisions.
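The import, transform, and replay steps can be sketched as follows. This is not one of the helper scripts mentioned above. It assumes, hypothetically, that quarantined data arrives as one JSON batch per line with an "events" list whose items carry an "event_name". The send callable stands in for an authenticated call to the Events API, which is deliberately left out.

```python
import json


def normalize_event_names(batch):
    """Example transform: tidy event names; return None to discard an empty batch.

    The "events" and "event_name" keys are assumptions for illustration.
    """
    events = batch.get("events", [])
    if not events:
        return None
    fixed = [
        {**event, "event_name": event["event_name"].strip().lower().replace(" ", "_")}
        for event in events
    ]
    return {**batch, "events": fixed}


def backfill(raw_records, transform, send):
    """Parse each quarantined record, transform it, and replay it via send().

    send(batch) should return True on success. Returns simple counters so
    the run can be audited afterwards.
    """
    stats = {"replayed": 0, "skipped": 0, "failed": 0}
    for raw in raw_records:
        try:
            batch = json.loads(raw)
        except json.JSONDecodeError:
            stats["failed"] += 1  # unreadable record; leave it for manual review
            continue
        fixed = transform(batch)
        if fixed is None:
            stats["skipped"] += 1  # transform chose to discard this batch
            continue
        if send(fixed):
            stats["replayed"] += 1
        else:
            stats["failed"] += 1
    return stats
```

Keeping send injectable means the transform can be dry-run against a quarantine export, for example by collecting batches in a list, before anything is replayed into production.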
Always on, self-service tools to keep your data flywheel spinning
Block Data is currently available in Early Access. It is a new feature of our comprehensive data quality suite, which continues to help customers enforce total data quality management across their data pipelines. High-quality data helps our customers power outstanding user experiences and measure customer engagement throughout the journey, which leads to better optimization and successful business results.
Join us on October 28th to learn how mParticle helps customers protect data quality to improve customer engagement and accelerate growth.