Part Two: The Rise of the Cloud Data Warehouse
This post is part of the CDP 2.0: Why Zero-Waste Is Now series. For the best experience, we recommend starting with the Introduction and reading each chapter in sequence.
Cloud data warehouse: The center of customer data
Concurrent with the development of CDPs, cloud data warehouses (CDWs) exploded in popularity. Starting with Redshift and BigQuery, CDWs became a foundational component of the tech stack for organizations of all sizes. CDW adoption was driven by extraordinary tailwinds, including:
- Momentum of cloud computing.
- Pandemic acceleration of digital transformation.
- Emergence of ELT functionality.
- Highly active CDW partner ecosystems.
Taken together, these factors significantly lowered the barrier to entry for data warehouse initiatives, launching many data consolidation efforts.
CDWs addressed many of the data quality, governance, and privacy challenges companies faced in managing their data assets. Consequently, CDWs became the centerpiece of data aggregation efforts, including comprehensive 360° customer views, a long-pursued marketing goal. This created organizational shifts, elevating Data and IT teams to decision-maker roles for customer data initiatives, which had previously been the purview of marketing.
This had a significant impact on the CDP market, as IT considerations began to take precedence in many CDP decisions. Furthermore, CDWs made it easier for internal teams to build CDP-like capabilities through platform integrations and a variety of off-the-shelf functionality, altering build vs. buy decisions for many companies.
"Composable" CDP: The warehouse-native approach
Deliberately piggybacking on CDW adoption, a new class of CDP emerged, originally called "reverse-ETL" (rETL) platforms. The core assumption of these solutions is that the customer’s CDW is the central repository of customer data. Originally, these platforms focused on moving data from a CDW to various endpoints. They offered audience-building capabilities and integrations with various marketing platforms.
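The reverse-ETL flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the customer's CDW, and the table, column names, and the sync_to_destination helper are hypothetical, not any vendor's API.

```python
import sqlite3

# Stand-in for the customer's CDW: an in-memory SQLite database (assumption).
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE customers (email TEXT, lifetime_value REAL, opted_in INTEGER)"
)
warehouse.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("ana@example.com", 1200.0, 1),
        ("ben@example.com", 80.0, 1),
        ("cy@example.com", 2500.0, 0),
    ],
)


def build_audience(conn, min_ltv):
    """Audience building: the segment is a query run inside the warehouse."""
    rows = conn.execute(
        "SELECT email FROM customers "
        "WHERE opted_in = 1 AND lifetime_value >= ? ORDER BY email",
        (min_ltv,),
    )
    return [email for (email,) in rows]


def sync_to_destination(audience, destination):
    """Hypothetical sync step; a real platform would call the endpoint's API here."""
    destination.extend(audience)
    return len(audience)


ad_platform = []  # placeholder for a marketing endpoint
synced = sync_to_destination(build_audience(warehouse, 1000.0), ad_platform)
print(synced, ad_platform)  # 1 ['ana@example.com']
```

The point of the sketch is the direction of travel: the audience is defined where the data already lives, and only the resulting segment leaves the warehouse.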
While these functions overlapped with the data orchestration abilities of established CDPs, rETL platforms positioned themselves as a more open and CDW-friendly option for moving data from the CDW to other platforms. This included the ability to ingest any data from the CDW, as opposed to the "rigid" data schemas of established CDPs. And since they resided in the customer’s CDW environment, there was no issue with data transparency.
Recently, they have expanded into full CDP offerings, rebranding themselves as "composable CDPs" and adding identity resolution and data ingestion capabilities. Their architecture differs from that of established CDPs in that they are CDW "overlays"; that is, they reside in the customer’s CDW architecture, expanding the capabilities of the CDW itself, rather than being a standalone platform. As we discuss later, this also makes composable CDPs natively zero-copy, although with some very material qualifiers.
For data engineering teams in particular, this architecture is attractive because it conceptually supports their larger CDW initiatives, especially CDW-as-system-of-record efforts. It also enabled composable CDPs to claim many CDW virtues for themselves, notably data quality and governance benefits.
The "composability" branding has become somewhat contentious, for reasons we address later. By the strict definition, composability allows customers to choose and pay for only the features they want to use. This differed from established CDP pricing models, which bundled all functionality together and were based on usage. Composable CDPs pushed their unbundled pricing as a primary benefit to customers.
While other CDPs had offered modular pricing before this, composable CDPs’ positioning helped make it standard across CDPs. This change was not technical in nature, as current software practices enable any modern platform to be used in a modular and interoperable fashion.
Composable CDPs did differ from established CDPs in one material way. Because composable CDPs sit in the customer’s CDW, compute "costs" are borne by the customer via their CDW, rather than paid directly to the CDP. The immediate effect was a price "level" that appeared appreciably lower than that of established CDPs. However, the combined cost of [CDP + CDW additional compute] is not well understood until the composable CDP is fully deployed. Additionally, composable CDPs have "performance switches," which significantly increase costs as they change compute and storage behaviors to deal with latency issues.
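To make the [CDP + CDW additional compute] comparison concrete, here is a minimal sketch in Python. Every figure and name in it is a hypothetical placeholder rather than vendor pricing, and the performance_multiplier is simply an assumed way to model the compute uplift of a "performance switch."

```python
from dataclasses import dataclass


@dataclass
class ComposableCdpCost:
    """Hypothetical monthly cost inputs for a composable CDP deployment."""

    cdp_license: float        # fee paid directly to the CDP vendor
    cdw_compute_hours: float  # extra warehouse compute hours the CDP workloads consume
    cdw_rate_per_hour: float  # what the customer pays the CDW vendor per compute hour
    performance_multiplier: float = 1.0  # assumed uplift when a "performance switch" is on

    def cdw_additional_compute(self) -> float:
        """Cost that lands on the customer's CDW bill, not the CDP invoice."""
        return (
            self.cdw_compute_hours
            * self.cdw_rate_per_hour
            * self.performance_multiplier
        )

    def total(self) -> float:
        """Combined cost: CDP license plus additional CDW compute."""
        return self.cdp_license + self.cdw_additional_compute()


# Placeholder numbers for illustration only.
baseline = ComposableCdpCost(
    cdp_license=5_000, cdw_compute_hours=400, cdw_rate_per_hour=4.0
)
tuned = ComposableCdpCost(
    cdp_license=5_000,
    cdw_compute_hours=400,
    cdw_rate_per_hour=4.0,
    performance_multiplier=2.5,
)

print(baseline.total())  # 6600.0
print(tuned.total())     # 9000.0
```

In this toy example the CDP invoice is identical in both cases; the entire difference appears on the warehouse bill, which is why the sticker price alone can understate the combined cost.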
We discuss composability and the virtues and drawbacks of composable CDPs in later chapters.