The Identity Graph for the Modern Data Stack

Joao Correia

Founder of SnowcatCloud, January 25, 2022

Devices don't buy things; people do. Behavioral data (event data) is the basis for building a single customer view. Still, without an intelligence layer to consolidate behaviors (events) at the individual level, behavioral data alone provides an incomplete view of the customer.

There are several reasons why:

We live in a multi-device world where people routinely use two or three devices.
Customer identifiers are inconsistent across the channels that collect behavioral data (online and offline), which increases complexity.
Customers are not authenticated at every touchpoint, but that doesn't make their anonymous behavior any less valuable.
Most behavioral tracking tools are still third-party (served from a third-party domain on your site), and some browsers have started limiting or outright blocking third-party tracking.
Growing privacy regulation can be an obstacle, but in this post we'll see how an Identity Graph can help.
The data modeling necessary for identity resolution is complex, and relational databases weren't made to solve these types of problems.

This post dives into a mostly unspoken problem that affects organizations today: behavioral tracking is usually device-centric when it should be customer-centric. It also presents a long-term solution for building the best possible single customer view.

The Single Customer View

Creating a single customer view enables organizations to better understand, acquire and service their customers.

These are the two most common approaches to building a single customer view:

Customer Data Platforms (CDPs) = Buy, Rent

CDPs ingest data from multiple systems and create and maintain a single customer view, most often with excellent segmentation capabilities. An activation layer allows data to be used for sales, marketing, support, and more.

Modern Data Stack (MDS) = Build, Own

With the Modern Data Stack, organizations build the single customer view in their data warehouse using separate software to collect, transform and activate the data.

The Modern Data Stack approach requires a data team/consultancy, but the organization can use the data in virtually any use case rather than being locked to vendor capabilities.

Owning the data is critical to extracting the maximum value.

What is an Identity Graph?

Identity Graph Example

An Identity Graph (ID Graph) is a database that stores object identifiers and their relationships, usually individuals and their digital interactions. The result is a single customer view that can be used for real-time personalization, advertising, lead scoring, recommendation, segmentation, and more.

Identity graphs are widely used in telecommunications, banking, advertising, and law enforcement, among other fields.

Take Facebook, for example. It is a graph where people are linked to other people, groups, and pages through different relationships, e.g., likes, member, friend, etc.

The network of friends of your friends (second-degree friends), the pages you like, and the people who also like those pages or groups (even if you are not connected to them) can be a better predictor of your behavior than your direct connections: which political party you will vote for, whether you're a smoker, and more. Graphs can know us better than we know ourselves.

Why a graph and not a relational database?

Relational databases are not designed for highly connected data; the joins and schemas needed to model it can get messy and unmanageable fast.

This is where graph databases come in. They already power our day-to-day lives: although we don't realize it, we are part of a graph every time we swipe our credit cards or log into a social media site.

Graph databases are great at taking a starting point and following its connections until they find an answer (graph traversal), a kind of problem SQL is not designed for (see the N+1 query problem).
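To make graph traversal concrete, here is a minimal Python sketch of a breadth-first walk over a tiny in-memory graph. The node names and the `reachable` helper are made up for illustration; a graph database such as Neo4j does this natively with its own storage and query language, so treat this as a picture of the idea rather than of any real implementation.

```python
from collections import deque

# Hypothetical toy graph: each key is a node, each value lists the nodes it links to.
GRAPH = {
    "cookie:abc": ["person:jane"],
    "person:jane": ["cookie:abc", "email:jane@example.com", "phone:555-0100"],
    "email:jane@example.com": ["person:jane"],
    "phone:555-0100": ["person:jane", "cookie:xyz"],
    "cookie:xyz": ["phone:555-0100"],
    "cookie:other": [],  # not connected to anything else
}


def reachable(graph, start):
    """Return every node reachable from start, in breadth-first order."""
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


# Starting from a single cookie, the walk collects every identifier linked to it.
print(reachable(GRAPH, "cookie:abc"))
```

Each hop follows a stored connection directly. In a relational model the same walk typically becomes one join or one extra query per hop, which is where the N+1 pattern tends to show up.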

Iceberg, The Customer Graph for the Modern Data Stack

At SnowcatCloud, most of our customers use a Modern Data Stack, and some face challenges creating a consistent single view of the customer.

So we developed SnowcatCloud Iceberg, an extensible, graph-native Customer Graph that provides our customers with a solid, cost-effective foundation for identity resolution and other use cases such as marketing attribution, fraud detection, segmentation, privacy compliance, and more.

SnowcatCloud Iceberg is fully compatible with Snowplow pipelines, powered by the world's leading graph database, Neo4j, and plays well with the Modern Data Stack.

How does it work?

SnowcatCloud uses each customer's Snowplow behavioral data to create, in real time, an identity graph stored in a graph database (Neo4j). The customer can own this database, much like owning a Snowflake or BigQuery database.

Out of the box, we create an Identity Graph from all of the customer's Snowplow data (e-commerce, events, page views, IP addresses, cookies), with additional support for FingerprintJS and Shopify events. SnowcatCloud Iceberg is compatible with any standard Snowplow data pipeline.

The Identity Graph has an extensible model. It can be queried through APIs, and data can be imported from and exported to a Snowflake or BigQuery data warehouse, which integrates it with the Modern Data Stack.
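As a rough illustration of how a single event could become part of a graph, the sketch below turns one flat event record into (source, relation, target) triples. The field names `event_id`, `domain_userid`, `user_ipaddress`, and `user_id` follow Snowplow's canonical event fields as we understand them; the node labels, relation names, and the `event_to_edges` function are hypothetical and are not Iceberg's actual schema or API.

```python
# Hypothetical mapping from an event field to (node label, relation name).
IDENTIFIER_FIELDS = {
    "domain_userid": ("Cookie", "SEEN_WITH_COOKIE"),
    "user_ipaddress": ("IpAddress", "SEEN_FROM_IP"),
    "user_id": ("User", "PERFORMED_BY"),
}


def event_to_edges(event):
    """Turn one flat event dict into (source, relation, target) triples."""
    event_node = ("Event", event["event_id"])
    edges = []
    for field, (label, relation) in IDENTIFIER_FIELDS.items():
        value = event.get(field)
        if value:  # anonymous events carry no user_id, so no User edge is emitted
            edges.append((event_node, relation, (label, value)))
    return edges


# An anonymous page view: a cookie and an IP address, but no authenticated user.
anonymous_view = {
    "event_id": "e1",
    "domain_userid": "c1",
    "user_ipaddress": "203.0.113.7",
    "user_id": None,
}
for edge in event_to_edges(anonymous_view):
    print(edge)
```

Because every event contributes edges to shared identifier nodes, two events with the same cookie or the same user ID end up connected without any extra matching step.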

An Example

Here's a simple example to illustrate how it works.

An anonymous customer using a tablet clicks on an ad and browses the site.
Phone number 8584562445 calls the call center with a question.
Using a desktop, the customer registers as Jane Doe and makes a purchase.
Jane receives an email on the tablet and clicks on the link to track the order. The link contains an ID that is used to identify Jane.
Jane receives the order and goes to the store to return it a few days later.

Identity Graph Sequence

Thanks to the Identity Graph, we can now see the behavior and all devices owned by Jane, and thus create a single view of the customer.
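The sequence above can be sketched as a deterministic identity-resolution problem: identifiers that are observed together belong to the same person. The Python below uses a union-find structure to group them. It is an illustration only, not how Iceberg is implemented. The identifier strings are made up, and two links are assumptions the example does not state: that Jane gives her phone number when she registers, and that the store return is recorded against her customer record.

```python
def resolve(observations):
    """Group identifiers that were ever observed together (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for identifiers in observations:
        first = identifiers[0]
        find(first)  # register identifiers that were seen alone
        for other in identifiers[1:]:
            union(first, other)

    clusters = {}
    for identifier in parent:
        clusters.setdefault(find(identifier), set()).add(identifier)
    return list(clusters.values())


OBSERVATIONS = [
    ["cookie:tablet"],                                            # 1. anonymous ad click
    ["phone:8584562445"],                                         # 2. call-center call
    ["cookie:desktop", "customer:jane-doe", "phone:8584562445"],  # 3. registration (phone is an assumption)
    ["cookie:tablet", "customer:jane-doe"],                       # 4. email link carrying Jane's ID
    ["customer:jane-doe", "store:return"],                        # 5. in-store return (link is an assumption)
    ["cookie:stranger"],                                          # an unrelated visitor
]

for cluster in resolve(OBSERVATIONS):
    print(sorted(cluster))
```

Until step 4, the tablet cookie is an isolated node; the ID in the email link is what joins the anonymous tablet session to Jane. This sketch covers only deterministic links. Probabilistic matching would need scored edges and a threshold instead of a plain union.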

Identity Graph Example

The SnowcatCloud Iceberg API and data export allow data teams to consolidate their customer identities in their data warehouse or applications in real time.

Identity Graph Use Cases

Below are some example use cases for an Identity Graph:

Identity Resolution (The Identity Graph can be used to link multiple identities and devices to an individual, both deterministically and probabilistically.)
Customer Segmentation (Behavioral data can be used to segment customers into clusters and look-alikes using unsupervised machine learning.)
Account-Based Marketing (Using integrations like Clearbit, we can group behavioral event data by account and understand behavior at the account level.)
Marketing Attribution (A complete picture of the customer journey across devices enables better marketing attribution.)
Lead Scoring (Graph algorithms can be applied to the Identity Graph to score leads. PageRank is an example.)
Recommendation Engine (The Identity Graph can be used as a recommendation engine by scoring the probability of connections between nodes, or "people who like x also like y.")
Privacy Compliance (The Identity Graph can provide a complete list of devices for a given individual, making it easier to comply with data access and deletion requests.)
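To give a feel for the lead-scoring use case, here is a small, self-contained PageRank sketch in plain Python (power iteration with a damping factor). The lead names and the idea that an edge means "this lead points to that lead" (for example through a referral) are hypothetical. A production setup would more likely run a graph library's built-in algorithm than hand-rolled code like this.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score nodes by power iteration; links maps a node to the nodes it points to."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for source, targets in links.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: three leads point to lead:c, and lead:c points back to lead:a.
LEADS = {
    "lead:a": ["lead:c"],
    "lead:b": ["lead:c"],
    "lead:c": ["lead:a"],
    "lead:d": ["lead:c"],
}

scores = pagerank(LEADS)
for lead in sorted(scores, key=scores.get, reverse=True):
    print(lead, round(scores[lead], 3))
```

The scores sum to 1, so they can be read as each lead's relative share of importance. A lead that many others point to, directly or indirectly, rises to the top.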

Conclusion

Graph databases are a powerful addition to the Modern Data Stack arsenal. Given the multi-channel, multi-device world we live in and the growth of privacy regulations, they are more needed than ever.

Review how you are using your behavioral data (event data) to create a single customer view.

Looking to know more about creating an identity graph for your organization? Contact us at hello@snowcatcloud.com.

SnowcatCloud, Inc. | Cloud-hosted Snowplow, affording companies true ownership of event-level data.