Real-Time Customer Data Platform

stephentmerkle · 6/10/25

Imagine you are collecting data from your website through streaming ingestion using the HTTP API. As people browse and interact with the site, that data is collected. You want to create audiences from that event data and activate those audiences to a cloud storage destination such as S3 or SFTP. However, you know that experience event data cannot be mapped within the destination connectors. This makes it impossible to include that event data when activating the audiences to such destinations.

Second, imagine that once on the site, people can either authenticate (log in) or choose not to (unauthenticated/anonymous users). When an individual visits the site multiple times in a month without logging in, you want to collect that anonymous user's behaviors over time until they eventually log in. Once they do, you want to connect their past behaviors to their now-known identity.

How do you design a data model that allows you to accomplish all of the above?

There are a few approaches to consider when defining your data model:

Options:

1. Ingest the behavioral data using streaming ingestion into a record-based (Individual Profile), profile-enabled dataset. This makes those attributes available for mapping when you activate to the S3 or SFTP destinations. That said, only the most recent behaviors will be on the profile at any point in time (assuming you are using the default time-based merge policy).

2. Ingest the behavioral data using streaming ingestion into an experience event-based, profile-enabled dataset. This lets you collect that behavioral data as a time series and track historical activity for these users. You will also be able to tie these anonymous users to their known profiles after they log in (assuming they use the same device/browser/cookie ID). However, although those attributes will be available for use as audience criteria, they won't be available for mapping when you attempt to activate to the S3 or SFTP destinations.

3. Do both: ingest that data into both a record-based, profile-enabled dataset AND an experience event-based, profile-enabled dataset. This may introduce redundant information. Again, the record-based dataset will only include the most recent activity (assuming you are using the default time-based merge policy), while the experience event-based dataset will contain all the historical behavioral events for the profile (including those on the record-based dataset). Downsides to this approach may include potential stitching issues (see scenario below).
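
To make the difference between the two dataset types concrete, here is a minimal sketch in Python. It is a toy model only, not Platform code or any Adobe API: the class names `RecordDataset` and `EventDataset`, the `last_page` attribute, and the `cookie-123` identity are all hypothetical, and the "latest timestamp wins" rule is a simplified stand-in for a time-based merge policy.

```python
from dataclasses import dataclass, field


@dataclass
class RecordDataset:
    """Toy record-based store: one row per identity, newest timestamp wins."""
    rows: dict = field(default_factory=dict)

    def ingest(self, identity, timestamp, attributes):
        current = self.rows.get(identity)
        # Overwrite only when the incoming record is at least as recent.
        if current is None or timestamp >= current["timestamp"]:
            self.rows[identity] = {"timestamp": timestamp, **attributes}


@dataclass
class EventDataset:
    """Toy time-series store: every event is appended and kept."""
    events: list = field(default_factory=list)

    def ingest(self, identity, timestamp, attributes):
        self.events.append(
            {"identity": identity, "timestamp": timestamp, **attributes}
        )

    def history(self, identity):
        matching = [e for e in self.events if e["identity"] == identity]
        return sorted(matching, key=lambda e: e["timestamp"])


# Option 3 ("do both"): send each streamed hit to both stores.
records = RecordDataset()
events = EventDataset()
stream = [
    ("cookie-123", 1, {"last_page": "/home"}),
    ("cookie-123", 2, {"last_page": "/pricing"}),
    ("cookie-123", 3, {"last_page": "/checkout"}),
]
for identity, timestamp, attributes in stream:
    records.ingest(identity, timestamp, attributes)
    events.ingest(identity, timestamp, attributes)

latest = records.rows["cookie-123"]["last_page"]
full_history = [e["last_page"] for e in events.history("cookie-123")]
```

In this sketch the record store ends up holding only the most recent page, which is the kind of value that could be mapped to a file export, while the event store keeps all three visits for use in audience criteria.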

Imagine a scenario where two individuals use a shared device or browser: Person A browses the site at a public library without logging in, and their information gets written to both datasets. Later on, Person B uses the same device to browse the site and chooses to log in. The historical data collected for Person A in the experience event dataset will now stitch to Person B's record-based profile.
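
The shared-device problem can be illustrated with a small identity-linking sketch. Again, this is a toy model under stated assumptions rather than how the identity graph is actually implemented: `IdentityGraph`, `stitched_events`, the identity values `cookie-library` and `crm-person-b`, and the `actor` field (ground truth that a real system would not have) are all hypothetical names introduced for illustration.

```python
class IdentityGraph:
    """Toy identity graph: linked identities collapse into one profile."""

    def __init__(self):
        self.parent = {}

    def find(self, identity):
        # Register unseen identities as their own root, then walk to the root.
        self.parent.setdefault(identity, identity)
        while self.parent[identity] != identity:
            self.parent[identity] = self.parent[self.parent[identity]]
            identity = self.parent[identity]
        return identity

    def link(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_a] = root_b


def stitched_events(graph, events, identity):
    """Return every event whose identity resolves to the same profile."""
    root = graph.find(identity)
    return [e for e in events if graph.find(e["identity"]) == root]


graph = IdentityGraph()
events = [
    # Person A browses anonymously on the library computer.
    {"identity": "cookie-library", "actor": "Person A", "page": "/loans"},
    {"identity": "cookie-library", "actor": "Person A", "page": "/rates"},
    # Person B later uses the same browser and logs in.
    {"identity": "cookie-library", "actor": "Person B", "page": "/account"},
]

# The login ties the shared cookie to Person B's known ID.
graph.link("cookie-library", "crm-person-b")

profile_b = stitched_events(graph, events, "crm-person-b")
misattributed = [e for e in profile_b if e["actor"] != "Person B"]
```

Here `profile_b` contains all three events even though two of them belong to Person A, because the only identity on those events is the shared cookie. Any mitigation would have to decide when a cookie should stop being treated as belonging to a single person.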

I'm curious to hear thoughts on these approaches and whether better options exist for such a scenario. Let's discuss...