Multi-Forked Event Expiration Strategy using DataStream in AEP RTCDP

Employee Advisor

5/22/23

Authors: @Saswata Ghosh, @Shelly Goel, @Carlos Guerrero

Editor: @Danny-Miller  

 

When implementing AEP for customers, onboarding high-frequency data such as Web Events is a common use case. For many Platform users, up to 90% of profiles are populated by behavioral data alone rather than by record data. Managing your behavioral data with expiration rules is therefore critical to staying within your license entitlements and to keeping obsolete data out of your use cases.

Customers often assign different weight and relevance to different event types and want a different expiry duration for each. An example would be choosing ‘30 days’ retention for ‘page views’ and ‘product browse’ events compared to 1 year for ‘purchase’ events. This is achievable in a WebSDK implementation by using two datastreams that point to two datasets (purchase vs. other events) and assigning a different expiry duration to each dataset. This blog walks through an implementation done for a leading department store group.

About Automated Event Expiration Process 

In Adobe Experience Platform, an Experience Event Schema has _id and Timestamp as mandatory fields, which must be populated during ingestion. The timestamp should hold the time at which the event actually occurred, irrespective of when it was ingested into AEP. We highly recommend also populating Event Type, as it sets the context of the event and is used by many downstream services and apps.

DannyMiller_0-1684190231130.png
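As a minimal sketch of the fields described above, the snippet below builds an event object carrying `_id`, `timestamp`, and `eventType`. The helper name `buildExperienceEvent` and the sample values are hypothetical and only illustrate the shape; a real payload would carry whatever other fields your schema defines.

```javascript
// Hypothetical helper: builds the three fields discussed above.
function buildExperienceEvent(id, occurredAt, eventType) {
  return {
    _id: id,                             // unique identifier of the event (mandatory)
    timestamp: occurredAt.toISOString(), // when the event occurred, not when it was ingested (mandatory)
    eventType: eventType                 // recommended: sets the context of the event
  };
}

// Sample values are illustrative only.
const sampleEvent = buildExperienceEvent(
  "evt-0001",
  new Date("2023-05-01T10:15:00Z"),
  "commerce.purchases"
);

console.log(JSON.stringify(sampleEvent, null, 2));
```

Note that the timestamp is taken from the moment the event happened, which is the value the expiration described in the next paragraph is measured against.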

You can configure ‘expiration’ times for all Experience Events that are ingested into a dataset enabled for Real-Time Customer Profile, based on the ‘Timestamp’ attribute. This lets you automatically remove data from the Profile Store that is no longer valid or useful for your use cases. Experience Event expirations (formerly known as ‘TTL’, Time to Live) cannot be configured through the Platform UI or APIs. Instead, you must contact Adobe Support to enable Experience Event expirations on your required datasets. Once applied, any event data older than the number of days allotted by the ‘expiration’ is permanently deleted from the Unified Profile Store. As an added benefit, any anonymous profile for which all events have expired is also purged from the Profile Store.
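The rule above can be illustrated with a small sketch: an event counts as expired when its own timestamp is older than the dataset's expiration in days. This is only an illustration of the rule as described here, not Adobe's implementation; the function name `isExpired` and the sample dates are assumptions.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns true when the event's timestamp is older than the expiration window.
// eventTimestamp: ISO 8601 string; expirationDays: number; now: Date
function isExpired(eventTimestamp, expirationDays, now) {
  const ageMs = now.getTime() - new Date(eventTimestamp).getTime();
  return ageMs > expirationDays * MS_PER_DAY;
}

const referenceDate = new Date("2023-05-22T00:00:00Z");

// A 51-day-old event under a 30-day expiration
console.log(isExpired("2023-04-01T00:00:00Z", 30, referenceDate));  // true
// A 12-day-old event under a 30-day expiration
console.log(isExpired("2023-05-10T00:00:00Z", 30, referenceDate));  // false
// The same 51-day-old event under a 365-day expiration
console.log(isExpired("2023-04-01T00:00:00Z", 365, referenceDate)); // false
```

The last two calls show why the expiration chosen for a dataset matters: the same event is dropped under a 30-day setting and kept under a 1-year setting.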

Real World Example: 

Our customer is one of the biggest department store groups in Europe. Recently they replaced Salesforce DMP with AEP RTCDP for their segmentation and activation use cases. 

The customer also migrated from Google Analytics by implementing WebSDK for their website using Google Tag Manager. During the discussion of data retention best practices, we advised applying ‘Experience Event Expiration’ to their Web Events dataset to keep their addressable audience and profile richness within license limits.

However, they expressed the following challenges:

  1. Different Web Events have different relevance and data-retention requirements because their use cases vary. For example, Purchase Events data is valuable to them for up to a year so that they can re-target customers who made a purchase during the previous Christmas, Thanksgiving, or a similar season.
  2. For other high-frequency web events such as page views or product browsing, they agreed to a lower data retention (TTL: time to live) of 30 days. We also had to ensure that the custom solution did not increase their server calls.

Solution for the multi-forked event expiration strategy 

DannyMiller_0-1684190407390.png

Based on the high-level design above, this lightweight solution has three key components:

  1. Two different datasets referencing the same AEP WebSDK schema. This allows the fragments across datasets to be brought together using the Real-time Customer Profile.
    • DannyMiller_2-1684190231133.png 
  2. Two different datastreams, each pointing to its respective profile-enabled AEP dataset
    • DannyMiller_1-1684190561745.png
    • DannyMiller_2-1684190590549.png 
  3. Custom code in the processing layer that dynamically forks purchase events and other events to their respective datastream_id. Prerequisite: “page_type” is captured in the customer’s data layer. Any other criterion that suits your use case works as well, provided it is readily available in the data layer.

Google Tag Manager example: 

The following code snippet, which reads from the data layer, can be embedded on every page or placed in a common location so that it is called on every page.

DannyMiller_0-1684525277523.png

 

DannyMiller_4-1684190738143.png

 

DannyMiller_5-1684190762092.png
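In text form, the forking logic can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the customer's actual code: the datastream IDs are placeholders, `resolveDatastreamId` is a hypothetical helper, and `pageData` stands in for whatever object your data layer exposes with a `page_type` value.

```javascript
// Placeholder IDs (assumptions); replace with the IDs of your own datastreams.
const PURCHASE_DATASTREAM_ID = "purchase-datastream-id";
const WEB_EVENTS_DATASTREAM_ID = "web-events-datastream-id";

// Hypothetical helper: picks the datastream from the page_type value
// exposed by the data layer.
function resolveDatastreamId(pageData) {
  const pageType = pageData && pageData.page_type;
  if (pageType === "purchase") {
    return PURCHASE_DATASTREAM_ID;   // longer retention, e.g. 1 year
  }
  return WEB_EVENTS_DATASTREAM_ID;   // shorter retention, e.g. 30 days
}

console.log(resolveDatastreamId({ page_type: "purchase" })); // purchase-datastream-id
console.log(resolveDatastreamId({ page_type: "product" }));  // web-events-datastream-id
console.log(resolveDatastreamId(undefined));                 // web-events-datastream-id
```

The resolved ID would then be supplied as the datastream ID when the WebSDK is configured for the page. Because each event is routed to exactly one datastream, this kind of fork should not add server calls.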

Alternative approach (using Tags/Launch)

Create a Data Element to hold the relevant datastream ID based on the ‘pageType’.

DannyMiller_6-1684190897037.png

DannyMiller_7-1684190921121.png

In this way, events from the ‘purchase’ pageType are routed to the ‘purchase’ dataset via the ‘purchase’ DataStream configured earlier. Similarly, the ‘default’ clause directs events from all other pages to the ‘Web Events’ dataset. If your customer has a similar requirement, you can extend this multi-forked DataStream strategy to more than two pageTypes or introduce other conditions.
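A Data Element of this kind, extended to more than two pageTypes, might look like the sketch below. It assumes a switch statement with a ‘default’ clause as described above; the function name `datastreamIdFor`, the ‘wishlist’ fork, and all datastream IDs are hypothetical.

```javascript
// Hypothetical mapping from pageType to datastream ID.
// In a custom code Data Element, the body would return this value.
function datastreamIdFor(pageType) {
  switch (pageType) {
    case "purchase":
      return "purchase-datastream-id";   // placeholder ID
    case "wishlist":
      return "wishlist-datastream-id";   // hypothetical third fork
    default:
      return "web-events-datastream-id"; // all other pages
  }
}

console.log(datastreamIdFor("purchase")); // purchase-datastream-id
console.log(datastreamIdFor("wishlist")); // wishlist-datastream-id
console.log(datastreamIdFor("home"));     // web-events-datastream-id
```

Each added case implies another datastream and another dataset, so the guardrails listed under ‘Things to consider’ apply.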

Things to consider 

  1. Adobe has released a new feature called Pseudonymous profiles data expiration, which applies to the Production sandbox only and complements the granular ‘dataset-level’ event expiration discussed in this article. Unlike this use case for known customers, it is targeted at purging pseudonymous users based on ECID, GAID, or AAID.
  2. This strategy of multiple event datasets should be applied in moderation, as there is a soft guardrail of 20 datasets that leverage the XDM ExperienceEvent class.
  3. Segmentation logic needs to be validated before and after applying the TTL, as different TTLs on different datasets can introduce inaccuracies in segments whose lookback window is longer than the TTL duration.
  4. The Experience Event TTL is set at the dataset level by the Engineering team, and the job is resource-intensive. There is also no monitoring UI for the system job. Too many datasets with different TTLs may lead to multiple TTL jobs running at the same time, with possible conflicts and failures that raise no alerts.
  5. Multiple Experience Event datasets can increase the complexity of SQL queries, which then need to join multiple tables for some Data Distiller use cases.
  6. Experience Event expiration permanently removes events from the Unified Profile Store (UPS) only, not from the data lake or the Identity graph (UIS). When all Experience Events have been removed and the profile has no remaining profile attributes, the profile no longer exists.
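For the segmentation check in point 3, one simple way to spot risky segments is to compare each segment's lookback window against the TTL of the dataset it reads from. The sketch below assumes you maintain such a list yourself; the dataset names, TTL values, segment definitions, and the helper `findSegmentsExceedingTtl` are all hypothetical.

```javascript
// Hypothetical TTLs (in days) per dataset.
const ttlDaysByDataset = { purchaseEvents: 365, webEvents: 30 };

// Hypothetical segment inventory: which dataset each segment relies on
// and how far back it looks.
const segmentInventory = [
  { name: "Purchased in last 90 days", dataset: "purchaseEvents", lookbackDays: 90 },
  { name: "Browsed product in last 60 days", dataset: "webEvents", lookbackDays: 60 },
  { name: "Viewed page in last 7 days", dataset: "webEvents", lookbackDays: 7 }
];

// Returns the segments whose lookback window is longer than the dataset TTL.
function findSegmentsExceedingTtl(segments, ttlByDataset) {
  return segments.filter(
    (segment) => segment.lookbackDays > ttlByDataset[segment.dataset]
  );
}

const atRisk = findSegmentsExceedingTtl(segmentInventory, ttlDaysByDataset);
console.log(atRisk.map((segment) => segment.name));
// Only "Browsed product in last 60 days" is flagged: 60 > 30
```

A flagged segment would evaluate against events that have already expired, so either its lookback window or the dataset's TTL needs to change.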

Three Key Benefits 

  1. Simplicity: The strategy of forking events (row-level) to multiple datasets can be based on any condition, such as a data layer variable or event, as the use case requires, using simple if-else code.
  2. Flexibility: A datastream can be created in advance (without a dataset) and its ID referenced in the processing rule. This allows for parallel development, and for deleting old datasets or linking new ones to a DataStream without dependency.
  3. Cost Savings & Durability: The Event Expiration set on a dataset purges historical events daily, ensuring that existing profiles stay within the 5000-event guardrail and that new anonymous profiles are purged from the system once they pass their usefulness or expiry period. This also helps customers stay within profile license limits over the longer term and save on license cost.

7 Comments

Community Advisor and Adobe Champion

5/22/23

Thank you for sharing this, guys! We faced a similar issue in the past, and this looks like a feasible solution.

 

Community Advisor

5/22/23

This is helpful. Thanks for sharing this.

5/24/23

Could be a big money saver in terms of licensing, as we can expire some of the data from the profile store.

Employee

6/11/23

Thanks @Danny-Miller and the Team, this is super helpful.

I have a question in regards to this point: "different datasets referencing the same AEP WebSDK schema. This allows the fragments across datasets to be brought together using the Real-time Customer Profile."

Wouldn't datasets be brought together using RCP even without them using the same schema? What advantages does the same schema have in this case?

Thanks!

Employee

6/17/23

@yulial62241168 - Thank you. You are correct that RTCDP will also merge different Experience Event schemas into the union schema. However, a single Web Events schema is our starting point for mapping the source data from WebSDK to the target XDM attributes of this schema. Two datasets are a simpler solution than multiple schemas, in terms of both implementation and maintenance.