SOLVED

Streaming Web Events through Experience Edge Framework - Importance of enabling 'Profile' on schema & datasets

Level 3

Hi,

In batch ingestion...

1. Data lands in the data lake in the form of datasets.

2. If Profile is enabled on the datasets, the data moves into the Profile Store.

3. In the Profile Store, the data participates in profile stitching and becomes part of the profile view and segmentation.

 

In streaming ingestion via the Experience Edge framework (Web SDK, Edge Network, datastream)...

1. Event data lands in the Profile Store.

2. In the Profile Store, the data participates in profile stitching and becomes part of the profile view. The data also participates in real-time segmentation.

3. Event data moves into the data lake as part of a batch that runs every 15 minutes on the Profile Store.

 

My question is about streaming ingestion: if event data lands in the Profile Store and participates in profile stitching and segmentation, then what is the use of enabling 'Profile' on datasets and schemas?

My base assumption is that 'Profile' is enabled to move your data from the data lake to the Profile Store. Once the data is in the Profile Store, it participates in profile stitching and segmentation. Is this understanding correct, or is there a gap?

 

Thanks.

1 Accepted Solution

Correct answer by
Employee Advisor

@ChetanyaJain @vikash4 A small tweak:

"if you have two records with the same _id, then the oldest data will be reflected when you look-up this data using queries/APIs"

For time-series data, the _id is used to prevent duplicates from being loaded, so any subsequent records with the same _id in the same dataset are ignored/rejected.
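
To make the "first record wins" behaviour concrete, here is a toy Python model of _id-based de-duplication. It is only a sketch of the rule as described in this reply, not the Platform's actual ingestion code; the function name `ingest_time_series` and the `page` field are hypothetical.

```python
from typing import Iterable


def ingest_time_series(records: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Toy model: keep the first record seen for each _id, reject later ones."""
    seen_ids: set[str] = set()
    accepted: list[dict] = []
    rejected: list[dict] = []
    for record in records:
        if record["_id"] in seen_ids:
            # A record with this _id was already loaded into the dataset.
            rejected.append(record)
        else:
            seen_ids.add(record["_id"])
            accepted.append(record)
    return accepted, rejected


# Hypothetical events: the second one reuses the _id of the first.
events = [
    {"_id": "evt-1", "page": "/home"},
    {"_id": "evt-1", "page": "/home-updated"},
    {"_id": "evt-2", "page": "/pricing"},
]

accepted, rejected = ingest_time_series(events)
print([e["page"] for e in accepted])  # ['/home', '/pricing']
print([e["page"] for e in rejected])  # ['/home-updated']
```

In this model the "updated" record never replaces the original, which is why an update needs its own _id.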

4 Replies

Employee Advisor

Hi @vikash4, thanks for your question here:

First, look at the latency for streaming ingestion on the Platform:

Real-Time Customer Profile: < 1 minute

Data Lake: < 60 minutes

Thus, streaming data (Experience Events) lands in Profile before it lands in the Data Lake.
However, there are several reasons why an XDM schema and a dataset with Unified Profile enabled are still required.
First (XDM)

  • XDM sets the structure of how your data should exist when ingesting via Streaming (similar to Batch Ingestion).
    • XDM is the blueprint of how your data should be structured on ingestion.
  • Without defining an Identity (or multiple identities) or enabling Profile at the Schema level:
    • Unified Profile and Unified Identity ingestion would not occur.
    • You would not be able to query any of your ingested data via Unified Profile Look-Up (or the Real-Time Customer Profile Entities API) or the Identity Graph Viewer (or the Identity Service API), as a Namespace is required for all of them.
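
As a rough illustration of what "enabling Profile" looks like outside the UI, the sketch below builds the JSON Patch bodies that, as far as I recall, the Schema Registry API and Catalog Service API accept for this purpose. Treat the paths and values (`meta:immutableTags` with `union`, and the `unifiedProfile` / `unifiedIdentity` tags) as assumptions to verify against the current API reference; the helper function names are hypothetical, and nothing here calls the Platform.

```python
import json


def schema_profile_patch() -> list[dict]:
    """JSON Patch body assumed to enable a schema for Profile (adds the 'union' tag)."""
    return [{"op": "add", "path": "/meta:immutableTags", "value": ["union"]}]


def dataset_profile_patch() -> list[dict]:
    """JSON Patch body assumed to enable a dataset for Profile and Identity.

    Assumes the dataset already has a 'tags' object; otherwise the 'add'
    operations on '/tags/...' would have nothing to attach to.
    """
    return [
        {"op": "add", "path": "/tags/unifiedProfile", "value": ["enabled:true"]},
        {"op": "add", "path": "/tags/unifiedIdentity", "value": ["enabled:true"]},
    ]


# Print the bodies so they can be compared with the API documentation.
print(json.dumps(schema_profile_patch(), indent=2))
print(json.dumps(dataset_profile_patch(), indent=2))
```

Both the schema and the dataset need to be enabled; enabling only one of them is a common reason for streamed events not appearing in Profile.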

Data Lake

  • If Batch or Streaming data does not conform to the structure of the associated XDM schema, the ingestion will fail validation.
    • Please see the monitoring data ingestion documentation for further information. This is another advantage of the Data Lake: it can be used for troubleshooting and analysis of Data Lake, Unified Profile, and Identity Service ingestion.
    • For ingestion issues, you can use the Data Access API to retrieve failed batches.
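
For the last point, a minimal sketch of how a "retrieve failed batches" request could be assembled in Python. The base URL, the `/batches/{BATCH_ID}/failed` path, and the header names reflect my understanding of the Data Access API and should be checked against the current reference; the function name and all credential values are placeholders. The request is only constructed here, not sent.

```python
from urllib.request import Request

# Assumed base path of the Data Access API; verify before use.
PLATFORM_BASE = "https://platform.adobe.io/data/foundation/export"


def failed_batch_request(
    batch_id: str,
    access_token: str,
    api_key: str,
    org_id: str,
    sandbox: str = "prod",
) -> Request:
    """Build (but do not send) a GET request for the failed files of a batch."""
    return Request(
        url=f"{PLATFORM_BASE}/batches/{batch_id}/failed",
        method="GET",
        headers={
            "Authorization": f"Bearer {access_token}",
            "x-api-key": api_key,
            "x-gw-ims-org-id": org_id,
            "x-sandbox-name": sandbox,
        },
    )


# Placeholder values; replace with real credentials and a real batch ID.
req = failed_batch_request("BATCH_ID", "ACCESS_TOKEN", "API_KEY", "ORG_ID")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; that step is left out so the sketch stays safe to run as-is.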

Hope this helps!
Josh

 

Level 3

Hi @Joshua_Eisikovi ,

Thanks for the detailed explanation. So, does this mean:

1. Event data will be in the Profile Store even if Profile is disabled on the datasets, since it lands in the Profile Store, which then checks "Profile" enablement for further processing?

2. If Profile is enabled, event data participates in Unified Profile and Unified Identity?

3. If Profile is enabled, event data will be available in Unified Profile Look-Up (or the Real-Time Customer Profile Entities API) or the Identity Graph Viewer?

Thanks.

Community Advisor

Hi @vikash4 

1. If Profile is not enabled on the dataset, the data will not flow into UPS (Unified Profile Service).

2. Yes. Fields (e.g., ECID, AAID) that you marked as Identities will show up in the identity graphs.

3. You can check the event data by looking up the profile and opening the Events tab. Event data will not show up at the profile attribute level but will be present under the Events tab. Remember, if you have two records with the same _id, then the oldest data will be reflected when you look-up this data using queries/APIs. So each time there is an update to the event, a new _id should be generated by the source system.
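
A minimal sketch of how a source system might generate a fresh _id for every event it sends. The field names `_id`, `timestamp`, `eventType`, and `identityMap` follow the XDM ExperienceEvent shape as I understand it; the function name `build_experience_event` and the ECID value are made up for illustration.

```python
import uuid
from datetime import datetime, timezone


def build_experience_event(ecid: str, event_type: str) -> dict:
    """Build an event payload with a new, unique _id on every call."""
    return {
        "_id": str(uuid.uuid4()),  # never reused, even for an "update"
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "eventType": event_type,
        "identityMap": {"ECID": [{"id": ecid, "primary": True}]},
    }


# Two events for the same visitor and event type still get different _id values.
first = build_experience_event("12345", "web.webpagedetails.pageViews")
second = build_experience_event("12345", "web.webpagedetails.pageViews")
print(first["_id"] != second["_id"])
```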

 

Please let me know if you have more questions.

 

Thanks,

Chetanya
