In batch ingestion:
1. Data lands in the data lake in the form of datasets.
2. If Profile is enabled on the dataset, the data moves into the Profile Store.
3. In the Profile Store, the data participates in profile stitching and becomes part of the profile view and segmentation.
In streaming ingestion via the Experience Edge framework (Web SDK, Edge Network, datastream):
1. Event data lands in the Profile Store.
2. In the Profile Store, the data participates in profile stitching and becomes part of the profile view. The data also participates in real-time segmentation.
3. Event data moves into the data lake as part of a batch that runs every 15 minutes on the Profile Store.
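To make the two paths above easier to compare, here is a minimal toy model in Python. It is a sketch under stated assumptions, not how Adobe Experience Platform is implemented: the `Dataset` and `Platform` classes and their methods are hypothetical, the periodic lake batch follows the description above, and the Profile check on the streaming path reflects the reply further down this thread (data from a dataset without Profile enabled does not flow into UPS).

```python
from dataclasses import dataclass, field

# Toy model only: class and method names are hypothetical,
# not Adobe Experience Platform APIs.


@dataclass
class Dataset:
    name: str
    profile_enabled: bool  # the "Profile" toggle on the dataset


@dataclass
class Platform:
    data_lake: list = field(default_factory=list)
    profile_store: list = field(default_factory=list)
    pending_lake_batch: list = field(default_factory=list)

    def ingest_batch(self, dataset: Dataset, records: list) -> None:
        # Batch path: records land in the data lake first...
        self.data_lake.extend(records)
        # ...and reach the Profile Store only if Profile is enabled.
        if dataset.profile_enabled:
            self.profile_store.extend(records)

    def ingest_streaming(self, dataset: Dataset, event: dict) -> None:
        # Streaming path: the event reaches Profile first (assumed to be
        # gated by the same Profile flag as the batch path).
        if dataset.profile_enabled:
            self.profile_store.append(event)
        # The data lake copy is written later by a periodic batch.
        self.pending_lake_batch.append(event)

    def flush_lake_batch(self) -> None:
        # Stands in for the periodic (e.g. every 15 minutes) batch into the lake.
        self.data_lake.extend(self.pending_lake_batch)
        self.pending_lake_batch.clear()


if __name__ == "__main__":
    platform = Platform()
    web_events = Dataset("web_events", profile_enabled=True)
    platform.ingest_streaming(web_events, {"_id": "e1"})
    print(len(platform.profile_store), len(platform.data_lake))  # 1 0
    platform.flush_lake_batch()
    print(len(platform.profile_store), len(platform.data_lake))  # 1 1
```

The only difference between the paths in this model is ordering: streaming data is visible in Profile before it is in the lake, while batch data is in the lake first.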
My question is about streaming ingestion: if event data lands in the Profile Store and participates in profile stitching and segmentation, then what is the use of enabling 'Profile' on datasets and schemas?
My base assumption is that 'Profile' is enabled to move your data from the data lake to the Profile Store. Once the data is moved into the Profile Store, it participates in profile stitching and segmentation. Is this understanding correct, or is there a gap?
Hi @vikash4, thanks for your question here:
First, let's look at the latency for streaming ingestion on the Platform:
Real-Time Customer Profile: < 1 minute
Data Lake: < 60 minutes
Thus, streaming data (Experience Events) lands in Profile before it lands in the data lake.
However, to answer your question, there are multiple reasons why an XDM schema and a dataset with Unified Profile enabled are still required.
Hope this helps!
Hi @Joshua_Eisikovi ,
Thanks for the detailed explanation. So, does this mean:
1. Event data will be in the Profile Store even if Profile is disabled on the dataset, since it lands in the Profile Store and only then checks "Profile" enablement for further processing?
2. If Profile is enabled, event data participates in Unified Profile and Unified Identity?
3. If Profile is enabled, event data will show up in Unified Profile Look-Up (or the Real-Time Customer Profile Entities API) or the Identity Graph Viewer?
1. If Profile is not enabled on the dataset, the data will not flow into UPS (Unified Profile Service).
2. Yes. Fields (e.g., ECID, AAID) that you marked as identities will show up in the identity graphs.
3. You can check the event data by looking up the profile and opening the Events tab. Event data will not show up at the profile attribute level, but it will be present under the Events tab. Remember, if you have two records with the same _id, the oldest data will be reflected when you look up this data using queries/APIs. So each time an event is updated, the source system should generate a new _id.
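Points 1 and 2 above can be illustrated with a small Python sketch of how identity fields link records into a graph. This is a generic union-find over identity values, not Adobe's actual identity-graph algorithm; the `IdentityGraph` class, the `ingest` helper and the `email` identity field are hypothetical, and the early return for datasets without Profile enabled mirrors point 1.

```python
from collections import defaultdict

# Hypothetical sketch: generic union-find over identity values,
# not Adobe's identity-graph implementation.


class IdentityGraph:
    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def _find(self, node: str) -> str:
        self.parent.setdefault(node, node)
        # Path halving keeps the trees shallow.
        while self.parent[node] != node:
            self.parent[node] = self.parent[self.parent[node]]
            node = self.parent[node]
        return node

    def _union(self, a: str, b: str) -> None:
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def add_record(self, identities: dict[str, str]) -> None:
        # All identities seen on one record are linked, e.g. ECID <-> AAID.
        keys = [f"{namespace}:{value}" for namespace, value in identities.items()]
        for key in keys:
            self._find(key)
        for other in keys[1:]:
            self._union(keys[0], other)

    def clusters(self) -> list[set[str]]:
        groups: dict[str, set[str]] = defaultdict(set)
        for node in list(self.parent):
            groups[self._find(node)].add(node)
        return list(groups.values())


def ingest(graph: IdentityGraph, profile_enabled: bool,
           identity_fields: list[str], record: dict) -> None:
    if not profile_enabled:
        return  # Profile disabled: the record never reaches UPS
    ids = {f: record[f] for f in identity_fields if f in record}
    if ids:
        graph.add_record(ids)


if __name__ == "__main__":
    graph = IdentityGraph()
    ingest(graph, True, ["ECID", "AAID"], {"ECID": "111", "AAID": "aaa"})
    ingest(graph, True, ["ECID", "email"], {"ECID": "111", "email": "x@example.com"})
    ingest(graph, False, ["ECID"], {"ECID": "222"})
    print(len(graph.clusters()))  # 1 (ECID:222 is never added)
```

The shared ECID is what stitches the two profile-enabled records together; the record from the dataset without Profile enabled contributes nothing to the graph.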
Please let me know if you have more questions.
"if you have two records with the same _id, then the oldest data will be reflected when you look-up this data using queries/APIs"
For time-series data, the _id is used to prevent duplicates from being loaded; thus, any subsequent record with the same _id in the same dataset is ignored/rejected.
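The first-write-wins behaviour described above can be modelled in a few lines of Python. This is a minimal sketch assuming deduplication is keyed only on `_id` within a single dataset; the `TimeSeriesDataset` class and its methods are hypothetical and not an Adobe Experience Platform API.

```python
from typing import Optional

# Hypothetical model: within one time-series dataset, the first record with a
# given _id is kept and any later record with the same _id is rejected.


class TimeSeriesDataset:
    def __init__(self, name: str) -> None:
        self.name = name
        self._by_id: dict[str, dict] = {}
        self.rejected: list[dict] = []

    def ingest(self, record: dict) -> bool:
        """Store the record unless its _id was already seen; return True if stored."""
        record_id = record["_id"]
        if record_id in self._by_id:
            self.rejected.append(record)  # duplicate _id: ignored/rejected
            return False
        self._by_id[record_id] = record
        return True

    def lookup(self, record_id: str) -> Optional[dict]:
        return self._by_id.get(record_id)


if __name__ == "__main__":
    events = TimeSeriesDataset("web_events")
    events.ingest({"_id": "evt-1", "page": "/home"})
    events.ingest({"_id": "evt-1", "page": "/checkout"})  # same _id: rejected
    print(events.lookup("evt-1")["page"])  # /home
```

Because the later record is dropped, a lookup still returns the oldest data, which is why the source system should send each update with a new _id.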