I uploaded the same JSON file containing 3 records multiple times, and all records were ingested successfully. I expected AEP to reject duplicated records based on the Identity field, but that didn’t happen.
My understanding was that marking a field as an Identity (and Primary Identity) would behave similarly to a primary key in a database, preventing duplicate values from being ingested.
Why doesn’t it work this way in AEP, and is there any way to enforce rejection of duplicated identity values at ingestion time?
Thank you, guys. I'll post what ChatGPT says; I fed it your answers as well. I think this could help other people.
Short answer:
You can’t. Adobe Experience Platform does not support rejecting records or files at ingestion time based on duplicated Identity values. Everything gets ingested into the AEP Data Lake.
In AEP, Identity fields are not equivalent to primary keys in a database.
During ingestion, all records are always accepted into the Data Lake, even if they contain duplicated Identity values.
Marking a field as an Identity or Primary Identity does not enforce uniqueness and does not trigger rejection.
Identity in AEP is used for identity resolution and profile unification, not for ingestion validation.
Data Lake
Stores all ingested records as-is.
Does not perform deduplication or uniqueness checks.
Identity Service / Identity Graph
Processes identity fields and builds relationships between identities.
Does not validate uniqueness or block duplicates.
Profile Service (Real-Time Customer Profile)
Uses the Identity Graph to unify data.
For record-based datasets, it performs an upsert:
New identity → creates a profile
Existing identity → updates the same profile
No duplicate profiles are created, but duplicate records still exist in the Data Lake.
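The upsert behavior described above can be sketched in a few lines. This is only an illustration of the create-or-update logic, not AEP's actual implementation; personID is an assumed identity field name.

```python
# Hypothetical sketch of how Profile Service upserts record data
# keyed on the primary identity (personID is an assumed field name).
profiles = {}  # identity value -> unified profile attributes

def upsert(record, identity_field="personID"):
    key = record[identity_field]
    if key in profiles:
        profiles[key].update(record)   # existing identity: update the profile
    else:
        profiles[key] = dict(record)   # new identity: create a profile

# Ingesting overlapping records for the same identity yields one profile.
upsert({"personID": "p-1", "firstName": "Ana"})
upsert({"personID": "p-1", "firstName": "Ana", "city": "Lisbon"})
assert len(profiles) == 1
assert profiles["p-1"]["city"] == "Lisbon"
```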
Because of this architecture, AEP never “rejects” duplicated identities—it simply normalizes them at the profile level.
For time-series (ExperienceEvent) datasets, the _id field can be used to prevent duplicate events in Profile Service.
If the same _id is ingested again, Profile Service may ignore it for profile computation.
However, the event is still stored in the Data Lake.
This behavior does not apply to identity fields or record-based datasets.
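One way to take advantage of the _id behavior above is to generate the _id deterministically from stable event attributes, so that replaying the same source event always produces the same _id. This is a sketch under assumed field names, not a fixed AEP contract.

```python
import uuid

# Derive a deterministic _id from stable event attributes so that
# re-sending the same event produces the same _id. The namespace URL
# and the attribute names are illustrative assumptions.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/events")

def event_id(person_id, event_type, timestamp):
    return str(uuid.uuid5(NAMESPACE, f"{person_id}|{event_type}|{timestamp}"))

a = event_id("p-1", "pageView", "2024-05-01T10:00:00Z")
b = event_id("p-1", "pageView", "2024-05-01T10:00:00Z")
assert a == b  # replaying the same event yields the same _id
```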
If rejecting duplicates is a strict requirement, it must be handled outside of AEP, for example:
Deduplicate data in your ETL or source system before ingestion
Generate and control unique identifiers upstream
Use _id strategically for ExperienceEvent datasets to avoid duplicate profile events
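The first option above, deduplicating in your ETL before ingestion, can be as simple as a first-wins pass over the identity field. A minimal sketch, assuming personID is the identity field:

```python
# Drop later records that repeat an already-seen identity value,
# before anything is sent to AEP (personID is an assumed field name).
def dedupe(records, key="personID"):
    seen = set()
    out = []
    for rec in records:
        if rec[key] not in seen:
            seen.add(rec[key])
            out.append(rec)
    return out

rows = [{"personID": "p-1"}, {"personID": "p-2"}, {"personID": "p-1"}]
assert len(dedupe(rows)) == 2
```

A first-wins policy keeps the earliest record per identity; if you want the latest instead, sort or reverse the input before deduplicating.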
AEP is not designed to enforce uniqueness at ingestion time.
Identity is used for profile unification, not data validation.
Duplicates are expected in the Data Lake and resolved logically by Profile Service.
This is the intended and documented behavior of Adobe Experience Platform.
Hi @OmarGo2
AEP doesn't work like that. When you upload that JSON file multiple times, all 3 records are saved as separate entries in the Data Lake each time. AEP treats these as a history of events or "snapshots" of data.
But deduplication can happen during ingestion based on the record's _id:
Every XDM record has a root-level _id field. If you send two records with the exact same _id in the same batch or streaming request, AEP will treat the second one as a duplicate and may deduplicate it during the ingestion process to save space.
AEP is built to ingest data as fast as possible and resolve the "truth" later during the Profile merge.
hi @Vinoth_Govind , I think you're referring to the _id field in a bare-bones schema.
In my case, I'm using personID as the identity and primary identity, so I guess _id is null or automatically populated by Adobe. In any case, as you mentioned, Adobe doesn't care if you ingest duplicated records. If a dataset has duplicated records, will Adobe use the most recent data when creating/updating a profile?
Did I get it right?
Yes, _id is different from person ID.
_id should be unique to each record or event - you have to generate it.
Refer to https://experienceleague.adobe.com/en/docs/experience-platform/xdm/classes/individual-profile
Search for the _id parameter.
When multiple records for the same Identity (Person ID) exist, AEP uses Merge Policies to decide what the "Final Version" of the Profile looks like.
The default merge policy is timestamp-ordered, so yes, the latest record will be the final one.
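The timestamp-ordered merge described above can be sketched as: order the records for one identity by ingestion time, then let later values overwrite earlier ones per attribute. This is an illustration of the idea, not AEP's actual merge implementation; the _ingested field name is an assumption.

```python
from datetime import datetime

# Sketch of a timestamp-ordered merge: when several records share the
# same identity, the most recently ingested value wins per attribute.
def merge_latest(records):
    ordered = sorted(records, key=lambda r: datetime.fromisoformat(r["_ingested"]))
    final = {}
    for rec in ordered:
        final.update({k: v for k, v in rec.items() if k != "_ingested"})
    return final

recs = [
    {"_ingested": "2024-05-02T00:00:00", "email": "new@example.com"},
    {"_ingested": "2024-05-01T00:00:00", "email": "old@example.com"},
]
assert merge_latest(recs)["email"] == "new@example.com"
```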
It's important to understand the differences between record-based datasets (like those for the Individual Profile Class or custom classes) and time-series datasets (like those for ExperienceEvent Class).
The "_id" field is only applicable for time-series datasets where it's a mandatory field. Profile Service will check the combination of the record's primary ID and "_id" value and if already ingested, will ignore those subsequent records. (This only happens in Profile Service, as the data lake's dataset will accept all records regardless.)
Record-based data is very different. It's always appended to the data lake's dataset, but in Profile Service it's inserted if it has a new primary identity and updated if the primary identity exists. Unlike the data lake, there's no history kept for record-based data within Profile Service.
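The time-series side of the contrast above, where Profile Service checks the combination of primary identity and _id and ignores repeats, can be sketched as a simple seen-set check. This is an illustration of the described behavior, not AEP internals:

```python
# Sketch: Profile Service ignores an event whose (primary identity, _id)
# pair was already ingested; a new pair is accepted.
seen_events = set()

def accept_event(person_id, event_id):
    key = (person_id, event_id)
    if key in seen_events:
        return False  # repeat of an already-ingested pair: ignored
    seen_events.add(key)
    return True

assert accept_event("p-1", "e-1") is True
assert accept_event("p-1", "e-1") is False  # same identity and _id: ignored
assert accept_event("p-1", "e-2") is True   # new _id for the same identity
```

Note that the Data Lake side has no such check: in this mental model every call would append to the dataset regardless of the return value.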