
SOLVED

How can I make AEP reject files with duplicated Identity values on ingestion?


Level 2

I uploaded the same JSON file containing 3 records multiple times, and all records were ingested successfully. I expected AEP to reject duplicated records based on the Identity field, but that didn’t happen.

My understanding was that marking a field as an Identity (and Primary Identity) would behave similarly to a primary key in a database, preventing duplicate values from being ingested.


Why doesn’t it work this way in AEP, and is there any way to enforce rejection of duplicated identity values at ingestion time?


1 Accepted Solution


Correct answer by
Level 2

Thank you, guys. I'll post what ChatGPT says; I fed your answers into it as well. I think this could help other people.

Short answer:

You can’t. Adobe Experience Platform does not support rejecting records or files at ingestion time based on duplicated Identity values. Everything gets ingested into the AEP Data Lake.


Why this does not work in AEP

In AEP, Identity fields are not equivalent to primary keys in a database.

  • During ingestion, all records are always accepted into the Data Lake, even if they contain duplicated Identity values.

  • Marking a field as an Identity or Primary Identity does not enforce uniqueness and does not trigger rejection.

Identity in AEP is used for identity resolution and profile unification, not for ingestion validation.


What actually happens under the hood

  1. Data Lake

    • Stores all ingested records as-is.

    • Does not perform deduplication or uniqueness checks.

  2. Identity Service / Identity Graph

    • Processes identity fields and builds relationships between identities.

    • Does not validate uniqueness or block duplicates.

  3. Profile Service (Real-Time Customer Profile)

    • Uses the Identity Graph to unify data.

    • For record-based datasets, it performs an upsert:

      • New identity → creates a profile

      • Existing identity → updates the same profile

    • No duplicate profiles are created, but duplicate records still exist in the Data Lake.

Because of this architecture, AEP never “rejects” duplicated identities—it simply normalizes them at the profile level.
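The upsert behavior described above can be sketched in a few lines of Python. This is an illustrative model, not an AEP API: the profile store is a plain dict, and the `personID` identity field name is an assumption taken from the question.

```python
# Illustrative sketch of Profile Service upsert semantics for
# record-based data. Not an AEP API: "personID" as the primary
# identity field is an assumption, and the store is a plain dict.

def apply_records_to_profiles(profiles, records, identity_field="personID"):
    """Upsert each record into a profile store keyed by identity.

    New identity      -> a profile is created.
    Existing identity -> the same profile is updated (last write wins).
    """
    for record in records:
        identity = record[identity_field]
        profile = profiles.setdefault(identity, {})
        profile.update(record)  # duplicate records collapse into one profile
    return profiles

profiles = {}
batch = [
    {"personID": "p1", "email": "a@example.com"},
    {"personID": "p1", "email": "a@example.com"},  # duplicate record
    {"personID": "p2", "email": "b@example.com"},
]
apply_records_to_profiles(profiles, batch)
# Three records ingested, but only two profiles result.
```

The point of the sketch: nothing is rejected on the way in; duplicates simply converge to the same profile entry.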


Important exception: ExperienceEvent datasets

For time-series (ExperienceEvent) datasets, the _id field can be used to prevent duplicate events in Profile Service.

  • If the same _id is ingested again, Profile Service may ignore it for profile computation.

  • However, the event is still stored in the Data Lake.

  • This behavior does not apply to identity fields or record-based datasets.
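The ExperienceEvent exception above can be modeled as a dedup key on (primary identity, `_id`). This is a hypothetical sketch of the behavior, not AEP code: the lists stand in for the Data Lake and Profile Service, and the field names are illustrative.

```python
# Hypothetical model of the ExperienceEvent exception: events with a
# repeated (primary identity, _id) pair are skipped for profile
# computation, but the raw store (our stand-in "Data Lake") keeps all.

data_lake = []           # stores everything, duplicates included
seen_event_keys = set()  # what "Profile Service" has already processed
profile_events = []      # events that count toward the profile

def ingest_event(event):
    data_lake.append(event)                # always stored, regardless
    key = (event["personID"], event["_id"])
    if key in seen_event_keys:
        return False                       # ignored for profile computation
    seen_event_keys.add(key)
    profile_events.append(event)
    return True

ingest_event({"personID": "p1", "_id": "evt-1", "type": "pageView"})
ingest_event({"personID": "p1", "_id": "evt-1", "type": "pageView"})  # dup
```

After both calls, the event store holds two copies while the profile side processed only one, mirroring the behavior described above.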


How to handle duplicates if rejection is required

If rejecting duplicates is a strict requirement, it must be handled outside of AEP, for example:

  • Deduplicate data in your ETL or source system before ingestion

  • Generate and control unique identifiers upstream

  • Use _id strategically for ExperienceEvent datasets to avoid duplicate profile events
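A minimal sketch of the first option, deduplicating upstream before ingestion. The `personID` field name is an assumption from this thread, and "last occurrence wins" is just one possible policy; adjust to your source system's rules.

```python
# Minimal upstream deduplication sketch, run before sending data to AEP.
# Assumes "personID" is the identity field (from this thread) and keeps
# the last occurrence of each identity value.

import json

def dedupe_records(records, identity_field="personID"):
    """Return records with duplicate identities removed (last one wins)."""
    by_identity = {}
    for record in records:
        by_identity[record[identity_field]] = record
    return list(by_identity.values())

raw = [
    {"personID": "p1", "name": "Ana"},
    {"personID": "p2", "name": "Ben"},
    {"personID": "p1", "name": "Ana Maria"},  # later duplicate wins
]
clean = dedupe_records(raw)
payload = json.dumps(clean)  # this is what you would actually upload
```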


Key takeaway

AEP is not designed to enforce uniqueness at ingestion time.
Identity is used for profile unification, not data validation.
Duplicates are expected in the Data Lake and resolved logically by Profile Service.

This is the intended and documented behavior of Adobe Experience Platform.


5 Replies


Level 5

Hi @OmarGo2 

 

AEP doesn't work like that. When you upload that JSON file multiple times, all 3 records are saved as separate entries in the Data Lake. AEP treats these as a history of events, or "snapshots", of data.

 

But during ingestion, deduplication can happen via the record ID:

Every XDM record has a root-level _id field. If you send two records with the exact same _id in the same batch or streaming request, AEP will treat the second one as a duplicate and may deduplicate it during the ingestion process to save space.
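Since `_id` has to be generated upstream, here is a sketch of two common strategies. Both are assumptions about how you might populate the field, not AEP tooling: a random UUID guarantees uniqueness, while a content hash makes an identical resent record produce the same `_id`, so it can be recognized as a duplicate.

```python
# Two illustrative ways to generate a root-level _id upstream.
# Neither is an AEP utility; both are standard-library sketches.

import hashlib
import json
import uuid

def with_random_id(record):
    """Attach a random, globally unique _id (every send looks new)."""
    return {**record, "_id": str(uuid.uuid4())}

def with_content_id(record):
    """Attach a deterministic _id derived from the record's content,
    so resending the exact same record yields the exact same _id."""
    digest = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    return {**record, "_id": digest}

a = with_content_id({"personID": "p1", "email": "a@example.com"})
b = with_content_id({"personID": "p1", "email": "a@example.com"})
# Identical content -> identical _id, so a resend is recognizable.
```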

 

AEP is built to ingest data as fast as possible and resolve the "truth" later during the Profile merge.

 


Level 2

Hi @Vinoth_Govind, I think you're referring to this _id field in a bare-bones schema:

In my case, I'm using personID as the identity and primary identity. So I guess _id is null or automatically populated by Adobe. In any case, as you mentioned, Adobe doesn't care if you ingest duplicated records. If a dataset has duplicated records, will Adobe use the most recent data when creating/updating a profile?

Did I get it right?

(screenshot attached)


Level 5

Yes, _id is different from person ID.

 

_id should be unique to each record or event - you have to generate it.

 

Refer to https://experienceleague.adobe.com/en/docs/experience-platform/xdm/classes/individual-profile and search for the _id parameter.

 

When multiple records for the same Identity (Person ID) exist, AEP uses Merge Policies to decide what the "Final Version" of the Profile looks like.

The default merge policy is timestamp-ordered, so yes, the latest record will be the final one.
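A rough sketch of what a timestamp-ordered merge policy does: when several records share the same identity, the most recent one wins. The `timestamp` field name here is illustrative, not an AEP schema field.

```python
# Rough model of a timestamp-ordered merge policy: for each identity,
# the record with the latest timestamp determines the final profile.
# "personID" and "timestamp" are illustrative field names.

def merge_latest(records, identity_field="personID"):
    latest = {}
    for record in sorted(records, key=lambda r: r["timestamp"]):
        latest[record[identity_field]] = record  # later timestamp overwrites
    return latest

records = [
    {"personID": "p1", "email": "old@example.com", "timestamp": 1},
    {"personID": "p1", "email": "new@example.com", "timestamp": 2},
]
final = merge_latest(records)
# The timestamp-2 record is the "final version" for p1.
```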


Employee

It's important to understand the differences between record-based datasets (like those for the Individual Profile Class or custom classes) and time-series datasets (like those for ExperienceEvent Class).

 

The "_id" field is only applicable for time-series datasets where it's a mandatory field. Profile Service will check the combination of the record's primary ID and "_id" value and if already ingested, will ignore those subsequent records. (This only happens in Profile Service, as the data lake's dataset will accept all records regardless.)

 

Record-based data is very different. It's always appended to the data lake's dataset, but in Profile Service it's inserted if it has a new primary identity and updated if the primary identity exists.  Unlike the data lake, there's no history kept for record-based data within Profile Service.
