SOLVED

Exam question: data deduplication and ingestion

I came across this question on an exam:

When should data be deduplicated: before ingestion, or after ingestion using Data Prep?

As a consultant I would answer "it depends," but unfortunately that is not an option during an exam.

What is your view on this?

1 Accepted Solution

Correct answer by
Community Advisor

Hi @Michael_Soprano 

As you mentioned, it depends on whether you want to prevent duplication of a specific custom event or of the overall records being ingested into AEP. To prevent duplication at the record level when ingesting data in XDM format, XDM schemas provide an _id field (which is also a required field) in the XDM ExperienceEvent and XDM Individual Profile classes; this field prevents event/record duplication in the dataset.
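
To illustrate the record-level idea, here is a minimal Python sketch of deduplicating on _id. It is not how AEP implements this internally; it assumes each record is a dict that always carries an _id key, and the dedupe_by_id function and sample events are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def dedupe_by_id(records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep the first record seen for each _id and drop later repeats.

    Assumption (hypothetical): every record is a dict with an "_id" key.
    """
    seen: set = set()
    unique: List[Dict[str, Any]] = []
    for record in records:
        record_id = record["_id"]
        if record_id in seen:
            continue  # duplicate: a record with this _id was already kept
        seen.add(record_id)
        unique.append(record)
    return unique


# Hypothetical sample: evt-001 is sent twice (e.g. a retried upload).
events = [
    {"_id": "evt-001", "timestamp": "2024-01-01T10:00:00Z"},
    {"_id": "evt-002", "timestamp": "2024-01-01T10:05:00Z"},
    {"_id": "evt-001", "timestamp": "2024-01-01T10:00:00Z"},
]

print(len(dedupe_by_id(events)))  # 2
```

Keeping the first occurrence is an arbitrary choice for the sketch; the key point is that uniqueness is decided by the record identifier, not by comparing every field.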

However, if you want to deduplicate a specific metric, as we do for Adobe Analytics custom events using event serialization, CJA offers similar functionality (metric deduplication) when you construct a data view. Because this is report-time processing, your changes are also applied retroactively.
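
For the metric-level case, the following Python sketch roughly approximates a "count once per session" style deduplication, optionally keyed on an extra ID field. The field names session_id, order_id and purchase and the count_once_per_session function are hypothetical; in CJA you configure this on the metric in the data view rather than writing code, and the exact options there may differ from this sketch.

```python
from typing import Any, Dict, Iterable, Optional


def count_once_per_session(
    events: Iterable[Dict[str, Any]],
    metric: str,
    dedup_field: Optional[str] = None,
) -> int:
    """Count `metric` at most once per session, or once per (session, dedup value).

    Assumptions (hypothetical): each event has a "session_id" key, and a truthy
    value under `metric` means the metric fired on that event.
    """
    seen = set()
    total = 0
    for event in events:
        if not event.get(metric):
            continue  # metric did not fire on this event
        key = (event["session_id"], event.get(dedup_field) if dedup_field else None)
        if key in seen:
            continue  # already counted for this session (and dedup value)
        seen.add(key)
        total += 1
    return total


# Hypothetical data: order A100 is recorded twice because the confirmation page was refreshed.
events = [
    {"session_id": "s1", "order_id": "A100", "purchase": 1},
    {"session_id": "s1", "order_id": "A100", "purchase": 1},
    {"session_id": "s1", "order_id": "A101", "purchase": 1},
    {"session_id": "s2", "order_id": "A200", "purchase": 1},
    {"session_id": "s3", "order_id": None, "purchase": 0},
]

print(sum(e["purchase"] for e in events))                     # 4 (raw count)
print(count_once_per_session(events, "purchase"))             # 2
print(count_once_per_session(events, "purchase", "order_id")) # 3
```

Because the counting happens when the report runs, the stored events are never modified. That is why changing a report-time setting like this also changes historical numbers.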

Data Prep helps you construct or reconstruct data on the fly as it is ingested from different sources; think of it as performing ETL operations within AEP. So I would say that if you are ingesting XDM data, deduplication comes before Data Prep, because the _id field prevents records with the same ID from being ingested into the platform.

I would like to hear your view on this as well. Also, please share which exam you faced this question on and which answer you selected.
