Level 2
May 21, 2025
Solved

Re-ingestion of Missing Data Without Causing Duplicates

Hi everyone,

Issue:
Due to certain issues, data from some files (corresponding to specific dates) is missing in AEP. We need to ingest this missing data, but we are concerned about potential data duplication in the dataset.

Some information on the dataflow setup:

  • Our source data files are stored in Azure Blob Storage (accessed through Azure Storage Explorer). These are incremental files, with a new file received daily. Each file is retained in Azure Blob Storage for 7 days before deletion.
  • In AEP, we are using the Data Landing Zone (DLZ) and connecting to the Azure source via API.
  • A dataflow has been created to handle incremental data loading from Azure to AEP DLZ via API, and it is currently running on a daily schedule.
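
Because each file is kept for only 7 days, a backfill can only read source files that are still inside that retention window. A minimal sketch of that check, assuming a file that arrived on day D is deleted once 7 full days have passed; the function name, the interpretation of the retention rule, and the planning date are illustrative assumptions, not part of the original setup:

```python
from datetime import date, timedelta

RETENTION_DAYS = 7  # retention period stated in the setup above


def files_still_available(file_dates, today, retention_days=RETENTION_DAYS):
    """Return the file dates whose files have not yet aged out of storage.

    Assumption: a file that arrived on day D is deleted once `retention_days`
    full days have passed, so it is readable while (today - D) < retention_days.
    """
    return [d for d in file_dates if (today - d).days < retention_days]


# Hypothetical example: the three missing days from the scenario below.
missing = [date(2025, 5, 2), date(2025, 5, 3), date(2025, 5, 4)]
check_day = date(2025, 5, 10)  # hypothetical day on which the backfill is planned
print(files_still_available(missing, check_day))
```

If the list comes back empty, the missing files would have to be re-delivered by the source system before any re-ingestion can run.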

Current Setup (Example):

  • On May 1st, we performed a one-time data load into the dataset via API. After ingestion, we disabled this dataflow.
  • On May 5th, we created and activated an incremental dataflow for the same dataset. This flow has been running daily and continues to function without issues.
  • However, data from May 2nd to May 4th is missing in AEP.

We’ve been advised to re-ingest data from May 2nd to the current date to ensure data consistency.
(Example: A customer’s phone number might have changed between May 2nd and today.)

If we re-ingest data from May 2nd onwards, will this overlap with already ingested data (from May 5th onwards) and cause duplicates in the dataset?
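
To make the overlap concrete, here is a small sketch that splits the proposed re-ingestion window into days that are genuinely missing and days that were already loaded by the incremental dataflow. It uses the example dates above and treats the post date (May 21) as "today", which is an assumption for illustration:

```python
from datetime import date, timedelta


def date_range(start, end):
    """Return every date from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


today = date(2025, 5, 21)  # assumption: the post date stands in for "today"

reingest_window = set(date_range(date(2025, 5, 2), today))   # proposed re-ingestion
already_ingested = set(date_range(date(2025, 5, 5), today))  # incremental dataflow

missing_only = sorted(reingest_window - already_ingested)  # days with no data yet
overlap = sorted(reingest_window & already_ingested)       # days loaded twice

print("missing only:", missing_only[0], "to", missing_only[-1])
print("overlap:     ", overlap[0], "to", overlap[-1], f"({len(overlap)} days)")
```

Under these assumptions, only May 2nd to May 4th is new; everything from May 5th onwards would be sent a second time, which is the part that raises the duplication question.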

We want to ensure the dataset remains accurate, up to date, and free of duplicates.

Any guidance on how to safely manage this re-ingestion process would be greatly appreciated.

Thanks,

4 replies

AEPuser16 (Author)
Level 2
May 21, 2025

This is only for profile data. There is no event data.

AEPuser16 (Author)
Level 2
May 21, 2025

We are using the default time-based merge policy. If I create a new dataflow for a one-time bulk load from the first missing date until today, which target dataset should I use?

  • Should I use the same existing dataset? (Will any duplicates be created?)
  • Or should I create a new dataset, enable it for Profile, and then disable the dataflow and dataset once the one-time load is done? (Will that data stay in sync with the existing dataset?)

Please let me know the approach that I should follow here. 

AnkitJasani29 (Accepted solution)
Level 6
May 22, 2025

Hi @aepuser16 ,

You can re-ingest into the same dataset, provided the data contains the timestamp field that the merge policy uses. Re-ingested rows will not create entirely new profile records; they update the existing ones, matched on the primary identity.
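
As a rough illustration of why overlapping rows do not multiply profiles, here is a simplified model of a timestamp-ordered upsert keyed on identity. It is a sketch only, not AEP's actual implementation; the field names `identity`, `phone`, and `last_updated` and the sample values are hypothetical:

```python
from datetime import datetime


def upsert_latest(store, records):
    """Upsert records into `store`, keyed on identity.

    A record replaces the stored one only if its timestamp is the same or
    newer, so replaying older or identical rows never adds a second entry.
    """
    for rec in records:
        current = store.get(rec["identity"])
        if current is None or rec["last_updated"] >= current["last_updated"]:
            store[rec["identity"]] = rec
    return store


# Already loaded by the daily incremental dataflow (hypothetical values).
daily = [
    {"identity": "cust-1", "phone": "555-0102", "last_updated": datetime(2025, 5, 6)},
]
# Backfill from May 2nd onwards: overlaps the row above and adds older ones.
backfill = [
    {"identity": "cust-1", "phone": "555-0101", "last_updated": datetime(2025, 5, 3)},
    {"identity": "cust-1", "phone": "555-0102", "last_updated": datetime(2025, 5, 6)},
    {"identity": "cust-2", "phone": "555-0200", "last_updated": datetime(2025, 5, 4)},
]

profiles = upsert_latest({}, daily)
profiles = upsert_latest(profiles, backfill)
print(len(profiles), profiles["cust-1"]["phone"])
```

In this model `cust-1` stays a single profile with its newest phone number, and `cust-2` appears once from the backfill. The outcome depends on each row carrying a reliable update timestamp; if ordering were based on load time instead, the older backfill rows could overwrite newer values.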

Alternatively, create a new temporary dataset and a dataflow that ingests the data from May 2nd to today into it. Once ingestion is complete, the merge policy will stitch this data with the existing profiles. After successful ingestion, you can disable the temporary (one-time) dataset.
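
The stitching across two datasets can be pictured as merging per-dataset fragments of the same profile, with the most recently updated fragment winning for each attribute. The following is a simplified model under that assumption, not AEP's actual merge logic; the fragment structure, attribute names, and values are hypothetical:

```python
from datetime import datetime


def merge_fragments(fragments):
    """Merge per-dataset fragments into one profile per identity.

    Fragments are applied oldest first, so for each attribute the value from
    the most recently updated fragment wins.
    """
    merged = {}
    for frag in sorted(fragments, key=lambda f: f["last_updated"]):
        merged.setdefault(frag["identity"], {}).update(frag["attributes"])
    return merged


# Fragment from the existing dataset (hypothetical values).
existing_dataset = [
    {"identity": "cust-1", "last_updated": datetime(2025, 5, 6),
     "attributes": {"phone": "555-0102"}},
]
# Fragment from the temporary backfill dataset (hypothetical values).
temporary_dataset = [
    {"identity": "cust-1", "last_updated": datetime(2025, 5, 3),
     "attributes": {"phone": "555-0101", "city": "Pune"}},
]

profiles = merge_fragments(existing_dataset + temporary_dataset)
print(profiles)
```

Here the merged `cust-1` profile keeps the newer phone number from the existing dataset and gains the `city` attribute that only the backfill supplied, while remaining a single profile.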

kautuk_sahni
Community Manager
June 26, 2025

@aepuser16 Just checking in — were you able to resolve your issue?
We’d love to hear how things worked out. If the suggestion above helped, marking a response as correct can guide others with similar questions. And if you found another solution, feel free to share it — your insights could really benefit the community. Thanks again for being part of the conversation!

Kautuk Sahni