SOLVED

Re-ingestion of Missing Data Without Causing Duplicates


Level 2

Hi everyone,

 

Issue:
Due to an earlier problem, data from some files (corresponding to specific dates) is missing in AEP. We need to re-ingest this missing data, but we are concerned about creating duplicates in the dataset.

Some information on the dataflow setup:

  • Our source data files are stored in Azure Blob Storage (browsed via Azure Storage Explorer). These are incremental files, with a new file received daily; each file is retained for 7 days before deletion.
  • In AEP, we are using the Data Landing Zone (DLZ) and connecting to the Azure source via API.
  • A dataflow has been created to handle incremental data loading from Azure to AEP DLZ via API, and it is currently running on a daily schedule.

Current Setup (Example):

  • On May 1st, we performed a one-time data load into the dataset via API. After ingestion, we disabled this dataflow.
  • On May 5th, we created and activated an incremental dataflow for the same dataset. This flow has been running daily and continues to function without issues.
  • However, data from May 2nd to May 4th is missing in AEP.

We’ve been advised to re-ingest data from May 2nd to the current date to ensure data consistency.
(Example: A customer’s phone number might have changed between May 2nd and today.)

If we re-ingest data from May 2nd onwards, will this overlap with already ingested data (from May 5th onwards) and cause duplicates in the dataset?

We want to ensure the dataset remains accurate, up to date, and free of duplicates.

Any guidance on how to safely manage this re-ingestion process would be greatly appreciated.

 

Thanks,



4 Replies


Level 2

This is only for profile data. There is no event data.


Level 2

We are using the default time-based merge policy. If I create a new dataflow for a one-time bulk load covering the missing dates up to today, what should the target dataset be?

  • Should I use the same existing dataset? (Will any duplicates be created?)
  • Or should I create a new dataset, enable it for Profile, and then disable both the dataflow and the dataset once the one-time load is complete? (Will that data stay in sync with the existing data?)

Please let me know which approach I should follow here.


Correct answer by
Level 7

Hi @AEPuser16 ,

You can re-ingest into the same dataset as long as the data contains the timestamp field used by the merge policy; this will not create entirely new profile records but will update the existing ones based on the primary identity.

Alternatively, you can create a new temporary dataset for re-ingestion and then create a dataflow to ingest the missing data from May 2nd to today into this dataset. Once the ingestion is complete, the merge policy will stitch this data with the existing profiles. After successful ingestion, you can disable the one-time dataflow and dataset.

 


Administrator

@AEPuser16 Just checking in — were you able to resolve your issue?
We’d love to hear how things worked out. If the suggestion above helped, marking a response as correct can guide others with similar questions. And if you found another solution, feel free to share it — your insights could really benefit the community. Thanks again for being part of the conversation!



Kautuk Sahni