SOLVED

Re-ingestion of Missing Data Without Causing Duplicates


Level 2

Hi everyone,

 

Issue:
Due to an earlier problem, data from some files (corresponding to specific dates) is missing in AEP. We need to re-ingest this missing data, but we are concerned about creating duplicates in the dataset.

Some information on the dataflow setup:

  • Our source data files are stored in Azure Blob Storage (browsed via Azure Storage Explorer). These are incremental files, with a new file received daily; each file is retained for 7 days before deletion.
  • In AEP, we are using the Data Landing Zone (DLZ) and connecting to the Azure source via API.
  • A dataflow has been created to handle incremental data loading from Azure to AEP DLZ via API, and it is currently running on a daily schedule.

Current Setup (Example):

  • On May 1st, we performed a one-time data load into the dataset via API. After ingestion, we disabled this dataflow.
  • On May 5th, we created and activated an incremental dataflow for the same dataset. This flow has been running daily and continues to function without issues.
  • However, data from May 2nd to May 4th is missing in AEP.

We’ve been advised to re-ingest data from May 2nd to the current date to ensure data consistency.
(Example: A customer’s phone number might have changed between May 2nd and today.)

If we re-ingest data from May 2nd onwards, will this overlap with already ingested data (from May 5th onwards) and cause duplicates in the dataset?

We want to ensure the dataset remains accurate, up to date, and free of duplicates.

Any guidance on how to safely manage this re-ingestion process would be greatly appreciated.

 

Thanks,



4 Replies


Level 2

This is only for profile data. There is no event data.


Level 2

We are using the default time-based merge policy. If I create a new dataflow for a one-time bulk load covering the missing dates up to today, what should the target dataset be?

  • Should I use the same existing dataset? (Will any duplicates be created?)
  • Or should I create a new dataset, enable it for Profile, and then disable both the dataflow and the dataset once the one-time load is complete? (Will that data stay in sync with the existing data?)

Please let me know which approach I should follow here.


Correct answer by
Level 7

Hi @AEPuser16 ,

You can re-ingest into the same dataset as long as the data contains the timestamp field used by the merge policy; this will not create entirely new profile records but will update the existing ones based on the primary identity.

Alternatively, you can create a new temporary dataset for re-ingestion and then create a dataflow to ingest the missing data from May 2nd to today into this dataset. Once the ingestion is complete, the merge policy will stitch this data with the existing profiles. After successful ingestion, you can disable the one-time dataflow and dataset.

 


Administrator

@AEPuser16 Just checking in — were you able to resolve your issue?
We’d love to hear how things worked out. If the suggestion above helped, marking a response as correct can guide others with similar questions. And if you found another solution, feel free to share it — your insights could really benefit the community. Thanks again for being part of the conversation!



Kautuk Sahni