Hi everyone,
Issue:
Due to certain issues, data from some files (corresponding to specific dates) is missing in AEP. We need to ingest this missing data, but we are concerned about potential data duplication in the dataset.
Some information on the dataflow setup:
Current Setup (Example):
We’ve been advised to re-ingest data from May 2nd to the current date to ensure data consistency.
(Example: A customer’s phone number might have changed between May 2nd and today.)
If we re-ingest data from May 2nd onwards, will this overlap with already ingested data (from May 5th onwards) and cause duplicates in the dataset?
We want to ensure the dataset remains accurate, up to date, and free of duplicates.
Any guidance on how to safely manage this re-ingestion process would be greatly appreciated.
Thanks,
Hi @AEPuser16 ,
You can re-ingest into the same dataset, provided the data contains the timestamp field used by the merge policy. The re-ingested rows will not create entirely new records; they will update the existing ones, matched on the primary identity.
Alternatively, you can create a new temporary dataset for re-ingestion and then create a dataflow to ingest the missing data from May 2nd to today into this dataset. Once the ingestion is complete, the merge policy will stitch this data with the existing profiles. After successful ingestion, you can disable the one-time dataset.
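To make the "update rather than duplicate" behaviour concrete, here is a minimal conceptual sketch in plain Python. It is not AEP code and calls no AEP API: the function name, the field names (`identity`, `phone`, `timestamp`), the sample values, and the year 2024 are all assumptions made up for illustration. It also simplifies the merge to whole rows, keeping only the row with the newest timestamp for each identity.

```python
from datetime import datetime


def merge_by_latest_timestamp(rows):
    """Collapse rows to one per identity, keeping the row with the newest timestamp.

    Hypothetical helper: a simplified stand-in for a time-based merge,
    not how AEP implements it internally.
    """
    merged = {}
    for row in rows:
        current = merged.get(row["identity"])
        # Keep the incoming row if we have nothing yet, or if it is at least as recent.
        if current is None or row["timestamp"] >= current["timestamp"]:
            merged[row["identity"]] = row
    return merged


# Rows that were ingested before (from May 5th onwards). Sample data only.
already_ingested = [
    {"identity": "cust-1", "phone": "555-0100", "timestamp": datetime(2024, 5, 5)},
    {"identity": "cust-2", "phone": "555-0200", "timestamp": datetime(2024, 5, 6)},
]

# Re-ingestion from May 2nd onwards: overlaps with the rows above.
reingested = [
    {"identity": "cust-1", "phone": "555-0100", "timestamp": datetime(2024, 5, 5)},  # exact overlap
    {"identity": "cust-1", "phone": "555-0199", "timestamp": datetime(2024, 5, 9)},  # phone number changed
    {"identity": "cust-3", "phone": "555-0300", "timestamp": datetime(2024, 5, 3)},  # previously missing
]

profiles = merge_by_latest_timestamp(already_ingested + reingested)

for identity, row in sorted(profiles.items()):
    print(identity, row["phone"], row["timestamp"].date())
```

In this sketch the overlapping row for `cust-1` does not produce a second profile; the identity simply ends up with the most recent phone number, and the previously missing `cust-3` is added.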
This is only for profile data. There is no event data.
We are using the default time-based merge policy. If I create a new dataflow for a one-time bulk load from the first missing date until today, what should the target dataset be?
Please let me know the approach that I should follow here.