I am using an event-type dataset for a Customer AI instance. The data is ingested daily, so the dataset contains duplicate rows of data carrying the latest timestamp. Will these duplicate records affect my model predictions?
(Judging from the influential factors, I do think the results might be influenced by the duplicate records.)
@Travis_Jordan @Parvesh_Parmar @DavidRoss91 @vishnuunnikrishnan Could you please look at this question and share your thoughts?
@_Manoj_Kumar_ @nnakirikanti @dhanesh04s @an1989 @renatoz28 @brekrut @ccg1706 @somen-sarkar @saswataghosh @NickMannion Kindly take a moment to review this question and share your valuable insights. Your expertise would be greatly appreciated!
Hi @_Manoj_Kumar_ , thanks for your reply.
I also confirmed with Adobe support: they said duplicate data will indeed affect model predictions. There is currently no de-dupe logic, as use cases might vary across businesses.
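Since there is no built-in de-dupe logic, one option is to remove duplicates upstream, before the data is ingested. Below is a minimal sketch of that idea in plain Python, not an Adobe API. It assumes each event has a stable identity (here the hypothetical fields `customer_id`, `event_type`, and `event_time`) that is separate from the field the source rewrites on every load (here the hypothetical `ingested_at`). If the rewritten field is the only timestamp you have, you would need some other stable key, such as an event ID.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical field names; replace them with the fields in your own schema.
KEY_FIELDS = ("customer_id", "event_type", "event_time")
INGEST_FIELD = "ingested_at"  # assumed to be the field rewritten on every load


def dedupe_events(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep one row per business key, preferring the most recent ingest."""
    latest: Dict[Tuple[str, ...], Dict[str, str]] = {}
    for row in rows:
        key = tuple(row[field] for field in KEY_FIELDS)
        kept = latest.get(key)
        # ISO-8601 dates in the same format compare correctly as plain strings.
        if kept is None or row[INGEST_FIELD] > kept[INGEST_FIELD]:
            latest[key] = row
    return list(latest.values())


# Made-up sample rows: the first two are the same event loaded on two days.
events = [
    {"customer_id": "c1", "event_type": "purchase",
     "event_time": "2024-01-05T10:00:00Z", "ingested_at": "2024-01-05"},
    {"customer_id": "c1", "event_type": "purchase",
     "event_time": "2024-01-05T10:00:00Z", "ingested_at": "2024-01-06"},
    {"customer_id": "c2", "event_type": "visit",
     "event_time": "2024-01-06T08:30:00Z", "ingested_at": "2024-01-06"},
]

unique_events = dedupe_events(events)
print(len(events), "rows in,", len(unique_events), "rows out")
```

Which copy to keep (latest or earliest) and which fields make up the key depend on the use case, which is probably why this is left to each business.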
Thanks
Can I ask why the source is producing duplicate rows of data with the latest timestamp?
If the data is not changing why is the data being re-ingested?
Hi @brekrut , thanks for your reply.
The source is set to update its incremental date field to the latest date every day, as per the client's requirement. The client did not want to lose old data or users who have been inactive for a few months (as a TTL is applied).
New data is generated every day, and the historical data plus the new data is ingested every day.
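To give a rough sense of how quickly this adds up, here is a small back-of-the-envelope sketch. It assumes (this is my assumption, not something Adobe confirmed) that every daily load is appended as new rows because the rewritten date makes each copy look like a new event, and it uses made-up volumes.

```python
def rows_after(days: int, new_events_per_day: int) -> int:
    """Total stored rows if each daily load re-sends all history plus that day's events."""
    total = 0
    for day in range(1, days + 1):
        # The load on `day` contains every event generated so far.
        total += day * new_events_per_day
    return total


days, per_day = 30, 100          # illustrative numbers only
distinct = days * per_day        # events that actually happened
stored = rows_after(days, per_day)
print(f"distinct events: {distinct}, stored rows: {stored}, "
      f"inflation: {stored / distinct:.1f}x")
```

Under these assumptions older events are repeated far more often than recent ones, so any count-based or recency-based signal would be skewed rather than just scaled.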
Thanks