
Impact of ingesting 10M records to a profile enabled dataset multiple times a week


Level 1

We have a use case to schedule a dataflow ingesting ~10M records into a profile-enabled dataset 2-3 times a week. Since it's a large volume, what would be the impact of ingesting this on our sandbox? To clarify, we're not introducing new profile attributes each time; this is simply an increase in the frequency of dataset refreshes. Would this change in frequency affect our storage capacity, export processes, etc.?


2 Replies


Level 3

From your question, I am assuming you are ingesting into an XDM Individual Profile class dataset.

 

From the Real-Time CDP perspective, there may not be much impact; the Profile store will refer to the latest record ingested for each profile.

 

From the data lake perspective, yes, there will be an impact: a huge number of records will accumulate in the dataset, and license metrics such as total data volume will be affected.

 

In general, I do not see any major challenge, but avoid this situation if possible.

 

If only a few attributes within a profile are changing, and that is what leads you to ingest the full set of profile attributes frequently, then you can consider isolating only those attributes into another dataset and ingesting just those attributes. This avoids ingesting all of the attributes frequently; see the sketch below.
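
For illustration, here is a minimal Query Service / Data Distiller sketch of that approach. The dataset and column names (customer_full_feed, customer_loyalty_deltas, customer_id, loyalty_points, loyalty_tier) are hypothetical placeholders; the idea is simply to carry the primary identity plus the attributes that actually change into a slim, profile-enabled dataset.

-- Minimal sketch only; dataset and column names are placeholders.
-- Create a slim derived dataset that holds just the identity and the
-- frequently changing attributes, instead of reloading the full profile.
CREATE TABLE customer_loyalty_deltas AS
SELECT
    customer_id,     -- primary identity used for profile stitching
    loyalty_points,  -- attribute that changes frequently
    loyalty_tier     -- attribute that changes frequently
FROM customer_full_feed;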

 


Level 2

Hi

 

From my perspective this won't have much of an impact from a profile graph / store perspective, as it will just pick up the new records, but you need to check the data allocation for the instance. I assume that's a refresh of the profile data? If so, why not use Data Distiller to process the data and only insert the deltas, or potentially remove the previous batch and reload the data (if that is the case); that can be managed via the APIs, and that way the data stays relevant. The other downside of reloading the data 3 times a week is that the query tool becomes a nightmare if you are using it, as you will need to be continually partitioning and ranking the data in your queries to get the latest record per profile; see the sketch below.
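
To illustrate the partitioning and ranking mentioned above, here is a minimal sketch of a Query Service query that keeps only the latest record per profile. The dataset and column names (customer_full_feed, customer_id, loyalty_points, update_ts) are hypothetical placeholders.

-- Minimal sketch only; dataset and column names are placeholders.
-- Deduplicate the repeatedly ingested data down to the newest record
-- per identity using a window function.
SELECT customer_id,
       loyalty_points,
       update_ts
FROM (
    SELECT customer_id,
           loyalty_points,
           update_ts,
           ROW_NUMBER() OVER (
               PARTITION BY customer_id   -- one partition per profile
               ORDER BY update_ts DESC    -- newest record first
           ) AS rn
    FROM customer_full_feed
) ranked
WHERE rn = 1;  -- keep only the latest record for each profile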

 

Licensing most often looks at both the number of profiles in the graph and the number of records, so ~30 million records a week is going to lead you into problems at some point.