
Impact of ingesting 10M records to a profile enabled dataset multiple times a week


Level 1

We have a use case to schedule a dataflow ingesting ~10M records into a profile-enabled dataset 2-3 times a week. Since it's a large volume, what would be the impact of ingesting this on our sandbox? To clarify, we're not introducing new profile attributes each time; this is simply an increase in the frequency of dataset refreshes. Would this change in frequency affect our storage capacity, export processes, etc.?


2 Replies


Level 3

From your question, I am assuming you are ingesting into an XDM Individual Profile class dataset.

 

From the Real-Time CDP perspective, there may not be much impact; the Profile store will refer to the latest record ingested for each profile.

 

From the data lake perspective, yes, there will be an impact: a huge number of records will accumulate in the dataset, and license metrics such as total data volume will be affected.

 

In general, I do not see any major challenge, but avoid this situation if possible.

 

If only a few attributes within a profile are changing, and that is what leads you to ingest the full set of profile attributes frequently, then you can consider isolating only those attributes into another dataset and ingesting just those attributes. This avoids ingesting all of the attributes frequently; see the sketch below.
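
For illustration, here is a minimal Query Service / Data Distiller sketch of that approach. The dataset and column names (customer_full_feed, customer_loyalty_deltas, customer_id, loyalty_points, loyalty_tier) are hypothetical placeholders; the idea is simply to carry the primary identity plus the attributes that actually change into a slim, profile-enabled dataset.

-- Minimal sketch only; dataset and column names are placeholders.
-- Create a slim derived dataset that holds just the identity and the
-- frequently changing attributes, instead of reloading the full profile.
CREATE TABLE customer_loyalty_deltas AS
SELECT
    customer_id,     -- primary identity used for profile stitching
    loyalty_points,  -- attribute that changes frequently
    loyalty_tier     -- attribute that changes frequently
FROM customer_full_feed;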

 


Level 2

Hi

 

From my perspective this won't have much of an impact from a profile graph / store perspective, as it will just pick up the new records, but you need to check the data allocation for the instance. I assume that's a refresh of the profile data? If so, why not use Data Distiller to process the data and only insert the deltas, or potentially remove the previous batch and reload the data (if that is the case); that can be managed via the APIs, and that way the data stays relevant. The other downside of reloading the data 3 times a week is that the query tool becomes a nightmare if you are using it, as you will need to be continually partitioning and ranking the data in your queries to get the latest record per profile; see the sketch below.
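
To illustrate the partitioning and ranking mentioned above, here is a minimal sketch of a Query Service query that keeps only the latest record per profile. The dataset and column names (customer_full_feed, customer_id, loyalty_points, update_ts) are hypothetical placeholders.

-- Minimal sketch only; dataset and column names are placeholders.
-- Deduplicate the repeatedly ingested data down to the newest record
-- per identity using a window function.
SELECT customer_id,
       loyalty_points,
       update_ts
FROM (
    SELECT customer_id,
           loyalty_points,
           update_ts,
           ROW_NUMBER() OVER (
               PARTITION BY customer_id   -- one partition per profile
               ORDER BY update_ts DESC    -- newest record first
           ) AS rn
    FROM customer_full_feed
) ranked
WHERE rn = 1;  -- keep only the latest record for each profile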

 

Licensing most often looks at both the number of profiles in the graph and the number of records, so ~30 million records a week is going to lead you into problems at some point.