Update activity of 24M records taking 3 days | Community
Level 6
September 1, 2024
Solved

Update activity of 24M records taking 3 days

  • September 1, 2024
  • 3 replies
  • 1367 views

One of our workflows started on 08/28, and we estimate it is going to finish on 09/01, four days after it started.

It loaded 24M records and it has to update all of them. 

The time is spent on the update activity that has 200,000 as batch size. It targets the recipients table.

Is this normal? Is there anything else we can do to improve it?



3 replies

MeitMedia
Level 4
September 1, 2024

Hi @god_prophet ,

 

The workflow duration you mentioned, running from 08/28 and expected to finish on 09/01, indicates a significant processing time, especially when dealing with 24 million records and updating the recipients table. Here are some suggestions for improvement:

1. Batch Size Considerations: The batch size of 200,000 is relatively large. While it reduces the number of batch executions, it also increases the load on the database, potentially leading to locking issues or higher resource consumption.
Recommendation: Try reducing the batch size to something like 50,000 or 100,000. This can lead to more frequent but faster executions, as smaller batches typically lock fewer records and reduce contention.
2. Database Indexing: Ensure that the fields being updated in the recipients table are indexed. Without proper indexing, the database might perform full table scans, significantly slowing down the update process.
Recommendation: Review the fields targeted by the update activity and confirm that appropriate indexes are in place. Consider adding indexes temporarily if needed.
3. Workflow Optimization:
Parallelization: If your server has sufficient resources, consider splitting the workload across multiple workflows running in parallel, each handling a subset of the data. This can drastically reduce processing time.
Filtered Updates: If possible, apply filters to minimize the number of records being updated. For example, update only the records that have actually changed rather than all 24 million.
4. Database Maintenance: Large-scale updates can cause database fragmentation or impact performance over time. Ensure regular maintenance tasks like reindexing and optimizing are scheduled to maintain optimal database performance.
5. Monitoring and Logs: Monitor the database and workflow logs to identify any specific bottlenecks or issues during the update process. This can help pinpoint whether the delay is due to database performance, workflow logic, or other factors.
6. System Resources: Ensure that the system resources (CPU, memory, I/O) are not being maxed out. If they are, consider scaling up the resources temporarily to handle the heavy processing load.
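The parallelization idea in point 3 can be sketched in plain JavaScript (not the Adobe Campaign API; this only illustrates the splitting logic) as a modulo split on a numeric primary key, so each workflow gets a disjoint slice:

```javascript
// Sketch (plain JavaScript, not the Adobe Campaign API): split a set of
// numeric primary keys across N parallel workflows by modulo, so each
// workflow updates a disjoint slice of the records.
function sliceForWorker(ids, workerIndex, workerCount) {
  return ids.filter(id => id % workerCount === workerIndex);
}

// Example: ten ids split across three workers; every id lands in exactly one slice.
const ids = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
const slices = [0, 1, 2].map(w => sliceForWorker(ids, w, 3));
```

In Campaign this would typically be done with a query condition like `Mod([id], 3) = 0` per workflow, but the disjointness guarantee is the same: each record belongs to exactly one slice.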

While updating 24 million records is intensive and can take time, adjusting the batch size, ensuring proper indexing, and possibly parallelizing the workload should help improve performance. Keep an eye on database and system resources to ensure they are not becoming bottlenecks.

 

Best regards,

MEIT MEDIA (https://www.meitmedia.com)


Level 6
September 2, 2024

After we talked to Adobe Support, they sent me a list of other workflows that needed to be stopped.

We went from a batch of 2K every 4 min, to a batch of 600K, and lastly 800K. So apparently, stopping other workflows can increase the batch size processed, even making it higher than what was originally set. Or maybe their infra team temporarily increased the "processing power" for this workflow?

09/01/2024 11:09:54 PM [22,802,606/24,882,997] record(s) processed 
09/01/2024 11:02:59 PM [22,002,606/24,882,997] record(s) processed 
09/01/2024 10:56:58 PM [21,402,606/24,882,997] record(s) processed 
09/01/2024 10:49:27 PM [20,996,846/24,882,997] record(s) processed
09/01/2024 10:44:27 PM [20,988,736/24,882,997] record(s) processed 
09/01/2024 10:39:27 PM [20,980,830/24,882,997] record(s) processed 
09/01/2024 10:34:27 PM [20,972,518/24,882,997] record(s) processed 
09/01/2024 10:29:27 PM [20,964,677/24,882,997] record(s) processed 
09/01/2024 10:24:27 PM [20,961,927/24,882,997] record(s) processed 
09/01/2024 10:19:27 PM [20,959,211/24,882,997] record(s) processed 
09/01/2024 10:14:27 PM [20,956,533/24,882,997] record(s) processed
09/01/2024 10:09:27 PM [20,953,907/24,882,997] record(s) processed 
09/01/2024 10:04:27 PM [20,951,699/24,882,997] record(s) processed 
09/01/2024 9:59:27 PM [20,949,897/24,882,997] record(s) processed 
09/01/2024 9:54:27 PM [20,948,313/24,882,997] record(s) processed 
09/01/2024 9:49:27 PM [20,947,030/24,882,997] record(s) processed 
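As a rough sanity check on those numbers, the remaining run time can be estimated from any two progress samples in the log above (a plain JavaScript sketch; the counts and timestamps are taken from the log):

```javascript
// Sketch: estimate remaining run time from two progress-log samples.
// countA/countB are records processed at times timeA/timeB (ms epochs).
function estimateRemainingMs(countA, timeA, countB, timeB, total) {
  const recordsPerMs = (countB - countA) / (timeB - timeA);
  return (total - countB) / recordsPerMs;
}

// Samples from the log: 22,002,606 records at 11:02:59 PM and 22,802,606 at
// 11:09:54 PM (800K records in 415 seconds), out of 24,882,997 total.
const t1 = Date.parse('2024-09-01T23:02:59');
const t2 = Date.parse('2024-09-01T23:09:54');
const remaining = estimateRemainingMs(22002606, t1, 22802606, t2, 24882997);
// Roughly 18 minutes left at that rate, versus days at the earlier rate.
```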

Manoj_Kumar
Community Advisor
September 2, 2024

Hello @god_prophet Yes, this makes sense. They just asked you to free up some resources from other workflows so that this workflow can use those resources.

But I wouldn't recommend doing this again and again. There could be scenarios where the other workflows are important as well, and it won't always be possible to free up resources for this workflow.

Manoj  | https://themartech.pro
Manoj_Kumar
Community Advisor
Accepted solution
September 1, 2024

Hello @god_prophet
Here are the things I would look at to optimize the workflow:

  • Check which fields are actually updated, and remove the fields whose data is not changing
  • Can this be changed to an incremental load instead of a full load?
  • I would change the batch size to something lower
  • Do you have data manipulation happening with expressions in the Update data activity?
  • Are there any conditions in the "Taken into account" section of the Update data activity?
  • Use primary/foreign keys as the reconciliation key for updating the data.
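The incremental-load point can be sketched in plain JavaScript (not the Campaign API; the field names are hypothetical): instead of reloading the whole table, keep only the rows modified since the previous run.

```javascript
// Sketch: incremental load. Keep only records whose lastModified is later
// than the previous run, instead of reloading the whole table.
// ISO date strings compare correctly with plain string comparison.
function incrementalSlice(records, lastRunTime) {
  return records.filter(r => r.lastModified > lastRunTime);
}

// Example: only rows 2 and 3 changed since the 08/28 run.
const rows = [
  { id: 1, lastModified: '2024-08-27' },
  { id: 2, lastModified: '2024-08-30' },
  { id: 3, lastModified: '2024-08-31' },
];
const changed = incrementalSlice(rows, '2024-08-28');
```

If most of the 24M records are unchanged between runs, this kind of filter shrinks the update set, and therefore the runtime, by the same proportion.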

 

Manoj  | https://themartech.pro
Level 6
September 2, 2024

Hi Manoj, about: 

  • Can this be changed to an incremental load instead of a full load?

What would be the best approach to do incremental loading? Using a split activity? Filtering by an incremental PK column?

Ty

Manoj_Kumar
Community Advisor
September 2, 2024

Hello @god_prophet

 

Look at the recently created/modified data in the source system for incremental load.
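A common pattern for this is a high-watermark on an incrementing PK or the modification date (a plain JavaScript sketch with hypothetical field names, not the Campaign API): process only records above the last watermark, then advance it.

```javascript
// Sketch: high-watermark incremental load. Process only ids above the last
// watermark, then advance the watermark to the highest id seen so the next
// run picks up where this one stopped.
function nextIncrement(records, watermark) {
  const fresh = records.filter(r => r.id > watermark);
  const next = fresh.reduce((m, r) => Math.max(m, r.id), watermark);
  return { fresh, next };
}

// Example: with a stored watermark of 2, only ids 5 and 3 are processed,
// and the watermark advances to 5 for the next run.
const run = nextIncrement([{ id: 1 }, { id: 5 }, { id: 3 }], 2);
```

In a workflow the watermark would be persisted (for example in a workflow variable or an option) between executions, so each run only touches what changed since the previous one.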

Manoj  | https://themartech.pro
harikrishnadevanabowina
Level 4
September 13, 2024

Hi @god_prophet, wondering how you solved / tackled this issue.

Were you able to use any specific process, and would you like to share it here? I am curious to know the outcome of this discussion.

 

BR, Hari Krishna D