
SOLVED

Update activity of 24M records taking 3 days


Level 6

One of our workflows started on 08/28, and we estimate it is going to finish on 01/09, 4 days after it started.

It loaded 24M records and it has to update all of them. 

The time is spent in the Update data activity, which has a batch size of 200,000 and targets the recipients table.

Is this normal? Is there anything else we can do to improve it?




9 Replies


Level 5

Hi @god_prophet ,

 

The workflow duration you mentioned, running from 08/28 and expected to finish on 01/09, indicates significant processing time, especially when dealing with 24 million records and updating the recipients table. Here are some suggestions for improvement:

1. Batch Size Considerations: The batch size of 200,000 is relatively large. While it reduces the number of batch executions, it also increases the load on the database, potentially leading to locking issues or higher resource consumption.
Recommendation: Try reducing the batch size to something like 50,000 or 100,000. This can lead to more frequent but faster executions, as smaller batches typically lock fewer records and reduce contention (see the sketch after this list).
2. Database Indexing: Ensure that the fields being updated in the recipients table are indexed. Without proper indexing, the database might perform full table scans, significantly slowing down the update process.
Recommendation: Review the fields targeted by the update activity and confirm that appropriate indexes are in place. Consider adding indexes temporarily if needed.
3. Workflow Optimization:
Parallelization: If your server has sufficient resources, consider splitting the workload across multiple workflows running in parallel, each handling a subset of the data. This can drastically reduce processing time.
Filtered Updates: If possible, apply filters to minimize the number of records being updated. For example, update only the records that have actually changed rather than all 24 million.
4. Database Maintenance: Large-scale updates can cause database fragmentation or impact performance over time. Ensure regular maintenance tasks like reindexing and optimizing are scheduled to maintain optimal database performance.
5. Monitoring and Logs: Monitor the database and workflow logs to identify any specific bottlenecks or issues during the update process. This can help pinpoint whether the delay is due to database performance, workflow logic, or other factors.
6. System Resources: Ensure that the system resources (CPU, memory, I/O) are not being maxed out. If they are, consider scaling up the resources temporarily to handle the heavy processing load.
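To make the batch-size point concrete, here is a minimal sketch of chunked updating at the database level. It assumes a PostgreSQL database underneath Campaign, illustrative table/column names (nmsrecipient, irecipientid, semail) and a hypothetical connection string; the Update data activity does this chunking for you, the sketch only shows why smaller chunks commit and release locks sooner.

```python
# Hedged sketch: update a large recipients table in smaller chunks so each
# transaction commits quickly. All names and the DSN are illustrative.
import psycopg2

BATCH_SIZE = 50_000  # smaller chunks hold locks for less time than 200,000

conn = psycopg2.connect("dbname=campaign user=neolane")  # hypothetical DSN

with conn, conn.cursor() as cur:
    # Find the primary-key range of the recipients table (illustrative names).
    cur.execute("SELECT min(irecipientid), max(irecipientid) FROM nmsrecipient")
    lo, hi = cur.fetchone()

start = lo
while start is not None and start <= hi:
    end = start + BATCH_SIZE
    with conn, conn.cursor() as cur:
        # One transaction per chunk: locks are released quickly, and progress
        # is kept even if the job is interrupted midway.
        cur.execute(
            """
            UPDATE nmsrecipient
               SET semail = lower(semail)      -- stand-in for the real field updates
             WHERE irecipientid >= %s AND irecipientid < %s
            """,
            (start, end),
        )
    start = end

conn.close()
```

Each chunk is its own transaction, so a 50K chunk blocks other writers for far less time than a 200K one, and an interrupted run does not lose the chunks already committed.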

While updating 24 million records is intensive and can take time, adjusting the batch size, ensuring proper indexing, and possibly parallelizing the workload should help improve performance. Keep an eye on database and system resources to ensure they are not becoming bottlenecks.

 

Best regards,

MEIT MEDIA (https://www.meitmedia.com)

Find us on LinkedIn

Contact Us: infomeitmedia@gmail.com


 

 


Level 6

After talking to Adobe Support, they sent me a list of other workflows that needed to be stopped.

We went from a batch of 2K every 4 minutes to a batch of 600K, and lastly 800K. So apparently, stopping other workflows can increase the batch size processed, even making it higher than what was originally set. Or maybe their infra team temporarily increased the "processing power" for this workflow?

09/01/2024 11:09:54 PM [22,802,606/24,882,997] record(s) processed 
09/01/2024 11:02:59 PM [22,002,606/24,882,997] record(s) processed 
09/01/2024 10:56:58 PM [21,402,606/24,882,997] record(s) processed 
09/01/2024 10:49:27 PM [20,996,846/24,882,997] record(s) processed
09/01/2024 10:44:27 PM [20,988,736/24,882,997] record(s) processed 
09/01/2024 10:39:27 PM [20,980,830/24,882,997] record(s) processed 
09/01/2024 10:34:27 PM [20,972,518/24,882,997] record(s) processed 
09/01/2024 10:29:27 PM [20,964,677/24,882,997] record(s) processed 
09/01/2024 10:24:27 PM [20,961,927/24,882,997] record(s) processed 
09/01/2024 10:19:27 PM [20,959,211/24,882,997] record(s) processed 
09/01/2024 10:14:27 PM [20,956,533/24,882,997] record(s) processed
09/01/2024 10:09:27 PM [20,953,907/24,882,997] record(s) processed 
09/01/2024 10:04:27 PM [20,951,699/24,882,997] record(s) processed 
09/01/2024 9:59:27 PM [20,949,897/24,882,997] record(s) processed 
09/01/2024 9:54:27 PM [20,948,313/24,882,997] record(s) processed 
09/01/2024 9:49:27 PM [20,947,030/24,882,997] record(s) processed 


Community Advisor

Hello @god_prophet Yes, this makes sense. They just asked you to free up some resources from other workflows so that this workflow can use those resources.

But I wouldn't recommend doing this again and again. There could be scenarios where the other workflows are important as well, and it won't always be possible to free up resources for this workflow.


     Manoj
     Find me on LinkedIn

@god_prophet Also, one thought would be to run them outside of business hours, when resources are a bit more free.


Correct answer by
Community Advisor

Hello @god_prophet
Here are the things I would look at to optimize the WF:

  • Check which fields are actually being updated, and remove the fields whose data is not changing
  • Can this be changed to an incremental load instead of a full load?
  • I would change the batch size to something lower
  • Do you have data manipulation happening with expressions in the Update data activity?
  • Are there any conditions in the "Taken into account" section of the Update data activity?
  • Use primary/foreign keys as the reconciliation key for updating the data (see the sketch after this list).
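To illustrate the last bullet, here is a minimal sketch of reconciling on the primary key and skipping rows whose data has not changed. It assumes a PostgreSQL database and illustrative names (a stg_recipient staging table, the nmsrecipient target, irecipientid as the key, semail as the updated field, a hypothetical DSN); it is not the Update data activity itself, only the shape of update it should resolve to when the reconciliation key is indexed.

```python
# Hedged sketch: reconcile staging data against the target on its primary key
# and update only the rows that actually differ. Names are illustrative.
import psycopg2

SQL = """
UPDATE nmsrecipient AS r
   SET semail = s.semail
  FROM stg_recipient AS s
 WHERE r.irecipientid = s.irecipientid        -- primary key as reconciliation key
   AND r.semail IS DISTINCT FROM s.semail     -- skip rows where nothing changed
"""

with psycopg2.connect("dbname=campaign user=neolane") as conn:  # hypothetical DSN
    with conn.cursor() as cur:
        cur.execute(SQL)
        print(f"{cur.rowcount} recipient row(s) actually needed an update")
```

If most of the 24M rows are unchanged, the IS DISTINCT FROM filter alone can cut the work dramatically.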

 


     Manoj
     Find me on LinkedIn


Level 6

Hi Manoj, about: 

  • Can this be changed to an incremental load instead of a full load?

What would be the best approach to do incremental loading? Using a split activity? Filtering by an incremental PK column?

Ty


Community Advisor

Hello @god_prophet

 

Look at the recently created/modified data in the source system for incremental load.
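A minimal sketch of what that incremental pull can look like, assuming a PostgreSQL source, an illustrative tslastmodified column on the recipients table, a hypothetical connection string, and a simple file as the high-water-mark bookmark:

```python
# Hedged sketch: pull only rows created/modified since the last successful run
# instead of the full 24M. Column name, DSN and bookmark file are assumptions.
import datetime
import pathlib
import psycopg2

BOOKMARK = pathlib.Path("last_run.txt")  # stores the previous high-water mark

# Default to one day back if this is the first run.
if BOOKMARK.exists():
    since = BOOKMARK.read_text().strip()
else:
    since = (datetime.datetime.now(datetime.timezone.utc)
             - datetime.timedelta(days=1)).isoformat()

conn = psycopg2.connect("dbname=campaign user=neolane")  # hypothetical DSN
with conn, conn.cursor() as cur:
    # Only the delta since the last run, not the whole table.
    cur.execute(
        "SELECT irecipientid, semail FROM nmsrecipient WHERE tslastmodified >= %s",
        (since,),
    )
    changed = cur.fetchall()
conn.close()

# Remember the new high-water mark for the next run.
BOOKMARK.write_text(datetime.datetime.now(datetime.timezone.utc).isoformat())
print(f"{len(changed)} changed record(s) to process")
```

Inside the workflow, the same idea amounts to a date condition on the query rather than a full table load.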


     Manoj
     Find me on LinkedIn


Level 6

Oh yes, we normally only update some records, the ones that have changed. I just happened to need to load all the records this time.

I thought you were mentioning other functionality besides the batch size.


Level 4

Hi @god_prophet, I'm wondering how you solved/tackled this issue.

 

Were you able to use any specific process, and would you like to share it here? I am curious to know the outcome of this discussion.

 

BR, Hari Krishna D