One of our workflows started on 08/28, and we estimate it will finish on 09/01, 4 days after it started.
It loaded 24M records and it has to update all of them.
The time is spent in the update activity, which has a batch size of 200,000 and targets the recipients table.
Is this normal? Is there anything else we can do to improve it?
Hello @ogonzalesdiaz
Here are things I would look at to optimize the WF:
Hi @ogonzalesdiaz,
The workflow duration you mentioned, running from 08/28 and expected to finish on 09/01, indicates significant processing time, especially when dealing with 24 million records and updating the recipients table. Here are some suggestions for improvement:
1. Batch Size Considerations: The batch size of 200,000 is relatively large. While it reduces the number of batch executions, it also increases the load on the database, potentially leading to locking issues or higher resource consumption.
Recommendation: Try reducing the batch size to something like 50,000 or 100,000. This can lead to more frequent but faster executions, as smaller batches typically lock fewer records and reduce contention. (A minimal sketch of this batch-wise approach follows after this list.)
2. Database Indexing: Ensure that the fields being updated in the recipients table are indexed. Without proper indexing, the database might perform full table scans, significantly slowing down the update process.
Recommendation: Review the fields targeted by the update activity and confirm that appropriate indexes are in place. Consider adding indexes temporarily if needed. (The first sketch after this list also includes this step.)
3. Workflow Optimization (Parallelization): If your server has sufficient resources, consider splitting the workload across multiple workflows running in parallel, each handling a subset of the data. This can drastically reduce processing time. (The second sketch after this list shows a simple key-range split.)
Filtered Updates: If possible, apply filters to minimize the number of records being updated. For example, update only the records that have actually changed rather than all 24 million.
4. Database Maintenance: Large-scale updates can cause database fragmentation or impact performance over time. Ensure regular maintenance tasks like reindexing and optimizing are scheduled to maintain optimal database performance.
5. Monitoring and Logs: Monitor the database and workflow logs to identify any specific bottlenecks or issues during the update process. This can help pinpoint whether the delay is due to database performance, workflow logic, or other factors.
6. System Resources: Ensure that the system resources (CPU, memory, I/O) are not being maxed out. If they are, consider scaling up the resources temporarily to handle the heavy processing load.
While updating 24 million records is intensive and can take time, adjusting the batch size, ensuring proper indexing, and possibly parallelizing the workload should help improve performance. Keep an eye on database and system resources to ensure they are not becoming bottlenecks.
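To make points 1 and 2 concrete, here is a minimal, generic sketch (plain Python with sqlite3, not Adobe Campaign code) of updating a large table in smaller batches, committing after each batch so locks are released between batches. The table name, column names, row counts, and batch size are illustrative assumptions only.

```python
# Generic illustration only (plain Python + sqlite3), not Adobe Campaign code.
# Table/column names (recipients, segment_code) and the batch size are
# assumptions made up for this example.
import sqlite3

BATCH_SIZE = 50_000  # smaller batches -> shorter transactions, less lock contention

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recipients (id INTEGER PRIMARY KEY, segment_code TEXT)")
conn.executemany(
    "INSERT INTO recipients (id, segment_code) VALUES (?, ?)",
    ((i, "OLD") for i in range(1, 200_001)),
)
conn.commit()

# Point 2: index the column used to find the next batch, otherwise every
# batch re-scans the whole table.
conn.execute("CREATE INDEX IF NOT EXISTS idx_recipients_segment ON recipients (segment_code)")

# Point 1: update in batches and commit after each one, instead of holding a
# single long transaction over all rows.
total = 0
while True:
    cur = conn.execute(
        "UPDATE recipients SET segment_code = 'NEW' "
        "WHERE id IN (SELECT id FROM recipients WHERE segment_code = 'OLD' LIMIT ?)",
        (BATCH_SIZE,),
    )
    conn.commit()  # release locks between batches
    if cur.rowcount == 0:
        break
    total += cur.rowcount
    print(f"{total} record(s) processed")
```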
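For the parallelization idea in point 3, the workload is usually split by primary-key range, with each range feeding a separate workflow. A rough sketch of computing such ranges follows; the key bounds, column name, and number of partitions are made-up values for the example.

```python
# Generic sketch of splitting a key space into N ranges so each range can be
# handled by its own parallel workflow. The bounds, column name, and partition
# count below are illustrative assumptions.
def key_ranges(min_id: int, max_id: int, partitions: int) -> list[tuple[int, int]]:
    """Return inclusive (low, high) id ranges covering [min_id, max_id]."""
    span = max_id - min_id + 1
    size = -(-span // partitions)  # ceiling division
    ranges = []
    low = min_id
    while low <= max_id:
        high = min(low + size - 1, max_id)
        ranges.append((low, high))
        low = high + 1
    return ranges

# e.g. ~24.9M recipient ids split across 4 parallel workflows
for low, high in key_ranges(1, 24_882_997, 4):
    # each range would become the filter of one workflow's query/update
    print(f"WHERE id BETWEEN {low} AND {high}")
```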
Best regards,
MEIT MEDIA (https://www.meitmedia.com)
Contact Us: infomeitmedia@gmail.com
I talked to Adobe Support, and they sent me a list of other workflows that needed to be stopped.
We went from a batch of 2K every 4 min to a batch of 600K, and lastly 800K. So apparently stopping other workflows can increase the batch size processed, even making it higher than what was originally set. Or maybe their infra team temporarily increased the "processing power" for this workflow?
09/01/2024 11:09:54 PM [22,802,606/24,882,997] record(s) processed
09/01/2024 11:02:59 PM [22,002,606/24,882,997] record(s) processed
09/01/2024 10:56:58 PM [21,402,606/24,882,997] record(s) processed
09/01/2024 10:49:27 PM [20,996,846/24,882,997] record(s) processed
09/01/2024 10:44:27 PM [20,988,736/24,882,997] record(s) processed
09/01/2024 10:39:27 PM [20,980,830/24,882,997] record(s) processed
09/01/2024 10:34:27 PM [20,972,518/24,882,997] record(s) processed
09/01/2024 10:29:27 PM [20,964,677/24,882,997] record(s) processed
09/01/2024 10:24:27 PM [20,961,927/24,882,997] record(s) processed
09/01/2024 10:19:27 PM [20,959,211/24,882,997] record(s) processed
09/01/2024 10:14:27 PM [20,956,533/24,882,997] record(s) processed
09/01/2024 10:09:27 PM [20,953,907/24,882,997] record(s) processed
09/01/2024 10:04:27 PM [20,951,699/24,882,997] record(s) processed
09/01/2024 9:59:27 PM [20,949,897/24,882,997] record(s) processed
09/01/2024 9:54:27 PM [20,948,313/24,882,997] record(s) processed
09/01/2024 9:49:27 PM [20,947,030/24,882,997] record(s) processed
Hello @ogonzalesdiaz Yes, this makes sense. They just asked you to free up some resources from other workflows so that this workflow can use them.
But I wouldn't recommend doing this again and again. There could be scenarios where the other workflows are important as well, and it won't always be possible to free up resources for this workflow.
@ogonzalesdiaz Also, one thought would be to run them outside of business hours, when resources are a bit more free.
Hi Manoj, about:
What would be the best approach to do incremental loading? Using a split activity? Filtering by an incremental PK column?
Ty
Hello @ogonzalesdiaz
Look at the recently created/modified data in the source system for incremental load.
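Not Adobe Campaign-specific, but as a minimal sketch of that idea: persist the timestamp of the last successful run and only pick up rows whose last-modified date is newer. The table and column names (recipients, last_modified) and the watermark value are assumptions for the example.

```python
# Generic incremental-load sketch (plain Python + sqlite3), not Adobe Campaign code.
# Table/column names (recipients, last_modified) and the watermark value are
# assumptions for illustration.
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recipients (id INTEGER PRIMARY KEY, email TEXT, last_modified TEXT)"
)
conn.executemany(
    "INSERT INTO recipients VALUES (?, ?, ?)",
    [
        (1, "a@example.com", "2024-08-30T10:00:00"),
        (2, "b@example.com", "2024-09-01T08:30:00"),
        (3, "c@example.com", "2024-09-01T09:15:00"),
    ],
)

# Watermark persisted by the previous successful run.
last_successful_run = "2024-09-01T00:00:00"

# Load/update only the records created or modified since the last run.
changed = conn.execute(
    "SELECT id, email FROM recipients WHERE last_modified > ? ORDER BY id",
    (last_successful_run,),
).fetchall()
print(f"{len(changed)} changed record(s) to process instead of the full table")

# After a successful run, advance the watermark for the next execution.
new_watermark = datetime.now(timezone.utc).isoformat()
```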
Oh yes, we normally only update some records, the ones that changed. I just happened to need to load all the records this time.
I thought you were mentioning some other functionality besides batch size.
Hi @ogonzalesdiaz, wondering how you solved/tackled this issue.
Were you able to use any specific process, and would you like to share it here? I am curious to know the outcome of this discussion.
BR, Hari Krishna D