
Correct "batch size" ?


Level 5

Is there a good batch size for an update activity?

There are some workflows that take 5 to 8 hours to ingest/update the data into our schema. 

I'm thinking of going from 10,000 to 200,000. I've tested with 80,000 and 100,000 and everything seems fine.

Is there anything I should be aware of with such a big increase?



2 Replies


Level 5

Hi @ogonzalesdiaz ,

 

When considering an increase in batch size for an update activity in ACC, several factors need to be taken into account:

  1. Server Performance: Increasing the batch size significantly, from 10,000 to 200,000, can impact server resources, including CPU and memory. Although your tests with 80,000 and 100,000 have gone smoothly, a larger batch size might strain the system, potentially causing performance degradation or timeouts during peak loads.

  2. Database Load: Large batches can put more load on the database, especially if there are multiple indices or constraints in your schema. This could lead to slower response times or locking issues, which might impact other operations.

  3. Error Handling: With larger batch sizes, the impact of any failures or rollbacks will be more significant. Ensure that your error-handling mechanisms are robust enough to manage these larger batches without losing data (a minimal sketch follows this list).

  4. Processing Time: Consider the impact on the overall workflow duration. Larger batches mean fewer iterations, but each batch takes longer to process, and a single slow or failed batch can noticeably extend the total run time.

  5. Network Bandwidth: If your workflow involves data transfer between servers or databases, the increase in batch size might demand higher network bandwidth. Ensure your infrastructure can support the larger data loads without latency issues.
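
For the error-handling point above, here is a minimal sketch of one way to commit each batch in its own transaction, assuming Python with a DB-API connection such as psycopg2; the table, columns, and helper names are hypothetical:

```python
# Commit each batch in its own transaction so that a failure only rolls back
# that batch rather than the whole 200,000-row update.
def update_in_batches(conn, rows, batch_size=100_000):
    cursor = conn.cursor()
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        try:
            cursor.executemany(
                "UPDATE my_table SET status = %s WHERE id = %s",  # hypothetical table/columns
                batch,
            )
            conn.commit()  # one transaction per batch
        except Exception:
            conn.rollback()  # only this batch is lost; log it and retry or re-queue
            raise
```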

Before finalizing the increase to 200,000, it’s advisable to conduct further tests under different load conditions to evaluate the performance and reliability comprehensively. You may also consider staggering the batch processing or setting up monitoring to ensure that the larger batches do not negatively impact your system's stability.
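
For example, a rough timing loop along these lines could compare candidate batch sizes under the same workload; load_rows and get_connection are hypothetical helpers, and update_in_batches is the sketch above:

```python
import time

# Run the same update workload at several batch sizes and compare wall-clock time.
for size in (10_000, 50_000, 100_000, 200_000):
    rows = load_rows()        # hypothetical helper: fetch the rows to update
    conn = get_connection()   # hypothetical helper: open a fresh DB connection
    start = time.perf_counter()
    update_in_batches(conn, rows, batch_size=size)
    elapsed = time.perf_counter() - start
    print(f"batch_size={size:>7,}: {elapsed:.1f}s")
    conn.close()
```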

 

Best regards,

MEIT MEDIA (https://www.meitmedia.com)

Find us on LinkedIn

Contact Us: infomeitmedia@gmail.com



Level 1

Hello @ogonzalesdiaz ,

 

The Batch size field lets you select the number of inbound transition elements to be updated per database transaction. In other words, each database transaction will attempt to update or insert up to the number of records specified in Batch size.

 

The default (and maximum) is 10,000; any records beyond that are iterated over and processed as separate batch calls within the activity.

Changing the value can improve or degrade database performance for some operations, depending on the circumstances.

 

There is no fixed number that can be recommended as a best practice; it depends on how well the data model is built. A batch size of 50,000 may work fine in some cases, while 20,000 may degrade database performance in others.
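
To illustrate the chunking behaviour described above (not a product API, just a plain-Python sketch): with a batch size of 10,000, a set of 25,000 records becomes two full batches plus a smaller remainder batch, each sent as its own database transaction.

```python
def split_into_batches(records, batch_size=10_000):
    # Yield consecutive slices of at most batch_size records; the last slice
    # is the remainder and may be smaller.
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

batches = list(split_into_batches(list(range(25_000))))
print([len(b) for b in batches])  # [10000, 10000, 5000]
```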

 

Hope this information is helpful.