We recently had an incident where a large amount of content was activated by the authoring community, resulting in unresponsive publishers. In addition to increasing memory and CPU on the publishers, I'm looking for configuration changes that would "throttle" the flush agents so that end users' response times won't be severely impacted in the future.
At first I thought this might be the answer:
Tuning the Sling Job Queues
The bulk upload of large assets may be very resource intensive. By default the number of concurrent threads per job queue is equal to the number of CPU cores, which may cause an overall performance impact and high Java heap consumption.
It is recommended not to exceed 50% of the cores. To change this value, go to http://<host>:<port>/system/console/configMgr/org.apache.sling.event.jobs.QueueConfiguration and set queue.maxparallel to a value representing 50% of the CPU cores of the server hosting your AEM instance (e.g. for 8 CPU cores, set the value to 4).
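The 50% rule above is simple arithmetic, but here is a minimal sketch of it, assuming the script is run on the publish host itself so that os.cpu_count() reports that server's cores. The helper name recommended_maxparallel is hypothetical; only queue.maxparallel comes from the quoted guidance, and the value still has to be entered in the OSGi console by hand.

```python
import os


def recommended_maxparallel(cpu_cores: int) -> int:
    """Hypothetical helper: half the CPU cores (rounded down), never below 1."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    return max(1, cpu_cores // 2)


if __name__ == "__main__":
    # os.cpu_count() may return None on some platforms, so fall back to 1.
    cores = os.cpu_count() or 1
    print(f"queue.maxparallel = {recommended_maxparallel(cores)}")
```

For the 8-core example in the quote this gives 4. Rounding down and clamping to 1 keeps small servers from ending up with a queue that can run no jobs at all.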
My question is, does this affect replication flush agents or just workflow?
Might the Apache Sling Job Queue Configuration -> Queue: com/day/cq/replication/job/{0} be the answer? It has a similar Maximum Parallel Jobs setting. There's also a Priority setting on this config that I haven't tried yet.
Thanks
Ned