We have a customized AEM asset portal that uses a shared S3 datastore. While replicating assets in bulk (around 600 assets) from author to publish via binary-less replication, we have observed that the publisher becomes unresponsive and starts returning the "Oops" error page (HTTP 500). The publisher recovers automatically once the replication queue drains. For instance, within 30 seconds the author makes around 300-400 POST requests to the publisher to publish the assets. The publisher in turn issues flush requests to the dispatcher to clear its cache. We have sometimes observed 1500+ items in the replication queue, which makes the publisher unresponsive.
How can we increase the throughput of binary-less replication?
Is there any way to perform replication in chunks or batches?
How can we tune our publisher to handle bulk requests?
Your scenario is not entirely clear to me. I understand that you are replicating hundreds of assets in bulk, and that during that time the publish instance is unable to respond to other incoming requests.
Let me comment on some of your statements:
"Within 30 seconds, 300-400 POST requests": that suggests the replication itself is quite fast. How do you trigger the replication on author? Do you use the standard "replicator.replicate()" method, or synchronous replication with many threads in parallel? The former replicates only one asset at a time, while the latter issues multiple replication requests at once.
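If the author is firing activations as fast as it can, one option is to throttle on the author side by batching the paths yourself and pausing between batches so the publish queue can drain. A minimal sketch of that batching logic, using only the JDK so it stays self-contained (the `partition` helper, batch size, and pause length are my own illustration, not an AEM API; in a real OSGi component the `activate` callback would wrap `Replicator.replicate(session, ReplicationActionType.ACTIVATE, path)`):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchReplication {

    /** Split the asset paths into consecutive batches of at most batchSize. */
    public static List<List<String>> partition(List<String> assetPaths, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < assetPaths.size(); i += batchSize) {
            int end = Math.min(i + batchSize, assetPaths.size());
            batches.add(new ArrayList<>(assetPaths.subList(i, end)));
        }
        return batches;
    }

    /**
     * Activate each batch, then pause so the publish replication queue can drain.
     * In AEM, activate would call Replicator.replicate(session,
     * ReplicationActionType.ACTIVATE, path) for each path.
     */
    public static void replicateInBatches(List<String> assetPaths, int batchSize,
                                          Consumer<String> activate, long pauseMillis)
            throws InterruptedException {
        for (List<String> batch : partition(assetPaths, batchSize)) {
            batch.forEach(activate);
            Thread.sleep(pauseMillis); // crude backpressure between batches
        }
    }
}
```

With 600 assets and a batch size of, say, 50, this spreads the activations out instead of queuing all of them within 30 seconds, at the cost of a longer overall publish time.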
You should also check the dispatcher logs to see what is happening there. Take a few thread dumps on the publisher and inspect what it is busy with. And what is the exact exception message behind the "Internal Server Error" (500) responses you mentioned?
If you can answer these questions, we can help you further.