I have a question regarding distributed events and jobs in an AEM authoring cluster.
1. I have a piece of code that creates a job for a topic, and a JobConsumer registered for that topic to process the job. If this code executes in a cluster and I have whitelisted the topic on all the servers in the cluster, should the job run on all the servers in the cluster or ONLY on the server that receives the request?
2. If I have a 'ResourceChangeListener' (which also implements the marker interface 'ExternalResourceChangeListener') listening for resource ADDED events, will this listener be executed on all the servers in the cluster when the change is propagated, or only on the master?
My expectation was that, since the writes happen on the master even when the request is served by a non-master instance, the events would be generated and handled on the master. But I am not seeing this behavior in our environment, and I have not found proper documentation of the expected behavior online.
Any suggestions would be very helpful to get my understanding correct.
Regarding 1) it will be executed only on one node. Probably the node which received the event (haven't tested it).
Regarding 2) in AEM 6.x (rather: Oak) there is no longer the notion of a cluster master or leader, which does the writes to the persistent storage (Mongo/RDB). Instead all cluster nodes are treated equally, and every cluster node is able to write to the persistent store.
Thanks for your response. Regarding 1 and 2, the behavior I am seeing matches what you describe.
We have a customized rollout: from a single page, the user selects a global page and the locales for that page, and the selected page is rolled out to those locales. We have implemented this with Sling jobs, i.e. one Sling job for each selected locale. What is happening now is that in cluster mode, if the request comes to the master, the process is very quick, but if the request comes to a non-master, even a simple request takes anywhere between 3 and 10 minutes. If I select ALL locales (28 in our case), the process keeps running for 20 minutes to half an hour on a non-master instance. Is there a way to force the job to run ONLY on the master every time? I have tried adding logic to schedule the jobs only if the current instance is the master, but that does not work, since if the request comes to a non-master the process is not triggered at all.
We have two author instances. At any given time one is the leader (master), and I am calling the other the non-master. This is based on the information shown on the topology screen.
These two are fronted by two web servers, one for each author server, which are in turn load-balanced via a VIP. When a request comes in via the VIP, it can land on either server. The behavior noted above depends on whether the VIP routes the request to the 'leader' or the 'non-leader'.
Sling jobs executing on the non-leader run slowly compared to the same jobs executing on the leader.
Hm, what version of AEM are you using?
Your observation might be caused by many different factors. Can you please check with thread dumps what is happening on the slow (non-leader) instance while a job is running (or is supposed to run)?
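Normally you would take the dumps with the JDK's jstack tool against the AEM process. If that is awkward in your setup, the same information is available programmatically through the standard ThreadMXBean. The following is only a minimal sketch: the class name JobThreadDump, the method dump and the name fragment used for filtering are made up for illustration, and the code only sees threads of the JVM it runs in, so it would have to be adapted to run inside the AEM instance.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class JobThreadDump {

    // Returns a textual dump of all live threads in this JVM whose name
    // contains nameFragment. An empty fragment matches every thread.
    public static String dump(String nameFragment) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip lock details, which not every JVM supports
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info == null || !info.getThreadName().contains(nameFragment)) {
                continue;
            }
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical usage: pass whatever name fragment your job threads carry.
        String fragment = args.length > 0 ? args[0] : "";
        System.out.print(dump(fragment));
    }
}
```

Take several dumps a few seconds apart while a rollout job is running on the slow instance. If the same job thread shows the same stack in each dump, that stack usually points to where the time is going.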