Author and Publish instances become unresponsive after deployment
Hi,
We are facing the following two unusual issues in our AMS-managed stage environments:
- After a deployment, either the author or the publish instance becomes unresponsive, and a service restart is the only way to bring it back up
- Deployment takes a long time: 30-40 minutes
After analyzing the logs, we found that AEM restarts many of its services during deployment, sometimes multiple times. This is one of the reasons for the long deployment duration we are seeing.
Our deployment package size is 13 MB. When I tried it on a fresh instance, it took just 5 minutes to install everything, and I didn't notice any service restarts.
The exception we most often see repeated continuously in the logs when the servers become unresponsive is "AuthenticationSupport service missing":
org.apache.sling.engine.impl.SlingHttpContext handleSecurity: AuthenticationSupport service missing. Cannot authenticate request.
org.apache.sling.engine.impl.SlingHttpContext handleSecurity: Possible reason is missing Repository service. Check AuthenticationSupport dependencies.
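In case it helps anyone compare against their own logs, here is a minimal Python sketch for counting how often this message appears per minute. The function name is hypothetical, and it assumes each log line starts with a "dd.MM.yyyy HH:mm:ss.SSS" timestamp; lines that do not match are grouped under "unknown". It is only a rough aid for spotting when the errors start relative to the deployment, not a diagnosis.

```python
from collections import Counter

# Message text copied from the log excerpt above.
AUTH_MISSING = "AuthenticationSupport service missing"


def count_auth_missing_per_minute(lines):
    """Count lines containing the AuthenticationSupport message, per minute.

    Assumption: a matching line begins with 'dd.MM.yyyy HH:mm:ss.SSS'.
    Matching lines without that prefix are counted under 'unknown'.
    """
    counts = Counter()
    for line in lines:
        if AUTH_MISSING not in line:
            continue
        parts = line.split()
        # Very loose timestamp check: 10-character date, then HH:mm...
        if len(parts) >= 2 and len(parts[0]) == 10 and parts[1][2:3] == ":":
            minute = parts[0] + " " + parts[1][:5]
        else:
            minute = "unknown"
        counts[minute] += 1
    return counts
```

The function accepts any iterable of strings, so an open log file can be passed to it directly and the resulting counts printed to see in which minutes the message clusters.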
We are working with AMS on this, and a ticket has already been opened with them.
AMS suggested a few changes, and we applied them. For example, we updated the "Apache Sling Job Manager" config from 30 (the default) to 300 to make sure there is an adequate delay before the jobs start. This was done because Sling jobs were being started before the Sling event bundle was active, and this cleared everything under "/var/eventing/jobs".
They are still investigating the issue, but I thought I would share the details here as well to find out whether anybody else has faced or is facing a similar issue, and how they resolved it.
Any input here would be really helpful.
Thanks,
Bhawesh