Author and Publish instances become unresponsive after deployment

bhawesh-dandona

09-08-2020

Hi,

We are facing the following two unusual issues in our AMS-managed stage environments:

  • After a deployment, either the author or the publish instance becomes unresponsive, and a service restart is the only way to bring it back up
  • Deployments take a long time, 30-40 minutes

After analyzing the logs, we found that AEM restarts many of its services during deployment, sometimes multiple times. This is one of the reasons for the long deployment times.

Our deployment package is 13 MB. When I installed it on a fresh instance, it took just 5 minutes to install everything, and I didn't notice any service restarts.

One exception we often see repeated continuously in the logs when the servers go unresponsive is "AuthenticationSupport service missing":

org.apache.sling.engine.impl.SlingHttpContext handleSecurity: AuthenticationSupport service missing. Cannot authenticate request.
org.apache.sling.engine.impl.SlingHttpContext handleSecurity: Possible reason is missing Repository service. Check AuthenticationSupport dependencies.
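To get a rough idea of how often these messages show up around a deployment, something like the sketch below could be used. It only does plain substring matching on the two messages quoted above. The function name `count_markers` is made up for illustration, and nothing here assumes a particular log line layout.

```python
from collections import Counter

# The two messages quoted in the post, matched as plain substrings.
MARKERS = (
    "AuthenticationSupport service missing",
    "Possible reason is missing Repository service",
)


def count_markers(lines):
    """Count how many lines contain each marker message."""
    counts = Counter()
    for line in lines:
        for marker in MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts
```

An open log file can be passed in directly, since iterating over a file object yields its lines.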

 

We are working with AMS on this and already have a ticket open with them.

AMS suggested a few changes, and we applied them. For example, we updated the "Apache Sling Job Manager" config from 30 (the default) to 300 so that there is a sufficient delay before jobs start. This was done because Sling jobs were starting before the Sling event bundle was active, which cleared everything under "/var/eventing/jobs".

They are still investigating the issue, but I wanted to share the details here as well to find out whether anybody else has faced or is facing a similar issue, and how they resolved it.

Any input here would be really helpful.

Thanks,

Bhawesh

Accepted Solutions (1)

SundeepKatepally

09-08-2020

It's a known issue, and Adobe is looking for a solution. Below is the reason why the instance becomes unresponsive:

Deploying the custom bundle (the 13 MB one) triggers a restart of all the bundles it refers to. Say the custom bundle refers to 20 OOTB bundles. That doesn't mean only 21 bundles restart. Around 400-500 bundles might restart, because those 20 bundles in turn depend on other bundles (a chain of dependencies).

Altogether, deploying the custom bundle is equivalent to restarting the instance itself. In fact, we observed that restarting the instance is faster than waiting for the bundles to restart.
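To illustrate how a chain of dependencies fans out, here is a toy sketch that walks a dependency graph breadth-first. The bundle names and edges are made up, and real OSGi wiring is resolved by the framework rather than by a dictionary like this. It only shows why a handful of direct references can reach many more bundles.

```python
from collections import deque


def transitive_closure(start, depends_on):
    """Return every bundle reachable from `start` by following dependency edges."""
    seen = set()
    queue = deque([start])
    while queue:
        bundle = queue.popleft()
        for dep in depends_on.get(bundle, ()):
            # Skip bundles already visited so cycles do not loop forever.
            if dep not in seen and dep != start:
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical graph: "custom" refers directly to only two bundles.
depends_on = {
    "custom": ["a", "b"],
    "a": ["c", "d"],
    "b": ["d", "e"],
    "d": ["f"],
}

affected = transitive_closure("custom", depends_on)
print(len(depends_on["custom"]), "direct,", len(affected), "in total")
# prints "2 direct, 6 in total"
```

With a real instance the same fan-out happens on a much larger graph, which matches the jump from 20 direct references to hundreds of restarted bundles described above.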

Answers (0)