Production AEM Publisher Instances taking 4 hours to restart | Community
October 16, 2015
Solved

Production AEM Publisher Instances taking 4 hours to restart

  • October 16, 2015
  • 10 replies
  • 3564 views

Hi guys,

I am facing a strange issue. Our production setup consists of one author instance, one dispatcher, and two publishers (1 and 2).

Each AEM instance runs on its own AWS x3 large server.

Even on these high-performance machines, the AEM instances take 4 hours to restart.

Our production servers have been running for the last 6 months. The crx-repository folder is around 10GB, with around 4GB of content.
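To see where the roughly 10GB actually sits, a breakdown of the repository folder by subdirectory can help. The following is only a minimal sketch: the function name `dir_sizes` is made up for illustration, and the path in the usage note is taken from the log lines quoted in this post.

```python
import os


def dir_sizes(root):
    """Return {immediate subdirectory name: total bytes of the files below it}."""
    sizes = {}
    for entry in os.scandir(root):
        # Only report directories directly under root; skip plain files and links.
        if not entry.is_dir(follow_symlinks=False):
            continue
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if not os.path.islink(path):
                    total += os.path.getsize(path)
        sizes[entry.name] = total
    return sizes
```

For example, `dir_sizes("/opt/aem/crx-quickstart/repository")` would show how much of the total belongs to `segmentstore` compared with the other folders.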

 

While debugging, we observed that during the restart the Java process spends most of its time in the Java packages listed below.

We went to the system profiler at http://<Publisher1>/system/console/profiler and ran quite a few traces. The Author instance starts OK (within a couple of minutes) and then loops around these three packages:

org.apache.jackrabbit.oak.plugins.segment

org.eclipse.jetty.io.nio

org.eclipse.jetty.server.nio

 

We have 13066 lines of:

30.03.2015 09:41:05.157 *WARN* [FelixStartLevel] org.apache.jackrabbit.oak.plugins.segment.file.TarReader Invalid entry checksum at offset 0 in tar file /opt/aem/crx-quickstart/repository/segmentstore/data00204a.tar, skipping...

(The parts that were highlighted in red change with each message.) These messages run for only 25 seconds, so they can't be the direct cause of the delay.

We also have 129 lines of the format:

30.03.2015 09:41:31.885 *WARN* [FelixStartLevel] org.apache.jackrabbit.oak.plugins.segment.file.TarReader Invalid graph metadata in tar file /opt/aem/crx-quickstart/repository/segmentstore/data00203b.tar
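To find out which tar files are affected and how often, the two kinds of TarReader warnings shown above could be tallied per file. This is a rough sketch, assuming the log lines look exactly like the two samples in this post; the name `tally_tar_warnings` is illustrative and not part of AEM.

```python
import re
from collections import Counter

# Pattern derived only from the two sample lines quoted in this post;
# other TarReader message variants may exist and would not be counted.
TAR_WARNING = re.compile(
    r"TarReader (Invalid entry checksum|Invalid graph metadata)"
    r".*? in tar file (\S+?\.tar)"
)


def tally_tar_warnings(lines):
    """Count TarReader warnings per (message kind, tar file path)."""
    counts = Counter()
    for line in lines:
        match = TAR_WARNING.search(line)
        if match:
            kind, tar_file = match.groups()
            counts[(kind, tar_file)] += 1
    return counts
```

Passing an open log file (for example the instance's error log) to `tally_tar_warnings` and printing `counts.most_common()` would list the most frequently reported tar files first.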

Are there any cleanup jobs we need to run on our production Publisher instances? Does something accumulate over time in an AEM instance that we need to clean up periodically?

Any help in this regard would be really appreciated.

Thanks

10 replies

sudheeras
October 16, 2015

It seems you have corrupted content (TAR files) in the repository for some reason. The errors you mention indicate that "data00204a.tar" and "data00203b.tar" are not properly readable.

joerghoh
Adobe Employee
October 16, 2015

I agree with @ShamHC and @bsloki: please raise a Daycare ticket and provide the relevant data, such as:

  • log files
  • the config zip from the OSGI console
  • Version number plus hotfixes/servicepacks applied.

I think this forum is not the right place to support you and analyze production problems. That's what support is for.

kind regards,
Jörg

Sham_HC
Level 10
October 16, 2015

This needs some deep analysis. File a Daycare ticket and attach thread dumps, log files, and profiler output. For now, try [1].

[1] https://helpx.adobe.com/experience-manager/kb/prevent-rapid-repository-growth-caused-by-linkchecker-in-aem-6.html

October 16, 2015

Hi guys,

I tried the solution below, but it made no difference. Applying the SP2 update, however, worked and improved the behaviour.

Is that the reason? What is the core reason our AEM instance takes so long to come up?

Runal_Trivedi
Level 6
October 16, 2015

I assume your server administrators have been running TAR optimization regularly; the invalid reference errors in the TAR files should disappear once TAR optimization has run. If you have not tried it yet, I would advise running it once to see whether it reduces your start time.

See the following links for more on performance tuning and optimization:

http://docs.adobe.com/docs/en/cq/5-6-1/core/administering/persistence_managers.html#Optimizing%20Tar%20Files

https://helpx.adobe.com/experience-manager/kb/performancetuningtips.html

http://docs.adobe.com/docs/en/cq/5-6-1/deploying/monitoring_and_maintaining.html#Monitoring%20Performance

October 16, 2015

Thanks Sham.

Can we apply the Link Checker fixes in our staging environment directly?

I also observed messages in some of the trace logs for our website such as "Has reach maximum limit of 500 for twitter links", "Has reach maximum limit of 500 for facebook links" and "Has reach maximum limit of 500 for Google plus links".

Is this also a major cause of our production server restart delays?

And will applying the Link Checker fix (https://helpx.adobe.com/experience-manager/kb/prevent-rapid-repository-growth-caused-by-linkchecker-in-aem-6.html) resolve our issues?

Please let me know.

 

Thanks

October 16, 2015

Thanks for your response.

It's really helpful.

 

When I tried to run TAR optimization manually by following the steps below, I got the following error.

Manually optimizing tar files using the JMX Console

  1. Open the CQ Web Console and click the JMX item in the Main menu (http://localhost:4502/system/console/jmx).

  2. Click the Repository MBean for the com.adobe.granite domain (http://localhost:4502/system/console/jmx/com.adobe.granite%3Atype%3DRepository).

  3. Click startTarOptimization() (Start Tar PM optimization).

  4. To stop the optimization process, click stopTarOptimization() (Stop Tar PM optimization).

 

ERROR:

javax.jcr.UnsupportedRepositoryOperationException
    at com.day.crx.sling.server.impl.jmx.ManagedRepository.startTarOptimization(ManagedRepository.java:201)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
    at sun.reflect.GeneratedMethodAccessor185.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
    at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
    at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
    at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
    at javax.management.StandardMBean.invoke(StandardMBean.java:405)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
    at com.adobe.granite.jmx.internal.JMXConsolePlugin.invoke(JMXConsolePlugin.java:176)
    at com.adobe.granite.jmx.internal.JMXConsolePlugin.doPost(JMXConsolePlugin.java:134)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:641)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
    at org.apache.felix.webconsole.internal.servlet.OsgiManager.service(OsgiManager.java:526)
    at org.apache.felix.webconsole.internal.servlet.OsgiManager.service(OsgiManager.java:450)
    at org.apache.felix.http.base.internal.handler.ServletHandler.doHandle(ServletHandler.java:339)
    at org.apache.felix.http.base.internal.handler.ServletHandler.handle(ServletHandler.java:300)
    at org.apache.felix.http.base.internal.dispatch.ServletPipeline.handle(ServletPipeline.java:93)
    at org.apache.felix.http.base.internal.dispatch.InvocationFilterChain.doFilter(InvocationFilterChain.java:50)
    at org.apache.felix.http.base.internal.dispatch.HttpFilterChain.doFilter(HttpFilterChain.java:31)
    at org.apache.sling.i18n.impl.I18NFilter.doFilter(I18NFilter.java:128)
    at org.apache.felix.http.base.internal.handler.FilterHandler.doHandle(FilterHandler.java:108)
    at org.apache.felix.http.base.internal.handler.FilterHandler.handle(FilterHandler.java:80)
    at org.apache.felix.http.base.internal.dispatch.InvocationFilterChain.doFilter(InvocationFilterChain.java:46)
    at org.apache.felix.http.base.internal.dispatch.HttpFilterChain.doFilter(HttpFilterChain.java:31)
    at com.adobe.granite.license.impl.LicenseCheckFilter.doFilter(LicenseCheckFilter.java:300)
    at org.apache.felix.http.base.internal.handler.FilterHandler.doHandle(FilterHandler.java:108)
    at org.apache.felix.http.base.internal.handler.FilterHandler.handle(FilterHandler.java:80)
    at org.apache.felix.http.base.internal.dispatch.InvocationFilterChain.doFilter(InvocationFilterChain.java:46)
    at org.apache.felix.http.base.internal.dispatch.HttpFilterChain.doFilter(HttpFilterChain.java:31)
    at org.apache.felix.http.sslfilter.internal.SslFilter.doFilter(SslFilter.java:55)
    at org.apache.felix.http.base.internal.handler.FilterHandler.doHandle(FilterHandler.java:108)
    at org.apache.felix.http.base.internal.handler.FilterHandler.handle(FilterHandler.java:80)
    at org.apache.felix.http.base.internal.dispatch.InvocationFilterChain.doFilter(InvocationFilterChain.java:46)
    at org.apache.felix.http.base.internal.dispatch.HttpFilterChain.doFilter(HttpFilterChain.java:31)
    at org.apache.felix.http.sslfilter.internal.SslFilter.doFilter(SslFilter.java:89)
    at org.apache.felix.http.base.internal.handler.FilterHandler.doHandle(FilterHandler.java:108)
    at org.apache.felix.http.base.internal.handler.FilterHandler.handle(FilterHandler.java:80)
    at org.apache.felix.http.base.internal.dispatch.InvocationFilterChain.doFilter(InvocationFilterChain.java:46)
    at org.apache.felix.http.base.internal.dispatch.HttpFilterChain.doFilter(HttpFilterChain.java:31)
    at org.apache.sling.security.impl.ReferrerFilter.doFilter(ReferrerFilter.java:290)
    at org.apache.felix.http.base.internal.handler.FilterHandler.doHandle(FilterHandler.java:108)
    at org.apache.felix.http.base.internal.handler.FilterHandler.handle(FilterHandler.java:80)
    at org.apache.felix.http.base.internal.dispatch.InvocationFilterChain.doFilter(InvocationFilterChain.java:46)
    at org.apache.felix.http.base.internal.dispatch.HttpFilterChain.doFilter(HttpFilterChain.java:31)
    at org.apache.sling.featureflags.impl.FeatureManager.doFilter(FeatureManager.java:115)
    at org.apache.felix.http.base.internal.handler.FilterHandler.doHandle(FilterHandler.java:108)
    at org.apache.felix.http.base.internal.handler.FilterHandler.handle(FilterHandler.java:80)
    at org.apache.felix.http.base.internal.dispatch.InvocationFilterChain.doFilter(InvocationFilterChain.java:46)
    at org.apache.felix.http.base.internal.dispatch.HttpFilterChain.doFilter(HttpFilterChain.java:31)
    at org.apache.sling.engine.impl.log.RequestLoggerFilter.doFilter(RequestLoggerFilter.java:75)
    at org.apache.felix.http.base.internal.handler.FilterHandler.doHandle(FilterHandler.java:108)
    at org.apache.felix.http.base.internal.handler.FilterHandler.handle(FilterHandler.java:80)
    at org.apache.felix.http.base.internal.dispatch.InvocationFilterChain.doFilter(InvocationFilterChain.java:46)
    at org.apache.felix.http.base.internal.dispatch.HttpFilterChain.doFilter(HttpFilterChain.java:31)
    at org.apache.felix.http.base.internal.dispatch.FilterPipeline.dispatch(FilterPipeline.java:76)
    at org.apache.felix.http.base.internal.dispatch.Dispatcher.dispatch(Dispatcher.java:49)
    at org.apache.felix.http.base.internal.DispatcherServlet.service(DispatcherServlet.java:67)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:370)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:745)

 

Any help in this regard would be really appreciated.

Thanks

Adobe Employee
October 16, 2015

There is no information on the AEM version, but since the errors reference the Oak segment store, it is a 6.0 with TarMK. Tar optimization does not work for TarMK; it is for TarPM, so the error from tar optimize is expected, though a more useful message would have been helpful. As already suggested, opening a Daycare ticket is your best option.
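The same check can be made from the file system. As a minimal sketch, assuming only the layout visible in the log lines quoted in this thread (a `segmentstore` folder containing `data*.tar` files), the presence of that folder suggests TarMK; the helper name `looks_like_tarmk` is hypothetical, and the check is a heuristic rather than an official test.

```python
from pathlib import Path


def looks_like_tarmk(repository_dir):
    """Heuristic: True if repository_dir contains segmentstore/data*.tar files.

    Based only on the paths seen in the TarReader warnings in this thread,
    e.g. .../crx-quickstart/repository/segmentstore/data00204a.tar.
    """
    segmentstore = Path(repository_dir) / "segmentstore"
    return segmentstore.is_dir() and any(segmentstore.glob("data*.tar"))
```

If `looks_like_tarmk("/opt/aem/crx-quickstart/repository")` returns True, startTarOptimization() can be expected to fail as shown in the stack trace, because it targets TarPM.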

Offline compaction may work.  Have a look at: http://docs.adobe.com/docs/en/aem/6-0/deploy/upgrade/microkernels-in-aem-6-0.html  
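For reference, offline compaction is run with the oak-run tool's `compact` mode against the segment store while the instance is stopped. The sketch below only builds and runs that command line; the jar file name is a placeholder, the oak-run version has to match the Oak version of the instance, and taking a repository backup first is assumed.

```python
import subprocess


def compaction_command(oak_run_jar, segmentstore_dir):
    """Build the command line for oak-run offline compaction."""
    # oak_run_jar is whatever oak-run jar matches the instance's Oak version;
    # no particular file name or version is implied here.
    return ["java", "-jar", str(oak_run_jar), "compact", str(segmentstore_dir)]


def run_offline_compaction(oak_run_jar, segmentstore_dir):
    """Run compaction; AEM must be stopped and backed up before calling this."""
    return subprocess.run(
        compaction_command(oak_run_jar, segmentstore_dir), check=True
    )
```

With the paths from this thread, the call would be `run_offline_compaction("oak-run.jar", "/opt/aem/crx-quickstart/repository/segmentstore")`, where `oak-run.jar` stands for the actual jar file.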

Lokesh_Shivalingaiah
Level 10
October 16, 2015

The Link Checker fix will help to a certain extent. But as @Sham mentioned, this needs more analysis, so please do file a Daycare ticket.

Runal_Trivedi
Accepted solution
Level 6
October 16, 2015

In that case I would suggest taking a repository backup and re-indexing the repository. I am not 100% sure, but it might fix your TAR-related issues. Follow this article to re-index your repo: http://www.wemblog.com/2011/12/how-to-rebuild-index-in-cq5-wem.html

- Runal