SOLVED

Production AEM Publisher Instances taking 4 hours to restart

Former Community Member

Hi guys,

I am facing a strange issue here. Our production setup consists of one author instance, one dispatcher, and two publishers (1 and 2).

Each AEM instance runs on its own AWS x3 large server.

Even on those high-performance machines, the AEM instances take 4 hours to restart.

Our production servers have been running for the last 6 months. The crx-repository folder is around 10 GB, with around 4 GB of content.

 

From our debugging, we observed that during restart the Java process spends most of its time in the Java packages listed below.

We went to the system profiler at http://<Publisher1>/system/console/profiler and ran quite a few traces. The Author instance starts OK (within a couple of minutes) and then loops around these three packages:

org.apache.jackrabbit.oak.plugins.segment

org.eclipse.jetty.io.nio

org.eclipse.jetty.server.nio

 

We have 13066 lines of the form:

30.03.2015 09:41:05.157 *WARN* [FelixStartLevel] org.apache.jackrabbit.oak.plugins.segment.file.TarReader Invalid entry checksum at offset 0 in tar file /opt/aem/crx-quickstart/repository/segmentstore/data00204a.tar, skipping...

(The bits in red change with each message.) These run for about 25 seconds, so they can't be the direct cause of the delay.

We also have 129 lines of the form:

30.03.2015 09:41:31.885 *WARN* [FelixStartLevel] org.apache.jackrabbit.oak.plugins.segment.file.TarReader Invalid graph metadata in tar file /opt/aem/crx-quickstart/repository/segmentstore/data00203b.tar
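
To see which tar files these warnings point at, the log can be tallied per file. Here is a minimal Python sketch, assuming the messages look like the two samples quoted above. The regular expression is derived only from those two lines, so other message variants may not match, and the name count_tar_warnings is just illustrative:

```python
import re
from collections import Counter

# Matches the two TarReader warnings quoted in the post and captures
# the warning kind and the tar file path. Derived from those two sample
# lines only; other variants of the message may exist.
TAR_WARNING = re.compile(
    r"\*WARN\*.*TarReader (Invalid entry checksum|Invalid graph metadata)"
    r".*?in tar file (\S+\.tar)"
)


def count_tar_warnings(lines):
    """Return a Counter keyed by (warning kind, tar file path)."""
    counts = Counter()
    for line in lines:
        match = TAR_WARNING.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# The two sample lines from the post, used here instead of a real log file.
SAMPLE = [
    "30.03.2015 09:41:05.157 *WARN* [FelixStartLevel] "
    "org.apache.jackrabbit.oak.plugins.segment.file.TarReader "
    "Invalid entry checksum at offset 0 in tar file "
    "/opt/aem/crx-quickstart/repository/segmentstore/data00204a.tar, skipping...",
    "30.03.2015 09:41:31.885 *WARN* [FelixStartLevel] "
    "org.apache.jackrabbit.oak.plugins.segment.file.TarReader "
    "Invalid graph metadata in tar file "
    "/opt/aem/crx-quickstart/repository/segmentstore/data00203b.tar",
]

sample_counts = count_tar_warnings(SAMPLE)
for (kind, tar_file), n in sorted(sample_counts.items()):
    print(n, kind, tar_file)
```

For a real log, pass an open file object instead of the sample list (the location of the log file depends on the installation). If most of the warnings cluster on a few tar files, that is useful detail to include in a support ticket.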

Are there any cleanup jobs we need to run on our production publisher instances? Does something accumulate over time in an AEM instance that we need to clean up periodically?

Any help in this regard would be really helpful.

Thanks

1 Accepted Solution

Correct answer by
Community Advisor

Then I would suggest taking a repository backup and re-indexing the repository. I am not 100% sure, but it might fix your TAR-related issues. Follow this article to re-index your repo: http://www.wemblog.com/2011/12/how-to-rebuild-index-in-cq5-wem.html

- Runal

10 Replies

Level 1

It seems you have corrupted content (TAR files) in the repository for some reason. The errors you mentioned indicate that "data00204a.tar" and "data00203b.tar" are not properly readable.

Employee Advisor

I agree with @ShamHC and @bsloki: please raise a Daycare ticket and provide the relevant data, such as:

  • log files
  • the config zip from the OSGi console
  • the version number plus any hotfixes/service packs applied

I think this forum is not the right place to support you and analyze production problems. That's what support is for.

kind regards,
Jörg

Level 10

This needs some deep analysis. File a Daycare ticket and attach thread dumps, log files, and profiler output to it. For now, try [1].

[1] https://helpx.adobe.com/experience-manager/kb/prevent-rapid-repository-growth-caused-by-linkchecker-...

Former Community Member

Hi Guys ,

I tried the solution below but saw no change. However, applying the SP2 update worked and improved the behaviour.

Is that the reason? What is the core reason our AEM instance takes so long to come up?

Community Advisor

I assume your server administrators have been running TAR optimization regularly. The invalid reference errors in the TAR files should go away once you have run TAR optimization. If you have not already tried it, I would advise running it once to see whether it reduces your start time.

Go through the following links to learn more about performance tuning and optimization:

http://docs.adobe.com/docs/en/cq/5-6-1/core/administering/persistence_managers.html#Optimizing%20Tar...

https://helpx.adobe.com/experience-manager/kb/performancetuningtips.html

http://docs.adobe.com/docs/en/cq/5-6-1/deploying/monitoring_and_maintaining.html#Monitoring%20Perfor...

Former Community Member

Thanks Sham.

Can we apply the Link Checker fixes in our staging environment directly?

I also observed messages in some of the trace logs for our website such as "Has reach maximum limit of 500 for twitter links", "Has reach maximum limit of 500 for facebook links" and "Has reach maximum limit of 500 for Google plus links".

So is this also a major cause of our production server restart delays?

And will applying the Link Checker fix (https://helpx.adobe.com/experience-manager/kb/prevent-rapid-repository-growth-caused-by-linkchecker-...) resolve our issues?

Please let me know.

 

Thanks

Former Community Member

Thanks for your response.

It's really helpful.

 

When I tried to run TAR optimization manually by following the steps below, I got the error shown after them.

Manually optimizing tar files using the JMX Console

  1. Open the CQ Web Console and click the JMX item in the Main menu (http://localhost:4502/system/console/jmx).

  2. Click the Repository MBean for the com.adobe.granite domain (http://localhost:4502/system/console/jmx/com.adobe.granite%3Atype%3DRepository).

  3. Click startTarOptimization() (Start Tar PM optimization).

  4. To stop the optimization process, click stopTarOptimization() (Stop Tar PM optimization).

 

ERROR:

javax.jcr.UnsupportedRepositoryOperationException
    at com.day.crx.sling.server.impl.jmx.ManagedRepository.startTarOptimization(ManagedRepository.java:201)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:75)
    at sun.reflect.GeneratedMethodAccessor185.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:279)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
    at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
    at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
    at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
    at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
    at javax.management.StandardMBean.invoke(StandardMBean.java:405)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
    at com.adobe.granite.jmx.internal.JMXConsolePlugin.invoke(JMXConsolePlugin.java:176)
    at com.adobe.granite.jmx.internal.JMXConsolePlugin.doPost(JMXConsolePlugin.java:134)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:641)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
    at org.apache.felix.webconsole.internal.servlet.OsgiManager.service(OsgiManager.java:526)
    at org.apache.felix.webconsole.internal.servlet.OsgiManager.service(OsgiManager.java:450)
    at org.apache.felix.http.base.internal.handler.ServletHandler.doHandle(ServletHandler.java:339)
    at org.apache.felix.http.base.internal.handler.ServletHandler.handle(ServletHandler.java:300)
    at org.apache.felix.http.base.internal.dispatch.ServletPipeline.handle(ServletPipeline.java:93)
    at org.apache.felix.http.base.internal.dispatch.InvocationFilterChain.doFilter(InvocationFilterChain.java:50)
    at org.apache.felix.http.base.internal.dispatch.HttpFilterChain.doFilter(HttpFilterChain.java:31)
    at org.apache.sling.i18n.impl.I18NFilter.doFilter(I18NFilter.java:128)
    at org.apache.felix.http.base.internal.handler.FilterHandler.doHandle(FilterHandler.java:108)
    at org.apache.felix.http.base.internal.handler.FilterHandler.handle(FilterHandler.java:80)
    at org.apache.felix.http.base.internal.dispatch.InvocationFilterChain.doFilter(InvocationFilterChain.java:46)
    at org.apache.felix.http.base.internal.dispatch.HttpFilterChain.doFilter(HttpFilterChain.java:31)
    at com.adobe.granite.license.impl.LicenseCheckFilter.doFilter(LicenseCheckFilter.java:300)
    at org.apache.felix.http.base.internal.handler.FilterHandler.doHandle(FilterHandler.java:108)
    at org.apache.felix.http.base.internal.handler.FilterHandler.handle(FilterHandler.java:80)
    at org.apache.felix.http.base.internal.dispatch.InvocationFilterChain.doFilter(InvocationFilterChain.java:46)
    at org.apache.felix.http.base.internal.dispatch.HttpFilterChain.doFilter(HttpFilterChain.java:31)
    at org.apache.felix.http.sslfilter.internal.SslFilter.doFilter(SslFilter.java:55)
    at org.apache.felix.http.base.internal.handler.FilterHandler.doHandle(FilterHandler.java:108)
    at org.apache.felix.http.base.internal.handler.FilterHandler.handle(FilterHandler.java:80)
    at org.apache.felix.http.base.internal.dispatch.InvocationFilterChain.doFilter(InvocationFilterChain.java:46)
    at org.apache.felix.http.base.internal.dispatch.HttpFilterChain.doFilter(HttpFilterChain.java:31)
    at org.apache.felix.http.sslfilter.internal.SslFilter.doFilter(SslFilter.java:89)
    at org.apache.felix.http.base.internal.handler.FilterHandler.doHandle(FilterHandler.java:108)
    at org.apache.felix.http.base.internal.handler.FilterHandler.handle(FilterHandler.java:80)
    at org.apache.felix.http.base.internal.dispatch.InvocationFilterChain.doFilter(InvocationFilterChain.java:46)
    at org.apache.felix.http.base.internal.dispatch.HttpFilterChain.doFilter(HttpFilterChain.java:31)
    at org.apache.sling.security.impl.ReferrerFilter.doFilter(ReferrerFilter.java:290)
    at org.apache.felix.http.base.internal.handler.FilterHandler.doHandle(FilterHandler.java:108)
    at org.apache.felix.http.base.internal.handler.FilterHandler.handle(FilterHandler.java:80)
    at org.apache.felix.http.base.internal.dispatch.InvocationFilterChain.doFilter(InvocationFilterChain.java:46)
    at org.apache.felix.http.base.internal.dispatch.HttpFilterChain.doFilter(HttpFilterChain.java:31)
    at org.apache.sling.featureflags.impl.FeatureManager.doFilter(FeatureManager.java:115)
    at org.apache.felix.http.base.internal.handler.FilterHandler.doHandle(FilterHandler.java:108)
    at org.apache.felix.http.base.internal.handler.FilterHandler.handle(FilterHandler.java:80)
    at org.apache.felix.http.base.internal.dispatch.InvocationFilterChain.doFilter(InvocationFilterChain.java:46)
    at org.apache.felix.http.base.internal.dispatch.HttpFilterChain.doFilter(HttpFilterChain.java:31)
    at org.apache.sling.engine.impl.log.RequestLoggerFilter.doFilter(RequestLoggerFilter.java:75)
    at org.apache.felix.http.base.internal.handler.FilterHandler.doHandle(FilterHandler.java:108)
    at org.apache.felix.http.base.internal.handler.FilterHandler.handle(FilterHandler.java:80)
    at org.apache.felix.http.base.internal.dispatch.InvocationFilterChain.doFilter(InvocationFilterChain.java:46)
    at org.apache.felix.http.base.internal.dispatch.HttpFilterChain.doFilter(HttpFilterChain.java:31)
    at org.apache.felix.http.base.internal.dispatch.FilterPipeline.dispatch(FilterPipeline.java:76)
    at org.apache.felix.http.base.internal.dispatch.Dispatcher.dispatch(Dispatcher.java:49)
    at org.apache.felix.http.base.internal.DispatcherServlet.service(DispatcherServlet.java:67)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:501)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:370)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:644)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:745)

 

Any help in this regard would be really helpful.

Thanks

Level 2

There is no information on the AEM version, but judging by the errors referencing the Oak segment store, it is 6.0 with TarMK. Tar optimization does not work for TarMK; it is for TarPM, so the error from Tar optimization is expected, though a clearer error message would have been more helpful. As already suggested, opening a Daycare ticket is your best option.

Offline compaction may work.  Have a look at: http://docs.adobe.com/docs/en/aem/6-0/deploy/upgrade/microkernels-in-aem-6-0.html  
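
If you do try offline compaction, it may help to record the segment store size before and after. Here is a rough Python sketch, assuming the instance is stopped, a backup exists, and an oak-run jar matching the instance's Oak version is available. The jar file name is a placeholder, and the `compact` run mode is an assumption that should be checked against the documentation for your oak-run version before running anything:

```python
from pathlib import Path


def segmentstore_size_bytes(segmentstore_dir):
    """Sum the sizes of the *.tar files directly inside the segment store."""
    return sum(p.stat().st_size for p in Path(segmentstore_dir).glob("*.tar"))


def compaction_command(oak_run_jar, segmentstore_dir):
    """Build (but do not run) an oak-run offline compaction command line.

    Assumption: the oak-run jar offers a `compact` run mode that takes the
    segment store directory as its argument.
    """
    return ["java", "-jar", str(oak_run_jar), "compact", str(segmentstore_dir)]


# Placeholder values: the jar name is hypothetical, and the path is the one
# that appears in the log lines of the original post.
SEGMENTSTORE = "/opt/aem/crx-quickstart/repository/segmentstore"
command = compaction_command("oak-run.jar", SEGMENTSTORE)
print(" ".join(command))
```

The sketch only prints the command so that it can be reviewed first. Calling segmentstore_size_bytes on the segment store directory before and after a compaction run would show whether it actually reclaimed space.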

Level 10

Link Checker will help to a certain extent. But as @Sham mentioned, this needs more analysis, so please do file a Daycare ticket.
