Hi guys,
Over the past weekend, our production author instance became mostly unresponsive. Oddly, we could log in to the CRX console but not to Sites, Assets, or system/console. It didn't render pages either.
Restarting the instance brought it back up and it seems okay now, but obviously this makes us a bit nervous. It could have been bad news if a publish instance had gotten into this state. Potentially relevant logs are posted at the end. I can post more logs if needed.
We are currently implementing JMX monitoring through SolarWinds and would like to make sure that we set alerts appropriately so that we can catch such issues, but there are more than a thousand MBeans.
What are the best practices for what to monitor from a JMX perspective? We already have pretty good system-level monitoring set up through Dynatrace.
Any recommendations would be much appreciated!
2021-06-07 11:13:10,731 *ERROR* [FelixStartLevel] com.adobe.granite.cors bundle com.adobe.granite.cors:1.0.10.CQ650-B0002 (237)[com.adobe.granite.cors.impl.CORSPolicyImpl(745)] : The activate method has thrown an exception (org.osgi.service.component.ComponentException: Support Credentials is not allowed when Origin is set to Any (*).) org.osgi.service.component.ComponentException: Support Credentials is not allowed when Origin is set to Any (*). at com.adobe.granite.cors.impl.CORSPolicyImpl.activate(CORSPolicyImpl.java:204) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.apache.felix.scr.impl.inject.methods.BaseMethod.invokeMethod(BaseMethod.java:228) at org.apache.felix.scr.impl.inject.methods.BaseMethod.access$500(BaseMethod.java:41) at org.apache.felix.scr.impl.inject.methods.BaseMethod$Resolved.invoke(BaseMethod.java:664) at org.apache.felix.scr.impl.inject.methods.BaseMethod.invoke(BaseMethod.java:510)
2021-06-07 11:13:28,397 *ERROR* [FelixDispatchQueue] org.apache.felix.http.jetty FrameworkEvent ERROR (org.osgi.framework.ServiceException: Service factory returned null. (Component: com.adobe.granite.cors.impl.CORSFilter (744))) org.osgi.framework.ServiceException: Service factory returned null. (Component: com.adobe.granite.cors.impl.CORSFilter (744)) at org.apache.felix.framework.ServiceRegistrationImpl.getFactoryUnchecked(ServiceRegistrationImpl.java:381) at org.apache.felix.framework.ServiceRegistrationImpl.getService(ServiceRegistrationImpl.java:248) at org.apache.felix.framework.ServiceRegistry.getService(ServiceRegistry.java:350) at org.apache.felix.framework.Felix.getService(Felix.java:3954) at org.apache.felix.framework.BundleContextImpl$ServiceObjectsImpl.getService(BundleContextImpl.java:554)
2021-06-07 11:13:52,689 *ERROR* [FelixStartLevel] com.github.mickleroy.aem-sass-compiler bundle com.github.mickleroy.aem-sass-compiler:1.0.3 (617)[com.github.mickleroy.aem.sass.impl.SassCompilerImpl(4123)] : The activate method has thrown an exception (java.lang.UnsatisfiedLinkError: /hab/svc/author/data/tmp/libjsass-11549228781875460003/libjsass.so: libstdc++.so.6: cannot open shared object file: No such file or directory) java.lang.UnsatisfiedLinkError: /hab/svc/author/data/tmp/libjsass-11549228781875460003/libjsass.so: libstdc++.so.6: cannot open shared object file: No such file or directory at java.base/java.lang.ClassLoader$NativeLibrary.load0(Native Method) at java.base/java.lang.ClassLoader$NativeLibrary.load(ClassLoader.java:2430) at java.base/java.lang.ClassLoader$NativeLibrary.loadLibrary(ClassLoader.java:2487) at java.base/java.lang.ClassLoader.loadLibrary0(ClassLoader.java:2684)
2021-06-07 11:13:49,778 *ERROR* [FelixStartLevel] com.adobe.cq.dam.bp.cloudconfig.impl.MediaPortalCloudConfigurationListener exception occured in copying existing replication agents javax.jcr.AccessDeniedException: OakAccess0000: Access denied at org.apache.jackrabbit.oak.api.CommitFailedException.asRepositoryException(CommitFailedException.java:232) at org.apache.jackrabbit.oak.api.CommitFailedException.asRepositoryException(CommitFailedException.java:213) at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.newRepositoryException(SessionDelegate.java:669) at org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate.save(SessionDelegate.java:495)
2021-06-07 11:13:46,184 *ERROR* [FelixStartLevel] com.adobe.acs.acs-aem-commons-bundle bundle com.adobe.acs.acs-aem-commons-bundle:4.8.4 (580)[com.adobe.acs.commons.replication.packages.automatic.impl.ConfigurationUpdateListener(3481)] : The activate method has thrown an exception (java.lang.NullPointerException) java.lang.NullPointerException: null at com.adobe.acs.commons.util.ResourceServiceManager.refreshCache(ResourceServiceManager.java:142) at com.adobe.acs.commons.util.ResourceServiceManager.activate(ResourceServiceManager.java:74) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.apache.felix.scr.impl.inject.methods.BaseMethod.invokeMethod(BaseMethod.java:228) at org.apache.felix.scr.impl.inject.methods.BaseMethod.access$500(BaseMethod.java:41)
2021-06-07 11:13:46,184 *ERROR* [FelixStartLevel] com.adobe.acs.commons.replication.packages.automatic.impl.ConfigurationUpdateListener Exception allocating resource resolver org.apache.sling.api.resource.LoginException: Cannot derive user name for bundle com.adobe.acs.acs-aem-commons-bundle [580] and sub service automatic-package-replicator at org.apache.sling.resourceresolver.impl.ResourceResolverFactoryImpl.getServiceResourceResolver(ResourceResolverFactoryImpl.java:79) at com.adobe.acs.commons.replication.packages.automatic.impl.ConfigurationUpdateListener.getResourceResolver(ConfigurationUpdateListener.java:97) at com.adobe.acs.commons.replication.packages.automatic.impl.ConfigurationUpdateListener.getResourceResolver(ConfigurationUpdateListener.java:107) at com.adobe.acs.commons.util.ResourceServiceManager.refreshCache(ResourceServiceManager.java:140) at com.adobe.acs.commons.util.ResourceServiceManager.activate(ResourceServiceManager.java:74)
Hi @jkpanera!
From my perspective, there are two different areas to look at:
Performing the root cause analysis
When it comes to production outages, it's always a bit hard to do a proper root cause analysis, as you don't want to further affect your service's uptime and performance (e.g. by increasing log levels or attaching debuggers or other analysis tools). So the ideal scenario is that you can somehow reproduce the issue on a lower environment or a production clone. Is this a one-time issue or is it recurring?
First touch points for an analysis should be:
With Dynatrace in place, you should have deep insight into the JVM and the application and probably have a good starting point for your root cause analysis.
Looking at the error messages from your logs, I'm not sure if these are related in terms of a cause or if they are a consequence of the outage.
Looking at the monitoring part of your question, you should first identify the root cause and then add corresponding monitoring probes that will notify you about every aspect that may have led to the outage. There is no one-size-fits-all monitoring concept, as most outages are, in my experience, not caused by AEM product code but by a project's custom application development inside the AEM framework/stack.
However, I can provide a generic list of monitoring points that I usually recommend:
Apart from certain JMX checks, please also take a look at the existing and custom Sling Health Checks for in-depth application monitoring. See also Building Health Checks for AEM and this article for further details.
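To give an idea of what a basic JMX check could look like, here is a minimal sketch that reads a few standard JVM platform MBeans (`java.lang:type=Memory` and `java.lang:type=Threading`) and looks for deadlocked threads. It is only an illustration: the class name `JvmJmxProbe` and the 0.9 heap threshold are my own assumptions, and it queries the local JVM so that it runs standalone. For a real AEM instance you would point the same object names and attributes at the instance's remote JMX endpoint, for example from your SolarWinds JMX poller.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Hypothetical helper class; the name and thresholds are illustrative only.
public class JvmJmxProbe {

    /** Used heap as a fraction of the heap limit, read from java.lang:type=Memory. */
    public static double heapUsedRatio(MBeanServerConnection conn) throws Exception {
        CompositeData heap = (CompositeData) conn.getAttribute(
                new ObjectName("java.lang:type=Memory"), "HeapMemoryUsage");
        MemoryUsage usage = MemoryUsage.from(heap);
        // getMax() is -1 when no maximum is defined; fall back to committed memory.
        long limit = usage.getMax() > 0 ? usage.getMax() : usage.getCommitted();
        return (double) usage.getUsed() / limit;
    }

    /** Current number of live threads, read from java.lang:type=Threading. */
    public static int threadCount(MBeanServerConnection conn) throws Exception {
        return (Integer) conn.getAttribute(
                new ObjectName("java.lang:type=Threading"), "ThreadCount");
    }

    /** Number of threads currently involved in a deadlock (0 if none). */
    public static int deadlockedThreads(MBeanServerConnection conn) throws Exception {
        ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                conn, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
        long[] ids = threads.findDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    public static void main(String[] args) throws Exception {
        // Local JVM for demonstration. A remote instance would need a connection
        // obtained via javax.management.remote.JMXConnectorFactory instead.
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();

        double heap = heapUsedRatio(conn);
        System.out.printf("heap used: %.1f%%%n", heap * 100);
        System.out.println("threads: " + threadCount(conn));
        System.out.println("deadlocked threads: " + deadlockedThreads(conn));

        // Assumed alert threshold; tune it to your own baseline.
        if (heap > 0.9) {
            System.out.println("ALERT: heap usage above 90%");
        }
    }
}
```

These JVM-level values only tell you that the process is in trouble, not why. A persistently high heap ratio, a steadily growing thread count, or any deadlocked thread is usually a good trigger for capturing heap and thread dumps before restarting.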
Hope that helps!
@jkpanera What version of AEM are you running? Can you please check the community article below to see whether the issue described there is causing your instance to become unresponsive?
https://aem4beginner.blogspot.com/aem-65-upgrade-to-657-cfp-causing