SOLVED

AEM 6.5 Upgrade to 6.5.7 CFP Causing Unresponsive Instances

Level 5

We've recently upgraded AEM from 6.5.5 to 6.5.7 (didn't install 6.5.6) - running on Ubuntu 18.04.5, Java 11.

We have been pretty stable over the last 8-9 months, after making some performance adjustments when migrating from 6.3 on Windows to 6.5 on Ubuntu, and rarely had to restart. However, since we upgraded from SP 6.5.5 to 6.5.7 (we bypassed 6.5.6, which SHOULDN'T matter), we've noticed really poor performance: the author instance needs a restart nearly every 2-3 days and the publish instances every 5-6 days because they become unresponsive. No heap dumps or OutOfMemory exceptions.

I'm just wondering if anybody else knows of any obvious reasons this could be happening before I bog myself down in a sea of heap dumps and thread dumps.

Topics

6.5
1 Accepted Solution

Correct answer by
Level 3

@sdouglasmc we also got the same package from Adobe, and it resolved our issue.

Notes from adobe:

Checking the thread dumps and researching further internally, it seems you are running into a known issue (CQ-4312194) with SP7 where numerous threads get blocked due to a Timer with the Component Registry (org.apache.felix.scr.impl.ComponentRegistry). This causes the instance to become unresponsive.

  • Follow the steps below to resolve the issue:
    - Install the attached hotfix package.
    - This will trigger a restart of a couple of bundles, so wait 3-5 minutes.
    - Go to <host>:<port>/system/console and make sure the "org.apache.felix.scr" version is updated to 2.1.20.
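To script that last check instead of reading the console by eye, a minimal Java sketch follows. It assumes OSGi-style version strings (major.minor.micro with an optional dot-separated qualifier) copied from the console; `ScrVersionCheck` and `isAtLeast` are hypothetical names, and 2.1.20 is simply the version quoted in Adobe's note.

```java
/**
 * Hypothetical helper: is the org.apache.felix.scr version shown in the
 * Felix web console at least the 2.1.20 quoted in Adobe's hotfix note?
 * Assumes OSGi-style versions: major.minor.micro[.qualifier].
 */
public class ScrVersionCheck {

    static final String HOTFIX_SCR_VERSION = "2.1.20";

    // Parse major.minor.micro; missing parts count as 0, any qualifier is ignored.
    static int[] numericParts(String version) {
        String[] parts = version.trim().split("\\.", 4);
        int[] nums = new int[3];
        for (int i = 0; i < 3 && i < parts.length; i++) {
            nums[i] = Integer.parseInt(parts[i]);
        }
        return nums;
    }

    // True if actual >= required, comparing major, then minor, then micro.
    static boolean isAtLeast(String actual, String required) {
        int[] a = numericParts(actual);
        int[] r = numericParts(required);
        for (int i = 0; i < 3; i++) {
            if (a[i] != r[i]) {
                return a[i] > r[i];
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Pass the version copied from the console, e.g. "2.1.20".
        String installed = args.length > 0 ? args[0] : HOTFIX_SCR_VERSION;
        boolean ok = isAtLeast(installed, HOTFIX_SCR_VERSION);
        System.out.println("org.apache.felix.scr " + installed
                + (ok ? " is at or above " : " is below ") + HOTFIX_SCR_VERSION);
    }
}
```

Comparing the parts numerically matters here: a plain string comparison would rank "2.1.9" above "2.1.20".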

32 Replies

Employee Advisor

Can you check the logs for any session leaks or slow, unresponsive queries? Do you see any errors in the logs? What do you see in the health check dashboards? Any patterns of high memory, CPU, or disk utilization?

Level 5
My question is more about whether anybody else has experienced anything of the sort after upgrading to the latest service pack. CPU usage is just fine. Previously it would typically run higher during large activations and large asset uploads, but that is about it.

Employee Advisor
I just learned that someone else is seeing a similar performance problem with 6.5.7 running on RHEL. They have logged a ticket with Adobe Support.

Level 2

Can you check the thread dumps? I've noticed AEM 6.5.7 hanging on the following deadlock in my environments:

   java.lang.Thread.State: WAITING (on object monitor)
        at java.base@11.0.9/jdk.internal.misc.Unsafe.park(Native Method)
        - waiting to lock <0x1d24408f> (a java.util.concurrent.CountDownLatch$Sync) owned by "null" tid=0x-1
        at java.base@11.0.9/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
        at java.base@11.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
        at java.base@11.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1039)
        at java.base@11.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1345)
        at java.base@11.0.9/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:232)
        at org.apache.felix.framework.ServiceRegistry.getService(ServiceRegistry.java:365)
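To check a saved dump for this pattern, here is a minimal Java sketch that counts thread blocks containing both the CountDownLatch.await frame and the ServiceRegistry.getService frame. It is a plain text match, assuming the usual jstack layout with threads separated by blank lines; `LatchWaitScanner` is a hypothetical name, and a match only suggests FELIX-6252 rather than proving it. A text scan is used because the latch has no owning thread (note owned by "null" above), so the JVM's built-in deadlock detection is unlikely to flag it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical triage helper: counts threads in a saved thread dump that are
 * parked in CountDownLatch.await() called from Felix's ServiceRegistry.getService(),
 * the pattern posted above. Text matching only; it does not prove the root cause.
 */
public class LatchWaitScanner {

    static final String LATCH_AWAIT = "java.util.concurrent.CountDownLatch.await";
    static final String GET_SERVICE = "org.apache.felix.framework.ServiceRegistry.getService";

    // Split the dump into per-thread blocks; jstack separates threads with blank lines.
    static List<String> threadBlocks(String dump) {
        List<String> blocks = new ArrayList<>();
        for (String block : dump.split("\\R\\s*\\R")) {
            if (!block.isBlank()) {
                blocks.add(block);
            }
        }
        return blocks;
    }

    // A thread is suspect when both frames appear in its block.
    static long countSuspectThreads(String dump) {
        return threadBlocks(dump).stream()
                .filter(b -> b.contains(LATCH_AWAIT) && b.contains(GET_SERVICE))
                .count();
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a thread dump saved as text (e.g. jstack output).
        String dump = Files.readString(Path.of(args[0]));
        System.out.println(countSuspectThreads(dump)
                + " thread(s) waiting on a latch in ServiceRegistry.getService");
    }
}
```

Taking a few dumps some seconds apart and seeing the same threads stuck on the same frames is a stronger signal than a single snapshot.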

Level 5
They've sent me Hotfix-CQ-4312194-FELIX-6252-1.0.zip. I've installed it, but I'm not sure yet whether it is working. Traffic is lower right now, but we were also getting killed on publish after replications, when the dispatcher would flush.

Level 5
Supposedly it's Felix issue FELIX-6252. My production author Touch UI Sites screen now takes about 1.2 s per entry to load, so 40 items take about 48 s. Messing around with the Felix framework should not be a service pack fix; that is version-level stuff IMO (6.6.0).

Level 5
Another thing I've noticed since the upgrade: Package Manager keeps displaying "Updating package" when creating a new package. Refreshing the screen shows the package was created, and the logs confirm it.

Level 2
Thanks, it looks like FELIX-6252 is what I'm observing in our environments. I'm raising a support ticket for the hotfix.

Level 5
Could you please keep me updated on how it works for you?

Level 2
So what about all the other customers? There is nothing mentioned on Adobe's service pack release page. Are we all supposed to run into this problem and then ask Support for a hotfix package?

Level 5
royteeuwen, are you running into the same issues?

Level 2
Not yet. We are holding off on upgrading because I saw this thread, and the same remark from another company in the AEM Tech Slack.

Level 5
So the Touch UI screen issue was totally my fault, it seems. To debug the initial problem, I had created a logger on SessionDelegate set to DEBUG. That was not good. Removing it sped things up tremendously. That being said, the hotfix does seem to fix the initial problem.

Level 2

Here are my observations after multiple rounds of testing:

  1. The fix seems to work; I did not manage to reproduce the issue.
  2. Deploying the hotfix reloads a lot of bundles (go figure, it's Felix) and causes a significant amount of time of unavailability.
  3. In one case, the original issue presented itself after installing 6.5.7 but before the hotfix was applied, on a machine with no load.

Level 5
We have 1 author and 3 publish instances. I applied the hotfix to the author and 2 of the publish instances. We never experienced the issue again on any of the updated systems, but DID again on the un-hotfixed one. After hotfixing that one too, no issues there either. The hotfix took ~7 minutes to stabilize. Sounds like we were in the same boat. Good to know!
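When rolling the hotfix across instances, it can help to wait until each one has settled before sending traffic back. Below is a rough Java sketch that polls the Felix web console until every bundle is active. It assumes /system/console/bundles.json is reachable with basic auth and contains an "s" array of [total, active, active fragments, resolved, installed] counts; that layout is an assumption to verify on your instance, and `BundleSettleWait`, the 15-minute limit, and the 20-second interval are hypothetical choices. If some bundles are legitimately inactive on your instance, this will simply time out.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical post-install check: polls the Felix web console until all
 * bundles are active (or active fragments).
 * ASSUMPTION: bundles.json carries "s":[total, active, fragments, resolved, installed].
 */
public class BundleSettleWait {

    private static final Pattern COUNTS = Pattern.compile(
            "\"s\"\\s*:\\s*\\[\\s*(\\d+)\\s*,\\s*(\\d+)\\s*,\\s*(\\d+)\\s*,\\s*(\\d+)\\s*,\\s*(\\d+)\\s*\\]");

    // True when active bundles plus active fragments account for every bundle.
    static boolean allBundlesSettled(String bundlesJson) {
        Matcher m = COUNTS.matcher(bundlesJson);
        if (!m.find()) {
            return false; // unexpected format (e.g. a login page): treat as not settled
        }
        int total = Integer.parseInt(m.group(1));
        int active = Integer.parseInt(m.group(2));
        int fragments = Integer.parseInt(m.group(3));
        return active + fragments == total;
    }

    public static void main(String[] args) throws Exception {
        String baseUrl = args[0];     // e.g. http://<host>:<port> (placeholder)
        String credentials = args[1]; // <user>:<password> (placeholder)
        String auth = "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/system/console/bundles.json"))
                .header("Authorization", auth)
                .timeout(Duration.ofSeconds(30))
                .build();

        long deadline = System.nanoTime() + Duration.ofMinutes(15).toNanos();
        while (System.nanoTime() < deadline) {
            try {
                HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
                if (response.statusCode() == 200 && allBundlesSettled(response.body())) {
                    System.out.println("All bundles active");
                    return;
                }
            } catch (IOException e) {
                // The console may be unreachable while bundles restart; keep polling.
            }
            Thread.sleep(Duration.ofSeconds(20).toMillis());
        }
        System.err.println("Bundles did not settle within 15 minutes");
        System.exit(1);
    }
}
```

All bundles being active only means the restart has finished; it does not by itself show the deadlock is gone, so keep an eye on thread dumps afterwards.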

Employee Advisor
Thanks, guys. We hit the same issue in our environment yesterday. One of the publish instances went unresponsive, and the thread dumps point to FELIX-6252. Adobe has provided the same hotfix to us. Please let us know if you find any issues with the hotfix in your testing. It seems all AEM installations are impacted by this, and Adobe should update their 6.5.7 release page.

Level 2
We are in the same boat, and Adobe is not confirming whether the hotfix will actually resolve the problem. They told us some customers are experiencing problems even after the hotfix. Can any of you confirm whether the hotfix stabilized your systems and that you are no longer seeing the deadlock?