SOLVED

AEM 6.5 Upgrade to 6.5.7 CFP Causing Unresponsive Instances

sdouglasmc
Level 4

We recently upgraded AEM from 6.5.5 to 6.5.7 (we didn't install 6.5.6), running on Ubuntu 18.04.5 with Java 11.

We had been pretty stable for the last 8-9 months after making some performance adjustments while migrating from 6.3 on Windows to 6.5 on Ubuntu, rarely having to restart. However, since upgrading from SP 6.5.5 to 6.5.7 (we bypassed 6.5.6, which SHOULDN'T matter), we've noticed really poor performance: the author instance needs a restart nearly every 2-3 days and the publish instances every 5-6 days because they become unresponsive. No heap dumps or OutOfMemory exceptions.

I'm just wondering if anybody else knows of any obvious reason this could be happening before I bog myself down in a sea of heap dumps and thread dumps.

AEM 6.5.7
1 Accepted Solution
narayana_chirra
Correct answer by
Level 3

@sdouglasmc we also got the same package from Adobe, and it resolved our issue.

Notes from Adobe:

After checking the thread dumps and researching further internally, it seems you are running into a known issue (CQ-4312194) with SP7 where numerous threads get blocked due to a Timer with the Component Registry (org.apache.felix.scr.impl.ComponentRegistry). This causes the instance to become unresponsive.

  • Follow the steps below to resolve the issue:
    - Install the attached hotfix package.
    - This will trigger a restart of a couple of bundles, so wait 3-5 minutes.
    - Go to <host>:<port>/system/console and make sure the "org.apache.felix.scr" version is updated to 2.1.20.
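
If you have several instances to check, the last step can be scripted. Below is a minimal sketch, not an official Adobe tool: it assumes you have already fetched the bundle list as JSON from the web console (I'd expect something like <host>:<port>/system/console/bundles.json, but verify that on your instance) and that the payload has a top-level "data" list whose entries carry "symbolicName" and "version". The function names are made up for illustration.

```python
import json


def parse_version(text):
    """Turn '2.1.20' into (2, 1, 20); anything after major.minor.micro is ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        if not piece.isdigit():
            break
        parts.append(int(piece))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def scr_is_patched(payload, symbolic_name="org.apache.felix.scr", minimum="2.1.20"):
    """Return True if the named bundle in the JSON payload is at least `minimum`.

    The payload shape ({"data": [{"symbolicName": ..., "version": ...}]}) is an
    assumption about the console output, not something confirmed in this thread.
    """
    data = json.loads(payload).get("data", [])
    for bundle in data:
        if bundle.get("symbolicName") == symbolic_name:
            return parse_version(bundle.get("version", "0")) >= parse_version(minimum)
    return False  # bundle not found in the payload
```

Comparing versions as integer tuples avoids the string-comparison trap where "2.1.9" would sort after "2.1.20".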


32 Replies
kunal23
Level 10

Can you check the logs for session leaks or slow, unresponsive queries? Do you see any errors in the logs? What do you see in the health check dashboards? Any patterns of high memory or CPU consumption, or high disk utilization?

sdouglasmc
Level 4
My question is more around trying to understand if anybody else has experienced anything of the sort after upgrading to the latest service pack. CPU usage is just fine. Typically before it would ride higher on large activations and large asset uploads but that is about it.
kunal23
Level 10
Just got to know that someone else is seeing a similar performance problem with 6.5.7 running on RHEL. They have logged a ticket with Adobe support.
Jakub_G
Level 2

Can you check the thread dumps? I've noticed AEM 6.5.7 hanging on the following deadlock in my environments:

   java.lang.Thread.State: WAITING (on object monitor)
        at java.base@11.0.9/jdk.internal.misc.Unsafe.park(Native Method)
        - waiting to lock <0x1d24408f> (a java.util.concurrent.CountDownLatch$Sync) owned by "null" tid=0x-1
        at java.base@11.0.9/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
        at java.base@11.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
        at java.base@11.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1039)
        at java.base@11.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1345)
        at java.base@11.0.9/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:232)
        at org.apache.felix.framework.ServiceRegistry.getService(ServiceRegistry.java:365)
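
To see how many threads are stuck like this, a rough sketch along these lines can help. It assumes a plain-text thread dump (for example from jstack) in which threads are separated by blank lines, and it only looks for the two frames from the trace above; the function names are made up for illustration.

```python
import re

# Frames taken from the stack trace quoted above.
LATCH_FRAME = "java.util.concurrent.CountDownLatch.await"
REGISTRY_FRAME = "org.apache.felix.framework.ServiceRegistry.getService"


def split_threads(dump_text):
    """Split a thread dump into per-thread blocks (assumed blank-line separated)."""
    blocks = re.split(r"\n\s*\n", dump_text.strip())
    return [block for block in blocks if block.strip()]


def count_registry_waiters(dump_text):
    """Count threads parked on a latch inside ServiceRegistry.getService."""
    return sum(
        1
        for block in split_threads(dump_text)
        if LATCH_FRAME in block and REGISTRY_FRAME in block
    )
```

A count that keeps growing across dumps taken a few minutes apart would be consistent with the hang described here, but it is only a hint, not proof of the same root cause.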
sdouglasmc
Level 4
They've sent me Hotfix-CQ-4312194-FELIX-6252-1.0.zip. I've installed it, but I'm not sure yet whether it is working. Traffic is lower right now, but we were also getting killed on publish after replications, when the dispatcher would flush.
sdouglasmc
Level 4
Supposedly it's Felix issue FELIX-6252. My production author Touch UI Sites screen now takes about 1.2s per entry to load, so 40 items take about 48s. Messing around with the Felix framework should not be part of a service pack; IMO that's new-version stuff (6.6.0).
sdouglasmc
Level 4
Another thing I've noticed since the upgrade is the package manager will keep displaying "Updating package" when creating a new package. Refreshing the screen shows the package was created as do the logs.
Jakub_G
Level 2
Thanks, looks like FELIX-6252 is what I'm observing in our environments. I'm raising a support ticket for the hotfix.
sdouglasmc
Level 4
Please keep me updated on how it works for you.
royteeuwen
Level 2
So what about all the other customers? There is nothing mentioned on the Adobe service pack release page. Are we all supposed to run into this problem and then ask Support for a hotfix package?
royteeuwen
Level 2
Not yet; we are holding off on upgrading because I saw this thread and the same remark from another company in the AEM Tech Slack.
sdouglasmc
Level 4
So the Touch UI screen issue was totally my fault, it seems. In order to debug the initial problem, I created a logger on SessionDelegate set to DEBUG. That was not good. Removing it speeds things up tremendously. That being said, the hotfix seems to fix the initial problem.
Jakub_G
Level 2

Here are my observations after multiple rounds of testing:

  1. The fix seems to work; I did not manage to replicate the issue.
  2. Deploying the hotfix reloads a lot of bundles (go figure, it's Felix) and takes a significant amount of time, during which the instance is unavailable.
  3. I had one case where the original issue showed up after installing 6.5.7, before the hotfix was applied, on a machine without any load.
sdouglasmc
Level 4
We have 1 author and 3 publish instances. I deployed the hotfix on the author and 2 publish instances. We never experienced the issue again on any of the updated systems, but DID on the un-hotfixed one. After hot-fixing that one too, no more issues there either. The hotfix took ~7 minutes to stabilize. Sounds like we were in the same boat. Good to know!
kunal23
Level 10
Thanks guys. We hit the same issue in our environment yesterday. One of the publish instances went unresponsive, and the thread dumps point to FELIX-6252. Adobe has provided the same hotfix to us. Please let us know if you find any issues with the hotfix in your testing. It seems all AEM installations are impacted by this, and Adobe should update their 6.5.7 release page.
sanjeevkumart45
Level 2
We are in the same boat, and Adobe is not confirming whether the hotfix will actually resolve the problem. They told us some customers are experiencing problems even after the hotfix. Can any of you confirm that the hotfix stabilized the system and that you are not seeing the deadlock anymore?