AEM 6.5 Upgrade to 6.5.7 CFP Causing Unresponsive Instances

sdouglasmc
Level 2

18-12-2020

We've recently upgraded AEM from 6.5.5 to 6.5.7 (didn't install 6.5.6) - running on Ubuntu 18.04.5, Java 11.

We had been pretty stable for the last 8-9 months after making some performance adjustments while migrating from 6.3 on Windows to 6.5 on Ubuntu, rarely having to restart. However, since upgrading from SP 6.5.5 to 6.5.7 (we bypassed 6.5.6, which SHOULDN'T matter), we've noticed really poor performance: the author instance needs a restart nearly every 2-3 days and the publish instances every 5-6 days because they become unresponsive. No heap dumps or OutOfMemory exceptions.

I'm just wondering if anybody else might know of any obvious reasons this could be happening before I bog myself down in a sea of heapdumps and threaddumps.

AEM 6.5.7

Accepted Solutions (1)

narayana_chirra
Level 2

18-01-2021

@sdouglasmc we also got the same package from Adobe, it resolved our issue.

Notes from Adobe:

Checking the thread dumps and researching further internally, it seems you are running into a known issue (CQ-4312194) with SP7 where numerous threads get blocked due to a Timer with the Component Registry (org.apache.felix.scr.impl.ComponentRegistry). This causes the instance to become unresponsive.

  • Follow the steps below to resolve the issue:
    - Install the attached hotfix package.
    - This will trigger a restart of a couple of bundles, so wait 3-5 minutes.
    - Go to <host>:<port>/system/console and make sure the "org.apache.felix.scr" version is updated to 2.1.20.
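
The version check in the last step can also be scripted. This is only a minimal sketch in Python. It assumes the Felix web console's bundle list has been saved as JSON (for example from <host>:<port>/system/console/bundles.json) and that each entry in its "data" array carries "symbolicName", "version" and "state" fields. The helper names are hypothetical, and fetching the JSON with credentials is left out.

```python
import json

# Minimum org.apache.felix.scr version named in Adobe's notes above.
REQUIRED = (2, 1, 20)


def parse_version(text):
    """Turn '2.1.20' or '2.1.20.SNAPSHOT' into a tuple of its leading numeric parts."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def scr_is_patched(bundles_json, symbolic_name="org.apache.felix.scr"):
    """Return True if the named bundle is Active and at version REQUIRED or later.

    Assumption: the payload looks like {"data": [{"symbolicName": ..., "version": ..., "state": ...}]}.
    """
    payload = json.loads(bundles_json)
    for bundle in payload.get("data", []):
        if bundle.get("symbolicName") == symbolic_name:
            version = parse_version(bundle.get("version", ""))
            return bundle.get("state") == "Active" and version >= REQUIRED
    return False  # bundle not listed at all
```

The sketch treats any version at or above 2.1.20 as patched, which is an assumption. Adobe's notes only name 2.1.20 itself.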

Answers (4)

Jakub_G
Level 1

20-01-2021

For those who haven't updated to 6.5.7 yet, Adobe has a new SP7 installation package with Hotfix-CQ-4312194-FELIX-6252 included in it. Makes upgrade faster and more stable. This is not publicly available (yet) but you can probably ask for it in a ticket.

narayana_chirra
Level 2

18-01-2021

@sdouglasmc is there any resolution for this issue? We are facing the same issue in our Prod environment.

kunal23
MVP

18-12-2020

Can you check the logs for session leaks or slow, unresponsive queries? Do you see any errors in the logs? What do you see in the health check dashboards? Any patterns of high memory, CPU, or disk utilization?

Jakub_G
Level 1

27-12-2020

Can you check the thread dumps? I've noticed AEM 6.5.7 hanging on the following deadlock in my environments:

   java.lang.Thread.State: WAITING (on object monitor)
        at java.base@11.0.9/jdk.internal.misc.Unsafe.park(Native Method)
        - waiting to lock <0x1d24408f> (a java.util.concurrent.CountDownLatch$Sync) owned by "null" tid=0x-1
        at java.base@11.0.9/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
        at java.base@11.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
        at java.base@11.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1039)
        at java.base@11.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1345)
        at java.base@11.0.9/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:232)
        at org.apache.felix.framework.ServiceRegistry.getService(ServiceRegistry.java:365)
sdouglasmc
They've sent me a Hotfix-CQ-4312194-FELIX-6252-1.0.zip. I've installed it but I'm not sure if it is working or not. Traffic is lower now, but we were also getting killed on publish after replications, when the dispatcher would flush.
sdouglasmc
Supposedly it's Felix issue FELIX-6252. My production author Touch UI Sites screen now takes about 1.2s per entry to load, so 40 items take about 48s. Messing around with the Felix framework should not be a service pack fix; IMO this is version-level stuff (6.6.0).
sdouglasmc
Another thing I've noticed since the upgrade is the package manager will keep displaying "Updating package" when creating a new package. Refreshing the screen shows the package was created as do the logs.
Jakub_G
Thanks, looks like FELIX-6252 is what I'm observing in our environments. I'm raising a support ticket for the hotfix.
sdouglasmc
Please keep me updated on how it works for you.
royteeuwen
So what about all the other customers? There is nothing mentioned on the Adobe service pack release page. So we should all run into this problem and then ask Support for a hotfix package?
sdouglasmc
royteeuwen, you running into the same issues?
royteeuwen
Not yet. We are holding off on upgrading because I saw this thread and the same remark from another company in the AEM Tech Slack.
sdouglasmc
So the Touch UI screen issue was totally my fault, it seems. In order to debug the initial problem, I created a logger on SessionDelegate set to Debug. That was not good. Removing it speeds things up tremendously. That being said, the hotfix seems to fix the initial problem.
Jakub_G

Here are my observations after multiple rounds of testing:

  1. The fix seems to work; I did not manage to reproduce the issue.
  2. Deploying the hotfix reloads a lot of bundles (go figure, it's Felix) and takes a significant amount of time, during which the instance is unavailable.
  3. I had one case where the original issue appeared after installing 6.5.7 but before the hotfix was applied, on a machine without any load.
sdouglasmc
We have 1 author, 3 publish instances. I employed the hotfix on author and 2 publish. We never experienced the issue again on all updated systems, but DID again on the UN-hotfixed. After hot-fixing, no issues on that one also. Hotfix took ~7 minutes to stabilize. Sounds like we rowed the same boat. Good to know!
kunal23
Thanks guys. We got the same issue in our environment yesterday. One of the publish instances went unresponsive and the thread dumps point to FELIX-6252. Adobe has provided the same hotfix to us. Please let us know if you find any issues with the hotfix in your testing. It seems all AEM installations are impacted by this, and Adobe should update their 6.5.7 release page.
sanjeevkumart45
We are in the same boat, and Adobe is not confirming whether the hotfix will actually resolve the problem. They told us some customers are experiencing problems even after the hotfix. Can any of you confirm that the hotfix stabilized the system and that you are not seeing the deadlock anymore?
Jakub_G
After installation of this fix, I haven't observed the deadlock anymore.
sanjeevkumart45
Thanks for confirming.