We've recently upgraded AEM from 6.5.5 to 6.5.7 (didn't install 6.5.6) - running on Ubuntu 18.04.5, Java 11.
We have been pretty consistent the last 8-9 months after making some performance adjustments in migrating from 6.3 windows to 6.5 Unbuntu. Rarely having to restart. However, since we've upgraded from SP 6.5.5 to 6.5.7 (we bipassed 6.5.6 - which SHOULDN'T matter), we've noticed really poor performance, author instance needing restarts nearly every 2-3 days and publish instances 5-6 days because of becoming unresponsive. No heap dumps or OutOfMemory exceptions.
I'm just wondering if anybody else might know of any obvious reasons this could be happening before I bog myself down in a sea of heapdumps and threaddumps.
@sdouglasmc we also got the same package from Adobe, it resolved our issue.
Notes from adobe:
Checking the thread dumps and further researching internally, It seems you are running into a known issue(CQ-4312194) with SP7 where numerous threads get blocked due to a Timer with the Component Registry (org.apache.felix.scr.impl.ComponentRegistry). This causes the instance to become unresponsive.
For those who haven't updated to 6.5.7 yet, Adobe has a new SP7 installation package with Hotfix-CQ-4312194-FELIX-6252 included in it. Makes upgrade faster and more stable. This is not publicly available (yet) but you can probably ask for it in a ticket.
Can you check threaddumps. I've noticed AEM 6.5.7 hanging on following deadlock on my environments:
java.lang.Thread.State: WAITING (on object monitor) at email@example.com/jdk.internal.misc.Unsafe.park(Native Method) - waiting to lock <0x1d24408f> (a java.util.concurrent.CountDownLatch$Sync) owned by "null" tid=0x-1 at firstname.lastname@example.org/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194) at email@example.com/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885) at firstname.lastname@example.org/java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1039) at email@example.com/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1345) at firstname.lastname@example.org/java.util.concurrent.CountDownLatch.await(CountDownLatch.java:232) at org.apache.felix.framework.ServiceRegistry.getService(ServiceRegistry.java:365)
Can you check in the logs if there are any session leaks or if there are any slow unresponsive queries ? Do you see any errors in the logs ? What do you see in health check dashboards ? Any patterns of high memory or CPU consumptions or disk utilizations ?
Great article, thanks !
Is this package available publicly now ?