We've recently upgraded AEM from 6.5.5 to 6.5.7 (didn't install 6.5.6) - running on Ubuntu 18.04.5, Java 11.
We had been pretty stable for the last 8-9 months after making some performance adjustments when migrating from 6.3 on Windows to 6.5 on Ubuntu, rarely having to restart. However, since upgrading from SP 6.5.5 to 6.5.7 (we bypassed 6.5.6, which SHOULDN'T matter), we've noticed really poor performance: the author instance needs a restart nearly every 2-3 days and the publish instances every 5-6 days because they become unresponsive. No heap dumps or OutOfMemory exceptions.
I'm just wondering if anybody else might know of any obvious reasons this could be happening before I bog myself down in a sea of heap dumps and thread dumps.
@sdouglasmc we also got the same package from Adobe, and it resolved our issue.
Notes from Adobe:
Checking the thread dumps and further researching internally, it seems you are running into a known issue (CQ-4312194) with SP7 where numerous threads get blocked due to a Timer with the Component Registry (org.apache.felix.scr.impl.ComponentRegistry). This causes the instance to become unresponsive.
Follow the steps below to resolve the issue:
- Install the attached hotfix package
- This will trigger a restart of a couple of bundles, so you need to wait 3-5 mins
- Go to <host>:<port>/system/console and make sure the "org.apache.felix.scr" version is updated to 2.1.20
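If you'd rather script the last step than click through the console, something like the sketch below might work. It is not from Adobe's notes: it assumes the Felix web console serves bundle details as JSON at /system/console/bundles/org.apache.felix.scr.json, with a "data" list whose entries carry "symbolicName" and "version", and that basic auth is enabled. The function names are made up for illustration.

```python
import base64
import json
import urllib.request

# Version named in the hotfix notes above.
EXPECTED_SCR_VERSION = "2.1.20"
SCR_SYMBOLIC_NAME = "org.apache.felix.scr"


def scr_version_from_payload(payload):
    """Return the org.apache.felix.scr version from a bundle JSON payload, or None."""
    for bundle in payload.get("data", []):
        if bundle.get("symbolicName") == SCR_SYMBOLIC_NAME:
            return bundle.get("version")
    return None


def hotfix_applied(payload, expected=EXPECTED_SCR_VERSION):
    """True if the reported version equals `expected` or is `expected` plus a qualifier."""
    version = scr_version_from_payload(payload)
    if version is None:
        return False
    return version == expected or version.startswith(expected + ".")


def fetch_bundle_payload(base_url, user, password):
    """Fetch bundle details from the web console (assumed endpoint, basic auth)."""
    url = base_url.rstrip("/") + "/system/console/bundles/" + SCR_SYMBOLIC_NAME + ".json"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": "Basic " + token})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

You would call it as hotfix_applied(fetch_bundle_payload(base_url, user, password)) with your own <host>:<port> and credentials, after waiting the 3-5 mins for the bundles to restart.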
Can you check in the logs whether there are any session leaks or slow, unresponsive queries? Do you see any errors in the logs? What do you see in the health check dashboards? Any patterns of high memory, CPU, or disk utilization?
Can you check the thread dumps? I've noticed AEM 6.5.7 hanging on the following deadlock in my environments:
java.lang.Thread.State: WAITING (on object monitor)
at java.base@<version>/jdk.internal.misc.Unsafe.park(Native Method)
- waiting to lock <0x1d24408f> (a java.util.concurrent.CountDownLatch$Sync) owned by "null" tid=0x-1
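Before reading a whole dump by hand, a quick tally can show whether you're seeing the same pattern. The sketch below is only a rough helper, not an official tool: it assumes each thread's stack starts with a "java.lang.Thread.State:" line as in the snippet above, and it defaults to looking for the class named in Adobe's notes. The function name and parameters are made up for illustration.

```python
import re
from collections import Counter

# Matches the state line that opens each thread's stack in the dump.
STATE_RE = re.compile(r"java\.lang\.Thread\.State:\s+([A-Z_]+)")


def summarize_thread_dump(text, needle="org.apache.felix.scr.impl.ComponentRegistry"):
    """Return (state counts, number of threads whose stack mentions `needle`)."""
    states = Counter()
    matching_threads = 0
    in_thread = False
    matched = False
    for line in text.splitlines():
        state = STATE_RE.search(line)
        if state:
            # A state line starts a new thread's stack.
            states[state.group(1)] += 1
            in_thread = True
            matched = False
            continue
        # Count each thread at most once, however many frames match.
        if in_thread and not matched and needle in line:
            matching_threads += 1
            matched = True
    return states, matching_threads
```

Run it on a few dumps taken a minute or so apart; if the count of threads mentioning ComponentRegistry keeps growing, or the same threads stay parked on the CountDownLatch, that would point at the issue described above.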