We are frequently experiencing high CPU usage on our AEM instances. Please find the description below.
1. We have a production environment with 4 publish instances and 2 author instances.
2. AEM is installed on the app nodes.
3. Frequently, CPU usage on one of the publish instances hits 98% and stays there for nearly 10 hours. During the same period, the other publish instances show no unusual load.
4. We also checked the Dispatcher configuration and logs: requests are being processed normally and there are no errors in the error logs.
5. While analyzing the thread dumps taken during that period, we see the following threads in the BLOCKED state:
Thread dump 6/10
"ApacheSlingdefault_QuartzSchedulerThread" prio=5 tid=0x298 nid=0xffffffff waiting for monitor entry
java.lang.Thread.State: BLOCKED
    at java.lang.Object.wait(Native Method)
    - waiting to lock <0x39d83e0e> (a java.lang.Object) owned by "null" tid=0x-1
    at org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:311)

Thread dump 9/10
"sling-default-28" prio=5 tid=0x104d nid=0xffffffff waiting for monitor entry
java.lang.Thread.State: BLOCKED
    at sun.misc.Unsafe.park(Native Method)
    - waiting to lock <0x2a446ba5> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) owned by "null" tid=0x-1
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
Could anyone suggest a solution for this issue?
BLOCKED threads are a symptom of the problem, not the problem itself.
You need to look at the RUNNABLE threads over a series of thread dump captures to understand what's causing resource contention.
Also, use a better way to capture the thread dumps (like the jstack series script @JaideepBrar suggested). Whenever you see the native thread ID as nid=0xffffffff, it implies you captured the thread dumps from inside the same JVM that is struggling. You need to run jstack as a separate process, outside the JVM that is running AEM.
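For reference, a minimal Python sketch of capturing such a series from outside the AEM JVM. This is not the script @JaideepBrar mentioned; the file naming, the default of 10 captures 10 seconds apart, and the function names are assumptions. It only shells out to the JDK's jstack tool (jstack -l <pid>), which must be on the PATH and typically has to run as the same OS user that owns the AEM process. The PID can be found with jps -l or ps.

```python
import subprocess
import sys
import time
from pathlib import Path


def jstack_command(pid):
    """Command for one capture; -l also prints additional lock information."""
    return ["jstack", "-l", str(pid)]


def capture_series(pid, count=10, interval_s=10, out_dir="thread-dumps", run=subprocess.run):
    """Write `count` jstack dumps of process `pid`, `interval_s` seconds apart.

    jstack runs as its own process outside the AEM JVM. `run` is injectable
    only so the loop can be exercised without a real JVM (hypothetical design).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i in range(1, count + 1):
        result = run(jstack_command(pid), capture_output=True, text=True, check=True)
        path = out / f"jstack-{pid}-{i:02d}.txt"
        path.write_text(result.stdout)
        written.append(path)
        if i < count:  # no need to wait after the last capture
            time.sleep(interval_s)
    return written


if __name__ == "__main__":
    if len(sys.argv) == 2:
        for path in capture_series(int(sys.argv[1])):
            print(path)
    else:
        print("usage: python capture_dumps.py <pid of the AEM java process>")
```

The resulting files can then be compared for threads that stay RUNNABLE across captures.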