Level 3
May 25, 2017
Solved

Memory Leak Analysis


Hello,

We've had memory leak issues for a while that eventually push our publishers into 100% GC after a week or two. Up until now we've just ignored them because it's easier to restart the publisher, but I would really like to understand how to troubleshoot this. We have a heap dump, but what I'm seeing doesn't seem that helpful. All of the classes it references are framework classes, so I'm not sure how to proceed in finding the actual cause of the leak in our code. Below is the main leak suspect, "HttpListener" loaded by "BundleWiringImpl":

There are hundreds of these instances, each with a URL that is called by the end user. The below example is a keepalive call to a static html page, so none of our custom code should even be running.

Does anyone have suggestions on how to proceed here? Every time we take a heap dump the problem suspects are from "org.apache.felix.framework.BundleWiringImpl$BundleClassLoaderJava", "com.day.j2ee.servletengine.HttpListener", and "com.day.j2ee.servletengine.ServletHandlerImpl".

We are still on CQ 5.6.1.

 

Thanks

13 replies

jocampAuthor
Level 3
May 25, 2017

Thanks for that. I did read through those earlier, but they seem to end about where I am now. The examples I've found all have obvious leak suspects, such as a custom class, so I'm not sure how to handle leak suspects that are part of CQ's framework classes. It just seems like all HTTP/Servlet calls are causing memory leaks.

smacdonald2008
Accepted solution
Level 10
May 25, 2017

I would recommend opening a ticket - there may be a required hotfix. Looks like some sort of bug and the support team can help. 

jocampAuthor
Level 3
May 25, 2017

Ok, we will do that then. Thank you!

MC_Stuff
Level 10
May 26, 2017

Hi,

You are connecting to an external service that does not have a timeout configured. Make sure every external call has a timeout set.

Thanks,
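As a minimal sketch of the timeout advice above, assuming the backend is called with plain `java.net.HttpURLConnection` (the thread does not say which HTTP client is actually in use): the class name `TimeoutFetch` and the two timeout values are hypothetical and only illustrate the idea. Without these settings a stalled backend can hold the calling request thread for as long as the remote side stays silent.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper; the name and the timeout values are assumptions.
class TimeoutFetch {
    static final int CONNECT_TIMEOUT_MS = 2000; // max time to establish the connection
    static final int READ_TIMEOUT_MS = 5000;    // max time to wait for data once connected

    /** Prepares a connection with both timeouts set. Nothing is sent yet. */
    static HttpURLConnection open(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(CONNECT_TIMEOUT_MS);
        conn.setReadTimeout(READ_TIMEOUT_MS);
        return conn;
    }

    /** Calls the URL and returns the HTTP status, always releasing the connection. */
    static int fetchStatus(String url) throws IOException {
        HttpURLConnection conn = open(url);
        try {
            int status = conn.getResponseCode(); // throws SocketTimeoutException on timeout
            // Drain the body so the underlying socket can be reused or closed cleanly.
            try (InputStream in = status >= 400 ? conn.getErrorStream() : conn.getInputStream()) {
                if (in != null) {
                    byte[] buf = new byte[4096];
                    while (in.read(buf) != -1) {
                        // discard
                    }
                }
            }
            return status;
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java TimeoutFetch <url>");
            return;
        }
        System.out.println("HTTP " + fetchStatus(args[0]));
    }
}
```

If another client library is used, it will have its own timeout settings; the point is only that both the connect and the read phase need an upper bound.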

jocampAuthor
Level 3
May 26, 2017

Hm, that's definitely a possibility because we do connect to multiple other systems. What did you see that led you to that conclusion?

varunmitra
Adobe Employee
May 26, 2017

It is obvious from the screenshot you have uploaded.

Closetion.123I.html and ic_kal.html seem to be retaining a lot of heap space. It's likely these requests are getting stuck. You might also want to verify the underlying page component.

There are several online documents you can refer to for analyzing heap dumps. The one below has helped me numerous times in resolving memory leak issues; hopefully it will do the same for you.

http://docwiki.cisco.com/wiki/How_to_analyze_heap_dumps 

jocampAuthor
Level 3
May 26, 2017

I'll take a look at that document, thanks.

For ic_kal.html, it is literally an HTML file stored under /content containing "<html><body>OK</body></html>", so there is no template or underlying page component that I'm aware of. That's why I chose it as an example: it shouldn't be doing any processing besides serving the document. However, I'm guessing that if other pages are causing the issue, this one could simply have been caught after we were already at 100% GC.

MC_Stuff
Level 10
May 27, 2017

jocamp wrote...

Hm, that's definitely a possibility because we do connect to multiple other systems. What did you see that led you to that conclusion?

 

The HTTP thread is not being released, and most of the time that points to an external connection. It is also based on experience: we have many backend integrations with legacy systems, so the line numbers and classes are familiar. If you could send the heap dump it would be easy to find the culprit. The problem is that a heap dump contains all of the data, including some of your environment's security information, so it is not a good idea to discuss it on an open forum.

Level 2
May 27, 2017

I agree, an external connection could be the issue here, since Sling does return the HTTP thread to the thread pool. A quicker way to find out is to take 20 thread dumps, one every 500 ms, and then go through the threads that are running or in a waiting state. Hopefully you will find something interesting there. One thing I would like to know, though: what thread pool size is configured on this server?
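The sampling idea above can be sketched in Java with the standard `ThreadMXBean` API. This is only an illustration under stated assumptions: it samples the JVM it runs in, whereas on a publisher the dumps would more likely be taken from outside with a tool such as `jstack`, and the class name `ThreadSampler` and the "same top frame in every sample" heuristic are hypothetical choices, not something from this thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: flags threads whose state and top stack frame never change.
class ThreadSampler {

    /**
     * Takes `samples` thread dumps, `intervalMs` apart, and returns
     * "name #id" -> "STATE at frame" for every thread whose state and top
     * frame were identical in all samples. The sampling thread is skipped.
     */
    static Map<String, String> stuckThreads(int samples, long intervalMs)
            throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long self = Thread.currentThread().getId();
        Map<Long, String> firstFrame = new HashMap<>(); // frame seen in the first sample
        Map<Long, String> names = new HashMap<>();
        Map<Long, Integer> hits = new HashMap<>();      // samples matching the first frame

        for (int i = 0; i < samples; i++) {
            for (ThreadInfo ti : mx.dumpAllThreads(false, false)) {
                if (ti == null || ti.getThreadId() == self) continue;
                StackTraceElement[] stack = ti.getStackTrace();
                if (stack.length == 0) continue;
                long id = ti.getThreadId();
                String frame = ti.getThreadState() + " at " + stack[0];
                if (i == 0) {
                    firstFrame.put(id, frame);
                    names.put(id, ti.getThreadName());
                    hits.put(id, 1);
                } else if (frame.equals(firstFrame.get(id))) {
                    hits.merge(id, 1, Integer::sum);
                }
            }
            if (i < samples - 1) Thread.sleep(intervalMs);
        }

        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<Long, Integer> e : hits.entrySet()) {
            if (e.getValue() == samples) {
                result.put(names.get(e.getKey()) + " #" + e.getKey(), firstFrame.get(e.getKey()));
            }
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        // 20 dumps, 500 ms apart, as suggested in the reply above (takes about 10 s).
        stuckThreads(20, 500).forEach((name, frame) -> System.out.println(name + " -> " + frame));
    }
}
```

Idle pool threads will also show up as unchanged, so the output still needs to be read by hand; the interesting entries are request threads sitting in a socket read or a similar blocking call for the whole sampling window.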