
Level 1

Why is my AEM QA instance experiencing service outages when I invoke a servlet that exports image paths to an Excel file, especially when large content trees are involved?

Error: Service Outage: Looks like we are having some issues with our service. We are working hard to bring it online again.

 

Context & What I’m Trying to Achieve:

I have developed a utility tool in AEM that:  

• Accepts a page path input from a custom UI (utility tool in AEM console).
• Hits a servlet (/bin/export-image-paths).
• This servlet recursively traverses the page and its child pages, reads all components, and extracts image paths from each component’s properties.
• Checks if the image exists in DAM.
• Writes the results into an Excel file using XSSFWorkbook.
• Returns the Excel file as a download.

 

Requirements/Expectations:

• The tool should support any valid AEM page path (e.g., /content/wknd/us/en/TestPages).
• The servlet should work reliably in Dev, QA, and Stage environments.
• The Excel file should be generated and downloaded without memory issues or long processing delays — even for large content structures or experience fragments.
 

Current Problem:

• In the QA environment, when I pass a path with a large number of child pages or components (e.g., an XF structure or /us/en/services), the servlet eventually leads to:
  • A service outage after a long processing time
  • Errors like: org.apache.poi.openxml4j.exceptions.OPCPackage$PackageIOException. (docProps/app.xml)

 

 

 

 

 

 


3 Replies


Community Advisor

Hi @yearahull,

Looks like your servlet is recursively traversing large content trees and generating Excel files in memory using Apache POI (XSSFWorkbook). This leads to excessive memory usage, long processing times, and ultimately service outages, especially with large XF or deep page hierarchies.

What might be causing this:

Memory Overload with Apache POI (XSSFWorkbook)

  • XSSFWorkbook loads entire Excel data into memory, including styles, rows, and document metadata like docProps/app.xml.

  • For large content trees (hundreds/thousands of components/pages), this can exceed your JVM heap and trigger:

    • java.lang.OutOfMemoryError

    • OPCPackage$PackageIOException (related to writing docProps/app.xml or similar internal files).

Long-Running Servlet Execution

  • Recursive page traversal + DAM lookups + Excel generation = long-running task.

  • In AEM as a Cloud Service or resource-constrained environments (like QA), long requests can be:

    • Interrupted

    • Killed by dispatcher/firewall/load balancer timeouts


Could you try fixing this by switching to a streaming workbook (SXSSFWorkbook)?

Apache POI provides SXSSFWorkbook as a streaming alternative to XSSFWorkbook for large files.

import org.apache.poi.xssf.streaming.SXSSFWorkbook;

SXSSFWorkbook workbook = new SXSSFWorkbook(100); // keep 100 rows in memory
workbook.setCompressTempFiles(true); // optional: gzip the temporary files on disk

  • Writes data to disk as it goes, drastically lowering memory use.

  • Keeps only a small number of rows in memory.

  • Avoids docProps/app.xml errors due to memory overload.

SXSSFWorkbook – Apache POI documentation

Hope this helps!


Santosh Sai




Level 1

Hi @SantoshSai 

Thanks for your quick response, but this didn't work; I still see the same issue.


Community Advisor

Hi @yearahull 

Could you please share how you are processing the content? Checking every node and then checking whether each asset exists is time-consuming, and you would end up getting a gateway timeout error.

I think you need to change the approach. Create the CSV/Excel file asynchronously and give the author an option to download it from a report page, for example. Use the reference search API to find the references for a page instead of traversing each node. Once you have collected the nodes, just perform a quick asset node/resource existence check.
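To make the asynchronous idea concrete, here is a minimal sketch in plain Java, not AEM code. It assumes a hypothetical `Node` record standing in for pages/components and a plain `Set` of known DAM paths standing in for the asset check; the class and method names are made up for illustration. The report is streamed row by row into a temporary CSV file on a background thread, so the request that triggers it can return immediately and nothing large is held in memory. In AEM itself the background part would more likely be handled by something like a Sling job, and the file stored somewhere the author can download it from.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncImageReport {

    /** Hypothetical stand-in for a page or component node; not an AEM type. */
    record Node(String path, List<String> imageRefs, List<Node> children) {}

    /**
     * Walks the tree with an explicit stack (no recursion) and writes one CSV row
     * per image reference as soon as it is found. Values are written unescaped,
     * which assumes paths contain no commas or quotes.
     */
    static Path writeReport(Node root, Set<String> damPaths) throws IOException {
        Path out = Files.createTempFile("image-report", ".csv");
        try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            w.write("component,image,existsInDam");
            w.newLine();
            Deque<Node> stack = new ArrayDeque<>();
            stack.push(root);
            while (!stack.isEmpty()) {
                Node n = stack.pop();
                for (String ref : n.imageRefs()) {
                    // Assumption: a set lookup replaces the real DAM existence check.
                    w.write(n.path() + "," + ref + "," + damPaths.contains(ref));
                    w.newLine();
                }
                n.children().forEach(stack::push);
            }
        }
        return out;
    }

    /** Runs the export on the given pool; the caller gets a future right away. */
    static CompletableFuture<Path> submit(ExecutorService pool, Node root, Set<String> damPaths) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return writeReport(root, damPaths);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, pool);
    }

    public static void main(String[] args) throws Exception {
        Node child = new Node("/content/site/en/a", List.of("/content/dam/a.png"), List.of());
        Node root = new Node("/content/site/en", List.of("/content/dam/missing.png"), List.of(child));
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Path report = submit(pool, root, Set.of("/content/dam/a.png")).get();
            Files.readAllLines(report).forEach(System.out::println);
        } finally {
            pool.shutdown();
        }
    }
}
```

The trigger request would only call `submit` and hand back some job reference; a separate request (or the report page) would serve the finished file once the future completes.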


 

Arun Patidar
