Issue with Adobe CQ (AEM) 5.4 Repository - Help Required
wayne_licquoris
October 16, 2015
Solved

Has anyone else experienced the following issue with the CQ 5.4 (CRX 2.2) repository?

07.02.2014 08:56:13 *WARN * LazyTextExtractorField: Failed to extract text from a binary property (LazyTextExtractorField.java, line 181)
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@68d7965d
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
    at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField$ParsingTask.run(LazyTextExtractorField.java:175)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.poi.POIXMLException: java.lang.reflect.InvocationTargetException
    at org.apache.poi.xssf.usermodel.XSSFFactory.createDocumentPart(XSSFFactory.java:61)
    at org.apache.poi.POIXMLDocumentPart.read(POIXMLDocumentPart.java:277)
    at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:186)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:182)
    at org.apache.poi.xssf.extractor.XSSFExcelExtractor.<init>(XSSFExcelExtractor.java:56)
    at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:172)
    at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:152)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:65)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:67)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    ... 12 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.poi.xssf.usermodel.XSSFFactory.createDocumentPart(XSSFFactory.java:59)
    ... 21 more
Caused by: java.lang.OutOfMemoryError: Java heap space


We believe this is due to a user uploading a large number of binary files in one go (.doc, .xls, .pdf). The instance is pushing them through workflow, and this is consuming the entire heap.

The symptoms are very high CPU load, constant full garbage collections, and no access to any of the interfaces such as the system console or CRX. That leaves us no way to stop the workflow, and authors currently have no access either.

I'd appreciate any advice on this matter.


Sham_HC (Accepted solution)
Level 10
October 16, 2015

You need to do a couple of things:

    *)   Make sure you have the latest CRX hotfix installed on your system.

    *)   Configure your system to run text extraction as a separate process. That is, set the extractorPoolSize and forkJavaCommand parameters in your repository.xml.

    *)   If you are not using the binary index, disable it.
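
As a rough sketch of the second point, the two parameters would sit inside the existing SearchIndex element of repository.xml. The values shown are the ones another poster reports further down this thread for a Windows server. They are a starting point to tune, not a recommendation, and the command will differ on other platforms. Existing workspaces usually carry their own copy of this configuration in workspace.xml, so it is worth checking there as well.

```xml
<!-- Goes inside the existing <SearchIndex ...> element.
     Keep its class attribute and all other params unchanged. -->

<!-- Number of threads used for background text extraction
     (value taken from a later reply in this thread) -->
<param name="extractorPoolSize" value="2"/>

<!-- Command used to start the separate extraction JVM.
     Windows example from this thread: low priority, 64 MB heap.
     The heap size is an assumption to adjust for your documents. -->
<param name="forkJavaCommand" value="cmd /c start /low /wait /b java -Xmx64m"/>
```

A restart of the instance is normally needed before changes to repository.xml take effect.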

Level 8
October 16, 2015

What is your maximum heap size set to?

wayne_licquoris
October 16, 2015

Thanks, I have now configured the extraction to run in a separate process with its own memory allocation and low CPU priority, as explained here:

http://helpx.adobe.com/experience-manager/kb/outOfProcessTextExtraction.html

I now have a usable instance again :)

October 16, 2015

Hi,

I have the same issue. The JVM is started with 4 GB on Windows Server 2008 R2.

So, I have configured the extraction pool and fork command as follows: extractorPoolSize = 2 and forkJavaCommand = cmd /c start /low /wait /b java -Xmx64m

The Author and Publish instances are usable now, but I am getting these errors:

 

10.02.2014 10:39:56 *WARN * LazyTextExtractorField: Failed to extract text from a binary property (LazyTextExtractorField.java, line 181)
java.io.IOException: The pipe is being closed
    at java.io.FileOutputStream.writeBytes(Native Method)

10.02.2014 13:57:40 *WARN * LazyTextExtractorField: Failed to extract text from a binary property (LazyTextExtractorField.java, line 181)
org.apache.tika.exception.TikaException: Failed to communicate with a forked parser process. The process has most likely crashed due to some error like running out of memory. A new process will be started for the next parsing request.
    at org.apache.tika.fork.ForkParser.parse(ForkParser.java:123)
    at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField$ParsingTask.run(LazyTextExtractorField.java:175)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Lost connection to a forked server process
    at org.apache.tika.fork.ForkClient.waitForResponse(ForkClient.java:169)
    at org.apache.tika.fork.ForkClient.call(ForkClient.java:110)
    at org.apache.tika.fork.ForkParser.parse(ForkParser.java:120)
    ... 9 more

 

Thanks.