SOLVED

Issue with Adobe CQ (AEM) 5.4 Repository - Help Required

Level 1

Has anyone else experienced the following issue with the CQ 5.4 / CRX 2.2 repository?

07.02.2014 08:56:13 *WARN * LazyTextExtractorField: Failed to extract text from a binary property (LazyTextExtractorField.java, line 181)
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@68d7965d
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
    at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField$ParsingTask.run(LazyTextExtractorField.java:175)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.poi.POIXMLException: java.lang.reflect.InvocationTargetException
    at org.apache.poi.xssf.usermodel.XSSFFactory.createDocumentPart(XSSFFactory.java:61)
    at org.apache.poi.POIXMLDocumentPart.read(POIXMLDocumentPart.java:277)
    at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:186)
    at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:182)
    at org.apache.poi.xssf.extractor.XSSFExcelExtractor.<init>(XSSFExcelExtractor.java:56)
    at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:172)
    at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:152)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:65)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:67)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    ... 12 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.poi.xssf.usermodel.XSSFFactory.createDocumentPart(XSSFFactory.java:59)
    ... 21 more
Caused by: java.lang.OutOfMemoryError: Java heap space


We believe this is due to a user uploading a large number of binary files in one go (.doc, .xls, .pdf). The instance is pushing them through the workflow, and this is consuming the entire heap.

The symptoms are very high CPU load, constant full garbage collections, and no access to any of the interfaces such as the system console or CRX. That leaves us no way to stop the workflow, and authors currently have no access either.

I'd appreciate any advice on this matter.

1 Accepted Solution

Correct answer by
Level 10

You need to do a couple of things:

    *)   Make sure you have the latest CRX hotfix installed on your system.

    *)   Configure your system to run text extraction as a separate process. That is, set the extractorPoolSize and forkJavaCommand parameters in your repository.xml.

    *)   If you are not using the binary index, disable it.
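
As a rough sketch of the second point, the SearchIndex element in repository.xml could carry the two parameters along these lines. The class attribute, the pool size of 2, the 32 MB heap, and the use of `nice` (for lower priority on Unix-like systems) are all illustrative assumptions, not recommendations. Keep the class attribute and any other parameters your installation already defines.

```xml
<!-- Illustrative fragment only: keep your existing class attribute and other params. -->
<SearchIndex class="com.day.crx.query.lucene.LuceneHandler">
  <!-- ... existing params stay as they are ... -->

  <!-- Number of background threads used for text extraction (assumed value). -->
  <param name="extractorPoolSize" value="2"/>

  <!-- Command used to fork the extraction JVM. The heap size and "nice" are assumptions. -->
  <param name="forkJavaCommand" value="nice java -Xmx32m"/>
</SearchIndex>
```

With a forked process, an OutOfMemoryError during extraction should take down only the child JVM rather than the repository's own heap. A restart is typically needed for repository.xml changes to take effect.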

Level 8

What is your max Heap size set to? 

Level 1

Thanks, I have now configured the extraction to run in a separate process with its own memory allocation and low CPU priority, as explained here:

http://helpx.adobe.com/experience-manager/kb/outOfProcessTextExtraction.html

I now have a usable instance again :)

Level 1

Hi,

I have the same issue. The JVM is started with 4 GB on Windows Server 2008 R2.

So I configured the extraction pool and fork command as follows: extractorPoolSize = 2 and forkJavaCommand = cmd /c start /low /wait /b java -Xmx64m

The Author and Publish instances are usable now, but I am getting these errors:

 

10.02.2014 10:39:56 *WARN * LazyTextExtractorField: Failed to extract text from a binary property (LazyTextExtractorField.java, line 181)
java.io.IOException: The pipe is being closed
    at java.io.FileOutputStream.writeBytes(Native Method)
 

10.02.2014 13:57:40 *WARN * LazyTextExtractorField: Failed to extract text from a binary property (LazyTextExtractorField.java, line 181)
org.apache.tika.exception.TikaException: Failed to communicate with a forked parser process. The process has most likely crashed due to some error like running out of memory. A new process will be started for the next parsing request.
    at org.apache.tika.fork.ForkParser.parse(ForkParser.java:123)
    at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField$ParsingTask.run(LazyTextExtractorField.java:175)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Lost connection to a forked server process
    at org.apache.tika.fork.ForkClient.waitForResponse(ForkClient.java:169)
    at org.apache.tika.fork.ForkClient.call(ForkClient.java:110)
    at org.apache.tika.fork.ForkParser.parse(ForkParser.java:120)
    ... 9 more

 

Thanks.