Adobe Experience Manager Sites & More

wayne_licquoris · 10/15/15

Has anyone else experienced the following issue with the CQ 5.4 / CRX 2.2 repository?

07.02.2014 08:56:13 *WARN * LazyTextExtractorField: Failed to extract text from a binary property (LazyTextExtractorField.java, line 181)
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@68d7965d
   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
   at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
   at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField$ParsingTask.run(LazyTextExtractorField.java:175)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.poi.POIXMLException: java.lang.reflect.InvocationTargetException
   at org.apache.poi.xssf.usermodel.XSSFFactory.createDocumentPart(XSSFFactory.java:61)
   at org.apache.poi.POIXMLDocumentPart.read(POIXMLDocumentPart.java:277)
   at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:186)
   at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:182)
   at org.apache.poi.xssf.extractor.XSSFExcelExtractor.<init>(XSSFExcelExtractor.java:56)
   at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:172)
   at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:152)
   at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:65)
   at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:67)
   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
   ... 12 more
Caused by: java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at org.apache.poi.xssf.usermodel.XSSFFactory.createDocumentPart(XSSFFactory.java:59)
   ... 21 more
Caused by: java.lang.OutOfMemoryError: Java heap space

We believe this is caused by a user uploading a large number of binary files in one go (.doc, .xls, .pdf). The instance is pushing them through workflow, and this is consuming the entire heap.

The symptoms are very high CPU load, constant full garbage collections, and no access to any of the interfaces such as the system console or CRX. That means we have no way to stop the workflow, and authors currently have no access either.

I'd appreciate any advice on this matter.

Sham_HC · 10/15/15

You need to do a couple of things:

*) Make sure you have the latest CRX hotfix installed on your system.

*) Configure your system to run text extraction as a separate process. That is, set the extractorPoolSize and forkJavaCommand parameters in your repository.xml.

*) If you are not using the binary index, disable it.
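
As a rough sketch of the second point, the two parameters would go inside the existing SearchIndex element of the repository configuration. The values shown are illustrative assumptions, not recommendations: the pool size and the -Xmx heap limit need tuning for your own content, and a heap that is too small can make the forked parser process crash on large documents.

```xml
<!-- Fragment only: add these params to the SearchIndex element you already have. -->
<!-- Both values are assumed for illustration; tune them for your instance. -->

<!-- Number of background threads used for text extraction -->
<param name="extractorPoolSize" value="2"/>

<!-- Command used to start the separate JVM that parses binaries; -->
<!-- -Xmx caps the heap of that forked process, not of CQ itself -->
<param name="forkJavaCommand" value="java -Xmx128m"/>
```

With this in place, an OutOfMemoryError during parsing should stay confined to the forked JVM rather than exhausting the heap of the main instance.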

Paul_McMahon · 10/15/15

What is your maximum heap size set to?

wayne_licquoris · 10/15/15

Thanks. I have now configured text extraction to run in a separate process with its own memory allocation and low CPU priority, as explained here:

http://helpx.adobe.com/experience-manager/kb/outOfProcessTextExtraction.html

I now have a usable instance again.

Salhaji_Nizar · 10/15/15

Hi,

I have the same issue. The JVM is started with 4 GB of heap on Windows Server 2008 R2.

So I configured the extraction pool and fork command as follows: extractorPoolSize = 2 and forkJavaCommand = cmd /c start /low /wait /b java -Xmx64m

The Author and Publish instances are usable now, but I am getting these errors instead:

10.02.2014 10:39:56 *WARN * LazyTextExtractorField: Failed to extract text from a binary property (LazyTextExtractorField.java, line 181)
java.io.IOException: The pipe is being closed
   at java.io.FileOutputStream.writeBytes(Native Method)

10.02.2014 13:57:40 *WARN * LazyTextExtractorField: Failed to extract text from a binary property (LazyTextExtractorField.java, line 181)
org.apache.tika.exception.TikaException: Failed to communicate with a forked parser process. The process has most likely crashed due to some error like running out of memory. A new process will be started for the next parsing request.
   at org.apache.tika.fork.ForkParser.parse(ForkParser.java:123)
   at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField$ParsingTask.run(LazyTextExtractorField.java:175)
   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Lost connection to a forked server process
   at org.apache.tika.fork.ForkClient.waitForResponse(ForkClient.java:169)
   at org.apache.tika.fork.ForkClient.call(ForkClient.java:110)
   at org.apache.tika.fork.ForkParser.parse(ForkParser.java:120)
   ... 9 more

Thanks.

Issue with Adobe CQ (AEM) 5.4 Repository - Help Required
