Has anyone else experienced the following issue with the CQ5.4 CRX2.2 repository:
07.02.2014 08:56:13 *WARN * LazyTextExtractorField: Failed to extract text from a binary property (LazyTextExtractorField.java, line 181)
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@68d7965d
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField$ParsingTask.run(LazyTextExtractorField.java:175)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.poi.POIXMLException: java.lang.reflect.InvocationTargetException
at org.apache.poi.xssf.usermodel.XSSFFactory.createDocumentPart(XSSFFactory.java:61)
at org.apache.poi.POIXMLDocumentPart.read(POIXMLDocumentPart.java:277)
at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:186)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:182)
at org.apache.poi.xssf.extractor.XSSFExcelExtractor.<init>(XSSFExcelExtractor.java:56)
at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:172)
at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:152)
at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:65)
at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:67)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
... 12 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.poi.xssf.usermodel.XSSFFactory.createDocumentPart(XSSFFactory.java:59)
... 21 more
Caused by: java.lang.OutOfMemoryError: Java heap space
We believe this is due to a user uploading a large number of binary files in one go (.doc, xls, pdf). The instance is pushing them through workflow and this is consuming the entire heap.
The symptoms are very high CPU load, constant full garbage collections and no access to any of the interfaces such as system console or CRX, so no chance to stop workflow, also meaning authors currently have no access.
I'd appreciate any advice on this matter.
Solved! Go to Solution.
Views
Replies
Total Likes
Need to do couple of things
*) Make sure you have latest crx hotfix installed on your system.
*) Configure your system to run extraction as seperate process. That is The extractorPoolSize & forkJava variables are set in our repository.xml
*) If you are not using binary index disable it
Views
Replies
Total Likes
Need to do couple of things
*) Make sure you have latest crx hotfix installed on your system.
*) Configure your system to run extraction as seperate process. That is The extractorPoolSize & forkJava variables are set in our repository.xml
*) If you are not using binary index disable it
Views
Replies
Total Likes
What is your max Heap size set to?
Views
Replies
Total Likes
Thanks, I have configured the extraction to now run on a separate process with its own memory allocation and low cpu process priority as explained here:
http://helpx.adobe.com/experience-manager/kb/outOfProcessTextExtraction.html
I now have a usable instances again
Views
Replies
Total Likes
Hi,
I have the same issue. The JVM is started with 4Go on Windows 2008 R2 Server.
So, I have configured the extraction pool and fork command that way: extractorPoolSize = 2 and forkJavaCommand = cmd /c start /low /wait /b java -Xmx64m
In fact, the Author & Publish instance are usable now but now, I have these errors:
10.02.2014 10:39:56 *WARN * LazyTextExtractorField: Failed to extract text from a binary property (LazyTextExtractorField.java, line 181)
java.io.IOException: The pipe is being closed
at java.io.FileOutputStream.writeBytes(Native Method)
10.02.2014 13:57:40 *WARN * LazyTextExtractorField: Failed to extract text from a binary property (LazyTextExtractorField.java, line 181)
org.apache.tika.exception.TikaException: Failed to communicate with a forked parser process. The process has most likely crashed due to some error like running out of memory. A new process will be started for the next parsing request.
at org.apache.tika.fork.ForkParser.parse(ForkParser.java:123)
at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField$ParsingTask.run(LazyTextExtractorField.java:175)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Lost connection to a forked server process
at org.apache.tika.fork.ForkClient.waitForResponse(ForkClient.java:169)
at org.apache.tika.fork.ForkClient.call(ForkClient.java:110)
at org.apache.tika.fork.ForkParser.parse(ForkParser.java:120)
... 9 more
Thanks.
Views
Replies
Total Likes
Views
Likes
Replies