Issue with Adobe CQ (AEM) 5.4 Repository - Help Required
Has anyone else experienced the following issue with the CQ5.4 CRX2.2 repository:
07.02.2014 08:56:13 *WARN * LazyTextExtractorField: Failed to extract text from a binary property (LazyTextExtractorField.java, line 181)
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.ooxml.OOXMLParser@68d7965d
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField$ParsingTask.run(LazyTextExtractorField.java:175)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.poi.POIXMLException: java.lang.reflect.InvocationTargetException
at org.apache.poi.xssf.usermodel.XSSFFactory.createDocumentPart(XSSFFactory.java:61)
at org.apache.poi.POIXMLDocumentPart.read(POIXMLDocumentPart.java:277)
at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:186)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:182)
at org.apache.poi.xssf.extractor.XSSFExcelExtractor.<init>(XSSFExcelExtractor.java:56)
at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:172)
at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:152)
at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:65)
at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:67)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
... 12 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.poi.xssf.usermodel.XSSFFactory.createDocumentPart(XSSFFactory.java:59)
... 21 more
Caused by: java.lang.OutOfMemoryError: Java heap space
We believe this is due to a user uploading a large number of binary files in one go (.doc, xls, pdf). The instance is pushing them through workflow and this is consuming the entire heap.
The symptoms are very high CPU load, constant full garbage collections and no access to any of the interfaces such as system console or CRX, so no chance to stop workflow, also meaning authors currently have no access.
I'd appreciate any advice on this matter.