AEM 6.5.15 Reindexing loop issue
Hi everyone,
We have an AEM project with a repository of about 90 GB (recently it started growing rapidly, up to 190 GB).
AEM version - 6.5.15
Apache Jackrabbit Oak - 1.22.13
On this project, re-indexing after a deployment took very long: about 4 hours, during which it consumed all available RAM (24 GB) until the instance got stuck. The instance stopped responding to any operation and threw an OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit exceeded). We killed the AEM process and restarted it.
After AEM restarted, reindexing started again:
06.04.2023 16:38:02.192 *INFO* [async-index-update-async] org.apache.jackrabbit.oak.plugins.index.IndexUpdate /oak:index/someIndex1 => Indexed 10000 nodes in 2.506 s ...
06.04.2023 16:38:02.253 *INFO* [async-index-update-async] org.apache.jackrabbit.oak.plugins.index.IndexUpdate /oak:index/workflowDataLucene => Indexed 10000 nodes in 2.585 s ...
06.04.2023 16:38:02.255 *INFO* [async-index-update-async] org.apache.jackrabbit.oak.plugins.index.IndexUpdate /oak:index/ntBaseLucene => Indexed 10000 nodes in 2.560 s ...
06.04.2023 16:38:02.292 *INFO* [async-index-update-async] org.apache.jackrabbit.oak.plugins.index.IndexUpdate /oak:index/someIndex2 => Indexed 10000 nodes in 2.608 s ...
06.04.2023 16:38:02.292 *INFO* [async-index-update-async] org.apache.jackrabbit.oak.plugins.index.IndexUpdate /oak:index/someIndex3 => Indexed 10000 nodes in 2.605 s ...
06.04.2023 16:38:02.292 *INFO* [async-index-update-async] org.apache.jackrabbit.oak.plugins.index.IndexUpdate /oak:index/templateIndex => Indexed 10000 nodes in 2.609 s ...
06.04.2023 16:38:02.292 *INFO* [async-index-update-async] org.apache.jackrabbit.oak.plugins.index.IndexUpdate /oak:index/socialLucene => Indexed 10000 nodes in 2.626 s ...
06.04.2023 16:38:02.292 *INFO* [async-index-update-async] org.apache.jackrabbit.oak.plugins.index.IndexUpdate /oak:index/cmLucene => Indexed 10000 nodes in 2.611 s ...
06.04.2023 16:38:02.292 *INFO* [async-index-update-async] org.apache.jackrabbit.oak.plugins.index.IndexUpdate /oak:index/nodetypeLucene => Indexed 10000 nodes in 2.604 s ...
06.04.2023 16:38:02.305 *INFO* [async-index-update-async] org.apache.jackrabbit.oak.plugins.index.IndexUpdate Incremental indexing Traversed #10000 /var/audit/com.day.cq.wcm.core.page/content/dam/path/to/image.JPG [1.19 nodes/s, 4293.90 nodes/hr] (Elapsed 2.639 s)
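To see which index definitions the async indexer keeps rebuilding, the IndexUpdate lines above can be tallied per index path. This is a minimal sketch that assumes the log format shown above; the function name summarize_indexed_paths is hypothetical, and the log file location is whatever your instance uses.

```shell
#!/bin/sh
# Count "Indexed N nodes" progress lines per index path in an AEM log file.
# Assumes lines shaped like: "... IndexUpdate /oak:index/foo => Indexed 10000 nodes in 2.5 s ..."
# summarize_indexed_paths is a hypothetical helper name, not part of AEM or Oak.
summarize_indexed_paths() {
    # $1: path to the log file to scan
    awk '
        / => Indexed [0-9]+ nodes/ {
            # The index path is the field immediately before the "=>" token.
            for (i = 2; i <= NF; i++) {
                if ($i == "=>") { count[$(i-1)]++; break }
            }
        }
        END { for (p in count) print count[p], p }
    ' "$1" | sort -k1,1nr -k2,2   # most frequently reported index first
}
```

Calling it as summarize_indexed_paths crx-quickstart/logs/error.log (path assumed) prints one line per index with the number of progress messages seen, which gives a rough picture of which indexes are part of the reindexing run.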
We waited for reindexing to finish several times, but each time the instance got stuck and became inaccessible.
After that, we decided to run offline compaction with AEM stopped, run all maintenance tasks with AEM running, and then reindex all Lucene indexes offline. The first attempt with 32 GB of RAM failed. With 100 GB of RAM it succeeded and took about 5 hours: "Indexing completed and imported successfully in 4.964 h (17870089 ms)". The command that I ran:
sudo nohup java -Dtar.memoryMapped=true -Doak.compaction.eagerFlush=true -Doak.index.ramBufferSizeMB=4096 -server -Xmx100g -Dcompaction-progress-log=5000000 -Dcompress-interval=150000000 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./ -jar compaction/oak-run-1.22.13.jar index --reindex --index-paths=/oak:index/workflowDataLucene,/oak:index/slingeventJob,/oak:index/versionStoreIndex,/oak:index/commerceLucene,/oak:index/authorizables,/oak:index/text,/oak:index/newsHighlightsIndex,/oak:index/templateIndex,/oak:index/ntFolderDamLucene,/oak:index/someIndex1,/oak:index/damAssetLucene,/oak:index/someIndex2,/oak:index/someIndex2,/oak:index/nodetypeLucene,/oak:index/ntBaseLucene,/oak:index/cqTagLucene,/oak:index/lucene,/oak:index/repTokenIndex,/oak:index/someIndex3,/oak:index/cqPageLucene,/content/project-path/oak:index/someIndex4,/content/project-path/oak:index/someIndex5,/content/project-path/markets/oak:index/lastModifiedIndex,/content/project-path/hq/de_DE/competitor/oak:index/scaleComponentIndex --read-write --fds-path=crx-quickstart/repository/repository/datastore crx-quickstart/repository/segmentstore >> compaction/oak-reindex.log 2>>compaction/oak-reindex-error.log
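Note that the --index-paths list above contains /oak:index/someIndex2 twice. A long list like this is easier to keep correct when it is generated from a file with one path per line. The sketch below is only an illustration: build_index_paths and the file name index-paths.txt are hypothetical and not part of oak-run.

```shell
#!/bin/sh
# Build a comma-separated, de-duplicated value for oak-run's --index-paths option
# from a text file that lists one index path per line.
# build_index_paths is a hypothetical helper name, not part of oak-run.
build_index_paths() {
    # $1: file with one index path per line; blank lines and '#' comments are skipped.
    # Repeated paths are dropped, keeping the first occurrence and the original order.
    awk 'NF && $1 !~ /^#/ && !seen[$1]++ { printf "%s%s", sep, $1; sep = "," }
         END { print "" }' "$1"
}
```

It could then be used as --index-paths="$(build_index_paths index-paths.txt)" in the command above.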
I was happy and thought the issue was solved, because I expected AEM to detect these reindexed indexes on startup and import them.
However, after starting AEM with 120 GB of RAM, it again started indexing and incremental reindexing. This run took 7.808 h, and the instance got stuck because all 120 GB of RAM were consumed.
Please suggest how to solve this repeated reindexing of a large repository.