AEM 6.5.15 Reindexing loop issue

Question

Hi everyone,

We have an AEM project where the repository size is ~90 GB (recently it started to grow rapidly, up to 190 GB).

AEM version: 6.5.15
Apache Jackrabbit Oak: 1.22.13

On this project, we ran into very long reindexing after each deployment. The process took ~4 hours and consumed all RAM (24 GB), and as a result the instance got stuck. It did not respond to any operation, and an OutOfMemoryError (java.lang.OutOfMemoryError: GC overhead limit exceeded) was thrown. We killed the AEM process and restarted it. After the restart, reindexing started again:

06.04.2023 16:38:02.192 *INFO* [async-index-update-async] org.apache.jackrabbit.oak.plugins.index.IndexUpdate /oak:index/someIndex1 => Indexed 10000 nodes in 2.506 s ...
06.04.2023 16:38:02.253 *INFO* [async-index-update-async] org.apache.jackrabbit.oak.plugins.index.IndexUpdate /oak:index/workflowDataLucene => Indexed 10000 nodes in 2.585 s ...
06.04.2023 16:38:02.255 *INFO* [async-index-update-async] org.apache.jackrabbit.oak.plugins.index.IndexUpdate /oak:index/ntBaseLucene => Indexed 10000 nodes in 2.560 s ...
06.04.2023 16:38:02.292 *INFO* [async-index-update-async] org.apache.jackrabbit.oak.plugins.index.IndexUpdate /oak:index/someIndex2 => Indexed 10000 nodes in 2.608 s ...
06.04.2023 16:38:02.292 *INFO* [async-index-update-async] org.apache.jackrabbit.oak.plugins.index.IndexUpdate /oak:index/someIndex3 => Indexed 10000 nodes in 2.605 s ...
06.04.2023 16:38:02.292 *INFO* [async-index-update-async] org.apache.jackrabbit.oak.plugins.index.IndexUpdate /oak:index/templateIndex => Indexed 10000 nodes in 2.609 s ...
06.04.2023 16:38:02.292 *INFO* [async-index-update-async] org.apache.jackrabbit.oak.plugins.index.IndexUpdate /oak:index/socialLucene => Indexed 10000 nodes in 2.626 s ...
06.04.2023 16:38:02.292 *INFO* [async-index-update-async] org.apache.jackrabbit.oak.plugins.index.IndexUpdate /oak:index/cmLucene => Indexed 10000 nodes in 2.611 s ...
06.04.2023 16:38:02.292 *INFO* [async-index-update-async] org.apache.jackrabbit.oak.plugins.index.IndexUpdate /oak:index/nodetypeLucene => Indexed 10000 nodes in 2.604 s ...
06.04.2023 16:38:02.305 *INFO* [async-index-update-async] org.apache.jackrabbit.oak.plugins.index.IndexUpdate Incremental indexing Traversed #10000 /var/audit/com.day.cq.wcm.core.page/content/dam/path/to/image.JPG [1.19 nodes/s, 4293.90 nodes/hr] (Elapsed 2.639 s)

We tried several times to wait until reindexing finished, but every time the instance got stuck and became inaccessible. After that, we decided to run offline compaction on the stopped AEM, run all maintenance tasks on the running AEM, and run offline reindexing of all Lucene indexes. The first try with 32 GB RAM failed. With 100 GB RAM it was successful, and the process took about 5 hours: "Indexing completed and imported successfully in 4.964 h (17870089 ms)". The command that I ran:

sudo nohup java -Dtar.memoryMapped=true -Doak.compaction.eagerFlush=true -Doak.index.ramBufferSizeMB=4096 -server -Xmx100g -Dcompaction-progress-log=5000000 -Dcompress-interval=150000000 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./ -jar compaction/oak-run-1.22.13.jar index --reindex --index-paths=/oak:index/workflowDataLucene,/oak:index/slingeventJob,/oak:index/versionStoreIndex,/oak:index/commerceLucene,/oak:index/authorizables,/oak:index/text,/oak:index/newsHighlightsIndex,/oak:index/templateIndex,/oak:index/ntFolderDamLucene,/oak:index/someIndex1,/oak:index/damAssetLucene,/oak:index/someIndex2,/oak:index/someIndex2,/oak:index/nodetypeLucene,/oak:index/ntBaseLucene,/oak:index/cqTagLucene,/oak:index/lucene,/oak:index/repTokenIndex,/oak:index/someIndex3,/oak:index/cqPageLucene,/content/project-path/oak:index/someIndex4,/content/project-path/oak:index/someIndex5,/content/project-path/markets/oak:index/lastModifiedIndex,/content/project-path/hq/de_DE/competitor/oak:index/scaleComponentIndex --read-write --fds-path=crx-quickstart/repository/repository/datastore  crx-quickstart/repository/segmentstore >> compaction/oak-reindex.log 2>>compaction/oak-reindex-error.log

I thought this had solved the issue, because when AEM starts, it should detect the reindexed indexes and import them. However, after starting AEM with 120 GB of RAM, it again started indexing and incremental reindexing. This process took 7.808 h, and the instance got stuck again because all 120 GB of RAM were consumed. Please suggest how to solve this issue with repetitive reindexing of a large repository.

koha26 · Accepted Answer

We noticed that reindexing inside AEM traverses nodes very slowly, at most 100-200 nodes per second, while offline reindexing with oak-run processes thousands of nodes per second.
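To compare traversal speeds like these yourself, one option is to compute the rate from the timestamps and counts of successive "Traversed #N" lines in the AEM log. The shell sketch below is a minimal illustration, assuming the "dd.mm.yyyy HH:MM:SS.mmm ... Traversed #N ..." line format shown in the question and ignoring runs that cross midnight. `traversal_rates` and `eta_hours` are hypothetical helper names, not part of AEM or oak-run.

```shell
#!/bin/sh
# Hypothetical helpers, not part of AEM or oak-run.

# traversal_rates [logfile...]
# Reads AEM log lines (file arguments or stdin) and, for each
# "Traversed #N" line after the first, prints "N rate", where rate is
# (nodes traversed since the previous line) / (seconds between the two lines).
# Assumes the "dd.mm.yyyy HH:MM:SS.mmm ..." format shown in the question
# and does not handle runs that cross midnight.
traversal_rates() {
  awk '
    /Traversed #[0-9]+/ {
      split($2, t, ":")                      # t[1]=HH, t[2]=MM, t[3]=SS.mmm
      secs = t[1] * 3600 + t[2] * 60 + t[3]
      match($0, /Traversed #[0-9]+/)
      count = substr($0, RSTART + 11, RLENGTH - 11) + 0   # digits after "Traversed #"
      if (seen && secs > prev_secs)
        printf "%d %.2f\n", count, (count - prev_count) / (secs - prev_secs)
      prev_secs = secs; prev_count = count; seen = 1
    }
  ' "$@"
}

# eta_hours NODES RATE
# Prints how many hours it takes to traverse NODES nodes at RATE nodes/s.
eta_hours() {
  awk -v n="$1" -v r="$2" 'BEGIN { printf "%.1f\n", n / r / 3600 }'
}
```

For example, `traversal_rates crx-quickstart/logs/error.log` prints one rate per progress line. For a hypothetical repository of 10 million nodes, `eta_hours 10000000 150` prints 18.5 while `eta_hours 10000000 3000` prints 0.9, which shows why the difference between in-AEM and offline traversal speed matters so much.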

We fixed the issue with the following plan:

1. Create a checkpoint.
2. Stop AEM.
3. Run oak-run in console mode:

sudo java -jar compaction/oak-run-1.22.13.jar console --read-write --fds-path=crx-quickstart/repository/repository/datastore crx-quickstart/repository/segmentstore

In the oak-run console, run the Groovy script below. It points the "async" indexing lane at the checkpoint created in step 1, which marks the indexes as up to date as of that checkpoint, so they do not need to be reindexed.

import org.apache.jackrabbit.oak.api.Type
import org.apache.jackrabbit.oak.commons.PathUtils
import org.apache.jackrabbit.oak.spi.commit.CommitInfo
import org.apache.jackrabbit.oak.spi.commit.EmptyHook
import org.apache.jackrabbit.oak.spi.state.NodeBuilder
import org.apache.jackrabbit.oak.spi.state.NodeStateUtils

// Checkpoint created in step 1 and the name of the async indexing lane
updatedCheckpoint = "<enter created checkpoint here>"
indexLane = "async"

// Returns the builder for the (possibly nested) node at the given path
NodeBuilder childBuilder(NodeBuilder root, String path) {
    NodeBuilder nb = root
    for (String nodeName : PathUtils.elements(path)) {
        nb = nb.child(nodeName)
    }
    return nb
}

ns = session.store
// Hidden node that stores the last indexed checkpoint of each async lane
asyncPath = "/:async"

// Abort if the checkpoint cannot be found in this repository (typo, expired, etc.)
if (ns.retrieve(updatedCheckpoint) == null) {
    throw new IllegalStateException("Checkpoint not found: " + updatedCheckpoint)
}

println "Before: " + NodeStateUtils.getNode(ns.root, asyncPath)

builder = ns.root.builder()
asyncNode = childBuilder(builder, asyncPath)
// Record that the "async" lane is indexed up to the given checkpoint
asyncNode.setProperty(indexLane, updatedCheckpoint, Type.STRING)
ns.merge(builder, EmptyHook.INSTANCE, CommitInfo.EMPTY)

println "After: " + NodeStateUtils.getNode(ns.root, asyncPath)

4. Start AEM and check the async IndexStats MBean in the JMX console:
https://aem.author.host:4502/system/console/jmx/org.apache.jackrabbit.oak%3Aname%3Dasync%2Ctype%3DIndexStats
Status should be "done", and LastIndexedTime should be updated.

5. Create a new checkpoint.

6. Run oak-run.jar reindexing (https://experienceleague.adobe.com/docs/experience-manager-65/deploying/deploying/oak-run-indexing-usecases.html?lang=en#reindexsegmentnodestore ) with the checkpoint created in step 5.

7. Start AEM.

Saravanan_Dharmaraj · Answer

@koha26 IMO, since testing and troubleshooting this is so time-consuming, I would suggest creating a ticket with Adobe to find a solution. They might clone the instance and troubleshoot it with a heap dump on the engineering side. Hope that works!
