AEM 6.1 stability
Hi all,
I'm a little concerned about a problem I'm trying to solve on one of our AEM 6.1 installations.
Some data:
. AEM 6.1 - author runmode - TarMK - file system datastore
. Oak version: 1.2.4 (I'm going to upgrade to the latest version, but for the moment we are on this one)
On a new environment (created from an OOTB installation a few months ago) we found the instance blocked. When we try to restart it, the instance fails to start and gives the following error:
24.12.2015 13:04:00.974 *ERROR* [FelixStartLevel] org.apache.jackrabbit.oak-core [org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStoreService(86)] The activate method has thrown an exception (java.lang.IllegalStateException: RefId '53' doesn't exist in data segment 1f13582c-af91-4d5a-a4ff-309fa78d91fe. Creation date delta is 17 ms.)
java.lang.IllegalStateException: RefId '53' doesn't exist in data segment 1f13582c-af91-4d5a-a4ff-309fa78d91fe. Creation date delta is 17 ms.
I have opened a ticket with Adobe support, but so far we have not been able to solve the issue.
I downloaded the oak-run tool and tried to:
. recover from a previous working configuration:
java -jar oak-run-*.jar check -d1 --bin=-1 -p crx-quickstart/repository/segmentstore/
The command finished without finding any good revision in the journal to restore from:
19:09:23.818 [main] INFO o.a.j.o.p.s.f.t.ConsistencyChecker - Error while checking /oak:index/workflowDataLucene/:data/_2nb_Lucene41_0.tim: Segment 5d565860-64f6-4115-a530-04b6b0f1a842 not found
19:09:23.818 [main] INFO o.a.j.o.p.s.f.t.ConsistencyChecker - Broken revision 5d565860-64f6-4115-a530-04b6b0f1a842:260876
19:09:23.818 [main] INFO o.a.j.o.p.s.f.t.ConsistencyChecker - Checking revision facd46d6-2bdd-444a-a17c-85338ddbe5b1:4036
19:09:23.818 [main] INFO o.a.j.o.p.s.f.t.ConsistencyChecker - Checking /oak:index/workflowDataLucene/:data/_2nb_Lucene41_0.tim
19:09:23.818 [main] ERROR o.a.j.o.p.segment.SegmentTracker - Segment not found: facd46d6-2bdd-444a-a17c-85338ddbe5b1. Creation date delta is 0 ms.
org.apache.jackrabbit.oak.plugins.segment.SegmentNotFoundException: Segment facd46d6-2bdd-444a-a17c-85338ddbe5b1 not found
at org.apache.jackrabbit.oak.plugins.segment.file.FileStore.readSegment(FileStore.java:870) ~[oak-run-1.2.4.jar:1.2.4]
at org.apache.jackrabbit.oak.plugins.segment.SegmentTracker.getSegment(SegmentTracker.java:136) ~[oak-run-1.2.4.jar:1.2.4]
at org.apache.jackrabbit.oak.plugins.segment.SegmentId.getSegment(SegmentId.java:108) [oak-run-1.2.4.jar:1.2.4]
at org.apache.jackrabbit.oak.plugins.segment.Record.getSegment(Record.java:82) [oak-run-1.2.4.jar:1.2.4]
at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.getTemplate(SegmentNodeState.java:79) [oak-run-1.2.4.jar:1.2.4]
at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.getChildNode(SegmentNodeState.java:381) [oak-run-1.2.4.jar:1.2.4]
at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStore.getRoot(SegmentNodeStore.java:146) [oak-run-1.2.4.jar:1.2.4]
at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStore.<init>(SegmentNodeStore.java:98) [oak-run-1.2.4.jar:1.2.4]
at org.apache.jackrabbit.oak.plugins.segment.file.tooling.ConsistencyChecker.checkPath(ConsistencyChecker.java:142) [oak-run-1.2.4.jar:1.2.4]
at org.apache.jackrabbit.oak.plugins.segment.file.tooling.ConsistencyChecker.check(ConsistencyChecker.java:131) [oak-run-1.2.4.jar:1.2.4]
at org.apache.jackrabbit.oak.plugins.segment.file.tooling.ConsistencyChecker.checkConsistency(ConsistencyChecker.java:83) [oak-run-1.2.4.jar:1.2.4]
at org.apache.jackrabbit.oak.run.Main.check(Main.java:736) [oak-run-1.2.4.jar:1.2.4]
at org.apache.jackrabbit.oak.run.Main.main(Main.java:159) [oak-run-1.2.4.jar:1.2.4]
19:09:23.818 [main] INFO o.a.j.o.p.s.f.t.ConsistencyChecker - Error while checking /oak:index/workflowDataLucene/:data/_2nb_Lucene41_0.tim: Segment facd46d6-2bdd-444a-a17c-85338ddbe5b1 not found
19:09:23.818 [main] INFO o.a.j.o.p.s.f.t.ConsistencyChecker - Broken revision facd46d6-2bdd-444a-a17c-85338ddbe5b1:4036
19:09:23.989 [main] INFO o.a.j.o.p.s.f.t.ConsistencyChecker - No good revision found
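The check output is long, so a summary helps when comparing runs. Below is a minimal Python sketch that counts the revisions the checker visited and lists those it reported as broken. It assumes only the three ConsistencyChecker message formats visible in the output above ("Checking revision", "Broken revision", "No good revision found"); the function name `summarize_check_log` is made up for illustration and is not part of oak-run.

```python
import re

# Patterns are taken from the ConsistencyChecker lines in the output above.
# Other oak-run versions may word these messages differently (assumption).
CHECKING = re.compile(r"ConsistencyChecker - Checking revision (\S+)")
BROKEN = re.compile(r"ConsistencyChecker - Broken revision (\S+)")
NO_GOOD = "ConsistencyChecker - No good revision found"


def summarize_check_log(lines):
    """Summarize an oak-run check log given as an iterable of text lines."""
    checked = []      # revisions the checker started to examine
    broken = []       # revisions the checker reported as broken
    no_good = False   # True if the final "No good revision found" line appears
    for line in lines:
        match = CHECKING.search(line)
        if match:
            checked.append(match.group(1))
            continue
        match = BROKEN.search(line)
        if match:
            broken.append(match.group(1))
            continue
        if NO_GOOD in line:
            no_good = True
    return {"checked": checked, "broken": broken, "no_good_revision": no_good}
```

With the check output saved to a file (for example a hypothetical `check.log`), `summarize_check_log(open("check.log"))` returns the lists and the flag. In my case every checked revision also shows up as broken.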
I also tried to start the oak-run tool to open the maintenance console, but it fails with the same error:
java -jar oak-run-*.jar console /app/aem61/crx-quickstart/repository/segmentstore
Apache Jackrabbit Oak 1.2.4
Exception in thread "main" java.lang.IllegalStateException: RefId '53' doesn't exist in data segment 1f13582c-af91-4d5a-a4ff-309fa78d91fe. Creation date delta is 9 ms.
at org.apache.jackrabbit.oak.plugins.segment.Segment.getRefId(Segment.java:239)
at org.apache.jackrabbit.oak.plugins.segment.Segment.internalReadRecordId(Segment.java:351)
at org.apache.jackrabbit.oak.plugins.segment.Segment.readRecordId(Segment.java:347)
at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.getTemplateId(SegmentNodeState.java:70)
at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.getTemplate(SegmentNodeState.java:79)
at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.getChildNode(SegmentNodeState.java:381)
at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStore.getRoot(SegmentNodeStore.java:146)
at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStore.<init>(SegmentNodeStore.java:98)
at org.apache.jackrabbit.oak.console.Console$SegmentFixture.<init>(Console.java:153)
at org.apache.jackrabbit.oak.console.Console$SegmentFixture.<init>(Console.java:147)
at org.apache.jackrabbit.oak.console.Console.main(Console.java:98)
at org.apache.jackrabbit.oak.run.Main.main(Main.java:153)
In conclusion: the same code and content ran fine on a CQ 5.4 installation for 5 years without any issue. Now, after moving to AEM 6.1, we have hit this issue, which seems to force us to restore the installation from a previous backup.
Clearly I would like to understand:
. what the possible causes of such a problem are;
. how to recover from such a situation. I don't consider restoring a full backup a good solution, since it takes a lot of time (restoring a 1 TB repository can take up to 20 hours just for copying the files) and, moreover, we completely lose all activity since the last backup: even a few hours may be a big issue with editors producing lots of content every day.
Any ideas or suggestions?
Thanks
Ignazio