SOLVED

AEM61 stability

Level 2

Hi all

I'm a little concerned about a problem I'm trying to solve on one of our AEM 6.1 installations.

Some data:

. AEM 6.1 - author runmode - TarMK - file system datastore

. Oak version: 1.2.4 (I plan to upgrade to the latest version, but for the moment we are on this one)

 

On a new environment (created from an OOTB installation a few months ago) we found the instance blocked. When we restart it, the instance does not come up and logs the following error:

24.12.2015 13:04:00.974 *ERROR* [FelixStartLevel] org.apache.jackrabbit.oak-core [org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStoreService(86)] The activate method has thrown an exception (java.lang.IllegalStateException: RefId '53' doesn't exist in data segment 1f13582c-af91-4d5a-a4ff-309fa78d91fe. Creation date delta is 17 ms.)
java.lang.IllegalStateException: RefId '53' doesn't exist in data segment 1f13582c-af91-4d5a-a4ff-309fa78d91fe. Creation date delta is 17 ms.

I have opened a ticket with Adobe support, but so far we have had no chance to solve the issue.

I downloaded the oak-run tool and tried to:

. recover from a previous working configuration:   

java -jar oak-run-*.jar check -d1 --bin=-1 -p crx-quickstart/repository/segmentstore/

The command found no good revision in the journal to restore from:

9:09:23.818 [main] INFO o.a.j.o.p.s.f.t.ConsistencyChecker - Error while checking /oak:index/workflowDataLucene/:data/_2nb_Lucene41_0.tim: Segment 5d565860-64f6-4115-a530-04b6b0f1a842 not found
19:09:23.818 [main] INFO o.a.j.o.p.s.f.t.ConsistencyChecker - Broken revision 5d565860-64f6-4115-a530-04b6b0f1a842:260876
19:09:23.818 [main] INFO o.a.j.o.p.s.f.t.ConsistencyChecker - Checking revision facd46d6-2bdd-444a-a17c-85338ddbe5b1:4036
19:09:23.818 [main] INFO o.a.j.o.p.s.f.t.ConsistencyChecker - Checking /oak:index/workflowDataLucene/:data/_2nb_Lucene41_0.tim
19:09:23.818 [main] ERROR o.a.j.o.p.segment.SegmentTracker - Segment not found: facd46d6-2bdd-444a-a17c-85338ddbe5b1. Creation date delta is 0 ms.
org.apache.jackrabbit.oak.plugins.segment.SegmentNotFoundException: Segment facd46d6-2bdd-444a-a17c-85338ddbe5b1 not found
at org.apache.jackrabbit.oak.plugins.segment.file.FileStore.readSegment(FileStore.java:870) ~[oak-run-1.2.4.jar:1.2.4]
at org.apache.jackrabbit.oak.plugins.segment.SegmentTracker.getSegment(SegmentTracker.java:136) ~[oak-run-1.2.4.jar:1.2.4]
at org.apache.jackrabbit.oak.plugins.segment.SegmentId.getSegment(SegmentId.java:108) [oak-run-1.2.4.jar:1.2.4]
at org.apache.jackrabbit.oak.plugins.segment.Record.getSegment(Record.java:82) [oak-run-1.2.4.jar:1.2.4]
at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.getTemplate(SegmentNodeState.java:79) [oak-run-1.2.4.jar:1.2.4]
at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.getChildNode(SegmentNodeState.java:381) [oak-run-1.2.4.jar:1.2.4]
at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStore.getRoot(SegmentNodeStore.java:146) [oak-run-1.2.4.jar:1.2.4]
at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStore.<init>(SegmentNodeStore.java:98) [oak-run-1.2.4.jar:1.2.4]
at org.apache.jackrabbit.oak.plugins.segment.file.tooling.ConsistencyChecker.checkPath(ConsistencyChecker.java:142) [oak-run-1.2.4.jar:1.2.4]
at org.apache.jackrabbit.oak.plugins.segment.file.tooling.ConsistencyChecker.check(ConsistencyChecker.java:131) [oak-run-1.2.4.jar:1.2.4]
at org.apache.jackrabbit.oak.plugins.segment.file.tooling.ConsistencyChecker.checkConsistency(ConsistencyChecker.java:83) [oak-run-1.2.4.jar:1.2.4]
at org.apache.jackrabbit.oak.run.Main.check(Main.java:736) [oak-run-1.2.4.jar:1.2.4]
at org.apache.jackrabbit.oak.run.Main.main(Main.java:159) [oak-run-1.2.4.jar:1.2.4]
19:09:23.818 [main] INFO o.a.j.o.p.s.f.t.ConsistencyChecker - Error while checking /oak:index/workflowDataLucene/:data/_2nb_Lucene41_0.tim: Segment facd46d6-2bdd-444a-a17c-85338ddbe5b1 not found
19:09:23.818 [main] INFO o.a.j.o.p.s.f.t.ConsistencyChecker - Broken revision facd46d6-2bdd-444a-a17c-85338ddbe5b1:4036
19:09:23.989 [main] INFO o.a.j.o.p.s.f.t.ConsistencyChecker - No good revision found
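The ConsistencyChecker lines above follow a regular pattern, so a long check log can be summarized instead of read line by line. The following is a minimal sketch, assuming the log format is exactly as quoted in this post; the function names `summarize_check_log` and `candidate_revisions` are hypothetical helpers, not part of oak-run.

```python
import re

# Patterns copied from the ConsistencyChecker lines quoted in this post.
CHECKING = re.compile(r"ConsistencyChecker - Checking revision (\S+)")
BROKEN = re.compile(r"ConsistencyChecker - Broken revision (\S+)")
NO_GOOD = "ConsistencyChecker - No good revision found"


def summarize_check_log(lines):
    """Return (checked, broken, no_good) for an oak-run check log.

    checked and broken are lists of revision ids in log order;
    no_good is True if the checker reported that no good revision exists.
    """
    checked, broken, no_good = [], [], False
    for line in lines:
        match = CHECKING.search(line)
        if match:
            checked.append(match.group(1))
            continue
        match = BROKEN.search(line)
        if match:
            broken.append(match.group(1))
            continue
        if NO_GOOD in line:
            no_good = True
    return checked, broken, no_good


def candidate_revisions(checked, broken):
    """Revisions that were checked but never reported as broken."""
    broken_set = set(broken)
    return [rev for rev in checked if rev not in broken_set]
```

Feeding it the saved output of the check command (for example `summarize_check_log(open("check.log"))`, where `check.log` is a hypothetical file name) gives the list of broken revisions; an empty result from `candidate_revisions` matches the "No good revision found" message.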

 

Trying to start the oak-run maintenance console fails as well:

 java -jar oak-run-*.jar console /app/aem61/crx-quickstart/repository/segmentstore
Apache Jackrabbit Oak 1.2.4
Exception in thread "main" java.lang.IllegalStateException: RefId '53' doesn't exist in data segment 1f13582c-af91-4d5a-a4ff-309fa78d91fe. Creation date delta is 9 ms.
        at org.apache.jackrabbit.oak.plugins.segment.Segment.getRefId(Segment.java:239)
        at org.apache.jackrabbit.oak.plugins.segment.Segment.internalReadRecordId(Segment.java:351)
        at org.apache.jackrabbit.oak.plugins.segment.Segment.readRecordId(Segment.java:347)
        at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.getTemplateId(SegmentNodeState.java:70)
        at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.getTemplate(SegmentNodeState.java:79)
        at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeState.getChildNode(SegmentNodeState.java:381)
        at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStore.getRoot(SegmentNodeStore.java:146)
        at org.apache.jackrabbit.oak.plugins.segment.SegmentNodeStore.<init>(SegmentNodeStore.java:98)
        at org.apache.jackrabbit.oak.console.Console$SegmentFixture.<init>(Console.java:153)
        at org.apache.jackrabbit.oak.console.Console$SegmentFixture.<init>(Console.java:147)
        at org.apache.jackrabbit.oak.console.Console.main(Console.java:98)
        at org.apache.jackrabbit.oak.run.Main.main(Main.java:153)

 

In conclusion: the same code and contents have been running fine on a CQ 5.4 installation for 5 years without any issue. Now, after moving to AEM 6.1, we have hit this issue, which seems to force us to restore the installation from a previous backup.

Clearly I would like to understand:

. what the possible causes of such a problem are;

. how to recover from such a situation. I don't consider restoring a full backup a good solution, since it takes a lot of time (restoring a 1 TB repository can take up to 20 hours just for copying the files) and, moreover, we completely lose all activity since the last backup: even a few hours can be a big issue with editors producing lots of content every day.

Any ideas or suggestions?

 

Thanks

Ignazio

5 Replies

Level 8

Is this a production environment or a test environment?

Did you upgrade from 5.4 to 6.1 directly, or were there other upgrades in between?

Level 2

This is an environment that we are setting up to become the new production environment, so let's treat it as a production-to-be.

It was built from content packages taken from a migrated instance:

. copied the 5.4 instance

. upgraded to AEM 6.1: result not very good (the migration process had a lot of issues)

. set up a new OOTB AEM 6.1 and replicated the contents from the migrated AEM 6.1 to it

So the target environment where the issue happened can be considered an OOTB AEM 6.1 (not migrated) where I deployed the content packages.

Level 10

It sounds like many referenced files were deleted from the blob store. You need to restore the missing blobs from backup, create empty files in their place, or identify the files and take them from another environment.
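If a list of referenced blob ids is available, checking which ones are missing on disk could be sketched as follows. This is only an illustration under assumptions: it assumes a file system datastore in which each blob is a file named after its id, stored under three directory levels taken from the first three character pairs of that id, and it assumes the blob ids come from some other source. `blob_path` and `missing_blobs` are hypothetical helper names.

```python
import os


def blob_path(datastore_root, blob_id):
    """Expected file path for a blob id.

    Assumption: the file is named after the id and lives under
    <root>/<chars 0-1>/<chars 2-3>/<chars 4-5>/.
    """
    return os.path.join(
        datastore_root, blob_id[0:2], blob_id[2:4], blob_id[4:6], blob_id
    )


def missing_blobs(datastore_root, blob_ids):
    """Return the blob ids that have no file at their expected path."""
    return [
        blob_id
        for blob_id in blob_ids
        if not os.path.isfile(blob_path(datastore_root, blob_id))
    ]
```

The ids this returns are the ones to look for in a backup or in another environment.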

For now, let us identify the missing nodes and then decide on the next steps. Please follow the steps below.

*) Login at http://localhost:4502/crx/explorer/
*) Go to http://localhost:4502/crx/explorer/config/check.jsp
 

  • Check "Log each node" & "Data store consistency check"
  • Enter / for "Check nodes below:"

*) Click Run.
*) Send me the complete on-screen output once the check has finished.
 

Level 2

Hi

I cannot get into CRX Explorer. In the current state, loading http://localhost:4502/crx/explorer results in a NullPointerException, and neither CRX Explorer nor any other page under /crx is reachable.

One thing that may be worth pointing out to complete the description of my system: the architecture is based on a DataStore shared between author and publish. This is a possible setup when you have a lot of assets and don't want to replicate them continuously. I wonder whether such an architecture could be responsible for a problem like this (I hope not). The publish instance (based on the same datastore) is working correctly.

Thanks for your support.

Correct answer by
Level 10

Not being able to get into CRX Explorer indicates that the repository is not up. There might be another issue, such as the instance not being configured properly. Please get in touch with official support, because this requires analyzing the complete logs and knowing the history of how the instance was set up.

Sharing a datastore is a supported setup and is used by many customers. The only thing you need to make sure of is that you follow the additional steps when running datastore garbage collection, as mentioned in the section "Multi-Repository Data Store" at https://helpx.adobe.com/experience-manager/kb/DataStoreGarbageCollection.html. If datastore garbage collection was executed without following the Multi-Repository Data Store steps, then what you are experiencing is one possible outcome.
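To illustrate why the extra steps matter, here is a toy model of mark-and-sweep garbage collection on a shared datastore. It is not Oak's implementation; the repository names and blob ids are made up, and the point is only that the sweep must use references collected from every repository that shares the datastore.

```python
def references(repositories, names):
    """Mark phase: union of the blob ids referenced by the named repositories."""
    marked = set()
    for name in names:
        marked |= repositories[name]
    return marked


def sweep(datastore, marked):
    """Sweep phase: delete every blob that is not marked; return deleted ids."""
    deleted = sorted(datastore - marked)
    datastore.difference_update(deleted)
    return deleted


# Hypothetical example data: two repositories sharing one datastore.
repositories = {
    "author": {"blob-a", "blob-shared"},
    "publish": {"blob-p", "blob-shared"},
}
shared_datastore = {"blob-a", "blob-p", "blob-shared", "blob-orphan"}

# Unsafe: marking from publish only also deletes a blob the author still needs.
unsafe = sweep(set(shared_datastore), references(repositories, ["publish"]))

# Safe: marking from all repositories deletes only the unreferenced blob.
safe = sweep(set(shared_datastore), references(repositories, ["author", "publish"]))
```

In this model `unsafe` contains `blob-a`, which the author repository still references, while `safe` contains only `blob-orphan`.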