SOLVED

AEM 6.1 not starting due to Lucene errors

Level 4

Hi,

Our AEM 6.1 is not starting. The error.log file has the following warning and error:

 *WARN* [pool-7-thread-16] org.apache.jackrabbit.oak.plugins.index.lucene.IndexCopier Error occurred while copying file [_sf.cfs] from OakDirectory@78da42b4 lockFactory=org.apache.lucene.store.NoLockFactory@41517568 to MMapDirectory@/build2/install/cq/migratedauthor/561Upgrade/author/crx-quickstart/repository/index/e5a943cdec3000bd8ce54924fd2070ab5d1d35b9ecf530963a3583d43bf28293/1 lockFactory=NativeFSLockFactory@/build2/install/cq/migratedauthor/561Upgrade/author/crx-quickstart/repository/index/e5a943cdec3000bd8ce54924fd2070ab5d1d35b9ecf530963a3583d43bf28293/1
java.io.EOFException: reached end of stream after reading 0 bytes; 11726 bytes expected

 

*ERROR* [FelixStartLevel] org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker Could not access the Lucene index at /oak:index/lucene
java.io.EOFException: reached end of stream after reading 0 bytes; 11726 bytes expected

The error.log file did state that the Repository was started

*INFO* [FelixStartLevel] com.adobe.granite.repository.impl.CRX3RepositoryImpl Repository started.

 

However, we are not able to connect to AEM 6.1.  The logs so far have not been very verbose...

Any advice?

Thank You

1 Accepted Solution

Correct answer by
Level 4

OK, our CQ is finally upgraded to AEM 6.1 successfully. A couple of gotchas:

1. Prior to upgrading to AEM 6.1, we had datastore consistency errors, and we needed to resolve them first. They stemmed from a -1 length cq:properties node in /var/audit and -1 length data nodes in CQ-created versions of some DAM assets. For versioning errors due to missing data, use the version manager tool in etc to purge old versions. For the -1 length cq:properties node in /var/audit, locate the path and remove the node.

2. Remove all replication agents (although this might not be necessary).  

3. Clear all assigned, cancelled and unassigned jobs in /var/eventing/jobs

4. Uninstall all custom 3rd party vendor packages

5. Once the repository is migrated, do NOT update the start script and start AEM in hopes of upgrading CQ.  Please use "java -jar aem-quickstart-6.1.0.jar -r <proper runmodes> -Doak.mongo.uri=mongodb://<host>:<port> -Doak.mongo.db=<db name>"

6. Once AEM is started and login is possible, stop AEM, update the start script (pay attention to the run modes near the end of the provided start script), start AEM

Thank You ALL for your wonderful help! 
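
Step 5 above can be sketched as a small shell helper. This is only an illustration: the function name `upgrade_start_command` is hypothetical, and the run modes, host, port and database name are placeholders that must match your own setup. It prints the command instead of running it, so it can be reviewed first.

```shell
# Hypothetical helper: print the one-time upgrade start command from step 5.
# Every argument is a placeholder supplied by the caller.
upgrade_start_command() {
  runmodes="$1"    # comma-separated run modes for this instance
  mongo_host="$2"  # MongoDB host name
  mongo_port="$3"  # MongoDB port
  mongo_db="$4"    # MongoDB database name
  printf 'java -jar aem-quickstart-6.1.0.jar -r %s -Doak.mongo.uri=mongodb://%s:%s -Doak.mongo.db=%s\n' \
    "$runmodes" "$mongo_host" "$mongo_port" "$mongo_db"
}
```

For example, `upgrade_start_command author,crx3,crx3mongo localhost 27017 aem6` prints the command for an author instance; those values are illustrative only. Run the printed command from the AEM installation folder, and switch back to the start script only after login works (step 6).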

18 Replies

Employee Advisor

I think you need to rebuild the Oak indexes, as they seem to be corrupt.

Employee

Hi,

Have you already raised a Daycare ticket?

Were there any changes made to the system before you restarted it? Is there enough disk space?

Regards,

Opkar

Level 4

Hi Kunal,

Do you know how we can rebuild the Oak indexes? I tried looking up instructions but failed to find any...

Thank You

Level 4

Hi,

I have an existing Daycare ticket, which is about a missing authentication service. With help from the forums, I managed to get past that error by creating a blob in the datastore folder. I kept the Daycare ticket open because I wanted to know if Daycare support has a better way to address the missing blob issue.

Now that the repository can be started, AEM is complaining about Lucene errors.  Disk space is not an issue.

Thank you.

Employee Advisor

You can do it by setting the reindex property to true on the index definition nodes under /oak:index. Are you able to access the /crx/explorer or /crx/de UIs after the restart?
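
If the instance answers HTTP requests but the CRXDE UI is awkward to use, the same property change could in principle be made through the Sling POST servlet. The sketch below is an assumption, not something confirmed in this thread: the helper name `reindex_command`, the base URL and the user are all placeholders, and it only prints the curl command so it can be checked before running. It does not help while AEM is unreachable.

```shell
# Hypothetical helper: print a curl command that sets reindex=true (as a
# Boolean) on one index definition node under /oak:index.
# Assumes the Sling POST servlet is reachable at the given base URL.
reindex_command() {
  base_url="$1"    # e.g. http://localhost:4502 (assumed author URL)
  index_name="$2"  # index definition node name, e.g. lucene
  user="$3"        # an administrative user; curl prompts for the password
  printf 'curl -u %s -F reindex=true -F reindex@TypeHint=Boolean %s/oak:index/%s\n' \
    "$user" "$base_url" "$index_name"
}
```

Calling `reindex_command http://localhost:4502 lucene admin` prints the command for the /oak:index/lucene definition named in the error log.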

Employee

The blob will have contained data that the repository is expecting, so recreating it is not ideal. Were you able to get a copy of the blob file from a previous backup? You could also check another AEM instance: the blob file may have been created by a system file, in which case it can be copied from another instance.

Regards,

Opkar

Level 4

The pre-upgraded AEM 6.1 was based on a CQ 5.6.1 backup.  The blob does not exist in the backup used to upgrade the CQ instance.  The blob does not exist in the running CQ, from which the backup was taken.  

Thank You

Level 4

I read that you can do a reindex by adding a node to /oak:index in crx/de, but unfortunately I cannot get to crx/de, crx/explorer or any of the UIs... How about reindexing from the Mongo shell? For example, db.collection.reIndex()?

Employee Advisor

I think that will recreate the mongo collection index and not the OAK index. I am just guessing this but you can give it a try. Just rename the index folder under /crx-quickstart/repository folder and restart AEM. 
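
The rename suggested here can be sketched as a guarded shell helper. The function name `move_index_aside` is hypothetical, and the sketch assumes AEM is stopped and that the folder is only a local copy of the Lucene index (the IndexCopier warning in the log shows files being copied into it), so renaming rather than deleting keeps a way back.

```shell
# Hypothetical helper: rename <repo_dir>/index to <repo_dir>/index_old.
# Run it only while AEM is stopped.
move_index_aside() {
  repo_dir="$1"  # e.g. crx-quickstart/repository
  if [ ! -d "$repo_dir/index" ]; then
    echo "no index folder under $repo_dir" >&2
    return 1
  fi
  if [ -e "$repo_dir/index_old" ]; then
    echo "$repo_dir/index_old already exists; not overwriting" >&2
    return 1
  fi
  mv "$repo_dir/index" "$repo_dir/index_old"
}
```

A typical call would be `move_index_aside crx-quickstart/repository` from the AEM installation folder, followed by a restart.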

Level 4

kunal23 wrote...

I think that will recreate the mongo collection index and not the OAK index. I am just guessing this but you can give it a try. Just rename the index folder under /crx-quickstart/repository folder and restart AEM. 

 

I did a db.nodes.reIndex() and AEM did not start.  I restored the backup I created when I completed the upgrade the first time and renamed index to index_old.  I tried to start AEM and I was able to get to system/console.  I was still not able to get to crx/de because AEM threw a "Missing Authentication Service".  The error.log file still had the missing blob error and the repository was shut down.  I followed your previous recommendation and copied a blob file from some other folder and gave it the ID AEM was complaining about.  I tried to start AEM 6 again and now I am back at 

org.apache.jackrabbit.oak.plugins.index.lucene.IndexTracker Could not access the Lucene index at /oak:index/lucene
org.apache.lucene.index.CorruptIndexException: Invalid CFS entry offset: 1173827933404397826 (resource: _sf.cfs)

 

I am still able to access system/console, but it now shows only 173 bundles instead of the full 469. I see a lot of MongoDB activity going on:

2015-10-05T22:20:20.258-0700 I QUERY    [conn14] query aem6.nodes query: { $query: { _id: { $gt: "8:/jcr:system/jcr:versionStorage/03/b4/00/03b4003a-7324-4af7-95be-999b5331180d/1.0/", $lt: "8:/jcr:system/jcr:versionStorage/03/b4/00/03b4003a-7324-4af7-95be-999b5331180d/1.00" } }, $orderby: { _id: 1 }, $hint: { _id: 1 }, $maxTimeMS: 60000 } planSummary: IXSCAN { _id: 1 } ntoreturn:0 ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 writeConflicts:0 numYields:1 nreturned:1 reslen:119439 locks:{ Global: { acquireCount: { r: 4 } }, MMAPV1Journal: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 2 } }, Collection: { acquireCount: { R: 2 } } } 244ms

Could it be that it is rebuilding the indexes?  Accessing crx/de at this point shows "Startup in progress".  I am waiting to see if AEM can start up.

Level 4

I am not sure what is happening, but I see many queries in Mongo.  With access to system/console, I boosted the log level for error.log to TRACE, but I do not see any new information.  I triggered startPropertyIndexAsyncReindex and DataStoreGC via JMX.  I am not sure if any of these will help...AEM is still saying "Startup in progress"

Employee Advisor

Which version of Java are you using ?  Could you please share your error log file ?

Employee Advisor

It seems that some of your OSGi bundles are not becoming Active, and that's why you see the "Startup in progress" message. Do you know which bundle is having the problem? You can disable the startup filter just to get past this message and check for any exceptions when you access the UIs. Uncheck "Active by default" in the following configuration to disable the filter - http://localhost:4502/system/console/configMgr/org.apache.sling.startupfilter.impl.StartupFilterImpl

Level 4

I unchecked "Active by default" and I saw that the System Bundle (org.apache.felix.framework)'s Status is "Starting".  The other bundles are all Resolved.  I tried starting a couple of them but nothing happens.  I checked the error.log file and nothing was logged.

I went to <server>:<port> and no longer saw "Startup in progress".  Instead, it is now showing 

Problem accessing /. Reason:

Not Found

Level 4

Hi,

We are using java 1.7.

Attached is the error.log

Thank You!

p/s: I gotta hand it to you.  You are even more responsive than daycare support

Employee Advisor

Not sure why your instance is not starting. The only exception I see is that it cannot find the "sling:jobEvent" node type definition in the repository. Did you see any errors in the upgrade.log file while you upgraded? You can try registering the node type manually by importing the contents of the cnd file here - http://localhost:4502/crx/explorer/nodetypes/index.jsp. But I am not sure whether this is the root cause of the instance not starting up.

Level 4

Prior to upgrading our AEM, the data consistency check did complain about errors.  Hence, we went back, restored the old backup, ran the data consistency check, fixed the errors, re-ran the data consistency check, confirmed that there were no more errors and executed the upgrade again.

Now that the upgrade is done, AEM is still not starting up.  There are no Lucene index errors and no missing blobs.  Out of the 408 bundles, only 208 bundles were installed.  System Bundle is stuck at Starting.  There are Installed bundles that refuse to start and the log keeps saying 

14.10.2015 23:48:28.885 *WARN* [pool-6-thread-1] org.apache.sling.commons.scheduler.impl.QuartzScheduler No discovery info available. Executing job org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate@3099f67 with name Registered Service.365 and config SINGLE anyway.

Thank You
