Help: ways to avoid segment store corruption

pavankumarr6544

04-02-2019

Team,

We need support in resolving AEM segment store corruption issues.

Every week, when we check the last known good configuration on both the author and publish instances, either publish or author turns out to be corrupted due to garbage collection.

We are in a critical business situation and need help resolving this.

Adobe says the only option is to restore the whole repository, but we want a way to keep the issue from recurring after the restoration.

That is, we have not been able to find the exact root cause of why the index or the repository segment store keeps getting corrupted.

Accepted Solutions (1)

smacdonald2008

05-02-2019

If this is happening, it's something that the community cannot really help with. The AEM Community is great at teaching you HOW TO perform a use case, not at addressing bugs or broken software.

You may need to reach out to the customer care team, and you may require a hotfix. Please contact them.

Answers (5)

Gaurav-Behl

MVP

05-02-2019

You mentioned that this happens each week, is that correct? Did you try to find out what triggers this corruption based on your log messages?

If you don't compress the repo, you will not get the last known good configuration. You should configure maintenance scripts as described in the articles below. Refer to [0].

To remove nodes manually, see: Offline Compaction fails with SegmentNotFoundException & IllegalArgumentException

[0]

Adobe Experience Manager Help | Using Online Revision Clean-up in AEM https://helpx.adobe.com/experience-manager/kb/AEM6-Maintenance-Guide.html

Revision Cleanup https://helpx.adobe.com/experience-manager/kb/AEM6-Maintenance-Guide.html

AEM 6.x Maintenance Guide

Oak Datastore Consistency Check · GitHub

Fix Inconsistencies in the repository when SegmentNotFound Issue is reported in AEM 6.x https://gist.github.com/andrewmkhoury/39151ca954b26d5a516d

https://helpx.adobe.com/experience-manager/kb/offline-compaction-fails-with-SegmentNotFoundException...

Online/Offline Compaction : AEM 6.3

Consistency and Traversal Checks

Adobe Experience Manager Help | Using oak-run.jar to Manage Indexes in AEM

P.S. After the first online revision cleanup run, the repository will double in size. This happens because two generations are now kept on disk.
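For the offline variant, a maintenance script could look roughly like the sketch below. It only builds the oak-run command lines (list checkpoints, remove unreferenced checkpoints, compact) and does not run anything. The function name is made up for illustration, and the jar name and segment store path are whatever applies to your instance. The instance has to be stopped and backed up before such commands are run for real, so please verify the exact steps against the maintenance guide for your AEM version.

```python
from typing import List


def offline_compaction_commands(oak_run_jar: str, segmentstore: str) -> List[List[str]]:
    """Build (but do not execute) the oak-run calls for an offline compaction.

    Hypothetical helper: both arguments are supplied by the caller and are
    not validated here.
    """
    base = ["java", "-jar", oak_run_jar]
    return [
        # 1. List the existing checkpoints.
        base + ["checkpoints", segmentstore],
        # 2. Remove checkpoints that are no longer referenced.
        base + ["checkpoints", segmentstore, "rm-unreferenced"],
        # 3. Run the compaction itself.
        base + ["compact", segmentstore],
    ]
```

Printing each returned list joined with spaces gives the three commands in the order they would be run, which makes it easy to review them before executing anything against a real repository.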

pavankumarr6544

05-02-2019

Hi Gaurav,

Actually, when we check for the last known good configuration, we are not able to find one.

When we search using the Groovy script load method, we find 1000 corrupted nodes.

We asked Adobe whether there are any steps to remove them automatically instead of deleting them one by one. They have not responded.

We just want to understand where we are going wrong in checking the configuration of the corrupted segment store.

We need support with server maintenance: how to maintain the server and how to do disaster recovery of the repository.

  • Does the corruption happen with segmentstore or indexes or both?

The corruption is happening with the segment store. When we check for the last known good configuration with the oak-run jar file, we get a segment not found error and it does not show any good configuration.

  • Do you compact/compress the repo periodically, offline or online?

We do not compress the repository. As we are new to it, we would like to know which method is best.

  • What commands/scripts do you use to check the last known good configuration/other tasks as you mentioned?

java -jar oak-run-1.8.2.jar check /datadir/AEM/author/crx-quickstart/repository/segmentstore/

  • Can you make sure that the oak-run jar version that you use for maintenance tasks matches the CRX Oak versions on both author and publish instances?

Yes, we are using a version compatible with our Oak repository. Our AEM version is 6.4.
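One way to double-check the version question is to compare the version embedded in the oak-run jar file name with the Oak version reported by the instance. The sketch below is only an illustration: the function names are hypothetical, the repository version string has to be looked up by hand (for example from the Oak bundles in the instance's web console), and requiring an exact x.y.z match is an assumption that may be stricter than necessary.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Optional[Version]:
    """Return the first x.y.z version found in text, or None if there is none."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def versions_match(oak_run_jar: str, repository_oak_version: str) -> bool:
    """True when the oak-run jar name carries exactly the repository's Oak version.

    Assumption: an exact match is required; relax this if your setup
    documents a looser compatibility rule.
    """
    jar_version = parse_version(oak_run_jar)
    repo_version = parse_version(repository_oak_version)
    if jar_version is None or repo_version is None:
        raise ValueError("could not find an x.y.z version in the input")
    return jar_version == repo_version
```

For example, `versions_match("oak-run-1.8.2.jar", "1.8.2")` is true, while the same jar checked against a repository reporting a different Oak version is flagged as a mismatch.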

Gaurav-Behl

MVP

04-02-2019

Do you mean that the corruption happens due to garbage collection and is triggered when you check the last known good configuration, or that you fix the corruption by reverting to the last known good configuration?

In any case, the root cause should be logged in the error.log file(s).
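To narrow down when the problem starts, it can help to count the SegmentNotFoundException entries in error.log per day. The snippet below is a minimal sketch, not an official tool. It assumes each log entry starts with a date in dd.MM.yyyy form (adjust the pattern if your log format differs), and the function name is made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed line prefix: a date such as "05.02.2019 " at the start of each entry.
DATE_PREFIX = re.compile(r"^(\d{2}\.\d{2}\.\d{4}) ")


def count_segment_errors(lines: Iterable[str]) -> Counter:
    """Count lines mentioning SegmentNotFoundException, grouped by entry date.

    Stack trace lines carry no date of their own, so they are attributed
    to the most recent dated line seen before them.
    """
    per_day: Counter = Counter()
    current_day = "unknown"
    for line in lines:
        match = DATE_PREFIX.match(line)
        if match:
            current_day = match.group(1)
        if "SegmentNotFoundException" in line:
            per_day[current_day] += 1
    return per_day
```

Passing an open file handle for error.log to `count_segment_errors` returns a counter keyed by date. The first day with a non-zero count is a reasonable place to start reading the surrounding log messages.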

A couple of questions:

  • Does the corruption happen with segmentstore or indexes or both?
  • Do you compact/compress the repo periodically, offline or online?
  • What commands/scripts do you use to check the last known good configuration/other tasks as you mentioned?
  • Can you make sure that the oak-run jar version that you use for maintenance tasks matches the CRX Oak versions on both author and publish instances?