If you search this blog, you will find one recurring theme over the years: the lifecycle of JCR sessions and Sling ResourceResolvers. That you should not keep them open for a long time. And that you definitely have to close them. But I never gave you an example of what can happen if you don’t follow this recommendation. Until now.
Recently I learned that there is an actual problem which can arise because of it. And the problem is called “SegmentNotFoundException”.
In the past a SegmentNotFoundException was a clear indication of a corrupt JCR repository. The recommendation was always either to fix it or to restore from backup. Both operations are tedious, require downtime and possibly also mean a loss of data. That’s probably also the reason why this exception is still often taken as a sign of such a repository corruption. So let’s look at it systematically.
The root cause
With AEM 6.4 the feature of “tail compaction” was introduced, which is a variant of the online compaction feature. It is less effective than the full compaction, but it also takes less time. By default in AEM the tail compaction runs daily and the full compaction once a week.
But from what I understood, this tail compaction has a problem with long-running sessions: it can happen that tar files which are still referenced get compacted and removed. That means that it’s not really an on-disk corruption which needs to be fixed, but rather that some “old sessions” (read about MVCC in the previous post) are referencing data which is not there anymore.
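To make that mechanism a bit more tangible, here is a minimal toy model in plain Java. It is not Oak code and does not reflect how the segment store really works; `ToyStore`, `ToySession` and `MissingSegmentException` are hypothetical stand-ins I made up for illustration. The only assumption it encodes is the one described above: a session stays pinned to the revision it was opened on, and compaction may remove the data behind that revision.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only: hypothetical classes, not Oak's real data structures. */
class CompactionSketch {

    /** Simplified stand-in for Oak's SegmentNotFoundException. */
    static class MissingSegmentException extends RuntimeException {
        MissingSegmentException(String id) {
            super("Segment " + id + " not found");
        }
    }

    /** Toy segment store: maps a segment id to its content. */
    static class ToyStore {
        private final Map<String, String> segments = new HashMap<>();
        private String head;   // id of the newest revision
        private int counter;   // used to generate segment ids

        /** Every write creates a new segment and moves the head to it. */
        String write(String content) {
            head = "seg-" + (++counter);
            segments.put(head, content);
            return head;
        }

        String read(String id) {
            String content = segments.get(id);
            if (content == null) {
                throw new MissingSegmentException(id);
            }
            return content;
        }

        /** Compaction: copy the head into a fresh segment, drop everything older. */
        void compact() {
            String content = read(head);
            segments.clear();
            write(content);
        }

        /** A new session is pinned to whatever is the head right now (MVCC). */
        ToySession login() {
            return new ToySession(this, head);
        }
    }

    /** Toy session: keeps reading the revision it was opened on. */
    static class ToySession implements AutoCloseable {
        private final ToyStore store;
        private final String revision;

        ToySession(ToyStore store, String revision) {
            this.store = store;
            this.revision = revision;
        }

        String read() {
            return store.read(revision);
        }

        @Override
        public void close() {
            // nothing to release in the toy model
        }
    }
}
```

If you open a `ToySession`, keep it around, and call `compact()` on the store after some more writes, the next `read()` on that old session throws `MissingSegmentException`, while a session opened after the compaction (ideally in a try-with-resources block, so it is short-lived and closed) reads the current content without problems. The store itself is fine in both cases; only the stale session points at data that is gone.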