Getting Failed to create checkpoint message with AsyncIndexUpdate | Community
October 14, 2020

  • 2 replies
  • 7976 views

Hi,

 

We are getting a 504 Gateway Timeout server error when trying to access the AEM author instance in our stage author environment. When this happens, the error logs show the following "Failed to create checkpoint" warning messages:

 

14.10.2020 04:01:01.940 *WARN* [sling-oak-54-org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate-async] org.apache.jackrabbit.oak.segment.scheduler.LockBasedScheduler Failed to create checkpoint 8a02222a-3c8d-44dc-8066-05e78ea31fad in 10 seconds.
14.10.2020 04:01:01.940 *WARN* [sling-oak-53-org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate-fulltext-async] org.apache.jackrabbit.oak.segment.scheduler.LockBasedScheduler Failed to create checkpoint a330f028-13fa-4c1f-9430-fb98fdc84cbd in 10 seconds.
14.10.2020 04:01:16.929 *WARN* [sling-oak-57-org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate-async] org.apache.jackrabbit.oak.segment.scheduler.LockBasedScheduler Failed to create checkpoint bc3de514-3188-4383-b138-c205a81bac68 in 10 seconds.
14.10.2020 04:01:16.938 *WARN* [sling-oak-52-org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate-fulltext-async] org.apache.jackrabbit.oak.segment.scheduler.LockBasedScheduler Failed to create checkpoint 13cc8004-2cc6-4c7e-8abb-4d579670c510 in 10 seconds.
14.10.2020 04:01:31.932 *WARN* [sling-oak-54-org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate-fulltext-async] org.apache.jackrabbit.oak.segment.scheduler.LockBasedScheduler Failed to create checkpoint ffd8d722-61ef-49a9-bde5-54d43ebc7663 in 10 seconds.
14.10.2020 04:01:31.941 *WARN* [sling-oak-58-org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate-async] org.apache.jackrabbit.oak.segment.scheduler.LockBasedScheduler Failed to create checkpoint 2d8be0dc-8fca-4a04-97d2-d0d4646cd87d in 10 seconds.
14.10.2020 04:01:46.924 *WARN* [sling-oak-53-org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate-async] org.apache.jackrabbit.oak.segment.scheduler.LockBasedScheduler Failed to create checkpoint 54e30c6c-3ee3-401e-a8a3-75949436449b in 10 seconds.
14.10.2020 04:01:46.929 *WARN* [sling-oak-57-org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate-fulltext-async] org.apache.jackrabbit.oak.segment.scheduler.LockBasedScheduler Failed to create checkpoint 1903c0fe-9f4a-47e9-b1d6-5328f9d85750 in 10 seconds.
14.10.2020 04:02:01.940 *WARN* [sling-oak-52-org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate-async] org.apache.jackrabbit.oak.segment.scheduler.LockBasedScheduler Failed to create checkpoint ed77ef61-3bac-4d59-b98a-4a98108abb3a in 10 seconds.

 

The server starts responding again after a while, and the warning messages stop being logged once it does. The last time this happened, it lasted nearly 100 minutes.
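To get a picture of how long such an episode lasts and which indexing lanes are affected, the warnings can be counted with a small script. This is only a rough sketch: it assumes the log lines have exactly the format quoted above, and the names LINE_RE, SAMPLE and summarize are made up for the illustration. In practice the lines would be read from the instance's error log rather than from a hard-coded list.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the WARN lines quoted in this thread. The "lane" is whatever follows
# "AsyncIndexUpdate-" in the thread name (for example "async" or "fulltext-async").
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2}\.\d{3}) \*WARN\* "
    r"\[[^\]]*AsyncIndexUpdate-(?P<lane>[^\]]+)\].*"
    r"Failed to create checkpoint (?P<id>[0-9a-f-]+) in (?P<secs>\d+) seconds"
)

# Three of the lines from the original post, used as sample input.
SAMPLE = [
    "14.10.2020 04:01:01.940 *WARN* [sling-oak-54-org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate-async] org.apache.jackrabbit.oak.segment.scheduler.LockBasedScheduler Failed to create checkpoint 8a02222a-3c8d-44dc-8066-05e78ea31fad in 10 seconds.",
    "14.10.2020 04:01:01.940 *WARN* [sling-oak-53-org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate-fulltext-async] org.apache.jackrabbit.oak.segment.scheduler.LockBasedScheduler Failed to create checkpoint a330f028-13fa-4c1f-9430-fb98fdc84cbd in 10 seconds.",
    "14.10.2020 04:02:01.940 *WARN* [sling-oak-52-org.apache.jackrabbit.oak.plugins.index.AsyncIndexUpdate-async] org.apache.jackrabbit.oak.segment.scheduler.LockBasedScheduler Failed to create checkpoint ed77ef61-3bac-4d59-b98a-4a98108abb3a in 10 seconds.",
]


def summarize(lines):
    """Return (failures per lane, first timestamp, last timestamp)."""
    per_lane = Counter()
    first = last = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # ignore anything that is not a checkpoint warning
        ts = datetime.strptime(match.group("ts"), "%d.%m.%Y %H:%M:%S.%f")
        per_lane[match.group("lane")] += 1
        if first is None or ts < first:
            first = ts
        if last is None or ts > last:
            last = ts
    return per_lane, first, last


if __name__ == "__main__":
    per_lane, first, last = summarize(SAMPLE)
    print("failures per lane:", dict(per_lane))
    print("episode length:", last - first)
```

Running it against the full log for the affected window should show whether both lanes fail together for the whole outage or only in bursts.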

 

Any guidance that helps us understand the issue would be appreciated.

 

Thanks,

Bhawesh


2 replies

October 21, 2020

Hi Bhawesh,
We faced the same issue earlier. Adobe recommended that we add the JVM argument -Doak.segmentNodeStore.commitFairLock=true and restart the cq5 service. It reduced the error rate but did not solve the issue completely.

 

Adobe Employee
May 2, 2021
The warning only shows that some other thread was holding the lock at that time, so a checkpoint could not be created. Checkpoints should still have been created at other times. You can run routine checks (oak-run check) to confirm that checkpoints are indeed being created and that good revisions are available to which the repository can be reverted in a crisis. Please add the JVM parameter -Doak.segmentNodeStore.commitFairLock=true and restart AEM; this should help resolve the issue.
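For anyone unfamiliar with the pattern, here is a toy illustration of a timed lock acquisition that gives up while another thread holds the lock. It is not Oak's code: the single-permit semaphore, the function names and the timings are all assumptions chosen for the demo, and Python's semaphore has no fairness option, so the effect of the commitFairLock setting itself is not shown.

```python
import threading
import time

# Stand-in for a single permit that writers and checkpoint creation share
# (an assumption for this sketch, not taken from Oak's source).
commit_lock = threading.Semaphore(1)


def long_commit(hold_seconds, started):
    """Hold the permit for a while, like a long-running write."""
    with commit_lock:
        started.set()  # signal that the permit is now taken
        time.sleep(hold_seconds)


def try_checkpoint(wait_seconds):
    """Try to get the permit within wait_seconds; return True on success."""
    if not commit_lock.acquire(timeout=wait_seconds):
        return False  # the situation the WARN line reports
    try:
        return True  # a real store would record the checkpoint here
    finally:
        commit_lock.release()


def demo():
    started = threading.Event()
    writer = threading.Thread(target=long_commit, args=(0.5, started))
    writer.start()
    started.wait()                  # make sure the writer holds the permit
    blocked = try_checkpoint(0.1)   # times out: the writer is still busy
    writer.join()
    free = try_checkpoint(0.1)      # succeeds: the permit is free again
    return blocked, free


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that a timed-out attempt does not by itself mean anything is corrupted. It means the waiting thread lost out to whoever held the lock, which is why sustained warnings point at long-running or very frequent commits.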
Adobe Employee
May 3, 2021
Did you try -Doak.segmentNodeStore.commitFairLock=true? It should have helped.
this-that-the-otter
August 28, 2023

FYI, I'm seeing this issue in AEM 6.5.13. It is accompanied by heavier load and slow replication queue processing.

 

The JVM argument:

-Doak.segmentNodeStore.commitFairLock=true 

doesn't seem to resolve it. I'm planning to compact the repository and rebuild indexes.