Level 2
April 9, 2026
Solved

Concurrency issue on event handler


We have on-premise AEM 6.5 author and publisher instances, and for the last 2-3 days the error.log on the publisher instance has been full of the following errors:

  • OakState0001 errors

09.04.2026 13:37:45.729 *ERROR* [55.32.21.101 [1775734665631] POST /bin/receive HTTP/1.1] com.day.cq.replication.impl.servlets.ReplicationServlet Error during replication: Repository error during node import: OakState0001: Unresolved conflicts in /content/some-page/jcr:content

com.day.cq.replication.ReplicationException: Repository error during node import: OakState0001: Unresolved conflicts in /content/some-page/jcr:content

Caused by: javax.jcr.InvalidItemStateException: OakState0001: Unresolved conflicts in /content/some-page/jcr:content

Caused by: org.apache.jackrabbit.oak.api.CommitFailedException: OakState0001: Unresolved conflicts in /content/some-page/jcr:content

 

  • Concurrency warning 

09.04.2026 14:50:46.659 *WARN* [EventAdminAsyncThread #9] org.apache.jackrabbit.oak.jcr.delegate.SessionDelegate Attempted to perform getItemOrNull while thread EventAdminAsyncThread #10 was concurrently writing to this session. Blocked until the other thread finished using this session. Please review your code to avoid concurrent use of a session.

 

Note that this SessionDelegate warning is logged every time SessionDelegate tries to acquire the lock during method calls such as getItemOrNull, isNodeType, etc. (all methods that require synchronization).

 

The context

This happens during page publishing on the publisher instance. I will try to describe our flow.

1. Everything starts from an EventHandler which is listening on:

  • PageEvent.EVENT_TOPIC,
  • ReplicationEvent.EVENT_TOPIC,
  • ReplicationAction.EVENT_TOPIC

2. Inside the handleEvent method we call

replicationTrigger.triggerReplication(path, type);

3. Inside the triggerReplication method we call the replicator with a session from a newly opened ResourceResolver:

try (ResourceResolver resolver = new ResourceResolverSecurity(resolverFactory,
        ResourceResolverSecurity.SubserviceUser.REPLICATION_SERVICE_USER).getSpecifiedResourceResolver()) {
    Session session = resolver.adaptTo(Session.class);
    replicator.replicate(session, type, path, replicationOptions);
}

So the session is acquired from a new ResourceResolver on every call.

4. Via the replicate method, the replicator calls the Agent that is responsible for the given page.

5. A TransportHandler implementation is invoked and the content is delivered to disk 

From debugging, I saw that this part is handled by other threads from a thread pool (not on the session passed to the replicator).

 

Note also that this does not happen on all pages, and it happens on only one environment. Another environment running the same code does not show the issue.

 

What we tried so far:

  1. Unpublished the page by removing it manually via CRX, then published it again
  2. Checked /system/console/status-jstack-threaddump
    1. All threads of type EventAdminAsyncThread were in RUNNABLE state
    2. After restarting the server, all of these threads are in WAITING state, but the same errors still occur in the log

Any ideas on how we can debug this step by step, or what we should look for? From my understanding, the SessionDelegate is shared at some point by two threads, and because one thread has acquired the lock, we get that warning message: https://github.com/apache/jackrabbit-oak/blob/4b8fe706c52851ac063e2753f473a77738fc4104/oak-jcr/src/main/java/org/apache/jackrabbit/oak/jcr/delegate/SessionDelegate.java#L813

Also, from my understanding these errors happen during event handling, since an EventAdminAsyncThread is bound to this process. What I don't understand is where exactly in our code this session is shared, because everything is stateful?

The only place where the session is passed on is the replicator.replicate call made from the EventHandler, but again, that session is acquired from a new resource resolver (see code).
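One way to check whether a given session object is really touched by more than one thread is to record the calling thread next to the object's identity at each point of use. The sketch below is a hypothetical debugging helper (SessionUsageTracker and its methods are invented for illustration and are not part of AEM, Sling, or Oak). It uses only the JDK and treats the session as a plain Object, so it assumes you call recordUse yourself wherever your code obtains or passes a session.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical debugging aid: remembers which threads have used which object instance. */
final class SessionUsageTracker {
    // Keyed by object identity (not equals), so two distinct sessions never collide.
    // Holds strong references, so use it only temporarily while debugging.
    private final Map<Object, Set<String>> threadsBySession =
            Collections.synchronizedMap(new IdentityHashMap<>());

    /** Records the current thread; returns true if more than one thread has used this instance. */
    boolean recordUse(Object session) {
        Set<String> threads = threadsBySession.computeIfAbsent(
                session, key -> ConcurrentHashMap.newKeySet());
        threads.add(Thread.currentThread().getName());
        return threads.size() > 1;
    }

    /** Names of all threads that have been recorded for this instance so far. */
    Set<String> threadsFor(Object session) {
        Set<String> threads = threadsBySession.get(session);
        return threads == null ? Collections.emptySet() : Collections.unmodifiableSet(threads);
    }
}
```

If recordUse never returns true for the sessions created in triggerReplication, the sharing reported by SessionDelegate most likely happens outside this code path.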

 

If someone has any idea what else we could look for, that would be great.

 

Thanks in advance,

Razvan


    VishalKa5
    Level 6
    April 10, 2026

    Hi @RazvanParautiu,

    This looks like a concurrency issue caused by multiple event threads trying to replicate the same page at the same time.

    1. Multiple events (PageEvent, ReplicationEvent, ReplicationAction) may trigger duplicate replication for the same page.
    2. This causes parallel writes on the same node, leading to OakState0001 conflicts.
    3. The SessionDelegate warning means the same JCR session is being used concurrently by different threads.
    4. Even with new ResourceResolvers, conflicts happen if the same page is updated simultaneously.
    5. Add logging to track duplicate event triggers for the same path.
    6. Avoid recursive replication loops from replication events.
    7. Use filtering/queueing so only one replication per page runs at a time.

    In short: the problem is most likely duplicate parallel replication on the same page, not the replicator API itself.
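Points 5 and 7 can be sketched together as a small per-path guard. This is only an illustration in plain JDK code: PerPathReplicationGuard, runIfIdle, and skippedCount are hypothetical names, and the Runnable stands in for the real replicator.replicate call. It assumes that a second trigger for a path that is already being replicated is a true duplicate and can be dropped; if the second trigger may carry a newer change, it should be queued instead of skipped.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical guard: at most one replication per path runs at a time. */
final class PerPathReplicationGuard {
    // Paths that currently have a replication in progress.
    private final Set<String> inFlight = ConcurrentHashMap.newKeySet();
    // Number of triggers skipped because the path was already in flight (useful for logging).
    private final AtomicInteger skipped = new AtomicInteger();

    /** Runs the task unless the same path is already being processed; returns true if it ran. */
    boolean runIfIdle(String path, Runnable replication) {
        if (!inFlight.add(path)) {
            // Another thread is already replicating this path: count it and skip.
            skipped.incrementAndGet();
            return false;
        }
        try {
            replication.run();
            return true;
        } finally {
            // Always release the path, even if the replication throws.
            inFlight.remove(path);
        }
    }

    int skippedCount() {
        return skipped.get();
    }
}
```

A non-zero skippedCount for a path that was published only once is a hint that something upstream is triggering the same replication more than once.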

    Thanks & Regards,

    Vishal

    Level 2
    April 14, 2026

    Hi @VishalKa5

    First of all, thank you for the reply and explanation.

    One more question: could this be related to corruption of index or segment store files? We have this issue on only one environment (the other environments run the same code without it).

    Also, regarding point 7 (filtering/queueing): do you suggest that we start the replication via a Sling job? That is, we would add each replication request to a queue and call the actual replicator.replicate method inside the job, so that pages are replicated one after another. If so, I assume that replicator.replicate does not itself trigger an asynchronous process, right?

    Level 1
    April 25, 2026

    Hi, I recently resolved some concurrency issues, so I'm jumping in to share a solution that worked for me and might work for you.

    Yes, a Sling job would resolve the issue, provided the job queue is configured as single-threaded and ReplicationException is used to trigger a retry.

    try {
        replicator.replicate(session, ReplicationActionType.ACTIVATE, path);
        return JobResult.OK;
    } catch (ReplicationException e) {
        return JobResult.FAILED; // triggers retry
    }

    This would resolve your issue, because concurrency is handled by the Sling job queue configuration and a retry is triggered by the ReplicationException.

    Apache Sling Job Queue Configuration

    • Queue Name: aem/replication/job 
    • Max Parallel Jobs: 1 
    • Retry Delay: e.g. 2000 ms
    • Max Retries: e.g. 3-5 
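To make the effect of these settings concrete, here is a minimal plain-JDK sketch of the same behaviour: one worker thread (Max Parallel Jobs: 1), a fixed delay between attempts, and a bounded number of retries. It is not the Sling Jobs API. SerializedRetryQueue and ReplicationTask are hypothetical names, and the task stands in for the replicator.replicate call shown above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for the work done inside the job (e.g. the replicate call). */
interface ReplicationTask {
    void run() throws Exception;
}

/** Runs tasks strictly one after another and retries a failed task a bounded number of times. */
final class SerializedRetryQueue implements AutoCloseable {
    // A single worker thread guarantees that no two tasks ever run in parallel.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final int maxRetries;
    private final long retryDelayMillis;

    SerializedRetryQueue(int maxRetries, long retryDelayMillis) {
        this.maxRetries = maxRetries;
        this.retryDelayMillis = retryDelayMillis;
    }

    /** Resolves to true if an attempt succeeded, false once all retries are used up. */
    Future<Boolean> submit(ReplicationTask task) {
        return worker.submit(() -> {
            // One initial attempt plus up to maxRetries further attempts.
            for (int attempt = 0; attempt <= maxRetries; attempt++) {
                try {
                    task.run();
                    return true;
                } catch (Exception e) {
                    if (attempt < maxRetries) {
                        Thread.sleep(retryDelayMillis); // wait before the next attempt
                    }
                }
            }
            return false;
        });
    }

    @Override
    public void close() {
        worker.shutdown();
    }
}
```

Because retries happen on the same single worker, a task that keeps failing delays everything queued behind it, which is one reason to keep the retry count and delay small.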

    Looking forward to your response after using this solution.

    RazvanParautiu (Author, Accepted solution)
    Level 2
    April 28, 2026

    Hi @VishalKa5 @happyBojack

    We found the issue. It was not related to the code itself but to the replication agents. By mistake, two agents were enabled on author (only one should have been active). When a page was published, both agents sent a request to the publisher, so two async event threads were started and two replication processes ran at the same time.

    Of course, your solution @happyBojack would work for this use case, since all replication tasks are started one by one via that queue.

    Thanks for support,

    BR, Razvan