Our content management system has export tools that allow users to export content to an Excel spreadsheet. Business users typically export upward of 5,000 items, and no two users export content from the same parent node. We have noticed that when two users export content concurrently, request processing times become unusually high. We profiled the application to see where the issue is; the results indicate that TarPersistenceManager is synchronizing reads of the nodes. The method in question is TarPersistenceManager.loadBundle(NodeId id). We have only an authoring instance, no background processes were updating the content while these tests were run, and the session is not shared between the two request threads.
Is there a reason why TarPersistenceManager synchronizes concurrent reads?
Thanks for your reply. The version we are running in production is 5.4, with no hotfixes installed. Before I provide further details on the issue, I would like to let you know that our customer raised a Daycare ticket quite a while back and the issue could not be resolved so far, so I am pursuing other channels to find answers to this problem.
First, I have gone through the list of hotfixes and none of them seems relevant to our issue. But I will install these hotfixes and run the same tests again. Here is the list of hotfixes available for 5.4:
I have done some more tests to see how concurrency works. I created two JCR sessions and two threads, one session per thread, and executed a program (a controller that runs in the CQ server) in which each thread reads child nodes from a different parent node. Without closing these sessions, I reused them when I ran the same program a second time. Now I can see the two threads running completely in parallel. It looks like, with sessions that are preloaded with nodes, read requests do not go beyond the session, and hence avoid the synchronization that happens either in the bundle cache or while loading the nodes from the underlying file repository.
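The effect described above can be modeled in a small, self-contained sketch (the class names PersistenceLayer, Session, and the read/loadBundle methods here are simplified stand-ins, not the real CQ/Jackrabbit API): reads satisfied from a session-local item cache never reach the synchronized persistence layer, which is why reused, preloaded sessions run fully in parallel.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the persistence manager: one global lock per read,
// modeling the synchronized TarPersistenceManager.loadBundle behavior.
class PersistenceLayer {
    int loads = 0;

    synchronized String loadBundle(String nodeId) {
        loads++;
        return "bundle:" + nodeId;
    }
}

// Simplified stand-in for a JCR session with a session-local item cache.
class Session {
    private final PersistenceLayer pm;
    private final Map<String, String> itemCache = new HashMap<>();

    Session(PersistenceLayer pm) { this.pm = pm; }

    String read(String nodeId) {
        // Cache hit: no global lock is taken, so threads proceed in parallel.
        return itemCache.computeIfAbsent(nodeId, pm::loadBundle);
    }
}

public class SessionCacheDemo {
    public static void main(String[] args) {
        PersistenceLayer pm = new PersistenceLayer();
        Session s = new Session(pm);
        s.read("a"); s.read("b");      // first run: goes through the synchronized PM
        int loadsAfterFirstRun = pm.loads;
        s.read("a"); s.read("b");      // second run: served from the session cache
        System.out.println(pm.loads == loadsAfterFirstRun); // prints "true"
    }
}
```

The second run never touches the persistence layer, which matches the observation that reused sessions avoid the synchronization entirely.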
I have dug into the Jackrabbit framework and its source code. BundleFsPersistenceManager in Jackrabbit allows concurrent reading of nodes from the underlying file repository: its loadBundle(NodeId id) method is not synchronized. If the bundle cache is large enough and the nodes are loaded from it, concurrent reading sometimes works, but this depends entirely on the content hierarchy and on where the nodes being read sit in that hierarchy. Jackrabbit organizes the bundle cache into a set of segments, and read access to a segment is synchronized. Which segment a node is placed in depends on a few bits of its node id, so if two concurrent read requests happen to read nodes from different segments, the threads run completely in parallel.
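The per-segment locking described above can be sketched as follows (a hypothetical illustration of the general technique, not Jackrabbit's actual cache class): the segment index is derived from bits of the key's hash, and each segment has its own lock, so two reads only contend when their keys land in the same segment.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a segmented cache with per-segment locking, in the
// spirit of what the post describes for Jackrabbit's bundle cache.
public class SegmentedCache<K, V> {
    private final Map<K, V>[] segments;

    @SuppressWarnings("unchecked")
    public SegmentedCache(int numSegments) {
        segments = new Map[numSegments];
        for (int i = 0; i < numSegments; i++) {
            segments[i] = new LinkedHashMap<>();
        }
    }

    // A few bits of the key's hash pick the segment.
    private Map<K, V> segmentFor(K key) {
        return segments[(key.hashCode() & 0x7fffffff) % segments.length];
    }

    public V get(K key) {
        Map<K, V> seg = segmentFor(key);
        synchronized (seg) {          // per-segment lock, not a global one
            return seg.get(key);
        }
    }

    public void put(K key, V value) {
        Map<K, V> seg = segmentFor(key);
        synchronized (seg) {
            seg.put(key, value);
        }
    }

    // Two keys contend for the same lock only when they map to the same segment.
    public boolean sameSegment(K a, K b) {
        return segmentFor(a) == segmentFor(b);
    }
}
```

This is why the observed parallelism depends on node placement: whether two readers block each other comes down to whether their node ids happen to hash into the same segment.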
I still cannot come to terms with the fact that concurrent read access does not improve much even with a fairly big bundle cache (1 GB). Our CQ 5.4 runs on an 18-core processor with 9 GB RAM; CPU utilization never goes beyond 30% and there is always enough memory available. We really need concurrent read access to scale reasonably well in order to deliver throughput that is acceptable to the business.
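For reference, if the 1 GB bundle cache mentioned above is set through the persistence manager's bundleCacheSize parameter in repository.xml (the value is in megabytes), the configuration would look roughly like this; the class name and other params shown are illustrative and depend on the actual setup:

```xml
<PersistenceManager class="com.day.crx.persistence.tar.TarPersistenceManager">
  <!-- bundle cache size in MB; 1024 MB = 1 GB -->
  <param name="bundleCacheSize" value="1024"/>
</PersistenceManager>
```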
I would be happy to know if the above issues have been addressed in any of the hotfixes, or whether they will be addressed some time soon.