Adobe Experience Manager Sites & More

prithwi · 5/9/24

Hi Team,

I'm looking for a way to replicate a large number of nodes from the author to the publish server without impacting performance. In my AEM author instance there is a folder under /etc/com that contains approximately 200K nodes. There is a business requirement to have these nodes on the publish and preview servers as well. When we tried to replicate this folder in a non-prod environment, it completely blocked the Sling Content Distribution (SCD) queue for hours. During that time, any page replication took more than a minute, and in a few cases more than 15 minutes.

So I'm looking for a way to replicate this large number of nodes to the publish and preview servers without impacting server performance or blocking the SCD queue.

Any suggestions would be appreciated! Thanks.

anupampat · 5/9/24

Hi @prithwi ,

You can make use of Managed Controlled Processes (MCP). You will need to write the code yourself and process the nodes in batches. This ensures that you retain performance and do not overload the system. You can explore the APIs. More info: https://kiransg.com/tag/mcp/
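
As a rough illustration of the batching idea only (this is plain Java, not the MCP API): the sketch below splits a list of node paths into fixed-size batches and pauses between them so other queue items get a chance to run. The class name `BatchedReplication`, the `replicateBatch` callback, the batch size, and the pause are all hypothetical; in a real implementation the callback would wrap whatever replication or distribution call your environment provides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: illustrates batching with a pause, not an AEM or MCP API.
final class BatchedReplication {

    /** Splits the paths into consecutive batches of at most batchSize entries. */
    static List<List<String>> partition(List<String> paths, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int start = 0; start < paths.size(); start += batchSize) {
            int end = Math.min(start + batchSize, paths.size());
            batches.add(new ArrayList<>(paths.subList(start, end)));
        }
        return batches;
    }

    /**
     * Hands each batch to the callback and sleeps between batches.
     * Returns the number of batches that were handed to the callback.
     */
    static int replicateInBatches(List<String> paths, int batchSize, long pauseMillis,
                                  Consumer<List<String>> replicateBatch) {
        List<List<String>> batches = partition(paths, batchSize);
        int processed = 0;
        for (List<String> batch : batches) {
            // Assumption: the callback stands in for the real replication call.
            replicateBatch.accept(batch);
            processed++;
            boolean moreToCome = processed < batches.size();
            if (moreToCome && pauseMillis > 0) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop early.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return processed;
    }
}
```

For example, `BatchedReplication.replicateInBatches(paths, 500, 2000L, batch -> ...)` would hand over 500 paths at a time with a two-second pause in between. Both numbers are placeholders that would need tuning against the actual queue behaviour.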

Thanks.

prithwi · 5/10/24

Thanks @anupampat for your suggestion. This is a one-time task, so MCP will not be useful once it is done. I'm looking for something like a tool that makes this process easy, or a way to run this replication at low priority.

I thought of using package replication, but because of the large size, the package creation itself failed. 😞

Jörg_Hoh · 5/10/24

Have you tried to create a content package of that path and then replicate the package? Installing that package will likely take a while, so I am not sure if that is acceptable.

prithwi · 5/12/24

Hi @Jörg_Hoh,

Yes, I tried to create a content package, but it was stuck for hours and finally failed. So my idea of package replication did not work. That's why I'm looking for an alternative solution.

Jörg_Hoh · 5/13/24

I assume that it can take a lot of time, but it should not have failed. How did you determine that it failed?

prithwi · 5/13/24

Hi @Jörg_Hoh, the process had been running for more than 5 hours, so we had to kill it to free up the servers. This certainly cannot be done in stage and prod, so we dropped this idea.

However, we now plan to create a servlet and pull the same data from an external server directly into the publish instance, so we won't need replication anymore. So far it has worked in the lower environments, and I'm now waiting for the stage implementation. I will post an update here after 2 days if that works.

kautuk_sahni · 5/17/24

@prithwi Did you find the suggestions from users helpful? Please let us know if more information is required. Otherwise, please mark the answer as correct for posterity. If you have found a solution yourself, please share it with the community.

Kautuk Sahni

prithwi · 5/17/24

Hi @kautuk_sahni, no definite solution yet. However, I'm trying to import the data into the publishers directly. I'll report back by today if this works.

prithwi · 6/20/24

Hi,

Responding to this thread to close it with the solution we opted for. We tried many things, but in the end we changed the data storage structure in the JCR: instead of storing all the nodes in a single folder, we divided them into multiple subfolders. This restructuring let us select one folder at a time and replicate the folders one by one. We wrote a small program to perform this replication task.

To group the nodes into subfolders we used simple logic. Every node is identified by a unique number, so we divided that number by 10000, created a subfolder named after the quotient, and placed the node under it. This logic is not very efficient and does not group the nodes equally, but it solved our problem.
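
A minimal sketch of the grouping logic described above, in plain Java. The class and method names are hypothetical, and it assumes that the node name is the unique number itself and that the quotient is taken by integer division; the actual program may differ.

```java
// Hypothetical helper: maps a node's unique number to its subfolder and path.
final class NodeBucketing {

    /** Divisor from the post: each subfolder covers a range of 10000 ids. */
    static final long BUCKET_SIZE = 10_000L;

    /** Returns the subfolder name (the integer quotient) for a node id. */
    static long bucketFor(long nodeId) {
        if (nodeId < 0) {
            throw new IllegalArgumentException("nodeId must not be negative");
        }
        return nodeId / BUCKET_SIZE;
    }

    /**
     * Builds the restructured path root/quotient/nodeId.
     * Assumption: the node is named after its unique number.
     */
    static String bucketedPath(String root, long nodeId) {
        return root + "/" + bucketFor(nodeId) + "/" + nodeId;
    }
}
```

With this scheme, `NodeBucketing.bucketedPath("/etc/com", 123456L)` yields `/etc/com/12/123456`. A subfolder holds at most 10000 nodes, but if the numbers are not evenly distributed, some subfolders end up much fuller than others, which is the uneven grouping mentioned above.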

Thanks to all community members.

Jörg_Hoh · 6/20/24

I assume that the bottleneck in the initial case was the child node list, which needs to be maintained because many node types support only ordered child lists. If the ordering doesn't matter at all, a node type like "oak:Unstructured" would have helped, because then the updates of the child node list would no longer be the bottleneck (there would be none).

But thanks for posting the approach you have chosen.