SOLVED

Replicate 200K nodes to publish and preview without blocking SCD queue in AEMaaCS

Level 2

Hi Team,

 

I'm looking for a way to replicate a large number of nodes from the author to the publish server without impacting performance. On my AEM author instance there is a folder under /etc/com that contains approximately 200K nodes. A business requirement means these nodes must also be available on the publish and preview servers. When we tried to replicate this folder in a non-prod environment, it completely blocked the Sling Content Distribution (SCD) queue for hours. During that time, any page replication took more than a minute, and in a few cases more than 15 minutes.

 

So I'm looking for a way to replicate this large number of nodes to the publish and preview servers without impacting server performance or blocking the SCD queue.

 

Any suggestion would be appreciated! Thanks

 

10 Replies

Level 5

Hi @prithwi ,

 

You can make use of Managed Controlled Processes (MCP). You will need to write the code and process the nodes in batches. This ensures you retain performance and do not overload the system. You can explore the APIs. More info: https://kiransg.com/tag/mcp/

Thanks.
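The batching idea in the reply above can be sketched in plain Java. This is only an illustration, not MCP code: `PathBatcher`, `partition`, the generated paths, and the batch size of 500 are all hypothetical, and a real process would hand each batch to AEM's replication API instead of counting batches.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: split a long list of node paths into fixed-size batches. */
final class PathBatcher {

    /** Returns consecutive batches of at most batchSize items, in the original order. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the sub-list so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Hypothetical paths; 200,000 entries to mirror the size mentioned in the thread.
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 200_000; i++) {
            paths.add("/etc/com/" + i);
        }
        // Batch size 500 is an arbitrary assumption.
        System.out.println(partition(paths, 500).size()); // prints 400
    }
}
```

Smaller batches give the queue more chances to serve other content between submissions, at the cost of more submissions overall.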

Level 2

Thanks @anupampat for your suggestion. This is a one-time task, so MCP will not be useful once it is done. I'm looking for something like a tool that makes this process easy, or a way to run this replication at low priority.

 

I thought of using package replication, but due to the large size, package creation itself failed.

Employee Advisor

Have you tried creating a content package of that path and then replicating the package? Installing that package will likely take a while, so I am not sure whether that is acceptable.

 

 

Level 2

Hi @Jörg_Hoh,

Yes, I tried to create a content package, but it was stuck for hours and finally failed. So my idea of package replication did not work. That's why I'm looking for an alternative solution.

Employee Advisor

I assume that it can take a lot of time, but it should not have failed. How did you determine that it failed?

Level 2

Hi @Jörg_Hoh, the process had been running for more than 5 hours, so we had to kill it to free up the servers. This certainly cannot be done in stage and prod, so we dropped this idea.

 

However, we now plan to create a servlet and pull the same data from the external server directly into the publish instance, so we won't need any replication. So far it has worked in the lower environments; I'm now waiting for the stage implementation. I will post an update here in about 2 days on whether that works.

Administrator

@prithwi Did you find the suggestions from users helpful? Please let us know if more information is required. Otherwise, please mark the answer as correct for posterity. If you have found a solution yourself, please share it with the community.



Kautuk Sahni

Level 2

Hi @kautuk_sahni, no definite solution yet. However, I'm trying to import the data into the publishers directly. I'll report back today if this works.

Correct answer by
Level 2

Hi, 

Responding to this thread to close it with the solution we opted for. We tried many things, but in the end we changed the data storage structure in the JCR: instead of storing all the nodes in a single folder, we divided them into multiple subfolders. This restructuring let us select one folder at a time and replicate the folders one by one. We wrote a small program to perform this replication task.
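The original program is not shown in the thread, so the following is only a rough sketch of the "one folder at a time" idea. `SequentialFolderReplication`, `replicateOneByOne`, and the `replicateFolder` callback are hypothetical names; the callback stands in for whatever AEM replication call the real program used. The pause between folders is an added assumption, not something the post describes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Sketch: hand subfolders to a replication callback strictly one at a time. */
final class SequentialFolderReplication {

    /**
     * Calls replicateFolder for each path in order and optionally sleeps
     * between calls. Returns the paths that were handed to the callback.
     */
    static List<String> replicateOneByOne(List<String> folderPaths,
                                          Consumer<String> replicateFolder,
                                          long pauseMillis) {
        List<String> done = new ArrayList<>();
        for (int i = 0; i < folderPaths.size(); i++) {
            String path = folderPaths.get(i);
            // Hypothetical hook: real code would trigger replication of this folder here.
            replicateFolder.accept(path);
            done.add(path);
            boolean hasMore = i < folderPaths.size() - 1;
            if (hasMore && pauseMillis > 0) {
                try {
                    // Assumed pause so other content can pass through the queue.
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and stop early.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return done;
    }

    public static void main(String[] args) {
        // Hypothetical subfolder paths under the folder named in the question.
        List<String> folders = List.of("/etc/com/0", "/etc/com/1", "/etc/com/2");
        replicateOneByOne(folders, p -> System.out.println("replicating " + p), 100L);
    }
}
```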

To group the nodes into subfolders we used simple logic. Every node is identified by a unique number, so we divided that number by 10000, created a subfolder named after the quotient, and placed the node under it. This logic is not very efficient and does not group the nodes equally, but it solved our problem.
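A minimal sketch of this grouping rule, assuming non-negative numeric IDs and a `<root>/<quotient>/<id>` path layout. The class name, method names, and the exact layout are illustrative assumptions; only the divisor of 10000 comes from the post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: group numeric node IDs into subfolders by integer quotient. */
final class NodeBucketing {

    // Divisor taken from the post; everything else here is an assumption.
    static final long BUCKET_SIZE = 10_000L;

    /** Returns the subfolder name for a node ID: the integer quotient id / bucketSize. */
    static String bucketName(long nodeId, long bucketSize) {
        if (nodeId < 0 || bucketSize <= 0) {
            throw new IllegalArgumentException("nodeId must be >= 0 and bucketSize > 0");
        }
        return Long.toString(nodeId / bucketSize);
    }

    /** Builds the assumed target path: root/bucket/nodeId. */
    static String bucketedPath(String root, long nodeId, long bucketSize) {
        return root + "/" + bucketName(nodeId, bucketSize) + "/" + nodeId;
    }

    /** Groups IDs by bucket name so that each bucket can be handled on its own. */
    static Map<String, List<Long>> groupByBucket(List<Long> nodeIds, long bucketSize) {
        Map<String, List<Long>> buckets = new TreeMap<>();
        for (long id : nodeIds) {
            buckets.computeIfAbsent(bucketName(id, bucketSize), k -> new ArrayList<>()).add(id);
        }
        return buckets;
    }

    public static void main(String[] args) {
        System.out.println(bucketedPath("/etc/com", 123456L, BUCKET_SIZE));
        // prints /etc/com/12/123456
        System.out.println(groupByBucket(List.of(5L, 9999L, 10000L, 123456L), BUCKET_SIZE));
        // prints {0=[5, 9999], 1=[10000], 12=[123456]}
    }
}
```

The buckets are only as even as the distribution of the IDs: sparse or clustered IDs give some subfolders far more nodes than others, which matches the caveat in the post.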

 

Thanks to all community members.  

Employee Advisor

I assume that the bottleneck in the initial case was the child node list, which needs to be maintained because many node types support only ordered child lists. If the ordering doesn't matter at all, a node type like "oak:Unstructured" would have helped, because then updates of the child node list would no longer be the bottleneck (there would be none).

 

But thanks for posting the approach you have chosen.