SOLVED

Bulk replication of Pages, Tags, Product Nodes

Level 2

Hi Everyone,

I need to automate the replication of a large number of pages, tags, and nodes (about 30k in total). What is the best way to achieve this?

Thanks in advance

1 Accepted Solution

Correct answer by
Level 10

Hi @bhanuprakashdod,

What is your concern exactly? I can see two possibilities:

  1. You are concerned about performance. A huge replication could take several hours, and you don't want your site to be slow during that time. In this case the simple answer is to perform the replication in a lower environment first to determine how long it takes (is it 30 minutes or 6 hours?), then identify the time of day when you have the least traffic and do it then. If you have several publishers behind a load balancer, you can do it even more seamlessly by replicating to one publisher at a time, routing traffic to the other publisher instances during that period.

  2. The content is not hierarchically related, i.e. you want to publish /content/site/pageA and /content/site/pageB/1 but not /content/site/pageA/1 or something like that, so it's not as simple as publishing the root of a single website. Doing the publication via the TouchUI might require hundreds or thousands of clicks, with a large risk of human error. In this case you will need to write some backend logic to replicate your content programmatically. You can do this by using the com.day.cq.replication.Replicator service to trigger resource replication. See more information on how to use the API here.
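To illustrate the second case, here is a minimal sketch of looping over an explicit list of paths. It is plain Java so it runs outside AEM: the `activator` callback, the `replicateAll` helper, and the class name are hypothetical stand-ins. In a real bundle the callback would presumably wrap a call such as `replicator.replicate(session, ReplicationActionType.ACTIVATE, path)` and handle `ReplicationException`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class SelectiveReplicationSketch {

    /**
     * Calls the activator once per path and collects the paths that failed,
     * so one bad path does not abort the whole run.
     * The activator is a stand-in for the real replication call (assumption).
     */
    static List<String> replicateAll(List<String> paths, Consumer<String> activator) {
        List<String> failed = new ArrayList<>();
        for (String path : paths) {
            try {
                activator.accept(path);
            } catch (RuntimeException e) {
                failed.add(path);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Paths that are not hierarchically related, as in the example above.
        List<String> paths = List.of("/content/site/pageA", "/content/site/pageB/1");
        List<String> failed = replicateAll(paths, p -> System.out.println("activate " + p));
        System.out.println("failed: " + failed);
    }
}
```

Collecting the failed paths rather than stopping at the first error makes it possible to re-run only what did not go through.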

9 Replies

Level 2
I tried it with code and it took around 6 hours. But when I created a package and replicated it manually, it took only a couple of minutes. I need to automate this, though, and it should be fast.

Community Advisor
Try using the ACS scheduler I mentioned above, which will run on a schedule.

Community Advisor

@bhanuprakashdod I agree with @Theo_Pendle. The only issue you might have here is performance, and he has suggested the best way to measure it.

Adding to that, in my experience the best approach is to write a custom back-end process, ideally a job that runs in a particular time window (so that you don't disrupt business hours). Alternatively, you can write a servlet and call it to run the process. I would say a job is the best approach in this case because it ensures the work is completed: if it fails, it is retried up to the number of times you have configured. It also tells you whether the job completed or failed (if it still fails after the retries), so you know whether all the content was replicated successfully. I have tried to explain the benefits of jobs here. See if this helps. ✌
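The retry behaviour described above can be sketched in plain Java as follows. This only illustrates the semantics, not how Sling implements it: with a Sling job you would not write this loop yourself, since a `JobConsumer` that returns `JobResult.FAILED` gets rescheduled by the framework. The names `runWithRetries` and `RetrySketch` are made up for the example.

```java
import java.util.function.BooleanSupplier;

final class RetrySketch {

    /**
     * Runs the task until it reports success or the attempts are used up.
     * Returns the attempt number that succeeded, or -1 if every attempt failed.
     */
    static int runWithRetries(BooleanSupplier task, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (task.getAsBoolean()) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical task that fails twice and then succeeds.
        int used = runWithRetries(() -> ++calls[0] >= 3, 5);
        System.out.println("succeeded on attempt " + used);
    }
}
```

The useful part is the final status: either the attempt that succeeded or a clear "gave up" signal, which is what lets you confirm whether the whole replication run finished.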

Level 5

Adding to the suggestions shared by others: if you are unable to perform the operation outside business hours, consider creating a new replication agent and using it to replicate the content from your custom code. This ensures normal content activity is not impacted. You could also reduce the load by replicating the content in batches.
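As a rough sketch of the batching idea, the helper below splits a list of paths into fixed-size batches. The batch size, the `partition` helper, and the sample paths are assumptions for illustration; each batch would then be handed to whatever replication call your custom code uses, possibly with a pause in between.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchSketch {

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            // Copy the sublist so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> paths = List.of("/content/a", "/content/b", "/content/c",
                                     "/content/d", "/content/e");
        for (List<String> batch : partition(paths, 2)) {
            // Placeholder for the real replication of one batch.
            System.out.println("replicate batch: " + batch);
        }
    }
}
```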

Level 10
If your biggest concern is performance, then using a package could indeed be better (no HTTPS connection and data transfer to slow you down). However, keep in mind that AEM would not consider your content to be "replicated": you would have no publication status, last publication dates, etc. So before you do this, make sure you don't have other systems (services, workflows, components) that rely on the publication status of content to function correctly, or you might run into a problem further down the line.

Level 5
You can use https://adobe-consulting-services.github.io/acs-aem-commons/features/package-replication-status-upda... to ensure the replication status is updated when you install the package. Did you try to replicate all the pages together or in batches?