SOLVED

Replication Issue in Production Environment - AEM 6.1

Level 3

Hi,

We are facing an issue with the replication queue getting clogged. Here is the scenario:

Only around 10-20 pages are updated or edited each day, but the replication queue shows 7000+ pending items.

Is this the default behaviour?

We don't understand this behaviour or where to look for the cause. It is impacting the authoring work.

Can we print to the logs which pages have been sent to the replication queue? If yes, how?

Thanks

1 Accepted Solution

Correct answer by
Community Advisor

Hi Akshita,

When a page is replicated and the replication is delayed for some reason, it might stay in the queue until the publish is retried. Most likely, pages or nodes whose replication failed or was delayed over a long period have stayed in the queue and piled up to 7000 entries. Nodes that you replicate later wait in the queue until the entries ahead of them have been replicated successfully.

That said, the 10-20 pages you replicate every day are now accumulating in the queue, waiting for replication. So the best way around this issue is to clear your queue completely. The "Replication queue issues" documentation might help you understand more about this. In the worst case, to clear all the content (highly not recommended in PRODUCTION, only as a last resort), refer to the "Force the queue clearance by deleting corresponding Sling Jobs" section of that documentation.

Please retry the replication after clearing the existing queue and check whether you still face any issues. If you do, take the logs and post them to the forum so we can see what went wrong. The earlier failure could have had various causes, and since the queue is full it is better to clear it, try again, and then track the logs if replication still fails.

PS: The above statements are my understanding of the issue.

Thanks

Veena
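
To illustrate the explanation above, here is a minimal Python sketch of how one failing entry at the head of an ordered queue can hold back everything behind it. This is a toy model, not AEM code: the `drain` function, the `can_deliver` callback, and the paths are hypothetical names chosen for the example, and it assumes the queue is processed strictly in order with the head entry retried until it succeeds.

```python
from collections import deque


def drain(queue, can_deliver, max_attempts):
    """Process a FIFO queue in order, retrying the head entry until it succeeds.

    Entries behind a failing head are never reached, so they pile up.
    """
    delivered = []
    attempts = 0
    while queue and attempts < max_attempts:
        attempts += 1
        head = queue[0]
        if can_deliver(head):
            delivered.append(queue.popleft())
        # On failure the head stays in place and is retried on the next attempt.
    return delivered


# Hypothetical paths: the second entry can never be delivered.
queue = deque(["/content/site/a", "/content/site/broken", "/content/site/b"])
delivered = drain(queue, lambda path: "broken" not in path, max_attempts=10)

print(delivered)    # entries that went through
print(list(queue))  # entries still waiting, including the healthy one behind the blocker
```

In this model the last page is perfectly fine but never gets delivered, which is why a small daily volume can still end up as a very long queue.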

7 Replies

Level 10

Thanks for the detailed response

Employee Advisor

If the queue is accumulating entries, then content is being activated (that is, published or unpublished) through this queue.

Please check whether this replication agent is supposed to work at all. That is, does it point to a valid publish instance? Have you already tried the "test connection" function to confirm that the publish instance can be reached? There is also logging available that should indicate problems with replication.

Jörg

Community Advisor

I would like to add an edit to my answer.

The replication agent retries activation every minute.

[Image: 1406903_pastedImage_0.png]

So every minute the replication agent will try to activate the pages again. As Jörg suggested, please run a test connection and make sure your publish instance is available for replication and that you get a success message. Track the logs and post back here for more insight into the issue. Meanwhile, you can raise a Daycare ticket, since this is a production instance.

Level 3

Veena_07 and @Jörg Hoh

We have run the test connection and it is successful.

Also, whenever we get this issue, we clear the queue, and the status then shows idle.

But after some time the number of entries grows again. We are not sure whether there are backlog entries; if there were, the status shouldn't show idle.

Will track the logs and let you know.

Thanks for the response.

Level 3

One more thing I would like to point out is that it is the Salesforce pages that are piling up in the queue. Any idea what we can do to get the status back to idle, apart from clearing the queue, restarting the instances, etc.?

Employee Advisor

What are "Salesforce pages"? A blocked replication agent can also point to issues on the "remote" side (a.k.a. publish instances). Please check the log of the replication agent.

Jörg
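
To back up an observation like "one kind of page is piling up" with numbers, the queued paths can be grouped by their leading path segments. A minimal Python sketch, assuming you have already copied the queued paths into a list (for example from the agent's queue view or from its log); the `count_by_prefix` helper and all paths below are hypothetical and only serve as an illustration:

```python
from collections import Counter


def count_by_prefix(paths, depth=3):
    """Count paths by their first `depth` segments, most frequent first."""
    counts = Counter()
    for path in paths:
        segments = [segment for segment in path.split("/") if segment]
        prefix = "/" + "/".join(segments[:depth])
        counts[prefix] += 1
    return counts.most_common()


# Hypothetical queued paths, not taken from a real instance.
queued_paths = [
    "/content/example/en/home",
    "/content/example/salesforce/p1",
    "/content/example/salesforce/p2",
    "/content/example/salesforce/p3",
]

summary = count_by_prefix(queued_paths)
for prefix, count in summary:
    print(f"{count:5d}  {prefix}")
```

A summary like this makes it easier to tell whether the backlog comes from authors' edits or from something that creates or activates content automatically.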