SOLVED

What's the best practice for disabling a replication agent so that its queue continues to accept activations without processing them?

Level 2

I'm going through the process of virtualizing all of the servers in my production AEM environment. I thought that if I disabled the replication agent on author, I would be able to isolate one of the two publishing instances in the cluster, perform the P2V process, and then re-enable the replication agent when I was done. All the activation requests that occurred while the agent was disabled would then flush through, so that at the end my new VM publishing instance would be up to date with the one that is still a physical machine. Good plan, I thought, except that disabling the replication agent apparently takes its replication queue along with it. Yep, I should have checked that, but I honestly thought I'd done it before and everything worked out. Not this time.
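The difference described above can be illustrated with a toy Python model. It is not AEM code and does not reflect how the replication agent is actually implemented; it only encodes the two behaviours reported in this thread: a disabled agent never queues the activation, while an enabled agent whose target is unreachable keeps it queued and delivers it later. The class name and content paths are hypothetical.

```python
from collections import deque


class ToyReplicationAgent:
    """Toy model of the behaviour reported in this thread; NOT AEM's implementation.

    enabled=False          -> activations bypass this agent entirely, so nothing
                              is queued for the publisher behind it.
    target_reachable=False -> the agent stays enabled, so activations pile up in
                              its queue and are delivered once the target is back.
    """

    def __init__(self):
        self.enabled = True
        self.target_reachable = True
        self.queue = deque()
        self.delivered = []

    def activate(self, path):
        if not self.enabled:
            return  # disabled agent: the activation is never queued here
        self.queue.append(path)
        self.process()

    def process(self):
        # Deliver queued items in FIFO order for as long as the target answers.
        while self.queue and self.target_reachable:
            self.delivered.append(self.queue.popleft())


# Outage handled by disabling the agent: the activation is lost for this publisher.
disabled = ToyReplicationAgent()
disabled.enabled = False
disabled.activate("/content/site/page-a")
disabled.enabled = True
disabled.process()

# Outage handled by misdirecting the agent: the activation waits, then flushes.
misdirected = ToyReplicationAgent()
misdirected.target_reachable = False
misdirected.activate("/content/site/page-a")
misdirected.target_reachable = True
misdirected.process()

print(disabled.delivered, misdirected.delivered)  # [] ['/content/site/page-a']
```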

I've generated a list of the assets that were activated while my first replication agent was down, using the replication log of the agent that remained up, and I'm now manually activating everything I missed. So I'll eventually work my way out of this, BUT I've got the other publishing instance to do next week.
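Building that list can be sketched as follows, assuming the log entries of the agent that stayed up have already been extracted into `(timestamp, action, path)` tuples. That tuple layout and the `"ACTIVATE"`/`"DEACTIVATE"` labels are assumptions for illustration, not the real AEM log format, and the function name is hypothetical. Keeping only the last action per path inside the outage window means a page activated several times is re-activated once, and a page deactivated during the outage shows up separately, since it would still be live on the publisher that was down.

```python
from datetime import datetime


def missed_replication_actions(events, outage_start: datetime, outage_end: datetime):
    """Work out what the offline publisher missed during the outage window.

    `events` holds (timestamp, action, path) tuples pulled from the replication
    log of the agent that stayed up. The tuple layout and the "ACTIVATE" /
    "DEACTIVATE" labels are assumptions for this sketch, not AEM's log format.
    """
    last_action = {}
    for ts, action, path in sorted(events):
        if outage_start <= ts <= outage_end:
            last_action[path] = action  # later events overwrite earlier ones
    to_activate = sorted(p for p, a in last_action.items() if a == "ACTIVATE")
    to_deactivate = sorted(p for p, a in last_action.items() if a == "DEACTIVATE")
    return to_activate, to_deactivate
```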

So how do you disable a replication agent but still allow any activations that occur to queue up in that agent's replication queue? Total downtime on the first one was 26 hours, 9 minutes. I've read that for short outages you can sabotage the replication agent's target hostname/port and that will do the trick, but only over short outages, and I'm thinking 26 hours isn't short in most people's minds. So what's the solution? Asking the authors to stand down is not the answer I'm looking for; I know I could do that.

3 Replies

Employee Advisor

Well, modifying the hostname/port settings of an agent and letting it point somewhere into the wild is a good idea, and I would always use it when the queue needs to keep accepting new replications.
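A minimal Python sketch of the misdirection step, assuming the CQ 5.x default layout where the author's publish agent is configured at `/etc/replication/agents.author/publish/jcr:content`, its target is stored in the `transportUri` property, and that property can be updated through the Sling POST servlet. The agent path, the unused port 9999 and the `admin` credentials are placeholders; check the agent's actual node in CRXDE and write down the original `transportUri` value before changing anything. The function only builds the request so it can be inspected before it is sent.

```python
import base64
import urllib.parse
import urllib.request

# Assumed CQ 5.x default location of the author's publish agent; verify in CRXDE.
AGENT_CONTENT_PATH = "/etc/replication/agents.author/publish/jcr:content"


def build_transport_uri_update(author_base, user, password, transport_uri,
                               agent_content_path=AGENT_CONTENT_PATH):
    """Build (but do not send) a Sling POST request that rewrites transportUri."""
    body = urllib.parse.urlencode({"transportUri": transport_uri}).encode("utf-8")
    req = urllib.request.Request(author_base.rstrip("/") + agent_content_path,
                                 data=body, method="POST")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", "Basic " + token)
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


# Point the agent at a port nothing listens on (placeholder), so items queue up.
misdirect = build_transport_uri_update(
    "http://localhost:4502", "admin", "admin",
    "http://localhost:9999/bin/receive?sling:authRequestLogin=1")
# urllib.request.urlopen(misdirect)  # send it; restore the saved URI the same way later
```

Restoring is the same call with the saved original value. While the target is unreachable the agent keeps retrying, which is why the queue size is the thing to watch.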

Regarding time: the criterion shouldn't be a "short time" but rather a "reasonable number of items in the queue". I cannot give you a number for that, so try to perform your action when your editors aren't under full load and activating/deactivating all the time, because in the end a full queue can slow down the complete authoring environment.

Jörg

Level 2

We got about 350 activation transactions while my first publishing instance in the cluster was down for the 26+ hours it took to get it back from our VM group. That's not so bad in my mind, so I think I'll go with misdirecting the replication agent when we do the second one next week. Thanks a lot for the advice. I don't want to go through manually activating a bunch of content again next week.
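Those figures give a rough rate for judging whether a misdirected queue stays at a "reasonable number of items": 350 activations over 26 hours 9 minutes is about 13 per hour. The sketch below assumes the second outage sees a similar authoring load; the helper names are hypothetical.

```python
from datetime import timedelta


def activations_per_hour(count, outage: timedelta):
    """Average activation rate observed during an outage."""
    return count / (outage.total_seconds() / 3600)


def projected_backlog(rate_per_hour, planned_outage: timedelta):
    """Items expected to pile up in a misdirected agent's queue at the same rate."""
    return round(rate_per_hour * planned_outage.total_seconds() / 3600)


first_outage = timedelta(hours=26, minutes=9)
rate = activations_per_hour(350, first_outage)
print(f"{rate:.1f} activations/hour")  # 13.4 activations/hour
print(projected_backlog(rate, first_outage))  # 350
```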

I'm tempted to ask if anybody else has problems with their replication agents leaving stragglers in the replication queue that never seem to go anywhere until you clear them out by hand. If you get enough of those in your replication queue over time, they can clog up a replication agent so badly that it doesn't want to process anything anymore. It's because of that behavior that I was afraid to just go in and pop off a tree activation from high in the CRX node hierarchy to make re-activating the assets I'd missed easy. Instead, I took the time to locate each and every asset that had been activated on the publishing instance that remained up, so I could re-activate them for the benefit of the one that was down.

Do you happen to know what causes those activation transactions that never seem to go away? Many of the times I've checked, it looked like the asset had actually replicated and the transaction was simply never removed from the queue. The fact that I typically just clear those out when I come across them, and nobody has yet complained of failed activations, would seem to support that finding. But the part about this situation making the replication agent unresponsive if one doesn't pay attention to it is truly troubling.
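The manual check described above can be sketched as a triage step. Everything here is hypothetical: the queue snapshot as `(path, enqueued_at)` pairs, the `publish_updated` mapping (however you confirm when the publish copy last changed), the function name, and the one-hour staleness threshold. It only sorts candidates for a person to review; it does not remove anything from the queue.

```python
from datetime import datetime, timedelta


def triage_stuck_items(queue_items, publish_updated, now: datetime,
                       max_age=timedelta(hours=1)):
    """Split old queue entries into 'probably delivered' and 'still pending'.

    queue_items:      iterable of (path, enqueued_at) pairs (hypothetical snapshot)
    publish_updated:  dict path -> datetime the publish copy last changed
    Entries younger than max_age are ignored as not (yet) stuck.
    """
    probably_delivered, still_pending = [], []
    for path, enqueued_at in queue_items:
        if now - enqueued_at < max_age:
            continue  # too recent to call stuck
        updated = publish_updated.get(path)
        if updated is not None and updated >= enqueued_at:
            probably_delivered.append(path)  # publish already has it: review, then clear
        else:
            still_pending.append(path)  # publish never saw it: leave it to retry
    return probably_delivered, still_pending
```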

Correct answer by
Employee Advisor

Do you have CQ 5.5? Then I would recommend first updating to SP3 and getting in touch with Daycare support in parallel regarding this issue.