SOLVED

CQ5.6 Disaster Recovery Strategies

Level 3

Hi All.

I'm trying to develop a disaster recovery strategy for CQ5.6 but it seems like all solutions are ultimately deficient in one way or another. The environment has 1 author and 4 publishers.

These appear to be some of the options:

1) Clustering: This gets me an exact replica of my environment but clustering requires very good network throughput between instances. If my primary and DR instances are on opposite sides of the United States then I can't get the required throughput. And does clustering guarantee that the DR site author and publishers will be in sync? If an activation completes at the primary site just before a major failure, is that activation guaranteed to be properly reflected on both the author and publishers at the DR site? I believe this solution also requires additional CQ5 licenses.

2) Author-to-Author Replication: This solution copes better with the slow network but the entire repository on author needs to be replicated. Is there some way to configure author-to-author replication so that absolutely everything is replicated? And if so, does replication to the DR author instance trigger a replication to the DR publisher? If not, then each primary publisher must replicate to its DR partner. And that doesn't guarantee that the DR author will be in sync with its DR publishers. This solution also requires additional CQ5 licenses.

3) Storage Replication (e.g. via Amazon EBS Volumes): This is more of an offline situation where, in case of a disaster, I spin up new instances at the DR site using the most recent backups. This solution doesn't require additional CQ5 licenses. However, it doesn't guarantee that the DR author will be in sync with its DR publishers. If I want to guarantee synchronization between the DR instances then I have to shut down the primary author (and maybe the publishers as well) before I create backups for each. The downtime could be minimized by replicating the data store separately but I still have to shut down the primary author.

4) Use Database Persistence Manager: This would offload all backups/clustering to the database instead. And it should be possible to achieve full synchronization at the DR site without having to stop either the primary author or the database. The drawback here, however, is that you need to maintain a database and spend money on additional infrastructure.

So, assuming you stay with the tar persistence manager, are there ways to maintain a DR site and guarantee that the DR author is perfectly in sync with all its DR publishers without having to shut down author?

Thanks in advance for any suggestions.

David Frenkiel

1 Accepted Solution

Correct answer by
Level 10

Hi David,

DR depends on your organization's policy on acceptable data loss and offline time. That is not clear from your description, but I am assuming you are looking for zero data loss and zero offline time, which is a little challenging to meet in 5.6 and will hopefully be better in the next release with the introduction of MongoDB. If your primary concern is that the DR author is perfectly in sync with all its DR publishers, then replication should work well.

Option 1) I agree about network latency; also, running a cluster across different geographies is not supported. The cluster will be in sync as long as both nodes are running and network latency is good. You can verify this by finding the last transaction in the tar journal. If a failure happens before the sync completes, there is a chance the nodes will not be in sync; if the DR node acts as read-only, then on the next restore both will be in sync. AFAIK this requires an additional license.

Option 2) You can configure author-to-author replication as you do with regular replication. From the DR author to the DR publishers you can use chain replication: http://aemfaq.blogspot.com/2013/05/chain-replication-sample.html. Regarding the license, check with the sales team, as I am not 100% confident.

Option 3) Alternatively, you can block repository writes on the primary author instead of shutting it down. However, you will have to sync again from DR to primary. Additionally, you will have an IP switcher to point to DR, so do you really need to shut it down? It can instead be taken care of at the network level.

Option4)  Personally I do not prefer this option and agree with your understanding. 

  Thanks,

Sham

4 Replies

Level 3

Thanks for the info, Sham.

You're correct, I only need to ensure that the DR author is in sync with all its DR publishers. It's ok if there's some downtime or data loss.

Regarding option 2, how do I set up replication between the authors so that everything is replicated? By default only activated items are replicated. For example, if I upload an image to the DAM (but I don't activate it) how do I set up replication so that the image is copied to the DR author instance?

For option 3, I need to prevent updates to author while I create backups for it and all its publishers. That way I'll have a set of backups that are in sync. If the primary datacenter goes down then I do a DNS switch (e.g. via http://docs.aws.amazon.com/Route53/latest/DeveloperGuide/dns-failover.html).
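The backup procedure described above could be sketched roughly as follows. This is only an illustration of the ordering (block writes, snapshot everything, unblock), not a CQ5.6 API: `block`, `unblock`, and `snapshot` are hypothetical hooks that the caller would have to supply, and the sketch assumes any pending activations have already reached the publishers before the snapshots start.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterable


@contextmanager
def writes_blocked(block: Callable[[], None], unblock: Callable[[], None]):
    """Keep author writes blocked for the duration of the with-body.

    `block` and `unblock` are hypothetical hooks; how writes are actually
    blocked on the author is outside the scope of this sketch.
    """
    block()
    try:
        yield
    finally:
        # Always re-enable writes, even if a snapshot fails.
        unblock()


def backup_consistent_set(
    instances: Iterable[str],
    snapshot: Callable[[str], str],
    block: Callable[[], None],
    unblock: Callable[[], None],
) -> Dict[str, str]:
    """Snapshot every instance while author writes are blocked.

    Returns a mapping of instance name -> snapshot id. If any snapshot
    raises, the exception propagates and no partial set is returned.
    """
    with writes_blocked(block, unblock):
        return {name: snapshot(name) for name in instances}
```

Because the unblock step sits in a `finally` clause, a failed snapshot does not leave the author stuck in a write-blocked state; the caller just discards that backup run and tries again.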

Thanks again.

David Frenkiel

Employee

Keep in mind regarding #2 that replication is a lossy process: you cannot produce a complete replica based on replication. Most critically, versions and access control are completely lost. So if your DR requirements include being able to handle the case where the primary site is permanently unavailable, then it is not a valid solution. Only #1 and #3 are lossless processes.
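To make the "lossy" point concrete, here is a toy model. It is not the real JCR or replication API; `Node`, `replicate`, and `lost_in_replication` are made-up names, and the only behavior modeled is the one stated above: current content is copied, while version history and access control are not.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Toy stand-in for a repository node; not the real JCR model."""
    content: Dict[str, str]
    versions: List[str] = field(default_factory=list)
    acl: Dict[str, str] = field(default_factory=dict)


def replicate(source: Node) -> Node:
    """Copy only the current content to a new node.

    Version history and access control are deliberately not carried over,
    mirroring the loss described in the reply above.
    """
    return Node(content=dict(source.content))


def lost_in_replication(source: Node, replica: Node) -> List[str]:
    """List which kinds of data differ between source and replica."""
    lost = []
    if source.versions != replica.versions:
        lost.append("versions")
    if source.acl != replica.acl:
        lost.append("acl")
    return lost
```

A DR author built this way would serve the same pages, but could not restore an earlier version or enforce the original permissions if the primary were gone for good.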

Level 3

Thanks, Justin. I think we'll go with option 3. It's straightforward and there's no need for extra licenses.