SOLVED

Best way to back up AEM content for 10 years

Level 1

Hi everybody

I hope this is the right place to ask for ideas on how to solve the following requirement: we have to back up the content from AEM once per month and keep those backups for the next 10 years.

That way we can retrieve any of the historical backups at any time. At the moment I think the best way to meet this requirement would be to crawl the site with an external tool and save the pages as HTML. This approach would eliminate the risk that, in the future, we no longer have the old AEM or Java version available.

Or does anyone know a way to meet this requirement with standard AEM tools?

And if not, does anyone have a recommendation for which external tool would be the best choice?

At the moment we are on AEM 6.1, but we are planning to upgrade to AEM 6.3.

I would be very glad for a good hint. Many thanks,

Marc


9 Replies

Employee

AEM 6.3 reaches the end of core support in April 2020, so you really should upgrade from 6.1 to 6.5 instead.

It makes no sense to migrate to a version of AEM that won't be supported six months from now.

No comment regarding your requirement, though; someone else may chime in there.

Level 10

Hi Marc,

I join aemmarc in warning you that there is little point in upgrading to AEM 6.3 now. By the time your migration is done, there is a good chance it will no longer be supported. Worse, if support ends during your migration, you might be left high and dry if you run into any issues (or face a large bill from Adobe for extended support).

Regarding your backup, I am a bit confused: what exactly do you need? If the requirement is purely to keep a record of what content you were serving historically (in the style of the Wayback Machine), then yes, a web crawler saving HTML + CSS snapshots would make sense. The problem is that this is not an AEM backup, so to speak: you would not be able to restore anything from this kind of backup.

There are many ways to create AEM backups, but probably the simplest and most complete method is to create a zip of the crx-quickstart folder in the installation directory of your instance. All you need to recover from such a snapshot is the zip itself and the corresponding AEM JAR (which you should keep, because it may be difficult to obtain a 10-year-old JAR from Adobe).
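A minimal sketch of that monthly zip step in Python, using only the standard library. It assumes the instance has been stopped first (copying the repository files of a running instance can produce an inconsistent snapshot) and that you pass in whatever install and backup paths your environment uses; the function name and the `crx-quickstart-YYYY-MM.zip` naming scheme are hypothetical, not an Adobe tool:

```python
import shutil
from datetime import date
from pathlib import Path


def snapshot_crx_quickstart(install_dir: Path, archive_dir: Path, when: date) -> Path:
    """Zip <install_dir>/crx-quickstart into <archive_dir>/crx-quickstart-YYYY-MM.zip.

    Assumption: the AEM instance is already stopped, so the repository
    files on disk are consistent while they are being copied.
    """
    source = install_dir / "crx-quickstart"
    if not source.is_dir():
        raise FileNotFoundError(f"no crx-quickstart folder under {install_dir}")
    archive_dir.mkdir(parents=True, exist_ok=True)
    base_name = archive_dir / f"crx-quickstart-{when:%Y-%m}"
    # root_dir + base_dir keep "crx-quickstart/" as the top-level folder inside
    # the zip, so it can later be unpacked next to the matching AEM JAR as-is.
    result = shutil.make_archive(
        str(base_name), "zip", root_dir=str(install_dir), base_dir="crx-quickstart"
    )
    return Path(result)
```

Run once a month (for example from cron) after stopping the instance, e.g. `snapshot_crx_quickstart(Path("/path/to/aem"), Path("/path/to/backups"), date.today())`, where both paths are placeholders.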

Regarding the Java version, I think this is a non-issue. You can still download Java 1.1, released back in February 1997, so I wouldn't be too worried about that.

The best approach might be to create an archive that holds a monthly ZIP of the crx-quickstart folder, and to add the AEM JAR and Java version to it each time you upgrade.
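To keep such an archive self-describing after ten years, one option is to store a small manifest next to each monthly ZIP that records SHA-256 checksums of the ZIP and the AEM JAR plus the Java version in use. The sketch below assumes a flat archive folder holding the snapshot ZIP, copies the JAR into that folder if it is not there yet, and expects the caller to supply the Java version string (for example copied from `java -version`); the manifest layout and function names are hypothetical:

```python
import hashlib
import json
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash in 1 MiB chunks so multi-GB repository zips need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(snapshot_zip: Path, aem_jar: Path, java_version: str) -> Path:
    """Copy the JAR next to the snapshot and write <snapshot>.manifest.json."""
    stored_jar = snapshot_zip.parent / aem_jar.name
    if not stored_jar.exists():
        shutil.copy2(aem_jar, stored_jar)
    manifest = {
        "snapshot": snapshot_zip.name,
        "snapshot_sha256": sha256_of(snapshot_zip),
        "aem_jar": stored_jar.name,
        "aem_jar_sha256": sha256_of(stored_jar),
        "java_version": java_version,  # free text supplied by the caller
    }
    target = snapshot_zip.with_name(snapshot_zip.stem + ".manifest.json")
    target.write_text(json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8")
    return target


def verify_manifest(manifest_path: Path) -> bool:
    """Re-hash the files named in the manifest (same folder) and compare."""
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    folder = manifest_path.parent
    return (
        sha256_of(folder / manifest["snapshot"]) == manifest["snapshot_sha256"]
        and sha256_of(folder / manifest["aem_jar"]) == manifest["aem_jar_sha256"]
    )
```

Running `verify_manifest` periodically (and before any restore attempt) detects silent corruption of the archived files long before someone actually needs them.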

Employee Advisor

Do you need to restore this backup eventually (to make further changes), or do you rather need an archive? It makes a huge difference in the number of options you have available.

Jörg

Level 1

Hi Jörg

We need it more for archiving than for restoring.

Thanks for the help and best regards,

Marc

Correct answer by
Employee Advisor

Hi Marc,

Do I understand you correctly that you need an archive rather than a restorable backup?

I had a similar case where we had to archive every single variant of all pages that had ever been made publicly available. We considered many approaches and came up with creating a PDF of every page (and asset) that has been activated. These PDF files are then moved into a dedicated archiving solution, which business users can access and search directly. That means as soon as a PDF has been submitted to the archive, we don't have to care about it any more.

(Moving old AEM instances to an archive and restoring them on demand causes a lot of issues. At best you preserve the complete virtual machine along with the AEM image; otherwise you might end up in a situation where you can no longer start AEM. Then there may have been issues within the repository at that time which required special actions; on restore you need to apply them as well (do you have them documented?). And this completely ignores the evolution of third-party software outside of AEM: are the current data and interfaces compatible with the ones you restored from your archive? Getting old software working again is not easy...)
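The archive-and-forget step of the PDF approach could be sketched as follows. PDF rendering itself is out of scope: the sketch assumes some other component has already produced the PDF bytes for one activated page variant, uses a plain folder as a stand-in for the dedicated archiving solution, and the file naming scheme, the JSON Lines index and the function names are all hypothetical:

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def archive_key(page_path: str, activated_at: datetime) -> str:
    """Unique, filesystem-safe file name for one activated variant of a page."""
    if activated_at.tzinfo is None:
        raise ValueError("activated_at must be timezone-aware")
    slug = page_path.strip("/").replace("/", "__")
    stamp = activated_at.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{slug}@{stamp}.pdf"


def file_rendition(archive_dir: Path, page_path: str,
                   activated_at: datetime, pdf_bytes: bytes) -> Path:
    """Store one PDF and append a searchable index entry for it."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / archive_key(page_path, activated_at)
    target.write_bytes(pdf_bytes)
    entry = {
        "page": page_path,
        "activated_at": activated_at.astimezone(timezone.utc).isoformat(),
        "file": target.name,
    }
    # One JSON object per line: easy to grep or load by page path or date.
    with (archive_dir / "index.jsonl").open("a", encoding="utf-8") as index:
        index.write(json.dumps(entry) + "\n")
    return target
```

Keying each file on page path plus activation time means every activated variant gets its own entry instead of overwriting the previous one.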

Level 1
Hi Jorg, I have the same problem. How do I create a PDF of every page that has been published but is now old and unpublished? Are there any links or resources you would recommend?

Level 1

Hi Jörg

Many thanks for your extensive answer. The fact is that if I use AEM for the backup, I also see problems getting the system running again in the future. We might have licence problems or issues with the environment.
So I favour a solution without AEM.
My idea is to use a website crawler, perhaps driven by a sitemap, to save the whole content as HTML pages and create the backup that way. But I am not sure whether there is a better solution to my problem.
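A minimal sketch of the sitemap-driven crawler idea, using only the Python standard library. It assumes the site publishes a standard `sitemap.xml` in the sitemaps.org 0.9 namespace, saves only the HTML of each listed URL (no CSS, images or query-string variants), and writes into a folder you choose per month; the function names are hypothetical:

```python
import urllib.request
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import urlparse

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(sitemap_xml: bytes) -> list[str]:
    """Return every <loc> inside <url> entries of a sitemaps.org urlset."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


def local_path_for(url: str, snapshot_dir: Path) -> Path:
    """Map a page URL to <snapshot_dir>/<host>/<path>, always ending in .html."""
    parsed = urlparse(url)
    path = parsed.path
    if path == "" or path.endswith("/"):
        path += "index.html"
    elif not path.endswith(".html"):
        path += ".html"
    return snapshot_dir / parsed.netloc / path.lstrip("/")


def crawl(sitemap_url: str, snapshot_dir: Path) -> int:
    """Download every page listed in the sitemap; returns the number saved."""
    with urllib.request.urlopen(sitemap_url, timeout=30) as resp:
        sitemap_xml = resp.read()
    saved = 0
    for url in urls_from_sitemap(sitemap_xml):
        target = local_path_for(url, snapshot_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        with urllib.request.urlopen(url, timeout=30) as resp:
            target.write_bytes(resp.read())
        saved += 1
    return saved
```

A monthly run might look like `crawl("https://www.example.com/sitemap.xml", Path("archive/2024-05"))`, with the URL as a placeholder. A production crawl would also need retries, rate limiting and capture of CSS, images and other assets so the pages still render years later.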

Thanks and best regards

Marc

Employee Advisor

It completely depends on your requirements. If you publish a page and republish it a few minutes later with an important fix, a crawler is likely to miss the intermediate, incorrect state. You can handle this by capturing the page on every incoming replication, though.
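The difference between periodic crawling and capturing on every replication can be illustrated with a small simulation. It does not use any AEM API: `Activation` is a hypothetical record of one replication event, the clock is simplified to minutes, and the `replication_captures` side stands in for an archive hook triggered on each incoming replication:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activation:
    minute: int   # simplified clock: minutes since some reference point
    page: str
    content: str


def crawler_captures(activations: list[Activation], crawl_minutes: list[int]) -> set[tuple[str, str]]:
    """States a periodic crawler sees: the latest content of each page at each crawl time."""
    ordered = sorted(activations, key=lambda a: a.minute)
    seen: set[tuple[str, str]] = set()
    for t in crawl_minutes:
        latest: dict[str, str] = {}
        for a in ordered:
            if a.minute <= t:
                latest[a.page] = a.content  # later activations overwrite earlier ones
        seen.update(latest.items())
    return seen


def replication_captures(activations: list[Activation]) -> set[tuple[str, str]]:
    """States an archive hooked into every incoming replication sees: all of them."""
    return {(a.page, a.content) for a in activations}
```

With activations at minute 0 (a version containing an error) and minute 5 (the fix), a single crawl at minute 60 only records the fixed version, while the replication-driven archive records both.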

Level 1

Actually, I haven't managed to grasp the point of that tool.