SOLVED

Content Archival - Adobe Recommendation?

Level 5

Hi all,

A question came up recently about what we should do with content that we've deactivated but must keep for legal reasons (for n years). We can't delete it, and certain authors may still need to access it (and its version history) for that duration. At the moment, some authors have created a "deactivated-content" folder in certain site areas and have been moving all of their content there. Ideally, we'd just delete content that is no longer used to keep the repository from growing out of hand, but there are certain things we can't delete for now.

We've considered setting up an extra author instance and migrating the content that needs to be archived there, but automating something like that seems like a massive headache: every single node and property (version history, referenced assets, etc.) would have to move along with the page.

What is Adobe's recommendation for handling content like this?

1 Accepted Solution

Correct answer by
Employee Advisor

Hi Greg,

I wouldn't consider the PDF route the worst-case scenario. It's a bigger investment up front, of course. But if you snapshot a page only when it changes, the number of PDFs you create each day shouldn't be a problem.

Keeping daily backups for years, on the other hand, will cause huge storage costs. If a "restore" is requested (normally just one page for a specific date), you need to restore that backup, spin up the instance, and make it available to the business again. You need to maintain operating systems, Java versions, and many other things, not to mention backend systems if your application relies on their presence. And have you checked whether you need to restore the data of these backend systems as well?

Creating PDFs doesn't sound elegant in an IT way, but you store just what you need: the page and its content. And even business people can navigate through an archive and compare timestamps.

Jörg
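
For readers who want something concrete, here is a minimal sketch of the "snapshot only upon change" idea from the answer above. It is not an Adobe API and not the poster's implementation: the function names, the `/archive` folder layout, and the use of a SHA-256 fingerprint of the rendered page are all assumptions made for illustration. Rendering the page to PDF is deliberately left out; the sketch only decides whether a new snapshot is needed and where it would be stored.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import PurePosixPath


def content_fingerprint(rendered_html: str) -> str:
    """Return a stable SHA-256 hex digest of the rendered page."""
    return hashlib.sha256(rendered_html.encode("utf-8")).hexdigest()


def plan_snapshot(page_path, rendered_html, last_fingerprints, now=None):
    """Decide whether a page needs a new archived snapshot.

    Returns the archive path for the new PDF, or None when the page is
    unchanged since its last snapshot. `last_fingerprints` maps page paths
    to the digest recorded at the previous snapshot; it is updated in place.
    """
    digest = content_fingerprint(rendered_html)
    if last_fingerprints.get(page_path) == digest:
        return None  # unchanged: no new PDF needed
    last_fingerprints[page_path] = digest
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    # Hypothetical layout: /archive/<page path>/<UTC timestamp>.pdf
    return str(PurePosixPath("/archive") / page_path.lstrip("/") / f"{stamp}.pdf")


seen = {}
when = datetime(2016, 5, 1, 2, 0, tzinfo=timezone.utc)
print(plan_snapshot("/content/site/en/home", "<h1>v1</h1>", seen, when))
# /archive/content/site/en/home/20160501T020000Z.pdf
print(plan_snapshot("/content/site/en/home", "<h1>v1</h1>", seen, when))
# None
```

Keeping the timestamp in the file name is what lets a non-technical user browse the archive and compare versions by date, as the answer suggests.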

4 Replies

Level 10

Greg, I will check with some internal people on this.

Employee

Hi Greg,

Are you using VMs? One option is to keep monthly backups for n years. Once content has been deactivated, delete it. If you are required to retrieve any content, you would go back to the backup taken just after the date in question, start the instance, and view the content.

You could also take repository backups, but the problem is that if you were required to keep content for, say, 7 years (which can be a legal requirement for Financial Services clients in the UK), then the current OS and Java version may be very different from those used for a backup taken nearly 7 years ago, hence the recommendation to archive the OS as well as the repository.

Another option might be to create a PDF of all the content you publish and archive that instead.

Hope that helps,

Regards,

Opkar
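
As a rough illustration of the backup approach described above, the sketch below picks the backup "taken just after the date in question" and lists backups that have aged out of an n-year retention window. The function names and the idea of tracking backups as a plain list of dates are assumptions for the example, not part of any AEM or backup tool.

```python
from bisect import bisect_right
from datetime import date


def backup_to_restore(backup_dates, wanted):
    """Return the first backup taken strictly after `wanted`, or None."""
    ordered = sorted(backup_dates)
    i = bisect_right(ordered, wanted)
    return ordered[i] if i < len(ordered) else None


def expired_backups(backup_dates, today, retention_years):
    """Return backups older than the retention period, oldest first."""
    try:
        cutoff = today.replace(year=today.year - retention_years)
    except ValueError:
        # today is 29 Feb and the cutoff year is not a leap year
        cutoff = today.replace(year=today.year - retention_years, day=28)
    return [d for d in sorted(backup_dates) if d < cutoff]


monthly = [date(2016, 1, 1), date(2016, 2, 1), date(2016, 3, 1)]
print(backup_to_restore(monthly, date(2016, 1, 15)))
# 2016-02-01
```

Note that this only selects which backup to restore; the restored instance still needs a compatible OS and Java version, which is the concern raised in the reply.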

Level 5

No, we're not using VMs. Right now we've got some workflow automation software that essentially runs a cold backup daily (off business hours) of the Author server's /repository directory (I believe). We had previously been doing UNIX backups through Tivoli, but we had consistency issues: the tar files were being written to during the backup, which sometimes caused the entire job to fail.

Like you said, it hasn't been a problem yet because we've been on 5.5/5.6.1 for 2 years or so now, but we'll run into issues next year when we finish upgrading to AEM 6.2: our older backups will essentially be useless unless we keep at least one server around on 5.6.1 for restoring them.

As for the PDF route, I'd say that's the worst-case scenario, unless we could automate it somehow: print the contents of a page (essentially a snapshot) and store the file as a PDF in the repository for download, or email it to the content owners.

Thanks for your input, Opkar! I really appreciate it.
- Greg
