Employee Advisor

01-10-2020

Archival is a tough thing, also because it's so diverse.

 

First of all, from my point of view, archival is not backup. You archive things which are finished and closed, and you need to keep them around, mostly for legal reasons. Just storing files on a fileshare is not an option: everyone can read and write them (so you cannot check their integrity), you don't have an audit trail, and you cannot delete these records on time (you must not keep them longer than required).
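To make the two requirements "check for integrity" and "delete on time" more concrete, here is a minimal sketch in Python. It is not how any particular archival product works; `ArchiveRecord`, `archive`, `is_intact`, `is_due_for_deletion` and the `retention_days` field are hypothetical names made up for illustration. The idea is only that a checksum is recorded at archive time and that every record carries its own retention period.

```python
import hashlib
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class ArchiveRecord:
    """Hypothetical archive entry: the stored bytes plus audit metadata."""
    name: str
    content: bytes
    sha256: str          # checksum taken when the record was archived
    archived_on: date
    retention_days: int  # assumed retention period for this record


def archive(name, content, archived_on, retention_days):
    """Create a record and remember the checksum of its content."""
    digest = hashlib.sha256(content).hexdigest()
    return ArchiveRecord(name, content, digest, archived_on, retention_days)


def is_intact(record):
    """True if the content still matches the checksum taken at archive time."""
    return hashlib.sha256(record.content).hexdigest() == record.sha256


def is_due_for_deletion(record, today):
    """True once the retention period has passed and the record must go."""
    return today >= record.archived_on + timedelta(days=record.retention_days)
```

A plain fileshare gives you neither of these checks, which is why a dedicated solution is the better fit for this kind of requirement.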

For this requirement I always recommend PDFs and a dedicated archival solution, because AEM is not an application for archival.

 

When you want to have "old" content available and searchable, I don't see any other option than to retain it in an archive folder, remove write access for most/all users, and try to reduce the number of versions of these pages (you don't need them). But be sure that you understand the drawbacks of that:

  • No one can guarantee the integrity of these unpublished pages.
  • They create some overhead, as they consist of JCR nodes, and that increases the size of some indexes.
  • You will always have them with you; you cannot externalize them.
  • You always have to consider them when your application is evolving. That means you either consider them with every style and component change, or you find a different solution for them. What about the assets referenced in these pages? And what if the rendering components/scripts require certain OSGi services/components in a certain version?

Especially the last item can be a long-term burden on your development velocity: the more old stuff you have, the more you need to invest in it.

 

Some suggestions on how you can improve it (of course everything is customization and not available ootb):

  • Think about how you can reduce the overhead of these pages, and potentially even transform them into something which can stand alone and does not have any dependency on the application itself, for example a PDF. You should still be able to find all of the text in the fulltext index.
  • Then you could reconfigure the ootb indexes not to cover your archive area anymore, but instead feed all this data into an external search engine, and let the authors search the archive only there.
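The second suggestion could look roughly like the following Python sketch. Everything in it is an assumption for illustration: `/content/archive` is a made-up archive path, `InMemorySearch` is only a stand-in for whatever external search engine you would really use, and the `(path, title, text)` tuples stand for text you have already extracted from the archived pages. On the AEM side, Oak index definitions have an `excludedPaths` property which, as far as I know, is the way to keep such a subtree out of an index, but please verify that against the Oak documentation for your version.

```python
import re
from collections import defaultdict

ARCHIVE_ROOT = "/content/archive"  # hypothetical archive path


def to_search_document(path, title, text):
    """Flatten an archived page into a standalone search document."""
    return {"id": path, "title": title, "body": text}


class InMemorySearch:
    """Stand-in for an external search engine (not a real client)."""

    def __init__(self):
        self._index = defaultdict(set)  # token -> set of document ids

    def add(self, doc):
        words = f"{doc['title']} {doc['body']}".lower()
        for token in re.findall(r"\w+", words):
            self._index[token].add(doc["id"])

    def search(self, query):
        """Return ids of documents containing every word of the query."""
        tokens = re.findall(r"\w+", query.lower())
        if not tokens:
            return set()
        hits = [self._index.get(token, set()) for token in tokens]
        return set.intersection(*hits)


def feed_archive(pages, engine):
    """Send only pages below the archive root to the external engine."""
    count = 0
    for path, title, text in pages:
        if path == ARCHIVE_ROOT or path.startswith(ARCHIVE_ROOT + "/"):
            engine.add(to_search_document(path, title, text))
            count += 1
    return count
```

The point of the split is that the search documents carry only plain text and a path, so they no longer depend on your components or templates.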

And there are probably a ton more possibilities.

 

The most important thing you should consider is the impact of an "in-repo" archive on your development velocity. As long as you maintain this old content as pages, you have to test it with every change.

 

HTH,

Jörg