CQ 5.6 - Saving Published Web Pages?

Gl369

08-12-2017

Hello,

Our legal department has asked that we archive all of the intranet web pages covering our company news & history as PDFs. What they want is a viewable version of what was published and, believe it or not, each week a person opens each page and saves it as a PDF!

Is there a way to automate crawling the libraries and saving a PDF file within the CQ environment (author or publisher)?  


At this point, I am still thinking a PDF is best and we could load these artifacts into AEM Assets and use OCR/AI type features to make any part of the asset findable.  We could then publish within our Asset Share libraries and maintain the records along with other Corporate Archives.  But I am open to other ideas/solutions.

FYI, we discussed the following options but decided none of them is the best solution for a multi-billion dollar company, at least until we run out of options:

  1. Set up a separate instance of AEM, copy the content and use it as an archive, but this would require additional hardware & maintenance (too scrappy)
  2. Use an open source tool like Heritrix to crawl the published pages (too risky for a large corporate enterprise; Infosec would have to approve)
  3. Create a new PRINT template that includes all of the images, text and comments on the page to facilitate the PDF creation (but this doesn't solve for the volume of pages)

Accepted Solutions (0)

Answers (4)

smacdonald2008

08-12-2017

Not that I know of. I will reach out to AEM people internally.

smacdonald2008

08-12-2017

In AEM 5.6 there is no OOTB feature that covers this use case, so it may require a custom solution. If you need help building one, you can also reach out to the AEM consulting team.

smacdonald2008

08-12-2017

One way to proceed here is to look at this blog:

Get the rendered HTML for an AEM resource, component or page - Adobe Experience Manager | AEM/CQ | A...

So you can read the HTML (as shown in that post) and then use a library like Apache PDFBox to generate the PDF.
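To make the idea a bit more concrete, here is a rough sketch of the "read the HTML, then hand it to a PDF step" flow. It is not the code from the blog: instead of rendering inside AEM, it runs as a standalone Java 11+ program and simply requests each page over HTTP from a publish instance. The base URL `http://localhost:4503`, the page path, the `archive` output folder, and the `PageArchiver` and `PdfRenderer` names are all assumptions made up for illustration. As far as I know, PDFBox works at the level of PDF pages and text rather than rendering HTML, so the conversion itself is left as a hook you would fill in with whatever library you settle on; with no renderer supplied, the sketch just saves the HTML snapshot.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;
import java.util.List;

public class PageArchiver {

    /** Hypothetical hook: implement with the HTML-to-PDF library of your choice. */
    interface PdfRenderer {
        byte[] render(String html) throws IOException;
    }

    /** Turns a page path into a flat, date-stamped file name. */
    static String archiveFileName(String pagePath, LocalDate date, String extension) {
        String name = pagePath
                .replaceAll("\\.html$", "")          // drop the .html suffix
                .replaceAll("^/+", "")               // drop leading slashes
                .replaceAll("[^A-Za-z0-9-]+", "_");  // flatten everything else
        return name + "_" + date + "." + extension;  // LocalDate prints as yyyy-MM-dd
    }

    /** Fetches the rendered HTML of one published page over plain HTTP. */
    static String fetchHtml(HttpClient client, String baseUrl, String pagePath)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + pagePath)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString(StandardCharsets.UTF_8));
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status " + response.statusCode() + " for " + pagePath);
        }
        return response.body();
    }

    /** Writes a PDF if a renderer is supplied, otherwise the raw HTML snapshot. */
    static Path archive(String html, String pagePath, LocalDate date, Path outDir, PdfRenderer renderer)
            throws IOException {
        Files.createDirectories(outDir);
        if (renderer == null) {
            Path target = outDir.resolve(archiveFileName(pagePath, date, "html"));
            return Files.write(target, html.getBytes(StandardCharsets.UTF_8));
        }
        Path target = outDir.resolve(archiveFileName(pagePath, date, "pdf"));
        return Files.write(target, renderer.render(html));
    }

    public static void main(String[] args) throws Exception {
        String baseUrl = "http://localhost:4503";                         // assumed publish instance
        List<String> pagePaths = List.of("/content/intranet/news.html");  // hypothetical page list
        HttpClient client = HttpClient.newHttpClient();
        for (String pagePath : pagePaths) {
            String html = fetchHtml(client, baseUrl, pagePath);
            Path saved = archive(html, pagePath, LocalDate.now(), Path.of("archive"), null);
            System.out.println("Saved " + saved);
        }
    }
}
```

The date in the file name is there so that weekly runs keep one snapshot per page per run instead of overwriting the previous one. The list of page paths would still have to come from somewhere, for example a query against the content tree or a sitemap.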

Gl369

08-12-2017

Thanks! Do you know of any other customers that have addressed the need to 'save off' web pages?