CQ 5.6 - Saving Published Web Pages?


Level 3


Our legal department has asked that we archive all of the Intranet web pages regarding our company news & history as PDFs. What they want is a viewable version of what was published and, believe it or not, each week a person is opening each page and saving it as a PDF!

Is there a way to automate crawling the libraries and saving a PDF file within the CQ environment (author or publisher)?  

At this point, I am still thinking a PDF is best and we could load these artifacts into AEM Assets and use OCR/AI type features to make any part of the asset findable.  We could then publish within our Asset Share libraries and maintain the records along with other Corporate Archives.  But I am open to other ideas/solutions.

FYI, we discussed the following options but determined that none of them is the best solution for a multi-billion dollar company, at least until we run out of options:

  1. Set up a separate instance of AEM, copy the content and use it as an archive, but this would require additional hardware & maintenance (too scrappy)
  2. Use an open source tool like Heritrix to crawl the published pages (too risky for a large corporate enterprise; Infosec would have to approve)
  3. Create a new PRINT template that includes all of the images, text and comments on the page to facilitate the PDF creation (but this doesn't solve for the volume of pages)
4 Replies


Level 10

In AEM 5.6 there is no OOTB feature that covers this use case, so it would likely require a custom solution. If you need help with that, you can also reach out to the AEM consulting team.


Level 3

Thanks! Do you know of any other customers that have addressed the need to 'save off' web pages?


Level 10

One way to proceed here is to look at this blog:

Get the rendered HTML for an AEM resource, component or page - Adobe Experience Manager | AEM/CQ | A...

So you can read the HTML (as shown there) and then use a library like PDFBox to generate the PDF.
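
A rough sketch of the "read the HTML" step, in case it helps frame a custom solution. This is not the in-process approach from the blog; it swaps in a plain HTTP fetch against a publish instance, run from a standalone Java 11+ program outside CQ. The class name `PageArchiver`, its methods, the `.html` URL pattern and the file naming scheme are all assumptions made for illustration. The HTML-to-PDF step is left out: PDFBox writes PDF content but does not lay out HTML on its own, so that part would need extra work or another library.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper class; none of these names come from CQ/AEM.
final class PageArchiver {

    // Builds the URL of the rendered page on a publish instance
    // (assumes pages are served at <content path>.html).
    static URI renderedUrl(String publishBase, String pagePath) {
        String base = publishBase.endsWith("/")
                ? publishBase.substring(0, publishBase.length() - 1)
                : publishBase;
        return URI.create(base + pagePath + ".html");
    }

    // Assumed naming scheme: one dated PDF file name per page and archive run.
    static String archiveFileName(String pagePath, LocalDate runDate) {
        String slug = pagePath.replaceAll("^/+", "").replaceAll("[^A-Za-z0-9]+", "_");
        return slug + "-" + runDate.format(DateTimeFormatter.ISO_LOCAL_DATE) + ".pdf";
    }

    // Fetches the rendered HTML as a string; fails on any non-200 status.
    static String fetchHtml(HttpClient client, URI url)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(url).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected status " + response.statusCode() + " for " + url);
        }
        return response.body();
    }
}
```

A weekly job could loop over the list of news and history page paths, call `fetchHtml(HttpClient.newHttpClient(), renderedUrl(base, path))` for each one, and hand the result to whatever PDF step you settle on, storing the output under `archiveFileName(path, LocalDate.now())`.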