Hello,
Our legal department has asked us to archive all of the Intranet web pages about our company news and history as PDFs. They want a viewable version of what was published, and, believe it or not, each week someone opens every page and saves it as a PDF by hand!
Is there a way to automate crawling these libraries and saving each page as a PDF file within the CQ environment (author or publish instance)?
At this point I still think PDF is the best format: we could load these files into AEM Assets and use OCR/AI-type features to make any part of each asset searchable. We could then publish them within our Asset Share libraries and maintain the records alongside our other Corporate Archives. But I am open to other ideas/solutions.
FYI, we discussed the following, but decided it isn't the right solution for a multi-billion-dollar company unless we run out of options:
In AEM 5.6 there is no out-of-the-box (OOTB) feature for this use case, so it would likely require a custom solution. If you need help building it, you can also reach out to the AEM consulting team.
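As a rough starting point for such a custom solution, here is a minimal Java sketch of the crawling half only: a breadth-first crawl over HTTP that follows same-host links whose path starts with a given prefix. It assumes the pages are reachable over plain HTTP (for example from a publish instance), and the class name `IntranetCrawler`, its methods, the `intranet` host and the `/content/news` prefix are all hypothetical placeholders. Link extraction uses a simple regex rather than a real HTML parser, and `java.net.http.HttpClient` needs Java 11+, so this is meant as a standalone tool rather than code running inside CQ.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IntranetCrawler {
    // Captures double-quoted href values up to any '#' fragment or '?' query string.
    private static final Pattern HREF = Pattern.compile("href\\s*=\\s*\"([^\"#?]+)");
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    /** Breadth-first crawl that follows only same-host links whose path starts with pathPrefix. */
    static List<String> crawl(String startUrl, String pathPrefix, Function<String, String> fetcher) {
        Set<String> seen = new LinkedHashSet<>(); // remembers visited URLs in discovery order
        Deque<String> queue = new ArrayDeque<>();
        seen.add(startUrl);
        queue.add(startUrl);
        while (!queue.isEmpty()) {
            URI base = URI.create(queue.poll());
            String html = fetcher.apply(base.toString());
            if (html == null) continue; // missing page or failed fetch
            Matcher m = HREF.matcher(html);
            while (m.find()) {
                URI next;
                try {
                    next = base.resolve(m.group(1).trim()); // handles relative links
                } catch (IllegalArgumentException e) {
                    continue; // skip hrefs that are not valid URIs
                }
                String path = next.getPath(); // null for mailto:, javascript:, etc.
                if (path != null && path.startsWith(pathPrefix)
                        && Objects.equals(next.getHost(), base.getHost())
                        && seen.add(next.toString())) {
                    queue.add(next.toString());
                }
            }
        }
        return new ArrayList<>(seen);
    }

    /** Fetches a page over HTTP; returns null on I/O errors or non-200 responses. */
    static String httpFetch(String url) {
        try {
            HttpResponse<String> resp = CLIENT.send(
                    HttpRequest.newBuilder(URI.create(url)).GET().build(),
                    HttpResponse.BodyHandlers.ofString());
            return resp.statusCode() == 200 ? resp.body() : null;
        } catch (IOException e) {
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        // Offline demo with an in-memory "site"; hosts and paths are hypothetical.
        Map<String, String> site = Map.of(
                "http://intranet/content/news.html",
                "<a href=\"/content/news/q1.html\">Q1</a> <a href=\"/content/hr.html\">HR</a>",
                "http://intranet/content/news/q1.html", "<a href=\"q2.html\">Q2</a>",
                "http://intranet/content/news/q2.html", "<p>No links</p>");
        crawl("http://intranet/content/news.html", "/content/news", site::get)
                .forEach(System.out::println);
    }
}
```

Against a real server you would pass `IntranetCrawler::httpFetch` instead of the in-memory map, then fetch each discovered URL and hand its HTML to the PDF step. Injecting the fetcher as a `Function` keeps the crawl logic testable without a network.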
Thanks! Do you know of any other customers that have addressed the need to "save off" web pages?
Not that I know of. I will reach out to AEM people internally.
One way to proceed here is to look at this blog:
The idea is to read the page HTML (as shown in the blog) and then use a library such as Apache PDFBox to generate the PDF.
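One practical wrinkle: PDFBox does not render HTML or CSS. `PDPageContentStream.showText` writes plain text at the current text position, and with the standard Type 1 fonts it rejects characters the font encoding cannot represent, such as line feeds. So the markup has to be reduced to plain, wrapped lines first, and the result is a text-only PDF rather than a visual replica of the page. Here is a minimal sketch of that preprocessing step, assuming crude regex-based tag stripping is good enough for simple intranet pages; the class `HtmlToPdfText` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class HtmlToPdfText {
    /** Crude regex-based HTML-to-text conversion; a real HTML parser would be more robust. */
    static String htmlToText(String html) {
        String s = html.replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " "); // drop scripts and styles
        s = s.replaceAll("(?i)<(br|/p|/div|/h[1-6]|/li)[^>]*>", "\n");         // block ends become line breaks
        s = s.replaceAll("<[^>]+>", " ");                                       // remove all remaining tags
        return s.replace("&nbsp;", " ").replace("&lt;", "<").replace("&gt;", ">")
                .replace("&quot;", "\"").replace("&#39;", "'")
                .replace("&amp;", "&"); // decode &amp; last so "&amp;lt;" becomes "&lt;", not "<"
    }

    /** Word-wraps each paragraph to at most maxChars per line; a longer single word overflows its line. */
    static List<String> toLines(String html, int maxChars) {
        List<String> out = new ArrayList<>();
        for (String para : htmlToText(html).split("\n")) {
            StringBuilder line = new StringBuilder();
            for (String word : para.trim().split("\\s+")) {
                if (word.isEmpty()) continue; // blank paragraph
                if (line.length() > 0 && line.length() + 1 + word.length() > maxChars) {
                    out.add(line.toString());
                    line.setLength(0);
                }
                if (line.length() > 0) line.append(' ');
                line.append(word);
            }
            if (line.length() > 0) out.add(line.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<h1>Company News</h1><p>Q3 results &amp; outlook.<br>Posted by Corporate Comms.</p>";
        // Each printed line is a candidate for one showText call in PDFBox.
        toLines(html, 80).forEach(System.out::println);
    }
}
```

In PDFBox each returned line would then be written between `beginText()` and `endText()` on a `PDPageContentStream`, moving down with `newLineAtOffset` before each `showText` call and starting a new `PDPage` when the current one is full.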