
SOLVED

Access or export old versions of Sites content to a file

clintg6
Level 1

Hello AEM team,


I have a collection of policies (thousands) published through AEM. Each policy has up to 10 older versions in addition to the current one.

I need to extract the text content of the older versions for litigation. Do you know of a way to export the content of older versions as text, HTML, or PDF files?

1 Accepted Solution
Correct answer by Vaibhavi
Community Advisor

Hi @clintg6,

Since you have more than 1,000 policies, I would not suggest extracting them manually.

You can extract the content with a simple custom solution:

  • Fetch the older versions of the nodes using the node path and the jcr:created property.
  • Once you have the list of node paths, append .infinity.json to each path to extract its content, e.g. /content/nodeName/jcr:content.infinity.json
  • Copy the required content into a text document.
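The second and third steps can be sketched from outside AEM with the JDK's built-in HTTP client. This is a minimal sketch, assuming you already have the list of node paths from the first step, that the instance is reachable at a host such as http://localhost:4502 with HTTP Basic authentication, and that the paths contain no spaces. The class and method names (InfinityJsonExport, infinityJsonUrl, fileNameFor, export) are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;
import java.util.List;

// Hypothetical helper: downloads <path>.infinity.json for each node path and saves it to disk
class InfinityJsonExport {

    // Builds the Sling URL that returns a node and all of its descendants as JSON
    static String infinityJsonUrl(String host, String nodePath) {
        return host + nodePath + ".infinity.json";
    }

    // Turns a repository path into a safe file name, e.g. /content/a/jcr:content -> content_a_jcr_content.json
    static String fileNameFor(String nodePath) {
        return nodePath.replaceAll("^/+", "").replaceAll("[^A-Za-z0-9._-]", "_") + ".json";
    }

    // Fetches every path with Basic auth and writes each JSON body into outDir
    static void export(String host, String user, String password, List<String> paths, Path outDir)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        String auth = Base64.getEncoder()
            .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        Files.createDirectories(outDir);
        for (String path : paths) {
            HttpRequest request = HttpRequest.newBuilder(URI.create(infinityJsonUrl(host, path)))
                .header("Authorization", "Basic " + auth)
                .GET()
                .build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() != 200) {
                // Log and skip anything that did not come back as a plain 200 JSON response
                System.err.println("Skipping " + path + ": HTTP " + response.statusCode());
                continue;
            }
            Files.writeString(outDir.resolve(fileNameFor(path)), response.body());
        }
    }
}
```

This produces one JSON file per path, which you can then copy into a text document. The default Sling GET servlet may refuse very large .infinity.json trees with a non-200 status; in that case a fixed depth selector such as .5.json is an alternative.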


ibishika
Level 4

How you do this depends on what you want to do with the extracted content as HTML, but you can get the content as XML files, either by building a package in Package Manager or by pulling it into your project's content folder with an IDE plugin, and then convert the extracted XML files to HTML.
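As a rough illustration of the conversion step, the sketch below reads a FileVault .content.xml file (the format used inside an exported package) with the JDK's DOM parser and writes selected property values into a plain HTML page. It assumes the text you need lives in properties named jcr:title and text; that property set and the class name DocViewToHtml are hypothetical and should be adjusted to your components.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical converter: pulls human-readable properties out of a .content.xml file
class DocViewToHtml {

    // Assumed names of the properties that hold the text you want to keep
    static final Set<String> TEXT_PROPERTIES = Set.of("jcr:title", "text");

    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    static String convert(String contentXml) {
        Document doc;
        try {
            // The factory is namespace-unaware by default, so "jcr:title" stays a plain attribute name
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(contentXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse .content.xml", e);
        }
        StringBuilder html = new StringBuilder("<html><body>\n");
        // "*" matches every element, visited in document order
        NodeList elements = doc.getElementsByTagName("*");
        for (int i = 0; i < elements.getLength(); i++) {
            NamedNodeMap attrs = elements.item(i).getAttributes();
            for (int j = 0; j < attrs.getLength(); j++) {
                Node attr = attrs.item(j);
                if (TEXT_PROPERTIES.contains(attr.getNodeName())) {
                    html.append("<p>").append(escapeHtml(attr.getNodeValue())).append("</p>\n");
                }
            }
        }
        return html.append("</body></html>\n").toString();
    }
}
```

Rich-text properties such as text usually already contain HTML markup. This sketch escapes every value, so that markup appears as literal text unless you skip escaping for those properties.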


clintg6
Level 1
Thanks for the help, Vaibhavi. What if I also want to save the old versions as HTML or PDF files? How would I do that in addition to the text extraction?
Vaibhavi
Community Advisor

For HTML, the first step remains the same.

For step 2, instead of requesting JSON, make an HTML request to the node through an HttpServletRequest. This returns the content in HTML format, so you can copy it into an HTML file.

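    // Assumes an OSGi component with these injected services:
    //   @Reference RequestResponseFactory requestResponseFactory;  (com.day.cq.contentsync.handler.util)
    //   @Reference SlingRequestProcessor requestProcessor;         (org.apache.sling.engine)
    HttpServletRequest request = requestResponseFactory.createRequest("GET", "/path/to/your/node.html");
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    HttpServletResponse response = requestResponseFactory.createResponse(out);
    requestProcessor.processRequest(request, response, resourceResolver);
    // Keep the rendered HTML so it can be written to an .html file
    String html = out.toString(response.getCharacterEncoding());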

Similarly, for PDF:

Once you have the JSON data (from the steps above), you can use a PDF library to create the PDF file and copy the required content into it.
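A dedicated library such as Apache PDFBox is the usual choice for this. To keep the example dependency-free, the sketch below swaps in a hand-rolled writer that uses only the JDK to produce a single-page PDF, printing each line of already-extracted text in the built-in Helvetica font. It assumes plain ASCII text that fits on one page (other characters become '?', and long lines are not wrapped); the class name MinimalPdf is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical minimal PDF writer: one US Letter page, Helvetica 11pt, no wrapping
class MinimalPdf {

    // Escapes characters that are special inside a PDF literal string;
    // anything outside printable ASCII is replaced with '?'
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '\\' || c == '(' || c == ')') {
                sb.append('\\').append(c);
            } else if (c < 32 || c > 126) {
                sb.append('?');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    static byte[] build(List<String> lines) {
        // Content stream: select font, 14pt line spacing, start near the top-left margin
        StringBuilder content = new StringBuilder("BT /F1 11 Tf 14 TL 72 750 Td\n");
        for (String line : lines) {
            content.append('(').append(escape(line)).append(") Tj T*\n");
        }
        content.append("ET");
        String stream = content.toString(); // pure ASCII, so length() equals the byte count

        String[] objects = {
            "<< /Type /Catalog /Pages 2 0 R >>",
            "<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
            "<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792]"
                + " /Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
            "<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
            "<< /Length " + stream.length() + " >>\nstream\n" + stream + "\nendstream"
        };

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<Integer> offsets = new ArrayList<>();
        write(out, "%PDF-1.4\n");
        for (int i = 0; i < objects.length; i++) {
            offsets.add(out.size()); // byte offset of object i + 1, needed for the xref table
            write(out, (i + 1) + " 0 obj\n" + objects[i] + "\nendobj\n");
        }

        // Cross-reference table: the free entry 0, then one 20-byte entry per object
        int xrefOffset = out.size();
        write(out, "xref\n0 " + (objects.length + 1) + "\n0000000000 65535 f \n");
        for (int offset : offsets) {
            write(out, String.format(Locale.ROOT, "%010d 00000 n \n", offset));
        }
        write(out, "trailer\n<< /Size " + (objects.length + 1) + " /Root 1 0 R >>\n"
            + "startxref\n" + xrefOffset + "\n%%EOF\n");
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String s) {
        out.writeBytes(s.getBytes(StandardCharsets.US_ASCII));
    }
}
```

For example, Files.write(Path.of("policy-v3.pdf"), MinimalPdf.build(lines)) saves the result. For multi-page documents, fonts beyond the standard 14, or Unicode text, a real PDF library is the better fit.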

clintg6
Level 1
To confirm, would this include unpublished versions as well?