HTML/JSON extraction (in bulk) option | Community
October 9, 2023
Solved

  • 2 replies
  • 572 views

Hello Community members, 

 

We have a requirement where the end application owner needs only an HTML/JSON extraction of all available site pages from AEM (in bulk, filtered by modified date).

We are looking for any built-in options or solutions that would allow us to provide this extraction.

Thank you in advance for your time. Any insights or suggestions on how to achieve this would be greatly appreciated.

 


2 replies

arunpatidar
Community Advisor
Accepted solution
October 10, 2023

Hi @nj2

You can achieve this OOTB with the following two steps:

1. Create a servlet that exposes JSON containing only the page paths.

2. Retrieve the JSON of each page returned by the first call, one by one. This can be done with the model.json selector.
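
The two steps above could be sketched from the consumer side roughly as follows. This is only an illustration under assumptions: the shape of the listing JSON (`path` and `lastModified` fields), the host name, and the function names are hypothetical and depend on how the custom servlet is written. Only the `.model.json` selector comes from the steps above.

```python
import json
import time
import urllib.request
from datetime import datetime, timezone


def select_paths(listing, modified_since):
    """Return page paths from the listing that changed on or after modified_since.

    `listing` is assumed to look like
    [{"path": "/content/site/en/home", "lastModified": "2023-10-01T12:00:00+00:00"}, ...].
    This shape is hypothetical; adapt it to what the custom servlet returns.
    """
    selected = []
    for entry in listing:
        modified = datetime.fromisoformat(entry["lastModified"])
        if modified >= modified_since:
            selected.append(entry["path"])
    return selected


def model_json_url(host, page_path):
    """Build the model.json URL for a single page path."""
    return f"{host.rstrip('/')}{page_path}.model.json"


def http_get(url):
    """Plain HTTP GET that returns the response body as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def export_pages(host, paths, fetch=http_get, delay_seconds=0.5):
    """Fetch each page's JSON one at a time, pausing between requests to limit load."""
    exported = {}
    for index, path in enumerate(paths):
        if index > 0 and delay_seconds:
            time.sleep(delay_seconds)
        exported[path] = json.loads(fetch(model_json_url(host, path)))
    return exported
```

Because each page is requested on its own URL, the individual responses stay cacheable, and the pause between requests keeps the load on publish low.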

 

I would not suggest bulk operations here, because of:

1. Very high load on publish.

2. 504 Gateway Timeout errors and unexpected results.

3. Caching can't be achieved at the dispatcher/CDN/consumer side.

 

Arun Patidar
Nitin_laad
Community Advisor
October 11, 2023

Hey, 

If you're looking for an unconventional approach:

  • Explore the Dispatcher cache.
    • Obtain cached HTML files that can be bundled.
    • Note: If you have multiple web servers, you'll need to gather and package HTML from all of them.

Note: In a Cloud environment, raise a ticket and check with Adobe first.
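
On a web server where the cache directory is directly accessible, collecting and bundling the cached HTML might look like the sketch below. The docroot location, the function names, and the choice of a zip archive are assumptions for illustration. Also note that a file's modification time reflects when it was written to the cache, which is not necessarily the page's last-modified date in AEM.

```python
import os
import zipfile
from datetime import datetime, timezone


def find_cached_html(docroot, modified_since):
    """Return docroot-relative paths of cached .html files written on or after modified_since."""
    cutoff = modified_since.timestamp()
    found = []
    for directory, _subdirs, filenames in os.walk(docroot):
        for name in filenames:
            if not name.endswith(".html"):
                continue
            full_path = os.path.join(directory, name)
            if os.path.getmtime(full_path) >= cutoff:
                found.append(os.path.relpath(full_path, docroot))
    return sorted(found)


def bundle(docroot, relative_paths, archive_path):
    """Write the selected files into a zip archive, keeping their relative paths."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for relative in relative_paths:
            archive.write(os.path.join(docroot, relative), arcname=relative)
    return archive_path
```

With multiple web servers, the same script would have to run on each one, since every server holds its own cache.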

  • Consider using the List core component.
    • Configure the root path and child depth levels.
    • Generate a list of all available page URLs.
    • Invoke the retrieval of HTML for each page.
  • List Component | Adobe Experience Manager
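
For the List component route, the page URLs could be pulled out of the rendered list page with something like the sketch below. It assumes the links carry the `cmp-list__item-link` CSS class, so check the actual markup of your component version; the function and class names here are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ListLinkParser(HTMLParser):
    """Collect href values of anchors that carry the given CSS class."""

    def __init__(self, link_class="cmp-list__item-link"):
        super().__init__()
        self.link_class = link_class  # assumed class name, verify against your markup
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        href = attributes.get("href")
        if self.link_class in classes and href:
            self.hrefs.append(href)


def extract_page_urls(list_page_html, base_url):
    """Return absolute page URLs found in the list markup, without duplicates, in page order."""
    parser = ListLinkParser()
    parser.feed(list_page_html)
    parser.close()
    urls = []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)
        if absolute not in urls:
            urls.append(absolute)
    return urls
```

Each returned URL can then be requested individually to retrieve the page HTML.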