SOLVED

Servlet to iterate over several AEM pages and get html

Level 5

I'm looking to create a servlet in AEM that loops over several AEM pages and gets their HTML, ideally just for the paragraph system component on each page; if that is not possible, the entire page will do. I know that in a JSP I can do this with sling.include("/content/foo"), but how would I do it in a backend servlet? I could just make an HTTP request, but I feel there must be a quicker way. If it helps, we are on AEM 5.6.

1 Accepted Solution

Correct answer by
Level 5

I solved this by creating a new workflow and launcher. The launcher sits on the publisher and monitors changes to the nodes I am interested in. When a change occurs (e.g. an author publishes), it triggers a workflow that requests the modified page via a normal HTTP call, but only for the column control component, e.g. /content/apage/parsys/colcomponent.html. The workflow then stores the result as a property on the node and reverse replicates it. When my servlet runs, it simply iterates over the pages and reads the HTML from the node property.

I really think there should be a simpler way to "render" a component or page in backend Java!
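
For illustration, the "normal HTTP call" step described above could look roughly like the following. This is only a sketch using the JDK's `HttpURLConnection`; the class name `ComponentFetch`, the helper names, the timeouts, the UTF-8 assumption, and the base URL in the usage note are all hypothetical, and authentication as well as the workflow and reverse-replication parts are left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of AEM: fetches the rendered HTML of one component.
final class ComponentFetch {

    // Builds the ".html" URL for a component path such as
    // /content/apage/parsys/colcomponent (the example path from the post).
    static String componentUrl(String baseUrl, String componentPath) {
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        String path = componentPath.startsWith("/") ? componentPath : "/" + componentPath;
        return base + path + ".html";
    }

    // Performs a plain GET and returns the body as a string.
    // Assumes the response is UTF-8 and that no login is required.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000); // assumed timeout values
        conn.setReadTimeout(5000);
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected status " + status + " for " + url);
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return new String(out.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

A call such as `ComponentFetch.fetch(ComponentFetch.componentUrl("http://localhost:4503", "/content/apage/parsys/colcomponent"))` would then return the component markup, assuming a publisher is reachable at that (assumed) address.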

9 Replies

Administrator

Hi

You can write a custom OSGi component that reads JCR node values. In the same way, you can read the nodes that hold the HTML.

Creating custom components: https://helpx.adobe.com/experience-manager/using/creating-custom-cq-component-uses.html

 

For reading JCR nodes, please have a look at any of these options:

Link:- https://helpx.adobe.com/experience-manager/using/jqom.html

Link:- https://helpx.adobe.com/experience-manager/using/using-query-builder-api1.html (Query Builder)

Link:- https://helpx.adobe.com/experience-manager/using/querying-experience-manager-data-using1.html (JCR API)

 

I hope this helps.

Thanks and Regards

Kautuk Sahni

Level 9

@Sutty100,

I don't think there is a better way than Sling to get the HTML content of a page. Even HTTP would be awkward, because you would need to authenticate any HTTP request made from the backend.

-- Jitendra

Level 5

I'm not sure that is exactly what I am after. To clarify, I am creating a servlet that builds a JSON object containing the URLs of several AEM pages and the HTML that would be rendered on them; ideally I want just the HTML rendered by the components contained in a paragraph system. An author has already created these pages. When somebody hits my servlet, it iterates over all the pages in a specific directory and collects their URLs and the HTML those pages would render if they were requested directly.

Level 5

That's a shame. I will have to look at doing this from the .jsp using sling.include, which feels messy; I would rather do this from the backend!

Level 2

Hi,
Have you looked into using a standard servlet request include?

request.getRequestDispatcher("/ConfirmationServlet").include(request, response);

You would have to create a response wrapper that records all HTML output from that call rather than writing it to the actual response.

In your servlet, you can then get the HTML from each include call and assemble the JSON. If you call the parsys resource directly, you should get only the HTML you are interested in.

I have not done this with multiple includes in one request, so I don't know whether that actually works.
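
To make the capture idea above concrete, here is a minimal sketch of the buffering part only. It deliberately avoids the servlet API so it runs on a plain JDK: `Renderer` is a hypothetical stand-in for "include this path and write to this writer". In a real servlet the same buffer would presumably sit behind a class extending `javax.servlet.http.HttpServletResponseWrapper` whose `getWriter()` returns the buffered writer, with the wrapper passed to `RequestDispatcher.include(...)`. Whether several includes work within one request remains untested here, as noted above.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: shows how output can be recorded instead of sent to the client.
final class CaptureSketch {

    // Hypothetical stand-in for a dispatcher include that writes markup to 'out'.
    interface Renderer {
        void include(String path, PrintWriter out);
    }

    // Renders one path into a private buffer and returns what was written.
    static String capture(Renderer renderer, String path) {
        StringWriter buffer = new StringWriter();
        PrintWriter writer = new PrintWriter(buffer);
        renderer.include(path, writer);
        writer.flush(); // make sure everything reaches the buffer
        return buffer.toString();
    }

    // Captures several paths, each with its own fresh buffer,
    // keeping the order in which the paths were given.
    static Map<String, String> captureAll(Renderer renderer, Iterable<String> paths) {
        Map<String, String> htmlByPath = new LinkedHashMap<>();
        for (String path : paths) {
            htmlByPath.put(path, capture(renderer, path));
        }
        return htmlByPath;
    }
}
```

Using a fresh buffer per path keeps the markup of one page from leaking into the next, which is the main thing a per-call response wrapper would need to guarantee.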

hope that helps!

Level 10

As Kautuk pointed out, the way to handle this use case is to write a servlet that uses the JCR API to walk down the content path. For example, /content/geometrixx/en/services/jcr:content/par/text_6 represents text on the services page.

The HTML on the services page is stored in a collection of nodes such as:

services/jcr:content/par/text_6

It has a property named text that stores:

<p><span class="large">Geometrixx is committed to providing high-quality services for all your geometry related needs. From banking to training to consulting, we can help prepare you for success.</span></p>

Your code would read this text property from each node and add it to the JSON dynamically. This is how you would handle your use case.

Level 10

Hi Sutty,

As @scott and @kautuk mentioned, you can walk the nodes using the JCR API and get the individual content/HTML at the parsys level. Alternatively, you can request the .json rendition of the page, which contains all of the page's content, and parse that JSON to extract the HTML.

However, what is your use case for this?

Level 5

Thanks. My use case is an e-commerce site. We will have a drop-down menu, and content will be shown in the menu as you drill through it. The category information comes from our e-commerce platform, but the plan is to let content authors manage the content in AEM: they create a page for each category they want to customize, and a set of defaults is used for categories without a customization. For performance, on the first visit to the site I want to make one call to AEM to get a JSON representation of all of this content and then cache it in the user's browser in session storage. The JavaScript is then responsible for parsing the JSON as the user clicks through the categories. The menu appears both on AEM pages and on pages outside AEM that are rendered by our e-commerce platform. The JSON might look like this:

 

{
  "categories": [
    {
      "category": {
        "id": "default",
        "content": "<p>hello</p><div><img src='bla'/></div>"
      }
    },
    {
      "category": {
        "id": "001",
        "content": "<p>Really cool content</p>"
      }
    },
    {
      "category": {
        "id": "002",
        "content": "<div>...maybe a video or something</div>"
      }
    }
  ]
}
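
As a rough sketch of how a servlet might assemble a payload like this once it has the HTML for each category, the following builds the JSON by hand with plain JDK classes. The class name `CategoryJson` and its methods are hypothetical, and the output wraps each entry in its own object (`{"category": {...}}`) so that the result is valid JSON. In practice a JSON library would likely be the safer choice; the point here is mainly that HTML must be escaped before it is embedded in a JSON string.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns category id -> HTML into the JSON shape sketched above.
final class CategoryJson {

    // Escapes a value so it can be placed between double quotes in JSON.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // remaining control characters become 4-digit hex escapes
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Builds {"categories":[{"category":{"id":...,"content":...}},...]}
    // in the iteration order of the given map.
    static String toJson(Map<String, String> htmlById) {
        StringBuilder sb = new StringBuilder("{\"categories\":[");
        boolean first = true;
        for (Map.Entry<String, String> e : htmlById.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append("{\"category\":{\"id\":\"").append(escape(e.getKey()))
              .append("\",\"content\":\"").append(escape(e.getValue()))
              .append("\"}}");
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        Map<String, String> htmlById = new LinkedHashMap<>();
        htmlById.put("default", "<p>hello</p><div><img src='bla'/></div>");
        htmlById.put("001", "<p>Really cool content</p>");
        System.out.println(toJson(htmlById));
    }
}
```

A `LinkedHashMap` is used so the default entry stays first; the browser-side JavaScript can then call `JSON.parse` on the response and look categories up by `id`.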
