Jsoup in aem | Community
AdobeID24
April 3, 2020
Solved


I want to get all the URLs/hrefs used in an AEM page, so I tried to fetch the page with jsoup, but it returns the login form instead of the page content.

 

document = Jsoup.connect("http://localhost:4502/content/we-retail/en-us.html").get();

 

This means the request is not authenticated against AEM.

I tried creating a session object, but that didn't work either.

Please suggest the best approach for this.

I need every URL/href used in an AEM page, whether it is an external link or an internal page link, available in my Java code so that I can build a report.

Best answer by Nupur_Jain

Hi @adobeid24 

 

You can use java.net.HttpURLConnection with a Basic Authorization header to get an InputStream of the page. Here is a code snippet:

 

 

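import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

InputStream content = null;
try {
    URL url = new URL("http://localhost:4502/content/we-retail/en-us.html");
    // Basic auth: Base64-encode "user:password" (encodeToString takes bytes, not a String)
    String encoding = Base64.getEncoder()
            .encodeToString("admin:admin".getBytes(StandardCharsets.UTF_8));
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setRequestMethod("GET");
    connection.setRequestProperty("Authorization", "Basic " + encoding);
    if (connection.getResponseCode() == HttpURLConnection.HTTP_OK) {
        content = connection.getInputStream();
    }
} catch (Exception e) {
    // Pass the exception as the last argument so the stack trace is logged
    LOGGER.error("Failed to fetch page", e);
}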
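Once the page HTML has been read from that stream into a String, the hrefs still have to be pulled out of it. The following is only a minimal sketch of that step, not a definitive implementation: the class name HrefExtractorSketch and its two helper methods are hypothetical, and a regular expression stands in for an HTML parser so that the example needs nothing beyond the JDK. For real pages a parser such as jsoup will be more robust than a regex.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class HrefExtractorSketch {

    // Matches href="..." or href='...' in any tag (a, link, area, ...).
    // A regex is a simplification; it does not handle unquoted attribute values.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    // Builds the value of the Authorization header for HTTP Basic auth.
    public static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Returns every href value found in the HTML, in document order.
    public static List<String> extractHrefs(String html) {
        List<String> hrefs = new ArrayList<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            hrefs.add(matcher.group(2)); // group 2 is the text between the quotes
        }
        return hrefs;
    }

    public static void main(String[] args) {
        // Sample markup made up for this sketch, not taken from a real AEM page.
        String html = "<p><a href=\"/content/we-retail/en-us/men.html\">Men</a> "
                + "<a href='https://example.com/'>External</a></p>";
        System.out.println(basicAuthHeader("admin", "admin"));
        extractHrefs(html).forEach(System.out::println);
    }
}
```

If jsoup is preferred for the parsing, the same header value can be passed along with the original request via Jsoup.connect(url).header("Authorization", value), or the stream can be handed to Jsoup.parse(InputStream, String, String); either way the admin:admin credentials are only suitable for a local development instance.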

 

 

Hope it helps!

Thanks,

Nupur

2 replies

vanegi
Adobe Employee
July 16, 2020

You can use the ACS AEM Commons Report Builder (https://adobe-consulting-services.github.io/acs-aem-commons/features/report-builder/configuring.html) to create such reports.

Or

 

Create a query that matches link properties such as "linkTo", combined with a full-text search for link markup such as "<p><a href="/path">test</a></p>", to fetch the pages that contain URLs/links.
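As a rough illustration of the query approach, the sketch below assembles QueryBuilder predicates that look for nodes carrying a "linkTo" property under a content root and renders them as a query string. The class name LinkQuerySketch, the method names, and the choice of /content/we-retail as the root are assumptions made for this example; the property name comes from the suggestion above and may differ per component, and the full-text part of the suggestion is left out here.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
public class LinkQuerySketch {

    // Predicates for "every node under rootPath that has the given property".
    public static Map<String, String> linkPropertyPredicates(String rootPath, String propertyName) {
        Map<String, String> predicates = new LinkedHashMap<>(); // keeps insertion order
        predicates.put("path", rootPath);
        predicates.put("property", propertyName);
        predicates.put("property.operation", "exists");
        predicates.put("p.limit", "-1"); // -1 asks for all results; use a real limit on large repositories
        return predicates;
    }

    // Renders the predicates as a URL-encoded query string.
    public static String toQueryString(Map<String, String> predicates) {
        return predicates.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        Map<String, String> predicates = linkPropertyPredicates("/content/we-retail", "linkTo");
        System.out.println(toQueryString(predicates));
    }
}
```

The resulting string could be appended to the QueryBuilder JSON servlet (/bin/querybuilder.json) for a quick check, or the same map could be passed to the QueryBuilder API inside an OSGi service; both are suggestions rather than something stated in the reply.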

Nupur_Jain
Adobe Employee, accepted solution
July 16, 2020

(Same reply as the best answer quoted above.)