Jsoup in AEM

AdobeID24

03-04-2020

I want to get all the URLs/hrefs used in an AEM page, so I tried to fetch the page through Jsoup, but it only returns the login form.

 

Document document = Jsoup.connect("http://localhost:4502/content/we-retail/en-us.html").get();

 

This means the request is not logged in to AEM.

 

I tried creating a session object, but that also didn't work.

 

Please suggest the best approach to do this.

 

I need all URLs/hrefs used in an AEM page in order to create a report.

Whether it is an external link or an internal page link, if it is in an AEM page I need it in my Java code for reporting.

Accepted Solutions (1)


Nupur_Jain

MVP

16-07-2020

Hi @AdobeID24 

 

You can use java.net.HttpURLConnection with a Basic Authorization header to get an InputStream of the page. Find the code snippet below:

 

 

        // Requires: java.io.IOException, java.io.InputStream, java.net.HttpURLConnection,
        // java.net.URL, java.nio.charset.StandardCharsets, java.util.Base64
        InputStream content = null;
        try {
            URL url = new URL("http://localhost:4502/content/we-retail/en-us.html");
            // Basic auth sends base64("user:password"); encodeToString takes bytes, not a String
            String encoding = Base64.getEncoder()
                    .encodeToString("admin:admin".getBytes(StandardCharsets.UTF_8));
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("GET");
            connection.setRequestProperty("Authorization", "Basic " + encoding);
            if (connection.getResponseCode() == HttpURLConnection.HTTP_OK) {
                content = connection.getInputStream();
            }
        } catch (IOException io) {
            LOGGER.error("IOException occurred", io);
        }
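
Once the page stream is available, the links still have to be pulled out of the HTML and split into internal and external ones for the report. The following is only a rough sketch of that step using plain JDK classes (Java 9+ for readAllBytes). The class name PageLinkReport and the method collectLinks are made up for illustration, and the regular expression is a simplistic stand-in for a real HTML parser. It picks up every href attribute, including those on link tags. With Jsoup on the classpath, Jsoup.parse(content, "UTF-8", pageUrl) followed by select("a[href]") would likely be the sturdier choice.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of AEM or Jsoup.
public class PageLinkReport {

    // Matches href="..." or href='...' on any tag; a rough substitute for an HTML parser.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    /** Maps each resolved link to "internal" or "external", in page order. */
    public static Map<String, String> collectLinks(InputStream content, String pageUrl)
            throws IOException {
        String html = new String(content.readAllBytes(), StandardCharsets.UTF_8);
        URI base = URI.create(pageUrl);
        Map<String, String> links = new LinkedHashMap<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String raw = m.group(2).trim();
            if (raw.isEmpty() || raw.startsWith("#") || raw.startsWith("javascript:")) {
                continue; // skip in-page anchors and script pseudo-links
            }
            URI resolved;
            try {
                resolved = base.resolve(raw); // turns /content/... into an absolute URL
            } catch (IllegalArgumentException e) {
                continue; // skip href values that are not valid URIs
            }
            // Assumption: "internal" means same host as the page being scanned.
            boolean internal = base.getHost() != null
                    && base.getHost().equalsIgnoreCase(resolved.getHost());
            links.put(resolved.toString(), internal ? "internal" : "external");
        }
        return links;
    }

    public static void main(String[] args) throws IOException {
        // Sample markup stands in for the stream returned by the AEM request.
        String sample = "<a href=\"/content/we-retail/en-us/men.html\">Men</a>"
                + "<a href='https://www.adobe.com/'>Adobe</a>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        collectLinks(in, "http://localhost:4502/content/we-retail/en-us.html")
                .forEach((url, kind) -> System.out.println(kind + "  " + url));
    }
}
```

In the snippet above, the content stream (when not null) could be passed straight to collectLinks together with the page URL, and the resulting map written out as the report.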

 

 

Hope it helps!

Thanks,

Nupur

Answers (1)


vanegi

Employee

16-07-2020

You can leverage the ACS AEM Commons Report Builder, https://adobe-consulting-services.github.io/acs-aem-commons/features/report-builder/configuring.html, to create such reports.

Or

 

Create a query on properties like "linkTo", combined with a full-text condition matching markup such as "<p><a href="/path">test</a></p>", to fetch pages containing URLs/links.