SOLVED

Jsoup in aem

I want to get all the URLs/hrefs used in an AEM page, so I tried to fetch the page with Jsoup, but it returns the login form instead:

document = Jsoup.connect("http://localhost:4502/content/we-retail/en-us.html").get();

That means the request is not logged in to AEM. I also tried creating a session object, but that did not work either.

What would be the best approach for this?

I need every URL/href used in an AEM page, whether it is an external link or an internal page link, available in my Java code so that I can build a report.

1 Accepted Solution

Correct answer by
Community Advisor

Hi @AdobeID24 

 

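You can use java.net.HttpURLConnection with a Basic Authorization header to get the InputStream of the page. Find the code snippet below:

        // Requires: java.io.IOException, java.io.InputStream, java.net.HttpURLConnection,
        // java.net.URL, java.nio.charset.StandardCharsets, java.util.Base64
        InputStream content = null;
        try {
            URL url = new URL("http://localhost:4502/content/we-retail/en-us.html");
            // encodeToString expects a byte[], so convert the credentials first
            String encoding = Base64.getEncoder()
                    .encodeToString("admin:admin".getBytes(StandardCharsets.UTF_8));
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("GET");
            // Basic auth header so AEM serves the page instead of the login form
            connection.setRequestProperty("Authorization", "Basic " + encoding);
            if (connection.getResponseCode() == HttpURLConnection.HTTP_OK) {
                content = connection.getInputStream();
            }
        } catch (IOException e) {
            LOGGER.error("Could not fetch the page", e);
        }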

 

 

Hope it helps!

Thanks,

Nupur
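
Once the stream above is available, the links still have to be pulled out of the markup. Here is a minimal, dependency-free sketch, assuming the page body has already been read into a String. The class name PageLinkExtractor and its methods are hypothetical, and the regular expression is a simplification rather than a real HTML parser. With Jsoup on the classpath, Jsoup.parse(content, "UTF-8", baseUri) followed by select("a[href]") would likely be the more robust choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of AEM or Jsoup.
public class PageLinkExtractor {

    // Matches href="..." or href='...' in any tag (a, link, area, ...).
    // Simplification: unquoted attribute values are not handled.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    /** Returns every href value found in the markup, in document order. */
    public static List<String> extractHrefs(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }

    /** Assumption: a link counts as external when it has an http(s) scheme or is protocol-relative. */
    public static boolean isExternal(String href) {
        return href.startsWith("http://")
                || href.startsWith("https://")
                || href.startsWith("//");
    }
}
```

On Java 9 or later, new String(content.readAllBytes(), StandardCharsets.UTF_8) would turn the stream into the String passed to extractHrefs, and isExternal can then split the report into internal and external links.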

2 Replies

Employee

You can use the ACS AEM Commons Report Builder to create such reports: https://adobe-consulting-services.github.io/acs-aem-commons/features/report-builder/configuring.html

Or

Create a query on properties such as "linkTo", combined with a full-text condition matching markup such as "<p><a href="/path">test</a></p>", to fetch the pages that contain URLs/links.
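
The query idea above could look roughly like the following sketch, which only builds a JCR-SQL2 statement as a String. The class name LinkQuery is hypothetical, and the property names are assumptions: "linkTo" comes from the reply, while "text" is assumed to be where the rich text markup is stored. A LIKE condition with a leading wildcard may be slow on large repositories, so treat this as a starting point rather than a tuned query.

```java
// Hypothetical helper that builds a JCR-SQL2 statement for link-bearing nodes.
public class LinkQuery {

    /**
     * Builds a statement selecting nodes under rootPath that either have a
     * "linkTo" property or whose "text" property contains an href attribute.
     */
    public static String build(String rootPath) {
        // Escape single quotes so the path is a valid JCR-SQL2 string literal.
        String escaped = rootPath.replace("'", "''");
        return "SELECT * FROM [nt:unstructured] AS n "
                + "WHERE ISDESCENDANTNODE(n, '" + escaped + "') "
                + "AND (n.[linkTo] IS NOT NULL OR n.[text] LIKE '%href=%')";
    }
}
```

Inside AEM, the resulting String could be passed to QueryManager.createQuery(statement, Query.JCR_SQL2) from the javax.jcr.query package, and the matching nodes mapped back to their containing pages.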
