SOLVED

Jsoup in AEM

AdobeID24
Level 6

I want to get all the URLs/hrefs used in an AEM page, so I thought I would fetch the page with Jsoup, but it only returns the login form:

        Document document = Jsoup.connect("http://localhost:4502/content/we-retail/en-us.html").get();

It seems the request is not logged in to AEM.

I tried to create a session object, but that didn't work either.

What would be the best approach to do this?

I need all URLs/hrefs used in an AEM page to create a report. Whether it is an external link or an internal page link, if it is in an AEM page I need it in my Java code for reporting.

1 Accepted Solution
Nupur_Jain
Correct answer by
Community Advisor

Hi @AdobeID24 

 

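You can use java.net.HttpURLConnection to get an InputStream of the page, sending the AEM credentials in an HTTP Basic Authorization header. Find the code snippet below:

        import java.io.IOException;
        import java.io.InputStream;
        import java.net.HttpURLConnection;
        import java.net.URL;
        import java.nio.charset.StandardCharsets;
        import java.util.Base64;

        InputStream content = null;
        try {
            URL url = new URL("http://localhost:4502/content/we-retail/en-us.html");
            // Base64.Encoder.encodeToString expects a byte[], not a String
            String encoding = Base64.getEncoder()
                    .encodeToString("admin:admin".getBytes(StandardCharsets.UTF_8));
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("GET");
            // Basic auth header so AEM serves the page instead of the login form
            connection.setRequestProperty("Authorization", "Basic " + encoding);
            if (connection.getResponseCode() == HttpURLConnection.HTTP_OK) {
                // The caller is responsible for closing this stream after reading it
                content = connection.getInputStream();
            }
        } catch (IOException e) {
            LOGGER.error("Could not fetch the page", e);
        }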

Hope it helps!

Thanks,

Nupur
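The snippet above stops at the InputStream. For the report itself, a minimal JDK-only sketch of the next step might read that stream and collect every href, resolving relative paths against the page URL. It assumes UTF-8 HTML and uses a simple regular expression rather than a real HTML parser, so unusual markup (unquoted attributes, links built by JavaScript) will be missed. The class and method names (PageLinkExtractor, extractLinks, isInternal) are hypothetical, and "internal" is assumed to mean "same host as the page".

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a regex-based sketch, not a full HTML parser.
public class PageLinkExtractor {

    // Matches href="..." or href='...' in any tag (a, link, area, ...).
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    /** Reads the whole stream (e.g. from connection.getInputStream()) as UTF-8 text (assumed encoding). */
    public static String readAll(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    /** Returns every distinct href in document order, resolved against the page URL. */
    public static List<String> extractLinks(String html, URI pageUri) {
        Set<String> links = new LinkedHashSet<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String raw = m.group(1).trim();
            if (raw.isEmpty() || raw.startsWith("#") || raw.startsWith("javascript:")) {
                continue; // skip in-page anchors and script pseudo-links
            }
            try {
                links.add(pageUri.resolve(raw).toString());
            } catch (IllegalArgumentException e) {
                links.add(raw); // keep malformed values as-is so they still show up in the report
            }
        }
        return new ArrayList<>(links);
    }

    /** Assumed rule: a link is internal when it points at the same host as the page. */
    public static boolean isInternal(String link, URI pageUri) {
        try {
            String host = URI.create(link).getHost();
            return host != null && host.equalsIgnoreCase(pageUri.getHost());
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        URI page = URI.create("http://localhost:4502/content/we-retail/en-us.html");
        // Stand-in for the stream returned by connection.getInputStream()
        String sample = "<p><a href=\"/content/we-retail/en-us/men.html\">Men</a></p>"
                + "<a href='https://www.adobe.com/'>Adobe</a><a href=\"#top\">Top</a>";
        InputStream content = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));

        for (String link : extractLinks(readAll(content), page)) {
            System.out.println((isInternal(link, page) ? "internal " : "external ") + link);
        }
    }
}
```

Run as-is, main prints each sample link prefixed with internal or external; the #top anchor is skipped. If Jsoup is already on the classpath, Jsoup.parse(content, "UTF-8", baseUri) followed by select("a[href]") and attr("abs:href") on each element is a more robust replacement for the regex.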


2 Replies
vanegi
Employee

You can use the ACS AEM Commons Report Builder (https://adobe-consulting-services.github.io/acs-aem-commons/features/report-builder/configuring.html) to create such reports.

Or

Create a query on properties such as "linkTo", plus a full-text condition matching rich-text markup like "<p><a href="/path">test</a></p>", to find the pages that contain URLs/links.
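One way to express the query suggestion above is as Query Builder predicates sent to the /bin/querybuilder.json servlet (with the same Basic auth header as the accepted answer). The sketch below only builds the request URL: it ORs an exists check on linkTo with a full-text match on "href" inside a text property. The class name LinkQuerySketch, the root path, the @text property name and the "href" search term are assumptions to adjust for your content. The hits are component nodes, so the containing page is the part of each path before /jcr:content.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper that builds a Query Builder request URL; it does not execute the query.
public class LinkQuerySketch {

    /** Predicates: nodes under root that have a linkTo property OR rich text mentioning href. */
    public static Map<String, String> predicates(String root) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("path", root);
        p.put("type", "nt:unstructured");
        p.put("group.p.or", "true");                 // either condition may match
        p.put("group.1_property", "linkTo");
        p.put("group.1_property.operation", "exists");
        p.put("group.2_fulltext", "href");           // assumed: links stored as <a href=...> in rich text
        p.put("group.2_fulltext.relPath", "@text");  // assumed property name of the text component
        p.put("p.limit", "-1");                      // return all hits instead of the default page size
        return p;
    }

    /** URL-encodes the predicates as key=value pairs joined by '&'. */
    public static String toQueryString(Map<String, String> predicates) {
        return predicates.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        String url = "http://localhost:4502/bin/querybuilder.json?"
                + toQueryString(predicates("/content/we-retail"));
        System.out.println(url);
    }
}
```

The printed URL can be fetched with the HttpURLConnection snippet from the accepted answer; the JSON response lists the matching node paths under "hits".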
