Create a report of broken links

himasreep445197

14-08-2020

Hi,

I need to create a report of broken links within the site. Please suggest how to extract the links from the page content and how to check programmatically whether each link is valid.

Thanks in advance.

Replies

sunjot16

Employee

14-08-2020

You can write a Groovy script that crawls your /content/<site> tree looking for strings that start with /content. Then use ResourceResolver to verify whether those paths exist.
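
As a rough illustration of the approach above, here is a minimal Java sketch (Java rather than Groovy, and not taken from the linked gist). It pulls strings that start with /content out of a piece of text and reports the ones that fail an existence check. The class ContentPathChecker, its method names, and the sample paths are hypothetical. The exists predicate is a stand-in for a real lookup, which in AEM could be path -> resourceResolver.getResource(path) != null, since getResource returns null when nothing exists at that path.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ContentPathChecker {

    // A string that starts with /content/ and runs until a quote, whitespace, or angle bracket.
    private static final Pattern CONTENT_PATH = Pattern.compile("/content/[^\"'\\s<>]+");

    /** Collects the distinct /content paths found in a property value or markup fragment. */
    static Set<String> extractContentPaths(String text) {
        Set<String> paths = new LinkedHashSet<>();
        Matcher matcher = CONTENT_PATH.matcher(text);
        while (matcher.find()) {
            paths.add(toRepositoryPath(matcher.group()));
        }
        return paths;
    }

    /** Drops the query string, fragment, and a trailing .html so the value looks like a repository path. */
    static String toRepositoryPath(String link) {
        String path = link.split("[?#]", 2)[0];
        return path.endsWith(".html") ? path.substring(0, path.length() - ".html".length()) : path;
    }

    /** Returns the paths for which the existence check fails. */
    static List<String> findBroken(Set<String> paths, Predicate<String> exists) {
        List<String> broken = new ArrayList<>();
        for (String path : paths) {
            if (!exists.test(path)) {
                broken.add(path);
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        String text = "<a href=\"/content/site/en/home.html\">Home</a>"
                + "<a href=\"/content/site/en/old-page.html#top\">Old</a>";
        // Assumption: this set stands in for the repository. In AEM the check would be
        // path -> resourceResolver.getResource(path) != null
        Set<String> existing = Set.of("/content/site/en/home");
        System.out.println(findBroken(extractContentPaths(text), existing::contains));
    }
}
```

The regular expression is deliberately simple and will also match /content/ inside an external URL, so treat it as a starting point rather than a complete crawler.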

The following links may be helpful:

a) Sample Groovy Script => https://gist.github.com/trekawek/72b3515a6641ca5f4b29

b) ResourceResolver API => https://helpx.adobe.com/experience-manager/6-4/sites/developing/using/reference-materials/javadoc/or...

c) Community Article => https://experienceleaguecommunities.adobe.com/t5/adobe-experience-manager/broken-link-scan/qaq-p/220...

I hope it helps. 🙂

himasreep445197

17-08-2020

Hi Ravi,

Thanks for your suggestion. What I actually need is to get the HTML content of an internal page in my servlet/service, so that I can get the hrefs present in it. Can you help me read the content of a page in AEM?

Thanks 

Himasree

himasreep445197

17-08-2020

Hi Sunjot,

I have no experience with Groovy; my requirement has to be implemented in Java using a servlet/service.

Please suggest a way to get the content of an internal page, read the hrefs present in it, and check whether those links are valid, using Java.

Thanks

Himasree

sunjot16

Employee

17-08-2020

Thank you for clarifying it. 🙂

You can use any HTML parser library (e.g., the JSoup HTML parser) to do that. Include the dependency in your pom.xml file and then use it to read the HTML content, or just the links, of any internal page.

Sample reference code can be found here:

https://mkyong.com/java/java-how-to-get-all-links-from-a-web-page/

You can include similar code in your servlet to achieve your use case.
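
To give an idea of the overall flow, here is a minimal sketch that uses only the JDK, so it runs without extra dependencies. Note that it swaps JSoup for a simple regular expression. With JSoup you would typically replace the extraction step with something like Jsoup.parse(html, pageUrl).select("a[href]") and read link.attr("abs:href"). The class HrefLinkChecker, its method names, and the localhost URL are hypothetical, and the HEAD-request check is only one possible definition of "broken".

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HrefLinkChecker {

    // Captures the value of href="..." or href='...' (case-insensitive).
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    /** Returns the hrefs in the markup as absolute URLs, resolved against the page URL. */
    static List<String> extractLinks(String html, String pageUrl) {
        URI base = URI.create(pageUrl);
        List<String> links = new ArrayList<>();
        Matcher matcher = HREF.matcher(html);
        while (matcher.find()) {
            String href = matcher.group(2).trim();
            // Skip empty values, in-page anchors, and non-HTTP schemes.
            if (href.isEmpty() || href.startsWith("#")
                    || href.startsWith("mailto:") || href.startsWith("javascript:")) {
                continue;
            }
            try {
                links.add(base.resolve(href).toString());
            } catch (IllegalArgumentException e) {
                // Keep syntactically invalid values so they still show up in the report.
                links.add(href);
            }
        }
        return links;
    }

    /** Sends a HEAD request; a status of 400 or above, or a failed connection, counts as broken. */
    static boolean isBroken(String url) {
        try {
            HttpURLConnection connection =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            connection.setRequestMethod("HEAD");
            connection.setConnectTimeout(5000);
            connection.setReadTimeout(5000);
            int status = connection.getResponseCode();
            connection.disconnect();
            return status >= 400;
        } catch (IOException | IllegalArgumentException | ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Assumption: the markup would normally come from fetching the page; it is inlined here.
        String html = "<a href=\"/content/site/en/about.html\">About</a>"
                + "<a href='news.html'>News</a><a href=\"#top\">Top</a>";
        for (String link : extractLinks(html, "http://localhost:4502/content/site/en/home.html")) {
            System.out.println(link);
        }
    }
}
```

In a servlet you would call isBroken for each extracted link and write the failures to the report. Keep in mind that pages on an instance that requires login may answer an unauthenticated request with 401, so the request may need credentials before the result is meaningful.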

I hope it helps! 🙂