Create a report of broken links | Community
Level 2
August 14, 2020


Hi,

I need to create a report of broken links within the site. Please suggest how to extract the links from the page content and how to check programmatically whether each link is valid.

 

Thanks in advance.

2 replies

Ravi_Pampana
Community Advisor
August 14, 2020

Hi,

 

Using jsoup, you can parse the HTML and extract the links. Once the links are retrieved, you can check whether each one is valid.

 

https://jsoup.org/cookbook/extracting-data/attributes-text-html

https://www.geeksforgeeks.org/check-if-url-is-valid-or-not-in-java/

 

Hope this helps!
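
For the "valid or not" half of this, here is a minimal sketch using only the JDK. The class and method names (`LinkCheck`, `isWellFormedHttpUrl`, `isReachable`) are made up for illustration, and treating any status below 400 as "not broken" is an assumption you may want to tighten. The first method only checks syntax; the second issues a HEAD request, so it needs network access and some servers may not answer HEAD.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of AEM or jsoup.
final class LinkCheck {

    /** True if the string parses as an absolute http(s) URI with a host. Syntax only, no network. */
    static boolean isWellFormedHttpUrl(String candidate) {
        if (candidate == null || candidate.trim().isEmpty()) {
            return false;
        }
        try {
            URI uri = new URI(candidate.trim());
            String scheme = uri.getScheme();
            boolean http = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            return http && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** True if a HEAD request returns a 2xx or 3xx status. Assumption: anything else counts as broken. */
    static boolean isReachable(String url) {
        if (!isWellFormedHttpUrl(url)) {
            return false;
        }
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(url.trim()).toURL().openConnection();
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            int status = conn.getResponseCode();
            return status >= 200 && status < 400;
        } catch (IOException e) {
            // Timeouts, DNS failures, and refused connections are all reported as unreachable.
            return false;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormedHttpUrl("https://jsoup.org/cookbook/")); // true
        System.out.println(isWellFormedHttpUrl("htp:/broken link"));            // false
    }
}
```

Note that a well-formed URL can still be a dead link, so a report would normally run the syntax check first and the HTTP check second.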

 

Level 2
August 17, 2020

Hi Ravi,

 

Thanks for your suggestion. What I actually need is to get the HTML content of an internal page in my servlet/service, so that I can extract the href values present in it. Can you help me read the content of a page in AEM?

 

Thanks 

Himasree

sunjot16
Adobe Employee
August 14, 2020

You can write a Groovy script that crawls your /content/<site> tree looking for strings that start with /content. Then use ResourceResolver to verify whether those paths exist.

 

The following links may be helpful:

a) Sample Groovy Script => https://gist.github.com/trekawek/72b3515a6641ca5f4b29

b) ResourceResolver API => https://helpx.adobe.com/experience-manager/6-4/sites/developing/using/reference-materials/javadoc/org/apache/sling/api/resource/ResourceResolver.html

c) Community Article => https://experienceleaguecommunities.adobe.com/t5/adobe-experience-manager/broken-link-scan/qaq-p/220892

 

I hope it helps. 🙂
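
The same idea can be sketched in plain Java. This is only an illustration under some assumptions: `ContentPathScan` and `findBrokenPaths` are hypothetical names, the property values are passed in as strings you have already collected from the repository, and the existence check is a `Predicate` so the sketch runs outside AEM. Inside AEM, the predicate would typically be `path -> resourceResolver.getResource(path) != null`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical helper: filters property values that look like internal paths but do not resolve.
final class ContentPathScan {

    static List<String> findBrokenPaths(Iterable<String> propertyValues, Predicate<String> pathExists) {
        List<String> broken = new ArrayList<>();
        for (String value : propertyValues) {
            if (value == null || !value.startsWith("/content/")) {
                continue; // not an internal path reference
            }
            // Drop any query string or fragment, then the .html extension, to get the repository path.
            String path = value.split("[?#]", 2)[0];
            if (path.endsWith(".html")) {
                path = path.substring(0, path.length() - ".html".length());
            }
            if (!pathExists.test(path)) {
                broken.add(value);
            }
        }
        return broken;
    }

    public static void main(String[] args) {
        // Stand-in for the repository; in AEM this lookup would go through ResourceResolver.
        Set<String> existing = new HashSet<>(Arrays.asList("/content/site/en", "/content/site/en/about"));
        List<String> values = Arrays.asList(
                "/content/site/en/about.html", "/content/site/en/missing", "Some title");
        System.out.println(findBrokenPaths(values, existing::contains)); // [/content/site/en/missing]
    }
}
```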

Level 2
August 17, 2020

Hi Sunjot,

I have no experience with Groovy; my requirement has to be implemented in Java using a servlet/service.

Please suggest a way to get the content of an internal page, read the hrefs present in it, and check whether those links are valid, using Java.

 

Thanks

Himasree

sunjot16
Adobe Employee
August 17, 2020

Thank you for clarifying it. 🙂

 

You can use any HTML parser library (e.g., the jsoup HTML parser) to do that. Include the dependency in your pom.xml file, then use it to read the HTML content, or just the links, of any internal page.

 

Sample reference code can be found here:

https://mkyong.com/java/java-how-to-get-all-links-from-a-web-page/

 

You can include similar code in your servlet to achieve your use case.

 

I hope it helps !! 🙂
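
To give a rough idea of the link-extraction step, here is a dependency-free sketch. Note that it swaps jsoup for a regular expression purely so it runs without extra libraries; a regex is fragile on real-world HTML, and with jsoup the equivalent would be `Jsoup.parse(html, baseUri).select("a[href]")` followed by `attr("abs:href")` on each element. The names `HrefExtractor`, `extractHrefs`, and `toAbsolute`, and the example.com URLs, are hypothetical.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a regex stand-in for an HTML parser such as jsoup.
final class HrefExtractor {

    // Matches the quoted href value of <a ...> tags; unquoted attribute values are not handled.
    private static final Pattern ANCHOR_HREF = Pattern.compile(
            "<a\\b[^>]*?\\shref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the raw href values of all anchor tags, in document order. */
    static List<String> extractHrefs(String html) {
        List<String> hrefs = new ArrayList<>();
        Matcher m = ANCHOR_HREF.matcher(html);
        while (m.find()) {
            hrefs.add(m.group(2).trim());
        }
        return hrefs;
    }

    /** Resolves an href against the URL of the page it was found on; null if the href is malformed. */
    static String toAbsolute(String pageUrl, String href) {
        try {
            return URI.create(pageUrl).resolve(href).toString();
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String html = "<a href=\"/content/site/fr.html\">FR</a> <a name=\"top\">no link</a>";
        for (String href : extractHrefs(html)) {
            System.out.println(toAbsolute("https://www.example.com/content/site/en.html", href));
        }
    }
}
```

In a servlet, the HTML string would come from whatever mechanism you use to render or fetch the internal page, and each resolved link would then be passed to your validity check.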