Create a report of broken links

Level 2

Hi,

I need to create a report of broken links on the site. Please suggest how to get the links from the page content and how to check programmatically whether each link is valid.

 

Thanks in advance.

5 Replies

Community Advisor

Hi,

 

Using Jsoup, you can parse the HTML and extract the links. Once the links are retrieved, you can check whether each one is valid.

 

https://jsoup.org/cookbook/extracting-data/attributes-text-html

https://www.geeksforgeeks.org/check-if-url-is-valid-or-not-in-java/
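
For reference, here is a minimal sketch of the "check whether the link is valid" step, using only the JDK. The class and method names (LinkChecker, isWellFormedHttpUrl, isReachable) are made up for illustration and do not come from Jsoup or AEM. It assumes that "valid" means two things: the link is a syntactically correct absolute http(s) URL, and the server answers a HEAD request with a status below 400.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, shown only as a sketch of the idea.
final class LinkChecker {

    /** True when the text parses as an absolute http(s) URL with a host. */
    static boolean isWellFormedHttpUrl(String link) {
        try {
            URI uri = new URI(link);
            String scheme = uri.getScheme();
            return uri.getHost() != null
                    && ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme));
        } catch (URISyntaxException e) {
            return false; // e.g. unescaped spaces or other illegal characters
        }
    }

    /**
     * Sends a HEAD request and treats any 2xx/3xx status as "not broken".
     * Assumption: some servers reject HEAD, so a GET fallback may be needed.
     */
    static boolean isReachable(String link, int timeoutMillis) {
        if (!isWellFormedHttpUrl(link)) {
            return false;
        }
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(link).toURL().openConnection();
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            int status = conn.getResponseCode();
            return status >= 200 && status < 400;
        } catch (IOException e) {
            return false; // DNS failure, timeout, refused connection, ...
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }
}
```

A report could call isReachable(link, 5000) for every extracted link and list the ones that return false.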

 

Hope this helps!

 

Level 2

Hi Ravi,

 

Thanks for your suggestion. What I actually need is to get the HTML content of an internal page in my servlet/service so that I can read the href values in it. Can you help me read the content of a page in AEM?

 

Thanks 

Himasree

Employee

You can write a Groovy script that crawls your /content/<site> tree looking for strings that start with /content. Then use ResourceResolver to verify whether those paths exist.

 

The following links may be helpful:

a) Sample Groovy Script => https://gist.github.com/trekawek/72b3515a6641ca5f4b29

b) ResourceResolver API => https://helpx.adobe.com/experience-manager/6-4/sites/developing/using/reference-materials/javadoc/or...

c) Community Article => https://experienceleaguecommunities.adobe.com/t5/adobe-experience-manager/broken-link-scan/qaq-p/220...
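
The same idea can be sketched in plain Java. This is not the linked Groovy script, only an illustration under a few assumptions: the class InternalLinkReport and its findBroken method are hypothetical names, the existence check is passed in as a Predicate so the sketch runs without AEM, and a trailing .html is stripped on the assumption that page URLs map to repository paths that way.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, shown only as a sketch of the idea.
final class InternalLinkReport {

    // Captures quoted attribute values that begin with /content/,
    // stopping at the closing quote, a fragment (#), a query (?) or whitespace.
    private static final Pattern CONTENT_PATH =
            Pattern.compile("[\"'](/content/[^\"'#?\\s]*)");

    /**
     * Returns every /content path in the markup for which the supplied
     * existence check answers false, in order of first appearance.
     */
    static Set<String> findBroken(String html, Predicate<String> exists) {
        Set<String> broken = new LinkedHashSet<>();
        Matcher m = CONTENT_PATH.matcher(html);
        while (m.find()) {
            String path = m.group(1);
            // Assumption: page links carry a .html extension that is not
            // part of the repository path; asset paths are left untouched.
            if (path.endsWith(".html")) {
                path = path.substring(0, path.length() - ".html".length());
            }
            if (!exists.test(path)) {
                broken.add(path);
            }
        }
        return broken;
    }
}
```

Inside AEM, the predicate could be something like path -> resourceResolver.getResource(path) != null, since getResource returns null when nothing exists at the given path.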

 

I hope it helps.

Level 2

Hi Sunjot,

I have no experience with Groovy, and my requirement has to be implemented in Java using a servlet/service.

Please suggest a way, using Java, to get the content of an internal page, read the hrefs in it, and check whether those links are valid.

 

Thanks

Himasree

Employee

Thank you for clarifying it.

 

You can use any HTML parser library (e.g., the Jsoup HTML parser) to do that. Add the dependency to your pom.xml file and then use it to read the HTML content, or just the links, of any internal page.

 

Sample reference code can be found here:

https://mkyong.com/java/java-how-to-get-all-links-from-a-web-page/

 

You can include similar code in your servlet to achieve your use case.
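
To illustrate the extraction step, here is a dependency-free sketch. It swaps in a regular expression where Jsoup would normally be used, so treat it as a stand-in rather than the recommended approach: with Jsoup, the a[href] selector and the abs:href attribute prefix do the same job and cope with malformed markup far better. The class name HrefExtractor, the method extractLinks, and the localhost:4502 base URL in the usage note are assumptions for illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, shown only as a sketch of the idea.
final class HrefExtractor {

    // Matches href="..." or href='...' inside an <a ...> tag.
    // A regex is a simplification; it does not understand HTML structure.
    private static final Pattern ANCHOR_HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /**
     * Returns the href of every anchor in the markup, resolved against
     * the page's own URL so that relative links become absolute.
     */
    static List<String> extractLinks(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        List<String> links = new ArrayList<>();
        Matcher m = ANCHOR_HREF.matcher(html);
        while (m.find()) {
            String href = m.group(2).trim();
            if (href.isEmpty() || href.startsWith("#")) {
                continue; // skip empty values and same-page fragments
            }
            try {
                links.add(base.resolve(href).toString());
            } catch (IllegalArgumentException e) {
                links.add(href); // keep unparsable values so the report can flag them
            }
        }
        return links;
    }
}
```

For example, extractLinks(html, "http://localhost:4502/content/site/en.html") turns a relative href such as /content/site/en/about.html into http://localhost:4502/content/site/en/about.html, which can then be passed to whatever validity check you use.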

 

I hope it helps!