Create a report of broken links

himasreep445197
Level 2

14-08-2020

Hi,

I need to create a report of broken links across the site. Please suggest how to get the links from the page content and how to check programmatically whether each link is valid.

 

Thanks in advance.

Replies

Ravi_Pampana
MVP

14-08-2020

Hi,

 

Using Jsoup, you can parse the HTML and get the links. Once the links are retrieved, you can check whether each one is valid.

 

https://jsoup.org/cookbook/extracting-data/attributes-text-html

https://www.geeksforgeeks.org/check-if-url-is-valid-or-not-in-java/
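
As a rough illustration of the validity check, here is a minimal Java sketch that uses only the standard library. The class name `LinkValidator` and both method names are hypothetical. `isWellFormed` only checks syntax (an absolute http or https URL with a host) and never touches the network; `isReachable` additionally sends a HEAD request and treats any 2xx or 3xx status as valid, which is an assumption you may want to tighten for your report.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URISyntaxException;

public class LinkValidator {

    /** Syntax check only: absolute http(s) URL with a host. No network access. */
    public static boolean isWellFormed(String link) {
        if (link == null || link.trim().isEmpty()) {
            return false;
        }
        try {
            URI uri = new URI(link.trim());
            String scheme = uri.getScheme();
            boolean http = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            return http && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Network check: sends a HEAD request and accepts 2xx/3xx responses (assumed policy). */
    public static boolean isReachable(String link, int timeoutMillis) {
        if (!isWellFormed(link)) {
            return false;
        }
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) new URI(link.trim()).toURL().openConnection();
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            int status = conn.getResponseCode();
            conn.disconnect();
            return status >= 200 && status < 400;
        } catch (IOException | URISyntaxException | IllegalArgumentException e) {
            // Unresolvable host, timeout, malformed URL: report the link as broken.
            return false;
        }
    }
}
```

Some servers reject HEAD requests, so a fallback to GET may be needed before a link is reported as broken.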

 

Hope this helps!

 

sunjot16
Employee

14-08-2020

You can write a Groovy script that crawls your /content/<site> tree looking for strings that start with /content. Then use ResourceResolver to verify whether those paths exist.

 

The following links may be helpful:

a) Sample Groovy Script => https://gist.github.com/trekawek/72b3515a6641ca5f4b29

b) ResourceResolver API => https://helpx.adobe.com/experience-manager/6-4/sites/developing/using/reference-materials/javadoc/or...

c) Community Article => https://experienceleaguecommunities.adobe.com/t5/adobe-experience-manager/broken-link-scan/qaq-p/220...
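
The same idea can be sketched in plain Java if Groovy is not an option. This is only an illustration under stated assumptions: the class `ContentPathScanner` and its methods are hypothetical, the regular expression is a simplification of "strings that start with /content", and the existence check is passed in as a `Predicate` so the sketch runs without AEM. Inside AEM, the predicate could be `path -> resourceResolver.getResource(path) != null`, since `getResource` returns null for a path that does not resolve.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContentPathScanner {

    // A /content/... path that is not the tail of a longer path such as /apps/content/...
    private static final Pattern CONTENT_PATH =
            Pattern.compile("(?<![\\w\\-./])/content/[\\w\\-./]+");

    /** Collects distinct /content paths found in the text, with a trailing .html removed. */
    public static Set<String> findContentPaths(String text) {
        Set<String> paths = new LinkedHashSet<>();
        Matcher m = CONTENT_PATH.matcher(text);
        while (m.find()) {
            String path = m.group();
            if (path.endsWith(".html")) {
                path = path.substring(0, path.length() - ".html".length());
            }
            paths.add(path);
        }
        return paths;
    }

    /** Returns the paths for which the supplied existence check fails. */
    public static List<String> findBrokenPaths(String text, Predicate<String> exists) {
        List<String> broken = new ArrayList<>();
        for (String path : findContentPaths(text)) {
            if (!exists.test(path)) {
                broken.add(path);
            }
        }
        return broken;
    }
}
```

Keeping the existence check as a parameter also makes the scan easy to unit test with a fixed set of known paths.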

 

I hope it helps. 🙂

himasreep445197
Level 2

17-08-2020

Hi Ravi,

 

Thanks for your suggestion. I actually need to get the HTML content of an internal page in my servlet/service so that I can read the href values in it. Can you help me read the content of a page in AEM?

 

Thanks 

Himasree

himasreep445197
Level 2

17-08-2020

Hi Sunjot,

I have no experience with Groovy; my requirement has to be implemented in Java using a servlet/service.

Please suggest a way to get the content of an internal page, read the hrefs in it, and check whether those links are valid, all in Java.

 

Thanks

Himasree

sunjot16
Employee

17-08-2020

Thank you for clarifying it. 🙂

 

You can use any HTML parser library (e.g. the Jsoup HTML parser) to do that. Add the dependency to your pom.xml file, then use it to read the HTML content, or just the links, of any internal page.

 

Sample Reference code can be found here:

https://mkyong.com/java/java-how-to-get-all-links-from-a-web-page/

 

You can include similar code in your servlet to achieve your use case.
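
To show the shape of the link-extraction step without any external dependency, here is a small sketch that pulls href values out of anchor tags with a regular expression and resolves them against the page URL. It is a stand-in for a real parser, not a replacement: with Jsoup you would normally parse the document and use `select("a[href]")` with `attr("abs:href")` instead. The class `HrefExtractor`, the method name, and the `localhost:4502` base URL in the usage note are hypothetical.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefExtractor {

    // Captures the quoted href value of <a ...> tags (single or double quotes).
    private static final Pattern ANCHOR_HREF = Pattern.compile(
            "<a\\b[^>]*?\\shref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns absolute link targets found in the HTML, in document order. */
    public static List<String> extractLinks(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        List<String> links = new ArrayList<>();
        Matcher m = ANCHOR_HREF.matcher(html);
        while (m.find()) {
            String href = m.group(2).trim();
            // Skip values that are not navigable page links.
            if (href.isEmpty() || href.startsWith("#")
                    || href.startsWith("javascript:") || href.startsWith("mailto:")) {
                continue;
            }
            try {
                links.add(base.resolve(href).toString());
            } catch (IllegalArgumentException e) {
                // Keep the raw value so it can still be reported as broken.
                links.add(href);
            }
        }
        return links;
    }
}
```

For example, calling `extractLinks(html, "http://localhost:4502/content/site/en/home.html")` turns a relative href such as `news/today.html` into an absolute URL under `/content/site/en/`, which can then be passed to whatever validity check you use.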

 

I hope it helps !! 🙂