SOLVED

How to find broken links on a page that is behind authentication

Level 4

Hi All,

We are running into an issue where some document links on one of our websites, which sits behind a user sign-in process, either appear blank or return a 404.

Once we find these documents, republishing them fixes the problem, but finding the problem documents is proving a challenge. The Link Checker in AEM does not target these document links, and I can't find a way to crawl the live pages because of the authentication.

Any suggestions for how we could crawl an authenticated website for broken links?

1 Accepted Solution

Correct answer by
Community Advisor

You can check the references of the assets with the AssetReference API from a Groovy script, or try the tool below:

https://kiransg.com/2022/03/26/broken-asset-references-aem/ 
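If you still want to check the live pages directly, here is a minimal Python sketch of a single-page link check, not a full crawler. It assumes you can copy a valid session cookie from a logged-in browser and pass it as a `Cookie` header. The names `extract_links`, `fetch_status` and `find_broken_links` are hypothetical helpers made up for this example. A link is reported when it returns a status of 400 or higher, or an empty body (the "blank" case from the question).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import urllib.error
import urllib.request


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(page_url, html):
    """Return absolute http(s) URLs for every <a href> found in html."""
    collector = LinkCollector()
    collector.feed(html)
    absolute = [urljoin(page_url, href) for href in collector.hrefs]
    # Skip mailto:, javascript: and similar non-HTTP links.
    return [url for url in absolute if urlparse(url).scheme in ("http", "https")]


def fetch_status(url, cookie_header):
    """Return (status, body_length) for url, sending the session cookie.

    Assumption: the site accepts the session as a plain Cookie header.
    """
    request = urllib.request.Request(url, headers={"Cookie": cookie_header})
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status, len(response.read())
    except urllib.error.HTTPError as error:
        return error.code, 0


def find_broken_links(page_url, html, check=fetch_status, cookie_header=""):
    """Return (url, status) for links that fail or come back empty."""
    broken = []
    for link in extract_links(page_url, html):
        status, length = check(link, cookie_header)
        if status >= 400 or length == 0:
            broken.append((link, status))
    return broken
```

Note that `urllib` follows redirects, so if the session has expired and the site redirects to a login page, the link will look healthy (status 200). It is worth confirming the cookie is still valid before trusting the results.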



Arun Patidar

3 Replies

Level 4

Thanks Arunpatidar,

I didn't realise I could run a report that way, which is great.

It has left me with more questions, however. When I compare the report to the files in author, I find some that author shows as unpublished, but the report lists them as published and they are accessible on the live site. I wonder if something has gone wrong in the author instance.

Community Advisor

Hello @RooRue,

There can be multiple reasons for a discrepancy between the status shown on author and the availability of content on publish:

1. The content was deployed to publish via packages.

2. The content was published/unpublished, but the replication queue was stuck. Someone cleared the queue, but the content wasn't published/unpublished again, so the events never reached publish.

3. Earlier, tree replication did not set the replication metadata properly. I believe it does now.
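To find the affected assets, one option is to compare what author believes with what publish actually serves. Below is a minimal Python sketch, assuming you have already exported, for each asset, its path, the value of the `cq:lastReplicationAction` property on author (`"Activate"`, `"Deactivate"`, or `None` if never replicated), and whether the asset is actually reachable on publish. The function name and the record layout are hypothetical, chosen only for this example.

```python
def find_status_mismatches(records):
    """Return the records where author status and publish reality disagree.

    records: iterable of (path, author_action, live_on_publish) tuples.
    author_action is assumed to be the cq:lastReplicationAction value
    read from author; live_on_publish is a bool from your own check.
    """
    mismatches = []
    for path, author_action, live_on_publish in records:
        # Author expects the asset to be live only after an Activate.
        expected_live = author_action == "Activate"
        if expected_live != live_on_publish:
            mismatches.append((path, author_action, live_on_publish))
    return mismatches
```

Assets flagged as `"Deactivate"` or `None` on author but live on publish would match reasons 1 to 3 above, while `"Activate"` assets that are missing on publish are the ones to republish.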

 


Aanchal Sikka