SOLVED

How to find broken links on a page that is behind authentication

Level 4

Hi All,

We are running into an issue where some document links on one of our websites, which sits behind a user sign-in process, either appear blank or return a 404.

Once we find these documents, republishing them fixes the problem. But finding the problem documents is proving a challenge: the Link Checker in AEM does not target these document links, and I can't seem to find a way to crawl the live pages because of the authentication.

Any suggestions for how we could crawl an authenticated website for broken links?

1 Accepted Solution

Correct answer by
Community Advisor

You can check the references of the assets using the AssetReference API from a Groovy script, or try the tool below:

https://kiransg.com/2022/03/26/broken-asset-references-aem/ 
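
For the crawling side of the question, here is a minimal sketch (not taken from the linked tool) of checking the links on one authenticated page. It assumes the site accepts a session cookie copied from a logged-in browser and that the server answers HEAD requests; the function names `extract_links`, `fetch_status` and `find_broken_links` are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import urllib.error
import urllib.request


class LinkCollector(HTMLParser):
    """Collects the raw href value of every <a> tag, including empty ones."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    # Keep blank hrefs: blank links are part of the reported problem.
                    self.links.append(value or "")


def extract_links(html, base_url):
    """Return absolute URLs for all hrefs; a blank href stays as ""."""
    collector = LinkCollector()
    collector.feed(html)
    return [urljoin(base_url, href) if href else "" for href in collector.links]


def fetch_status(url, cookie_header):
    """HEAD the URL with the given Cookie header; return the HTTP status or None."""
    request = urllib.request.Request(
        url, headers={"Cookie": cookie_header}, method="HEAD"
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # e.g. 404 for an unpublished document
    except urllib.error.URLError:
        return None  # connection-level failure


def find_broken_links(html, base_url, status_of):
    """Return (url, reason) pairs for blank links and links with status >= 400."""
    broken = []
    for link in extract_links(html, base_url):
        if not link:
            broken.append(("", "blank"))
        elif link.startswith(("http://", "https://")):
            status = status_of(link)
            if status is None or status >= 400:
                broken.append((link, status))
    return broken
```

To run it against a live page, fetch the page HTML with the same cookie and pass `lambda url: fetch_status(url, cookie_header)` as `status_of`. The cookie name and value depend on how your sign-in is implemented, so copy the whole `Cookie` header from the browser rather than guessing it.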



Arun Patidar


3 Replies

Level 4

Thanks Arunpatidar,

I didn't realise I could run a report that way, which is great.

It has left me with more questions, however. When I compare the report to the files on author, I find some that author shows as unpublished, but the report lists them as published and they are accessible on the live site. I wonder if something has gone wrong on the author instance.

Community Advisor

Hello @RooRue,

There can be multiple reasons for a discrepancy between the status shown on author and the availability of content on publish:

1. The content was deployed to publish via packages, so author never recorded a replication.

2. The content was published or unpublished while the replication queue was stuck. Someone then cleared the queue, but the content was never published or unpublished again, so the events never reached publish.

3. In earlier versions, tree replication did not always set the replication metadata properly. I believe it does so now.
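
Whatever the cause, the mismatches can be listed by comparing what author recorded with what publish actually serves. The sketch below assumes you have already exported each asset's `cq:lastReplicationAction` value from author (`Activate`, `Deactivate`, or nothing) and have some way to test whether a path is reachable on publish; `find_status_mismatches` and `is_live` are hypothetical names, not an AEM API.

```python
def find_status_mismatches(author_actions, is_live):
    """Compare author replication status with actual availability on publish.

    author_actions: dict of path -> cq:lastReplicationAction read on author
                    ("Activate", "Deactivate", or None if never replicated).
    is_live:        callable(path) -> bool, True if publish serves the path.
    Returns (path, author_action, actually_live) for every disagreement.
    """
    mismatches = []
    for path, action in sorted(author_actions.items()):
        expected_live = action == "Activate"
        actually_live = is_live(path)
        if expected_live != actually_live:
            mismatches.append((path, action, actually_live))
    return mismatches
```

Each returned entry is a candidate for republishing or unpublishing from author so that the two sides agree again.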

 


Aanchal Sikka