How to find broken links on a page that is behind authentication



Hi All,

We are running into an issue where some of the document links on one of our websites, which sits behind a user sign-in process, either render blank or return a 404.

Once we find these documents, fixing them is simply a matter of republishing them, but finding the problem documents is proving a challenge. The Link Checker in AEM does not cover these document links, and I can't find a way to crawl the live pages because of the authentication.

Any suggestions for how we could crawl an authenticated website for broken links?
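One common approach is to crawl the live site while sending the session cookie of a logged-in user, and flag any link that returns an error status or an empty body. The Python sketch below (standard library only) illustrates that idea. It is not an AEM feature: it assumes the site accepts a session cookie copied from a logged-in browser, and the example URL and cookie name (`login-token`) are hypothetical placeholders. It only follows links on the start host, so the cookie is never sent to third-party sites.

```python
# Minimal authenticated broken-link crawler (sketch, not an AEM feature).
# Assumption: the site accepts a session cookie copied from a logged-in
# browser; the URL and cookie name in the example at the bottom are hypothetical.
from collections import deque
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collects every href/src attribute value (links, images, iframes, ...)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def extract_links(base_url, html):
    """Return absolute http(s) URLs found in html, with #fragments removed."""
    parser = LinkExtractor()
    parser.feed(html)
    urls = []
    for link in parser.links:
        absolute, _ = urldefrag(urljoin(base_url, link))
        if urlparse(absolute).scheme in ("http", "https"):
            urls.append(absolute)
    return urls


def is_broken(status, body_length):
    """Flag error responses, failed requests (status 0) and empty bodies."""
    return status == 0 or status >= 400 or body_length == 0


def fetch(url, cookie):
    """GET url with the session cookie; return (status, content_type, body)."""
    request = Request(url, headers={"Cookie": cookie})
    try:
        with urlopen(request, timeout=15) as response:
            return response.status, response.headers.get_content_type(), response.read()
    except HTTPError as err:
        return err.code, "", b""
    except URLError:
        return 0, "", b""


def crawl(start_url, cookie, max_requests=500):
    """Breadth-first crawl of one host; returns a list of (url, status) pairs."""
    host = urlparse(start_url).netloc
    queue, seen, broken = deque([start_url]), {start_url}, []
    requests_made = 0
    while queue and requests_made < max_requests:
        url = queue.popleft()
        status, content_type, body = fetch(url, cookie)
        requests_made += 1
        if is_broken(status, len(body)):
            broken.append((url, status))
            continue
        # Only HTML pages are parsed for further links; documents such as
        # PDFs are fetched once to check them and never parsed.
        if content_type == "text/html":
            for link in extract_links(url, body.decode("utf-8", "replace")):
                # Stay on the same host so the cookie never leaves the site.
                if urlparse(link).netloc == host and link not in seen:
                    seen.add(link)
                    queue.append(link)
    return broken


# Example (hypothetical URL and cookie name):
# for bad_url, code in crawl("https://www.example.com/members/", "login-token=..."):
#     print(code, bad_url)
```

One caveat with this design: `urlopen` follows redirects, so if the session expires and the site redirects to its login page, broken links can come back as 200. Checking `response.url` against the requested URL would be a reasonable extension.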

3 Replies



You can check the references of the assets with the AssetReference API from a Groovy script, or try the tool below.



Thanks Arunpatidar,

I didn't realise I could run a report that way, which is great.

It has left me with more questions, however. When I compare the report with the files in author, I find some that author marks as unpublished, yet the report lists them as published and they are accessible on the live site. I wonder if something has gone wrong in the author instance 😕



Hello @RooRue:


There can be multiple reasons for a discrepancy between the status shown on author and the availability of content on publish:

1. The content was deployed to publish via packages.

2. The content was published/unpublished, but the replication queue was stuck. Someone cleared the queue without publishing/unpublishing the content again, so the events never reached publish.

3. In the past, tree replication didn't set the replication metadata properly. I believe it does so correctly now.
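All three causes surface the same way: author and publish disagree about an asset. As a rough triage aid, the Python sketch below assumes you have already collected two things per asset path: the last replication action recorded on author (AEM typically stores this as `cq:lastReplicationAction`, e.g. `Activate` or `Deactivate`) and the HTTP status the live site returns for that path (for example from a crawl). The class, function, label names and paths are hypothetical, chosen for illustration.

```python
# Triage author-vs-publish mismatches (sketch). Assumes the author-side
# replication action and the live-site HTTP status were collected beforehand.
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AssetStatus:
    path: str
    author_action: Optional[str]  # e.g. "Activate", "Deactivate", or None if never replicated
    publish_status: int           # HTTP status observed on the live site


def classify(item: AssetStatus) -> str:
    """Label one asset; only the two mismatch cases need follow-up."""
    published_on_author = item.author_action == "Activate"
    served_on_publish = 200 <= item.publish_status < 300
    if published_on_author and not served_on_publish:
        # Author thinks it is live but publish lacks it,
        # e.g. the activation sat in a queue that was cleared.
        return "REPUBLISH"
    if served_on_publish and not published_on_author:
        # Publish serves content author does not show as activated,
        # e.g. a package install, a lost deactivation, or missing metadata.
        return "CHECK_SOURCE"
    return "OK"


def find_mismatches(items: List[AssetStatus]) -> List[Tuple[str, str]]:
    """Return (path, label) for every asset whose two views disagree."""
    results = []
    for item in items:
        label = classify(item)
        if label != "OK":
            results.append((item.path, label))
    return results
```

For example, an asset that author shows as `Deactivate` but the live site still returns with status 200 is labelled `CHECK_SOURCE`, which matches the situation described in the previous reply.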