How to find broken links on a page that is behind authentication

Question

Hi all, we are running into an issue where some document links on one of our websites, which sits behind a user sign-in process, either appear blank or return a 404. Once we find these documents, republishing them fixes the problem, but finding the problem documents is proving a challenge. The Link Checker in AEM does not target these document links, and I can't find a way to crawl the live pages because of the authentication. Any suggestions for how we could crawl an authenticated website for broken links?
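The crawl asked about here can be sketched by reusing the session cookie of a signed-in browser. This is a minimal sketch in Python (standard library only), assuming the site accepts a session cookie copied from the browser's developer tools after signing in; the cookie value, start URL, and all function names are hypothetical. It walks same-host HTML pages breadth-first and reports any link that returns an HTTP error or an empty body:

```python
import urllib.error
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects every href/src attribute value found in the page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def extract_links(html, base_url):
    """Resolve links against base_url, drop #fragments and non-HTTP schemes."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    links = []
    for raw in parser.links:
        absolute, _fragment = urldefrag(urljoin(base_url, raw))
        if urlparse(absolute).scheme in ("http", "https"):
            links.append(absolute)
    return links


def classify(status, body_length):
    """Return a problem label, or None if the response looks healthy."""
    if status == 0:
        return "unreachable"
    if status >= 400:
        return f"HTTP {status}"
    if body_length == 0:
        return "empty body"
    return None


def fetch(url, cookie, timeout=30):
    """GET url with the session cookie; returns (status, content_type, body).
    Non-HTML responses (PDFs etc.) are read only far enough to prove they are non-empty."""
    request = urllib.request.Request(url, headers={"Cookie": cookie})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            content_type = resp.headers.get_content_type()
            body = resp.read() if content_type == "text/html" else resp.read(1)
            return resp.status, content_type, body
    except urllib.error.HTTPError as err:  # 4xx/5xx responses
        return err.code, "", b""
    except OSError:  # connection failures and timeouts
        return 0, "", b""


def crawl(start_url, cookie, max_pages=500):
    """Breadth-first crawl of start_url's host; returns [(url, problem), ...]."""
    host = urlparse(start_url).netloc
    queue, seen, problems, pages = deque([start_url]), {start_url}, [], 0
    while queue and pages < max_pages:
        url = queue.popleft()
        status, content_type, body = fetch(url, cookie)
        problem = classify(status, len(body))
        if problem:
            problems.append((url, problem))
            continue
        if content_type != "text/html":
            continue  # a document that loaded fine; nothing to parse
        pages += 1
        for link in extract_links(body.decode("utf-8", "replace"), url):
            # Only follow same-host links so the session cookie is never sent elsewhere.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return problems
```

For example, `crawl("https://www.example.com/content/site/en.html", "session=...")` returns a list of `(url, problem)` pairs to republish. One caveat: if the session expires mid-run, many sites redirect to their login page with HTTP 200, which this sketch would treat as healthy, so an empty result is worth sanity-checking.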

arunpatidar · Accepted Answer

You can check the references of the assets using the AssetReference API from a Groovy script, or try the tool below: https://kiransg.com/2022/03/26/broken-asset-references-aem/

aanchal-sikka · Answer

Hello @roorue,

There can be multiple reasons for a discrepancy between the status shown on author and the availability of content on publish:

1. The content was installed on publish via packages rather than replicated from author.

2. The content was published/unpublished, but the replication queue was stuck. Someone cleared the queue, but the content was not published/unpublished again, so the events never reached publish.

3. Earlier, when tree replication was used, it did not always set the replication metadata properly. I believe it does so correctly now.
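Because all three causes leave author believing an asset is published when publish does not serve it, one way to find affected assets is to compare the two. This is a minimal sketch in Python (standard library only), assuming basic authentication to author is allowed, that each asset records its replication status in `jcr:content/cq:lastReplicationAction`, and that publish accepts a session cookie; the hosts, credentials, and function names are hypothetical. It asks author's QueryBuilder JSON endpoint for assets whose last action was Activate, then requests each path on publish and reports 404s or empty responses:

```python
import base64
import json
import urllib.error
import urllib.parse
import urllib.request


def activated_assets_query(root):
    """QueryBuilder predicates: dam:Asset nodes under root whose last
    replication action recorded on author is Activate (assumed property path)."""
    return {
        "path": root,
        "type": "dam:Asset",
        "property": "jcr:content/cq:lastReplicationAction",
        "property.value": "Activate",
        "p.limit": "-1",  # return all hits, not just the first page
    }


def parse_hit_paths(payload):
    """Extract the repository paths from a QueryBuilder JSON response."""
    return [hit["path"] for hit in json.loads(payload).get("hits", [])]


def list_activated_assets(author, user, password, root):
    query = urllib.parse.urlencode(activated_assets_query(root))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"{author}/bin/querybuilder.json?{query}",
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request, timeout=60) as resp:
        return parse_hit_paths(resp.read())


def missing_on_publish(publish, cookie, paths):
    """Request each path on publish; return [(path, problem), ...]."""
    missing = []
    for path in paths:
        url = publish + urllib.parse.quote(path)
        request = urllib.request.Request(url, headers={"Cookie": cookie})
        try:
            with urllib.request.urlopen(request, timeout=30) as resp:
                if not resp.read(1):  # one byte is enough to prove it is non-empty
                    missing.append((path, "empty body"))
        except urllib.error.HTTPError as err:
            missing.append((path, f"HTTP {err.code}"))
        except OSError:  # connection failures and timeouts
            missing.append((path, "unreachable"))
    return missing
```

For example, feed `list_activated_assets("https://author.example.com", "user", "password", "/content/dam/docs")` into `missing_on_publish("https://www.example.com", "session=...", paths)` and republish whatever it reports. If the dispatcher rewrites or blocks `/content/dam` URLs, the publish URL built from each path will need adjusting to match.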
