How to find broken links on a page that is behind authentication | Community
Level 3
March 15, 2023
Solved


Hi All,

We are running into an issue where some document links on one of our websites, which sits behind a user sign-in process, either appear blank or return a 404.

Once we find these documents, republishing them fixes the problem. But finding the problem documents is proving a challenge: the Link Checker in AEM does not target these document links, and I can't find a way to crawl the live pages because of the authentication.

Any suggestions for how we could crawl an authenticated website for broken links?


2 replies

arunpatidar
Community Advisor
Accepted solution
March 15, 2023

You can check the references of the assets using the AssetReference API in a Groovy script, or try the tool below:

https://kiransg.com/2022/03/26/broken-asset-references-aem/ 
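
For the crawling side of the question, here is a minimal Python sketch of one possible approach, not part of the tool linked above. It assumes the site accepts a session cookie copied from a signed-in browser; the function names (`extract_links`, `http_status`, `find_broken_links`) and the cookie value are hypothetical and only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import urllib.error
import urllib.request


class LinkCollector(HTMLParser):
    """Collects absolute http(s) URLs from href and src attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                url = urljoin(self.base_url, value)
                # Skip mailto:, javascript: and similar non-HTTP targets.
                if url.startswith(("http://", "https://")):
                    self.links.append(url)


def extract_links(html, base_url):
    """Return every http(s) link found in the given HTML, in page order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def http_status(url, cookie_header):
    """Fetch url with the signed-in session cookie and return the HTTP status.

    cookie_header is an assumption: the raw Cookie header value copied
    from a browser session that has already signed in.
    """
    request = urllib.request.Request(url, headers={"Cookie": cookie_header})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def find_broken_links(links, fetch_status):
    """Return (url, status) pairs for links whose status is 400 or above."""
    broken = []
    for url in links:
        status = fetch_status(url)
        if status >= 400:
            broken.append((url, status))
    return broken
```

To use it against a live page, fetch the page HTML with the same cookie, pass it to `extract_links`, and call `find_broken_links(links, lambda u: http_status(u, cookie))`. Links that appear blank may still return 200, so a status check alone would likely miss them; checking the response length would be a separate step.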

Arun Patidar
RooRue
Author
Level 3
March 15, 2023

Thanks Arunpatidar,

I didn't realise I could run a report that way, which is great.

It has left me with more questions, however. When I compare the report to the files in author, I find some that author shows as unpublished, but the report lists them as published and they are accessible on the live site. I wonder if something has gone wrong in the author instance 😕😕

aanchal-sikka
Community Advisor
March 17, 2023

Hello @roorue:

There can be multiple reasons for a discrepancy between the status shown on author and the availability of content on publish:

1. The content was deployed directly to publish via packages.

2. The content was published/unpublished while the replication queue was stuck. Someone cleared the queue, but the content wasn't published/unpublished again, so the events never reached publish.

3. Earlier, when tree replication was used, it did not set the replication metadata properly. I believe it does now.
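
To see which assets are affected by any of the reasons above, the two sides can be compared directly. The Python sketch below is only an illustration under stated assumptions: it expects the author status to have been exported already (for example from the `cq:lastReplicationAction` property, with values such as `Activate` or `Deactivate`) alongside the HTTP status observed on publish. The function name and record layout are hypothetical.

```python
def find_status_mismatches(records):
    """Return records where author status and publish availability disagree.

    records: iterable of (path, author_action, publish_status) tuples, where
    author_action is the exported replication action (assumed "Activate",
    "Deactivate", or None if never replicated) and publish_status is the
    HTTP status seen when requesting the path on publish.
    """
    mismatches = []
    for path, author_action, publish_status in records:
        marked_published = author_action == "Activate"
        live_on_publish = publish_status == 200
        if marked_published != live_on_publish:
            mismatches.append((path, author_action, publish_status))
    return mismatches
```

Anything this returns is a candidate for republishing (marked published but missing on publish) or for a deliberate unpublish (live on publish but marked unpublished on author).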

 

Aanchal Sikka