Hi All,
We are running into an issue where some of the document links on one of our websites, which sits behind a user sign-in process, either appear blank or return a 404.
Once we find these documents, republishing them fixes the problem, but finding the problem docs is proving a challenge. The Link Checker in AEM does not target these document links, and I can't find a way to crawl the live pages because of the authentication.
Any suggestions for how we could crawl an authenticated website for broken links?
You can check the references of the assets using the AssetReference API from a Groovy script, or try the tool below:
https://kiransg.com/2022/03/26/broken-asset-references-aem/
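If a crawl of the live site is still wanted, a rough sketch along these lines might work. It is not an AEM feature, just plain Python using the standard library. It assumes the site uses a simple form login that sets a session cookie; the login URL, the form field names, and the list of document extensions are all hypothetical and would need to match your setup. It reports links with a blank href and document links that return an HTTP error.

```python
import http.cookiejar
import urllib.error
import urllib.parse
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: which file extensions count as "document links".
DOC_EXTENSIONS = (".pdf", ".doc", ".docx", ".xls", ".xlsx")


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    # Keep empty hrefs so blank links can be reported.
                    self.links.append(value or "")


def find_broken(page_url, html, get_status):
    """Return (page, link, reason) for blank or failing document links."""
    collector = LinkCollector()
    collector.feed(html)
    broken = []
    for href in collector.links:
        href = href.strip()
        if not href:
            broken.append((page_url, "", "blank href"))
            continue
        target = urljoin(page_url, href)
        if not urlparse(target).path.lower().endswith(DOC_EXTENSIONS):
            continue  # not a document link, skip it
        status = get_status(target)
        if status >= 400:
            broken.append((page_url, target, f"HTTP {status}"))
    return broken


def make_authenticated_status_checker(login_url, form_fields):
    """Log in once via a form POST and return a function that checks URLs.

    Hypothetical: login_url and form_fields depend entirely on your site.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    data = urllib.parse.urlencode(form_fields).encode()
    opener.open(login_url, data=data, timeout=30)  # session cookie lands in jar

    def get_status(url):
        # HEAD avoids downloading the document; some servers may need GET.
        request = urllib.request.Request(url, method="HEAD")
        try:
            with opener.open(request, timeout=30) as response:
                return response.status
        except urllib.error.HTTPError as err:
            return err.code

    return get_status
```

To use it, you would build `get_status` once with your login details, fetch each page's HTML through the same logged-in session, and pass the page URL and HTML to `find_broken`. Keeping the status check as a separate function also makes it easy to test the link extraction without touching the live site.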
Thanks Arunpatidar,
I didn't realise I could run a report that way, which is great.
It has left me with more questions, however. When I compare the report to the files on author, I find some that show as unpublished on author, yet the report lists them as published and they are accessible on the live site. I wonder if something has gone wrong on the author instance.
Hello @RooRue :
There can be multiple reasons for a discrepancy between the status shown on author and the availability of content on publish:
1. The content was deployed to publish via packages.
2. The content was published/unpublished, but the queue was stuck. Someone cleared the queue, but the content wasn't published/unpublished again. Thus, the events didn't reach publish.
3. Earlier, when tree replication was used, it did not set the replication metadata properly. I guess it does that properly now.
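To get a list of the affected assets, whichever of the reasons above applies, the author status and the publish availability can be compared path by path. Below is a minimal sketch in plain Python. It assumes you have already exported two things: the last replication action recorded on author for each path (for example the `cq:lastReplicationAction` property, with values such as `Activate` or `Deactivate`) and the HTTP status each path returns on publish. The function name and the input shapes are made up for illustration.

```python
def find_discrepancies(author_actions, publish_status):
    """Compare author replication state with what publish actually serves.

    author_actions: dict of path -> last replication action on author
                    ("Activate", "Deactivate", or None if never replicated).
    publish_status: dict of path -> HTTP status code returned by publish.
    Returns a sorted list of (path, description) for every mismatch.
    """
    issues = []
    for path in sorted(author_actions):
        status = publish_status.get(path)
        if status is None:
            continue  # path was not checked on publish, nothing to compare
        live = status < 400
        marked_published = author_actions[path] == "Activate"
        if marked_published and not live:
            issues.append((path, "published on author, missing on publish"))
        elif not marked_published and live:
            issues.append((path, "unpublished on author, live on publish"))
    return issues


if __name__ == "__main__":
    # Hypothetical sample data, only to show the input shapes.
    author = {
        "/content/dam/site/a.pdf": "Activate",
        "/content/dam/site/b.pdf": "Deactivate",
        "/content/dam/site/c.pdf": None,
    }
    publish = {
        "/content/dam/site/a.pdf": 404,
        "/content/dam/site/b.pdf": 200,
        "/content/dam/site/c.pdf": 404,
    }
    for path, description in find_discrepancies(author, publish):
        print(path, "->", description)
```

Assets flagged as "published on author, missing on publish" are the ones a republish should fix; the opposite case points to content that reached publish by another route, such as a package install or an unpublish that never arrived.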