We are running into an issue where some document links on one of our websites, which sits behind a user sign-in process, either appear blank or return a 404.
Once we find these documents, republishing them fixes the problem, but finding the problem documents is proving a challenge. The Link Checker in AEM does not target these document links, and I can't find a way to crawl the live pages because of the authentication.
Any suggestions for how we could crawl an authenticated website for broken links?
I didn't realise I could run a report that way, which is great.
It has left me with more questions, however. When I compare the report to the files on the author instance, I find some that the author shows as unpublished, but the report lists them as published and they are accessible on the live site. I wonder if something has gone wrong on the author instance.
Hello @RooRue :
There can be multiple reasons for a discrepancy between the status shown on author and the availability of content on publish:
1. The content was deployed via packages on publish.
2. The content was published/unpublished, but the queue was stuck. Someone cleared the queue, but the content wasn't published/unpublished again. Thus, the events didn't reach publish.
3. Earlier, when tree replication was used, it did not set the replication metadata properly. I guess it does that properly now.
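
To find the affected documents regardless of which of these causes applies, one option is to compare what author believes with what publish actually serves. The following is only a rough sketch, not an AEM feature: it assumes you have already exported each document's last replication action from author (for example the value of `cq:lastReplicationAction`, which I am assuming is `Activate` or `Deactivate` on your assets) and that you can copy a logged-in session cookie from your browser. The function names, the sample paths and the cookie handling are all hypothetical.

```python
from urllib import request, error


def fetch_status(url, session_cookie):
    """Return the HTTP status code for url, sending a logged-in session cookie.

    session_cookie is the raw Cookie header value copied from a browser
    session (an assumption; your login mechanism may differ).
    """
    req = request.Request(url, method="HEAD", headers={"Cookie": session_cookie})
    try:
        with request.urlopen(req, timeout=10) as resp:
            return resp.status
    except error.HTTPError as exc:
        # 4xx/5xx responses raise HTTPError; the code is what we want.
        return exc.code


def find_discrepancies(author_actions, publish_statuses):
    """Compare author replication actions with publish HTTP statuses.

    author_actions:   {path: "Activate" | "Deactivate" | None}
    publish_statuses: {path: HTTP status code from the live site}
    Returns a sorted list of (path, reason) for every mismatch.
    """
    mismatches = []
    for path, action in sorted(author_actions.items()):
        live = publish_statuses.get(path) == 200
        if action == "Activate" and not live:
            mismatches.append((path, "published on author, missing on publish"))
        elif action != "Activate" and live:
            mismatches.append((path, "unpublished on author, live on publish"))
    return mismatches


if __name__ == "__main__":
    # Hypothetical sample data; in practice, build publish_statuses by
    # calling fetch_status() for each path against your live host.
    author_actions = {
        "/content/dam/docs/a.pdf": "Activate",
        "/content/dam/docs/b.pdf": "Deactivate",
    }
    publish_statuses = {
        "/content/dam/docs/a.pdf": 404,
        "/content/dam/docs/b.pdf": 200,
    }
    for path, reason in find_discrepancies(author_actions, publish_statuses):
        print(path, "->", reason)
```

Documents reported as "published on author, missing on publish" are the candidates for republishing; the opposite case matches what @RooRue describes, where content is live even though author shows it as unpublished.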