Recently we re-organized a bunch of content on our website. I wasn't involved with the content re-org, but an author noticed there are hundreds of pages on our website with broken links in author mode, and the links are no longer links on the published site.
When I looked at a page with the problem, I saw the broken-link annotation around the link in question.
In the page source, the link is:
<a href="https://forums.adobe.com/content/xyz/patient-care/public-health/immunizations1/schedules.html">
but it should be:
<a href="https://forums.adobe.com/content/xyz/patient-care/public-health/immunizations/schedules.html">
It looks like an immunizations1 folder/page was created at some point, all of the links were updated to point to it, and it was subsequently renamed or deleted. Is there a way to search and replace in author or in CQDE (AEM Developer Environment)?
Hi
In that case, you can do the following:
Use the Groovy Console to crawl over /content/your_site, looking for strings that start with /content.
Then use resourceResolver to check whether each path found exists. A sample script implementing this algorithm is linked below.
Groovy Console: https://github.com/Citytechinc/cq-groovy-console
Groovy script: https://gist.github.com/trekawek/72b3515a6641ca5f4b29
// BrokenLinks: report String properties that reference non-existing /content paths
import javax.jcr.*
import org.apache.sling.api.resource.*

def ROOT_PATH = '/content/geometrixx'

// Collect every substring that starts with /content/ and runs up to the next double quote
def extractPaths(p) {
    if (p instanceof Property && p.multiple) {
        // Multi-valued property: extract from each value
        p.values.collect { extractPaths(it) }.flatten()
    } else {
        p.string.findAll(/\/content\/[^"]+/)
    }
}

// getNode and resourceResolver are provided by the Groovy Console
getNode(ROOT_PATH).recurse { node ->
    node.properties.findAll { it.type == PropertyType.STRING }.each {
        def paths = extractPaths(it).findAll { resourceResolver.resolve(it) instanceof NonExistingResource }
        if (!paths.empty) {
            println "Path: ${node.path}"
            println "Broken: ${paths}\n"
        }
    }
}
true
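To see what the script's extraction step picks up, here is a minimal standalone sketch of the same regular expression applied to sample strings. The helper name `extractContentPaths` and the sample values are assumptions for illustration; only the pattern comes from the script above. It runs in plain Groovy, without AEM.

```groovy
// Hypothetical helper name; the pattern is the one from the script above:
// '/content/' followed by every character up to the next double quote.
List<String> extractContentPaths(String text) {
    text.findAll(/\/content\/[^"]+/)
}

// A rich-text value with a quoted href yields exactly the link target.
def html = '<p><a href="/content/xyz/patient-care/public-health/immunizations1/schedules.html">Immunization Schedules</a></p>'
println extractContentPaths(html)

// Without a closing double quote the match runs to the end of the string,
// so prose that merely mentions a path can show up as a false positive.
println extractContentPaths('See /content/xyz/about for details')
```

Because the match only stops at a double quote, some reported entries may include trailing text, so it is worth reviewing the report before acting on it.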
Another option is to write your own service to do this.
I hope this helps.
Thanks and Regards
Kautuk Sahni
Hi,
I'm not sure, but here's a thought.
It probably won't be possible to search for pages with internal links set to "immunizations1" unless this value appears in a JCR property.
Maybe we have a rough idea of the time frame in which the modifications happened and can check all the pages modified during that time frame. This might help identify the list of pages.
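Both ideas could be tried with JCR-SQL2 queries run from the Groovy Console. The sketch below only builds the query strings. The helper names, the `text` property name, the root path, and the dates are assumptions for illustration, so adjust them to your content and try the queries on a non-production instance first.

```groovy
// Hypothetical helpers that build JCR-SQL2 query strings. Values are
// inserted as-is, so only pass trusted input without single quotes.

// Page content nodes under `root` whose cq:lastModified falls in a time window.
String modifiedBetweenQuery(String root, String fromIso, String toIso) {
    [
        "SELECT * FROM [cq:PageContent] AS c",
        "WHERE ISDESCENDANTNODE(c, '${root}')",
        "AND c.[cq:lastModified] >= CAST('${fromIso}' AS DATE)",
        "AND c.[cq:lastModified] <= CAST('${toIso}' AS DATE)"
    ].join(' ')
}

// Nodes under `root` whose `property` value contains `needle`.
// Assumption: the rich text is stored in a property named 'text'.
String propertyContainsQuery(String root, String needle, String property = 'text') {
    "SELECT * FROM [nt:base] AS n WHERE ISDESCENDANTNODE(n, '${root}') " +
        "AND n.[${property}] LIKE '%${needle}%'"
}

println modifiedBetweenQuery('/content/xyz', '2016-01-01T00:00:00.000Z', '2016-02-01T00:00:00.000Z')
println propertyContainsQuery('/content/xyz', 'immunizations1')

// Assuming the console exposes a `session` binding, a query string q could
// then be executed through the standard JCR API, for example:
//   session.workspace.queryManager.createQuery(q, 'JCR-SQL2').execute().nodes
```

A LIKE pattern with a leading wildcard cannot use an index efficiently, so the second query may be slow on a large repository.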
Hi
Please have a look at "The External Link Checker" tool:
Link:- https://docs.adobe.com/docs/en/aem/6-1/administer/operations/external-link-checker.html [AEM 6.1]
To use the external link checker:
Open the Tools console.
Double-click on External Link Checker (either the right or left pane). A list of all external links is generated.
Validate a specific link by selecting it in the list, then clicking Check:
Information about the selected link is then displayed.
On the individual content pages, invalid links are shown as broken.
Another reference article: http://aemexperience.blogspot.in/2015/07/aem-link-checker-fixing-broken-links.html
I hope this helps.
Thanks and Regards
Kautuk Sahni
Hi Kautuk,
Thanks for the reply. Did not know that such a feature existed.
This doesn't solve the problem. Please re-read the question.
The list generated by the External Link Checker is fairly short and doesn't include links to any of the web pages with bad links in the text, nor does it include the broken links that I mentioned, which are internal links in the web page text.
Here is an example page on our published site. The link is gone from the published page; it was the last Immunization Schedules link at the very end of the page: http://bit.ly/1T70Fdt
The broken link in AEM author is at the bottom of the page. Screen shot below.
This link is broken across hundreds of pages as noted in my original post. I wanted to quickly find and replace all instances of /content/xyz/patient-care/public-health/immunizations1/schedules.html with /content/xyz/patient-care/public-health/immunizations/schedules.html, which would solve this, without requiring the author to go back and edit each page by hand.
I'll review the other response to the question to see if it helps.
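The replacement described above can be sketched as a plain string rewrite. This is a minimal sketch, not an AEM feature: the function name `rewriteLinks` is hypothetical, and it only transforms a string, so something else still has to read and write the stored property values.

```groovy
import java.util.regex.Matcher
import java.util.regex.Pattern

// Hypothetical helper: replaces oldPath with newPath only where oldPath ends
// at a path boundary, so '.../immunizations1' does not match '.../immunizations10'.
String rewriteLinks(String html, String oldPath, String newPath) {
    // Lookahead: the old path must be followed by / . " # ? or the end of the string.
    def pattern = Pattern.quote(oldPath) + '(?=[/."#?]|$)'
    html.replaceAll(pattern, Matcher.quoteReplacement(newPath))
}

def oldPath = '/content/xyz/patient-care/public-health/immunizations1'
def newPath = '/content/xyz/patient-care/public-health/immunizations'
def before = "<a href=\"${oldPath}/schedules.html\">Immunization Schedules</a>"
println rewriteLinks(before, oldPath, newPath)
```

Quoting the old path and the replacement keeps characters such as `.` or `$` from being interpreted as regular expression syntax.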
If Kautuk's suggestion is not working for you, write a service that queries all the pages and returns the pages/nodes with the broken links. Then update the links on those pages/nodes through code. This should solve your problem.
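As a rough illustration of the "update through code" step, here is a minimal sketch of a recursive walk that rewrites single-valued String properties. The function name `replaceInTree` and its dry-run flag are assumptions, not an existing AEM or Groovy Console API. It is written against the accessor names of the JCR Node and Property interfaces (properties, nodes, type, multiple, string, path, setValue) but left untyped so the logic can be tried outside AEM with stand-in objects.

```groovy
// Hypothetical helper: replaces oldPath with newPath in every single-valued
// String property of `node` and its descendants. Returns the paths of the
// properties that contain oldPath. With dryRun = true nothing is modified.
List<String> replaceInTree(node, String oldPath, String newPath, boolean dryRun = true) {
    def changed = []
    // 1 == javax.jcr.PropertyType.STRING; multi-valued properties are skipped here.
    def hits = node.properties.findAll { p ->
        p.type == 1 && !p.multiple && p.string.contains(oldPath)
    }
    hits.each { p ->
        changed << p.path
        if (!dryRun) {
            p.setValue(p.string.replace(oldPath, newPath))
        }
    }
    // Recurse into child nodes.
    node.nodes.each { child ->
        changed.addAll(replaceInTree(child, oldPath, newPath, dryRun))
    }
    changed
}

// In the Groovy Console this might be used as follows, assuming the console's
// getNode and session bindings; changes are only persisted by session.save():
//   def oldPath = '/content/xyz/patient-care/public-health/immunizations1/'
//   def newPath = '/content/xyz/patient-care/public-health/immunizations/'
//   println replaceInTree(getNode('/content/xyz'), oldPath, newPath)   // dry run
//   replaceInTree(getNode('/content/xyz'), oldPath, newPath, false)
//   session.save()
```

Running the dry run first and checking the printed list is a cheap safeguard. Keeping the trailing slash in both paths avoids touching similarly named siblings, and the modified pages still have to be re-activated before the published site changes.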
Thanks
Tuhin