The problem: Google is crawling and indexing our site (good), but it is indexing certain pages all the way down to the component level (bad). This is likely because it follows JavaScript links for AJAX-generated pagination and the like. Clicking any of these results lands on a page rendered without page-level templating, other components, etc. To reproduce: Google "rand blog rafiq college ratings"
The third result (and several others on the first page of results) links all the way down to the component level:
We would expect this to link to the actual site page:
We've thought of two possible solutions, neither of them good:
1. Add an Apache-level rewrite for anything matching .*/jcr:content(/.*) that rewrites to the parent page. This would almost surely fix links from Google, but it would break site functionality that directly addresses components for AJAX calls (pagination), alternate page renderings (XML), etc.
2. Add affected pages to robots.txt. Even worse: those pages then don't get indexed at all, and the list would likely be impossible to keep up to date.
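To make the mapping in option 1 concrete, here is a minimal Python sketch of what such a rewrite would do to a request path. It only illustrates the pattern and is not the Apache rule itself. The function name, the example paths, and the assumption that the parent page is served with an .html extension are all hypothetical.

```python
import re

# Assumed pattern: everything from "/jcr:content" onward is the component
# part of the path; whatever precedes it is the parent page.
COMPONENT_PATH = re.compile(r"^(?P<page>.*?)/jcr:content(?:/.*)?$")


def parent_page_url(path: str, extension: str = ".html") -> str:
    """Map a component-level path to its page-level URL.

    Paths that do not contain a "/jcr:content" segment are returned unchanged.
    The ".html" default is an assumption about how pages are served.
    """
    match = COMPONENT_PATH.match(path)
    if match is None:
        return path
    return match.group("page") + extension


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(parent_page_url("/content/site/blog/jcr:content/par/list.html"))
    print(parent_page_url("/content/site/blog.html"))
```

Because the mapping applies to every matching path, it would also redirect the AJAX and XML requests mentioned above, which is exactly the drawback of option 1.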
Any good strategies out there to force Google to index at the page level *only*?
First and foremost, as a best practice, all of your CQ5 author and publish servers should be put behind a firewall and not be publicly accessible. Only your web server (dispatcher) should be in front of the firewall. If your author and publish servers are behind a firewall, Google has no way to index them.
Please review the following link:
http://crxdelight.com/2012/02/04/how-to-protect-your-cq-instances-from-google-searches/