Exclude Content nodes from Search engine indexing | Community
Adobe Champion
December 9, 2022
Solved

Exclude Content nodes from Search engine indexing

  • December 9, 2022
  • 3 replies
  • 2125 views

Is it possible to exclude certain directories in the DAM from Google Search engine indexing?

Is it done at dispatcher level?

Can someone give details on how we can make this possible?

Best answer by arunpatidar

You can place the robots.txt file anywhere in AEM, but it must be served from the domain root. This can be achieved with Apache/dispatcher rewrite rules, e.g. https://github.com/arunpatidar02/aemaacs-aemlab/blob/c4e2ab400f561acc3127dba662a2f1f6c397d340/dispatcher.cloud/src/conf.d/rewrites/default_rewrite.rules#L12

e.g. https://www.mydomain.com/robots.txt
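
To illustrate why the file has to be reachable at the domain root: crawlers look for robots.txt only at the root of the host, whatever page or asset they are about to fetch. The sketch below is only a rough illustration using Python's standard `urllib.parse`; the function name `robots_url_for` and the sample asset path are made up for this example and are not part of AEM.

```python
from urllib.parse import urljoin


def robots_url_for(resource_url: str) -> str:
    """Return the root-level robots.txt URL that applies to resource_url.

    Hypothetical helper: a leading slash in the second argument makes
    urljoin drop the resource path and keep only scheme and host.
    """
    return urljoin(resource_url, "/robots.txt")


if __name__ == "__main__":
    # Assumed DAM asset path, used only for illustration.
    asset = "https://www.mydomain.com/content/dam/mysite/brochure.pdf"
    print(robots_url_for(asset))
```

So even if the file is stored under a content path in AEM, the rewrite rule needs to make it answer at the root URL.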

3 replies

arunpatidar
Community Advisor
December 9, 2022

You can try adding those paths to robots.txt.

Arun Patidar
P_V_Nair (Adobe Champion, Author)
December 9, 2022

@arunpatidar Does that file reside on the publisher or on the dispatcher? Could you please give me some more details?

 

Manu_Mathew_
Community Advisor
December 10, 2022

@p_v_nair The robots.txt file would be stored under the path /content/dam/[sitename].

You can add the paths you want to exclude as `Disallow` entries in that file.
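
As a rough sketch of what such `Disallow` entries might look like, the snippet below holds a sample robots.txt as a string and checks it with Python's standard `urllib.robotparser`. The folder names under /content/dam/mysite, the domain, and the helper `is_crawlable` are assumptions made up for illustration; adjust them to your own DAM structure.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; the DAM folder names are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /content/dam/mysite/internal/
Disallow: /content/dam/mysite/drafts/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://www.mydomain.com/content/dam/mysite/internal/report.pdf",
        "https://www.mydomain.com/content/dam/mysite/public/logo.png",
    ):
        print(url, "crawlable:", is_crawlable(ROBOTS_TXT, url))
```

Checking the rules this way before publishing the file can help catch a path typo that would leave a folder open to crawlers.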

 

https://experienceleaguecommunities.adobe.com/t5/adobe-experience-manager/robots-txt-file-in-aem-websites-aem-community-blog-seeding/td-p/369442

 

Hope this helps!

Ravi_Pampana
Community Advisor
December 9, 2022

As Arun mentioned, you can list the paths that search engines should not crawl in robots.txt.

The robots.txt file must be available at the root level (e.g. https://www.example.com/robots.txt).

You can keep the file on the publisher and add a redirect/rewrite rule in the dispatcher configuration so that it is served at the root-level URL shown above.

 

More details on robots.txt can be found at:

https://www.semrush.com/blog/beginners-guide-robots-txt/?kw=&cmp=US_SRCH_DSA_Blog_EN&label=dsa_pagefeed&Network=g&Device=c&utm_content=631620704803&kwid=aud-391253447936:dsa-1875638614702&cmpid=18348486859&agpid=142604696083&BU=Core&extid=60113850590&adpos=&gclid=Cj0KCQiA1sucBhDgARIsAFoytUtQk3Jg0Ahx3_wZrHYOlAnLIU-tVTdmk8U50wrCE4Udv3psXA4z-CIaAiXWEALw_wcB

tushaar_srivastava
Level 6
December 11, 2022

Yes, it is possible to exclude certain directories in the DAM from Google Search indexing in AEM. You can list those directories under `Disallow` in robots.txt so that search engine bots do not crawl them. For HTML pages you want to exclude, you can also add a robots meta tag with `noindex` to the page markup, which tells search engines not to index them.

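This can also be handled at the dispatcher level. The dispatcher (Apache) configuration can keep certain paths or directories out of Google's index, for example by sending an `X-Robots-Tag: noindex` response header for them, or by denying access to them in the dispatcher.any filter if they should not be publicly reachable at all.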