Facing Issues for Robots.txt in AEM Cloud SDK | Community
Level 2
May 27, 2021

Facing Issues for Robots.txt in AEM Cloud SDK


Dear All,

I have set up the AEM Cloud SDK locally and am trying to implement robots.txt by following the blog post below.

 

https://www.aemtutorial.info/2020/07/

 

Here I am facing 2 issues.

 

1) Issue 1: I created the file under the site's content root, at the path below.

/content/mysite/robots.txt

When I open robots.txt on author or publish, for example http://localhost:4503/content/mysite/robots.txt, the browser downloads the file instead of displaying it.

 

2) Issue 2: When I request robots.txt through the dispatcher, I do not see any of its content either.

 

 

The dispatcher log shows the request as blocked: blocked [publishfarm/-] 0ms "localhost:8082".

 

[27/May/2021:08:39:31 +0000] "GET /content/dam/mysite/robots.txt HTTP/1.1" - blocked [publishfarm/-] 0ms "localhost:8082"
172.17.0.1 "localhost:8082" - [27/May/2021:08:39:31 +0000] "GET /content/dam/mysite/robots.txt HTTP/1.1" 404 196 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36"
172.17.0.1 "localhost:8082" - [27/May/2021:08:40:25 +0000] "GET /content/mysite/robots.txt HTTP/1.1" 404 196 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36"
[27/May/2021:08:40:25 +0000] "GET /content/mysite/robots.txt HTTP/1.1" - blocked [publishfarm/-] 0ms "localhost:8082"

 

My robots.txt file is below:

#Any search crawler can crawl our site
User-agent: *

#Allow only below mentioned paths
Allow: /en/
Allow: /fr/
Allow: /gb/
Allow: /in/
#Disallow everything else
Disallow: /
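
As a rough local check of how a crawler might read these rules, here is a minimal sketch using Python's standard-library urllib.robotparser. The sample paths (/en/home.html, /de/home.html) and the helper name is_allowed are made up for illustration, and real crawlers may resolve Allow/Disallow precedence differently from this parser. The User-agent line and its rules are kept contiguous here because this particular parser treats an empty line as the end of a record.

```python
from urllib.robotparser import RobotFileParser

# Same rules as the file above, with User-agent and its rules kept together.
ROBOTS_TXT = """\
#Any search crawler can crawl our site
User-agent: *
#Allow only below mentioned paths
Allow: /en/
Allow: /fr/
Allow: /gb/
Allow: /in/
#Disallow everything else
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path: str, user_agent: str = "*") -> bool:
    """Return True if this parser would let user_agent fetch path."""
    return parser.can_fetch(user_agent, path)


# Hypothetical sample paths, only to exercise the rules.
for sample in ("/en/home.html", "/fr/", "/de/home.html", "/"):
    print(sample, "->", "allowed" if is_allowed(sample) else "blocked")
```

This only checks the rule logic; it says nothing about whether AEM or the dispatcher actually serves the file.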

 

Can anybody please help me with this? Thanks a lot. Note that I am using the AEM Cloud SDK locally.


2 replies

Asutosh_Jena_
Community Advisor
May 27, 2021

Hi @sunitaborn 

 

Please see my answers below:

#1: To open the file in the browser instead of downloading it, configure the ContentDispositionFilter with the file path as an excluded path. Add the config below under the config.publish run mode so that the file opens on all publish instances, while on author it will still be downloaded. If you want it to open on author as well, add the same configuration to the config folder itself. It is only required on publish, though, because publish instances are the ones exposed to the public.

org.apache.sling.security.impl.ContentDispositionFilter.xml

<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:sling="http://sling.apache.org/jcr/sling/1.0" xmlns:jcr="http://www.jcp.org/jcr/1.0"
jcr:primaryType="sling:OsgiConfig"
sling.content.disposition.all.paths="{Boolean}false"
sling.content.disposition.excluded.paths="[/content/mysite/robots.txt]"/>
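
One way to confirm the change took effect is to look at the response headers: when the filter applies to a path, the response carries a Content-Disposition: attachment header, which is what makes the browser download the file. Below is a minimal sketch in Python (standard library only). The function names fetch and robots_response_problems are hypothetical helpers, not part of AEM or Sling.

```python
import urllib.error
import urllib.request
from typing import Dict, List, Mapping, Tuple


def fetch(url: str) -> Tuple[int, Dict[str, str]]:
    """Fetch url and return (status code, response headers)."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status, dict(resp.headers.items())
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a status and headers worth inspecting.
        return err.code, dict(err.headers.items())


def robots_response_problems(status: int, headers: Mapping[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the response looks fine."""
    problems = []
    # Header names are case-insensitive, so normalize before looking them up.
    normalized = {name.lower(): value for name, value in headers.items()}
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    disposition = normalized.get("content-disposition", "")
    if disposition.strip().lower().startswith("attachment"):
        problems.append("Content-Disposition is 'attachment', so browsers download the file")
    return problems
```

For example, robots_response_problems(*fetch("http://localhost:4503/content/mysite/robots.txt")) should return an empty list once the exclusion is active on publish, assuming the instance is running on that port.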

 

#2: The log shows that the path is blocked by the "publishfarm" farm file. You need to allow access to the file location, i.e. /content/mysite/*, so that the file can load. Ideally there should also be a rewrite at the dispatcher, because the file will always be requested as www.website.com/robots.txt and should be served from its actual location. So apply the rewrite below at the dispatcher as well:

 

RewriteCond %{REQUEST_URI} ^/robots\.txt$
RewriteRule (.*) /content/mysite/robots.txt [PT,L]
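
To illustrate what this rule is meant to do, here is a small Python sketch that mimics the mapping with the re module. It is only a simulation of the pattern match, not how Apache evaluates mod_rewrite, and the function name rewrite is hypothetical. The pattern here escapes the dot so that it matches a literal "." only; an unescaped dot would match any character.

```python
import re

# Matches exactly "/robots.txt"; the escaped dot matches only a literal ".".
ROBOTS_PATTERN = re.compile(r"^/robots\.txt$")
TARGET = "/content/mysite/robots.txt"


def rewrite(request_uri: str) -> str:
    """Return the internal path a request would be mapped to."""
    if ROBOTS_PATTERN.search(request_uri):
        return TARGET
    # Any other URI passes through unchanged.
    return request_uri


for uri in ("/robots.txt", "/robotsXtxt", "/en/robots.txt"):
    print(uri, "->", rewrite(uri))
```

Only the first URI is mapped to the content path; the other two are left as they are.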

 

Thanks!

Level 2
June 1, 2021

Hi @asutosh_jena_,

 

When I used /0009 { /glob "/content/*.txt" /type "allow" } in filters.any, I got the error below.

 

Cloud manager validator 2.0.30
2021/06/01 16:20:38 Dispatcher configuration validation failed:
conf.dispatcher.d\filters\filters.any:9: filter must not use glob pattern to allow requests

 

 

 

When I commented out /009 { /glob "/content/*.txt" /type "allow" } and used /009 { /type "allow" /extension '(txt)' /path "/content/myraitt/*" } instead, the build succeeded and robots.txt is now served with a 200 response.

 

 

 

rawvarun
Community Advisor
May 25, 2023