Robots.txt file isn't accessible on the publish environment in AEM as a Cloud Service

Level 2

Hi

 

We are implementing a robots.txt file that needs to be fetched from the DAM (for several sites with several languages).

The first issue we are facing is that we can't access the .txt file on the publish environment via its direct path, e.g. /content/dam/robots/<sitename>/en_gb/robots.txt.

 

When we use it as the link target of a button, the result is the same (the href attribute gets removed).
The same approach works fine with PDF files.

 

Does anyone know whether we missed something regarding file types that can't be published to or accessed on the publish environment?

We are using AEM as a Cloud Service, version 2022.11.9850.20221116T162329Z.

 

 

Thanks

 

7 Replies

Level 2

Hi Arun

 

We have enabled the right rules in the dispatcher, but we still have the same issue.
Also, if I bypass the dispatcher by accessing the publish URL directly:

https://publish-pXXXXX-eXXXXX.adobeaemcloud.com/content/dam/robots/<sitename>/en_gb/robots.txt

it still returns a 404.


If I try to access a JPG file, it works.

Level 2

I configured everything as described in the ticket.
I added the following entries to filters.any:

## Allow robots.txt
/0105 { /type "allow" /method "GET"  /url "/robots.txt" }

/0107 { /type "allow" /extension '(txt)' /path "/content/*" }

 

and added this to my rewrite.rules:

# Rewrite to robots.txt
RewriteCond %{REQUEST_URI} ^/robots.txt$
RewriteRule (.*) /content/dam/robots/mysite/en_gb/robots.txt  [PT,L]

 

When I check the dispatcher logs, I see the following entries, but I'm not sure why it doesn't work:

"GET /content/dam/robots/mysite/en_gb/robots.txt" - 0ms [publishfarm/-] [actionblocked] publish-pXXXXX-eXXXXX.adobeaemcloud.com
"GET /content/dam/robots/mysite/en_gb/robots.txt" - 0ms [publishfarm/-] [actionblocked] dev.mysite.com

 

Level 2

Hi Arun
A small update from my side.
On my own local dispatcher instance, it works fine after making the following change:

/0105 { /type "allow" /method "GET"  /url "/robots" /extension "txt" }

I also allowed all .txt files under my content path:

/0100 { /type "allow" /extension '(css|eot|gif|ico|jpeg|jpg|js|pdf|png|svg|swf|ttf|woff|woff2|html|txt)' /path "/content/*" }

I still see actionblocked in the logs.

Community Advisor

Your filter rules and rewrite rules all look correct. I can't think of a reason why the request would still be blocked.

However, I have one suggestion: can you try adding the filter rule below?

/0108 { /type "allow" /url "/content/dam/robots/mysite/en_gb/robots.txt"  }

I just want to see whether increased specificity produces a different result.

Level 2

The issue was resolved by using the following filter:

 

/0107 { /type "allow" /extension '(txt)' /path "/content/*" }

 

and the following rewrite rule in rewrite.rules:

 

# Rewrite to robots.txt
RewriteCond %{REQUEST_URI} /robots\.txt$
RewriteRule ^/(.*)$ /content/dam/robots/mysite/en_gb/robots.txt [PT,NC]
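
For the multi-site case mentioned in the original question, the same idea can presumably be repeated once per host. This is only a sketch and was not confirmed in this thread. The hostnames www.site-a.example and www.site-b.example and the site-a / site-b DAM folders are hypothetical placeholders. It also assumes that the /0107 filter rule above is still in place so that .txt files under /content are allowed, and that rewrite.rules is included in a virtual-host context, as the leading slash in the rule above suggests.

```apache
# Hypothetical hosts and DAM paths; replace them with your own.
# A RewriteCond applies only to the RewriteRule that directly follows it,
# so each host gets its own condition/rule pair.

# Site A
RewriteCond %{HTTP_HOST} ^www\.site-a\.example$ [NC]
RewriteRule ^/robots\.txt$ /content/dam/robots/site-a/en_gb/robots.txt [PT,L]

# Site B
RewriteCond %{HTTP_HOST} ^www\.site-b\.example$ [NC]
RewriteRule ^/robots\.txt$ /content/dam/robots/site-b/en_gb/robots.txt [PT,L]
```

In this variant the rule pattern is anchored with ^ and $, so only a request for /robots.txt at the site root is rewritten. The unanchored condition /robots\.txt$ used above also matches any longer path that ends in /robots.txt.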