Hi Folks,
I have a requirement to serve a robots.txt file at the site root.
Everything works fine on my local SDK publish instance. I am following the steps from this link:
https://www.aemtutorial.info/2020/07/robotstxt-file-in-aem-websites.html
The robots.txt file is placed under
/content/dam/test-project/robots.txt
A rule to shorten the URL is added in the resource resolver configuration:
/content/dam/test-project/robots.txt:/robots.txt
And a rule is added in the dispatcher filter to allow it:
/0073 { /type "allow" /url "/robots.txt" }
When I access robots.txt through the cloud dispatcher, it returns a 404.
Any help is highly appreciated.
Thanks,
Pradeep
Solved! Go to Solution.
You said it works on your local SDK. Does it work:
1. with a direct publish request (http://localhost:4503/robots.txt),
2. with the local dispatcher (http://localhost/robots.txt), or
3. with the local dispatcher plus the actual host mapping in /etc/hosts, i.e. the same URL as requested from your CDN (http://<website.com>/robots.txt)?
Do not forget to check publish access while not logged in to the publisher; it might be an issue with access restrictions for the anonymous user!
If the first doesn't work locally (i.e. it works only with /content/dam/...), then your Sling resolver mapping configuration is not correct (or you may need to change the order of the rules); use the jcrresolver console page to check the configuration and fix it.
If the first works but the second doesn't, then the dispatcher allow or rewrite rules are not right. If the first and second work but not the third, the issue is still in the dispatcher, in the per-host configuration files.
If all three work, you can run the same checks against your remote environment: direct access on the publisher (https://adobecqms...:4503/robots.txt) will tell you whether something is wrong with your AEM access rules for the anonymous user, or whether the problem is still on the CDN / dispatcher side.
If it is still the dispatcher, try increasing the log level for the rewrite (and maybe access) logs on the dispatcher and check the logs afterwards; there should be a visible explanation of why the request was blocked.
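For the dispatcher and rewrite logs, the levels can be raised in the Apache configuration; a minimal sketch using the standard dispatcher-module and Apache 2.4 directives (the file paths are assumptions):

```
# dispatcher module logging: 0=error, 1=warn, 2=info, 3=debug, 4=trace
<IfModule disp_apache2.c>
    DispatcherLog      logs/dispatcher.log
    DispatcherLogLevel 3
</IfModule>

# per-module log level for mod_rewrite (Apache 2.4+)
LogLevel rewrite:trace2
```

On AEM as a Cloud Service the dispatcher log level is typically controlled through an environment variable rather than by editing these files directly, so treat the above as the on-premise/local SDK form.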
Btw, if you have the rule
RewriteRule ^/robots.txt$ /content/dam/test-project/robots.txt [NC,PT,L]
then you don't need to update the mappings in the resolver configuration; localhost:4503/robots.txt won't work, but requests through the dispatcher will work anyway.
First, check the dispatcher log to see whether the request reaches the publish instance at all.
If the request is blocked by the dispatcher, there must be some rule after your allow rule that blocks the robots.txt URL; maybe the ".txt" extension is being blocked by the dispatcher.
If the dispatcher does send the request to the publisher, the issue is with the JCR resource resolver configuration.
Check whether it is deployed properly or whether there is an issue with the OSGi configuration file.
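To rule out a deployment problem with the mapping, it helps to know what the deployed OSGi config should look like. A sketch against the standard Sling resource resolver factory (the exact file location in your project is an assumption; the "/:/" entry is the usual default root mapping that should stay in place):

```json
{
  "resource.resolver.mapping": [
    "/:/",
    "/content/dam/test-project/robots.txt:/robots.txt"
  ]
}
```

In a typical project this would be deployed as org.apache.sling.jcr.resource.internal.JcrResourceResolverFactoryImpl.cfg.json; you can verify the effective value on /system/console/configMgr on a local SDK instance.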
Hi @pradeepdubey82, either the path (/content/dam/...) is incorrect or your file is not published. All rules seem right.
Thanks
-Bilal
In the repository browser on the cloud publisher I can see that the robots.txt file is there, of type nt:file.
I can see this error in the dispatcher log:
"GET /robots.txt" - 0ms [publishfarm/-] [actionblocked] publish-
Where should I enable this? In which file, any idea?
Check your dispatcher filter rules to see whether any rule after your allow rule is blocking the /robots.txt URL.
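Worth keeping in mind while checking: dispatcher filters are evaluated top to bottom and the last matching rule wins, so a later rule can silently re-block an earlier allow. A sketch (the rule numbers and the final deny are illustrative, not from the actual project):

```
/filter {
  /0001 { /type "deny" /url "*" }
  # ... project rules ...
  /0073 { /type "allow" /url "/robots.txt" }
  # a later rule like this would override the allow above:
  /0090 { /type "deny" /extension "txt" }
}
```

Moving the allow rule below any such deny (or renumbering it so it comes later) resolves that kind of conflict.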
Hey @pradeepdubey82 Try to find out whether it is getting blocked in your custom filters (check whether the extension is allowed). Also try playing around with the other rule properties like /method or /path.
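For illustration, a filter written with the attribute-style properties /method, /path, and /extension instead of a single /url glob might look like this (a sketch; the rule number is arbitrary):

```
/0074 { /type "allow" /method "GET" /path "/content/dam/test-project/*" /extension "txt" }
```

Note that /path matches the path portion without the extension, so the extension has to be allowed separately via /extension.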
Thanks
-Bilal
I have searched the entire codebase for "txt" or ".txt" in the dispatcher files; it is not being blocked anywhere.
Now I am checking the other properties like /method, /extension, and /path in the filter configurations.
Try updating your filter rule to the following and check:
/0073 { /type "allow" /url "/content/dam/test-project/robots.txt"}
Added rules in filter file
/0074 { /type "allow" /url "/robots.txt"}
Added rule in rewrite file
RewriteRule ^/robots.txt$ /content/dam/test-project/robots.txt [NC,PT]
No luck.
Please advise if there is any other place I am missing.
First check whether you can access it via the direct publish URL:
http://publishserver:port/content/dam/test-project/robots.txt. If yes, then update your filter as below, with the RewriteRule in place.
Add rules in filter file
Either
/0074 { /type "allow" /url "/content/dam/test-project/robots.txt"}
Or
/0074 { /type "allow" /url "*/robots.txt"} -- just for testing.
update rule in rewrite file
RewriteRule ^/robots.txt$ /content/dam/test-project/robots.txt [NC,PT,L]
If you have access to the dispatcher.log file, change the log level to debug and check the logs to make sure the dispatcher is not rejecting the robots.txt path.
Also, clear the dispatcher cache every time you change dispatcher/Apache rules, especially when modifying filter rules.
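One detail worth spelling out: with the [PT] flag, mod_rewrite hands the rewritten path back to the request processing chain, so the dispatcher filter sees /content/dam/test-project/robots.txt rather than /robots.txt, and it is that target path which must be allowed. A sketch of the rewrite fragment (the file location follows the usual Cloud Service dispatcher layout and is an assumption):

```
# conf.d/rewrites/rewrite.rules (assumed location)
RewriteEngine On
# the dot is escaped; because of [PT], the filter must allow the
# /content/dam/... target, not only /robots.txt
RewriteRule ^/robots\.txt$ /content/dam/test-project/robots.txt [NC,PT,L]
```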
I am still getting the error below in the dispatcher log, and a 404 from the browser with both the full path and the shortened path.
"GET /content/dam/test-project/robots.txt" - 0ms [publishfarm/-] [actionblocked]
Rewrite rules applied below
RewriteRule ^/robots.txt$ /content/dam/test-project/robots.txt [NC,PT,L]
Filter rules applied below
/0072 { /type "allow" /url "/content/dam/test-project/robots.txt"}
The solution: in the cloud dev environment the deployed changes, especially the dispatcher ones, were not taking effect; once we deployed to stage, it started working.
The cloud sometimes behaves in ways that are difficult to understand.
Thanks all for looking into it.
Cheers,
Pradeep