Not able to create and use robots.txt in AEM using two different approaches

cqsapientu69896

27-11-2016

I am trying to create a robots.txt file in AEM by following these two links:

[1] http://www.wemblog.com/2013/06/how-to-implement-robotstxt-sitemapxml.html

[2] https://forums.adobe.com/thread/896654

Following [1], the text file does not work at all. We can create a simple nt:file named robots.txt in CRX, but when we request it at

http://localhost:4502/content/mysite/en/robots.txt

the browser downloads the file instead of displaying its content on the page.

I also enabled the txt rendition (Enable Plain Text in the Apache Sling GET Servlet), but the result was the same.

Following [2], when you print a page property in Sightly using ${pageProperties.robotsTextContent}, the property is printed on one line in the .html page:

User-agent: * Disallow: / New text: q Newer Text:s

although it was entered on separate lines in the text area. Line separators matter in a robots.txt file, so the property needs to be output on separate lines, exactly as authored in the page properties. I tried @context='text' and @context='html' in the Sightly file, but the '\n' was printed as is on the page, without a line break.
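One likely explanation for the single-line output (an assumption, not confirmed in this thread) is that the browser collapses whitespace, including line breaks, when the response is rendered as HTML. Written to a text/plain response instead, the line breaks would be shown as sent. Below is a minimal plain-Java sketch of cleaning up the authored value before writing it out. The class and method names are hypothetical, and the Sling wiring (reading the page property, setting the content type) is left out.

```java
class RobotsTextSketch {

    // Converts Windows (\r\n) and bare \r line endings to \n and makes
    // sure the text ends with exactly one newline.
    static String normalize(String authored) {
        String unified = authored.replace("\r\n", "\n").replace('\r', '\n');
        int end = unified.length();
        // Drop any trailing newlines before adding a single one back.
        while (end > 0 && unified.charAt(end - 1) == '\n') {
            end--;
        }
        return unified.substring(0, end) + "\n";
    }

    public static void main(String[] args) {
        // Hypothetical value as it might come from a multi-line text area.
        String authored = "User-agent: *\r\nDisallow: /";
        System.out.print(normalize(authored));
    }
}
```

The normalized string would then be written to the response as plain text rather than embedded in an HTML page.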

Somehow the txt rendition is not working either. After creating a robotstext.txt.html file and enabling the text rendition in the Apache Sling GET Servlet, requesting the page outputs the following:

http://localhost:4502/content/mysite/en/robots.txt

** Resource dumped by PlainTextRendererServlet**
Resource path:/content/mysite/en/robots
Resource metadata: {sling.modificationTime=-1, sling.characterEncoding=null, sling.parameterMap={}, sling.contentType=null, sling.creationTime=-1, sling.contentLength=-1, sling.resolutionPath=/content/mysite/en/robots, sling.resolutionPathInfo=.txt}
Resource type: cq:Page
Resource super type: -

** Resource properties **
jcr:primaryType: cq:Page
jcr:createdBy: admin
jcr:created: java.util.GregorianCalendar[time=1480233560436,areFieldsSet=true,areAllFieldsSet=true,lenient=false,zone=sun.util.calendar.ZoneInfo[id="GMT+11:00",offset=39600000,dstSavings=0,useDaylight=false,transitions=0,lastRule=null],firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=1,YEAR=2016,MONTH=10,WEEK_OF_YEAR=49,WEEK_OF_MONTH=5,DAY_OF_MONTH=27,DAY_OF_YEAR=332,DAY_OF_WEEK=1,DAY_OF_WEEK_IN_MONTH=4,AM_PM=1,HOUR=6,HOUR_OF_DAY=18,MINUTE=59,SECOND=20,MILLISECOND=436,ZONE_OFFSET=39600000,DST_OFFSET=0]

When the text rendition is disabled in the Apache Sling GET Servlet, the request returns an error.

Has anyone else tried putting a robots.txt file in AEM? Why is the OOTB text rendition not working as expected in AEM?

Any pointers for either of the two approaches will be highly appreciated.

Accepted Solutions (1)


PuzanovsP

MVP

29-11-2016

There should be absolutely nothing wrong with Sling (Sling is A.W.E.S.O.M.E).

Look at what you are asking Sling to do, and try to divide it into smaller tasks:

1) Print the entire string from a static Sling servlet

2) Read the file from the DAM and print it in the Sling servlet

3) Create a selector and, based on that selector, read the file from the DAM

Hope it makes sense.
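The first two tasks come down to writing the robots.txt bytes to the response unchanged. Here is a minimal plain-Java sketch of that copy step, with hypothetical class and method names. In a real Sling servlet the input stream would come from the file stored in the DAM and the output stream from the response, with the content type set to text/plain; that wiring needs the Sling API and is left out here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class RobotsStreamSketch {

    // Copies the stored file to the output byte for byte, so the
    // authored line breaks are passed on unchanged.
    static void copy(InputStream in, OutputStream out) {
        byte[] buffer = new byte[4096];
        try {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-ins for the DAM file and the servlet response.
        byte[] stored = "User-agent: *\nDisallow: /\n".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream response = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(stored), response);
        System.out.print(new String(response.toByteArray(), StandardCharsets.UTF_8));
    }
}
```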

Regards,

Peter

Answers (4)


sreenivasb1988

02-07-2020

I tried the following in Apache and it worked:

 

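# Header is independent of RewriteCond and applies to every response in this context
Header unset Content-Disposition
# Map any request ending in robots.txt to the file stored in the DAM
RewriteCond %{REQUEST_URI} robots\.txt$
RewriteRule ^(.*?)$ /content/dam/path$1 [NC,PT]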

OlivBur

15-02-2018

Hi,

Here is another solution, showing how we did it:

We simply use the power of the Apache Sling framework (https://sling.apache.org) and its resource resolution mechanism (Sling's URL decomposition).

Follow these steps:

  1. Create a JSP or HTL script that simply outputs the plain text content, or reads the string content from a property. Following Sling's resource resolution mechanism, you can name the file EXTENSION.SCRIPT_EXTENSION, e.g. txt.jsp or txt.html.
  2. Add the content, e.g. a "robots" node (if you would like to provide the robots.txt file) with primaryType "nt:unstructured" and with "sling:resourceType" set to your script/servlet, e.g. "myapp/components/robotstxt".
  3. Update your Vault package filter.xml:
    <filter root="/robots" />
  4. That's it. Call your URL to test, e.g. http://myhost/robots.txt
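The steps above rely on how Sling splits a request path into a resource path, selectors, and an extension. The sketch below is a simplified plain-Java model of that decomposition, not Sling's actual implementation: it ignores suffixes, mappings, and vanity paths, and every class and method name in it is hypothetical. It assumes the set of existing resource paths is known up front.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SlingUrlSketch {

    static class Parts {
        final String resourcePath;
        final List<String> selectors;
        final String extension; // null when the whole path matched a resource

        Parts(String resourcePath, List<String> selectors, String extension) {
            this.resourcePath = resourcePath;
            this.selectors = selectors;
            this.extension = extension;
        }
    }

    // Finds the longest prefix that names an existing resource by cutting
    // the path at dots from right to left, then splits the remainder into
    // selectors and extension. Returns null if no resource matches.
    static Parts decompose(String requestPath, Set<String> existingPaths) {
        String candidate = requestPath;
        while (!existingPaths.contains(candidate)) {
            int dot = candidate.lastIndexOf('.');
            int slash = candidate.lastIndexOf('/');
            if (dot <= slash) {
                return null;
            }
            candidate = candidate.substring(0, dot);
        }
        String rest = requestPath.substring(candidate.length());
        List<String> selectors = new ArrayList<>();
        String extension = null;
        if (!rest.isEmpty()) {
            String[] tokens = rest.substring(1).split("\\.");
            extension = tokens[tokens.length - 1];
            selectors.addAll(Arrays.asList(tokens).subList(0, tokens.length - 1));
        }
        return new Parts(candidate, selectors, extension);
    }

    public static void main(String[] args) {
        Set<String> existing = new HashSet<>(Arrays.asList("/robots"));
        Parts parts = decompose("/robots.txt", existing);
        System.out.println(parts.resourcePath + " | " + parts.selectors + " | " + parts.extension);
    }
}
```

In this model, a request for /robots.txt against a /robots node leaves "txt" as the extension, which is what lets a script named txt.jsp or txt.html under the node's resource type be picked. If a node named robots.txt exists instead, the whole path matches and no extension is left over.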

Of course, you need to check that your web server and/or Dispatcher configuration lets these requests through to the AEM instance.

This solution has several advantages: it does not require any special OSGi configuration, it can easily be integrated into a continuous delivery/deployment workflow, and it can easily be extended if the file content should be customizable by the user.

leeasling

29-11-2016

We place the robots.txt in a folder in the DAM and then have the dispatcher redirect any requests for it to the appropriate site's robots.txt file. Simple and works.

jodib52714948

29-11-2016

I'm not a developer, but I can tell you what we did. We used a template that we had created for raw content. We created a page under our en_us locale and named it robots. In the page properties, we put robots.txt in the vanity URL. Since this was a custom template, we also had a tab for raw content, where we just pasted our User-Agent: * and Disallow: / paths. Our SEO partners approved this method as well. Not sure if this will help you or not and I could be missing a piece since I'm not a dev but thought I'd throw it out there...good luck!