
SOLVED

How to apply X-Robots-Tag: noindex to PDF files in AEM?


Level 2

Hi,

I want to prevent PDF files stored in AEM from being indexed by search engines, while keeping the rest of the site indexable.

I read that this can be done by sending an X-Robots-Tag: noindex header for PDF files, but I’m not sure where or how to set this in AEM.

Can someone please guide me on the recommended and simplest way to do this?

Thanks
Saday

1 Accepted Solution


Correct answer by
Level 10

Hi @Sadaykumar,

I recommend setting this header at the Dispatcher level, not directly within the AEM application.

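In the Dispatcher's Apache configuration (for example, the virtual host), add a rule that matches PDF paths and sets the header. Because the pattern below is a URL path, it belongs in a <LocationMatch> block; <FilesMatch> compares only the file name, so a pattern starting with /content/dam would never match. The Header directive also requires mod_headers to be loaded. For example:

<LocationMatch "^/content/dam.*\.pdf$">
    Header set X-Robots-Tag "noindex, nofollow"
</LocationMatch>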
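To sanity-check which URLs the regular expression in this rule covers before deploying it, you can run the same pattern through Python's re module, which treats this simple expression the same way Apache's PCRE does. The sketch below tests it against request paths (the way <LocationMatch> applies it); the sample paths are hypothetical.

```python
import re

# The regular expression from the Dispatcher rule, applied to the URL path.
DAM_PDF = re.compile(r"^/content/dam.*\.pdf$")


def gets_noindex(path: str) -> bool:
    """Return True if a request path would match the DAM PDF rule."""
    return DAM_PDF.search(path) is not None


# Hypothetical sample paths.
for path in (
    "/content/dam/site/guides/install.pdf",  # DAM PDF: header added
    "/content/site/en/home.html",            # site page: stays indexable
    "/content/dam/site/images/logo.png",     # DAM image: not affected
):
    print(path, gets_noindex(path))
```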

 

Another option is to set the header at the CDN. If you are on AEM as a Cloud Service with the Adobe-managed CDN, you can use the response transformations feature of the CDN configuration (cdn.yaml) to do so:

kind: "CDN"
version: "1"
metadata:
  envTypes: ["prod"]
data:
  responseTransformations:
    rules:
      - name: add-x-robots-tag-to-pdfs
        when:
          reqProperty: path
          like: "*.pdf"
        actions:
          - type: set
            respHeader: X-Robots-Tag
            value: "noindex, nofollow"
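Whichever layer adds the header, it is worth confirming it on a live PDF response. The following is a minimal sketch using only the Python standard library: it sends a HEAD request and checks whether any X-Robots-Tag value carries noindex (or none, which search engines treat as noindex, nofollow). Note that some servers or CDNs may answer HEAD differently from GET.

```python
import re
import urllib.request

# Matches "noindex" or "none" as a whole directive, case-insensitively.
# "none" is shorthand for "noindex, nofollow".
_NOINDEX = re.compile(r"(?<![\w-])(?:noindex|none)(?![\w-])", re.IGNORECASE)


def has_noindex(header_values):
    """True if any X-Robots-Tag value carries noindex (or none).

    Also catches crawler-scoped values such as "googlebot: noindex".
    """
    return any(_NOINDEX.search(value) for value in header_values)


def fetch_x_robots(url, timeout=10):
    """Send a HEAD request and return every X-Robots-Tag value on the response."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get_all("X-Robots-Tag") or []
```

For example, has_noindex(fetch_x_robots("https://www.example.com/content/dam/site/guide.pdf")), where the URL is a placeholder for one of your published PDFs, should return True once the rule is live.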
 
 


4 Replies


Employee Advisor

Hello @Sadaykumar,

The preferred method is to configure this header at the Dispatcher level, not inside AEM itself. This ensures:

  • Minimal overhead on AEM.

  • No code or OSGi customization required.

  • It applies to every PDF served through the Dispatcher, whether the response comes from the cache or from the publish instance.

Example for Apache HTTP Server:

-> Add the following directive to your virtual host or another relevant section:
<FilesMatch "\.pdf$">
    Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>


-> Then reload Apache:
sudo service httpd reload

-> This applies the header to every response for a path ending in .pdf that is served through your Dispatcher.
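One detail worth checking with this pattern: Apache regular expressions are case-sensitive by default, so \.pdf$ does not match a file uploaded as GUIDE.PDF. PCRE's inline (?i) flag, as in <FilesMatch "(?i)\.pdf$">, makes the match case-insensitive. The Python sketch below compares the two patterns on hypothetical file names; Python's re handles both patterns the same way for these inputs.

```python
import re

AS_WRITTEN = re.compile(r"\.pdf$")    # pattern from the answer above
ANY_CASE = re.compile(r"(?i)\.pdf$")  # inline flag: case-insensitive


def matches(pattern, name):
    """Return True if the pattern finds a match in the file name."""
    return pattern.search(name) is not None


# Hypothetical file names as they might be uploaded to the DAM.
for name in ("guide.pdf", "GUIDE.PDF", "Report.Pdf", "notes.txt"):
    print(name, matches(AS_WRITTEN, name), matches(ANY_CASE, name))
```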


Employee

Hello @Sadaykumar,

If this is AEM as a Cloud Service (AEMaaCS), you can configure the Adobe-managed CDN through a cdn.yaml file kept in source control and deployed via the Cloud Manager Config Pipeline. Alongside traffic filter rules, this file supports response transformations, which instruct the CDN to add, modify, or remove headers on responses matching certain criteria (for example, "all PDFs in /content/dam").

Example: Add X-Robots-Tag for all PDFs in DAM

 

kind: "CDN"
version: "1"
metadata:
  envTypes: ["dev", "stage", "prod"]
data:
  responseTransformations:
    rules:
      - name: "add-x-robots-tag-noindex-to-pdfs"
        when:
          reqProperty: path
          like: "/content/dam/*.pdf"
        actions:
          - type: set
            respHeader: X-Robots-Tag
            value: "noindex"


The above example adds the header for all responses matching /content/dam/*.pdf.
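To reason about which paths that rule covers, the Python sketch below models the like condition with fnmatch.fnmatchcase. This rests on an assumption, not on Adobe's documentation: that like is a case-sensitive wildcard match in which * matches any run of characters, including /. Under that assumption, PDFs in nested DAM folders are covered, while an upper-case .PDF extension or a PDF outside /content/dam is not. Confirm the actual matching semantics against Adobe's CDN configuration documentation before relying on this.

```python
from fnmatch import fnmatchcase

# ASSUMPTION: the CDN "like" operator behaves like a case-sensitive glob in
# which "*" matches any characters, including "/". This is a local model only.
PATTERN = "/content/dam/*.pdf"


def rule_applies(path: str) -> bool:
    """Model whether the response transformation would fire for a request path."""
    return fnmatchcase(path, PATTERN)


# Hypothetical request paths.
for path in (
    "/content/dam/guide.pdf",
    "/content/dam/site/en/manuals/guide.pdf",
    "/content/dam/site/GUIDE.PDF",
    "/content/site/en/guide.pdf",
):
    print(path, rule_applies(path))
```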

References:
https://experienceleague.adobe.com/en/docs/experience-cloud-kcs/kbarticles/ka-24559
https://experienceleague.adobe.com/en/docs/experience-manager-cloud-service/content/implementing/con...


Level 6

Hi @Sadaykumar,

Please try the following:

Add a rule to your Apache Dispatcher configuration that sends the X-Robots-Tag: noindex header for PDF files:

<FilesMatch "\.pdf$">
    Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>


This tells search engines not to index the PDFs, and nofollow additionally tells them not to follow links inside the PDFs. No change is needed in AEM itself; just reload Apache after adding the rule.
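Since the goal is to hide PDFs while keeping the rest of the site indexable, it can help to audit both sides after reloading: PDFs that still lack noindex, and non-PDF pages that picked it up by mistake. The minimal Python sketch below works on a mapping of request path to X-Robots-Tag value (None when the header is absent), which you would collect yourself, for example with curl -I or a crawler. The sample data is hypothetical, and the check is a plain substring test that ignores the none shorthand.

```python
def audit(responses):
    """Split findings into PDFs missing noindex and non-PDFs carrying it.

    `responses` maps a request path to its X-Robots-Tag value, or None when
    the header was absent. Uses a plain substring test for "noindex".
    """
    pdfs_missing, pages_blocked = [], []
    for path, tag in responses.items():
        noindex = tag is not None and "noindex" in tag.lower()
        is_pdf = path.lower().endswith(".pdf")
        if is_pdf and not noindex:
            pdfs_missing.append(path)
        elif not is_pdf and noindex:
            pages_blocked.append(path)
    return pdfs_missing, pages_blocked


# Hypothetical results gathered from the published site.
sample = {
    "/content/dam/site/guide.pdf": "noindex, nofollow",
    "/content/dam/site/LEGACY.PDF": None,
    "/content/site/en/home.html": None,
}
missing, blocked = audit(sample)
print("PDFs without noindex:", missing)
print("Pages with noindex:", blocked)
```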

 

Thanks!