Hi,
I want to prevent PDF files stored in AEM from being indexed by search engines, while keeping the rest of the site indexable.
I read that this can be done by sending an X-Robots-Tag: noindex header for PDF files, but I’m not sure where or how to set this in AEM.
Can someone please guide me on the recommended and simplest way to do this?
Thanks
Saday
hi @Sadaykumar,
I recommend setting this header at the Dispatcher level, not directly within the AEM application.
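In the Dispatcher (Apache) configuration, add a rule that sets this header on PDF responses. Because the pattern below matches on the URL path, it uses LocationMatch; FilesMatch only tests the file name, so a pattern starting with /content/dam would never match there. For example:
<LocationMatch "^/content/dam/.*\.pdf$">
    Header set X-Robots-Tag "noindex, nofollow"
</LocationMatch>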
Another option is to set the header at the CDN level. If you're using Adobe CDN, you can leverage the "Response Transformations" feature to do so:
kind: "CDN"
version: "1"
metadata:
  envTypes: ["prod"]
data:
  responseTransformations:
    rules:
      - name: add-x-robots-tag-to-pdfs
        when:
          reqProperty: path
          like: "*.pdf"
        actions:
          - type: set
            respHeader: X-Robots-Tag
            value: "noindex, nofollow"
Hello @Sadaykumar ,
The preferred method is to configure this header at the Dispatcher level, not inside AEM itself. This ensures:
Minimal overhead on AEM.
No code or OSGi customization required.
It works for all PDFs served through the dispatcher cache.
-> Add the following directive in your virtual host or relevant section:
<FilesMatch "\.pdf$">
Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
-> Then reload Apache:
sudo service httpd reload
-> This will automatically apply the header to all .pdf responses served through your Dispatcher.
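If it helps, here is a rough sketch of what the `\.pdf$` pattern does and does not catch. It uses Python's `re` module as a stand-in for Apache's regex engine (an assumption that should hold for a pattern this simple), and the sample paths are made up. The `(?i)` variant is my own suggestion, not part of the answer above, for sites that also have files ending in `.PDF`.

```python
import re

# The pattern from the FilesMatch example above.
ANY_PDF = re.compile(r"\.pdf$")

# Hypothetical case-insensitive variant (not in the original answer).
ANY_PDF_ANY_CASE = re.compile(r"(?i)\.pdf$")

# Made-up sample paths, for illustration only.
SAMPLE_PATHS = [
    "/content/dam/site/brochure.pdf",
    "/content/dam/site/Brochure.PDF",
    "/content/site/en/page.html",
    "/content/dam/site/brochure.pdf.html",
]


def matched(pattern, paths):
    """Return the paths in which the pattern finds a match."""
    return [p for p in paths if pattern.search(p)]


print(matched(ANY_PDF, SAMPLE_PATHS))
# ['/content/dam/site/brochure.pdf']
print(matched(ANY_PDF_ANY_CASE, SAMPLE_PATHS))
# ['/content/dam/site/brochure.pdf', '/content/dam/site/Brochure.PDF']
```

So with the pattern as written, an upper-case `.PDF` extension would not get the header. If that matters for your assets, `<FilesMatch "(?i)\.pdf$">` is one way to cover both.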
Hello @Sadaykumar
If this is AEMaaCS, you can configure the Adobe-managed CDN through a cdn.yaml file that you keep in source control and deploy via the Cloud Manager config pipeline.
Its response transformation rules let you add, modify, or remove headers on responses matching certain criteria (for example, "all PDFs in /content/dam").
Example: Add X-Robots-Tag for all PDFs in DAM
kind: "CDN"
version: "1"
metadata:
  envTypes: ["dev", "stage", "prod"]
data:
  responseTransformations:
    rules:
      - name: "add-x-robots-tag-noindex-to-pdfs"
        when:
          reqProperty: path
          like: "/content/dam/*.pdf"
        actions:
          - type: set
            respHeader: X-Robots-Tag
            value: "noindex"
The above example adds the header for all responses matching /content/dam/*.pdf.
References:
https://experienceleague.adobe.com/en/docs/experience-cloud-kcs/kbarticles/ka-24559
https://experienceleague.adobe.com/en/docs/experience-manager-cloud-service/content/implementing/con...
Hi @Sadaykumar,
Please try the following:
Add a rule in your Apache Dispatcher config to send the X-Robots-Tag: noindex header for PDF files:
<FilesMatch "\.pdf$">
Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
This tells search engines not to index PDFs. No need to change anything in AEM itself. Just reload Apache after adding the rule.
Thanks!
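Whichever option you pick, it is worth checking that the header really reaches the client. Below is a minimal sketch in Python using only the standard library. The function names and the example URLs are hypothetical, and the parser only handles the simple comma-separated form of the header (not values prefixed with a specific crawler name).

```python
from urllib.request import Request, urlopen


def blocks_indexing(x_robots_tag):
    """Return True if an X-Robots-Tag value contains the 'noindex' directive.

    Only the plain comma-separated form is handled, e.g. "noindex, nofollow".
    """
    if not x_robots_tag:
        return False
    directives = [d.strip().lower() for d in x_robots_tag.split(",")]
    return "noindex" in directives


def fetch_x_robots_tag(url):
    """Send a HEAD request and return the X-Robots-Tag value, or None if absent."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        return response.headers.get("X-Robots-Tag")


# Example usage (placeholder URLs, replace with a real PDF and page on your site):
#   value = fetch_x_robots_tag("https://www.example.com/content/dam/docs/sample.pdf")
#   print(value, blocks_indexing(value))
#   value = fetch_x_robots_tag("https://www.example.com/content/site/en.html")
#   print(value, blocks_indexing(value))
```

The PDF URL should report the header and the HTML page should not. Test against the public domain rather than the publish instance directly, so the request passes through the Dispatcher or CDN layer where the header is added.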