SOLVED

DAM Assets Issue on disabling Google search

Level 1

Hi,


The requirement is:

We need to make a CQ page non-searchable on Google. For that, we added <meta name="robots" content="noindex"> at the template level and provided a checkbox in the page properties so that an author can disable search for a specific page.
Now the challenge is assets (PDFs). I added a custom checkbox, but I don't know where to place this <meta> tag, since assets have no templates the way pages do. I need help with this.

 

Thanks,
Bhargav
1 Accepted Solution

Correct answer by
Level 10

Yes,

As you said, you have implemented a checkbox for the asset. Your next task is to implement an event handler, workflow, or scheduler that updates the robots.txt file dynamically.

Example:

Whenever a user checks or unchecks the custom checkbox, the value is stored in the JCR. An event handler, workflow, or scheduler then reads that stored property and updates the robots.txt file accordingly.

Have doubts? Let me know.
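
A minimal sketch of the robots.txt generation step, assuming the event handler, workflow, or scheduler has already collected the paths of the assets whose checkbox is set. The class name RobotsTxtBuilder and the method build are hypothetical, and no AEM or JCR API is used here; reading the stored property and writing the file are left out. Keep in mind that crawlers only request robots.txt from the root of the host, so wherever the file is stored it has to be served as /robots.txt, and the paths in it should be the URL paths that crawlers actually see.

```java
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper: turns a list of blocked asset paths into robots.txt text.
final class RobotsTxtBuilder {

    static String build(List<String> blockedAssetPaths) {
        StringBuilder out = new StringBuilder("User-agent: *\n");
        if (blockedAssetPaths.isEmpty()) {
            // An empty Disallow value means nothing is blocked.
            return out.append("Disallow:\n").toString();
        }
        // TreeSet drops duplicate paths and keeps the output order stable.
        for (String path : new TreeSet<>(blockedAssetPaths)) {
            out.append("Disallow: ").append(path).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Example path taken from this thread.
        System.out.print(build(List.of(
                "/content/dam/geometrixx/documents/GeoSphere_Datasheet.pdf")));
    }
}
```

Regenerating the whole file from the current set of flagged assets, rather than patching single lines, keeps the file consistent when a checkbox is unchecked again.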

7 Replies

Level 10

Hi Bhargav,

A similar question was asked some time back, where a user did not want a PDF to be searchable.

Please see the thread and let me know if you have any doubts about it.

Thread : http://help-forums.adobe.com/content/adobeforums/en/experience-manager-forum/adobe-experience-manage...

Thanks

Level 1

How can we test whether it is working properly?

 

Thanks,

Bhargav

Level 1

So I need to update the robots.txt at /content/<project> with the asset path (for example, /content/dam/geometrixx/documents/GeoSphere_Datasheet.pdf)?

One more doubt: is the <meta> tag I mentioned in the question no longer needed, since we are controlling this from the checkbox?

Please correct me if I am wrong.

 

Thanks,

Bhargav

Level 10

Yes, you are on the right track. You need to mention your PDF file path.

PDFs have no <meta> tag the way pages do, but as you mentioned, you can keep using these tags for pages.

Here is more for you:

1) Use robots.txt to block the files from search engine crawlers:

User-agent: *
Disallow: /pdfs/   # Block the /pdfs/ directory.
Disallow: *.pdf    # Block PDF files. Non-standard, but works for major search engines.

2) Use rel="nofollow" on links to those PDFs

<a href="something.pdf" rel="nofollow">Download PDF</a>

Complete Documentation: http://www.robotstxt.org/robotstxt.html
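
For a quick local check of which paths a set of Disallow rules covers, here is a rough sketch. It assumes a simplified reading of the rules: prefix matching, `*` as a wildcard, and a trailing `$` as an end anchor. It ignores Allow lines and per-crawler User-agent groups, so it is not a full robots.txt parser. The class RobotsRuleCheck and its methods are hypothetical names.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: checks a URL path against simplified Disallow rules.
final class RobotsRuleCheck {

    // '*' matches any run of characters, a trailing '$' anchors the end,
    // and every other character is matched literally.
    static Pattern toRegex(String rule) {
        boolean anchored = rule.endsWith("$");
        String body = anchored ? rule.substring(0, rule.length() - 1) : rule;
        StringBuilder regex = new StringBuilder();
        for (char c : body.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        if (anchored) {
            regex.append('$');
        }
        return Pattern.compile(regex.toString());
    }

    // A path is blocked when any non-empty rule matches from its start.
    static boolean isBlocked(List<String> disallowRules, String urlPath) {
        for (String rule : disallowRules) {
            if (!rule.isEmpty() && toRegex(rule).matcher(urlPath).lookingAt()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> rules = List.of("/pdfs/", "*.pdf");
        System.out.println(isBlocked(rules,
                "/content/dam/geometrixx/documents/GeoSphere_Datasheet.pdf"));
        System.out.println(isBlocked(rules, "/content/geometrixx/en.html"));
    }
}
```

This only shows what the rules say; whether a given search engine honors them still has to be confirmed with that engine's own tools.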

Any doubts? Let me know.

Thanks