
SOLVED

Adding a DAM predicate for NoRobots.txt


Former Community Member

Hi

I'm running CQ 5.6.0 and currently have PDF DAM assets that are externally searchable via internet search engines. I would like to be able to customise the DAM so that I can define at an asset level whether or not a particular asset should be searchable in Google, Bing etc.

My plan is to create a property at the asset level which, once selected, adds the path to a NoRobots.txt file.
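
To illustrate, what I'm picturing is that flagged assets would end up as Disallow entries in a robots.txt along these lines (the asset paths below are just made-up examples):

```
User-agent: *
Disallow: /content/dam/products/widget-a/datasheet-2011.pdf
Disallow: /content/dam/products/widget-a/datasheet-2012.pdf
```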

My questions are: 

  1. How do I create a property per DAM asset for a NoRobots flag?
  2. How do I create a NoRobots.txt file that can be populated with pages from 'Sites' as well as the DAM?
  3. Is there a better way of achieving the same result?

Many thanks in advance

Fabio


9 Replies


Level 6

Hi,

Have you tried changing the permissions on the assets or folders? Try removing the permissions for the Everyone group; that should work.

Thanks


Former Community Member

Hi

 

Thanks for the fast response!

I'm looking for a way to control which PDFs are searchable and which aren't, something I can hand over to the users of the system. Defining permissions on a folder sounds like a superuser/admin role, which my users won't be allowed to have.

Is there another way or have I misunderstood?

thanks

Fabio 


Level 10

Hi Fabio,

I just wanted to be clear on your exact requirement, so here is what I understand (correct me if I am wrong):

You have some PDFs in your DAM which are not in a single folder (they may be distributed across folders).

Are you planning for this hide-from-search-engines feature to be dynamic? That is, today PDF "A" can be hidden from search and tomorrow it may not be.

[OR]

A particular PDF will always be hidden, and PDFs which are not hidden will never be hidden.

Please correct me if I am wrong.


Former Community Member

Almost ;)

 

The DAM folder structure reflects the company's product structure, with one folder per product. Within a product's folder there are PDFs which describe the product. The users would like to add new versions of the PDFs with new URLs, so that they can direct customers to the PDF that was most appropriate at the time they signed up to a product. However, they only want the latest version of a PDF to be externally searchable by Google, Bing, Yahoo, etc.

To solve this we have directed the users to upload new versions of the PDF, warning them to make sure the file name is different so that we get a new path. What I'm stuck on is how to stop a search engine from finding a particular, defined asset in the DAM.

So the requirements are:

  • somehow be able to make a PDF that resides in any DAM folder non-searchable by a search engine, while other items in the same DAM folder remain searchable;
  • users must be able to control what is searchable by search engines;
  • it is very likely that a PDF defined as searchable today will be made non-searchable by a user tomorrow.

Thanks in advance

 

Fabio


Correct answer by
Level 10

Two things here:

  1. You can create versions of a PDF as well; this way only one PDF will be present for each product. These versions are similar to the versions we have for pages. You can then use the same name for the new file. This approach will not require you to hide previous files for the same product explicitly, as they will not be visible.

  2. If you still want to go with a new file for every version, then I recommend you follow a naming convention for the latest file, so that your Java code can ignore that file under each product and collect the names of all the other files. Once you have the names, you can add those paths to robots.txt. Make sure you follow the robots.txt conventions when updating it programmatically.
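
A rough sketch of the kind of Java code I mean, assuming a hypothetical "-latest.pdf" suffix as the naming convention and a /content/dam/products root folder (adjust both to your structure):

```java
import java.util.ArrayList;
import java.util.List;

import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

public class RobotsTxtBuilder {

    // Hypothetical naming convention: the latest file for a product ends in "-latest.pdf"
    private static final String LATEST_SUFFIX = "-latest.pdf";

    /**
     * Walks the product folders under an example root and collects the paths of
     * every PDF that is NOT the "latest" one, so they can be disallowed in robots.txt.
     */
    public List<String> collectHiddenPdfPaths(Session session) throws RepositoryException {
        List<String> disallowed = new ArrayList<String>();
        Node productsRoot = session.getNode("/content/dam/products"); // example root folder

        for (NodeIterator products = productsRoot.getNodes(); products.hasNext();) {
            Node productFolder = products.nextNode();
            for (NodeIterator assets = productFolder.getNodes(); assets.hasNext();) {
                Node asset = assets.nextNode();
                String name = asset.getName();
                if (name.endsWith(".pdf") && !name.endsWith(LATEST_SUFFIX)) {
                    disallowed.add(asset.getPath());
                }
            }
        }
        return disallowed;
    }

    /** Turns the collected paths into robots.txt content. */
    public String buildRobotsTxt(List<String> disallowedPaths) {
        StringBuilder sb = new StringBuilder("User-agent: *\n");
        for (String path : disallowedPaths) {
            sb.append("Disallow: ").append(path).append('\n');
        }
        return sb.toString();
    }
}
```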

Let me know what you think.


Former Community Member

You can create versions of a PDF as well; this way only one PDF will be present for each product. These versions are similar to the versions we have for pages. You can then use the same name for the new file. This approach will not require you to hide previous files for the same product explicitly, as they will not be visible.

 

That won't work, as one of my requirements is that the old pages remain accessible if someone hits the link for the old PDF directly. Only the new version needs to be searchable by Google.

 

Once you have the names, you can add those paths to robots.txt. Make sure you follow the robots.txt conventions when updating it programmatically.

This is the way I was envisaging it working. The question is how to create and update the robots.txt file. Is there a way to add a property to a DAM asset so that I can programmatically create the robots.txt file with the required paths?


Level 10

If you want to add a property to a DAM asset, you can probably provide a custom dialog [OR] customize the OOTB DAM metadata fields to add an extra property to assets.

Otherwise, as I suggested, you can go with a naming convention for the files (this would take less time and be easier).

You can create the robots.txt file via Java with the MIME type set to "text/plain" and upload it to your /content/[project]/robots.txt path.
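
A minimal sketch of that last step using the plain JCR API; the parent path is whatever /content/[project] is in your setup, and you would normally run this from a service with a session that has write access:

```java
import java.io.ByteArrayInputStream;
import java.io.UnsupportedEncodingException;
import java.util.Calendar;

import javax.jcr.Binary;
import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

public class RobotsTxtWriter {

    /**
     * Writes (or overwrites) robots.txt as an nt:file under the given parent
     * (e.g. your /content/[project] path), with MIME type text/plain.
     */
    public void writeRobotsTxt(Session session, String parentPath, String robotsTxtContent)
            throws RepositoryException, UnsupportedEncodingException {
        Node parent = session.getNode(parentPath);

        // Replace any existing robots.txt so repeated updates are idempotent
        if (parent.hasNode("robots.txt")) {
            parent.getNode("robots.txt").remove();
        }

        Node file = parent.addNode("robots.txt", "nt:file");
        Node content = file.addNode("jcr:content", "nt:resource");
        content.setProperty("jcr:mimeType", "text/plain");
        content.setProperty("jcr:lastModified", Calendar.getInstance());

        Binary data = session.getValueFactory()
                .createBinary(new ByteArrayInputStream(robotsTxtContent.getBytes("UTF-8")));
        content.setProperty("jcr:data", data);

        session.save();
    }
}
```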


Level 10

I do not think there is a way to prevent a Google crawl on PDF 1 while allowing a crawl on PDF 2 in the same location. I will double-check.