Handling deleted tags in AEM

av-ey

03-10-2018

Hi everyone,

Just seeking some advice on best practices around handling deleted tags. As far as I understand...

Version: 6.3, SP1

  1. Deleting a tag whilst it's referenced by a number of content nodes (pages) will remove the tag from /etc/tags, but it won't remove it from the pages' cq:tags property.
  2. The com.day.cq.tagging.impl.TagGarbageCollector background job will clean up tags that are no longer referenced by any pages.
  3. The cq:tags property of the pages will still contain references to the deleted tag until they are removed manually.
  4. The deleted tag still shows up in the page properties section of the Touch UI.

In view of the above, my questions are as follows:

  1. What is the expected behaviour of the TagManager API? Will it return the pages that still contain references to the deleted tags?
  2. Searching for the deleted tag in the Touch UI returns the pages that were previously tagged with it. Is that expected behaviour?
  3. Is my assumption correct that automatically removing the deleted tags (from the cq:tags property) from the referencing pages is bad practice, especially where a large number of pages were tagged with it (high read/write load, potentially killing off the author instance)?
  4. Is my assumption correct that, for the above reasons, deleting tags should be done seldom and that this type of activity should not be performed by regular authors?
  5. What would be the recommended way to remove references to deleted tags from a large number of pages? Is the answer to figure out the taxonomy to start with and not mess with it?

Look forward to a good discussion on this topic...

Thanks,

Arup

Replies

smacdonald2008

03-10-2018

For point 1, did you try it? Apply a tag to a page, delete the tag, and then use the API to see whether that page is still returned. Try invoking the find method and see if that resource is returned.

smacdonald2008

03-10-2018

Also, I advise you to watch this session to learn more about assets and tags: Explore AEM Assets and Tags by their APIs

av-ey

03-10-2018

Sadly the session doesn't really address the crux of the questions I have...

Arun_Patidar

MVP

03-10-2018

Hi,

With the help of a query you can find the tags that are referenced in the pages, and using the TagManager API (com.day.cq.tagging.TagManager) you can remove the tags from the pages. After that, the tags can be deleted from /etc/tags/.
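
To make the "find it with a query" step a bit more concrete, here is a minimal sketch that only builds a JCR-SQL2 statement for page content nodes whose cq:tags still holds a given tag ID. It assumes the references live on cq:PageContent nodes under a root such as /content and that cq:tags stores tag IDs; the class name, method name and the example tag ID are hypothetical. Running the query and editing the pages is not shown.

```java
import java.util.Objects;

/** Sketch: builds a JCR-SQL2 query for page content nodes that still carry a given tag ID. */
final class TagReferenceQuery {

    private TagReferenceQuery() {}

    /**
     * @param rootPath content root to search under, e.g. "/content" (assumption)
     * @param tagId    tag ID as stored in cq:tags, e.g. "marketing:interest/product" (hypothetical)
     */
    static String build(String rootPath, String tagId) {
        Objects.requireNonNull(rootPath, "rootPath");
        Objects.requireNonNull(tagId, "tagId");
        // JCR-SQL2 string literals escape a single quote by doubling it.
        String safeRoot = rootPath.replace("'", "''");
        String safeTag = tagId.replace("'", "''");
        // An equality test on a multi-valued property matches if any value matches.
        return "SELECT * FROM [cq:PageContent] AS c"
                + " WHERE ISDESCENDANTNODE(c, '" + safeRoot + "')"
                + " AND c.[cq:tags] = '" + safeTag + "'";
    }

    public static void main(String[] args) {
        // Prints the statement for a hypothetical tag ID.
        System.out.println(build("/content", "marketing:interest/product"));
    }
}
```

The resulting string could be handed to whatever query mechanism you already use on the instance. Tags set on component nodes rather than on jcr:content would need a different node type in the FROM clause.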

raj_mandalapu

04-10-2018

We should not give content authors access to delete tags. It is good practice to have a super-users group that has access to delete tags, so that we can avoid human error here.

Yes, when you delete a tag, the references are not deleted automatically. You need to wait for garbage collection to run and clean up all references; the GC runs every midnight. If you don't want to wait, you must change the configuration and set it to 5 or 10 minutes, but I don't recommend this: in my experience tags are not deleted very frequently, so we would be putting an unnecessary burden on the server.

To clear the references once the tag is deleted, you can manually change the configuration, setting the value to 5 minutes, and revert it to the original value once the job has completed.

If you don't have the option to change server configurations manually, you can write a simple script that finds references to deleted tags using the TagManager API and deletes them. You could write a servlet for this or add an option to the Admin dashboard.

av-ey

04-10-2018

I changed the configuration of the garbage collector on my local instance to run every 5 minutes with the following expression:

0/1 0/5 0 ? * * *

I can confirm that the TagGarbageCollector does not remove the references to deleted tags from the cq:tags property of the page's jcr:content node. This is going to be a problem, because searching for the deleted tag in the UI still returns the pages that contain this now-invalid reference.

Marcin_Czeczko

04-10-2018

I can also confirm that the Tag Garbage Collector does not remove or fix references to the tag on pages. The only thing it does is remove the old location of a tag after the tag has been moved. It simply checks whether the tag at the old location is still referenced on any pages and, if not, removes it. That means the OOTB approach is that the author is responsible for re-tagging all the affected pages: remove the tag that was moved or renamed, then apply it again, so that this time the new location of the tag is stored at the page level.

raj_mandalapu

05-10-2018

Thanks, guys. I think my understanding of this was wrong. If that is the case, then we must write custom logic that queries all tags referenced on the page or component, iterates over the collection to find the deleted tags, and finally removes them from the page or component node.
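
The "iterate, find the deleted tags, remove them" part of that logic can be sketched without any AEM dependency. This is only an illustration under stated assumptions: the existence check is passed in by the caller (in AEM it might be something like id -> tagManager.resolve(id) != null, which is not exercised here), and the class name, method names and tag IDs are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Sketch: drops tag IDs that no longer resolve from a page's cq:tags values. */
final class StaleTagFilter {

    private StaleTagFilter() {}

    /**
     * Returns the tag IDs that still exist, in their original order.
     * tagExists is supplied by the caller; wiring it to the real tag store is an assumption.
     */
    static String[] keepExisting(String[] cqTags, Predicate<String> tagExists) {
        List<String> kept = new ArrayList<>();
        for (String id : cqTags) {
            if (id != null && tagExists.test(id)) {
                kept.add(id);
            }
        }
        return kept.toArray(new String[0]);
    }

    /** True when filtering removed something, i.e. the page would actually need a write. */
    static boolean needsUpdate(String[] before, String[] after) {
        return before.length != after.length;
    }

    public static void main(String[] args) {
        // Hypothetical data: "topic:legacy" stands in for a deleted tag.
        Set<String> existing = new HashSet<>(Arrays.asList("geo:emea", "topic:cloud"));
        String[] before = {"geo:emea", "topic:legacy", "topic:cloud"};
        String[] after = keepExisting(before, existing::contains);
        System.out.println(Arrays.toString(after) + " needsUpdate=" + needsUpdate(before, after));
    }
}
```

Checking needsUpdate before writing means pages without stale references are left untouched, which keeps the number of modified pages down.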

av-ey

05-10-2018

Doing this is untenable in practical terms from a performance point of view. Imagine one tag that is referenced by 50K pages. Do this and you will bring down your author instance, or whatever instance you do it on. Then there is the question of what to do with the vast number of pages that have now been modified on Author. Publish them? What are the consequences of that? (Surely you won't run a massive query and edit pages on publish.) The list goes on.
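
For anyone who does have to attempt such a cleanup, the usual way to soften the load concern is to work through the pages in small slices and save once per slice, rather than in one large write. A minimal, AEM-free sketch of that slicing follows; the class and method names are hypothetical, and the handler is where the per-page update and a save (for example resourceResolver.commit()) would go, which is an assumption and not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Sketch: feeds a large list of items to a handler in bounded slices. */
final class BatchRunner {

    private BatchRunner() {}

    /**
     * Calls handler once per slice of at most batchSize items and returns the number of slices.
     * Each slice is a view on the original list, so nothing is copied.
     */
    static <T> int run(List<T> items, int batchSize, Consumer<List<T>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            handler.accept(items.subList(start, end));
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // Hypothetical page paths; the handler only records slice sizes here.
        List<String> pages = Arrays.asList("/content/a", "/content/b", "/content/c");
        List<Integer> sizes = new ArrayList<>();
        int batches = run(pages, 2, slice -> sizes.add(slice.size()));
        System.out.println(batches + " batches, sizes " + sizes);
    }
}
```

This only bounds the size of each write. It does not answer what to do about publishing the modified pages.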

The only realistic option I can think of is to do the upfront work and figure out your taxonomy. Don't let authors delete tags and as per documentation, move or rename tags. For those things it has to be an admin activity, the entire thing has to be planned and executed. In other words, deleting tags is not a BAU author activity.

I'd love to hear from Adobe staff if there is an alternative view!