Just seeking some advice on best practices around handling deleted tags. As far as I understand...
Version: 6.3, SP1
In view of the above, my questions are as follows:
Look forward to a good discussion on this topic...
With the help of Query yo can find the tags which are referred in the page and using TagManager API com.day.cq.tagging.TagManager you can delete the tags from page and after that tags can be deleted from /etc/tags/
We should not give access to content authors to delete tags and it is good practice to have super users group who will have access to delete tags. so that we can avoid human errors here
Yes, when you delete tag the reference will not delete automatically, you need to wait for Garbage collection to run and clean up all references, and the GC runs every midnight, so you need to wait until garbage collector runs. If you don't want to wait then you must change the configurations and set it to 5 or 10mins, but this is not recommended as per my experience we will not delete tags very frequently so unnecessary we are putting the burden on the server.
To clear the references once the tag is deleted, you can manually go and change the configurations by setting the value to 5mins once the job is completed then revert it back to the original value
if you don't have an option to manually change server configurations then you can write a simple script which finds references of deleted tags using TagManager API and delete, you can write a servlet for this or add some option in the Admin dashboard.
I changed the configuration of the garbage collector on my local instance to run every 5 minute with the following expression
0/1 0/5 0 ? * * *
I can confirm that the TagGarbageCollector does not remove the references of the deleted tags from cq:tags property of the page's jcr:content node. Basically this is going to be a problem as searching for the deleted tag in UI shows up pages that contain this invalid reference now.
I can confirm too that Tag Garbage collector does not remove or fix references to the tag on pages. The only think it does is to remove old location of Tag after it was moved. It simply checks if tag from old location is still referenced on the pages, if not it removes that tag - that means the OOTB approach for this is that author is responsible to re-tag all the affected pages (remove tag that was moved/renamed and tag using that tag once again - but this time the new location of tag will be stored on the page level).
Thanks, Guys, I think my understanding was wrong on it, if that is the case then we must write custom logic. which Query all tags which are referenced on the page or component, then iterate collection and find out the deleted tag and finally remove from the page or component node.
Doing this in practical terms is untenable from a performance point of view. Imagine one tag that references 50K pages. You do this and you will bring down your author instance or whatever instance you do this on. Then there is the question of what will you do with vast number of pages that have now been modified on Author? Publish them? What are the consequences of that (surely you won't run a massive query and edit pages on publish). The list goes on.
The only realistic option I can think of is to do the upfront work and figure out your taxonomy. Don't let authors delete tags and as per documentation, move or rename tags. For those things it has to be an admin activity, the entire thing has to be planned and executed. In other words, deleting tags is not a BAU author activity.
I'd love to hear from Adobe staff if there is an alternative view!