Getting rid of duplicate assets

Community Advisor

Hello everybody,

I am looking for a way to get rid of all the duplicate assets in our DAM.
Can somebody point me to a guide on how to create a workflow that does the following:

Step 1: Create a list of the "jcr:content/metadata/dam:sha1" value of every asset
Step 2: Compare the values and find the duplicates

Step 3: Create a list of all the duplicates
Check whether each duplicate is in use (on a site or in an InDesign file); if not:
Step 4: Delete
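
The four steps above can be sketched in plain Python, independent of any AEM API. This is only a minimal illustration of the logic, not an AEM workflow: the record fields `path`, `sha1`, and `in_use` are hypothetical names, and how the "in use" flag would actually be obtained from AEM is left open.

```python
from collections import defaultdict

def find_deletable_duplicates(assets):
    """Return paths of duplicate assets that are safe to delete.

    `assets` is an iterable of dicts with the hypothetical keys
    'path', 'sha1' (the dam:sha1 value) and 'in_use' (bool).
    """
    # Steps 1 and 2: group every asset by its SHA-1 value.
    by_hash = defaultdict(list)
    for asset in assets:
        by_hash[asset["sha1"]].append(asset)

    deletable = []
    for group in by_hash.values():
        if len(group) < 2:
            continue  # unique content, nothing to do
        # Step 3: pick one asset to keep, preferring one that is in use.
        ordered = sorted(group, key=lambda a: (not a["in_use"], a["path"]))
        extras = ordered[1:]
        # Step 4: only unused duplicates become deletion candidates.
        deletable.extend(a["path"] for a in extras if not a["in_use"])
    return deletable
```

Duplicates that are still referenced are left alone here, so nothing that is in use would be removed by this sketch.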

If it is a duplicate, the reference resolver should be able to fix this, right?
Call one file A and the other B; they are exactly the same.
One is in dam/arms and the other in dam/illustration.
A is in use on 2 files and B on 3. Could the reference resolver re-establish the connections to A if I delete B?
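
To make the A/B scenario concrete, here is a small sketch of the bookkeeping involved in moving B's references over to A before B is deleted. It does not show what AEM's own reference handling does; the `references` mapping and all paths in it are hypothetical.

```python
def repoint_references(references, keep, drop):
    """Return a new mapping in which everything that referenced `drop`
    now references `keep`, and `drop` has no entry any more.

    `references` maps an asset path to the set of pages or files using it.
    """
    merged = {path: set(users) for path, users in references.items()}
    merged[keep] = merged.get(keep, set()) | merged.get(drop, set())
    merged.pop(drop, None)
    return merged
```

With A used by 2 files and B by 3 different files, the result would list A as used by all 5, which is the state the deletion of B should leave behind.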

3 Replies

Community Advisor

The duplicate detection is nice, but it functions like googly eyes on a rock used as an earthquake detector.
It makes visible what is going on, but it only gives a warning after the fact and does not help to get rid of the mess. It even adds a bunch of notifications that clog up the notification list without helping to clean anything up.
For example:
A user imports 11000 files into a fresh folder. AEM imports them.
THEN the duplicate detection, after the reprocess, sends out 700 duplicate detection notifications to the admin.
Each notification informs me that 2 to 40 duplicates were in that upload (5800 in total), and now I need to delete the duplicates and the notifications and manually clean up the mess.
Currently I make a metadata export of the sha1 value and then give the duplicates with the later upload date a unique tag.
Later I search for that tag and delete all those files manually.
There has to be a quicker, automated way.
Best of all would be if the duplicate detection deleted duplicates automatically.
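
The manual export-and-tag routine described above could be scripted roughly as follows. This is a sketch under assumptions: the column names `assetPath`, `dam:sha1`, and `jcr:created` are guesses at what the metadata export contains and may need adjusting, and the timestamps are assumed to be ISO 8601 strings in one time zone so that they sort correctly as text.

```python
import csv
import io
from collections import defaultdict

def later_uploads(csv_text, path_col="assetPath",
                  hash_col="dam:sha1", date_col="jcr:created"):
    """Return the paths of all duplicates except the earliest upload
    in each group of assets sharing the same SHA-1 value."""
    by_hash = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[hash_col]:  # skip assets without a hash
            by_hash[row[hash_col]].append(row)

    to_tag = []
    for group in by_hash.values():
        # Oldest first; everything after the first row is a later upload.
        group.sort(key=lambda r: r[date_col])
        to_tag.extend(r[path_col] for r in group[1:])
    return to_tag
```

The returned paths are the ones that would receive the unique tag; the tagging and deletion themselves are not shown.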

Community Advisor

I forgot the most important part: thank you for the reply.
We are using AEMaaCS; maybe this works a bit differently here than in 6.4.
I am sorry to say that the notification from the duplicate detection does not provide much help in getting rid of duplicates. It only notifies you after the reprocess, and by then it is too late. The notifications are a hassle to clean up, and I have not found a way to make good use of the information they provide.
The cleanup of the duplicates themselves can probably be automated much more than the way I currently do it, but I am lost on how to create a workflow that does the work.