Logic behind identifying Duplicate Assets
Hi All,
On what basis does AEM decide whether an asset is a duplicate? I have been able to analyse it as far as the steps below, but they do not tell me the exact parameters: is it the asset's size, its name, or its metadata?
How to enable duplicate check:
Go to the Adobe Experience Manager Web Console Configuration page at the following URL:
http://<server>:<port>/system/console/configMgr
- Edit the configuration for the servlet Day CQ DAM Create Asset.
- Select the detect duplicate option, and click/tap Save. The Detect Duplicate feature is now enabled in AEM Assets.
CreateAssetServlet will be invoked on save.
How the SHA-1 is calculated:
- Get the asset's original rendition using the line of code below:
Rendition original = asset.getOriginal();
Note: The original rendition is unique to each asset.
- Get the input stream of the rendition:
is = original.getStream();
- Get the SHA-1 value by passing the input stream (is) to the shaHex method of DigestUtils:
sha1 = DigestUtils.shaHex(is);
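If my reading of the steps above is right, the only input to the hash is the byte content of the original rendition, so the file name and metadata would not affect it. Below is a minimal standalone sketch of that idea. It uses the JDK's MessageDigest instead of DigestUtils so that it runs without AEM or Commons Codec; the class and method names (Sha1Sketch, sha1Hex) are my own and are not taken from AEM.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not AEM code: hashes a stream the way the steps above describe.
public class Sha1Sketch {

    // Reads the stream to the end and returns its SHA-1 digest as lowercase hex.
    public static String sha1Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Two "files" with identical bytes; names and metadata never enter the hash.
        byte[] content = "abc".getBytes(StandardCharsets.UTF_8);
        String first = sha1Hex(new ByteArrayInputStream(content));
        String second = sha1Hex(new ByteArrayInputStream(content.clone()));
        System.out.println(first.equals(second));
    }
}
```

With this sketch, two uploads that have the same bytes get the same SHA-1 even if they are named differently, which is what I would expect the duplicate check to rely on.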
How duplicate assets are identified:
- Run a query on the DAM assets, passing in the calculated SHA-1, to get the list of duplicate assets:
String queryString = "//element(*, dam:Asset)[(jcr:content/metadata/@dam:sha1 = '" + sha1 + "')]";
- Iterate through the list returned by the above query and check whether any path in it equals the path of the asset being uploaded. If so, remove that entry from the list:
if (((String) ((List) duplicateAssets).get(i)).equals(asset.getPath())) {
    ((List) duplicateAssets).remove(i);
    break;
}
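To make the last step concrete, here is a small standalone sketch of how I understand it: the query returns the paths of all assets with the same SHA-1, and the path of the asset being uploaded is dropped so that only the other assets remain as duplicates. The class name, method name and sample paths are hypothetical and only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not AEM code: mirrors the "remove own path" step above.
public class DuplicateFilterSketch {

    // Returns a copy of the matching paths without the uploaded asset's own path.
    public static List<String> withoutSelf(List<String> matchingPaths, String uploadedPath) {
        List<String> duplicates = new ArrayList<>(matchingPaths);
        duplicates.removeIf(uploadedPath::equals);
        return duplicates;
    }

    public static void main(String[] args) {
        // Assumed query result: every asset whose dam:sha1 matches, including the new one.
        List<String> matches = Arrays.asList(
                "/content/dam/site/logo.png",
                "/content/dam/site/logo-copy.png");
        System.out.println(withoutSelf(matches, "/content/dam/site/logo-copy.png"));
    }
}
```

If the remaining list is not empty, the uploaded asset would be reported as a duplicate of the assets at those paths.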
Your help in this regard would be highly appreciated.
Regards,
Vinod
