Logic behind identifying Duplicate Assets

mozzvinod

01-07-2019

Hi All,

On the basis of which parameters does AEM decide whether an asset is a duplicate? I have been able to analyse the logic as far as the steps below, but this does not tell me the exact parameters, for example whether it is the asset's size, name or metadata.

How to enable duplicate check:

Go to the Adobe Experience Manager Web Console Configuration page at the following URL:

http://<server>:<port>/system/console/configMgr

  • Edit the configuration for the servlet Day CQ DAM Create Asset.
  • Select the detect duplicate option, and click/tap Save. The Detect Duplicate feature is now enabled in AEM Assets.

CreateAssetServlet will be invoked on save.

How SHA-1 is calculated:

  • Get the asset's original rendition using the line of code below

Rendition original = asset.getOriginal();

Note: Asset rendition will be unique for each asset.

  • Get Input stream of rendition

is = original.getStream();

  • Get the SHA-1 value by passing the input stream (is) to the shaHex method of DigestUtils.

sha1 = DigestUtils.shaHex(is);
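
If the steps above are accurate, the only input to the hash is the byte stream of the original rendition, so the file name, path and metadata properties would not affect the result. The following is a minimal sketch of that idea using only the JDK's MessageDigest rather than DigestUtils; the class name Sha1Demo and the sample content are made up for illustration, and this is not AEM's actual code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class, not part of AEM.
class Sha1Demo {

    // Returns the lowercase hex SHA-1 of all bytes readable from the stream.
    static String sha1Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins for two uploaded files with identical bytes; no name or
        // metadata is passed to the digest, only the content.
        byte[] content = "abc".getBytes(StandardCharsets.UTF_8);
        String first = sha1Hex(new ByteArrayInputStream(content));
        String second = sha1Hex(new ByteArrayInputStream(content.clone()));
        System.out.println(first.equals(second)); // true: same bytes, same hash

        // Changing even one byte of the content yields a different hash.
        byte[] changed = "abd".getBytes(StandardCharsets.UTF_8);
        String third = sha1Hex(new ByteArrayInputStream(changed));
        System.out.println(first.equals(third)); // false
    }
}
```

Under this reading, two assets would be flagged as duplicates only when their original binaries are byte-for-byte identical, regardless of how they are named.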

How duplicate assets are identified:

  • Run a query on DAM assets, passing the calculated SHA-1, to get the list of duplicate assets.

String queryString = "//element(*, dam:Asset)[(jcr:content/metadata/@dam:sha1 = '" + sha1 + "')]";

  • Iterate through the list returned by the above query and check whether any path in it equals the path of the asset being uploaded. If so, remove that entry from the list.

if (((String) ((List) duplicateAssets).get(i)).equals(asset.getPath())) {
    ((List) duplicateAssets).remove(i);
    break;
}
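
The two steps above can be sketched without a repository. The example below builds the same query string and then drops the uploaded asset's own path from the matches, leaving only the other assets that share the hash. The class name DuplicateFilterDemo, the method names and the sample paths are hypothetical; running the query against JCR is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of AEM.
class DuplicateFilterDemo {

    // Builds the XPath query quoted above for a given hex SHA-1 value.
    static String buildQuery(String sha1) {
        return "//element(*, dam:Asset)[(jcr:content/metadata/@dam:sha1 = '" + sha1 + "')]";
    }

    // Returns the matching paths minus the uploaded asset's own path.
    // Like the loop with break above, only the first match is removed.
    static List<String> excludeSelf(List<String> matchingPaths, String uploadedPath) {
        List<String> duplicates = new ArrayList<>(matchingPaths);
        duplicates.remove(uploadedPath);
        return duplicates;
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("a9993e364706816aba3e25717850c26c9cd0d89d"));

        // Made-up query results: the new upload plus one older asset with the same hash.
        List<String> matches = Arrays.asList("/content/dam/a.png", "/content/dam/b.png");
        System.out.println(excludeSelf(matches, "/content/dam/b.png")); // [/content/dam/a.png]
    }
}
```

An empty result after this filtering would mean no other asset carries the same dam:sha1 value.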

Your help in this regard would be highly appreciated.

Regards,

Vinod

Replies

mozzvinod

03-07-2019

Thanks for this.

But this does not answer my question about how the dam:sha1 value is derived, and which parameters are used to create it, when I upload a file into AEM DAM.

Regards,

Vinod

Jörg_Hoh

Employee

05-07-2019

You have identified the logic behind the functionality (I cannot say if correct or not). But I don't get your question, sorry.

Jörg

mozzvinod

11-07-2019

Hi Jörg,

My question is very specific.

1. How is dam:sha1 created?

2. Which parameters are used to create dam:sha1?

This will help determine which parameters are used to decide whether an asset is a duplicate.

Regards,

Vinod

Jörg_Hoh

Employee

11-07-2019

This is not documented, so I don't have this information available. If you want this implementation-specific information disclosed, you should contact Adobe support and ask them for it.

(And I still don't get the reason why you need to do this on your own instead of relying solely on the product feature.)

mozzvinod

11-07-2019

Thanks for the reply Jörg,

This is a question from the business. They want to know which criteria/attributes are used to decide that assets are duplicates. The concern is that assets should not be declared duplicates unnecessarily; the decision should be based on business requirements.

I hope the logic behind dam:sha1 creation is something that can be disclosed.

I will raise a support ticket with Adobe on this. Thanks.

Regards,

Vinod