SOLVED

Preventing duplicate asset uploading in AEM 6.0

Level 3

Hi All,

How can we prevent duplicate assets from being uploaded in AEM 6.0 using a checksum? The check should not validate only the asset name; it should also compare the content before the DAM upload workflow is triggered. If there are steps for this, please let me know. Any links or blogs on the topic would also help me a lot.

 

Regards,

Kaustav Majoomder

1 Accepted Solution

Correct answer by
Employee Advisor

Hi Kaustav,

Well, technically it's possible. Just calculate the hash value over the asset and store it in the asset itself. As a first step of the asset update workflow, check whether an asset with the same hash is already there. If it is, remove the newly uploaded asset and stop the workflow.

The problem is: How would you handle that from a UX perspective? When a user uploads an asset, she expects it to be in the folder she uploaded it to. It would be very surprising if the asset isn't there anymore.

kind regards,
Jörg
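
A minimal sketch of the check described above, using plain Java with no AEM APIs. The class name DuplicateCheck and the in-memory map are hypothetical stand-ins for the hash stored on each asset; in a real workflow step the lookup would go against the repository instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory stand-in for "hash stored on the asset" plus the lookup. */
final class DuplicateCheck {

    // Maps a content hash to the path of the first asset seen with that content.
    private final Map<String, String> pathByHash = new HashMap<>();

    /**
     * Records the asset if its hash is new. If the hash is already known,
     * nothing is recorded and the path of the existing asset is returned,
     * which is the signal to remove the upload and stop the workflow.
     */
    Optional<String> registerOrFindDuplicate(String assetPath, String contentHash) {
        String existing = pathByHash.putIfAbsent(contentHash, assetPath);
        return Optional.ofNullable(existing);
    }

    public static void main(String[] args) {
        DuplicateCheck check = new DuplicateCheck();
        // First upload of this content: no duplicate is reported.
        System.out.println(check.registerOrFindDuplicate("/content/dam/a/logo.png", "hash-1"));
        // Same content under another name: the existing path is reported.
        System.out.println(check.registerOrFindDuplicate("/content/dam/b/logo-copy.png", "hash-1"));
    }
}
```

Returning the existing path, rather than only a boolean, would let the workflow tell the user where the asset already lives, which may soften the UX problem mentioned above.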

10 Replies

Level 3

Thanks Jörg and Scott for your valuable comments.

But my question is: how can I calculate the hash value? My plan is to externalize the binary data (by creating a .cfg file inside the install folder). Multiple folders will then be created whenever an asset is uploaded. So in that case I have three questions:

  1. How do I calculate the hash value of an asset? (I am very new to this approach, so please bear with me.)
  2. How would the search work across multiple folders?
  3. Will this search affect the performance of the instance if I upload a huge amount of assets (approx. 500 GB)?

Looking forward to your comments on the above.

Thanks again to both of you in advance!

 

Regards,

Kaustav Majoomder

Level 10

You can write your own logic for this. You can write a custom AEM upload service and use Java in a Sling Servlet to read the byte stream and make sure it's not duplicate content.

See the following article to learn how to write a custom AEM service to upload content to the AEM DAM:

http://helpx.adobe.com/experience-manager/using/uploading-files-aem1.html

This does not show you how to prevent duplicate content; however, you can extend this by adding your own functionality.  

Level 10

Checking the content for duplicates will take more time than uploading without such a check. You will have to test in your dev environments to get an accurate answer to that question. As for getting hash values as Jörg suggested: this is not an AEM-specific operation, but rather a general Java use case. There are a lot of great resources online that can help you. For example:

http://www.codejava.net/coding/how-to-calculate-md5-and-sha-hash-values-in-java
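
For illustration, here is a minimal sketch of the hashing step using only the JDK (java.security.MessageDigest). The class and method names (AssetHasher, sha1Hex) are made up for this example. In AEM the stream would presumably come from the asset's original rendition rather than from a byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class AssetHasher {

    private AssetHasher() {}

    /** Reads the stream to its end and returns the SHA-1 digest as lowercase hex. */
    static String sha1Hex(InputStream in) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            // Read in chunks so large binaries are never held in memory at once.
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Sample input only; a real caller would pass the asset's binary stream.
        InputStream sample = new ByteArrayInputStream("abc".getBytes(StandardCharsets.UTF_8));
        System.out.println(sha1Hex(sample));
    }
}
```

The method does not close the stream, so the caller stays responsible for that, for example with try-with-resources.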

Employee Advisor

Hi,

To 1) You can get an InputStream from an asset like this: asset.getRendition("original").getStream(); then compute its hash and store it as metadata on the asset itself.

To 2) Use a JCR query to look for an asset (node type: dam:Asset) whose hashValue property equals the computed one.

To 3) The performance of this search is likely quite good, as it is an exact match on a property, which should be quite fast.

 

Please consider: AEM comes with metadata writeback functionality. So whenever an editor changes the metadata of an asset, AEM will not only store the changed metadata as a property on the asset node, but will also write the changed metadata back into the binary, which will likely change the hash value...

kind regards,
Jörg

Level 3

Thanks Jörg and Scott again!!

I have written a piece of Java code that calculates the hash value (using the SHA-1 algorithm), as AEM uses the same algorithm for hashing the assets in the DAM. Now, can I use a JCR query to compare it against the DAM assets under a specific path?

For example, I have calculated one image's hash value as "5231743e76ff4a5532953e6f9cecd8928baf040e", and I want to check whether it already exists under the path "/content/dam/kaustav". In this scenario, can I use the query below?

http://localhost:4502/bin/querybuilder.json?fulltext=5231743e76ff4a5532953e6f9cecd8928baf040e&group....

Please advise on the above approach. If you have an alternative idea, feel free to share it as well.

Regards,

Kaustav Majoomder

Employee

I wouldn't do a fulltext search. The hash is stored in the dam:sha1 property. I would just query that property.

Level 3

Thanks Kaushal, but what if I have to search under a specific path?

Employee

You can still search under a specific path. I am not sure I understand how querying for a particular property value affects which path you want to search under.

Employee

You could try something like this. 

/jcr:root/content/dam//*[@jcr:primaryType='dam:AssetContent' and metadata/@dam:sha1='07614f5ec4a3bc5e3a822455fd52e1a0369429a0'] order by @jcr:score

Please note that I have not tuned this query for performance; it is just something I cooked up.
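
To restrict the same query to a specific path, the path can replace /content/dam in the query string. Below is a minimal sketch that only builds the XPath string; the class and method names (Sha1QueryBuilder, xpathFor) are made up for this example, and executing the query against the repository is left out. It assumes the path contains no characters that would need escaping in XPath, and it drops the order by clause.

```java
final class Sha1QueryBuilder {

    private Sha1QueryBuilder() {}

    /**
     * Builds an XPath query for asset content nodes below damPath whose
     * metadata/@dam:sha1 equals the given hash.
     */
    static String xpathFor(String damPath, String sha1) {
        // Accept only a 40-character lowercase hex string so nothing else
        // can end up inside the quoted literal.
        if (sha1 == null || !sha1.matches("[0-9a-f]{40}")) {
            throw new IllegalArgumentException("Not a SHA-1 hex string: " + sha1);
        }
        if (damPath == null || !damPath.startsWith("/")) {
            throw new IllegalArgumentException("Path must be absolute: " + damPath);
        }
        // Drop a trailing slash so the path joins cleanly with "//*".
        String base = damPath.endsWith("/")
                ? damPath.substring(0, damPath.length() - 1)
                : damPath;
        return "/jcr:root" + base
                + "//*[@jcr:primaryType='dam:AssetContent' and metadata/@dam:sha1='"
                + sha1 + "']";
    }

    public static void main(String[] args) {
        System.out.println(
                xpathFor("/content/dam/kaustav", "5231743e76ff4a5532953e6f9cecd8928baf040e"));
    }
}
```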