Hi All,
How can we prevent duplicate asset uploads in AEM 6.0 using a checksum? The check should not only validate the asset name; it should also validate the content before the DAM upload workflow is triggered. If there are any steps for this, please let me know. Any links or blogs on the topic would also help me a lot.
Regards,
Kaustav Majoomder
Hi Kaustav,
Well, technically it's possible. Just calculate the hash value over the asset and store it on the asset itself. As the first step of the asset update workflow, check whether an asset with the same hash already exists. If it does, remove the newly uploaded asset and stop the workflow.
The problem is: How would you handle that from a UX perspective? When a user uploads an asset, she expects it to be in the folder she uploaded it to. It would be very surprising if the asset isn't there anymore.
kind regards,
Jörg
Thanks Jörg and Scott for your valuable comments.
But my question is: how can I calculate the hash value? My plan is to externalize the binary data (by creating a .cfg file inside the install folder). Multiple folders will then be created when you upload a single asset. So, in that case I have three doubts:
Looking forward to your comments on the above.
Thanks again both of you in advance!!
Regards,
Kaustav Majoomder
You can write your own logic for this. You can write a custom AEM upload service and use Java in the Sling Servlet to read the byte stream and make sure it's not duplicate content.
See the following article to learn how to write a custom AEM service to upload content to the AEM DAM:
http://helpx.adobe.com/experience-manager/using/uploading-files-aem1.html
This does not show you how to prevent duplicate content; however, you can extend this by adding your own functionality.
Checking the content for duplicates will take more time than uploading without such a check. You will have to test in your dev environments to get an accurate answer to that question. As for getting hash values like Jörg suggested: this is not an AEM-specific operation, but rather a general Java use case. There are a lot of great resources online that can help you. For example:
http://www.codejava.net/coding/how-to-calculate-md5-and-sha-hash-values-in-java
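For reference, a minimal sketch of the general Java technique mentioned above: streaming an InputStream through `java.security.MessageDigest` to get a SHA-1 hex string. The class and method names (`Sha1Hasher`, `sha1Hex`) are made up for illustration and are not part of AEM. In AEM, the stream would come from the asset's original rendition rather than from a byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not an AEM class.
class Sha1Hasher {

    // Reads the stream in chunks so large binaries are never held fully in memory.
    static String sha1Hex(InputStream in) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            // Convert the 20-byte digest to 40 lowercase hex characters.
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream("abc".getBytes(StandardCharsets.UTF_8));
        System.out.println(sha1Hex(in)); // a9993e364706816aba3e25717850c26c9cd0d89d
    }
}
```

The caller is responsible for closing the stream. Two uploads with identical bytes produce the same string, which is what makes the value usable as a duplicate check.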
Hi,
to 1) You can get an InputStream from an asset like this: asset.getRendition("original").getStream(); then compute the hash of it and store it as metadata on the asset itself.
to 2) Use a JCR query to look for an asset (node type: dam:Asset) whose hash value matches the computed one.
to 3) The performance of this search is likely quite good, as it is an exact match on a property, which should be quite fast.
Please consider: AEM comes with metadata writeback functionality. So whenever an editor changes the metadata of an asset, AEM will not only store the changed metadata as properties on the asset node, but will also write the changed metadata back into the binary, which will likely change the hash value...
kind regards;
Jörg
Thanks Jörg and Scott again!!
I have written a piece of Java code that calculates the hash value (using the SHA-1 algorithm), since AEM uses the same algorithm for hashing assets in the DAM. Now, can I use a JCR query to compare it against the DAM assets under a specific path?
For example, I have calculated one image's hash value as "5231743e76ff4a5532953e6f9cecd8928baf040e", and I want to check whether it is available in the path "/content/dam/kaustav". In this scenario can I use the below JCR query:
Please let me know what you think of the above approach. If you have an alternative idea, feel free to share it as well.
Regards,
Kaustav Majoomder
I wouldn't do a fulltext search. The hash is stored in the dam:sha1 property. I would just query that property.
Thanks Kaushal, but what if I have to search it in a specific path?
You can still search under a specific path. I am not sure I understand how querying for a particular property value affects which path you want to search under.
You could try something like this.
/jcr:root/content/dam//*[@jcr:primaryType='dam:AssetContent' and metadata/@dam:sha1='07614f5ec4a3bc5e3a822455fd52e1a0369429a0'] order by @jcr:score
Please note that I have not tuned this query for performance, just something I cooked up.
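To restrict the search to a specific folder such as /content/dam/kaustav, the same query can be rooted at that path instead of /content/dam. A minimal sketch of a helper that builds the query string follows. `DuplicateQueryBuilder` and `buildQuery` are hypothetical names, not an AEM API, and the sketch assumes the folder path needs no ISO 9075 escaping (no spaces, no segment starting with a digit). It leaves out the `order by @jcr:score` clause, since only the existence of a match matters for a duplicate check.

```java
// Hypothetical helper, not an AEM class.
class DuplicateQueryBuilder {

    // Builds an XPath query matching asset content nodes below damPath
    // whose metadata/@dam:sha1 equals the given hash.
    static String buildQuery(String damPath, String sha1Hex) {
        if (damPath == null || !damPath.startsWith("/content/dam")) {
            throw new IllegalArgumentException("Path must be under /content/dam: " + damPath);
        }
        // Accept only 40 lowercase hex characters so the value is safe to embed in the query.
        if (sha1Hex == null || !sha1Hex.matches("[0-9a-f]{40}")) {
            throw new IllegalArgumentException("Not a SHA-1 hex string: " + sha1Hex);
        }
        // Drop a trailing slash so the descendant axis (//) is not tripled.
        String path = damPath.endsWith("/")
                ? damPath.substring(0, damPath.length() - 1)
                : damPath;
        return "/jcr:root" + path
                + "//*[@jcr:primaryType='dam:AssetContent' and metadata/@dam:sha1='"
                + sha1Hex + "']";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("/content/dam/kaustav",
                "5231743e76ff4a5532953e6f9cecd8928baf040e"));
    }
}
```

The resulting string can be passed to the JCR QueryManager as an XPath query. Like the query above, it has not been tuned for performance.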