Hi Everyone,
I’m looking for an effective way to identify duplicate assets in AEM (either by name, size, or binary content).
Is there any out-of-the-box functionality, query, or Groovy script that can help find duplicates in DAM - especially when they might exist in different folders or under different paths?
Hi @garimak,
You can identify duplicate assets in AEM by checking the SHA-1 checksum value of each asset’s binary.
When you upload an asset, AEM automatically generates a SHA-1 hash of its binary and stores it in the dam:sha1 property of the asset's metadata node, i.e. at <asset-path>/jcr:content/metadata/dam:sha1
For example:
/content/dam/geometrixx-outdoors/banners/best-season.jpg/jcr:content/metadata/dam:sha1
If duplicate detection is enabled and the SHA-1 checksum of a newly uploaded asset matches an existing asset’s checksum, AEM treats the upload as a duplicate, regardless of the file name, and displays the “Duplicates Detected” dialog.
This ensures that even if two assets have different names, they are still flagged as duplicates when their binary content is identical.
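To illustrate why the file name plays no role, here is a minimal standalone Java sketch that hashes raw bytes with SHA-1. The class name Sha1Demo and the sample byte strings are made up for illustration, and the lowercase hex encoding is an assumption; compare against a real dam:sha1 value on your instance before relying on the format.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha1Demo {

    /** Returns the SHA-1 digest of the given bytes as lowercase hex (encoding is an assumption). */
    public static String sha1Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform, so this should not happen
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical binaries: the first two stand in for the same file uploaded under two names
        byte[] original = "same image bytes".getBytes(StandardCharsets.UTF_8);
        byte[] renamedCopy = "same image bytes".getBytes(StandardCharsets.UTF_8);
        byte[] editedVersion = "different image bytes".getBytes(StandardCharsets.UTF_8);

        // Only the content is hashed, so a renamed copy matches and an edited file does not
        System.out.println("renamed copy matches: "
                + sha1Hex(original).equals(sha1Hex(renamedCopy)));
        System.out.println("edited version matches: "
                + sha1Hex(original).equals(sha1Hex(editedVersion)));
    }
}
```

The hash input is the binary alone, which is why two assets in different folders with different names can still share a dam:sha1 value.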
In your case, you can leverage this property to find duplicates programmatically.
Since AEM doesn’t provide an out-of-the-box report for this, you can write a custom service or script that queries all assets under /content/dam, retrieves their dam:sha1 values, and groups them to identify duplicates.
Below is an example Groovy script that you can run in the AEM Groovy Console (a community tool that has to be installed separately):
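import javax.jcr.query.Query

// Maps each SHA-1 checksum to the paths of all assets that carry it
def pathsBySha1 = [:].withDefault { [] }

// JCR-SQL2 query for every dam:Asset under /content/dam, run through the console's session binding
def statement = "SELECT * FROM [dam:Asset] AS a WHERE ISDESCENDANTNODE(a, '/content/dam')"
def result = session.workspace.queryManager.createQuery(statement, Query.JCR_SQL2).execute()

result.nodes.each { node ->
    // Skip assets without a checksum; calling getProperty on a missing path would throw
    def sha1Path = "jcr:content/metadata/dam:sha1"
    if (node.hasProperty(sha1Path)) {
        pathsBySha1[node.getProperty(sha1Path).string] << node.path
    }
}

// Report only the checksums shared by more than one asset
pathsBySha1.findAll { it.value.size() > 1 }.each { sha1, paths ->
    println "Duplicate binary hash: $sha1"
    paths.each { println " - $it" }
}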
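This will list all assets that share the same binary hash, helping you identify duplicates across folders. On a large DAM the query may be slow or run into Oak's query read limit, so consider narrowing the path to one subfolder at a time.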
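If you go the custom service route instead, the grouping step described above can be sketched in plain Java. This is only an outline under assumptions: the class name DuplicateGrouper, the method findDuplicates, and the sample paths and checksums are hypothetical, and the input map stands in for whatever your service reads from each asset's dam:sha1 property.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateGrouper {

    /**
     * Groups asset paths by checksum and keeps only checksums shared by two or more paths.
     * Entries with a missing or empty checksum are ignored.
     */
    public static Map<String, List<String>> findDuplicates(Map<String, String> sha1ByPath) {
        Map<String, List<String>> pathsBySha1 = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : sha1ByPath.entrySet()) {
            String sha1 = entry.getValue();
            if (sha1 == null || sha1.isEmpty()) {
                continue; // asset has no checksum yet, nothing to compare
            }
            pathsBySha1.computeIfAbsent(sha1, key -> new ArrayList<>()).add(entry.getKey());
        }
        // Drop checksums that belong to a single asset
        pathsBySha1.values().removeIf(paths -> paths.size() < 2);
        return pathsBySha1;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: asset path -> dam:sha1 value
        Map<String, String> sha1ByPath = new LinkedHashMap<>();
        sha1ByPath.put("/content/dam/site-a/banner.jpg", "hash-1");
        sha1ByPath.put("/content/dam/site-b/hero-copy.jpg", "hash-1");
        sha1ByPath.put("/content/dam/site-a/logo.png", "hash-2");

        findDuplicates(sha1ByPath).forEach((sha1, paths) -> {
            System.out.println("Duplicate binary hash: " + sha1);
            paths.forEach(path -> System.out.println(" - " + path));
        });
    }
}
```

Keeping the grouping separate from the repository access makes it easy to unit test, and the same method works whether the checksums come from a JCR query, a QueryBuilder search, or an export.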