Fetch duplicate assets | Community
October 10, 2025
Solved

Fetch duplicate assets


Hi Everyone,

I’m looking for an effective way to identify duplicate assets in AEM (either by name, size, or binary content).
Is there any out-of-the-box functionality, query, or Groovy script that can help find duplicates in DAM - especially when they might exist in different folders or under different paths?

Best answer by SantoshSai

Hi @garimak,

You can identify duplicate assets in AEM by checking the SHA-1 checksum value of each asset’s binary.

When you upload an asset, AEM automatically generates a SHA-1 hash for its binary and stores it in the dam:sha1 property of the asset's metadata node: <asset path>/jcr:content/metadata/dam:sha1

For example: 

/content/dam/geometrixx-outdoors/banners/best-season.jpg/jcr:content/metadata/dam:sha1

If duplicate detection is enabled and the SHA-1 checksum of a newly uploaded asset matches an existing asset’s checksum, AEM flags the upload as a duplicate and displays the “Duplicates Detected” dialog.
The comparison uses the binary content only, so two assets with different file names are still flagged when their bytes are identical.
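To illustrate why the checksum ignores file names, here is a minimal standalone Java sketch that hashes raw bytes with the JDK's `MessageDigest`. The class name `Sha1Checksum`, the sample byte arrays, and the lowercase hex encoding are assumptions for illustration; this is not AEM's own implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Sha1Checksum {

    // Returns the SHA-1 digest of the given bytes as a lowercase hex string.
    static String sha1Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for two uploads with different names but identical bytes.
        byte[] banner = "same binary content".getBytes(StandardCharsets.UTF_8);
        byte[] bannerCopy = "same binary content".getBytes(StandardCharsets.UTF_8);
        byte[] other = "different binary content".getBytes(StandardCharsets.UTF_8);

        System.out.println(sha1Hex(banner).equals(sha1Hex(bannerCopy))); // true
        System.out.println(sha1Hex(banner).equals(sha1Hex(other)));      // false
    }
}
```

Only the bytes go into the digest, so renaming or moving a file leaves its hash unchanged.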

In your case, you can leverage this property to find duplicates programmatically.
Since AEM doesn’t provide an out-of-the-box report for this, you can write a custom service or script that queries all assets under /content/dam, retrieves their dam:sha1 values, and groups them to identify duplicates.

Below is an example Groovy script that you can run in the Groovy console:

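import javax.jcr.query.Query

// Maps each SHA-1 checksum to the paths of all assets that carry it
def pathsBySha1 = [:].withDefault { [] }

// `session` is the JCR session that the Groovy Console binds for scripts
def statement = "SELECT * FROM [dam:Asset] AS a WHERE ISDESCENDANTNODE(a, '/content/dam')"
def result = session.workspace.queryManager.createQuery(statement, Query.JCR_SQL2).execute()

result.nodes.each { node ->
    // Skip assets without a checksum; getProperty would throw PathNotFoundException
    def sha1Path = "jcr:content/metadata/dam:sha1"
    if (node.hasProperty(sha1Path)) {
        pathsBySha1[node.getProperty(sha1Path).string] << node.path
    }
}

// Report only checksums shared by more than one asset
pathsBySha1.findAll { it.value.size() > 1 }.each { sha1, paths ->
    println "Duplicate binary hash: $sha1"
    paths.each { println " - $it" }
}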

This will list all assets that share the same binary hash, helping you identify duplicates across folders.
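If you take the custom-service route instead, the grouping step can be kept separate from repository access so it is easy to unit test. The following is a rough Java sketch under that assumption: `DuplicateGrouper`, `findDuplicates`, the sample paths, and the shortened placeholder checksums are all hypothetical, and the input map stands in for values you would read from dam:sha1.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DuplicateGrouper {

    // Groups asset paths by checksum and keeps only checksums shared by two or more paths.
    static Map<String, List<String>> findDuplicates(Map<String, String> sha1ByPath) {
        Map<String, List<String>> pathsBySha1 = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : sha1ByPath.entrySet()) {
            pathsBySha1.computeIfAbsent(entry.getValue(), k -> new ArrayList<>())
                       .add(entry.getKey());
        }
        // Drop checksums that belong to a single asset.
        pathsBySha1.values().removeIf(paths -> paths.size() < 2);
        return pathsBySha1;
    }

    public static void main(String[] args) {
        // Hypothetical paths and placeholder checksums, for illustration only.
        Map<String, String> sha1ByPath = new LinkedHashMap<>();
        sha1ByPath.put("/content/dam/site-a/hero.jpg", "aaa111");
        sha1ByPath.put("/content/dam/site-b/banner.jpg", "aaa111");
        sha1ByPath.put("/content/dam/site-a/logo.png", "bbb222");

        findDuplicates(sha1ByPath).forEach((sha1, paths) ->
                System.out.println(sha1 + " -> " + paths));
    }
}
```

A service built this way would only need to feed the method a path-to-checksum map collected from the repository, however it chooses to read it.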
