
Fetch duplicate assets


Hi Everyone,

I’m looking for an effective way to identify duplicate assets in AEM (by name, size, or binary content).
Is there any out-of-the-box functionality, query, or Groovy script that can find duplicates in the DAM, especially when they exist in different folders or under different paths?

1 Accepted Solution

Correct answer by
Community Advisor

Hi @garimak,

You can identify duplicate assets in AEM by checking the SHA-1 checksum value of each asset’s binary.

When you upload an asset, AEM automatically generates a SHA-1 hash of its binary and stores it in the dam:sha1 property of the asset’s metadata node: <asset path>/jcr:content/metadata/dam:sha1

For example: 

/content/dam/geometrixx-outdoors/banners/best-season.jpg/jcr:content/metadata/dam:sha1

If the SHA-1 checksum of a newly uploaded asset matches an existing asset’s checksum, AEM treats the upload as a duplicate, regardless of the file name, and displays the “Duplicates Detected” dialog (provided duplicate detection is enabled on your instance).
Two assets with different names are therefore still flagged as duplicates when their binary content is identical.
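To illustrate why the file name does not matter, here is a minimal plain-Java sketch that hashes byte arrays with SHA-1. It is not AEM's internal code: the class name Sha1Demo, the sample contents, and the lowercase-hex output format are assumptions made for illustration, and the exact encoding AEM uses for dam:sha1 is not shown in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha1Demo {

    // Returns the SHA-1 digest of the given bytes as a lowercase hex string.
    public static String sha1Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a required algorithm on the Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for two uploads that differ only in file name.
        byte[] bannerA = "same binary content".getBytes(StandardCharsets.UTF_8);
        byte[] bannerB = "same binary content".getBytes(StandardCharsets.UTF_8);
        byte[] other = "different binary content".getBytes(StandardCharsets.UTF_8);

        System.out.println(sha1Hex(bannerA).equals(sha1Hex(bannerB))); // true
        System.out.println(sha1Hex(bannerA).equals(sha1Hex(other)));   // false
    }
}
```

The hash is computed from the bytes alone, so renaming or moving a file leaves it unchanged.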

In your case, you can leverage this property to find duplicates programmatically.
Since AEM doesn’t provide an out-of-the-box report for this, you can write a custom service or script that queries all assets under /content/dam, retrieves their dam:sha1 values, and groups them to identify duplicates.

Below is an example Groovy script that you can run in the AEM Groovy Console (a community tool that has to be installed separately):

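// Uses the JCR `session` binding that the AEM Groovy Console exposes to scripts.
def pathsBySha1 = [:].withDefault { [] }   // SHA-1 value -> list of asset paths
def statement = "SELECT * FROM [dam:Asset] AS a WHERE ISDESCENDANTNODE([/content/dam])"
def query = session.workspace.queryManager.createQuery(statement, "JCR-SQL2")
def result = query.execute()

result.nodes.each { node ->
    def sha1Path = 'jcr:content/metadata/dam:sha1'
    // getProperty() throws PathNotFoundException when the property is missing,
    // so check for it first instead of relying on the safe-navigation operator.
    if (node.hasProperty(sha1Path)) {
        pathsBySha1[node.getProperty(sha1Path).string] << node.path
    }
}

// Report only the hashes that are shared by more than one asset.
pathsBySha1.findAll { it.value.size() > 1 }.each { sha1, paths ->
    println "Duplicate binary hash: $sha1"
    paths.each { println " - $it" }
}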

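The script prints each SHA-1 value that occurs more than once, followed by the paths of all assets that share it, so duplicates show up even when they sit in different folders. On a large DAM, consider narrowing the path in the query (for example, to one subfolder at a time) to keep the run manageable.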
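If you take the custom-service route instead, the grouping step could look roughly like this in plain Java. This is only a sketch: it leaves out the JCR query entirely and assumes you have already collected a map of asset path to dam:sha1 value. The class name DuplicateGrouper, the method findDuplicates, the paths, and the placeholder hash values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateGrouper {

    // Groups asset paths by hash and keeps only hashes shared by two or more paths.
    public static Map<String, List<String>> findDuplicates(Map<String, String> sha1ByPath) {
        Map<String, List<String>> pathsBySha1 = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : sha1ByPath.entrySet()) {
            pathsBySha1.computeIfAbsent(entry.getValue(), k -> new ArrayList<>())
                       .add(entry.getKey());
        }
        // Drop every hash that belongs to a single asset.
        pathsBySha1.values().removeIf(paths -> paths.size() < 2);
        return pathsBySha1;
    }

    public static void main(String[] args) {
        // Hypothetical paths and placeholder hash values, for illustration only.
        Map<String, String> sha1ByPath = new LinkedHashMap<>();
        sha1ByPath.put("/content/dam/site-a/banner.jpg", "hash-1");
        sha1ByPath.put("/content/dam/site-b/hero.jpg", "hash-1");
        sha1ByPath.put("/content/dam/site-a/logo.png", "hash-2");

        findDuplicates(sha1ByPath).forEach((sha1, paths) ->
                System.out.println(sha1 + " -> " + paths));
    }
}
```

With the sample data, only hash-1 is reported, together with the two paths that share it.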


Santosh Sai
