Query For Most Used Dam Assets In Content

Answers (2)

thisthatheotter

17-07-2019

FYI - I only saw the following reports in Classic UI (under Tools) -- we're still using Classic UI:

/reports/healthcheck.html

/reports/auditreport.html

/reports/compreport.html

/reports/diskusage.html

/reports/userreport.html

/reports/wfinstances.html

/etc/reports/ugcreport.html

None of them covers DAM asset usage.

I couldn't find any reports in Touch UI (under Tools), and the suggested URL, /mnt/overlay/dam/gui/content/reports/reportlist.html, returned a 404 on my system.

We're running AEM 6.3.2.1.

thisthatheotter

16-07-2019

Thanks for the suggestion Hemant. I posed the question in more detail to Adobe Support:

Our DAM has grown large and we suspect many assets are unused and could be deleted. We're using Classic UI and can select Tools, References... in the DAM Admin to see the references to individual assets, but we'd like to check the whole DAM and find assets with 0 references, or 1, or 2, or an arbitrary number of references.

Can you assist with a method or query for reporting on asset use?

I have tried using the following:

http://www.wemblog.com/2012/12/how-to-remove-non-referenced-node-from.html

https://github.com/hashimkhan786/aem-groovy-scripts/blob/master/findUnusedAssets.groovy

but each incorrectly reports assets as unreferenced that are in fact referenced.

I would like to use a similar technique as the UI (Tools / References...) in Classic UI / DAM Admin, but for the whole DAM.

I'm not a programmer so a complete example / query would be appreciated.

I received the following answer from Adobe Support:

Please note that there is no out-of-the-box feature for this task, so it has to be implemented as custom code. The challenge is that we can only search for the asset references on a given page; it is not possible to search an entire DAM to check whether an asset is referenced by any page. The reason is that assets carry no property that records references from a page or from another asset. The assets referenced on a page can, however, be determined from a property on the page's component nodes, and that property can be queried to find the references. But as you can imagine, such a query only shows the asset references on one page, not across the entire DAM.

So to find the asset references in DAM a piece of code can be written to do this.

Here is some sample pseudocode:

- set the search path to DAM root path

- iterate through each node using "NodeIterator"

- search property to find the references using "ReferenceSearch"

- Store results in java "Map" object

- return nodes/assets not referenced anywhere

- Recursively iterate through all the assets from root path

Please note that this task is highly technical and requires Java coding skills, as there is no single query that would give us the results we are looking for.

So a better option for you might be engaging Adobe Consulting Services.

I don't code in Java, but Adobe Support's suggestions got me thinking, and I came up with a roundabout solution that uses querybuilder, the built-in references.json endpoint, and command line tools to produce two files: one listing unreferenced assets, and another listing assets together with their references:

# get list of assets, where /content/dam/site/folder is the folder you want to query

curl -s -u "admin:admin" "http://localhost:4502/bin/querybuilder.json?path=/content/dam/site/folder&type=dam:Asset&p.limit=-1" > assets-in-path.json

# extract the asset paths from assets-in-path.json with jq, a command-line JSON parser (install it with brew install jq or similar)

# the -r flag prints raw strings, so the surrounding quotes do not need to be stripped with sed

jq -r '.hits[].path' assets-in-path.json > clean-assets-in-path.txt

# create a php file that urlencodes the asset paths

sed 's/^/echo urlencode("/g' clean-assets-in-path.txt |sed 's/$/"),PHP_EOL;/g' > urlencode-assets-in-path.php

# add the opening php tag as the first line of urlencode-assets-in-path.php: <?php

# and the closing php tag as the last line: ?>

# run the php script

php urlencode-assets-in-path.php > urlencoded-assets-in-path.txt

# append the _charset_=utf-8 query parameter to each path (it ends up in the curl commands built by the script below), because asset paths may contain UTF-8 characters that are now percent-encoded

# without this parameter AEM can report an asset as having no references when it actually has some

sed 's/$/\&_charset_=utf-8/g' urlencoded-assets-in-path.txt > dam_assets.txt
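If PHP is not installed, the encoding and charset steps above can probably be collapsed into one small shell function. This is a minimal sketch and not part of the workflow I actually ran: urlencode and encode_asset_list are made-up helper names, and the function writes a space as %20 where PHP's urlencode() writes +, which should decode to the same path in a query string.

```shell
#!/bin/bash
# Sketch only: urlencode and encode_asset_list are illustrative helpers, not AEM or PHP functions.

# Percent-encode every byte except the RFC 3986 unreserved set (A-Z a-z 0-9 - . _ ~).
urlencode() {
  local hex
  # od prints the input as one lowercase two-digit hex value per byte
  for hex in $(printf '%s' "$1" | od -An -v -tx1); do
    case $hex in
      2d|2e|5f|7e|3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a])
        # unreserved byte: convert the hex value to octal and print the character itself
        printf "\\$(printf '%03o' "0x$hex")" ;;
      *)
        # everything else, including each byte of a UTF-8 sequence, becomes %XX
        printf '%%%02X' "0x$hex" ;;
    esac
  done
  printf '\n'
}

# Encode each line of the file in $1 and append the charset parameter.
encode_asset_list() {
  local asset_path
  while IFS= read -r asset_path; do
    printf '%s&_charset_=utf-8\n' "$(urlencode "$asset_path")"
  done < "$1"
}
```

Usage would be: encode_asset_list clean-assets-in-path.txt > dam_assets.txt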

# the following bash script runs one curl command for each line of the dam_assets.txt file created above

# save it as get-unreferenced.sh, make it executable (chmod +x get-unreferenced.sh), and run it as ./get-unreferenced.sh dam_assets.txt

# the generated assets.txt file is overwritten with the result of each curl command

# when the result is empty, the unreferenced asset is appended to unused_dam_assets.txt

- - - - - - - - - - -

#!/bin/bash

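while IFS= read -r dam_asset; do

  # assets.txt holds the latest references.json response and is overwritten on every pass

  curl -s -u "admin:admin" "http://localhost:4502/bin/wcm/references.json?path=$dam_asset" > assets.txt

  # an empty JSON array ([]) in the response means nothing references this asset

  if grep -q '\[\]' assets.txt; then

    echo "$dam_asset" >> unused_dam_assets.txt

  fi

done < "$1"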

- - - - - - - - - - -

For referenced assets, I used the following script, saved as get-used.sh and run the same way as above (./get-used.sh dam_assets.txt):

- - - - - - - - - - -

#!/bin/bash

while IFS= read -r line; do

dam_asset=$line

curl -s -u "admin:admin" "http://localhost:4502/bin/wcm/references.json?path=$dam_asset" >> used.txt

done < "$1"

- - - - - - - - - - -
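The two scripts above could probably be folded into a single pass so that each asset is only requested once. The following is a rough sketch under a few assumptions: the function names and the AEM_HOST / AEM_CREDS variables are made up for illustration, and the check for an empty result assumes the response looks like {"pages":[]}, which is what the jq command below implies but which I have not confirmed for every AEM version.

```shell
#!/bin/bash
# Sketch only: function and variable names are illustrative, not part of AEM.

AEM_HOST="${AEM_HOST:-http://localhost:4502}"
AEM_CREDS="${AEM_CREDS:-admin:admin}"

# Fetch the references.json response for one already URL-encoded asset path.
fetch_references() {
  curl -s -u "$AEM_CREDS" "$AEM_HOST/bin/wcm/references.json?path=$1"
}

# Reads a response on stdin; succeeds only if its "pages" array is empty.
is_unreferenced() {
  grep -q '"pages"[[:space:]]*:[[:space:]]*\[[[:space:]]*]'
}

# Read asset paths from the file in $1 and sort them into two output files.
scan_assets() {
  local dam_asset response
  while IFS= read -r dam_asset; do
    response=$(fetch_references "$dam_asset")
    if printf '%s' "$response" | is_unreferenced; then
      echo "$dam_asset" >> unused_dam_assets.txt
    elif [ -n "$response" ]; then
      # keep the full response so the references can be reported later
      printf '%s\n' "$response" >> used.txt
    fi
    # an empty response (failed request) is skipped, so the asset is never listed as unused
  done < "$1"
}

# Run only when a file argument is given, e.g. ./scan-assets.sh dam_assets.txt
if [ -n "${1:-}" ]; then
  scan_assets "$1"
fi
```

Because both output files are appended to, delete unused_dam_assets.txt and used.txt before a fresh run.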

Command to print assets and references as csv:

jq --raw-output '.pages | group_by(.srcPath)[] | [.[0].srcPath, .[0].published, .[].references[]] | @csv' used.txt
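To get at the original question (assets with 1, 2, or some arbitrary number of references), the same jq grouping can be made to print a count instead of the reference paths. This is a sketch that assumes used.txt holds one references.json response per asset, with the srcPath and references fields used in the command above; count_references is a made-up name.

```shell
#!/bin/bash
# Sketch only: assumes each response in used.txt is shaped like
# {"pages":[{"srcPath":"...","references":["..."]}]}.

# Print "asset path",count for every asset that has at least one reference.
count_references() {
  jq --raw-output '.pages | group_by(.srcPath)[] | [.[0].srcPath, ([.[].references[]] | length)] | @csv' "$@"
}
```

Running count_references used.txt > reference-counts.csv gives a file that can be sorted or filtered in a spreadsheet. Assets with 0 references do not appear here; those are the ones in unused_dam_assets.txt.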

# Useful commands

From Stack Exchange (on Mac OS X I needed to install GNU grep and run it as ggrep):

- - - - - - - - - - -

# grep for non-ascii characters

grep --color='auto' -P -n "[\x80-\xFF]" file.xml

This prints the line number and highlights the non-ASCII characters in red.

On some systems, depending on your settings, the above will not work, so you can grep for the inverse (anything outside the ASCII range) instead:

grep --color='auto' -P -n "[^\x00-\x7F]" file.xml

- - - - - - - - - - -
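Where grep -P is not available (which is why I needed ggrep on Mac OS X), a POSIX character class in the C locale should give a similar result. This is a sketch, with find_non_ascii as a made-up name; note that it also flags ASCII control characters other than whitespace, not only bytes above 0x7F.

```shell
#!/bin/bash
# Sketch only: find_non_ascii is an illustrative helper name.

# Print line numbers and lines that contain a byte which is neither
# printable ASCII nor whitespace. LC_ALL=C makes grep work byte by byte,
# and -a stops it from treating the file as binary.
find_non_ascii() {
  LC_ALL=C grep -a -n '[^[:print:][:space:]]' "$@"
}
```

For example, find_non_ascii clean-assets-in-path.txt lists the asset paths that will need percent-encoding of UTF-8 bytes.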

# you may want to urldecode the unreferenced assets to make them easier to read (on Mac OS X I needed to install GNU sed and run it as gsed)

# as with the urlencode script, add the php tags to urldecode-unused.php and run it with php

/usr/local/bin/gsed 's/^/echo urldecode("/g' unused_dam_assets.txt |gsed 's/$/"),PHP_EOL;/g' > urldecode-unused.php
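The decoding can probably also be done without PHP or gsed. The following bash sketch is an assumption-laden alternative, not what I ran: urldecode and decode_asset_list are made-up names, and it also strips the &_charset_=utf-8 suffix that was appended to every line earlier.

```shell
#!/bin/bash
# Sketch only: urldecode and decode_asset_list are illustrative helpers.

# Decode one line as written to unused_dam_assets.txt.
urldecode() {
  local s="${1%&_charset_=utf-8}"   # drop the query parameter appended earlier, if present
  s="${s//+/ }"                     # PHP urlencode() writes spaces as "+"
  printf '%b\n' "${s//%/\\x}"       # rewrite %XX as \xXX and let printf expand it
}

# Decode every line of the file in $1.
decode_asset_list() {
  local line
  while IFS= read -r line; do
    urldecode "$line"
  done < "$1"
}
```

Usage would be: decode_asset_list unused_dam_assets.txt > unused_dam_assets_decoded.txt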

If you have issues with an older version of jq, upgrading it can sometimes resolve them.

If you use the results to delete unused assets via cURL, you may need to add "sleep 1" (or some other number) commands between each cURL command if the script executes too fast for the system to handle the deletes.