Asset clean up

Hi Team,

 

This use case is about cleaning up unreferenced assets on Author; we also want to apply the same use case to workflow-driven deactivation of pages.

 

Below is the use case:

Step 1: Create page A, author an image component with image A, and publish page A. (NOTE: Image A is a newly uploaded asset that is referenced only in page A. It is also published.)

Step 2: Go to damadmin and open image A. The References tab lists only one reference, i.e. page A.

Step 3: Now update the image component in page A to use image B, and do not publish page A.

Step 4: On Author, page A references image B; on Publish, page A still references image A.

 

Assuming we want to clean up unreferenced assets, image A now looks like an orphan on Author because it no longer has any references there. If we deactivate this asset, the page will break on the Publish side because the asset will not be found.

 

We are looking for APIs that can go through previous versions of page A, return its asset references, and so tell us whether the asset is really an orphan or not.

 

Thanks in advance

1 Accepted Solution

Correct answer by
Employee

There isn't an OOTB way to handle this since, as you noted, AEM Author doesn't always reflect the state of AEM Publish.

There are 2 main approaches here:

1. Integrating it into the Asset Details UI on a "per asset basis", requiring someone to click into each asset to assess its use.

2. Building a tool that runs on lists of asset paths, and outputs a report of what is used / isn't used.

The good/bad news is that both are custom efforts.

 

To actually get the data needed to make this decision, I'll break out the general approach by use case:

 

1. Integrating into Asset Details UI

* Leverage the OOTB AEM Author references to determine references to the asset on AEM Author

* Leverage the OOTB Reference servlet (HTTP GET http://localhost:4503/bin/wcm/references?path=/content/dam/wknd/en/magazine/la-skateparks/article_01...) to get the references on the AEM Publish tier per asset, assuming all AEM Publish instances are considered consistent (you can even set exact=false and query on folders!). Take the JSON results and, using custom code, inject them into the Asset Details UI.
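
As a rough illustration of the Publish-side call described above, here is a minimal Java sketch that builds the Reference servlet URL and fetches the raw JSON. The class and method names are hypothetical, the host is the localhost:4503 example from this post, and the folder path is made up. The response body is returned unparsed, since its exact JSON shape isn't described here and should be checked against your own instance.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class PublishReferenceLookup {

    // Builds the Reference servlet URL for an asset or folder path.
    // The path is URL-encoded so it is safe to use as a query parameter value.
    static URI referencesUri(String baseUrl, String path, boolean exact) {
        String encodedPath = URLEncoder.encode(path, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/bin/wcm/references?path=" + encodedPath + "&exact=" + exact);
    }

    // Performs the HTTP GET and returns the response body as-is (no JSON parsing).
    static String fetchReferencesJson(String baseUrl, String path, boolean exact)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(referencesUri(baseUrl, path, exact)).GET().build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Reference servlet returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) {
        // Hypothetical folder path; only prints the URL that would be requested.
        System.out.println(referencesUri("http://localhost:4503", "/content/dam/wknd", false));
    }
}
```

Calling fetchReferencesJson with the same arguments would perform the actual request; the result could then be handed to whatever custom code injects it into the Asset Details UI.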

 

2a. Build a tool that runs in AEM

  * Write code in AEM that:

     * Leverages the AssetReferenceSearch OSGi service to find all usages on AEM Author

     * Makes HTTP GET to the OOTB Reference servlet (see above) on AEM Publish to collect usage there

     * Collates the usages to determine what's used
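
The collation step can be sketched in plain Java as below. This is only an assumption about how you might combine the two result sets: the class, enum, and method names are hypothetical, and the sets of referencing page paths are expected to come from AssetReferenceSearch on Author and the Reference servlet on Publish, which this sketch does not call.

```java
import java.util.Set;

class AssetUsageCollator {

    enum Usage { USED_ON_BOTH, AUTHOR_ONLY, PUBLISH_ONLY, UNREFERENCED }

    // Classifies one asset from the referencing page paths found on each tier.
    static Usage classify(Set<String> authorRefs, Set<String> publishRefs) {
        boolean onAuthor = !authorRefs.isEmpty();
        boolean onPublish = !publishRefs.isEmpty();
        if (onAuthor && onPublish) return Usage.USED_ON_BOTH;
        if (onAuthor) return Usage.AUTHOR_ONLY;
        if (onPublish) return Usage.PUBLISH_ONLY;
        return Usage.UNREFERENCED;
    }

    // Treats an asset as safe to clean up only when neither tier references it.
    static boolean safeToDeactivate(Set<String> authorRefs, Set<String> publishRefs) {
        return classify(authorRefs, publishRefs) == Usage.UNREFERENCED;
    }

    public static void main(String[] args) {
        // Scenario from the question: image A is no longer referenced on Author,
        // but the published page A still uses it.
        Set<String> authorRefs = Set.of();
        Set<String> publishRefs = Set.of("/content/site/page-a"); // hypothetical page path
        System.out.println(classify(authorRefs, publishRefs));         // PUBLISH_ONLY
        System.out.println(safeToDeactivate(authorRefs, publishRefs)); // false
    }
}
```

With this rule, image A from the original scenario is reported as PUBLISH_ONLY rather than orphaned, so it would not be deactivated.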

 

2b. Build a tool that runs outside of AEM

    * Export list of assets to input into tool (or even, provide root path like /content/dam)

     * Have tool call OOTB Reference servlet on AEM Author (w/ credentials)

     * Have tool call OOTB reference servlet on AEM Publish
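
For the external tool, the overall loop might look something like this sketch. Everything here is hypothetical (class name, CSV columns, asset paths); the two lookup functions are placeholders for whatever code calls the Reference servlet on Author and on Publish and extracts the referencing paths from the JSON.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class AssetUsageReport {

    // Produces a CSV header plus one line per asset:
    // asset path, Author reference count, Publish reference count, used flag.
    static List<String> report(List<String> assetPaths,
                               Function<String, Set<String>> authorLookup,
                               Function<String, Set<String>> publishLookup) {
        List<String> lines = new ArrayList<>();
        lines.add("asset,authorRefs,publishRefs,used");
        for (String path : assetPaths) {
            int authorCount = authorLookup.apply(path).size();
            int publishCount = publishLookup.apply(path).size();
            boolean used = authorCount > 0 || publishCount > 0;
            lines.add(path + "," + authorCount + "," + publishCount + "," + used);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stub lookups stand in for the real Reference servlet calls.
        List<String> assets = List.of("/content/dam/a.jpg", "/content/dam/b.jpg");
        Function<String, Set<String>> author = p -> Set.<String>of();
        Function<String, Set<String>> publish =
                p -> p.endsWith("a.jpg") ? Set.of("/content/site/page-a") : Set.<String>of();
        report(assets, author, publish).forEach(System.out::println);
    }
}
```

Keeping the lookups as plain functions means the same report code works whether the references come from HTTP calls, a cached export, or test stubs.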

 

Obviously, there are a lot of implementation details that can't be enumerated here, but I think these are your 3 best approaches, based on how you want to perform the validations and what infrastructure you have to run these jobs.

Notes:
* If you pass in folders with many assets, you may need to adjust the ReferenceSearchServlet's referencesearchservlet.maxPages and referencesearchservlet.maxReferencesPerPage OSGi config properties.

* If you have protected (non-anonymous) pages on AEM Publish, you'll need to authenticate the calls to the Reference servlet appropriately so it can "see" all the pages.
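
One possible way to authenticate those calls is an HTTP Basic Authorization header, sketched below. This assumes your AEM Publish instance accepts Basic authentication, which depends on how it is configured; the class name, user name, and password are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class AuthenticatedReferenceRequest {

    // Builds the value of an HTTP Basic Authorization header.
    static String basicAuthHeader(String user, String password) {
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return "Basic " + token;
    }

    // Builds a GET request that carries the credentials.
    static HttpRequest authenticatedGet(URI uri, String user, String password) {
        return HttpRequest.newBuilder(uri)
                .header("Authorization", basicAuthHeader(user, password))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical service user; in practice read credentials from a secure store.
        HttpRequest request = authenticatedGet(
                URI.create("http://localhost:4503/bin/wcm/references?path=%2Fcontent%2Fdam"),
                "report-user", "secret");
        System.out.println(request.headers().firstValue("Authorization").isPresent());
    }
}
```

The user making the call needs read access to every protected page that should count as a reference, otherwise those references will be missing from the results.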

 

3 Replies

Both the AssetReferenceSearch and ReferenceSearch APIs provide references from the current page version only. We need to go through the previous 2-3 versions of the page and check whether the asset is referenced in any of them. Is there any API for this?
