
SOLVED

Repository size not reducing even after deleting a lot of DAM assets in AEM 6.4


Level 1

I have cleared almost the whole DAM via a script and tested it; the sites are working fine. All DAM folders and assets added by users were deleted. I then ran:
1. java -jar oak-run-1.8.4.jar checkpoints crx-quickstart/repository/segmentstore
2. java -jar oak-run-1.8.4.jar checkpoints crx-quickstart/repository/segmentstore rm-unreferenced
3. java -jar -Dsun.arch.data.model=32 oak-run-1.8.4.jar compact crx-quickstart/repository/segmentstore
Compaction freed about 35GB, but my total AEM instance size was around 865GB and I was expecting the used space to drop by much more. I have not deleted the sites that were using these assets. The DAM itself was quite large, over 700GB.
Can someone guide me on how to reclaim this space?
The server is RHEL 9.



5 Replies


Community Advisor and Adobe Champion

It sounds like you're definitely on the right track by deleting DAM assets and running oak-run to compact the repository, but there are a few key reasons why the overall size of your AEM instance might not reduce as much as expected, especially in AEM 6.4.


Why the Repository Size Didn't Drop Much:

  1. Pages May Still Reference Deleted DAM Assets
    Even though the assets are gone from /content/dam, site pages that were using them might still contain references (like fileReference or data properties). These lingering references can prevent the garbage collection and compaction process from fully cleaning them out (a query sketch for finding such references follows this list).
  2. Version History Still Retains Data
    AEM automatically stores older versions of assets and pages under /jcr:system/jcr:versionStorage, which can build up over time. Deleting assets doesn’t automatically remove their version history — and that version history can be huge.
  3. Workflows, Audit Logs, and Other System Nodes Add Up
    Paths like /var/workflow/instances, /var/audit, and /var/eventing can grow silently and hold on to data even after assets are deleted. These should also be purged regularly.
  4. Assets Might Be Embedded in Pages Instead of Stored in DAM
    In some cases, assets are uploaded directly inside components at the jcr:content level of a page, like:
    • /content/mysite/en/page/jcr:content/image/file
      These types of binary files don’t live in DAM, so deleting DAM assets doesn’t affect them — and since they’re tied to pages, they also get versioned. This is one of the reasons why using the DAM (/content/dam) is a best practice in AEM: it keeps assets centralized, processed by asset workflows, and much easier to manage and clean up.
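
As a hedged sketch for point 1: AEM's QueryBuilder JSON servlet can list pages that still hold a fileReference to a deleted asset. The host, credentials, and asset path below are placeholders to adjust for your instance.

  # sketch: find pages still referencing a deleted asset path (adjust host, credentials, asset path)
  curl -s -u admin:admin \
    "http://localhost:4502/bin/querybuilder.json?path=/content&property=fileReference&property.value=/content/dam/some-folder/deleted-asset.jpg&p.limit=-1"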

What You Can Do Next:

  1. Run a version purge: Use the official Adobe AEM Version Purge Configuration: https://experienceleague.adobe.com/en/docs/experience-manager-65/content/implementing/deploying/conf... to automatically remove old versions of assets and pages. This helps clear data stored under /jcr:system/jcr:versionStorage.
  2. Check for embedded binaries in /content: Use a query or Groovy script to scan for file nodes or jcr:content/renditions under your site structure. These are often large, directly embedded binaries that don’t live in DAM and can silently grow over time (see the query sketch after this list).
  3. Purge workflows and audit data: Clean up system paths like /var/workflow, /var/audit, /var/eventing, and others that may store large logs or payloads long after they're needed.
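
For point 2, a similar QueryBuilder call can list nt:file nodes that live under a site rather than under /content/dam. This is only a sketch; the host, credentials, and site path are placeholders.

  # sketch: list binary (nt:file) nodes embedded directly under a site, i.e. outside /content/dam
  curl -s -u admin:admin \
    "http://localhost:4502/bin/querybuilder.json?path=/content/mysite&type=nt:file&p.limit=100&p.hits=full"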

Re-run oak-run compaction after these cleanup steps, and make sure AEM is fully stopped while it runs so compaction can reclaim as much space as possible.

Let me know how it goes.


Correct answer by
Level 10

Binary data may be stored independently from the content nodes, i.e. the binary data lives in a data store, whereas content nodes are stored in a node store. Source: here.

When using TarMK with a separate datastore (which stores the actual binary files), oak-run compaction only cleans the segment store (metadata/references), not the actual binary data. The 700GB+ of DAM assets are stored as binaries in the datastore, which requires a separate cleanup process.
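
A quick way to confirm that your instance uses a separate FileDataStore, and to see how much of the 865GB actually sits in binaries, is to check the datastore configuration and directories on the server. The paths below assume the default crx-quickstart layout; adjust them if your datastore path is customized.

  # sketch: look for a FileDataStore OSGi config and compare datastore vs. segmentstore size
  ls crx-quickstart/install/ | grep -i FileDataStore
  du -sh crx-quickstart/repository/datastore
  du -sh crx-quickstart/repository/segmentstore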

To run Datastore Garbage Collection via the JMX Console:

  1. Go to the JMX Console: https://<serveraddress:port>/system/console/jmx
  2. Search for the RepositoryManagement MBean
  3. Click startDataStoreGC(boolean markOnly)
  4. Enter false for the markOnly parameter and click Invoke

Please note that datastore garbage collection will not collect files deleted in the last 24 hours. You may need to wait and run it again if you have just deleted the assets.
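
If you prefer to trigger it from the command line, the same MBean operation can also be invoked over HTTP through the Felix console. This is only a sketch; verify the exact MBean object name in /system/console/jmx first, and adjust host, port, and credentials.

  # sketch: invoke startDataStoreGC(markOnly=false) on the RepositoryManagement MBean
  curl -u admin:admin -X POST \
    "http://localhost:4502/system/console/jmx/org.apache.jackrabbit.oak%3Aname%3Drepository+manager%2Ctype%3DRepositoryManagement/op/startDataStoreGC/boolean" \
    --data markOnly=false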

Here is a list of additional tasks you can check:

  • After offline compaction completes, check your segmentstore directory for .tar.bak files. These are backup files created during compaction that can be safely deleted once you verify the instance is stable (see the command sketch after this list).
  • Navigate to Tools > Operations > Maintenance > Daily Maintenance Window > Lucene Binaries Cleanup to clean up old Lucene index binaries.
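
For the first point, a simple way to locate the compaction backups is a find over the segmentstore directory (sketch only; delete the files only after you have verified the instance is stable):

  # sketch: list compaction backup files, then remove them once the instance is verified
  find crx-quickstart/repository/segmentstore -name "*.tar.bak"
  find crx-quickstart/repository/segmentstore -name "*.tar.bak" -delete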

 

 


Level 1

Thank you for saving me; compaction alone was not helping.
What worked for me was:

  1. Go to the JMX Console: https://<serveraddress:port>/system/console/jmx
  2. Search for the RepositoryManagement MBean
  3. Click startDataStoreGC(boolean markOnly)
  4. Enter false for the markOnly parameter and click Invoke

My question is: how frequently should we run this cleanup process?

Thanks again for your great help :)


Level 10

Weekly is the standard schedule: How does data store garbage collection work? 

Datastore garbage collection will not collect files deleted in the last 24 hours. This is by design as a safety mechanism. So even if you run DSGC immediately after deleting assets, those binaries won't be removed yet. 

You might need to run DSGC more often than weekly if:

  • There is heavy DAM activity with frequent asset uploads, deletions, or updates
  • Large replication volumes with high publish activation activity create temporary files
  • The environment is workflow-heavy, i.e. many workflow payloads generate temporary assets
  • Disk space is becoming a concern

You can automate the cleanup by navigating to Tools > Operations > Maintenance > Weekly Maintenance Window and ensuring the Data Store Garbage Collection task is enabled (customize the start time as needed to match an off-peak period).
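
To judge whether weekly is frequent enough, you can simply track how fast the datastore grows between GC runs, for example from cron. The paths below assume the default FileDataStore location and are only a sketch; adjust the datastore path and log location for your setup.

  # sketch: log datastore size daily to watch binary growth between GC runs
  du -sh crx-quickstart/repository/datastore >> ~/aem-datastore-size.log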


Community Advisor

Hi @NituDh ,

Even after deleting DAM assets, AEM may not free much space because the content is still referenced (by pages or renditions), or old versions still exist.

To truly reduce repository size:

  1. Purge DAM asset versions using the Version Purge tool or a script.
  2. Run datastore garbage collection (DataStoreGarbageCollector).
  3. Ensure no references to the deleted assets remain.
  4. Run repository compaction again.

Only after removing all references, old versions, and binary garbage will the size drop significantly.

Hrishikesh Kagane