Our current repository size is around 25 GB. We have online compaction enabled, which runs on a daily basis, and we also run offline compaction once a month. Does version purge help to reduce the repository size by deleting versions from /jcr:system/jcr:versionStorage? How can we verify whether the repository size was reduced or not? Does the disk usage report help to verify this if we have a shared S3 data store?
Version purge would help in reducing the repository size. You can also plan and schedule offline compaction once a month or once a fortnight if the downtime is affordable. Beyond that, there are many things that can cause unusual increases in disk utilization. Some potential causes:
- Proper maintenance hasn't been run on the system. See  article for details on various system maintenance activities.
- Since the tar storage in Oak operates in an append-only mode, repeated saving of nodes further contributes to excessive repository growth.
- Very large file(s) have been uploaded to AEM Assets or package manager.
- Debug or Trace logging was left enabled.
If AEM is still running then we can enable a debug logger to tell us which repository paths are being written to. To enable this logger, follow these steps:
The Purge Versions tool is intended for purging the versions of a node or a hierarchy of nodes in your repository. Its primary purpose is to help you reduce the size of your repository by removing old versions of your nodes.
In a default AEM installation, versions are created when you publish or unpublish pages or assets, upload or replace assets. Versions are stored as nodes under /jcr:system/jcr:versionStorage in the Oak repository. Those nodes keep references to binary files in the datastore. Over time the versions pile up and this affects system performance and disk utilization. The search indexes, Tar or Mongo storage and DataStore get bloated with additional data from old version histories. To reclaim the disk space and gain back system performance you need to run Version Purge.
This maintenance task needs to be run on a monthly basis.
SAMPLE LOG OUTPUT
Version purge writes messages to the logs only when it actually purges versions. If it fails to purge a version, it logs an error and continues with the remaining versions.
The log message below is an example of a successful purge of a version:
INFO [pool-11-thread-10-Maintenance Queue(com/adobe/granite/maintenance/job/VersionPurgeTask)] com.day.cq.wcm.core.impl.VersionManagerImpl Purged version 1.0 of /content/geometrixx/en/jcr:content
The error below is an example of a failed version purge:
ERROR [pool-11-thread-10-Maintenance Queue(com/adobe/granite/maintenance/job/VersionPurgeTask)] com.day.cq.wcm.core.impl.VersionManagerImpl Unable to purge version 1.1 for /content/geometrixx/en/jcr:content : OakIntegrity0001: Unable to delete referenced node
javax.jcr.ReferentialIntegrityException: OakIntegrity0001: Unable to delete referenced node
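If you would rather not read through error.log by hand, a small script can tally the outcomes. This is only a rough sketch: the two patterns are taken from the sample lines above, the wording may differ in other AEM versions, and the function name `summarize_purge` is made up for this example.

```python
import re
from collections import Counter

# Patterns are based on the two sample log lines quoted above.
# Assumption: your AEM version words these messages the same way.
PURGED = re.compile(r"VersionManagerImpl Purged version (\S+) of (\S+)")
FAILED = re.compile(r"VersionManagerImpl Unable to purge version (\S+) for (\S+)")


def summarize_purge(lines):
    """Count purged and failed versions in an iterable of log lines.

    Returns (counts, failures), where failures is a list of
    (node_path, version) tuples for versions that could not be purged.
    """
    counts = Counter()
    failures = []
    for line in lines:
        if PURGED.search(line):
            counts["purged"] += 1
            continue
        match = FAILED.search(line)
        if match:
            counts["failed"] += 1
            failures.append((match.group(2), match.group(1)))
    return counts, failures
```

To use it, open the log file and pass the file object in, for example `summarize_purge(open("crx-quickstart/logs/error.log"))`. That path is the usual default location and may differ on your instance.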
Check the repository size before purging and then execute the version purge. After that re-check the repository size. It should have been reduced.
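For the before/after comparison, something like the following can total up a directory on disk. It is a minimal sketch, and the helper names `dir_size_bytes` and `format_gib` are hypothetical. Note that it only measures local files, so with a shared S3 data store it would cover the segment store but not the binaries held in S3.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # Files can disappear while AEM is running; skip them.
                pass
    return total


def format_gib(num_bytes):
    """Render a byte count as GiB with two decimals."""
    return f"{num_bytes / 1024**3:.2f} GiB"
```

Run it once before the purge and again after the next compaction, for example `format_gib(dir_size_bytes("crx-quickstart/repository/segmentstore"))`. That path assumes a default TarMK layout, so adjust it to your installation.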
Purging versions would reduce the repository size. However, the space is only reclaimed once the next compaction cycle has run. If you are on AEM 6.3 or above, online compaction would reclaim this space at 2:00 AM server time (the default configured time).
To determine whether version purge ran successfully, check your error.log files, which should contain messages on the outcome of the version purge. You can also open crx/explorer, click on any page or asset, and check its version tree to see how many versions it displays. Purged versions should no longer be visible in the version tree.