Hi,
Our current repository size is around 25 GB. Online compaction is enabled and runs daily, and we also run offline compaction once a month. Does version purge help reduce the repository size by deleting versions from /jcr:system/jcr:versionStorage? How can we verify whether the repository size was actually reduced? Does the disk usage report help verify this when we have a shared S3 data store?
Version purge would help reduce the repository size. You can also plan and schedule offline compaction once a month or once a fortnight if the downtime is affordable. Beyond that, many things can cause unusual increases in disk utilization. Some potential causes:
- Proper maintenance hasn't been run on the system. See the article at [0] for details on the various system maintenance activities.
- Since the tar storage in Oak operates in an append-only mode, repeated saving of nodes further contributes to excessive repository growth.
- Very large file(s) have been uploaded to AEM Assets or package manager.
- Debug or Trace logging was left enabled.
If AEM is still running then we can enable a debug logger to tell us which repository paths are being written to. To enable this logger, follow these steps:
- Go to http://aemhost:port/system/console/slinglog
- Click Add new logger
- Configure a logger: Log File: logs/repgrowth.log, Log Level: trace, Loggers: org.apache.jackrabbit.oak.jcr.operations.writes
Note: The log includes information regarding all writes and session details. If you use this logger then make sure you have sufficient disk space.
You can also leverage the Disk Usage report http://host:port/etc/reports/diskusage.html. This report displays the disk space used by repository path. The report is drillable, allowing you to view subtrees as well.
After using the repgrowth.log to get some idea of what data is being written, you can get information about what code is writing that data by capturing thread dumps and running CPU profiling.
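As a rough aid for reading repgrowth.log, the sketch below counts write entries per top-level repository subtree. It assumes each entry carries the affected path in square brackets (for example `... Adding node [/content/dam/a.jpg]`); that format is an assumption, not something confirmed in this thread, so adjust the pattern to match your actual log. The function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: the repository path appears in square brackets and starts
# with "/". Bracketed thread or session names are skipped because they
# do not start with a slash.
PATH_IN_BRACKETS = re.compile(r"\[(/[^\]\s]*)\]")


def count_writes_by_prefix(lines, depth=2):
    """Count log entries grouped by the first `depth` path segments."""
    counts = Counter()
    for line in lines:
        match = PATH_IN_BRACKETS.search(line)
        if match is None:
            continue  # line has no recognizable repository path
        segments = [s for s in match.group(1).split("/") if s]
        prefix = "/" + "/".join(segments[:depth])
        counts[prefix] += 1
    return counts


def top_writers(log_path, depth=2, limit=10):
    """Return the `limit` most frequently written path prefixes in a log file."""
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        return count_writes_by_prefix(handle, depth).most_common(limit)
```

Calling `top_writers("logs/repgrowth.log")` would then list the subtrees with the most write entries, which can be compared against what the Disk Usage report shows.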
[0]: https://helpx.adobe.com/in/experience-manager/6-4/sites/deploying/using/revision-cleanup.html
The Purge Versions tool is intended for purging the versions of a node or a hierarchy of nodes in your repository. Its primary purpose is to help you reduce the size of your repository by removing old versions of your nodes.
In a default AEM installation, versions are created when you publish or unpublish pages or assets, or when you upload or replace assets. Versions are stored as nodes under /jcr:system/jcr:versionStorage in the Oak repository. Those nodes keep references to binary files in the datastore. Over time the versions pile up, which affects system performance and disk utilization. The search indexes, the Tar or Mongo storage, and the DataStore get bloated with additional data from old version histories. To reclaim the disk space and regain system performance, you need to run Version Purge.
RECOMMENDED SCHEDULE
This maintenance task needs to be run on a monthly basis.
SAMPLE LOG OUTPUT
Version purge only writes messages to the logs when it successfully purges versions. If it fails to purge some versions, it throws an error and continues purging the other versions.
The log message below is an example of a successful purge of a version:
INFO [pool-11-thread-10-Maintenance Queue(com/adobe/granite/maintenance/job/VersionPurgeTask)] com.day.cq.wcm.core.impl.VersionManagerImpl Purged version 1.0 of /content/geometrixx/en/jcr:content
The error below is an example of a failed version purge:
Check the repository size before purging and then execute the version purge. After that re-check the repository size. It should have been reduced.
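One way to do the before/after comparison on disk is to sum the file sizes under the repository directory. This is a minimal sketch: the helper names are hypothetical, and the directory you pass in (for example `crx-quickstart/repository/segmentstore` on a TarMK setup) is an assumption about your installation. With a shared S3 data store the binaries are not under this local directory, so the number would only reflect the local store.

```python
import os


def directory_size_bytes(root):
    """Sum the sizes of all regular files below `root` (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def format_gib(num_bytes):
    """Render a byte count as GiB with two decimals."""
    return f"{num_bytes / 1024 ** 3:.2f} GiB"


def report_change(before_bytes, after_bytes):
    """Describe the size difference between two measurements."""
    reclaimed = before_bytes - after_bytes
    return (
        f"before={format_gib(before_bytes)} "
        f"after={format_gib(after_bytes)} "
        f"reclaimed={format_gib(reclaimed)}"
    )
```

Record `directory_size_bytes(...)` before the purge, measure again after the purge and the following compaction have finished, and pass both values to `report_change`.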
Please check the Adobe documentation below:
https://helpx.adobe.com/in/experience-manager/6-3/sites/deploying/using/version-purging.html
https://helpx.adobe.com/in/experience-manager/kb/AEM6-Maintenance-Guide.html#versionpurge
Version Purge definitely helps reduce the repository size.
Versioning in AEM works a bit differently for pages and assets.
Versioning in pages: https://docs.adobe.com/content/help/en/experience-manager-65/authoring/siteandpage/working-with-page...
Versioning in Assets: https://docs.adobe.com/content/help/en/experience-manager-65/assets/managing/managing-assets-touch-u...
You can check the repo size before and after performing the version purge. You would see some reduction in repo size, depending on how many versions of pages & assets existed.
The following doc may be helpful in Version Purging:
Purging versions would definitely reduce the repository size. However, you need to wait for the next compaction cycle before the space is actually reclaimed. If you are on AEM 6.3 or above, online compaction would reclaim this space at 2:00 AM server time (the default configured time).
To determine whether version purge ran successfully, check your error.log files, which should contain messages on the outcome of the version purge. You can also open crx/explorer, click on any page or asset, and check its version tree to see how many versions it displays. Version purge would have removed the old versions, so they should no longer be visible in the version tree.
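To summarize the purge outcome from error.log, a small script can count the "Purged version" messages per path. The sketch below assumes the message format matches the sample line quoted earlier in this thread (`... VersionManagerImpl Purged version 1.0 of /content/...`); the function names are hypothetical.

```python
import re
from collections import Counter

# Matches the success message format shown in the sample log output.
PURGED = re.compile(r"VersionManagerImpl Purged version (\S+) of (.+?)\s*$")


def purged_versions(lines):
    """Yield (version, path) for every successful purge message."""
    for line in lines:
        match = PURGED.search(line)
        if match:
            yield match.group(1), match.group(2)


def purge_counts(lines):
    """Count how many versions were purged for each node path."""
    return Counter(path for _version, path in purged_versions(lines))


def summarize(log_path):
    """Read a log file and return per-path purge counts."""
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        return purge_counts(handle)
```

An empty result from `summarize("logs/error.log")` would suggest that no versions were purged in the period the log covers, or that the message format differs on your AEM version.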
Start revision GC from the JMX console (Repository Manager MBean) after the Version Purge activity completes,
or wait for the next online compaction to complete.