SOLVED

AEM 6.5 - Does Version Purge help reduce repository size by deleting versions? How to verify?

Level 2

Hi,

Our current repository size is around 25 GB. Online compaction is enabled and runs daily, and we also run offline compaction once a month. Does version purge help reduce the repository size by deleting versions from /jcr:system/jcr:versionStorage? How can we verify whether the repository size has been reduced? Does the Disk Usage report help verify this when we use a shared S3 data store?

1 Accepted Solution

Correct answer by
Employee

Version purge helps reduce the repository size. You can also schedule offline compaction monthly or fortnightly if the downtime is affordable. Beyond that, several things can cause unusual increases in disk utilization. Some potential causes:

 

- Proper maintenance hasn't been run on the system. See article [0] for details on the various system maintenance activities.

- Since the tar storage in Oak operates in an append-only mode, repeated saving of nodes further contributes to excessive repository growth.

- Very large file(s) have been uploaded to AEM Assets or package manager.

- Debug or Trace logging was left enabled.

 

If AEM is still running, you can enable a debug logger that shows which repository paths are being written to. To enable this logger, follow these steps:

- Go to http://aemhost:port/system/console/slinglog

- Click Add new logger

- Configure a logger: Log File: logs/repgrowth.log, Log Level: trace, Loggers: org.apache.jackrabbit.oak.jcr.operations.writes

 

 

Note: The log records all writes along with session details. If you enable this logger, make sure you have sufficient disk space.

 

 

You can also use the Disk Usage report at http://host:port/etc/reports/diskusage.html. The report displays the disk space used per repository path, and you can drill down into subtrees.

 

 

After using repgrowth.log to get an idea of what data is being written, you can find out which code is writing that data by capturing thread dumps and running CPU profiling.

 

 

[0]: https://helpx.adobe.com/in/experience-manager/6-4/sites/deploying/using/revision-cleanup.html

8 Replies

Employee Advisor

The Purge Versions tool is intended for purging the versions of a node or a hierarchy of nodes in your repository. Its primary purpose is to help you reduce the size of your repository by removing old versions of your nodes.

 

In a default AEM installation, versions are created when you publish or unpublish pages or assets, or when you upload or replace assets. Versions are stored as nodes under /jcr:system/jcr:versionStorage in the Oak repository. Those nodes keep references to binary files in the datastore. Over time the versions pile up, which affects system performance and disk utilization. The search indexes, the Tar or Mongo storage, and the DataStore get bloated with additional data from old version histories. To reclaim the disk space and regain system performance, you need to run Version Purge.

 

RECOMMENDED SCHEDULE

This maintenance task needs to be run on a monthly basis.

 

SAMPLE LOG OUTPUT

Version purge writes an INFO message to the logs for each version it successfully purges. If it fails to purge a version, it logs an error and continues purging the other versions.

The log message below is an example of a successful purge of a version:

INFO [pool-11-thread-10-Maintenance Queue(com/adobe/granite/maintenance/job/VersionPurgeTask)] com.day.cq.wcm.core.impl.VersionManagerImpl Purged version 1.0 of /content/geometrixx/en/jcr:content

The error below is an example of a failed version purge:

ERROR [pool-11-thread-10-Maintenance Queue(com/adobe/granite/maintenance/job/VersionPurgeTask)] com.day.cq.wcm.core.impl.VersionManagerImpl Unable to purge version 1.1 for /content/geometrixx/en/jcr:content : OakIntegrity0001: Unable to delete referenced node
javax.jcr.ReferentialIntegrityException: OakIntegrity0001: Unable to delete referenced node
at org.apache.jackrabbit.oak.api.CommitFailedException.asRepositoryException(CommitFailedException.java:235)
at org.apache.jackrabbit.oak.api.CommitFailedException.asRepositoryException(CommitFailedException.java:212)
at org.apache.jackrabbit.oak.jcr.version.ReadWriteVersionManager.removeVersion(ReadWriteVersionManager.java
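
To get a quick tally from a log like the one above, something along these lines may help. It is only a sketch: the two patterns are derived from the two sample lines quoted in this post, the function name is made up for illustration, and paths containing spaces would not be captured correctly.

```python
import re
from collections import Counter

# Patterns derived from the two sample log lines quoted above (an assumption:
# other AEM versions may word these messages differently).
PURGED = re.compile(r"VersionManagerImpl Purged version (\S+) of (\S+)")
FAILED = re.compile(r"VersionManagerImpl Unable to purge version (\S+) for (\S+)")


def summarize_purge_log(lines):
    """Count purged and failed versions per path from an iterable of log lines."""
    purged, failed = Counter(), Counter()
    for line in lines:
        match = PURGED.search(line)
        if match:
            purged[match.group(2)] += 1
            continue
        match = FAILED.search(line)
        if match:
            failed[match.group(2)] += 1
    return purged, failed


# Example usage (the log path is an assumption about your installation):
# with open("crx-quickstart/logs/error.log", encoding="utf-8", errors="replace") as f:
#     purged, failed = summarize_purge_log(f)
#     print(sum(purged.values()), "purged,", sum(failed.values()), "failed")
```

Stack trace lines match neither pattern, so they are simply skipped.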

Check the repository size before purging, then execute the version purge. Re-check the size once revision cleanup (compaction) and datastore garbage collection have run; it should then be reduced.
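
One way to take the before/after measurement on disk is sketched below. It assumes a TarMK setup with the default crx-quickstart/repository/segmentstore directory (adjust the path for your installation), and the function names are made up for illustration. With a shared S3 data store the binaries live in the bucket, so a local measurement like this only covers the segment store.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of all regular files below root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.path.getsize(path)
            except OSError:
                # A file can disappear mid-walk, e.g. a tar file removed by cleanup.
                pass
    return total


def format_gib(num_bytes):
    """Render a byte count as GiB with two decimals."""
    return f"{num_bytes / 1024 ** 3:.2f} GiB"


# Example usage (the path is an assumption about a default installation):
# print(format_gib(dir_size_bytes("crx-quickstart/repository/segmentstore")))
```

Run it once before the purge and again after compaction has completed, then compare the two numbers.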

 

Please refer to the following Adobe documentation: https://helpx.adobe.com/in/experience-manager/6-3/sites/deploying/using/version-purging.html

https://helpx.adobe.com/in/experience-manager/kb/AEM6-Maintenance-Guide.html#versionpurge

Employee

Version Purge definitely helps reduce the repository size.

Versioning in AEM works a bit differently for Pages and Assets.

 

Versioning in pages: https://docs.adobe.com/content/help/en/experience-manager-65/authoring/siteandpage/working-with-page...

Versioning in Assets: https://docs.adobe.com/content/help/en/experience-manager-65/assets/managing/managing-assets-touch-u...

 

You can check the repo size before and after performing the version purge. You should see some reduction in repo size, depending on how many versions of pages and assets existed.

 

The following doc may be helpful in Version Purging:

https://docs.adobe.com/content/help/en/experience-manager-65/deploying/configuring/version-purging.h...

Level 3

Purging versions would definitely reduce the repository size. However, you need to wait for the compaction cycle to run before the space is reclaimed. If you are on AEM 6.3 or above, online compaction reclaims this space at 2:00 AM server time (the default configured time).

 

To determine whether version purge ran successfully, check your error.log files, which should contain messages on the outcome of the purge. You can also open crx/explorer, click on any page or asset, and check its version tree to see how many versions it displays. Purged versions will have been removed and should no longer be visible in the version tree.

Employee
"However, you would need to wait for the compaction cycle to happen for that." <--- Key distinction. This is the important part to understand. The repository is essentially append-only. Version purge "purges" versions by touching nodes under /jcr:system/jcr:versionStorage. This effectively adds entries to the segmentstore that mark versions for future deletion, which adds to the size. Revision cleanup then compacts the repository and cleans up all those "to-delete" entries, finally reducing the size. After that, you run a datastore GC cycle to clean up the binary data/blobs in your file datastore, S3 bucket, or other data store that are no longer referenced in the segmentstore you compacted.
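
The grow-then-shrink behaviour described here can be illustrated with a toy model. This is not Oak's segment store implementation, just a minimal append-only log where a delete is recorded as an extra marker entry and space is only reclaimed by an explicit compaction step. All class and method names are made up for illustration.

```python
class AppendOnlyStore:
    """Toy append-only store: every write, including a delete, appends a record."""

    def __init__(self):
        # Each record is (key, value); a value of None marks the key as deleted.
        self.log = []

    def put(self, key, value):
        self.log.append((key, value))

    def delete(self, key):
        # Deleting does not remove anything; it appends a delete marker.
        self.log.append((key, None))

    def size(self):
        return len(self.log)

    def compact(self):
        # Replay the log, keep only live entries, and rewrite the log from them.
        live = {}
        for key, value in self.log:
            if value is None:
                live.pop(key, None)
            else:
                live[key] = value
        self.log = list(live.items())


store = AppendOnlyStore()
for i in range(5):
    store.put(f"version-{i}", "data")
before = store.size()          # 5 records

for i in range(4):             # "purge" four of the five versions
    store.delete(f"version-{i}")
after_purge = store.size()     # 9 records: the store grew

store.compact()
after_compact = store.size()   # 1 record: space reclaimed only now
```

In this model the record count goes from 5 to 9 after the purge and drops to 1 only after compaction, which mirrors why the repository size has to be checked after revision cleanup rather than right after the purge.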

Level 2

Start revisionGC from the JMX console (repository manager MBean) after the Version Purge activity completes.

Alternatively, wait for the next online compaction to complete.
