Disk Usage Report much smaller than disk space consumed
June 27, 2019
Solved

  • 25 replies
  • 11839 views

We have an AEM 6.2 instance with Hotfix 17578 (cq-6.2.0-hotfix-17578) installed so we are on Oak 1.4.17 and CFP18.

We run garbage collection, version cleanup, and workflow cleanup daily. We have a separate datastore and segment store. We have performed offline compaction, but that only applies to the segment store.

The disk usage report in AEM (/etc/reports/diskusage.html) reports that we are using ~18 GB of data. However, our datastore has grown to ~170 GB. I cannot work out where the extra data is coming from or how to reduce it.

From this page (Analyze unusual repository growth) I can see we even have a few files between 1 and 6 GB, but there is definitely no file that big in our DAM or packages. The entire DAM, according to the usage report, is ~6 GB.

What can I do to reduce the size of our datastore?  What is causing this problem?
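
To locate large files like the ones mentioned above, a small scan of the datastore directory can help. This is only a minimal sketch: the function name scan_datastore is made up for illustration, and it simply walks a directory tree, sums file sizes, and keeps the largest entries. Note that a file datastore names its files by content hash, so the output shows how big a binary is, not which node references it.

```python
import heapq
import os


def scan_datastore(root, top_n=10):
    """Return (total_bytes, largest) for all files under root.

    largest is a list of (size_in_bytes, path) tuples, biggest first.
    scan_datastore is a hypothetical helper, not part of AEM or Oak.
    """
    total = 0
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # The file may disappear while scanning (for example
                # if a garbage collection run deletes it); skip it.
                continue
            total += size
            entries.append((size, path))
    return total, heapq.nlargest(top_n, entries)


# Example call, assuming the datastore path used elsewhere in this thread:
#   total, biggest = scan_datastore("/content/aem/datastore")
#   for size, path in biggest:
#       print(f"{size / 1024**3:8.2f} GB  {path}")
```

Comparing the total against the figure from the disk usage report gives a rough idea of how much of the datastore is no longer referenced by content.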

Best answer

Here is what I have done that seems to have resolved the issue. Please let me know if there is a problem I am not seeing; otherwise, it seems to have worked beautifully.

java -jar crx2oak-1.8.6-all-in-one.jar segment-old:/content/aem/crx-quickstart/repository segment-old:/content/backup/ --include-path=/ --src-datastore=/content/aem/datastore --datastore=/content/backup/datastore/

By running this command as if I were upgrading, but with "segment-old" for both source and target, I was able to create a repository that is ~11 GB compared to the previous ~170 GB, and everything seems to work afterwards.

My only concern is that this page (InvalidFileStoreVersionException migrating from older version to 6.3 using CRX2Oak) only mentions using "segment-old" for the source repository, but using it for the destination repository does not seem to cause a problem.

25 replies

Adobe Employee
June 27, 2019

Hello,

How often do you run the datastore garbage collection [1]?

The biggest gain in recovering disk space occupied by the datastore comes from running Datastore Garbage Collection after running Offline Tar Compaction.

You can enable TRACE logging on org.apache.jackrabbit.oak.operations.blobs to see what is being written to the datastore.

[1] https://helpx.adobe.com/experience-manager/6-2/sites/administering/using/data-store-garbage-collection.html

Regards,

Vishu

Adobe Employee
June 27, 2019

Please check this thread:

AEM - Ways to reduce repository size

antoniom5495929
Level 7
June 27, 2019

Hi,

I agree with Vish.dhaliwal.

You can avoid this behaviour by running a datastore garbage collection.

Please keep in mind that the datastore garbage collection needs to be executed after a compaction (preferably an offline compaction), because in that order:

- Compaction: removes all segments that are no longer used

- Garbage collection: with those segments gone, removes all unreferenced data

Let us know if you need more info.

Thanks,

Antonio

Adobe Employee
June 27, 2019

Just a note: AEM 6.2 is end of life as of April 20th, 2019, so it is in your best interest to upgrade at the earliest opportunity.

See the End Of Life Matrix here: All Apps Help | Products and technical support periods

June 27, 2019

I appreciate the assistance, but it's not helpful if my message is not read.  We run garbage collection...daily.

I have searched for hours before posting and read all of Adobe's documentation on maintenance and many articles on this forum.  However, I will repeat that we run version cleanup, workflow cleanup, and data store garbage collection daily.  I will also repeat we have performed compaction.

The data store has grown to this size over two years or more; it is not suddenly growing rapidly, so a TRACE will be only marginally useful, if at all, at this point.

antoniom5495929
Level 7
June 27, 2019

Hi michaelh28626156,

I can confirm that we read your post. That is why I added more detail about the order of execution.

Are you sure you are running your datastore garbage collection AFTER the compaction? Otherwise your GC is useless.

Let us know.

Thanks,

Antonio

Adobe Employee
June 27, 2019

Adding to what Antonio mentioned, there are other maintenance tasks, such as audit log purge. They are listed in the earlier update I made.

June 27, 2019

Yes. I performed offline compaction, then tried garbage collection in various ways. I turned off the application, ran compaction, turned it back on, and ran garbage collection, as the following command and its log output should make clear.

curl --silent -u username:password -X POST --data markOnly=false http://localhost:4502/system/console/jmx/org.apache.jackrabbit.oak%3Aname%3Drepository+manager%2Ctype%3DRepositoryManagement/op/startDataStoreGC/boolean

27.06.2019 18:10:36.159 *INFO* [qtp1670602695-1929] log.access 127.0.0.1 - admin 27/Jun/2019:18:10:36 +0400 "POST /system/console/jmx/org.apache.jackrabbit.oak%3Aname%3Drepository+manager%2Ctype%3DRepositoryManagement/op/startDataStoreGC/boolean HTTP/1.1" 200 201 "nt" "curl/7.29.0"

27.06.2019 18:10:36.160 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Starting Blob garbage collection with markOnly [false]

27.06.2019 18:10:36.207 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (2048) blob references

27.06.2019 18:10:36.248 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (4096) blob references

27.06.2019 18:10:36.266 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (6144) blob references

27.06.2019 18:10:36.294 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (8192) blob references

27.06.2019 18:10:36.314 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (10240) blob references

27.06.2019 18:10:36.330 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (12288) blob references

27.06.2019 18:10:36.346 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (14336) blob references

27.06.2019 18:10:36.362 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (16384) blob references

27.06.2019 18:10:36.377 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (18432) blob references

27.06.2019 18:10:36.394 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (20480) blob references

27.06.2019 18:10:36.410 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (22528) blob references

27.06.2019 18:10:36.425 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (24576) blob references

27.06.2019 18:10:36.441 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (26624) blob references

27.06.2019 18:10:36.455 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (28672) blob references

27.06.2019 18:10:36.473 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (30720) blob references

27.06.2019 18:10:36.487 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (32768) blob references

27.06.2019 18:10:36.525 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (34816) blob references

27.06.2019 18:10:36.551 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (36864) blob references

27.06.2019 18:10:36.598 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Number of valid blob references marked under mark phase of Blob garbage collection [37965]

27.06.2019 18:10:36.722 *ERROR* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Not all repositories have marked references available : [7e195675-c082-4cdf-8ec2-813ad8194891, 56af03b7-829d-445f-813a-e75681f86188]

27.06.2019 18:10:36.722 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Blob garbage collection completed in 562.6 ms. Number of blobs deleted [0] with max modification time of [2019-06-26 18:10:36.160]
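
In output like the above, the single ERROR line is easy to miss among the INFO lines, yet the run ends with zero blobs deleted. The following is a rough sketch for pulling the key facts out of such a log. The function name summarize_gc and the regular expressions are assumptions based only on the wording of the lines shown here; other Oak versions may phrase these messages differently.

```python
import re

# Patterns inferred from the log lines quoted in this post (assumption:
# the wording is stable for this Oak version).
ERROR_RE = re.compile(r"\*ERROR\*.*MarkSweepGarbageCollector (?P<msg>.*)")
MARKED_RE = re.compile(
    r"Number of valid blob references marked .*\[(?P<count>\d+)\]"
)
DELETED_RE = re.compile(r"Number of blobs deleted \[(?P<count>\d+)\]")


def summarize_gc(lines):
    """Summarize one datastore GC run from an iterable of log lines.

    Returns a dict with the number of marked references, the number of
    deleted blobs, and any ERROR messages from the collector.
    summarize_gc is a hypothetical helper, not an Oak or AEM API.
    """
    summary = {"marked": None, "deleted": None, "errors": []}
    for line in lines:
        if "MarkSweepGarbageCollector" not in line:
            continue  # ignore access-log and unrelated lines
        error = ERROR_RE.search(line)
        if error:
            summary["errors"].append(error.group("msg").strip())
            continue
        marked = MARKED_RE.search(line)
        if marked:
            summary["marked"] = int(marked.group("count"))
        deleted = DELETED_RE.search(line)
        if deleted:
            summary["deleted"] = int(deleted.group("count"))
    return summary


# Example call, assuming the log has been saved to a file named error.log:
#   with open("error.log", encoding="utf-8") as fh:
#       print(summarize_gc(fh))
```

A summary that shows many marked references, an error, and no deletions suggests the error is worth investigating before re-running the collection.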

antoniom5495929
Level 7
June 27, 2019

Hi,

Thanks for the info.

Are you using AEM with TarMK with an external or embedded datastore?

Are you using a shared datastore?

Let us know.

Thanks,

Antonio

June 27, 2019

I had seen that post and the linked maintenance document before posting, and I have performed all of those operations, including audit log purge (nothing older than five days).