Disk Usage Report much smaller than disk space consumed

michaelh2862615

27-06-2019

We have an AEM 6.2 instance with Hotfix 17578 (cq-6.2.0-hotfix-17578) installed so we are on Oak 1.4.17 and CFP18.

We run garbage collection, version cleanup, and workflow cleanup daily.  We have a separate datastore and segment store.  We have performed offline compaction, but that only applies to the segment store.

The disk usage report in AEM (/etc/reports/diskusage.html) reports we are using ~18 GB of data.  However, our datastore has grown to ~170 GB of data.  I cannot find any way to figure out how to reduce this or where this is coming from.

From this page (Analyze unusual repository growth) I can see we even have a few files between 1 and 6 GB, but there is definitely no file that big in our DAM or in our packages.  According to the usage report, the entire DAM is ~6 GB.

What can I do to reduce the size of our datastore?  What is causing this problem?
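(For reference, the gap between the reported size and the on-disk size can be measured directly from the shell. This is only a sketch; the datastore path below is the one used elsewhere in this thread and must be adjusted to your environment.)

```shell
# Hypothetical datastore path from this setup; point at your own datastore root.
DATASTORE=/content/aem/datastore

# Actual bytes consumed on disk by blobs
du -sh "$DATASTORE"

# How many blob files there are, and any unusually large ones
find "$DATASTORE" -type f | wc -l
find "$DATASTORE" -type f -size +1G -exec ls -lh {} \;
```

Comparing the `du` total against the disk usage report makes the ~150 GB discrepancy concrete before any cleanup is attempted.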

Accepted Solutions (1)

michaelh2862615

03-07-2019

Here is what I did, and it seems to have resolved the issue.  If anyone sees a problem with this approach, please let me know; otherwise, it appears to have worked beautifully.

java -jar crx2oak-1.8.6-all-in-one.jar segment-old:/content/aem/crx-quickstart/repository segment-old:/content/backup/ --include-path=/ --src-datastore=/content/aem/datastore --datastore=/content/backup/datastore/

By running this command as if I were upgrading, but using "segment-old" for both source and target, I was able to create a repository of ~11 GB compared to the previous ~170 GB, and everything seems to work correctly afterwards.

The only concern is that this page (InvalidFileStoreVersionException migrating from older version to 6.3 using CRX2Oak) only mentions using "segment-old" for the source repository; however, using it for the destination repository does not seem to cause a problem.

Answers (24)

michaelh2862615

18-07-2019

Sure. I have just submitted a ticket.  Thanks for everyone's help.  I will update if Adobe comes back with anything insightful.

Jörg_Hoh
Employee

08-07-2019

Hm, very strange then.

Can you report this to Adobe support (if not done already), just to let them know about the situation you encountered and how you solved it?

thanks,

Jörg

michaelh2862615

07-07-2019

Yes, I ran offline compaction multiple times and even tried removing all checkpoints at some point as recommended above.

Jörg_Hoh
Employee

07-07-2019

I would expect the disk-usage report to remain as is, because it simply iterates through the repository and sums up the sizes of the properties and binaries. It does not look up files on the filesystem (and thus knows nothing about the segment store, datastores, shared datastores, and so on).

Just checking: Have you executed an offline compaction on your original instance before you ran the DSGC?

Jörg

michaelh2862615

07-07-2019

I am not sure of the best way to provide a report on the file differences, but here is a quick summary comparing the files between the old and new datastore:

diff -qrN datastore/ datastore-bak/ | wc -l

534635

The AEM Usage Report (/etc/reports/diskusage.html) is basically identical -- I do not have an exact comparison of the numbers, but the difference is negligible for our purposes.

Jörg_Hoh
Employee

05-07-2019

Glad that you managed to reduce the size. But it's a strange situation.

My guess is that the directories of the original repository contained files which were not part of the repository itself but consumed a lot of space on disk, and when you copied the repository with crx2oak, those files were not copied.

Can you compare the old and the new repository and find out in which directories the size discrepancies lie? And then check which files are affected within those directories?

Jörg

michaelh2862615

30-06-2019

Unfortunately, I cannot do that since the server is remote and does not have a display.

Anything else I can try?  I have even updated to Oak 1.4.24 on a cloned server, deleted the indices, and compacted twice, but I have not been able to free any space in the datastore.  Is there some way to clone the datastore through some command (maybe the oak-run jar) that won't just copy all the files on disk byte for byte and might produce something trimmed down?

shunnar
Employee

27-06-2019

Not sure if this will help, but you can start a GUI to explore the repository using the oak-run tool; it actually shows the size used by each node.

java -jar oak-run-1.2.16.jar explore author/repository/segmentstore

michaelh2862615

27-06-2019

We have a cloned server already, so I tried this:

- Stopped the application.

- Deleted crx-quickstart/repository/index.

- Ran java -jar oak-run-1.4.17.jar checkpoints /content/aem/crx-quickstart/repository/segmentstore/ rm-all.

- Started the server and waited for the indexes to be rebuilt.

- Ran garbage collection.

- Checked the Disk Usage report.

The datastore still occupies the same space on disk, and the disk usage report still shows the same, much lower number.

michaelh2862615

27-06-2019

Wouldn't a repository consistency check, which we also run daily, catch these corruptions?  I will try this and follow up.

michaelh2862615

27-06-2019

All files over 100 MB are zips.  I have inspected one of the files over 6 GB.  They look like packages that were backups, but these packages do not exist anymore.  I have checked all packages from the Package Explorer multiple times to confirm.  Is there something I can do about these files in the meantime?  Why would they not be garbage collected?

This would help clean up at least a few GB, but the question about the remaining 150+ GB still stands. 😕

shunnar
Employee

27-06-2019

One method you can try:

- Clone your current AEM server to a separate server.

- Delete crx-quickstart/repository/index folder.

- Run offline compaction, but use the rm-all flag, instead of rm-unreferenced. This will cause all indexes to be deleted.

- Start your server. Upon server startup, all indexes will be rebuilt and it will be a slow startup. Wait until the server is fully up.

- Run datastore GC.

- Run the disk usage report and compare its result with the actual disk size.

If you have corrupt index data, then the above should resolve it. Corrupt index data might give you incorrect results from the disk usage report. It could also cause datastore GC to allow content to persist when it should not.
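The steps above could be scripted roughly as follows. This is only a sketch, not from the thread: the paths, the stop/start scripts, and the oak-run jar version are all assumptions that must be adapted to your install.

```shell
#!/bin/sh
# Sketch of the clone-and-rebuild procedure described above.
# All paths, control scripts, and jar versions are assumptions.
AEM_HOME=/content/aem
SEGMENTSTORE="$AEM_HOME/crx-quickstart/repository/segmentstore"

# 1. Stop the (cloned) AEM instance
"$AEM_HOME/crx-quickstart/bin/stop"

# 2. Delete the index folder so indexes are rebuilt on startup
rm -rf "$AEM_HOME/crx-quickstart/repository/index"

# 3. Remove all checkpoints, then run offline compaction
java -jar oak-run-1.4.17.jar checkpoints "$SEGMENTSTORE" rm-all
java -jar oak-run-1.4.17.jar compact "$SEGMENTSTORE"

# 4. Start the server; startup will be slow while indexes rebuild
"$AEM_HOME/crx-quickstart/bin/start"

# 5. Once fully up, trigger datastore GC (see the JMX curl quoted
#    elsewhere in this thread), then re-run the disk usage report
#    and compare it with the actual disk size.
```

Run this only on the clone, never on the production instance, since the index deletion and compaction are disruptive.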

michaelh2862615

27-06-2019

Not a shared datastore.  How can I answer your question about an embedded or external TarMK datastore?  I believe it is a TarMK datastore; the run modes include crx3tar.

shunnar
Employee

27-06-2019

If you want to figure out what those 1-6GB files are, you can run the Linux/Unix command "file" on those files, and it will identify what type of file it is.

For example, when I run it on a blob in my datastore, I get the following, which indicates it is a JPEG:

$ file 1677c4fff0d5c7b5f7788edcb549639d60d5c44a4aff101dcd830a7b16e653a0

1677c4fff0d5c7b5f7788edcb549639d60d5c44a4aff101dcd830a7b16e653a0: JPEG image data, JFIF standard 1.01, resolution (DPI), density 300x300, segment length 16, Exif Standard: [TIFF image data, big-endian, direntries=12, height=2848, bps=0, PhotometricIntepretation=RGB, orientation=upper-left, width=4288], baseline, precision 8, 1626x1080, frames 3

I then copy that blob and rename it to image.jpg and open it, and I can see which image it is. This might give a clue as to where that image is coming from.
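To apply this across every oversized blob at once, something like the following sketch works; the datastore path and the 1 GB threshold are assumptions to adjust.

```shell
# Hypothetical datastore root; adjust to your environment.
DATASTORE=/content/aem/datastore

# Classify every blob over 1 GB with file(1). Datastore blob names are
# content hashes, so they contain no spaces and a plain read loop is safe.
find "$DATASTORE" -type f -size +1G | while read -r blob; do
  file "$blob"
done
```

Grouping the output (for example, piping through `sort | uniq -c`) quickly shows whether the space is dominated by zips, images, or something else.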

michaelh2862615

27-06-2019

I have seen that post and the linked maintenance document before posting and performed all those operations, including audit log purge (nothing older than five days).

antoniom5495929

27-06-2019

Hi,

Thanks for the info.

Are you using AEM on TarMK with an external or embedded datastore?

Are you using a shared datastore?

Let us know.

Thanks,

Antonio

michaelh2862615

27-06-2019

Yes, I performed offline compaction and then ran garbage collection in various ways: I stopped the application, ran compaction, started it again, and ran garbage collection, including with the following command, which should make clear that I have done so.

curl -silent -u username:password -X POST --data markOnly=false http://localhost:4502/system/console/jmx/org.apache.jackrabbit.oak%3Aname%3Drepository+manager%2Ctyp...

27.06.2019 18:10:36.159 *INFO* [qtp1670602695-1929] log.access 127.0.0.1 - admin 27/Jun/2019:18:10:36 +0400 "POST /system/console/jmx/org.apache.jackrabbit.oak%3Aname%3Drepository+manager%2Ctype%3DRepositoryManagement/op/startDataStoreGC/boolean HTTP/1.1" 200 201 "nt" "curl/7.29.0"

27.06.2019 18:10:36.160 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Starting Blob garbage collection with markOnly [false]

27.06.2019 18:10:36.207 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (2048) blob references

27.06.2019 18:10:36.248 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (4096) blob references

27.06.2019 18:10:36.266 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (6144) blob references

27.06.2019 18:10:36.294 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (8192) blob references

27.06.2019 18:10:36.314 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (10240) blob references

27.06.2019 18:10:36.330 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (12288) blob references

27.06.2019 18:10:36.346 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (14336) blob references

27.06.2019 18:10:36.362 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (16384) blob references

27.06.2019 18:10:36.377 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (18432) blob references

27.06.2019 18:10:36.394 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (20480) blob references

27.06.2019 18:10:36.410 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (22528) blob references

27.06.2019 18:10:36.425 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (24576) blob references

27.06.2019 18:10:36.441 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (26624) blob references

27.06.2019 18:10:36.455 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (28672) blob references

27.06.2019 18:10:36.473 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (30720) blob references

27.06.2019 18:10:36.487 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (32768) blob references

27.06.2019 18:10:36.525 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (34816) blob references

27.06.2019 18:10:36.551 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (36864) blob references

27.06.2019 18:10:36.598 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Number of valid blob references marked under mark phase of Blob garbage collection [37965]

27.06.2019 18:10:36.722 *ERROR* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Not all repositories have marked references available : [7e195675-c082-4cdf-8ec2-813ad8194891, 56af03b7-829d-445f-813a-e75681f86188]

27.06.2019 18:10:36.722 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Blob garbage collection completed in 562.6 ms. Number of blobs deleted [0] with max modification time of [2019-06-26 18:10:36.160]

hamidk92094312
Employee

27-06-2019

Adding to what Antonio mentioned, there are other maintenance tasks such as audit log purge, etc.; they are listed in my earlier update.

antoniom5495929

27-06-2019

Hi michaelh2862615,

I can confirm that we read your post; that is why I added more detail about the timing of execution.

Are you sure you are running your datastore garbage collection AFTER the compaction? Otherwise your GC is useless.

Let us know.

Thanks,

Antonio

michaelh2862615

27-06-2019

I appreciate the assistance, but it is not helpful if my message is not read.  We run garbage collection...daily.

I searched for hours before posting and read all of Adobe's documentation on maintenance as well as many articles on this forum.  However, I will repeat that we run version cleanup, workflow cleanup, and datastore garbage collection daily, and that we have performed compaction.

The datastore has grown to this size over two years or more; it is not suddenly growing rapidly, so a TRACE log would be only marginally useful, if at all, at this point.

aemmarc
Employee

27-06-2019

Just a note: AEM 6.2 reached end of life on April 20th, 2019, so it is in your best interest to upgrade at the earliest opportunity.

See the End of Life Matrix here: All Apps Help | Products and technical support periods

antoniom5495929

27-06-2019

Hi,

I agree with Vish_dhaliwal.

You can avoid this behaviour by running a datastore garbage collection.

Please keep in mind that datastore garbage collection needs to be executed after a compaction (ideally an offline compaction), because of how the two steps work together:

- Compaction: removes all segments which are not used

- Garbage collection: based on the missing segments, removes all unreferenced data
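As a rough sketch of that ordering (the segmentstore path and jar version are assumptions; the DSGC call is the same JMX operation quoted in the log output elsewhere in this thread):

```shell
# 1. With AEM stopped: offline compaction removes unused segments.
#    Path and oak-run version are assumptions; adapt to your install.
java -jar oak-run-1.4.17.jar compact /content/aem/crx-quickstart/repository/segmentstore

# 2. With AEM started again: datastore GC removes blobs no longer
#    referenced by any remaining segment (markOnly=false also sweeps).
curl -u admin:password -X POST --data markOnly=false \
  "http://localhost:4502/system/console/jmx/org.apache.jackrabbit.oak%3Aname%3Drepository+manager%2Ctype%3DRepositoryManagement/op/startDataStoreGC/boolean"
```

Running the GC first would mark blobs that compaction is about to orphan as still referenced, which is why the order matters.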

Let us know if you need more info.

Thanks,

Antonio

hamidk92094312
Employee

27-06-2019

Please check this thread:

AEM - Ways to reduce repository size

Vish_dhaliwal
Employee

27-06-2019

Hello,

How often do you run datastore garbage collection [1]?

The biggest gain in recovering the disk space occupied by the datastore comes from running datastore garbage collection after running offline tar compaction.

You can enable a TRACE log on org.apache.jackrabbit.oak.operations.blobs to see what is being written to the datastore.

[1] https://helpx.adobe.com/experience-manager/6-2/sites/administering/using/data-store-garbage-collecti...

Regards,

Vishu
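One hedged way to set up that TRACE logger without editing files on the server is a Sling logging factory configuration created through the OSGi Web Console. This is a sketch only: the credentials, host, and log file name are assumptions, and the property names belong to Apache Sling Commons Log, so verify them against your AEM version before relying on this.

```shell
# Sketch: create a dedicated TRACE logger for blob operations via the
# OSGi Web Console configMgr. Credentials, host, and file name are
# assumptions. -g keeps curl from globbing the bracketed placeholder,
# which configMgr replaces with a real factory PID on save.
curl -g -u admin:admin -X POST \
  -d "apply=true" \
  -d "factoryPid=org.apache.sling.commons.log.LogManager.factory.config" \
  -d "org.apache.sling.commons.log.level=trace" \
  -d "org.apache.sling.commons.log.file=logs/blob-trace.log" \
  -d "org.apache.sling.commons.log.names=org.apache.jackrabbit.oak.operations.blobs" \
  -d "propertylist=org.apache.sling.commons.log.level,org.apache.sling.commons.log.file,org.apache.sling.commons.log.names" \
  "http://localhost:4502/system/console/configMgr/[Temporary PID replaced by real PID upon save]"
```

Afterwards, logs/blob-trace.log should show each blob write, which helps attribute new datastore growth; remember to remove the config when done, as TRACE logging is verbose.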