Disk Usage Report much smaller than disk space consumed | Community
June 27, 2019
Solved

Disk Usage Report much smaller than disk space consumed

We have an AEM 6.2 instance with Hotfix 17578 (cq-6.2.0-hotfix-17578) installed so we are on Oak 1.4.17 and CFP18.

We run garbage collection, version cleanup, and workflow cleanup daily.  We have a separate datastore and segment store.  We have performed offline compaction, but that only applies to the segment store.

The disk usage report in AEM (/etc/reports/diskusage.html) reports that we are using ~18 GB of data.  However, our datastore has grown to ~170 GB on disk.  I cannot work out where the extra data is coming from or how to reduce it.

Following this page (Analyze unusual repository growth), I can see that we even have a few files of 1-6 GB each in the datastore, but there is definitely no file that big in our DAM or packages.  According to the usage report, the entire DAM is ~6 GB.
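
For readers who want to reproduce this kind of check, here is a minimal sketch that lists the largest files under a datastore directory. The function name `largest_blobs` is made up for illustration, and it assumes a POSIX shell with the standard `find`, `wc`, `sort`, and `head` utilities.

```sh
# largest_blobs DIR [COUNT]
# Print "SIZE_IN_BYTES PATH" for the COUNT largest regular files under DIR,
# largest first. COUNT defaults to 20. (Hypothetical helper, not an AEM tool.)
largest_blobs() (
  dir=$1
  count=${2:-20}
  find "$dir" -type f -exec sh -c '
    for f do
      # $(( ... )) strips the padding some wc implementations add
      printf "%s %s\n" "$(( $(wc -c < "$f") ))" "$f"
    done' sh {} + |
    sort -rn |
    head -n "$count"
)
```

With the paths from this thread, it could be run as `largest_blobs /content/aem/datastore 10`.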

What can I do to reduce the size of our datastore?  What is causing this problem?

Adobe Employee
June 27, 2019

If you want to figure out what those 1-6 GB files are, you can run the Linux/Unix command "file" on them, and it will identify what type of file each one is.

For example, when I run it on a blob in my datastore, I get the following, which indicates that the blob is a JPEG:

$ file 1677c4fff0d5c7b5f7788edcb549639d60d5c44a4aff101dcd830a7b16e653a0

1677c4fff0d5c7b5f7788edcb549639d60d5c44a4aff101dcd830a7b16e653a0: JPEG image data, JFIF standard 1.01, resolution (DPI), density 300x300, segment length 16, Exif Standard: [TIFF image data, big-endian, direntries=12, height=2848, bps=0, PhotometricIntepretation=RGB, orientation=upper-left, width=4288], baseline, precision 8, 1626x1080, frames 3

I then copy that blob, rename the copy to image.jpg, and open it to see which image it is. This might give a clue as to where that image is coming from.
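
To apply the same idea to every large blob at once, a sketch along these lines may help. The function name `describe_blobs` and the 1 GiB default threshold are assumptions for illustration; only `file` and `find` themselves are standard tools.

```sh
# describe_blobs DIR [MIN_BYTES]
# Run file(1) on every regular file under DIR that is larger than MIN_BYTES
# (default: 1 GiB). (Hypothetical helper; name and default are illustrative.)
describe_blobs() (
  dir=$1
  min_bytes=${2:-1073741824}
  if ! command -v file >/dev/null 2>&1; then
    echo "file(1) is not installed" >&2
    exit 1
  fi
  # "-size +Nc" matches files of more than N bytes
  find "$dir" -type f -size +"${min_bytes}c" -exec file {} +
)
```

For example, `describe_blobs /content/aem/datastore` would print one "path: type" line per blob over 1 GiB.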

June 27, 2019

Not a shared datastore.  How can I answer your question about embedded or external TarMK datastore? I believe it is a TarMK datastore, the run modes include crx3tar.

Adobe Employee
June 27, 2019

Here is one method you can try:

- Clone your current AEM server to a separate server.

- Delete crx-quickstart/repository/index folder.

- Run offline compaction, but use the rm-all flag instead of rm-unreferenced in the checkpoints step. This removes all checkpoints, so all indexes are rebuilt from scratch.

- Start your server. Upon server startup, all indexes will be rebuilt and it will be a slow startup. Wait until the server is fully up.

- Run datastore GC.

- Run the disk usage report and compare its result with the actual disk size.
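
For the last step, the actual size on disk can be measured with a short sketch like the following. The function name `dir_bytes` is made up for illustration; it sums the sizes of all regular files under a directory so the total can be compared with the figure from the disk usage report.

```sh
# dir_bytes DIR
# Print the total size in bytes of all regular files under DIR.
# (Hypothetical helper, assuming a POSIX shell with find, wc and awk.)
dir_bytes() (
  find "$1" -type f -exec sh -c '
    for f do wc -c < "$f"; done' sh {} + |
    # %.0f avoids integer overflow in awk implementations with 32-bit %d
    awk '{ total += $1 } END { printf "%.0f\n", total }'
)
```

With the layout described in this thread, `dir_bytes /content/aem/datastore` and `dir_bytes /content/aem/crx-quickstart/repository/segmentstore` give the two numbers to compare against the report.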

If you have corrupt index data, then the above should resolve it. Corrupt index data might give you incorrect results from the disk usage report. It could also cause datastore GC to allow content to persist when it should not.

June 27, 2019

All files over 100 MB are zips.  I have inspected one of the files over 6 GB.  They look like packages that were backups, but these packages do not exist anymore.  I have checked all packages from the Package Explorer multiple times to confirm.  Is there something I can do about these files in the meantime?  Why would they not be garbage collected?
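
A check like this one can be scripted without extra tools by looking at the first four bytes of each file, since a non-empty zip archive starts with the signature `PK\003\004` (hex 50 4b 03 04). The function names `is_zip` and `list_large_zips` and the 100 MiB default are assumptions for illustration.

```sh
# is_zip FILE
# Succeed if FILE starts with the local-file-header signature of a zip archive.
is_zip() {
  [ "$(od -An -tx1 -N4 "$1" | tr -d ' \n')" = "504b0304" ]
}

# list_large_zips DIR [MIN_BYTES]
# Print every regular file under DIR that is larger than MIN_BYTES
# (default: 100 MiB) and looks like a zip archive.
list_large_zips() (
  dir=$1
  min_bytes=${2:-104857600}
  find "$dir" -type f -size +"${min_bytes}c" |
    while IFS= read -r f; do
      if is_zip "$f"; then
        printf '%s\n' "$f"
      fi
    done
)
```

Each path it prints can then be inspected with `unzip -l` to see which package the blob once belonged to.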

This would help clean up at least a few GB, but the question of the remaining 150+ GB still stands. :-/

June 27, 2019

Wouldn't a repository consistency check, which we also run daily, catch these corruptions?  I will try this and follow up.

June 27, 2019

We have a cloned server already, so I tried this there.  I stopped the application and deleted crx-quickstart/repository/index.  I ran java -jar oak-run-1.4.17.jar checkpoints /content/aem/crx-quickstart/repository/segmentstore/ rm-all.  I started the server and waited for the indexes to be rebuilt.  I ran garbage collection and checked the Disk Usage report.  The datastore still uses the same space on disk, and the disk usage report still shows the same, much lower number.

Adobe Employee
June 27, 2019

Not sure if this will help, but you can start a GUI to explore the repository using the oak-run tool and it actually shows you the size used by each node.

java -jar oak-run-1.2.16.jar explore author/repository/segmentstore

June 30, 2019

Unfortunately, I cannot do that since the server is remote and does not have a display.

Anything else I can try?  I have even updated to Oak 1.4.24 on a cloned server, deleted the indexes, and compacted twice, but I have not been able to free any space in the datastore.  Is there a command (maybe in the oak-run jar) that clones the datastore without just copying every file on the disk byte-for-byte, and so might produce something trimmed down?

Accepted solution
July 4, 2019

Here is what I have done that seems to have resolved the issue.  It seems to have worked beautifully, but please let me know if there is some issue I am not seeing.

java -jar crx2oak-1.8.6-all-in-one.jar segment-old:/content/aem/crx-quickstart/repository segment-old:/content/backup/ --include-path=/ --src-datastore=/content/aem/datastore --datastore=/content/backup/datastore/

By running this command as if I were upgrading, but with "segment-old" for both source and target, I was able to create a repository that is ~11 GB compared to the previous ~170 GB, and everything seems to work afterwards.

My only concern is that this page (InvalidFileStoreVersionException migrating from older version to 6.3 using CRX2Oak) only mentions using "segment-old" for the source repository, but using it for the destination repository does not seem to cause a problem.

joerghoh
Adobe Employee
July 5, 2019

Glad that you managed to reduce the size. But it's a strange situation.

I could imagine that the directories of the original repository contained files that were not part of the repository itself but consumed a lot of space on disk. When you copied the repo itself with oak-run, these files were not copied.

Can you compare the old and the new repository and find out which directories account for the difference in size? And then check which files within those directories were affected?
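
One way to make that comparison is sketched below. The function names `tree_bytes` and `compare_repos` are made up for illustration; the sketch prints, for every top-level entry found in either directory, its size in bytes in the old and in the new location.

```sh
# tree_bytes PATH
# Print the total size in bytes of all regular files under PATH,
# or 0 if PATH does not exist. (Hypothetical helper.)
tree_bytes() (
  [ -e "$1" ] || { echo 0; exit 0; }
  find "$1" -type f -exec sh -c '
    for f do wc -c < "$f"; done' sh {} + |
    awk '{ total += $1 } END { printf "%.0f\n", total }'
)

# compare_repos OLD_DIR NEW_DIR
# Print "NAME<TAB>OLD_BYTES<TAB>NEW_BYTES" for each top-level entry
# that exists in OLD_DIR or NEW_DIR.
compare_repos() (
  old=$1
  new=$2
  { ls -A "$old"; ls -A "$new"; } 2>/dev/null | sort -u |
    while IFS= read -r name; do
      printf '%s\t%s\t%s\n' "$name" \
        "$(tree_bytes "$old/$name")" "$(tree_bytes "$new/$name")"
    done
)
```

With the paths from the accepted solution, `compare_repos /content/aem/datastore /content/backup/datastore` compares the two datastores, and `compare_repos /content/aem/crx-quickstart/repository /content/backup` compares the repository directories. Entries with a large old size and a new size of 0 are the first candidates to inspect.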

Jörg