We have an AEM 6.2 instance with Hotfix 17578 (cq-6.2.0-hotfix-17578) installed, so we are on Oak 1.4.17 and CFP18.
We run garbage collection, version cleanup, and workflow cleanup daily. We have a separate datastore and segment store. We have performed offline compaction but that only applies to the segmentstore.
The disk usage report in AEM (/etc/reports/diskusage.html) reports we are using ~18 GB of data. However, our datastore has grown to ~170 GB. I cannot figure out how to reduce this or where the extra data is coming from.
From this page (Analyze unusual repository growth) I can see we even have a few files between 1 and 6 GB, but there is definitely no file that big in our DAM or in our packages. The entire DAM, according to the usage report, is ~6 GB.
What can I do to reduce the size of our datastore? What is causing this problem?
Hello,
How often do you run the datastore garbage collection?
The biggest gain in recovering the disk space occupied by the datastore comes from running datastore garbage collection after running offline tar compaction.
You can enable a TRACE logger on org.apache.jackrabbit.oak.operations.blobs to see what is being written to the datastore.
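For example, you can add such a logger from /system/console/slinglog, or with a curl call along these lines (an untested sketch; the host, credentials, and log file name are placeholders to adjust for your setup):
# create a TRACE logger for the category above, writing to logs/blob-trace.log
curl -u admin:admin -X POST \
  -d "apply=true" \
  -d "factoryPid=org.apache.sling.commons.log.LogManager.factory.config" \
  -d "org.apache.sling.commons.log.level=trace" \
  -d "org.apache.sling.commons.log.file=logs/blob-trace.log" \
  -d "org.apache.sling.commons.log.names=org.apache.jackrabbit.oak.operations.blobs" \
  -d "propertylist=org.apache.sling.commons.log.level,org.apache.sling.commons.log.file,org.apache.sling.commons.log.names" \
  "http://localhost:4502/system/console/configMgr/%5BTemporary%20PID%20replaced%20by%20real%20PID%20upon%20save%5D"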
Regards,
Vishu
Please check this thread:
Hi,
I agree with Vish.dhaliwal.
You can avoid this behaviour by running a datastore garbage collection.
Please keep in mind that the datastore garbage collection needs to be executed after a compaction (ideally an offline compaction), because the two steps work together:
- Compaction: removes all segments that are no longer used
- Garbage collection: removes all data that is no longer referenced by the remaining segments
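In practice the sequence looks something like this (just a sketch: the paths come from elsewhere in this thread, the credentials are placeholders, and the oak-run version should match your Oak version, 1.4.17 in your case):
# stop AEM first; offline compaction must run against a stopped instance
java -jar oak-run-1.4.17.jar checkpoints /content/aem/crx-quickstart/repository/segmentstore rm-unreferenced
java -jar oak-run-1.4.17.jar compact /content/aem/crx-quickstart/repository/segmentstore
# start AEM again, then trigger datastore GC over JMX (markOnly=false performs the actual deletion)
curl -s -u admin:admin -X POST --data markOnly=false \
  "http://localhost:4502/system/console/jmx/org.apache.jackrabbit.oak%3Aname%3Drepository+manager%2Ctype%3DRepositoryManagement/op/startDataStoreGC/boolean"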
Let us know if you need more info.
Thanks,
Antonio
Just a note: AEM 6.2 is end of life as of April 20th, 2019, so it is in your best interest to upgrade at the earliest opportunity.
See the End of Life Matrix here: All Apps Help | Products and technical support periods
I appreciate the assistance, but it's not helpful if my message is not read. We run garbage collection...daily.
I have searched for hours before posting and read all of Adobe's documentation on maintenance and many articles on this forum. However, I will repeat that we run version cleanup, workflow cleanup, and data store garbage collection daily. I will also repeat we have performed compaction.
The datastore has grown to this size over two years or more; it is not suddenly growing rapidly, so a TRACE log will be of marginal use, if any, at this point.
Hi michaelh28626156,
I can confirm that we read your post. It is for this reason that I added more detail about the timing of execution.
Are you sure you are running your datastore garbage collection AFTER the compaction? Otherwise your GC is useless.
Let us know.
Thanks,
Antonio
Adding to what Antonio mentioned, there are other maintenance tasks, such as audit log purge, etc. They are listed in the earlier update I made.
Yes. I performed offline compaction. Then I ran garbage collection in several ways: I turned off the application, ran compaction, turned it back on, and ran garbage collection, including via the following command, which should make it clear that I have.
curl -s -u username:password -X POST --data markOnly=false http://localhost:4502/system/console/jmx/org.apache.jackrabbit.oak%3Aname%3Drepository+manager%2Ctype%3DRepositoryManagement/op/startDataStoreGC/boolean
27.06.2019 18:10:36.159 *INFO* [qtp1670602695-1929] log.access 127.0.0.1 - admin 27/Jun/2019:18:10:36 +0400 "POST /system/console/jmx/org.apache.jackrabbit.oak%3Aname%3Drepository+manager%2Ctype%3DRepositoryManagement/op/startDataStoreGC/boolean HTTP/1.1" 200 201 "nt" "curl/7.29.0"
27.06.2019 18:10:36.160 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Starting Blob garbage collection with markOnly [false]
27.06.2019 18:10:36.207 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (2048) blob references
27.06.2019 18:10:36.248 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (4096) blob references
27.06.2019 18:10:36.266 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (6144) blob references
27.06.2019 18:10:36.294 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (8192) blob references
27.06.2019 18:10:36.314 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (10240) blob references
27.06.2019 18:10:36.330 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (12288) blob references
27.06.2019 18:10:36.346 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (14336) blob references
27.06.2019 18:10:36.362 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (16384) blob references
27.06.2019 18:10:36.377 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (18432) blob references
27.06.2019 18:10:36.394 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (20480) blob references
27.06.2019 18:10:36.410 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (22528) blob references
27.06.2019 18:10:36.425 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (24576) blob references
27.06.2019 18:10:36.441 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (26624) blob references
27.06.2019 18:10:36.455 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (28672) blob references
27.06.2019 18:10:36.473 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (30720) blob references
27.06.2019 18:10:36.487 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (32768) blob references
27.06.2019 18:10:36.525 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (34816) blob references
27.06.2019 18:10:36.551 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Collected (36864) blob references
27.06.2019 18:10:36.598 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Number of valid blob references marked under mark phase of Blob garbage collection [37965]
27.06.2019 18:10:36.722 *ERROR* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Not all repositories have marked references available : [7e195675-c082-4cdf-8ec2-813ad8194891, 56af03b7-829d-445f-813a-e75681f86188]
27.06.2019 18:10:36.722 *INFO* [sling-oak-observation-75] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Blob garbage collection completed in 562.6 ms. Number of blobs deleted [0] with max modification time of [2019-06-26 18:10:36.160]
Hi,
Thanks for the info.
Are you using AEM with TarMK with an external or an embedded datastore?
Are you using a shared datastore?
Let us know.
Thanks,
Antonio
I saw that post and the linked maintenance document before posting, and I have performed all of those operations, including audit log purge (nothing older than five days).
If you want to figure out what those 1-6 GB files are, you can run the Linux/Unix "file" command on them; it will identify what type of file each one is.
For example, when I run it on a blob in my datastore, I get the following, which indicates it is a JPEG:
$ file 1677c4fff0d5c7b5f7788edcb549639d60d5c44a4aff101dcd830a7b16e653a0
1677c4fff0d5c7b5f7788edcb549639d60d5c44a4aff101dcd830a7b16e653a0: JPEG image data, JFIF standard 1.01, resolution (DPI), density 300x300, segment length 16, Exif Standard: [TIFF image data, big-endian, direntries=12, height=2848, bps=0, PhotometricIntepretation=RGB, orientation=upper-left, width=4288], baseline, precision 8, 1626x1080, frames 3
I then copy that blob, rename it to image.jpg, and open it to see which image it is. This might give a clue as to where that image is coming from.
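To scan the whole datastore in one pass, something along these lines should work (the datastore path is the one mentioned earlier in this thread; adjust it to yours):
# list every blob over 100 MB, largest first, then identify each one's type
find /content/aem/datastore -type f -size +100M -exec du -h {} \; | sort -rh
find /content/aem/datastore -type f -size +100M -exec file {} \;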
Not a shared datastore. How can I answer your question about an embedded or external TarMK datastore? I believe it is a TarMK datastore; the run modes include crx3tar.
One method you can try:
- Clone your current AEM server to a separate server.
- Delete crx-quickstart/repository/index folder.
- Run offline compaction, but use the rm-all flag instead of rm-unreferenced. This will cause all indexes to be deleted (see the command sketch after this list).
- Start your server. Upon server startup, all indexes will be rebuilt and it will be a slow startup. Wait until the server is fully up.
- Run datastore GC.
- Run the disk usage report and compare its result with the actual disk size.
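As a rough command-line sketch of steps 2-4 (untested; the paths and oak-run version are examples taken from elsewhere in this thread):
# with AEM stopped on the cloned server:
rm -rf /content/aem/crx-quickstart/repository/index
java -jar oak-run-1.4.17.jar checkpoints /content/aem/crx-quickstart/repository/segmentstore rm-all
java -jar oak-run-1.4.17.jar compact /content/aem/crx-quickstart/repository/segmentstore
# then start AEM, wait for the index rebuild to finish, and run datastore GC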
If you have corrupt index data, then the above should resolve it. Corrupt index data might give you incorrect results from the disk usage report. It could also cause datastore GC to allow content to persist when it should not.
All files over 100 MB are zips. I have inspected one of the files over 6 GB. They look like packages that were backups, but these packages do not exist anymore. I have checked all packages from the Package Explorer multiple times to confirm. Is there something I can do about these files in the meantime? Why would they not be garbage collected?
This would be helpful in cleaning up at least a few GB, but the question about the 150+ GB will still remain. :-/
Wouldn't a repository consistency check, which we also run daily, catch these corruptions? I will try this and follow up.
We already have a cloned server, so I stopped the application, deleted crx-quickstart/repository/index, and ran java -jar oak-run-1.4.17.jar checkpoints /content/aem/crx-quickstart/repository/segmentstore/ rm-all. I then started the server, waited for the indexes to be rebuilt, ran garbage collection, and checked the Disk Usage report. The datastore still occupies the same space on disk, and the disk usage report still shows the same, much lower number.
Not sure if this will help, but you can start a GUI to explore the repository using the oak-run tool; it actually shows you the size used by each node.
java -jar oak-run-1.2.16.jar explore author/repository/segmentstore
Unfortunately, I cannot do that since the server is remote and does not have a display.
Anything else I can try? I have even updated to Oak 1.4.24 on a cloned server, deleted the indexes, and compacted twice, but I have not been able to free any space in the datastore. Is there some way to clone the datastore with a command (maybe the oak-run jar) that won't just copy every file on disk byte-for-byte and might produce something trimmed down?
Here is what I did that seems to have resolved the issue. Please let me know if there is some problem I am not seeing; otherwise, it seems to have worked beautifully.
java -jar crx2oak-1.8.6-all-in-one.jar segment-old:/content/aem/crx-quickstart/repository segment-old:/content/backup/ --include-path=/ --src-datastore=/content/aem/datastore --datastore=/content/backup/datastore/
By running this command as if I were upgrading, but using "segment-old" for both source and target, I was able to create a repository that is ~11 GB, compared to the previous ~170 GB, and everything seems to work correctly afterwards.
The only concern is that this page (InvalidFileStoreVersionException migrating from older version to 6.3 using CRX2Oak) only mentions using "segment-old" for the source repository, but it does not seem to cause a problem with the destination repository.
Glad that you managed to reduce the size. But it's a strange situation.
My guess is that the directories of the original repository contained files that were not part of the repository itself but consumed a lot of space on disk, and when you copied the repository with crx2oak, these files were not copied along.
Can you compare the old and the new repository and find out in which directories the size discrepancies lie, and then check which files are affected within those directories?
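Something like this would show where the biggest differences are (untested; the paths are the ones from your crx2oak command):
# per-directory sizes of the old and new datastore, largest first
du -h --max-depth=1 /content/aem/datastore | sort -rh | head -20
du -h --max-depth=1 /content/backup/datastore | sort -rh | head -20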
Jörg