
SOLVED

Size reduction shown in the Disk Usage report is not reflected in the repository size at the server level

Level 2

Hi Adobe Community,

We are running our cluster on AEM 6.5 with an Author & Publish architecture. During the upgrade process, we noticed that the Author instances are huge, mostly because of packages left in the datastore. We removed them via the console at '/etc/packages', and the size was reduced from 45GB to 10 GB.

After that, we ran Revision Cleanup and Data Store Garbage Collection, yet the reduction is not reflected in the CRX repository.

Our CRX repository sizes before and after cleanup are below. Can you advise whether we are missing a step here?

Before cleanup:
288.0K  blobids
55.8G   datastore
782.3M  index
20.7G   segmentstore

After cleanup:
620.0K  blobids
48.7G   datastore
767.7M  index
16.6G   segmentstore

Since the Disk Usage report shows a 30GB reduction, I am expecting a 30GB reduction in the datastore at the repository level.
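
Comparing the two listings folder by folder shows how far apart the numbers are. A minimal Python sketch, assuming the listing uses du-style binary units (K, M and G as powers of 1024); the function and constant names are illustrative only:

```python
# Per-folder sizes from the before/after listing in the question.
# Assumption: du-style binary units, i.e. K = KiB, M = MiB, G = GiB.
UNITS_IN_GB = {"K": 1 / 1024**2, "M": 1 / 1024, "G": 1.0}

BEFORE = {"blobids": "288.0K", "datastore": "55.8G", "index": "782.3M", "segmentstore": "20.7G"}
AFTER = {"blobids": "620.0K", "datastore": "48.7G", "index": "767.7M", "segmentstore": "16.6G"}


def to_gb(size: str) -> float:
    """Convert a du-style size such as '782.3M' to gigabytes."""
    return float(size[:-1]) * UNITS_IN_GB[size[-1]]


def reductions(before: dict, after: dict) -> dict:
    """Per-folder reduction in GB, rounded to 2 decimals (positive means space was freed)."""
    return {name: round(to_gb(before[name]) - to_gb(after[name]), 2) for name in before}


if __name__ == "__main__":
    deltas = reductions(BEFORE, AFTER)
    for name, delta in deltas.items():
        print(f"{name:<13} {delta:6.2f} GB")
    print(f"{'total':<13} {sum(deltas.values()):6.2f} GB")
```

With these inputs the datastore shrank by about 7.1 GB and the segment store by about 4.1 GB, roughly 11.2 GB in total, well short of the ~30GB the Disk Usage report suggested.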

4 Replies

Level 9

Hi @Pavan_KumarTi,

The reduction in size shown by the Disk Usage report may not directly reflect the reduction at the repository level. The report provides an estimate of the size of different parts of the repository, but it may not account for everything that contributes to the overall repository size.

Here are a few reasons why the reduction shown in the Disk Usage report may not match the reduction at the repository level:

1. Datastore Garbage Collection: This process removes unused binary files from the datastore. However, it may not release the disk space they occupied immediately: the space may be marked as available for reuse without being reclaimed by the operating system right away. This can cause a delay between the reduction shown in the Disk Usage report and the actual reduction at the repository level.

2. Segment Store: The segment store holds the repository content and its revisions. The Disk Usage report estimates its size, but other factors, such as internal data structures and metadata, also contribute to the on-disk segment store size and may not be accounted for in the report.

3. Index: The index holds the data used for efficient searching and querying. As with the segment store, internal data structures and metadata contribute to the on-disk index size and may not be accounted for in the Disk Usage report.
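
A related reason a deleted package's binary can outlive a cleanup run is how data store garbage collection decides what is unused. Oak describes DSGC as a mark-and-sweep process: it marks every binary still referenced from the repository, then deletes unmarked binaries older than a configurable age threshold. The toy Python model below is not Oak's implementation; the blob names, sizes and the 24-hour threshold are assumptions for illustration. It shows two ways a deleted package's binary can survive a run: an older revision that revision cleanup has not yet removed still references it, or it was written too recently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Blob:
    blob_id: str
    size_gb: float
    age_hours: float  # time since the blob was written


def sweep(datastore: dict[str, Blob], live_refs: set[str], retained_refs: set[str],
          max_age_hours: float = 24.0) -> tuple[dict[str, Blob], float]:
    """Toy mark-and-sweep: return the surviving blobs and the GB freed.

    live_refs     -- blob IDs referenced by the current repository head
    retained_refs -- blob IDs still referenced by older revisions that
                     revision cleanup has not removed yet (hypothetical input)
    """
    marked = live_refs | retained_refs          # mark phase
    survivors: dict[str, Blob] = {}
    freed = 0.0
    for blob_id, blob in datastore.items():     # sweep phase
        if blob_id in marked or blob.age_hours < max_age_hours:
            survivors[blob_id] = blob           # still referenced, or too young to delete
        else:
            freed += blob.size_gb
    return survivors, freed


if __name__ == "__main__":
    store = {
        "asset-1": Blob("asset-1", 10.0, 500),  # live asset
        "pkg-a": Blob("pkg-a", 20.0, 500),      # deleted package, still in an old revision
        "pkg-b": Blob("pkg-b", 15.0, 500),      # deleted package, no longer referenced
        "pkg-c": Blob("pkg-c", 5.0, 2),         # deleted package, written 2 hours ago
    }
    survivors, freed = sweep(store, live_refs={"asset-1"}, retained_refs={"pkg-a"})
    print(sorted(survivors), freed)
```

In this example only `pkg-b` (15 GB) is freed: `pkg-a` is still marked through a retained revision and `pkg-c` is younger than the threshold.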

It's also worth noting that the Disk Usage report may not update in real time, so it can take some time for it to reflect changes made to the repository.

In summary, the reduction shown in the Disk Usage report may not directly match the reduction at the repository level because of the various processes involved in storing and managing data in AEM. It's recommended to monitor overall disk usage and observe the trend over time to assess the impact of the cleanup processes on repository size.
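
One way to follow that recommendation is to record per-folder sizes after each maintenance run, in the same shape as the before/after listing in the question. A minimal Python sketch, assuming the default `crx-quickstart/repository` layout (pass another path as the first argument if yours differs); the function names are illustrative:

```python
from __future__ import annotations

import os
import sys
from pathlib import Path


def dir_size_bytes(path: Path) -> int:
    """Sum the apparent sizes of regular files below `path` (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            try:
                if not os.path.islink(file_path):
                    total += os.path.getsize(file_path)
            except OSError:
                pass  # file removed or unreadable during the walk; skip it
    return total


def repository_report(repo_dir: Path) -> dict[str, int]:
    """Size in bytes of each top-level folder (datastore, segmentstore, index, ...)."""
    return {child.name: dir_size_bytes(child)
            for child in sorted(repo_dir.iterdir()) if child.is_dir()}


if __name__ == "__main__":
    repo = Path(sys.argv[1] if len(sys.argv) > 1 else "crx-quickstart/repository")
    if not repo.is_dir():
        print(f"{repo} not found; pass the repository path as the first argument")
    else:
        for name, size in repository_report(repo).items():
            print(f"{size / 1024**3:8.2f} GB  {name}")
```

This counts apparent file sizes, so the figures can differ slightly from `du`, which reports allocated disk blocks. Running it after each Revision Cleanup and Data Store Garbage Collection gives a comparable series over time.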

Community Advisor

Hi @Pavan_KumarTi,

Disk usage is generally a superset of the datastore: it is the total space consumed by all components of the AEM instance, including the datastore (which specifically stores binary data), indexes, temp files, backup files and the CRX repository itself, so it is a bigger pool than the datastore alone.

More importantly, the datastore/blobstore de-duplicates large binaries. Say you upload a large binary twice: it is stored only once in blob storage, but the Disk Usage report counts it twice, resulting in a higher number than the actual disk usage.
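
The de-duplication point can be illustrated with a toy content-addressed store, where each binary is keyed by a hash of its bytes. The class and method names are hypothetical, and using SHA-256 as the content hash is an assumption for the sketch, not a claim about which hash a particular Oak data store uses. `reported_bytes` mimics a report that counts the full binary once per reference, while `stored_bytes` is what actually sits on disk:

```python
from __future__ import annotations

import hashlib


class ContentAddressedStore:
    """Toy blob store keyed by content hash, so identical uploads are stored once."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}  # content hash -> binary
        self.references: list[str] = []    # one entry per node pointing at a binary

    def add(self, data: bytes) -> str:
        blob_id = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(blob_id, data)  # stored only once
        self.references.append(blob_id)       # but referenced every time
        return blob_id

    def stored_bytes(self) -> int:
        """Bytes the store actually holds (ignoring file-system overhead)."""
        return sum(len(b) for b in self.blobs.values())

    def reported_bytes(self) -> int:
        """Bytes a per-reference report adds up: each reference counts the full binary."""
        return sum(len(self.blobs[ref]) for ref in self.references)


if __name__ == "__main__":
    store = ContentAddressedStore()
    big = b"\0" * 1_000_000
    store.add(big)
    store.add(big)  # same bytes uploaded a second time
    store.add(b"other asset")
    print(store.reported_bytes(), store.stored_bytes())
```

In this model, removing one of two references to the same binary frees nothing on disk; the blob only becomes garbage once its last reference is gone.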

Hope this helps!

Rohan Garg

Level 2

Hi Rohan,

Thanks for the advice. So does that mean even removing 35 GB of custom packages doesn't reduce the size at the repository level at all?

Are there any other areas to look at for housekeeping, or is a 50GB+ datastore a normal size for an Author instance in a larger organization?

Correct answer by
Community Advisor

Yes, deleting packages typically removes the associated content and configurations from the AEM environment, but it does not directly affect the underlying datastore size.
Also, a 50GB+ datastore is fine; honestly, it depends on the number of content pages, the number and size of assets, etc.

There are cases of 80 GB+ datastore sizes, as seen in AEM Upgrade 6.5 - Huge Datastore size.

Hope this helps!

Rohan Garg