compaction - which nodes are being deleted
federicos727792
Level 3
November 21, 2016
Solved

We have problems with a repository growing out of control. Luckily compaction helps a lot (we're using online compaction), but I'd like to determine what is being deleted (compacted).

I'm using 

java -Dlogback.configurationFile=/tmp/logback.xml   -jar ~/Downloads/oak-run-1.4.9.jar compact repository/segmentstore/ >>/tmp/oak-run.log

with logback logging at trace level to try to determine what is being deleted. The output gets pretty close, but it doesn't tell me the paths of the nodes that are being deleted.

Reference: https://docs.adobe.com/docs/en/aem/6-2/deploy/platform/storage-elements-in-aem-6.html#Performing%20Online%20Revision%20Cleanup

Any suggestions on how to tackle this? My next step will be to enable repository write logging to determine the paths of the nodes being written.

Thx,

Federico

6 replies

kautuk_sahni
Community Manager
November 22, 2016

Hi

Please read this community article, which covers most aspects of compaction.

Link:- http://www.aemcq5tutorials.com/tutorials/online-offline-tar-compaction-in-aem/

There are two ways to run tar compaction in AEM: Online Tar Compaction and Offline Tar Compaction. The following topics are covered in this tutorial:

  • Steps to Run Online Compaction in AEM
  • Steps to Run Offline Compaction in AEM
  • Increase Performance of offline tar compaction
  • Frequently Asked Questions

 

Link:- http://adobeaemclub.com/oak-tarmk-compaction/

// TarMK Compaction

When tar files are used as storage, they grow and claim more disk space every time data is created or updated, because data in tar files is never overwritten; new versions are appended instead. To mitigate this, AEM has a garbage collection mechanism known as ‘Tar Compaction’ that removes unused data and reclaims disk space.

Link:- https://www.netcentric.biz/blog/is-your-repository-growing-rapidly-in-aem6.html

// Growth of repositories in AEM? Here is the solution.

I hope this helps.

~kautuk

Kautuk Sahni
Adobe Employee
November 22, 2016

Hi Federico, 

Have you informed Daycare and got their support for online compaction? That is the only way it is supported.

The increase in size of the repository is in direct relation to the activity on the instance, so if you are doing a lot of writes, changes, moves, re-indexing, replication, then this will generate the growth in your repository. What kind of activity is happening on your authoring instance? That will help you to determine where the growth is coming from.

Regards,

Opkar

federicos727792
Level 3
November 23, 2016

Hi guys, thanks for the answers.

I've seen some of these articles before. The main problem, however, is that the repository is growing really fast (about 1 GB per hour) without any activity that could explain it.

I've tried an offline compaction. It was relatively fast (3 minutes) and reclaimed a good amount of space, but after restarting AEM the repository started to grow again. Hence I really need to investigate the root cause of the problem.

The latest suspect is the integration with Adobe Target. After enabling low-level logging of the Oak repository, I saw many entries like this in logs/audit.log: "23.11.2016 06:37:37.146 [admin] [session-3003] Setting property [/etc/segmentation/adobe-target/XXX/498967/jcr:content/jcr:title]".
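
With many entries of this kind, it can help to group them by path prefix to see which subtree receives most of the writes. The following is only a rough sketch in Python, assuming every write entry has the "Setting property [<path>]" shape of the line quoted above. The function name, the `depth` parameter and all sample lines except the first are made up for illustration.

```python
import re
from collections import Counter

# Captures the path inside "Setting property [<path>]".
# The greedy match runs to the last "]" on the line, so paths that
# themselves contain brackets (e.g. same-name siblings) stay intact.
SETTING_PROPERTY = re.compile(r"Setting property \[(/.*)\]")


def count_writes_by_prefix(lines, depth=3):
    """Count 'Setting property' entries grouped by the first `depth` path segments."""
    counts = Counter()
    for line in lines:
        match = SETTING_PROPERTY.search(line)
        if match is None:
            continue  # not a property write, ignore
        segments = match.group(1).strip("/").split("/")
        counts["/" + "/".join(segments[:depth])] += 1
    return counts


# The first line is the entry quoted in the post; the others are hypothetical.
SAMPLE = [
    "23.11.2016 06:37:37.146 [admin] [session-3003] Setting property [/etc/segmentation/adobe-target/XXX/498967/jcr:content/jcr:title]",
    "23.11.2016 06:37:37.150 [admin] [session-3003] Setting property [/etc/segmentation/adobe-target/XXX/498968/jcr:content/jcr:title]",
    "23.11.2016 06:37:38.001 [admin] [session-3004] Setting property [/var/example/node/prop]",
    "23.11.2016 06:37:38.002 [admin] [session-3004] Some unrelated message",
]

if __name__ == "__main__":
    # Print the busiest subtrees first.
    for prefix, count in count_writes_by_prefix(SAMPLE).most_common():
        print(count, prefix)
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example `count_writes_by_prefix(open("logs/audit.log", encoding="utf-8"))`. Note that a high number of writes does not necessarily mean a large amount of data, so this only narrows down where to look.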

I'll keep you posted.

Btw: Adobe Experience Manager (6.1.0.SP2), Apache Jackrabbit Oak 1.2.16

Best,

Federico

Adobe Employee
November 23, 2016

How are you doing the compaction? You are not using the "rm-all" flag every time, are you?

Regards,

Opkar

joerghoh
Adobe Employee
November 23, 2016

Hi,

If your repo is growing without any observable activity, you can use the mechanism described in [1] to log all write activity. That should help.

Jörg

[1] https://cqdump.wordpress.com/2016/05/24/what-is-writing-to-my-oak-repository/

federicos727792
Author, accepted solution
Level 3
November 24, 2016

Opkar, 

Nope, I wasn't using the rm-all flag.

 

Jörg,

Thanks a lot for the article. I was doing exactly that. The tip about generating a stack trace when a new session is created is quite neat!

In the end the problem was a process, invoked hourly, that was rebuilding some content packages and downloading them from AEM. Each rebuild increases the size of the repository by the size of the package (which is over one GB). I suppose this is because of the appending nature of the JCR implementation.
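
To illustrate why rewriting the same content keeps growing an append-only store until compaction runs, here is a toy model in Python. It is not Oak's actual segment format or compaction algorithm. The class, the package path and the 1 GB size are assumptions chosen to match the numbers in this thread.

```python
class AppendOnlyStore:
    """Toy model of an append-only store; NOT how Oak's TarMK is implemented."""

    def __init__(self):
        # Every write ever made, as (key, size_gb) tuples in write order.
        self.records = []

    def write(self, key, size_gb):
        # A rewrite never replaces the old record; it only appends a new one.
        self.records.append((key, size_gb))

    def size_gb(self):
        # Disk usage counts every record, including superseded ones.
        return sum(size for _, size in self.records)

    def compact(self):
        # Keep only the most recent record per key and drop older revisions.
        latest = {}
        for key, size in self.records:
            latest[key] = size
        self.records = list(latest.items())


store = AppendOnlyStore()
for _ in range(24):  # one rebuild per hour for a day
    # Hypothetical package path; 1.0 GB stands in for "over one GB".
    store.write("/etc/packages/example.zip", 1.0)

before = store.size_gb()  # all 24 revisions are still on disk
store.compact()
after = store.size_gb()   # only the latest revision remains
print(before, after)
```

In this model the store holds 24 GB before compaction and 1 GB after it, which matches the observed pattern of roughly 1 GB of growth per hour that compaction can reclaim.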

Generating these packages less often, combined with online compaction, should take care of this.

Thanks for your help!

Federico