Dangers of rotating tar files too often
In our Stage environment we run a 'Share Nothing' Author configuration with 200GB of disk space for the Author. The disk fills up too quickly: every day the workspaces/crx.default directory grows by 30GB and tarJournal by 12GB.
When tar optimisation runs overnight, it optimises the tar files but does not delete them unless maximumAge is set in repository.xml. By default, optimised files are not deleted until they are a month old.
The problem is that, because these optimised files are not deleted, optimisation the following night starts again with the oldest file under the three locations: workspaces/crx.default, tarJournal and version. Optimising these files takes just as long even though they were already optimised the night before. Because they are already optimised, the log shows append=-1, meaning there is nothing to append to the new output data files created during optimisation. The optimisation process really should resume where it left off rather than starting from the oldest file in the directory. (Every source I have found says optimisation resumes where it previously stopped.) Correct me if I'm wrong, but this looks like a bug in CQ.
My question is: could we safely use the following setting, with maximumAge set to 12 hours?
<Cluster>
  <Journal class="com.day.crx.persistence.tar.TarJournal">
    <param name="maximumAge" value="PT12H"/>
  </Journal>
</Cluster>
I would really appreciate your feedback on the following questions:
1) The optimisation process starts from the oldest file rather than resuming where the previous tar optimisation run left off. Is this a bug?
2) With this setting, files are deleted once they have been optimised and are more than 12 hours old. If a file has already been processed and its contents appended to the output data files created during optimisation, what is the motivation for keeping the already-optimised file around?
3) What are the dangers of rotating files quickly instead of leaving them in place for, say, a month? When we run a migration, roughly 170 tar files are created under workspaces/crx.default. We would like them deleted as soon as they are optimised and appended to the new data files, but I am not aware of the consequences (if any) of deleting already-optimised files.
How to Reproduce:
Easy to reproduce. In your CQ environment, leave the default setting in repository.xml:
<Journal class="com.day.crx.persistence.tar.TarJournal"/>
Make sure you have some tar files under:
1) /repository/tarJournal
2) /repository/workspaces/crx.default
3) /repository/version
Run the tar optimisation process from http://localhost:4502/system/console/jmx/com.adobe.granite%3Atype%3DRepository by invoking startTarOptimization(). Watching the logs, you will see data tar files being optimised and appended to new output data files. When the process completes, new data files have been created, but the processed files are not deleted.
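If you prefer to trigger the run from a script rather than the Felix console, a minimal JMX client sketch could look like the following. This is an illustration only: it assumes the CQ JVM was started with remote JMX enabled (port 9010 here is a made-up example) and uses the same MBean and operation that the console URL above points at.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class TriggerTarOptimization {
    public static void main(String[] args) throws Exception {
        // Assumes CQ was started with e.g. -Dcom.sun.management.jmxremote.port=9010
        // (hypothetical port; adjust to your instance).
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9010/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Same MBean the console URL addresses: com.adobe.granite:type=Repository
            ObjectName repository = new ObjectName("com.adobe.granite:type=Repository");
            // startTarOptimization() takes no arguments
            mbs.invoke(repository, "startTarOptimization", new Object[0], new String[0]);
            System.out.println("Tar optimisation triggered; watch the logs for progress.");
        }
    }
}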
Now re-run tar optimisation. You'll notice in the logs that it again starts by processing the oldest file in the directories, and you'll see append=-1 since there is nothing to append, yet the time taken to process each tar file is no shorter. This wastes time re-processing already-processed files, and the optimisation process might never reach the newer files if there is a large backlog.
Now configure your tar files to be rotated:
<Cluster>
  <Journal class="com.day.crx.persistence.tar.TarJournal">
    <param name="maximumAge" value="PT5H"/>
  </Journal>
</Cluster>
You'll observe that optimised tar files older than 5 hours are deleted.
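To sanity-check the rotation, a small sketch like the one below lists any tar files still present past the 5-hour cutoff. The directory paths are assumptions based on a default crx-quickstart layout; adjust them to your installation.

import java.io.File;

public class TarAgeCheck {
    // Paths are assumptions based on a default crx-quickstart layout.
    private static final String[] DIRS = {
            "crx-quickstart/repository/tarJournal",
            "crx-quickstart/repository/workspaces/crx.default",
            "crx-quickstart/repository/version"
    };

    public static void main(String[] args) {
        // 5 hours, matching the PT5H maximumAge above
        long cutoff = System.currentTimeMillis() - 5L * 60 * 60 * 1000;
        for (String dir : DIRS) {
            File[] tars = new File(dir).listFiles((d, name) -> name.endsWith(".tar"));
            if (tars == null) {
                continue; // directory missing or unreadable
            }
            for (File tar : tars) {
                if (tar.lastModified() < cutoff) {
                    System.out.printf("still present and older than 5h: %s (%d MB)%n",
                            tar.getPath(), tar.length() / (1024 * 1024));
                }
            }
        }
    }
}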
Business Impact: CQ shut down on our Author servers after running out of disk space, as a result of the huge amount of disk consumed by tar files. What are the dangers of rotating the tar files more often? Do we really need to keep already-optimised tar files?
Product version: AEM CQ 5.6.1