We have a nightly backup on our 6.3 authoring environment. We have a script that shuts down the server, then makes a tar file of the authoring directory, grabbing the crx-quickstart directory, license, initial jar file, and so on. Once it's done, it restarts the server and gzips the backup file.
The tar step takes a very long time (around 12 hours or more), which limits when we can use the authoring environment.
I cannot say exactly how big the repository is (the du command never finishes), but the backup files after gzip are ~70G. (The last file was 150G, but I'm not sure whether it was compressed.)
Removing content is not an option, because we don't know what older content they can get rid of or will need later on.
Any recommendations to help us in this process? Or for backups in general?
You could run rsync two or three times: do the first couple of passes while the system is running, then shut the server down for one final rsync. The downtime is much shorter because that last pass only copies the delta; most of the data has already been transferred by the earlier passes.
Rsync the 1st time while the instance is running - slow, since everything has to be copied
Rsync the 2nd time while the instance is running - copies only the delta, so it should be relatively fast
Rsync the 3rd time with the instance shut down - a quick delta copy, then start the instance again
After these stages you can compress the copy (tar and gzip it), which doesn't affect the running instance since it works on the backup target, not the live repository.
Another option is filesystem or volume snapshots (LVM, SAN, or storage-array snapshots), which are typically far faster for both backup and restore: the snapshot is taken near-instantly, so the instance is down for seconds rather than hours, and the tar/compress step runs against the snapshot afterwards.
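As a rough illustration, an LVM-based snapshot backup might look like the sketch below. The volume group, logical volume and mount point names are assumptions for illustration; this requires root and that the repository lives on an LVM volume, and it is not tested against any specific AEM setup.

```shell
#!/usr/bin/env bash
# Sketch of a snapshot-based backup, assuming the repository sits on
# an LVM logical volume /dev/vg0/aem (hypothetical names throughout).
set -euo pipefail

# Stop (or briefly quiesce) the instance only for the moment of snapshot creation.
/opt/aem/author/crx-quickstart/bin/stop

# Create a copy-on-write snapshot; this completes in seconds.
lvcreate --size 10G --snapshot --name aem-snap /dev/vg0/aem

# The instance can come straight back up; the snapshot is frozen in time.
/opt/aem/author/crx-quickstart/bin/start

# Mount the snapshot read-only and tar from it at leisure.
mkdir -p /mnt/aem-snap
mount -o ro /dev/vg0/aem-snap /mnt/aem-snap
tar -czf "/backup/author-$(date +%F).tar.gz" -C /mnt/aem-snap .

# Clean up.
umount /mnt/aem-snap
lvremove -f /dev/vg0/aem-snap
```

The key point is that the slow tar/gzip work happens against the frozen snapshot while the live instance keeps running, so downtime shrinks from hours to the seconds the snapshot creation takes.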