
Automation of data store garbage collection

Level 3

Hi All,

I'm trying to automate data store garbage collection using the curl command given in the link below:

https://docs.adobe.com/docs/en/aem/6-0/administer/operations/data-store-garbage-collection.html

Curl command:

curl -u admin:xyz -X POST --data markOnly=false http://localhost:4502/system/console/jmx/org.apache.jackrabbit.oak%3Aid%3D14%2Cname%3D%22repository+manager%22%2Ctype%3D%22RepositoryManagement%22/op/startDataStoreGC/boolean

But I'm not able to get the correct response. When we open the URL given in the curl command, it returns a 404 error because it cannot find the operation "runDataStoreGarbageCollection". When I open the URL without the operation (shared below), the page is found, but I cannot open the operation because it opens as a pop-up. Please advise on how to automate data store garbage collection.

curl -u admin:admin -X POST http://localhost:4502/system/console/jmx/com.adobe.granite:type=Repository

15 Replies

Level 7

If you try to open the URL from the system console, do you get a 404?

something like: system/console/jmx/org.apache.jackrabbit.oak%3Aid%3D14%2Cname%3D%22repository+manager%22%2Ctype%3D%22RepositoryManagement%22

Level 3

Hi,

Without /op/startDataStoreGC/boolean in the URL, I'm able to open the JMX console, but with /op/... I'm getting a 404. Please see the attachment.

Employee

Hi Swathi,

There is an error in the docs. Please try the command below, which I used on 6.2:

curl -u admin:admin -X POST --data markOnly=false http://localhost:4502/system/console/jmx/org.apache.jackrabbit.oak:id=14,name="repository manager",type="RepositoryManagement"/op/startDataStoreGC/boolean

The docs also claim that the above command returns only after Data Store Garbage Collection (DSGC) has completed; this is incorrect. The curl command returns almost immediately, while the DSGC process keeps running on the server. So you can initiate DSGC with curl, but you have to check the logs to see when it finished and whether it finished without errors.

If you are using a shared data store setup on 6.2, please don't try to automate DSGC. You would have to run DSGC in mark-only mode on all servers and wait for the mark phase to complete before running it with markOnly=false (i.e. sweep mode); otherwise you will get this error:

26.07.2016 13:42:23.526 *ERROR* [sling-oak-observation-746] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Not all repositories have marked references available : [7e86561f-ca7e-4365-834b-0992b95001ae]
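For comparison, the JMX console itself sends the ObjectName percent-encoded (see the command in the first post and the request.log entries later in this thread), which avoids having to protect the raw double quotes and space from the shell. A minimal sketch of that encoding, covering only the characters that occur in this ObjectName; the function name `encode_mbean` is hypothetical:

```shell
# Hypothetical helper: percent-encode an Oak MBean ObjectName the way the
# JMX console URLs in this thread do (a space becomes "+").
# Only %, :, =, comma, double quote and space are handled.
encode_mbean() {
  printf '%s' "$1" | sed \
    -e 's/%/%25/g' \
    -e 's/:/%3A/g' \
    -e 's/=/%3D/g' \
    -e 's/,/%2C/g' \
    -e 's/"/%22/g' \
    -e 's/ /+/g'
}

# Example (the id must match your instance; it was 14, 15 or 16 in this thread):
#   mbean='org.apache.jackrabbit.oak:id=14,name="repository manager",type="RepositoryManagement"'
#   curl -u admin:admin -X POST --data markOnly=false \
#     "http://localhost:4502/system/console/jmx/$(encode_mbean "$mbean")/op/startDataStoreGC/boolean"
```

Encoding the name once and putting the whole URL in double quotes keeps the command identical to what the console sends.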

Regards,

Opkar

Level 3

Hi Opkar,

Thank you for the reply. Were you able to open the URL from the curl command in a browser? I'm still getting a 404 error when I open it in the browser. I changed the id number to 16. Please see the attachment.

Employee

Hi Swathi,

I just installed a 6.1 instance with a DS and got the following command to work:

curl -u admin:admin -X POST --data 'markOnly=true' http://localhost:4502/system/console/jmx/org.apache.jackrabbit.oak%3Aid%3D15%2Cname%3D%22repository+...'

To get the correct command, you can use Chrome Developer Tools to inspect the request: open Developer Tools, click startDataStoreGC in the main window, go to the Network tab and find the request, then right-click it and select Copy as cURL, as in the screenshot. Then use the captured command to build the command above.
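The MBean id in the URL differs between instances in this thread (14, 15 and 16), which is one reason a copied command returns 404. A minimal sketch for pulling the id out of page text, assuming (unverified) that the JMX console listing at /system/console/jmx contains links in the same percent-encoded form that appears in request.log; the function name `find_repo_mgmt_id` is hypothetical:

```shell
# Hypothetical helper: read page text on stdin and print the id of the
# "repository manager" MBean, assuming links are percent-encoded exactly
# like the startDataStoreGC URLs seen in request.log.
find_repo_mgmt_id() {
  grep -o 'org\.apache\.jackrabbit\.oak%3Aid%3D[0-9][0-9]*%2Cname%3D%22repository+manager%22' \
    | head -n 1 \
    | sed 's/.*%3Aid%3D\([0-9]*\)%2C.*/\1/'
}

# Possible usage (assumes the listing page contains such links):
#   curl -s -u admin:admin http://localhost:4502/system/console/jmx | find_repo_mgmt_id
```

If it prints nothing, the listing is not in the assumed form and the Developer Tools approach above remains the reliable way to get the exact URL.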

Regards,

Opkar

Level 3

Hi Opkar,

Thank you very much for your reply. As you suggested, we were able to copy the command as cURL from Developer Tools after invoking garbage collection manually. You mentioned that we would be able to track this activity in the logs. We see the following message in our request.log file. Is this what you were referring to? Is there any other way, or are there other parameters we can use, to verify that GC is progressing or to check its status?

25/Aug/2016:11:33:52 -0400 [1373283] -> POST /system/console/jmx/org.apache.jackrabbit.oak%3Aid%3D16%2Cname%3D%22repository+manager%22%2Ctype%3D%22RepositoryManagement%22/op/startDataStoreGC/boolean HTTP/1.1

Employee

Hi,

What I meant was the messages in error.log.

For the Mark Phase you will see:

25.08.2016 18:27:22.856 *INFO* [pool-6-thread-8] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Starting Blob garbage collection

25.08.2016 18:27:22.877 *INFO* [pool-6-thread-8] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Number of valid blob references marked under mark phase of Blob garbage collection [2863]

For the Sweep Phase you will see:

25.08.2016 18:27:09.353 *INFO* [pool-6-thread-15] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Starting Blob garbage collection

25.08.2016 18:27:09.509 *INFO* [pool-6-thread-15] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Number of valid blob references marked under mark phase of Blob garbage collection [2863]

25.08.2016 18:27:10.203 *INFO* [pool-6-thread-15] org.apache.jackrabbit.oak.plugins.blob.MarkSweepGarbageCollector Blob garbage collection completed in 847.6 ms. Number of blobs deleted [0]
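A rough way to script this check, assuming the default log location crx-quickstart/logs/error.log and that the MarkSweepGarbageCollector messages are logged at INFO. The helper name `dsgc_status` is hypothetical, and it only classifies the last matching line, so a finished mark-only run and a sweep still in progress look the same:

```shell
# Hypothetical helper: classify the most recent MarkSweepGarbageCollector
# message in an AEM error.log (path is an assumption; pass your own).
dsgc_status() {
  local log="${1:-crx-quickstart/logs/error.log}"
  local last
  last=$(grep 'MarkSweepGarbageCollector' "$log" | tail -n 1)
  case $last in
    '') echo "no DSGC messages" ;;
    *'*ERROR*'*) echo "error: $last" ;;
    *'Blob garbage collection completed'*) echo "completed: $last" ;;
    *) echo "in progress (or a mark-only run has finished): $last" ;;
  esac
}
```

Because the curl call returns immediately, a script could call this in a loop with `sleep` until it reports "completed" or "error".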

Depending on the state of your repository, many more messages will be output for each step. But as you do not have a shared data store, you can go straight to the sweep by setting markOnly to false.
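A sketch of a small wrapper that builds the same percent-encoded URL for a given MBean id and sends either markOnly=true (mark only) or markOnly=false (mark and sweep). The function name `start_dsgc` and the `DRY_RUN` switch are hypothetical; the id must match your instance:

```shell
# Hypothetical wrapper around the startDataStoreGC JMX operation used in this thread.
# Arguments: host URL, user:password, MBean id, markOnly (true|false).
# Set DRY_RUN=1 to print the curl command instead of running it.
start_dsgc() {
  local host=$1 creds=$2 id=$3 mark_only=$4
  case $mark_only in
    true|false) ;;
    *) echo "markOnly must be true or false" >&2; return 2 ;;
  esac
  local url="$host/system/console/jmx/org.apache.jackrabbit.oak%3Aid%3D${id}%2Cname%3D%22repository+manager%22%2Ctype%3D%22RepositoryManagement%22/op/startDataStoreGC/boolean"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf 'curl -u %s -X POST --data markOnly=%s %s\n' "$creds" "$mark_only" "$url"
  else
    # Returns as soon as the operation has started; GC keeps running on the server.
    curl -sS -u "$creds" -X POST --data "markOnly=$mark_only" "$url"
  fi
}
```

For a non-shared data store this would be called once with `false`; completion still has to be confirmed in error.log.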

Regards,

Opkar

Level 3

Hi Opkar,

We are not getting an error.log file. We only have gc.log (Java garbage collection), request.log and access.log; the latest error.log file is from yesterday. Also, our logging level is set to ERROR, which may be why we are not getting INFO messages in error.log. We plan to change the logging level to INFO temporarily to see if that helps. In that case, can we invoke GC again?

Employee

Is there any reason you do not set error.log to INFO? I think INFO is a fairly standard level.

Regards,

Opkar

Level 3

We changed the logging level to ERROR because error.log was growing rapidly at INFO level in our PTE (Production Test Environment) and Production environments.

Employee

It would make sense to look at how the development team have implemented their logging, it sounds like they are sending too much information to the logs at INFO level.

Regards,

Opkar

Level 3

Hi Opkar,

We changed the logging level to INFO and then triggered GC manually using the JMX console, but we still did not get anything in error.log. We could only find the entry below in our request.log file. We have no idea whether our data store GC is still running or has finished, and we did not see any change in the size of either the repository or the data store directory. Please advise if there are any other ways to see whether GC is still running.

25/Aug/2016:15:16:46 -0400 [322] -> POST /system/console/jmx/org.apache.jackrabbit.oak%3Aid%3D16%2Cname%3D%22repository+manager%22%2Ctype%3D%22RepositoryManagement%22/op/startDataStoreGC/boolean HTTP/1.1

Level 3

Hi,

I am having the same problem here as swathib78201134. Did anyone figure this out?

Level 4

Hi Opkar,

What should be done if we get these kinds of errors?

Regards,

Ramgopal.

Employee Advisor

Make sure you get the quoting right; otherwise the shell interprets characters like "&".
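To see what the shell actually hands to curl, a tiny hypothetical helper can print each argument in brackets. It shows that an unquoted name="repository manager" reaches the command without its double quotes, while a single-quoted string keeps quotes, spaces and "&" intact:

```shell
# Hypothetical helper: print each argument the shell passes, one per line, in brackets.
show_args() { printf '[%s]\n' "$@"; }

# Unquoted: the shell strips the double quotes (and an unquoted "&" would
# instead send the command to the background).
show_args name="repository manager",type="RepositoryManagement"
# prints [name=repository manager,type=RepositoryManagement]

# Single-quoted: everything reaches the command literally.
show_args 'name="repository manager"&markOnly=true'
# prints [name="repository manager"&markOnly=true]
```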

Jörg