Can you identify valid blob ids in File Data Store and remove unused files?


Level 5

Marking each AEM instance for file data store GC is one thing. But I'm wondering if there is a way, maybe with oak-run, that I don't know about and that does the reverse: get all the blob ids in the File Data Store (FDS), check them against each AEM instance, and mark any unused files.

So, for example, if I manually created a file dog.txt directly in the FDS, is there a command I can run that would mark it as never used in any of my AEM instances?

I have an externally shared FDS using binaryless replication, with 1 author and 3 publish instances. I feel like I'm about 150G above what I should be.


4 Replies


Community Advisor

Hi @sdouglasmc ,

I have created a script that cleans up unused references, shown below. Customize it as per your requirements.

java -Xmx1g -Doak.compaction.eagerFlush=true -jar tools/oak-run-1.40.0.jar console crx-quickstart/repository/segmentstore < cleanup.commands
java -Xmx1g -Doak.compaction.eagerFlush=true -jar tools/oak-run-1.40.0.jar checkpoints crx-quickstart/repository/segmentstore
java -Xmx1g -Doak.compaction.eagerFlush=true -jar tools/oak-run-1.40.0.jar checkpoints crx-quickstart/repository/segmentstore rm-unreferenced
java -Xmx1g -Doak.compaction.eagerFlush=true -jar tools/oak-run-1.40.0.jar checkpoints crx-quickstart/repository/segmentstore rm-all
java -Xmx1g -Doak.compaction.eagerFlush=true -Doffline-compaction=true -jar tools/oak-run-1.40.0.jar compact crx-quickstart/repository/segmentstore
rm crx-quickstart/repository/segmentstore/*.tar.bak
echo "Finished"


import org.apache.jackrabbit.oak.spi.commit.CommitInfo
import org.apache.jackrabbit.oak.spi.commit.EmptyHook
import org.apache.jackrabbit.oak.spi.state.NodeStateUtils
import org.apache.jackrabbit.oak.spi.state.NodeStore
import org.apache.jackrabbit.oak.commons.PathUtils

// Removes the node at path, or only its children when includingThis is false.
def rmNode(def session, String path, boolean includingThis = true) {
    if (!includingThis) {
        println "Removing subnodes of ${path}"

        // Remove every child except the access control policy node.
        def ns = NodeStateUtils.getNode(session.getRoot(), path);
        for (def subNodeName : ns.getChildNodeNames()) {
            if (!subNodeName.equals("rep:policy")) {
                String subpath = path + "/" + subNodeName;
                rmNode(session, subpath);
            }
        }
    } else {
        println "Removing node ${path}"

        // The right-hand side was cut off in the original post;
        // session.store (the console session's NodeStore) is an assumption.
        NodeStore ns = session.store
        def nb = ns.root.builder()

        // Walk down from the root builder to the builder for the target node.
        def aBuilder = nb
        for (p in PathUtils.elements(path)) {
            aBuilder = aBuilder.getChildNode(p)
        }
        if (aBuilder.exists()) {
            def rm = aBuilder.remove()
            ns.merge(nb, EmptyHook.INSTANCE, CommitInfo.EMPTY)
            return rm
        } else {
            println "Node ${path} doesn't exist"
            return false;
        }
    }
}
Create a tools folder next to your script, and put the rmNode.groovy above and the oak-run jar (from here) into it.

Hope that helps!




Level 5

Appreciated, but given the way we have things set up, that isn't what I'm looking for.


Employee Advisor

Why does the datastore GC process not work for you? I am not aware of any other way to do it.


Level 5

It does work. That's not the issue I'm having. I noticed that we have grown by about 150G somewhere in the past 2 months, and I'm trying to figure out why. So I'd like to find a way to validate all the blob ids in the FDS against the repositories, and not the other way around. Do you know if the mark-and-sweep uses any indexes, or just the blob id cache on the filesystem?
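
For anyone wanting to try the reverse check described here, a minimal sketch follows. It is not an oak-run feature and makes several assumptions: that each instance's referenced blob ids have already been dumped to a text file, one id per line (for example with oak-run's datastorecheck command, whose exact options depend on the version and are not shown here); that ids may carry a "#length" suffix, which is stripped; and that the file name of each blob in the FDS is its id. The function name list_unreferenced and the file names in the usage line are hypothetical.

```shell
#!/bin/sh
# list_unreferenced FDS_DIR REFS_FILE [REFS_FILE...]
# Prints the names of files under FDS_DIR that do not appear in any of the
# reference dumps. It only reports; it never deletes anything.
list_unreferenced() {
    fds_dir=$1
    shift
    tmp=$(mktemp -d) || return 1

    # Every file in the data store, reduced to its bare file name.
    find "$fds_dir" -type f | sed 's|.*/||' | LC_ALL=C sort -u > "$tmp/stored"

    # Union of the ids referenced by all instances. A trailing "#length",
    # if present, is dropped (assumption about the dump format).
    cat "$@" | sed -e 's/#.*$//' -e '/^$/d' | LC_ALL=C sort -u > "$tmp/referenced"

    # Lines that occur only in the first list: stored but never referenced.
    LC_ALL=C comm -23 "$tmp/stored" "$tmp/referenced"

    rm -rf "$tmp"
}
```

Usage would look like: list_unreferenced /path/to/datastore author.refs publish1.refs publish2.refs publish3.refs. A manually created file such as dog.txt would be listed, but so would any bookkeeping files the shared data store keeps alongside the blobs, and any blob written after the reference dumps were taken, so treat the output as a list of candidates to investigate rather than files that are safe to delete.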