


Marking each AEM instance for file datastore GC is one thing. But I'm wondering if there is a way, maybe with oak-run, that I don't know about that can do the reverse: get all the blob ids in the File Data Store, check them against each AEM instance, and mark any unused files.
So, for example: if I manually created a file dog.txt directly in the FDS, is there a command I can run that would mark it as never used in any of my AEM instances?
I have an externally shared FDS using binaryless replication, with 1 author and 3 publish instances. I feel like I'm about 150G above where I should be.
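To make the question concrete, here is roughly what I'm imagining, assuming oak-run's datastorecheck command works the way I think it does (the flags come from the oak-run docs, but the dump file names below are hypothetical; verify everything against the oak-run version in use):

# Dump all blob ids physically present in the FDS, plus the ids referenced
# by one repository (the --ref part would be repeated on each AEM instance):
java -jar oak-run.jar datastorecheck --id --ref \
  --fds datastore.config \
  --store crx-quickstart/repository/segmentstore \
  --dump /tmp/dscheck

# Anything present in the id dump but in none of the ref dumps would be
# my "dog.txt" case (dump file names here are hypothetical):
comm -23 <(sort -u /tmp/dscheck/ids.txt) \
         <(sort -u /tmp/dscheck/refs-author.txt /tmp/dscheck/refs-pub*.txt) \
  > never-referenced.txt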
Hi @sdouglasmc,
I have created a script to clean up unused references and achieve something similar; see below and customize it as per your requirements.
#!/bin/bash
# Run the node-removal commands through the oak-run console against the
# segment store (--read-write is needed because the console is read-only
# by default and rmNode.groovy performs writes)
java -Xmx1g -Doak.compaction.eagerFlush=true -jar tools/oak-run-1.40.0.jar console --read-write crx-quickstart/repository/segmentstore < cleanup.commands
# List checkpoints, then remove unreferenced checkpoints, then all of them
java -Xmx1g -Doak.compaction.eagerFlush=true -jar tools/oak-run-1.40.0.jar checkpoints crx-quickstart/repository/segmentstore
java -Xmx1g -Doak.compaction.eagerFlush=true -jar tools/oak-run-1.40.0.jar checkpoints crx-quickstart/repository/segmentstore rm-unreferenced
java -Xmx1g -Doak.compaction.eagerFlush=true -jar tools/oak-run-1.40.0.jar checkpoints crx-quickstart/repository/segmentstore rm-all
# Offline compaction to reclaim segment store space, then drop the backup tars
java -Xmx1g -Doak.compaction.eagerFlush=true -Doffline-compaction=true -jar tools/oak-run-1.40.0.jar compact crx-quickstart/repository/segmentstore
rm crx-quickstart/repository/segmentstore/*.tar.bak
echo "Finished"
rmNode.groovy
import org.apache.jackrabbit.oak.spi.commit.CommitInfo
import org.apache.jackrabbit.oak.spi.commit.EmptyHook
import org.apache.jackrabbit.oak.spi.state.NodeStateUtils
import org.apache.jackrabbit.oak.spi.state.NodeStore
import org.apache.jackrabbit.oak.commons.PathUtils
def rmNode(def session, String path, boolean includingThis = true) {
    if (!includingThis) {
        // Remove only the children of the node, skipping ACL policy nodes
        println "Removing subnodes of ${path}"
        def ns = NodeStateUtils.getNode(session.getRoot(), path)
        for (def subNodeName : ns.getChildNodeNames()) {
            if (!subNodeName.equals("rep:policy")) {
                String subpath = path + "/" + subNodeName
                rmNode(session, subpath)
            }
        }
    } else {
        println "Removing node ${path}"
        NodeStore ns = session.store
        def nb = ns.root.builder()
        def aBuilder = nb
        // Walk the node builder down to the target path
        for (p in PathUtils.elements(path)) {
            aBuilder = aBuilder.getChildNode(p)
        }
        if (aBuilder.exists()) {
            // remove() returns true if the node existed and was removed
            def removed = aBuilder.remove()
            ns.merge(nb, EmptyHook.INSTANCE, CommitInfo.EMPTY)
            return removed
        } else {
            println "Node ${path} doesn't exist"
            return false
        }
    }
}
Create a tools folder parallel to your script, add the rmNode.groovy above, and download the oak-run jar from here.
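For completeness, the console step at the top of the script reads its input from cleanup.commands. Here is a minimal example of what that file could contain, assuming the oak-run Groovy console binds a "session" variable and supports :load (the /var/audit path is just an example; point it at whatever you want removed):

cat > cleanup.commands <<'EOF'
:load rmNode.groovy
rmNode(session, "/var/audit", false)
EOF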
Hope that helps!
Regards,
Santosh
Appreciated, but that isn't what I'm looking for, given the way we have things set up.
Why does the datastore GC process not work for you? I am not aware of any other way to do it.
It does work. That's not the issue I'm having. I noticed that we've spiked about 150G somewhere in the past 2 months and I'm trying to figure out why. So I'd like to find a way to validate all the blob ids in the FDS against the repositories, rather than the other way around. Do you know if the mark-sweep uses any indexes, or just the blobid cache on the filesystem?
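In the meantime, one thing I'm going to try is listing which datastore files arrived in the last two months, to see where the 150G landed. Something like the following sketch (the datastore path is illustrative, and this assumes GNU find for -newermt and -printf):

# Files added to the FDS in the last ~60 days, largest first
find /data/aem/datastore -type f -newermt "60 days ago" -printf '%s %p\n' \
  | sort -rn | head -50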