We found that Oak was consuming a lot of resources, so we are trying to improve our XPath queries.
We run AEM 6.0 on Oak 1.0.22. According to https://jackrabbit.apache.org/oak/docs/query/query-troubleshooting.html, queries should not traverse more than 1000 nodes. We start AEM with:
java -server -Xms4096m -Xmx8192m -XX:MetaspaceSize=512m -XX:MaxMetaspaceSize=2048m -Doak.queryLimitReads=20000 -Doak.queryLimitInMemory=50000...
The query is:
/jcr:root/content/a/b//element(*,cq:PageContent)[@sling:resourceType='pages/my-resource' and (@cq:tags = 'custom-tag') ] order by @date descending
We index the `date` property, and there is only one article under /content/a/b with resourceType 'pages/my-resource' and cq:tags 'custom-tag'. Nevertheless, we get this warning:
org.apache.jackrabbit.oak.query.FilterIterators The query read or traversed more than 20000 nodes.
java.lang.UnsupportedOperationException: The query read or traversed more than 20000 nodes.
The query also returns nothing in the /crx/de/ Tools->Query tool.
If we remove the oak.queryLimitReads and oak.queryLimitInMemory settings from the startup script, the article is returned.
Does anyone have an idea why this happens? Thanks!
@kevingtan According to the official documentation, "the default values (unlimited) are used."
Please check the section "Slow Queries and Read Limits" in https://jackrabbit.apache.org/oak/docs/query/query-engine.html:
Slow queries are logged as follows:
*WARN* Traversed 10000 nodes with filter Filter(query=select ...) consider creating an index or changing the query
If this is the case, an index might need to be created, or the condition of the query might need to be changed to take advantage of an existing index.
Queries that traverse many nodes, or that read many nodes in memory, can be cancelled. The limits can be set at runtime (also while a slow query is running) using JMX, domain “org.apache.jackrabbit.oak”, type “QueryEngineSettings”, attribute names “LimitInMemory” and “LimitReads”. These setting are not persisted, so in the next restart, the default values (unlimited) are used. As a workaround, these limits can be changed using the system properties “oak.queryLimitInMemory” and “oak.queryLimitReads”. Queries that exceed one of the limits are cancelled with an UnsupportedOperationException saying that “The query read more than x nodes… To avoid running out of memory, processing was stopped.”
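As a rough illustration of the JMX route described in the quoted section, here is a minimal Java sketch. The domain, type, and attribute names come from the documentation quoted above. The class and method names are hypothetical, the attributes are assumed to be of type long, and the code is assumed to run inside the same JVM as AEM (otherwise a remote JMX connection would be needed instead of the platform MBean server).

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.Attribute;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class QueryLimitJmx {

    /**
     * Sets LimitReads and LimitInMemory on every MBean that matches the
     * domain and type named in the Oak documentation, and returns how many
     * MBeans were updated. Assumption: both attributes are long values.
     */
    public static int applyLimits(MBeanServer server, long limitReads, long limitInMemory)
            throws Exception {
        // Match on domain and type only; any other key properties are left open.
        ObjectName pattern =
                new ObjectName("org.apache.jackrabbit.oak:type=QueryEngineSettings,*");
        Set<ObjectName> names = server.queryNames(pattern, null);
        for (ObjectName name : names) {
            server.setAttribute(name, new Attribute("LimitReads", limitReads));
            server.setAttribute(name, new Attribute("LimitInMemory", limitInMemory));
        }
        return names.size();
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Same values as the system properties in the startup command above.
        int updated = applyLimits(server, 20000L, 50000L);
        System.out.println("Updated " + updated + " QueryEngineSettings MBean(s)");
    }
}
```

As the documentation notes, values set this way are not persisted, so they are lost on the next restart.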
So, as long as you do not set a limit, reads are unlimited, and hence your query works. If you do set a limit, the query is cancelled as soon as it reads or traverses more nodes than that limit.
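To make the behaviour concrete, here is a simplified Java sketch of a read limit. It is not Oak's actual implementation: the class, the method, and the in-memory list of "nodes" are all hypothetical. It only illustrates that the limit applies to the number of nodes read, not to the number of results returned, so a query with a single match can still be cancelled.

```java
import java.util.Arrays;
import java.util.List;

public class ReadLimitSketch {

    /**
     * Hypothetical helper: scans all candidates, counts those equal to
     * "wanted", and cancels once more than limitReads candidates were read.
     */
    public static int countMatches(List<String> candidates, String wanted, long limitReads) {
        long reads = 0;
        int matches = 0;
        for (String candidate : candidates) {
            reads++;
            if (reads > limitReads) {
                // Mirrors the exception type and message reported in the question.
                throw new UnsupportedOperationException(
                        "The query read or traversed more than " + limitReads + " nodes.");
            }
            if (candidate.equals(wanted)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Assumption: with no system property set, the limit is effectively unlimited.
        long limitReads = Long.getLong("oak.queryLimitReads", Long.MAX_VALUE);
        List<String> candidates = Arrays.asList("a", "b", "match", "c", "d");
        System.out.println(countMatches(candidates, "match", limitReads));
    }
}
```

Running this with `-Doak.queryLimitReads=3` cancels the scan even though only one candidate matches, because five candidates have to be read to finish it.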
Hi @Himanshu_Singhal ,
Thanks for your reply.
I don't see the "consider creating an index or changing the query" message in the error log, and I would not expect it, because I created all the necessary custom indexes under oak:index, and the targeted content path "/content/a/b" contains only 1 article. According to https://jackrabbit.apache.org/oak/docs/query/query-troubleshooting.html, an index is not required if it is known for certain that the query will only ever touch a very small number of items in the repository. Quoting the document:
"If it is known from the data model that a query will never traverse many nodes, then no index is needed."
My understanding is that if I can direct the query engine to a very specific path in the repository that holds very few articles, in this case "/content/a/b", we do not have to index it, although we did so anyway. Please correct me if I am wrong on this.