Expand my Community achievements bar.

Dive into Adobe Summit 2024! Explore curated list of AEM sessions & labs, register, connect with experts, ask questions, engage, and share insights. Don't miss the excitement.
SOLVED

SQL2 search query taking too long to execute.

Avatar

Former Community Member

We have a pretty big repository and i'm trying to perform the following SQL2 query which takes around 30 seconds to execute.

Query -

SELECT * FROM [CUSTOMNODETYPE] as t where ISDESCENDANTNODE('path_under_required_folders') and t.isModified=CAST('true' AS BOOLEAN)

I enabled the debug logs for the package  org.apache.jackrabbit.core.query and am attaching the log messages..
Going through the logs the below message seems suspicious:

17.10.2014 01:43:18.049 *DEBUG* [10.239.60.171 [1413535373275] GET /crx/de/query.jsp?_dc=1413535372832&_charset_=utf-8&type=JCR-SQL2&stmt=SELECT%20*%20FROM%20CUSTOM_NODE_TYPE%20as%20t%20where%20ISDESCENDANTNODE(%27CUSTOM_PATH%27)%20and%20t.isModified%3DCAST(%27true%27%20AS%20BOOLEAN)&showResults=true HTTP/1.1] org.apache.jackrabbit.core.query.lucene.CachingIndexReader PlainDocId(2319940) not valid anymore.

I don't think that the search should be this slow. Just a FYI the query returns 57 search results in around 25 seconds.

Could anyone please help me understand why this is so slow? Is the indexing not happening properly?

Thanks

1 Accepted Solution

Avatar

Correct answer by
Employee Advisor

Hi,

The ISDESCENDANT function is pretty heavyweight, as  it requires that each potential result is looked up in the TarPM to determine the path. The only choice to improve the performance is to provide other criteria, which can be used by Lucene itself.

Jörg

View solution in original post

7 Replies

Avatar

Level 10

Take a look at this thread on the Apache site - it explain an issue with your query - mainly use of ISDESCENDANTNODE:

https://issues.apache.org/jira/browse/JCR-2835

Avatar

Former Community Member

i went through the jira page, but wasn't this fixed in the 2.3.0 version of apache jackrabbit? Looking at my system/console/bundles page, CQ 5.6.1 uses 2.6.2 version of apache jakrabbit api.

Also i tried the equivalent query with the SQL version (using jcr:path LIKE constraint) and that too took about the same time.

Avatar

Correct answer by
Employee Advisor

Hi,

The ISDESCENDANT function is pretty heavyweight, as  it requires that each potential result is looked up in the TarPM to determine the path. The only choice to improve the performance is to provide other criteria, which can be used by Lucene itself.

Jörg

Avatar

Former Community Member

Hi Jörg,

We face the same performance issue using SQL and SQL2 query.

Could you please elaborate more on what are the other criteria's for improving the performance?

Avatar

Employee Advisor

Hi,

In [1] I tried to explain some aspects, but I probably have to cover them more deeply. Can you share your slow query as well? Or is it the same as in the original posting?

Regards,
Jörg

 

[1] http://cqdump.wordpress.com/2012/11/28/ways-to-access-your-content-with-jcr-part-2-performance-aspec...
 

Avatar

Former Community Member

Hi Jörg,

The below are the sample queries that we make use of:

SQL - SELECT * FROM CUSTOMNODETYPE WHERE jcr:path like '/content/foo/bar/%/level1/level2/%' and cq:propName='val'

SQL2 - SELECT * FROM [CUSTOMNODETYPE] as t where ISDESCENDANTNODE('path_under_required_folders') and t.isModified=CAST('true' AS BOOLEAN)

I went through your post and for our large repository and for the use cases where the data needs to be obtained from a considerable large subtree, i believe we should be using jcr search using queries. 

just a FYI the combined size of our index is around 18 GB (crx-quickstart/repository/repository/index && crx-quickstart/repository/workspaces/crx.default/index), which lead me to the documentation @ http://helpx.adobe.com/experience-manager/kb/SearchIndexingConfig.html which described the need to index only certain properties of node types. Would that help?

Avatar

Employee Advisor

Hi,

If you need to search a large subtree, the use of a query can be suitable. In your case, you probably have a large number of results when you look for your custom nodetype; each of these nodes needs to get checked against the TarIndex to look up the path. I don't know the query optimizer of JCR that well, but you can try a simple reordering of the conditions first (the check of a property against a specific value is quite fast). Depending on the size of the tar index the TarPM option "indexInMemory" could also speedup this search.

If that doesn't help and you need to speedup the query, you should consider to denormalize your data, so you create more specific queries which can be handled solely by Lucene. But that requires often changes on a number of places in your application code.

Regarding the optimization of the index.You can do this, but it needs good testing. When you have created own nodetypes, you can be more agressive; I would recommend to avoid to "tune" the index of the ootb nodetypes.

Jörg