Query node traversal limit of 40,000 for AEM as a Cloud Service

Level 4

Hi All,

We have a servlet endpoint that returns assets that have the metadata property product_id_1. We created a custom index, damAssetLucene-7-custom-7, that includes this property. We recently added about 60,000 assets from Scene7 to the AEM prod author instance with the product_id_1 property populated. Ever since, our servlet has been failing with an HTTP 503 response. The logs show no error messages, only the warning below:
08.04.2022 15:42:57.814 [cm-xxxxxxxxxx-aem-author-64c9b549b6-l5tqf] *WARN* [99.64.83.187 [1649432402844] GET /bin/xxxxxxx/getProductImages HTTP/1.1] org.apache.jackrabbit.oak.plugins.index.search.spi.query.FulltextIndex$FulltextPathCursor Index-Traversed 40000 nodes with filter Filter(query=select [jcr:path], [jcr:score], * from [dam:Asset] as a where [jcr:content/metadata/product_id_1] is not null and isdescendantnode(a, '/content/dam') /* xpath: /jcr:root/content/dam//element(*, dam:Asset)[(jcr:content/metadata/@product_id_1)] */, path=/content/dam//*, property=[jcr:content/metadata/product_id_1=[is not null]])

We tried changing queryLimitReads to 1,000,000 and queryLimitInMemory to 5,000,000 in org.apache.jackrabbit.oak.query.QueryEngineSettingsService.cfg.json, but that did not help. Is there any other OSGi configuration that limits query node traversal to 40,000 and that we can override? Note that this is an AEM as a Cloud Service environment. We need to send information for all assets that match the criteria to the external system.

Here is the XPath query: /jcr:root/content/dam//element(*, dam:Asset)[(jcr:content/metadata/@product_id_1)]
The query execution plan for this query shows that it uses the custom DAM Lucene index mentioned above. Here are more details from the query execution plan:
Indexes Used: damAssetLucene-7-custom-7(/oak:index/damAssetLucene-7-custom-7)

Execution Plan:
[dam:Asset] as [a] /* lucene:damAssetLucene-7-custom-7(/oak:index/damAssetLucene-7-custom-7) +:ancestors:/content/dam +jcr:content/metadata/product_id_1:[* TO *] sync:(jcr:content/metadata/product_id_1 is not null) where ([a].[jcr:content/metadata/product_id_1] is not null) and (isdescendantnode([a], [/content/dam])) */

 

Logs:
Parsing xpath statement: explain /jcr:root/content/dam//element(*, dam:Asset)[(jcr:content/metadata/@product_id_1)]

XPath > SQL2: explain select [jcr:path], [jcr:score], * from [dam:Asset] as a where [jcr:content/metadata/product_id_1] is not null and isdescendantnode(a, '/content/dam') /* xpath: /jcr:root/content/dam//element(*, dam:Asset)[(jcr:content/metadata/@product_id_1)] */
cost using filter Filter(query=explain select [jcr:path], [jcr:score], * from [dam:Asset] as a where [jcr:content/metadata/product_id_1] is not null and isdescendantnode(a, '/content/dam') /* xpath: /jcr:root/content/dam//element(*, dam:Asset)[(jcr:content/metadata/@product_id_1)] */, path=/content/dam//*, property=[jcr:content/metadata/product_id_1=[is not null]])
cost for reference is Infinity
cost for property is Infinity
cost for nodeType is Infinity
cost for elasticsearch is Infinity
Applicable IndexingRule found IndexRule: nt:base
Applicable IndexingRule found IndexRule: nt:base
Applicable IndexingRule found IndexRule: nt:base
Applicable IndexingRule found IndexRule: nt:base
Applicable IndexingRule found IndexRule: dam:Asset
Applicable IndexingRule found IndexRule: nt:hierarchyNode
Applicable IndexingRule found IndexRule: dam:Asset
Applicable IndexingRule found IndexRule: dam:Asset
Applicable IndexingRule found IndexRule: nt:base
Applicable IndexingRule found IndexRule: nt:base
Applicable IndexingRule found IndexRule: nt:base
Applicable IndexingRule found IndexRule: nt:base
cost for [/oak:index/assetPrefixNodename-1] of type (lucene-property) with plan [lucene:assetPrefixNodename-1(/oak:index/assetPrefixNodename-1) :ancestors:/content/dam] is 11428900000.00
cost for [/oak:index/damAssetLucene-7-custom-7] of type (lucene-property) with plan [lucene:damAssetLucene-7-custom-7(/oak:index/damAssetLucene-7-custom-7) +:ancestors:/content/dam +jcr:content/metadata/product_id_1:[* TO *] sync:(jcr:content/metadata/product_id_1 is not null)] is 52430.00
cost for lucene-property is Infinity
cost for aggregate lucene is Infinity
looking for plans for paths : []
cost for aggregate solr is Infinity
cost for traverse is 8270948.0
count: 1 query: explain /jcr:root/content/dam/.../element(*, dam:Asset)[(jcr:content/metadata/@product_id_1)]

 

I appreciate any help you can provide.

-SKM

2 Replies

Community Advisor

@skmAem 

Can you confirm whether notNullCheckEnabled is set to true on the product_id_1 property definition in damAssetLucene-7-custom-7?

Level 2

Hi @Vijayalakshmi_S ,

I tried adding that property, but it did not help. The issue was that incoming requests timed out while our servlet was still processing all of the assets on demand. We had to implement a workaround: a scheduled task creates a JSON file containing the assets' metadata ahead of time and stores it in DAM. When a request comes to our servlet, we read the JSON file from DAM and return it.
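
For anyone landing here with the same problem, the precompute-and-serve idea can be sketched roughly as follows. This is only an illustration in plain Java, not our actual code: the class name ProductImageSnapshot, the AssetRow type and the local file path are all hypothetical, and a local file stands in for the JSON asset stored in DAM. In AEM, refresh() would presumably be called from a scheduled job and serve() from the servlet.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;

/** Hypothetical stand-in for the scheduled task plus servlet pair; uses no AEM API. */
public class ProductImageSnapshot {

    /** One result row: the asset path and its product_id_1 value. */
    public static final class AssetRow {
        final String path;
        final String productId;

        public AssetRow(String path, String productId) {
            this.path = path;
            this.productId = productId;
        }
    }

    // Assumption: a local file plays the role of the JSON file kept in DAM.
    private final Path snapshotFile;

    public ProductImageSnapshot(Path snapshotFile) {
        this.snapshotFile = snapshotFile;
    }

    /** Scheduled side: run the slow query once and write the result as a JSON array. */
    public void refresh(Supplier<List<AssetRow>> slowQuery) throws IOException {
        String json = slowQuery.get().stream()
                .map(r -> "{\"path\":\"" + escape(r.path)
                        + "\",\"product_id_1\":\"" + escape(r.productId) + "\"}")
                .collect(Collectors.joining(",", "[", "]"));
        // Write to a temp file in the same directory, then move it over the old
        // snapshot, so a request is unlikely to read a half-written file.
        Path tmp = Files.createTempFile(
                snapshotFile.toAbsolutePath().getParent(), "snapshot", ".tmp");
        Files.writeString(tmp, json, StandardCharsets.UTF_8);
        Files.move(tmp, snapshotFile, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Request side: return the precomputed JSON without running any query. */
    public String serve() throws IOException {
        return Files.readString(snapshotFile, StandardCharsets.UTF_8);
    }

    // Minimal escaping (backslash and double quote only); a real implementation
    // would use a JSON library instead.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

The point of the split is that the request path only reads a file, so its response time no longer depends on how many assets the query has to walk through. The trade-off is that the response is only as fresh as the last scheduled run.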

Thanks,

SKM