Oak index for null checks (nullCheckEnabled)

bartosz_wesolow


08-07-2020

Hi all,

 

I have been implementing sitemap functionality. The sitemap lists all pages that are not hidden from search: each page has a custom checkbox in its page properties named `Hide in search`, and if it is set to true the page is not included in the sitemap. To get all pages for the sitemap I use the following query:

 

SELECT * FROM [cq:Page] WHERE ISDESCENDANTNODE('/content/site-root') AND ([jcr:content/hideInSearch]='false' OR [jcr:content/hideInSearch] IS NULL)

 

 

To keep the query fast I wanted to extend the /oak:index/cqPageLucene index with an entry for hideInSearch: 

 

<jcr:root xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
  jcr:primaryType="nt:unstructured"
  name="jcr:content/hideInSearch"
  nullCheckEnabled="{Boolean}true"
  propertyIndex="{Boolean}true">
</jcr:root>

 

Unfortunately, after reindexing I see that the query still traverses nodes to evaluate the IS NULL condition (if I remove that condition from the query, there is no traversal warning in the logs). I checked the Lucene index documentation page, and it seems that the nullCheckEnabled property should fix this, but it does not. Traversal also occurs when I keep only the null check, that is:

SELECT * FROM [cq:Page] WHERE ISDESCENDANTNODE('/content/site-root') AND [jcr:content/hideInSearch] IS NULL

 

Do you know what I am doing wrong, or what needs to be done to resolve this issue? Thanks in advance for your help.

 

Currently the only solution I have is to make sure that every page has a value assigned to the hideInSearch property (setting a default value at the template level and updating the existing content with a Groovy script). This can be done quite easily, but it would still be great to understand what's wrong with my index definition.

 

Cheers!

Accepted Solutions (1)


Jörg_Hoh

Employee


08-07-2020

Hi,

 

For the sitemap I wouldn't use search, because it doesn't come with any benefits. With a JCR query you first have to execute the query (which just iterates through the index), and after that you still need to look up all the remaining pages in the repository. Assuming that you don't have many pages that should be excluded from the sitemap, a simple traversal of the content tree is easier to implement (no custom index, simple traversal) and has about the same runtime performance.
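
To illustrate the traversal idea, here is a minimal sketch in plain Java. It is not AEM code: `PageNode` is a hypothetical stand-in for a page, with `hideInSearch` set to null when the property is missing, and the class and method names are made up for this example. In a real implementation you would walk the page tree through the AEM page API instead. The sketch mirrors the query from the question: it collects only descendants of the root, and a missing property is treated the same as `false`.

```java
import java.util.ArrayList;
import java.util.List;

public class SitemapTraversal {

    /** Hypothetical stand-in for a page; hideInSearch is null when the property is not set. */
    static final class PageNode {
        final String path;
        final Boolean hideInSearch;
        final List<PageNode> children;

        PageNode(String path, Boolean hideInSearch, List<PageNode> children) {
            this.path = path;
            this.hideInSearch = hideInSearch;
            this.children = children;
        }
    }

    /** Returns the paths of all descendants of root that are not hidden (property false or missing). */
    static List<String> collectSitemapPaths(PageNode root) {
        List<String> paths = new ArrayList<>();
        for (PageNode child : root.children) {
            collect(child, paths);
        }
        return paths;
    }

    private static void collect(PageNode page, List<String> paths) {
        // Only an explicit true hides the page; null (missing) and false both include it.
        if (!Boolean.TRUE.equals(page.hideInSearch)) {
            paths.add(page.path);
        }
        // Like the query, each page is checked on its own: children of a hidden page are still visited.
        for (PageNode child : page.children) {
            collect(child, paths);
        }
    }

    public static void main(String[] args) {
        PageNode b = new PageNode("/content/site-root/a/b", null, new ArrayList<>());
        List<PageNode> aChildren = new ArrayList<>();
        aChildren.add(b);
        PageNode a = new PageNode("/content/site-root/a", Boolean.TRUE, aChildren);
        PageNode c = new PageNode("/content/site-root/c", Boolean.FALSE, new ArrayList<>());
        List<PageNode> rootChildren = new ArrayList<>();
        rootChildren.add(a);
        rootChildren.add(c);
        PageNode root = new PageNode("/content/site-root", null, rootChildren);

        System.out.println(collectSitemapPaths(root));
    }
}
```

Because the check is done in code, no null check in the index is needed, which is the point of the suggestion above. If hidden pages should also hide their subtrees, the recursion into children could be moved inside the `if` block.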

Answers (1)


vanegi

Employee


08-07-2020

Hi Bartosz,

Yes, the nullCheckEnabled property should be sufficient to satisfy the IS NULL constraint here.

 

For the query "SELECT * FROM [cq:Page] WHERE ISDESCENDANTNODE('/content/site-root') AND [jcr:content/hideInSearch] IS NULL", the index definition should have the following structure:

 

 

- compatVersion = 2
- async = "async"
- jcr:primaryType = oak:QueryIndexDefinition
- evaluatePathRestrictions = true
- type = "lucene"
+ indexRules
  + cq:Page
    + properties
      + hideInSearch
        - name = "jcr:content/hideInSearch"
        - propertyIndex = true
        - nullCheckEnabled = true

 

 

I would also suggest adding aggregate rules to the index (/oak:index/testIndex/aggregates/cq:Page) so that the content of descendant nodes is included as well and the index is better optimized.

 

Thanks,

Vaishali