I have been implementing sitemap functionality. The sitemap lists all pages that are not hidden from search: each page has a custom checkbox in its page properties named `Hide in search`, and if it is set to true the page is not included in the sitemap. To get all pages for the sitemap I use the following query:
SELECT * FROM [cq:Page] WHERE ISDESCENDANTNODE('/content/site-root') AND ([jcr:content/hideInSearch]='false' OR [jcr:content/hideInSearch] IS NULL)
To keep the query fast I wanted to extend the /oak:index/cqPageLucene index with an entry for hideInSearch:
Unfortunately, after reindexing I see that the query traverses nodes to match the IS NULL condition (if I remove that condition from the query, there is no traversal warning in the logs). According to the Lucene index documentation, the nullCheckEnabled property should fix this, but it does not. I also confirmed that the query still traverses when I keep only the null check:
SELECT * FROM [cq:Page] WHERE ISDESCENDANTNODE('/content/site-root') AND [jcr:content/hideInSearch] IS NULL
Do you know what I am doing wrong, or what needs to be done to resolve this issue? Thanks in advance for your help.
Currently the only solution I have is to make sure that all pages have a value assigned to the hideInSearch property (by setting a default value at the template level and updating the existing content with a Groovy script). This is quite easy to do, but it would still be good to understand what is wrong with my index definition.
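The backfill part of that workaround could look roughly like the sketch below. It is written in Java against a hypothetical in-memory `PageNode` class rather than the real repository API, so the class and method names are illustrative only; against real content you would also need to persist the change (e.g. `ResourceResolver.commit()` or `Session.save()`). Once every page carries an explicit value, the query can drop the IS NULL branch and keep only `[jcr:content/hideInSearch]='false'`, which did not trigger traversal.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HideInSearchBackfill {

    // Hypothetical mutable stand-in for a cq:Page: the properties of its
    // jcr:content node plus its child pages.
    static final class PageNode {
        final String path;
        final Map<String, String> contentProps = new HashMap<>();
        final List<PageNode> children = new ArrayList<>();

        PageNode(String path) {
            this.path = path;
        }
    }

    // Depth-first walk that sets hideInSearch=false wherever the property is
    // missing and leaves explicit values untouched. Returns how many pages changed.
    static int backfill(PageNode page) {
        int updated = 0;
        // putIfAbsent returns null only when no value was present before.
        if (page.contentProps.putIfAbsent("hideInSearch", "false") == null) {
            updated++;
        }
        for (PageNode child : page.children) {
            updated += backfill(child);
        }
        return updated;
    }

    public static void main(String[] args) {
        PageNode root = new PageNode("/content/site-root");
        PageNode hidden = new PageNode("/content/site-root/hidden");
        hidden.contentProps.put("hideInSearch", "true");
        root.children.add(hidden);
        root.children.add(new PageNode("/content/site-root/visible"));
        System.out.println(backfill(root)); // prints 2 (root and "visible")
    }
}
```

Running it a second time updates nothing, so the script is safe to re-run.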
For the sitemap I wouldn't use search, because it doesn't bring any benefits. With a JCR query you have to deal with the query itself (which just iterates through the index), and afterwards you still need to look up all the matching pages in the repository. Assuming that you don't have many pages that should be excluded from the sitemap, a simple traversal of the content tree is easier to implement (no custom index needed) and has about the same runtime performance.
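A minimal sketch of that traversal in Java, assuming a hypothetical in-memory `PageNode` record in place of the real repository API (in AEM the same walk would typically use `Page.listChildren()` and `Page.getProperties()`). It applies the same rule as the original query: a page is listed when `jcr:content/hideInSearch` is missing or equal to `"false"`, and, like `ISDESCENDANTNODE`, it does not list the root itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SitemapTraversal {

    // Hypothetical stand-in for a cq:Page: its path, the properties of its
    // jcr:content child, and its child pages.
    record PageNode(String path, Map<String, String> contentProps, List<PageNode> children) {}

    // Mirrors "[jcr:content/hideInSearch]='false' OR ... IS NULL".
    static boolean isVisible(PageNode page) {
        String value = page.contentProps().get("hideInSearch");
        return value == null || "false".equals(value);
    }

    // Depth-first walk. Children of a hidden page are still visited, because
    // the query also evaluates every descendant page independently.
    static void collect(PageNode page, List<String> out) {
        if (isVisible(page)) {
            out.add(page.path());
        }
        for (PageNode child : page.children()) {
            collect(child, out);
        }
    }

    // Starts below the root, matching ISDESCENDANTNODE('/content/site-root').
    static List<String> sitemapPaths(PageNode root) {
        List<String> out = new ArrayList<>();
        for (PageNode child : root.children()) {
            collect(child, out);
        }
        return out;
    }

    public static void main(String[] args) {
        PageNode about = new PageNode("/content/site-root/about", Map.of(), List.of());
        PageNode secret = new PageNode("/content/site-root/secret",
                Map.of("hideInSearch", "true"), List.of());
        PageNode news = new PageNode("/content/site-root/news",
                Map.of("hideInSearch", "false"),
                List.of(new PageNode("/content/site-root/news/item", Map.of(), List.of())));
        PageNode root = new PageNode("/content/site-root", Map.of(), List.of(about, secret, news));
        // prints [/content/site-root/about, /content/site-root/news, /content/site-root/news/item]
        System.out.println(sitemapPaths(root));
    }
}
```

If the sitemap should also skip everything below a hidden page, return early from `collect` instead of recursing; that is a design choice the query itself cannot express.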