
Fulltext search index depth

March 4, 2022

I'm trying to add a fulltext search function using Lucene.

The issue is the depth of the fulltext search. Let me explain this through an example. Say you have two pages that each contain a text component with the term "Foo" in its text, and the search predicates {fulltext=Foo, p.offset=0, p.limit=2, path=/content/mypath, type=cq:Page, p.excerpt=true}.

cq:Page Page 1
  jcr:content
    par1
      component_text (search term "Foo" is here)

cq:Page Page 2
  jcr:content
    par1
      component_wrapper
        component_text (search term "Foo" is here)

The default index will find Page 1 but not Page 2.

I created a custom index increasing the number of aggregates for cq:Page (xml included below). That successfully returns Page 1 and Page 2, but it also returns all the parent pages in the tree.

Any suggestions how to resolve this?

<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:cq="http://www.day.com/jcr/cq/1.0"
    xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
    jcr:primaryType="oak:Unstructured"
    async="async"
    compatVersion="{Long}2"
    name="myPageLucene"
    reindex="{Boolean}false"
    includedPaths="[/content/mypath]"
    queryPaths="[/content/mypath]"
    evaluatePathRestrictions="{Boolean}true"
    reindexCount="{Long}1"
    type="lucene">
    <aggregates jcr:primaryType="nt:unstructured">
        <cq:Page jcr:primaryType="nt:unstructured">
            <include0
                jcr:primaryType="nt:unstructured"
                path="jcr:content"
                relativeNode="{Boolean}false"/>
            <include1
                jcr:primaryType="nt:unstructured"
                path="*/*/*"
                relativeNode="{Boolean}false"/>
            <include2
                jcr:primaryType="nt:unstructured"
                path="*/*/*/*"
                relativeNode="{Boolean}false"/>
            <include3
                jcr:primaryType="nt:unstructured"
                path="*/*/*/*/*"
                relativeNode="{Boolean}false"/>
            <include4
                jcr:primaryType="nt:unstructured"
                path="*/*/*/*/*/*"
                relativeNode="{Boolean}false"/>
            <include5
                jcr:primaryType="nt:unstructured"
                path="*/*/*/*/*/*/*"
                relativeNode="{Boolean}false"/>
            <include6
                jcr:primaryType="nt:unstructured"
                path="*/*/*/*/*/*/*/*"
                relativeNode="{Boolean}false"/>
            <include7
                jcr:primaryType="nt:unstructured"
                path="*/*/*/*/*/*/*/*/*"
                relativeNode="{Boolean}false"/>
            <include8
                jcr:primaryType="nt:unstructured"
                path="*/*/*/*/*/*/*/*/*/*"
                relativeNode="{Boolean}false"/>
            <include9
                jcr:primaryType="nt:unstructured"
                path="*/*/*/*/*/*/*/*/*/*/*"
                relativeNode="{Boolean}false"/>
        </cq:Page>
    </aggregates>
    <indexRules jcr:primaryType="nt:unstructured">
        <cq:Page jcr:primaryType="nt:unstructured"/>
    </indexRules>
</jcr:root>
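For comparison, here is a sketch of the same index with every aggregate include anchored under the page's own `jcr:content` node. It rests on an assumption that the thread does not confirm: include patterns such as `*/*/*` are evaluated relative to the `cq:Page` node, so they can also match nodes of child pages (for example `childPage/jcr:content/par1`). That would pull child-page text into each ancestor page and explain why the parent pages are returned. The number of levels shown here is illustrative, and this definition has not been tested against the content in the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the index from the question, with aggregate includes
     anchored under jcr:content. The assumption is that anchoring keeps
     text from child cq:Page nodes out of the parent page's fulltext.
     Depth (six levels below jcr:content) is an illustrative choice. -->
<jcr:root xmlns:cq="http://www.day.com/jcr/cq/1.0"
    xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
    jcr:primaryType="oak:Unstructured"
    async="async"
    compatVersion="{Long}2"
    name="myPageLucene"
    reindex="{Boolean}true"
    includedPaths="[/content/mypath]"
    queryPaths="[/content/mypath]"
    evaluatePathRestrictions="{Boolean}true"
    type="lucene">
    <aggregates jcr:primaryType="nt:unstructured">
        <cq:Page jcr:primaryType="nt:unstructured">
            <!-- Properties of the page's own jcr:content node -->
            <include0
                jcr:primaryType="nt:unstructured"
                path="jcr:content"
                relativeNode="{Boolean}false"/>
            <!-- One include per nesting level below jcr:content; Page 1's
                 component_text is at jcr:content/*/*, Page 2's at jcr:content/*/*/* -->
            <include1
                jcr:primaryType="nt:unstructured"
                path="jcr:content/*"
                relativeNode="{Boolean}false"/>
            <include2
                jcr:primaryType="nt:unstructured"
                path="jcr:content/*/*"
                relativeNode="{Boolean}false"/>
            <include3
                jcr:primaryType="nt:unstructured"
                path="jcr:content/*/*/*"
                relativeNode="{Boolean}false"/>
            <include4
                jcr:primaryType="nt:unstructured"
                path="jcr:content/*/*/*/*"
                relativeNode="{Boolean}false"/>
            <include5
                jcr:primaryType="nt:unstructured"
                path="jcr:content/*/*/*/*/*"
                relativeNode="{Boolean}false"/>
            <include6
                jcr:primaryType="nt:unstructured"
                path="jcr:content/*/*/*/*/*/*"
                relativeNode="{Boolean}false"/>
        </cq:Page>
    </aggregates>
    <indexRules jcr:primaryType="nt:unstructured">
        <cq:Page jcr:primaryType="nt:unstructured"/>
    </indexRules>
</jcr:root>
```

Changing the aggregates only affects newly indexed content. The sketch therefore sets `reindex` to true so that the whole index is rebuilt with the new rules.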


Best answer by Anmol_Bhardwaj

If that is the query (see below), then it doesn't seem to work in the scenario I am trying to describe. There is content with "foo": if I remove "fulltext.relPath = /" I get various hits, but that doesn't resolve the problem of either not finding content that is too deep in the page node structure, or finding that content but also returning all the parent pages.

My bad. Can you try with fulltext.relPath = .

I have done a search for the same scenario in the past and was able to achieve this. I have also run the same query in my debugger and can see the desired results.

Anmol_Bhardwaj

Community Advisor

If you're just looking for a particular piece of text authored inside a component, you don't need to set the depth.

Just change the query to:

path = <content-path>
fulltext = foo
fulltext.relPath = /

This will search all the JCR properties of all the nodes.

jlpjb (Author)

Thanks, but that doesn't quite work. I want to be able to find the term "foo" on any page, regardless of its relative node depth from the jcr:content node (and regardless of which component contains it). Does that make sense?

jlpjb (Author)

I was talking about the query above; I have pasted the same in the comment.

