I want to exclude certain properties from AEM full-text search so that searching for a page author's name returns no matching results.
For example, if I search for "Amit", I get a few pages as results because those pages were authored by Amit. I don't want these search results.
I am using the default OOTB cqPageLucene index
I already checked the documentation at https://jackrabbit.apache.org/oak/docs/query/lucene.html, which says that a property can be excluded by setting index (Boolean) to false on it.
I have set index (Boolean) false on:
jcr:content/cq:lastRolledoutBy
jcr:content/cq:lastModifiedBy
jcr:content/cq:lastReplicatedBy
However, the issue goes deeper than the jcr:content node. A page has various nodes below it, for example a responsive grid inside another responsive grid, and whenever an author drops a component, that component gets jcr:createdBy and jcr:lastModifiedBy properties that contain the content author's ID/name.
So I am planning to use isRegexp, as mentioned in the doc, to write a regex and then set index (Boolean) false.
Has anyone else faced the same issue and can help with excluding these jcr:createdBy and jcr:lastModifiedBy properties in deep nodes? Am I going in the right direction by using isRegexp?
If yes, what would be the right regex to exclude these properties from nodes at a certain (n) level?
I read this https://experienceleaguecommunities.adobe.com/t5/adobe-experience-manager/indexing-data-indexing-rul... but there is no solution for excluding properties in nested nodes
Can any Oak experts or Lucene indexing gurus help me with this?
Thanks in advance.
Hi @cqsapientu69896,
If a property is not to be part of full text search set the property -
For restricting the property names using regex,
Note: The concern you mentioned, that the property can be at any depth under cq:Page, can be handled using aggregates and property definitions together. (In other words, the depth of nodes indexed under cq:Page is defined by the aggregates node. Even if a property exists at, say, the 10th level, its node might not have been indexed in the first place unless it is explicitly covered by an include rule.)
Example :
isRegexp -> true
name -> jcr:content/*/*/*/.* (all properties of spacer node - jcr:content/root/responsivegrid/spacer)
or
name -> jcr:content/*/*/*/jcr:lastModifiedBy
analyzed -> false
nodeScopeIndex -> false
Hello, I have a similar problem. I want to exclude all the "technical" properties from the fulltext index (cqPageLucene). As a test, I added the following node to the index definition on my local instance:
+ cqPageLucene
  + indexRules
    + cq:Page
      + properties
        + techProp
          - name="myTechPropName"
          - index="{Boolean}false"
          - excludeFromAggregation="{Boolean}true"
After the reindex I still get search results based on the value of this property, so the index definition properties described in the Lucene documentation do not seem to work.
https://jackrabbit.apache.org/oak/docs/query/lucene.html#indexing-rules
Hi @MikeGforces,
Can you share the below details to debug further
Thanks @Vijayalakshmi_S for the detailed answer.
However, it is not correct. I also added a comment to my question yesterday mentioning that isRegexp does not support child nodes,
as is also stated in the documentation - https://jackrabbit.apache.org/oak/docs/query/lucene.html
Note that the regular expression doesn’t match intermediate nodes, so, jcr:content/.*/.* would not index all properties for all children of jcr:content. OAK-5187 is an open improvement to track supporting arbitrary intermediate child nodes.
I tried adding a node with
isRegexp true;
analyzed false;
nodeScopeIndex false
and name as jcr:content/*/*/*/jcr*
and it still returned results matching the author name (the property is not being excluded).
The same happens with these regexes:
jcr:content/*/*/*/jcr.*
jcr:content/*/*/*/jcr:lastModifiedBy
So it is not working, and the reason for this is https://issues.apache.org/jira/browse/OAK-5187
Can you please let me know if my understanding is correct? cc @kautuk_sahni
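To make the limitation quoted from the documentation concrete, here is a minimal Java sketch of the behaviour it describes. It is a simplified model written for this thread, not Oak's actual source: it assumes that only the last segment of the rule name is treated as a regular expression and that the parent path has to equal the property's parent path literally. The class and method names (RegexpRuleModel, matches) are made up for illustration.

```java
import java.util.regex.Pattern;

public class RegexpRuleModel {

    // Simplified model (an assumption, not Oak's code): the parent path of the
    // rule name is compared as plain text, and only the final segment is
    // compiled as a regular expression against the property name.
    public static boolean matches(String ruleName, String propertyPath) {
        int r = ruleName.lastIndexOf('/');
        int p = propertyPath.lastIndexOf('/');
        String ruleParent = r < 0 ? "" : ruleName.substring(0, r);
        String propParent = p < 0 ? "" : propertyPath.substring(0, p);
        if (!ruleParent.equals(propParent)) {
            return false; // "*" in an intermediate segment is not a wildcard here
        }
        return Pattern.matches(ruleName.substring(r + 1), propertyPath.substring(p + 1));
    }

    public static void main(String[] args) {
        String deep = "jcr:content/root/responsivegrid/spacer/jcr:lastModifiedBy";
        // Wildcards in intermediate nodes: no match under this model
        System.out.println(matches("jcr:content/*/*/*/jcr:lastModifiedBy", deep)); // false
        // Literal parent path plus a regex on the property name: match
        System.out.println(matches("jcr:content/root/responsivegrid/spacer/jcr:.*", deep)); // true
        // Regex on a property directly under jcr:content: match
        System.out.println(matches("jcr:content/cq:last.*By", "jcr:content/cq:lastModifiedBy")); // true
    }
}
```

Under this model, the three names tried above (jcr:content/*/*/*/jcr*, jcr:content/*/*/*/jcr.* and jcr:content/*/*/*/jcr:lastModifiedBy) all fail at the parent-path comparison, which would be consistent with the author name still showing up in the results.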
Hi @cqsapientu69896,
Your understanding is correct. I missed the open issue.
If your requirement is critical and needs to be addressed by any means, consider the below (an approach that does not involve isRegexp and instead provides the complete property path).
Also, in my local trial I could see that nodeScopeIndex/analyzed -> false does not always restrict the property. You can try it, and if it is the same for you, use the property "excludeFromAggregation" -> true [Boolean] on the property definition instead.
Conclusion:
Providing the full property path (without regex, which works only for property names and not for intermediate nodes) + "excludeFromAggregation" should work. Please try it and update this thread.
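As a rough illustration of what the full-path approach involves, the Java sketch below prints one property definition per known property path, in the same tree notation used earlier in this thread. Everything in it is hypothetical: the class name, the generated node names (excluded0, excluded1, ...) and the sample path are assumptions for illustration, and whether excludeFromAggregation alone is sufficient is only what this thread reports, not something the sketch verifies.

```java
import java.util.List;

public class ExcludeRuleSketch {

    // Hypothetical helper: builds one property definition per full property
    // path (node path + property name). The node names are arbitrary.
    public static String render(List<String> nodePaths, List<String> propertyNames) {
        StringBuilder out = new StringBuilder();
        int n = 0;
        for (String node : nodePaths) {
            for (String prop : propertyNames) {
                out.append("+ excluded").append(n++).append('\n');
                out.append("  - name=\"").append(node).append('/').append(prop).append("\"\n");
                out.append("  - excludeFromAggregation=\"{Boolean}true\"\n");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample path taken from the spacer example earlier in the thread
        System.out.print(render(
                List.of("jcr:content/root/responsivegrid/spacer"),
                List.of("jcr:createdBy", "jcr:lastModifiedBy")));
    }
}
```

Each node path needs its own set of entries, so the number of definitions grows with every distinct component path in the content.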
Thanks @Vijayalakshmi_S
If you check the comment I posted on the same day I asked this question, I had already tried
excludeFromAggregation (Boolean) false with the complete property path and mentioned that it works.
I also mentioned that it is neither feasible nor practical to add it for thousands of nodes, since the node structure, and hence the property path, can be anything depending on how the content has been authored.