SOLVED

Searching Rendered Content

Level 2

I am having a bit of trouble using the built-in search with the Reference Component.

It seems that Lucene indexes the content stored in the JCR for a particular node, but it does not pull in anything from referenced nodes.

Page A has content RTE with 'term'. 
Page B has content RTE with 'additional'. 
Page A has reference to RTE on Page B.


Search of 'term' finds Page A.
Search of 'additional' finds Page B.

I can see how to filter Page B out of our results for 'additional', but how can we get Page A into that result set? Any help is greatly appreciated.

1 Accepted Solution

Correct answer by
Employee Advisor

Node indexing is managed and implemented by the underlying Oak repository layer, which does not understand page- or component-level relations such as references. A reference is a WCM concept and means nothing at the node level. I also do not think there is any way to add custom business logic, or any other mechanism, to change how nodes are indexed.

The PDF example works because the PDF file is stored as a child node of the dam:Asset node. Since the parent-child relation is a node-level concept, the Oak indexing process lets you define indexing rules that aggregate the content of child nodes with the parent node. See the following node for an example: /oak:index/damAssetLucene/aggregates/dam:Asset. Aggregation cannot be applied to references because a reference is not a parent-child relationship.
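To make the difference concrete, here is a toy model in plain Java. It is not Oak's implementation: the `Node` class, the `reference` field and `aggregatedText` are all made up for illustration, and real aggregation is configured per node type with include rules rather than covering every descendant. It only shows that text reachable through parent-child links gets collected, while a reference pointing elsewhere in the tree is never followed.

```java
import java.util.ArrayList;
import java.util.List;

class AggregationModel {
    // Toy node (hypothetical): a name, its own text, child nodes, and an optional reference.
    static final class Node {
        final String name;
        final String text;
        final List<Node> children = new ArrayList<>();
        Node reference; // points at a node elsewhere in the tree; never followed when aggregating

        Node(String name, String text) {
            this.name = name;
            this.text = text;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Collects the text of a node and all of its descendants (parent-child links only).
    static String aggregatedText(Node node) {
        StringBuilder sb = new StringBuilder(node.text);
        for (Node child : node.children) {
            sb.append(' ').append(aggregatedText(child));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // A PDF asset: the extracted text lives on a child node, so it is aggregated.
        Node asset = new Node("asset.pdf", "").add(new Node("extracted", "text from the pdf"));

        // The article page holds a title and a reference component pointing at another page's RTE.
        Node rte = new Node("text", "secondary");
        Node referencePage = new Node("reference", "").add(rte);
        Node refComponent = new Node("reference-component", "");
        refComponent.reference = rte;
        Node article = new Node("article", "").add(new Node("title", "Alpha")).add(refComponent);

        System.out.println(aggregatedText(asset));         // text from the pdf
        System.out.println(aggregatedText(referencePage)); // secondary
        System.out.println(aggregatedText(article));       // Alpha
    }
}
```

In this model the article's aggregated text never contains 'secondary', which matches the behaviour described in the question.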

For more details read this documentation - https://jackrabbit.apache.org/oak/docs/query/lucene.html

As @bsloki has mentioned, the only possible solution in your case is to post-process the result set returned by Lucene: look up the references to each hit and return the combined result set to the search component.

5 Replies

Level 10

What query are you using to pull this result set? Please specify the exact search term you are using.

Level 2

I put together a slightly contrived example inside the Geometrixx Media content.

I have a page (reference) at /content/geometrixx-media/en/reference.html which has an RTE with the content 'secondary'.

I have a page (article) at /content/geometrixx-media/en/test/article.html with a Title Component whose title is set to 'Alpha'. There is also a Reference Component which is connected to the RTE in the above page.

When I search for 'alpha' I get the article page. When I search for 'secondary' I get the reference page.

Using the query debug tool here are the queries I am using:

/libs/cq/search/content/querydebug.html?_charset_=UTF-8&query=path%3D%2Fcontent%0D%0Afulltext%3Dalpha%0D%0Atype%3Dcq%3APage which finds the article page.

/libs/cq/search/content/querydebug.html?_charset_=UTF-8&query=path%3D%2Fcontent%0D%0Afulltext%3Dsecondary%0D%0Atype%3Dcq%3APage which finds the reference page.
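For anyone reading the URLs above, the `query` parameter is just a URL-encoded list of predicates, one per line. A small plain-Java sketch to decode it (the class and method names are my own, nothing AEM-specific is used):

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class DecodePredicates {
    // Extracts and URL-decodes the value of the "query" parameter from a query debugger URL.
    static String decodeQueryParam(String url) {
        String marker = "&query=";
        int start = url.indexOf(marker);
        if (start < 0) {
            throw new IllegalArgumentException("no query parameter in: " + url);
        }
        String encoded = url.substring(start + marker.length());
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String url = "/libs/cq/search/content/querydebug.html?_charset_=UTF-8"
                + "&query=path%3D%2Fcontent%0D%0Afulltext%3Dsecondary%0D%0Atype%3Dcq%3APage";
        // The predicates are separated by CRLF (%0D%0A) in the encoded form.
        for (String predicate : decodeQueryParam(url).split("\r\n")) {
            System.out.println(predicate);
        }
        // Prints:
        // path=/content
        // fulltext=secondary
        // type=cq:Page
    }
}
```

So both queries are a full-text search for one term, restricted to cq:Page nodes under /content.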

 

I am looking for a way to have the article page come in the result set when the term 'secondary' is used.

Level 10

Hi,

If you look at it, all the content is stored in the JCR within nodes, and the search index is not aware of the reference. It just searches for the given string across the nodes and returns the matches. If this is the requirement, we may have to run multiple queries to achieve it.

For example:

Step 1: Search for the given 'string'

Step 2: Search for any references to the node paths in the above result set.

Step 3: Combine both the results and send.
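The three steps above can be sketched in plain Java as follows. This is only an in-memory illustration under my own assumptions: the two maps stand in for the repository (in AEM, steps 1 and 2 would each be a real query), matching is a simple case-insensitive substring check rather than Lucene tokenisation, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class ReferenceAwareSearch {
    // Stand-in for the index: page path -> text stored on that page's own nodes.
    private final Map<String, String> pageText;
    // Stand-in for reference components: page path -> paths of pages whose content it references.
    private final Map<String, List<String>> references;

    ReferenceAwareSearch(Map<String, String> pageText, Map<String, List<String>> references) {
        this.pageText = pageText;
        this.references = references;
    }

    // Step 1: pages whose own content contains the term.
    Set<String> directHits(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        Set<String> hits = new LinkedHashSet<>();
        for (Map.Entry<String, String> entry : pageText.entrySet()) {
            if (entry.getValue().toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    // Step 2: pages that reference any of the given pages.
    Set<String> referrersOf(Set<String> targets) {
        Set<String> referrers = new LinkedHashSet<>();
        for (Map.Entry<String, List<String>> entry : references.entrySet()) {
            for (String target : entry.getValue()) {
                if (targets.contains(target)) {
                    referrers.add(entry.getKey());
                    break;
                }
            }
        }
        return referrers;
    }

    // Step 3: combine both result sets, direct hits first, without duplicates.
    Set<String> search(String term) {
        Set<String> direct = directHits(term);
        Set<String> combined = new LinkedHashSet<>(direct);
        combined.addAll(referrersOf(direct));
        return combined;
    }

    public static void main(String[] args) {
        String reference = "/content/geometrixx-media/en/reference";
        String article = "/content/geometrixx-media/en/test/article";

        Map<String, String> text = new LinkedHashMap<>();
        text.put(reference, "secondary");
        text.put(article, "Alpha");
        Map<String, List<String>> refs = Map.of(article, List.of(reference));

        ReferenceAwareSearch search = new ReferenceAwareSearch(text, refs);
        System.out.println(search.search("secondary")); // reference page, then article page
        System.out.println(search.search("alpha"));     // article page only
    }
}
```

Note that this only follows one level of references. If a referenced page can itself be referenced through another page, step 2 would need to be repeated until no new pages turn up.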

Level 2

Is there any way to make the search index aware of the reference? When the reference page is updated, could I search for all pages referencing that page and update their index entries? I'm not familiar with how Lucene builds the index from JCR content.

Out of the box, AEM indexes PDF files based on extracted text. The extracted text does not appear to be attached to the JCR node, but Lucene still finds it in a search. Are there any clean hooks to associate content to be indexed with a particular node?

My thought is that, on an update to either the reference or the article page, the indexed content could be rebuilt. It might be costly when things are updated, but it seems better than trying to stitch results together at search time. Or am I going down a bad path?
