Adobe Experience Manager Sites & More

abhishekk981269 · 8/22/20

We have implemented custom search using QueryBuilder for pages and assets. When I set p.excerpt = true, I get the excerpt for both pages and assets (mainly PDFs), but the query gets very slow.

The query works as expected (fast) if I request excerpts only for pages, but as soon as I request them for assets as well, it gets slow.

Is there any way to extract excerpts for a large volume of assets faster through a QueryBuilder search?

(/jcr:root/content/xxx/us/en//element(*, cq:Page)[(jcr:contains(., 'xyz') and not(jcr:content/@isNotSearchable))] | /jcr:root/content/dam/xxx/documents//element(*, dam:Asset)[(jcr:contains(., 'xyz') )])/rep:excerpt(.)

Shashi_Mulugu · 8/24/20

@abhishekk981269 A few questions so I can guide you better: what version of AEM are you using? What are you doing with the excerpts?

abhishekk981269 · 8/24/20

Hi Shashi, it is AEM 6.5.4.

I am trying to highlight search results (full-text search).

This is how I am fetching it in code (the query is in the original post):

hit.getExcerpt()

The code returns the excerpt as expected, but the query gets very slow if I set p.excerpt = true for assets.

Our assets (PDFs) are around 4-5 GB.

@Shashi_Mulugu

Shashi_Mulugu · 8/24/20

Can you please convert your query to QueryBuilder format and run it with the Explain Query tool to check its performance?
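
For reference, one plausible QueryBuilder form of the XPath query from the original post is sketched below as a plain predicate map. This is an assumption about how the query could be expressed, not the poster's actual predicates: the class name `ExcerptQueryPredicates`, the grouping layout, and the `p.limit` value are illustrative, while the paths, node types, search term, and `isNotSearchable` property are taken from the posted query.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExcerptQueryPredicates {

    /**
     * Builds QueryBuilder predicates that approximate the XPath union in the
     * original post. The grouping and the limit are illustrative assumptions.
     */
    static Map<String, String> build(String term, boolean withExcerpt, int limit) {
        Map<String, String> p = new LinkedHashMap<>();
        // OR the two branches together (pages OR assets).
        p.put("group.p.or", "true");

        // Branch 1: searchable pages under the site root.
        p.put("group.1_group.type", "cq:Page");
        p.put("group.1_group.path", "/content/xxx/us/en");
        p.put("group.1_group.fulltext", term);
        p.put("group.1_group.property", "jcr:content/isNotSearchable");
        p.put("group.1_group.property.operation", "not"); // property must not exist

        // Branch 2: assets under the DAM documents folder.
        p.put("group.2_group.type", "dam:Asset");
        p.put("group.2_group.path", "/content/dam/xxx/documents");
        p.put("group.2_group.fulltext", term);

        // Excerpt flag under test, plus a page size (assumed, not in the post).
        p.put("p.excerpt", String.valueOf(withExcerpt));
        p.put("p.limit", String.valueOf(limit));
        return p;
    }

    public static void main(String[] args) {
        // Print in the key=value form accepted by the Query Builder debugger.
        build("xyz", true, 10).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Inside AEM, a map like this would typically be passed to `PredicateGroup.create(map)` and then to `QueryBuilder.createQuery(...)`; running it once with `withExcerpt` set to true and once with false makes the two Explain Query runs directly comparable.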

Shashi_Mulugu · 8/24/20

https://docs.adobe.com/content/help/en/experience-manager-64/developing/bestpractices/troubleshootin...

abhishekk981269 · 8/24/20

Thanks @Shashi_Mulugu. Yeah I tried that.

Here is my query performance when I include p.excerpt = true:

Indexes Used

cqPageLucene(/oak:index/cqPageLucene)

damAssetLucene(/oak:index/damAssetLucene)

Execution Time

Total time: 5697 ms

Query execution time: 1 ms
Get nodes time: 42 ms
Result node count time: 5654 ms
Number of nodes in result: 3518

Here is my query performance when I do not include p.excerpt = true in the query:

Indexes Used

cqPageLucene(/oak:index/cqPageLucene)

damAssetLucene(/oak:index/damAssetLucene)

Execution Time

Total time: 59 ms

Query execution time: 0 ms
Get nodes time: 4 ms
Result node count time: 55 ms
Number of nodes in result: 3518

As you can see, the total time drops from 5697 ms to 59 ms when p.excerpt = true is removed. Almost all of the extra time is in "Result node count time" (5654 ms vs. 55 ms).

I could not find any helpful article or resource explaining how to tune a query that uses excerpts.

The rest of the query picks up the OOTB indexes, to which we have added our custom property index for isNotSearchable. There is nothing else in the query; it is pretty much restricted to the searchable content.
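
As a back-of-the-envelope check on the numbers above, dividing "Result node count time" by the 3518 result nodes gives roughly 1.6 ms per node with excerpts versus about 0.016 ms without. The sketch below just does that arithmetic. It assumes the cost is spread evenly across nodes, which the Explain Query output does not confirm (a few large PDFs could dominate), and the page size of 10 is a hypothetical value, not something from the query.

```java
public class ExcerptTimingEstimate {

    /** Average milliseconds spent per result node. */
    static double perNodeMs(double totalMs, int nodes) {
        return totalMs / nodes;
    }

    public static void main(String[] args) {
        int nodes = 3518; // "Number of nodes in result" from both runs

        double withExcerpt = perNodeMs(5654, nodes);  // run with p.excerpt = true
        double withoutExcerpt = perNodeMs(55, nodes); // run without p.excerpt

        System.out.printf("with excerpt:    %.3f ms/node%n", withExcerpt);
        System.out.printf("without excerpt: %.3f ms/node%n", withoutExcerpt);

        // Hypothetical: if only one page of 10 hits needed excerpts and the
        // cost scaled linearly, this would be the excerpt cost for that page.
        System.out.printf("estimated for 10 hits: %.1f ms%n", withExcerpt * 10);
    }
}
```

If the even-spread assumption roughly holds, most of the 5.6 seconds comes from generating excerpts for all 3518 nodes rather than from the index lookup itself.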

Shashi_Mulugu · 8/24/20

In that case, can you please cross-check whether those indexes have the "useInExcerpt" property enabled? https://jackrabbit.apache.org/oak/docs/query/lucene.html#Property_Definitions

abhishekk981269 · 8/24/20

Thanks @Shashi_Mulugu. I want to use useInExcerpt, but all the index definitions for DAM are for metadata properties. I am not sure whether setting useInExcerpt = true on metadata properties will help with excerpts from the actual content of the PDFs.

Is there a way to apply useInExcerpt to the actual content of the PDFs, so that excerpts come from the PDF content and not from the metadata properties?

Here are the DAM indexes (all metadata)

Let me know. I appreciate your help.

Shashi_Mulugu · 8/25/20

Can you check whether you can add the original rendition to the full-text search index and take it from there? https://jackrabbit.apache.org/oak/docs/query/lucene.html... If not, I would recommend using Solr or another search engine and indexing the content into it. These provide OOTB full-text search with excerpts enabled and well-optimized query performance.