
getExcerpt() gives only last few words of the extract

Level 2

Hi All,

I am using the AEM out-of-the-box (OOTB) QueryBuilder API to search pages and assets. The requirement is to display the title and an excerpt for pages and PDFs.

However, for pages getExcerpt() returns only the last few words of the extracted text. For PDFs it returns no excerpt from the document content at all; it only returns one if the author has set jcr:description.

I have checked many documents and threads but could not find a step-by-step procedure to verify whether I am missing something.

Any pointers will be helpful.

Regards,

MJ

5 Replies

Community Advisor

Hi Manish

That is how that API method works. It only gives you an excerpt of the page, which will be a short paragraph from the description. What is your exact requirement?

Thanks

Veena

Level 2

Hi Veena,

Thanks for your prompt answer. Here is the issue.

For pages: the excerpt should be a short piece of text containing the search keyword, i.e. the keyword plus a few words before and after it. For example, if I search for "health" and a page contains "health" in its description, it should return something like: "We all are conscious about health, which is very important to survive..."

Instead, I am getting the last few words of that node's content.
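A custom method can produce this keyword-in-context behaviour directly. The following is a minimal sketch, assuming the page or PDF text has already been read into a plain Java String; `KeywordExcerpt` and `excerpt` are hypothetical names, not part of the QueryBuilder API. It finds the first case-insensitive occurrence of the keyword and returns a few words on either side, falling back to the beginning of the text (not the end) when there is no match.

```java
import java.util.Arrays;
import java.util.Locale;

/**
 * Hypothetical helper (not part of the AEM QueryBuilder API): builds a
 * keyword-in-context snippet from plain text that has already been loaded.
 */
public class KeywordExcerpt {

    /**
     * Returns up to wordsAround words before and after the first
     * case-insensitive occurrence of keyword, with "..." marking trimmed text.
     * If the keyword is absent, returns the leading words instead.
     */
    public static String excerpt(String text, String keyword, int wordsAround) {
        String[] words = text.trim().split("\\s+");
        String needle = keyword.toLowerCase(Locale.ROOT);

        // Index of the first word containing the keyword, or -1 if none does.
        int hit = -1;
        for (int i = 0; i < words.length; i++) {
            if (words[i].toLowerCase(Locale.ROOT).contains(needle)) {
                hit = i;
                break;
            }
        }

        // No match: show the beginning of the text rather than its tail.
        int start = hit < 0 ? 0 : Math.max(0, hit - wordsAround);
        int end = hit < 0
                ? Math.min(words.length, 2 * wordsAround + 1)
                : Math.min(words.length, hit + wordsAround + 1);

        StringBuilder sb = new StringBuilder();
        if (start > 0) {
            sb.append("... ");
        }
        sb.append(String.join(" ", Arrays.copyOfRange(words, start, end)));
        if (end < words.length) {
            sb.append(" ...");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "We all are conscious about health, which is very important"
                + " to survive in a busy world.";
        // Prints: ... are conscious about health, which is very ...
        System.out.println(excerpt(text, "health", 3));
    }
}
```

Because it matches substrings, "health" also hits "healthy"; a production version would probably want word-boundary matching and HTML stripping before building the snippet.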

Another issue: I need to restrict the excerpt to particular fields. I tried configuring this in com.day.cq.search.impl.builder.QueryBuilderImpl, but it does not seem to pick up the configuration and only uses the default fields, "text" and "jcr:description".

For PDFs: getExcerpt() requires the author to populate jcr:description in order to return an excerpt. Like Google Search, can I extract an excerpt from within the document when jcr:description is not set? Authors mostly bulk-upload PDFs and do not fill in jcr:description, whereas Google reads the PDF content itself to build its excerpt. Any pointers?

Thanks again!

-Manish

Community Advisor

The API method may not serve your purpose. I tried decompiling the com.day.cq.search API to understand how getExcerpt is implemented, but it was not of much help. What I can recommend is writing a custom method to meet your requirement.

Level 10

If you want to pull more information out of a PDF, try looking at other APIs such as Apache PDFBox.

Apache PDFBox | Cookbook - Text Extraction
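For the bulk-uploaded PDFs, one way to combine this with the excerpt logic is to prefer jcr:description when the author filled it in and otherwise fall back to the document's own text. The sketch below shows only that selection step and is an assumption, not AEM behaviour: `ExcerptSource` and `choose` are hypothetical names, and the body text arrives through a `Supplier` so that the potentially expensive extraction (for example with PDFBox's `PDFTextStripper.getText(...)`, not shown here) runs only when the description is missing.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper: picks the text an excerpt should be built from.
 * jcr:description wins when present; otherwise the document body is used,
 * e.g. text previously extracted from the PDF by a library such as PDFBox.
 */
public class ExcerptSource {

    public static String choose(String description, Supplier<String> bodyText) {
        if (description != null && !description.isBlank()) {
            return description.trim();
        }
        // Only invoke the (possibly expensive) extractor when it is needed.
        String body = bodyText.get();
        return body == null ? "" : body.trim();
    }

    public static void main(String[] args) {
        // A bulk-uploaded PDF without a description falls back to body text.
        System.out.println(choose(null, () -> "  Extracted PDF text about health.  "));
        // An authored description is used as-is and the extractor never runs.
        System.out.println(choose("Author summary", () -> "ignored"));
    }
}
```

Extracting PDF text on every search request would be slow, so in practice the extracted text would more likely be computed once (for example when the asset is uploaded) and stored for later excerpting.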