

Registering a custom Lucene token filter with AEM Oak indexes


Level 4

I am trying to implement lemmatization for plural-word search and have written a custom Lucene token filter for it. However, AEM doesn't recognize the token filter, which is part of my project's core bundle. Can someone share an example of how to register a custom Lucene token filter with AEM?

4 Replies


Community Advisor

Please refer to the tokenizer section in the blog post below. Multiple tokenizers are already available, so why do you want to build a custom one?

 

https://www.bounteous.com/insights/2018/06/07/aem-search-indexing-synonyms-filters-and-stop-words-oh...


Level 4

Hi @Shashi_Mulugu, there is no filter that supports lemmatization for English, hence I am trying to implement a custom one.


Administrator

An analyzer tokenizes text by performing any number of operations on it, which could include extracting words, discarding punctuation, removing accents from characters, lowercasing (also called normalizing), removing common words, reducing words to a root form (stemming), or changing words into the basic form (lemmatization).
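
To make the stages above concrete, here is a minimal sketch in plain Java of such a chain: tokenize, lowercase, drop common words, then reduce plurals to a base form. It deliberately does not use the Lucene API. The class name, the stop-word list, the irregular-plural dictionary and the suffix rules are all hypothetical and only approximate real lemmatization, which normally relies on a full dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of an analysis chain; not Lucene or Oak code.
final class AnalysisChainSketch {

    // Assumed stop-word list, for illustration only.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "and", "of"));

    // Assumed dictionary of irregular plurals, for illustration only.
    private static final Map<String, String> IRREGULAR = new HashMap<>();
    static {
        IRREGULAR.put("children", "child");
        IRREGULAR.put("mice", "mouse");
        IRREGULAR.put("feet", "foot");
    }

    // Reduces a lowercase plural to its singular form using the dictionary
    // first and a few simple suffix rules second.
    static String lemmatize(String token) {
        String irregular = IRREGULAR.get(token);
        if (irregular != null) {
            return irregular;
        }
        int n = token.length();
        if (n > 4 && token.endsWith("ies")) {
            return token.substring(0, n - 3) + "y";      // cities -> city
        }
        if (n > 4 && (token.endsWith("xes") || token.endsWith("ches")
                || token.endsWith("shes") || token.endsWith("sses"))) {
            return token.substring(0, n - 2);            // foxes -> fox
        }
        if (n > 3 && token.endsWith("s") && !token.endsWith("ss")
                && !token.endsWith("us") && !token.endsWith("is")) {
            return token.substring(0, n - 1);            // pages -> page
        }
        return token;
    }

    // Runs the whole chain: split on non-alphanumerics (discarding
    // punctuation), lowercase, remove stop words, lemmatize.
    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (raw.isEmpty()) {
                continue;
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (STOP_WORDS.contains(token)) {
                continue;
            }
            out.add(lemmatize(token));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The children saw two foxes and three cities."));
    }
}
```

In a real index the same steps would be expressed as a tokenizer plus a list of token filters, so that both the indexed text and the query terms pass through the identical chain.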

 

See this article: https://www.albinsblog.com/2020/05/how-to-enable-case-insensitive-search-in-aem-lucene.html

Another one: https://aemcorner.com/search-in-aem/ 

 

I hope this helps.



Kautuk Sahni


Level 4

@kautuk_sahni, thank you, Kautuk. I have already gone through these articles. They explain how to use the out-of-the-box tokenizers and filters to meet a given requirement. However, I am using AEM 6.5.10 with Lucene 4.7.1, which doesn't have any out-of-the-box filters for lemmatization. I had to write a custom one, but I did not know how to register it with Lucene so that it could be used via composition on Lucene indexes.

Eventually I managed to register it and got it working via composition on my custom Oak index. The problem now occurs when I restart AEM: the JCR repository does not start properly because of exceptions. I think the exceptions occur because indexing starts before the custom filter, which is part of my project's OSGi bundle, is registered during startup, so Lucene cannot find it. Is there a way to make the custom filter available to Lucene before indexing starts during AEM startup? Any help getting this working, or identifying an alternative solution, would be appreciated.