Registering a custom Lucene token filter with AEM Oak indexes
Level 4
September 4, 2023


I am trying to support plural-word search through lemmatization, which I implemented as a custom Lucene token filter. However, AEM does not recognize the token filter, which is part of the core bundle. Can someone share an example of how to register a custom Lucene token filter with AEM?


Shashi_Mulugu
Community Advisor
September 4, 2023

Please refer to the tokenizer section of the blog post below. Multiple tokenizers are already available, so why do you want to build a custom one?

https://www.bounteous.com/insights/2018/06/07/aem-search-indexing-synonyms-filters-and-stop-words-oh-my

Level 4
September 4, 2023

Hi @shashi_mulugu, there is no out-of-the-box filter that supports lemmatization for English, which is why I am trying to implement a custom one.

kautuk_sahni
Community Manager
September 11, 2023

An analyzer processes text by performing any number of operations on it. These can include extracting words (tokenizing), discarding punctuation, removing accents from characters, lowercasing (a form of normalization), removing common words (stop words), reducing words to a root form (stemming), or mapping words to their dictionary form (lemmatization).
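
To make those stages concrete, here is a minimal sketch in plain Java with no Lucene dependency. It is not how Lucene or Oak implement analysis; the class name, method names, and stop-word list are all hypothetical and exist only for illustration. Stemming and lemmatization are left out to keep it short.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalysisChainSketch {

    // Hypothetical stop-word list, for illustration only.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "in"));

    /** Splits on anything that is not a letter or digit, which also discards punctuation. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    /** Removes accents by decomposing characters and dropping the combining marks. */
    static String foldAccents(String token) {
        return Normalizer.normalize(token, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    /** Runs the stages in order: tokenize, fold accents, lowercase, drop stop words. */
    static List<String> analyze(String text) {
        List<String> result = new ArrayList<>();
        for (String token : tokenize(text)) {
            String normalized = foldAccents(token).toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(normalized)) {
                result.add(normalized);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "Caf\u00e9s" is "Cafés" written with a precomposed accented character.
        System.out.println(analyze("The Caf\u00e9s of Paris, and the Geese!"));
        // prints [cafes, paris, geese]
    }
}
```

In Lucene the same idea is expressed as one tokenizer followed by a chain of token filters, and the order of the filters matters in the same way it does here.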

See this article: https://www.albinsblog.com/2020/05/how-to-enable-case-insensitive-search-in-aem-lucene.html

Another one: https://aemcorner.com/search-in-aem/ 

I hope this helps.

Kautuk Sahni
Level 4
September 11, 2023

@kautuk_sahni, thank you, Kautuk. I have already gone through these articles. They explain how to use the out-of-the-box tokenizers and filters to meet a given requirement. However, I am using AEM 6.5.10 with Lucene 4.7.1, which has no out-of-the-box filter for lemmatization. I had to write a custom one, but I did not know how to register it with Lucene so that it could be used in an analyzer composition on Lucene indexes.
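
For readers wondering what plural lemmatization involves, here is a rough sketch in plain Java. It is not the poster's filter and has no Lucene dependency; the class name, the exception dictionary, and the suffix rules are hypothetical and deliberately naive. In a custom Lucene `TokenFilter`, logic like this would typically be applied to each term inside `incrementToken()`; that wiring is omitted here.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class PluralLemmatizerSketch {

    // Hypothetical exception dictionary; a real one would be far larger.
    private static final Map<String, String> IRREGULAR = new HashMap<>();
    static {
        IRREGULAR.put("children", "child");
        IRREGULAR.put("mice", "mouse");
        IRREGULAR.put("geese", "goose");
        IRREGULAR.put("men", "man");
        IRREGULAR.put("women", "woman");
    }

    /** Maps a plural token to a singular form using a dictionary first, then naive suffix rules. */
    static String lemmatize(String token) {
        String word = token.toLowerCase(Locale.ROOT);

        // Dictionary lookup handles irregular plurals that no suffix rule can cover.
        String irregular = IRREGULAR.get(word);
        if (irregular != null) {
            return irregular;
        }
        // "policies" -> "policy"
        if (word.length() > 4 && word.endsWith("ies")) {
            return word.substring(0, word.length() - 3) + "y";
        }
        // "boxes" -> "box", "classes" -> "class", "branches" -> "branch"
        if (word.endsWith("ches") || word.endsWith("shes") || word.endsWith("sses")
                || word.endsWith("xes") || word.endsWith("zes")) {
            return word.substring(0, word.length() - 2);
        }
        // "assets" -> "asset", while leaving words such as "class", "status", "analysis" alone
        if (word.length() > 3 && word.endsWith("s")
                && !word.endsWith("ss") && !word.endsWith("us") && !word.endsWith("is")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"policies", "boxes", "assets", "children", "status"}) {
            System.out.println(w + " -> " + lemmatize(w));
        }
    }
}
```

The dictionary lookup is what separates this from pure suffix stripping: a stemmer working only on suffixes has no way to relate "children" to "child". The suffix rules will still get some words wrong, so treat them as a starting point only.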

Eventually I managed to register it and got it working in the analyzer composition on my custom Oak index. The problem now appears when I restart AEM: the JCR repository does not start properly because of exceptions. I suspect this happens because indexing begins during startup before the custom filter, which is part of my project's OSGi bundle, has been registered, so Lucene cannot find it. Is there a way to make the custom Lucene filter available to Lucene before indexing starts during AEM startup? Any help in getting this working, or in identifying an alternative solution, would be appreciated.