
SOLVED

Update Lucene analyzers in CQ5.5


Level 2

Hello

I'm implementing a search feature in CQ5.5. One requirement is that the search use stemming: a search for 'builder' should also find 'building', 'builds', etc.
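To make the requirement concrete, here is a toy suffix-stripping stemmer in plain Java. It is an illustration only, not the Porter algorithm and not any Lucene class: it strips a few hard-coded suffixes (the suffix list and the minimum stem length are arbitrary assumptions) so that several forms of a word collapse to one index term. A real stemmer, such as Lucene's Porter-based filter, uses many more rules and may treat some of these words differently.

```java
import java.util.Locale;

/**
 * Toy suffix-stripping stemmer, for illustration only.
 * NOT the Porter algorithm: it strips a handful of hard-coded suffixes
 * so that related word forms map to the same index term.
 */
public class ToyStemmer {

    // Checked in order; longer suffixes come first so "ers" wins over "s".
    private static final String[] SUFFIXES = {"ings", "ers", "ing", "er", "s"};

    // Arbitrary assumption: never cut a word below 3 characters.
    private static final int MIN_STEM_LENGTH = 3;

    public static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String suffix : SUFFIXES) {
            if (w.endsWith(suffix) && w.length() - suffix.length() >= MIN_STEM_LENGTH) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w; // no suffix matched: the lower-cased word is its own stem
    }

    public static void main(String[] args) {
        for (String w : new String[] {"builder", "building", "builds", "build"}) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

Running main prints "build" as the stem of all four words, which is the behaviour the search index needs: if both the indexed text and the query are stemmed the same way, a query for one form matches the others.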

This functionality is available in, for example, org.apache.lucene.analysis.en.EnglishAnalyzer. However, the Lucene that comes with CQ5.5 is version 3.0.3, and its available analyzers are:

$  find . -name "lucene-core*.jar" | xargs jar tf | grep "analysis/.*Analyzer"
org/apache/lucene/analysis/Analyzer.class
org/apache/lucene/analysis/KeywordAnalyzer.class
org/apache/lucene/analysis/PerFieldAnalyzerWrapper.class
org/apache/lucene/analysis/SimpleAnalyzer.class
org/apache/lucene/analysis/StopAnalyzer$1.class
org/apache/lucene/analysis/StopAnalyzer$SavedStreams.class
org/apache/lucene/analysis/StopAnalyzer.class
org/apache/lucene/analysis/WhitespaceAnalyzer.class
org/apache/lucene/analysis/standard/StandardAnalyzer$1.class
org/apache/lucene/analysis/standard/StandardAnalyzer$SavedStreams.class
org/apache/lucene/analysis/standard/StandardAnalyzer.class

None of these support stemming. The earliest version where EnglishAnalyzer is available is 3.2. Is there a way to update Lucene on an existing installation?

Alternatively, in later versions of Lucene the analyzers live in a jar of their own, analyzers-common (http://lucene.apache.org/core/4_0_0/analyzers-common/overview-summary.html).

How would I expose these analyzers to the org.apache.jackrabbit.core.query.lucene.SearchIndex, the class that reads the SearchIndex tag in workspace.xml?

 

Thanks,

Eli


4 Replies


Level 10

You can create an OSGi fragment bundle containing updated Java classes that provide the functionality you want to use. One of the most powerful aspects of AEM is that if it lacks out-of-the-box functionality you need, you can create your own OSGi bundles with the Java code you need and write custom front-end components that call the back-end service.
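For reference, the headers that turn a bundle into a fragment can be sketched with java.util.jar.Manifest. In a real project a build tool such as the Maven bundle plugin would normally generate MANIFEST.MF; this sketch only shows which OSGi headers matter. The symbolic name and version used here are hypothetical; the host com.day.crx.sling.server is the embedded repository bundle named in the accepted answer. Because an attached fragment shares its host's class loader, classes packaged this way become visible to the host bundle's code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

/** Sketch: builds the main section of a fragment bundle's MANIFEST.MF. */
public class FragmentManifest {

    public static Manifest build(String symbolicName, String version, String host) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Manifest-Version must be present, otherwise Manifest.write() emits no main attributes.
        main.putValue("Manifest-Version", "1.0");
        main.putValue("Bundle-ManifestVersion", "2");
        main.putValue("Bundle-SymbolicName", symbolicName);
        main.putValue("Bundle-Version", version);
        // This header makes the bundle a fragment of the given host bundle.
        main.putValue("Fragment-Host", host);
        return manifest;
    }

    /** Serializes the manifest exactly as it would appear in META-INF/MANIFEST.MF. */
    public static String toText(Manifest manifest) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            manifest.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical symbolic name and version; the host is the repository bundle.
        Manifest m = build("com.example.search.analyzer.fragment", "1.0.0", "com.day.crx.sling.server");
        System.out.print(toText(m));
    }
}
```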


Level 2

Thanks for the reply! So suppose I wrap the lucene-core jar with OSGi metadata. This fragment would then, presumably, export org.apache.lucene.analysis.*. Will org.apache.jackrabbit.core.query.lucene.SearchIndex become aware of the new analyzers?


Correct answer by
Level 10
You can write your own EnglishAnalyzer and then provide it through a fragment bundle for the embedded repository bundle (com.day.crx.sling.server). See the sample for a Lucene excerpt provider at [1]; you can implement this along similar lines.

[1]  http://aemfaq.blogspot.com/2013/09/how-to-override-lucene-excerpt-provider.html
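A minimal sketch of such an analyzer, assuming the Lucene 3.0.x API that ships with CQ 5.5 (lucene-core 3.0.3). As far as I know, lucene-core 3.0.x already contains org.apache.lucene.analysis.PorterStemFilter, so a stemming analyzer can be assembled from core classes without upgrading Lucene. The class name EnglishStemAnalyzer and the field name "content" are hypothetical; in practice the class would live in a package of your own inside the fragment bundle. It would then be referenced from the SearchIndex element of workspace.xml through its analyzer parameter, e.g. <param name="analyzer" value="your.package.EnglishStemAnalyzer"/> (package name is a placeholder). Because existing index data was built with the previous analyzer, a re-index would presumably be needed afterwards.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.PorterStemFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.Version;

/**
 * Hypothetical stemming analyzer written against the Lucene 3.0.x API.
 * Chain: StandardTokenizer -> StandardFilter -> LowerCaseFilter -> PorterStemFilter.
 */
public class EnglishStemAnalyzer extends Analyzer {

    // Assumption: the repository instantiates the configured analyzer reflectively,
    // so a public no-arg constructor is kept.
    public EnglishStemAnalyzer() {
    }

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream stream = new StandardTokenizer(Version.LUCENE_30, reader);
        stream = new StandardFilter(stream);
        // PorterStemFilter expects lower-case input, so lower-case first.
        stream = new LowerCaseFilter(stream);
        return new PorterStemFilter(stream);
    }

    /** Runs text through an analyzer and collects the resulting terms. */
    public static List<String> terms(Analyzer analyzer, String text) {
        List<String> out = new ArrayList<String>();
        try {
            TokenStream ts = analyzer.tokenStream("content", new StringReader(text));
            TermAttribute term = ts.addAttribute(TermAttribute.class);
            try {
                ts.reset();
                while (ts.incrementToken()) {
                    out.add(term.term());
                }
                ts.end();
            } finally {
                ts.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(terms(new EnglishStemAnalyzer(), "Building builds"));
    }
}
```

The Porter rules are conservative, so not every pair in the original requirement (for example "builder" and "build") is guaranteed to collapse to the same term; it is worth checking the output against your own vocabulary. A StopFilter could be added to the chain, as StandardAnalyzer does, if stop-word removal is also wanted.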


Level 1

Hi,

I am trying to add an analyzer from lucene-analyzer-2.4.1.jar to the SearchIndex tag of workspace.xml. As this jar is not available in out-of-the-box CQ 5.4, I can add it as a dependency of my custom OSGi bundle.

But will CQ be able to find this jar while indexing?
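Whether CQ "finds" a jar comes down to which class loader resolves the analyzer's class name. This plain-Java sketch (no CQ or Jackrabbit classes involved) shows that the same class name resolves through the loader that has it and fails through one that does not, here the bootstrap loader, passed as null. It assumes, without having checked the CQ source, that the repository turns the configured analyzer name into an instance reflectively from its own bundle, roughly like instantiate(); the class and method names are hypothetical. Under that assumption, a jar embedded in your own bundle is visible to your bundle only, while a fragment attached to com.day.crx.sling.server joins the repository's class loader.

```java
/**
 * Sketch: "configured class name -> instance", and why the resolving
 * class loader matters. Names are hypothetical.
 */
public class ClassVisibility {

    /** Loads className through the given loader and instantiates it as expectedType. */
    public static <T> T instantiate(String className, Class<T> expectedType, ClassLoader loader)
            throws ReflectiveOperationException {
        // Resolution depends entirely on the loader asked; null means the bootstrap loader.
        Class<?> cls = Class.forName(className, true, loader);
        if (!expectedType.isAssignableFrom(cls)) {
            throw new IllegalArgumentException(className + " is not a " + expectedType.getName());
        }
        // getConstructor() only finds a public no-arg constructor.
        return expectedType.cast(cls.getConstructor().newInstance());
    }

    /** Returns true if the loader can resolve the class name, without initializing the class. */
    public static boolean isVisible(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        ClassLoader app = ClassVisibility.class.getClassLoader();
        String self = ClassVisibility.class.getName();
        // Same class name, two loaders: visible to the loader that has it, not to the bootstrap loader.
        System.out.println(self + " via application loader: " + isVisible(self, app));
        System.out.println(self + " via bootstrap loader: " + isVisible(self, null));
        // Stand-in for an analyzer: a JDK class with a public no-arg constructor.
        CharSequence cs = instantiate("java.lang.StringBuilder", CharSequence.class, app);
        System.out.println("instantiated " + cs.getClass().getName());
    }
}
```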