AEM 6.2 - Oak Lucene Indexes - Configuring Composing Analyzer

maximen66097493
Level 1

26-07-2017

Hello,

I'm looking for a way to configure a composing analyzer so that the analyzer used at index time applies different filters from the one used at query time.

In the Oak Lucene documentation (Jackrabbit Oak – Lucene Index) I see that we can configure the default or pathText analyzers, and maybe others given the "...", but I can't find documentation with an exhaustive list of the analyzers we can use.

Oak Lucene documentation:

+ sampleIndex
  - jcr:primaryType = "oak:QueryIndexDefinition"
  + analyzers
    + default
    + pathText
    ...

They also seem to differ from the ones in the Apache Solr documentation (Analyzers - Apache Solr Reference Guide - Apache Software Foundation), which distinguishes index and query analyzers:

<analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeepWordFilterFactory" words="keepwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="syns.txt"/>
</analyzer>

<analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
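
To make the index/query split in the Solr snippet concrete, here is a rough plain-Java sketch of what the two chains do to text. It uses no Solr or Lucene classes: the class name, the whitespace tokenizer, and the keep-word and synonym data are all assumptions made up for illustration, not the contents of keepwords.txt or syns.txt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: mimics the two chains with plain collections.
class IndexVsQueryAnalysis {

    // Whitespace split stands in for a real tokenizer.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Lower-case every token (the role of a lower-case filter).
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Drop every token that is not in the keep list (the role of a keep-word filter).
    static List<String> keepWords(List<String> tokens, Set<String> keep) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (keep.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    // Emit each token followed by its synonyms, if any (the role of a synonym filter).
    static List<String> expandSynonyms(List<String> tokens, Map<String, List<String>> synonyms) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t);
            out.addAll(synonyms.getOrDefault(t, new ArrayList<String>()));
        }
        return out;
    }

    // Index-time chain: tokenize, lower-case, keep words, expand synonyms.
    static List<String> analyzeForIndex(String text, Set<String> keep, Map<String, List<String>> synonyms) {
        return expandSynonyms(keepWords(lowerCase(tokenize(text)), keep), synonyms);
    }

    // Query-time chain: tokenize and lower-case only.
    static List<String> analyzeForQuery(String text) {
        return lowerCase(tokenize(text));
    }

    public static void main(String[] args) {
        Set<String> keep = new HashSet<>(Arrays.asList("quick", "fox"));
        Map<String, List<String>> synonyms = new HashMap<>();
        synonyms.put("quick", Arrays.asList("fast"));

        System.out.println(analyzeForIndex("The Quick brown Fox", keep, synonyms)); // [quick, fast, fox]
        System.out.println(analyzeForQuery("Fast"));                                // [fast]
    }
}
```

In this sketch a query for "Fast" matches the indexed text only because the synonym was added at index time, which is the kind of asymmetry I would like to reproduce in Oak.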

Does anyone know where I can find documentation listing the configurable analyzers for an Oak Lucene index, or how I can configure different analyzers for query and for index?

Best regards,

Maxime Nougarede

DigitasLbi

Accepted Solutions (0)

Answers (4)

ThomasMueller1
Level 1

12-02-2020

We found that this is incorrect, and will remove "pathText" from the documentation. Sorry for the delay!

PuzanovsP
MVP

27-07-2017

Dear Maxim,

Thank you for asking such an interesting question.

Looking at:

org/apache/jackrabbit/oak/plugins/index/lucene/IndexDefinition.java

we can see that the analyzers are assembled in:

private static Map<String, Analyzer> collectAnalyzers(NodeState defn) {
    Map<String, Analyzer> analyzerMap = newHashMap();
    NodeStateAnalyzerFactory factory = new NodeStateAnalyzerFactory(LuceneIndexConstants.VERSION);
    NodeState analyzersTree = defn.getChildNode(LuceneIndexConstants.ANALYZERS);
    for (ChildNodeEntry cne : analyzersTree.getChildNodeEntries()) {
        Analyzer a = factory.createInstance(cne.getNodeState());
        analyzerMap.put(cne.getName(), a);
    }

    if (getOptionalValue(analyzersTree, INDEX_ORIGINAL_TERM, false) && !analyzerMap.containsKey(ANL_DEFAULT)) {
        analyzerMap.put(ANL_DEFAULT, new OakAnalyzer(VERSION, true));
    }

    return ImmutableMap.copyOf(analyzerMap);
}

The resulting map is assigned to a field while the IndexDefinition is built:

this.analyzers = collectAnalyzers(defn);

That field is then read in the following place:

if (analyzers.containsKey(LuceneIndexConstants.ANL_DEFAULT)) {
    defaultAnalyzer = analyzers.get(LuceneIndexConstants.ANL_DEFAULT);
}

So, to answer your question: you can define as many analyzers as you want, but in the current code base only the 'default' analyzer is used.
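
A minimal sketch of that lookup, in plain Java with no Oak or Lucene dependencies, may make the consequence clearer. Everything here is a stand-in: the class name is hypothetical, analyzers are represented by strings, the key "default" is assumed to be the value of ANL_DEFAULT, and the fallback parameter stands for whatever analyzer is used when no default node exists.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the lookup described above; this is not Oak code.
class DefaultAnalyzerLookup {

    // Assumed to match the node name behind LuceneIndexConstants.ANL_DEFAULT.
    static final String ANL_DEFAULT = "default";

    // Returns the entry named "default" if it exists, otherwise the fallback.
    // Entries under any other name are never consulted.
    static String resolve(Map<String, String> analyzers, String fallback) {
        return analyzers.containsKey(ANL_DEFAULT) ? analyzers.get(ANL_DEFAULT) : fallback;
    }

    public static void main(String[] args) {
        Map<String, String> configured = new LinkedHashMap<>();
        configured.put("default", "custom default chain");
        configured.put("pathText", "custom pathText chain");

        // The pathText entry is collected but has no effect on the result.
        System.out.println(resolve(configured, "built-in analyzer")); // custom default chain

        configured.remove("default");
        System.out.println(resolve(configured, "built-in analyzer")); // built-in analyzer
    }
}
```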


Regards,

Peter

maximen66097493
Level 1

27-07-2017

Hi,

Thanks a lot for your help. Meanwhile, I found part of my answer in the Jackrabbit Oak – Lucene Index documentation, which says:

@Note that currently only one analyzer can be configured per index. Its not possible to specify separate analyzer for query and index time currently.

https://jackrabbit.apache.org/oak/docs/query/lucene.html

But I still want to understand the difference between default and pathText.

smacdonald2008
Level 10

26-07-2017

I have asked within Adobe if anyone knows if such docs exist.