Is it possible to select a lucene index to perform queries?

Level 2

Hello,

I'm currently working with AEM 6.2 and I would like to know if I can select (deliberately) a lucene index to perform different queries. For example, imagine 2 different teams working on these paths:

  • TeamA works on /content/dam/stuff and defined an index named TAIndex
  • TeamB works on /content/dam/stuff/tbStuff and defined its own index named TBIndex

Depending on the index cost, path restrictions, filters... TeamA could be performing queries against TBIndex and thus getting an invalid response. So, I would like to make TeamA always work with TAIndex and TeamB work with TBIndex.

Different options come to mind:

  • Add two new fields to every asset (e.g. belongsTA and belongsTB). Thus all the documents in TAIndex will include belongsTA and every TeamA query will include the clause belongsTA:true (the same for TeamB). I don't like it much, as it implies new fields that need to be managed, a bigger index, and an extra filter.
  • Use lucene native queries (I wasn't able to make them work in AEM :( ).

Any help is appreciated.

Regards.

8 Replies

Level 10

I recommend watching this GEMS session on Indexing - it will help you: 

https://docs.adobe.com/ddc/en/gems/oak-lucene-indexes.html

Typically, indexes are used to improve query performance, as discussed in the GEMS session.

Level 2

Hello,

Thanks for the answer. Sure, that document is my bedtime book :) and according to it: "1. Q: For precise index selection is it advised always use native query? * A: if you can't manage to have the query engine select a specific index based on cost tweaks (recommended way), native query is your only alternative". That statement was made more than a year ago, so I was expecting a better way to solve the index collision. Anyway, I will keep working on making native queries work.

Regards.

Level 10

That GEMS session is still the best material we have for Indexing. The Apache Jackrabbit Oak docs are also good:

http://jackrabbit.apache.org/oak/docs/query/lucene.html

Level 9

Hi Alvaro Cabrerizo,

The parent node is common to both teams, and I wouldn't recommend duplicating the index or adding the property. Instead, create a single index and apply a path restriction behind the scenes, based on the team the user belongs to.

Thanks,

Level 2

Many thanks for the answer.

The main drawback of that solution is that TeamA and TeamB don't need the same index fields. Moreover, TeamB's index definition contains a list of excluded paths. Merging both indexes into one penalizes TeamB as:

  • The reindex process takes longer.
  • Index management is more difficult, as any modification needs to be agreed on by both teams.
  • The index size grows penalizing both teams.
  • Not sure if within AEM we can have 2 lucene fields coming from the same metadata field. For example, if both teams need to index the metadata field myTitle but need different processing pipelines (e.g. lowercase, stopwords, stemming...)

Regards.

Level 9

Hi Alvaro Cabrerizo,

You might end up needing to use a native query.

Thanks,

Level 2

I agree with you, MC Stuff.

We finally made native queries work. It took us longer than expected because we had misunderstood how they work. Currently the index cost is calculated with the following formula: cost = costPerExecution + (costPerEntry * entryCount). Thus native queries (e.g. /jcr:root/content/dam//element(*,dam:Asset)[rep:native('myIndex', ...)]) don't force the selection of myIndex; they tweak the cost formula by setting entryCount to 1 (i.e. cost('myIndex') = costPerExecution + (costPerEntry * 1)).
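The formula above can be sketched as a small function. This is only a simplified model of the behaviour described in this post, not Oak's actual planner code; the function name `index_cost`, its parameters, and the sample values are hypothetical and chosen for illustration.

```python
def index_cost(cost_per_execution, cost_per_entry, entry_count, native=False):
    """Simplified model: cost = costPerExecution + costPerEntry * entryCount.

    As described in the post, a native query aimed at an index is modelled
    as forcing entryCount to 1 for that index (an assumption of this sketch).
    """
    effective_entries = 1 if native else entry_count
    return cost_per_execution + cost_per_entry * effective_entries


# Illustrative values only (assumed, not read from a real index definition).
regular = index_cost(1, 1, 5000)                # ordinary query
targeted = index_cost(1, 1, 5000, native=True)  # query with the native clause
print(regular, targeted)
```

In this model the native clause does not pin the index; it only makes that index look cheap, so another index with an even lower cost can still win.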

The proposed solution was to have TeamB modify its index to include the property entryCount=VeryBigNumberHere. Thus every query from TeamA will discard TBIndex because its cost is too high (1 + (1 * VeryBigNumberHere)), while TeamB queries, which include the native clause, will use TBIndex because its cost is 2 (assuming costPerExecution = 1 and costPerEntry = 1).

Moreover, there are corner cases if TeamA modifies its index, for example by setting costPerEntry = 0. In that situation, the cost of a TeamB native query is 2 for TBIndex but 1 for TAIndex (1 = 1 + (0 * entryCount)), so the wrong index is selected.
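The corner case can be reproduced with a few lines of arithmetic. This is a sketch under the same simplified cost model used in this post; the `cost` helper, the entry count for TAIndex, and the "lowest cost wins" selection are assumptions made for illustration, not Oak's real selection logic.

```python
def cost(cost_per_execution, cost_per_entry, entry_count):
    # Simplified model from the post: costPerExecution + costPerEntry * entryCount
    return cost_per_execution + cost_per_entry * entry_count


# A TeamB native query: entryCount is treated as 1 for TBIndex.
tb_index_cost = cost(1, 1, 1)

# TeamA has set costPerEntry = 0 on TAIndex, so its entry count
# (5000 here, an arbitrary assumed value) no longer matters.
ta_index_cost = cost(1, 0, 5000)

costs = {"TBIndex": tb_index_cost, "TAIndex": ta_index_cost}
chosen = min(costs, key=costs.get)  # assume the cheapest index is selected
print(costs, chosen)
```

Under these assumptions the TeamB query ends up on TAIndex, which is the wrong-index selection the post describes.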

Regards.

Level 2

Hello,

I've verified that setting entryCount=VeryBigNumberHere does not work, because the indexPlanner uses the smaller of entryCount and the number of documents in the index.
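A minimal sketch of why the inflated value has no effect, based only on the observation above (not verified here against the Oak source). The function names and the document count are hypothetical.

```python
def effective_entry_count(declared_entry_count, docs_in_index):
    # Observation from the post: the planner takes the smaller of the two values.
    return min(declared_entry_count, docs_in_index)


def planned_cost(cost_per_execution, cost_per_entry, declared_entry_count, docs_in_index):
    entries = effective_entry_count(declared_entry_count, docs_in_index)
    return cost_per_execution + cost_per_entry * entries


VERY_BIG_NUMBER = 10**9  # stand-in for VeryBigNumberHere
DOCS_IN_TBINDEX = 5000   # assumed size of TBIndex, for illustration only

inflated = planned_cost(1, 1, VERY_BIG_NUMBER, DOCS_IN_TBINDEX)
honest = planned_cost(1, 1, DOCS_IN_TBINDEX, DOCS_IN_TBINDEX)
print(inflated, honest)
```

Both calls yield the same cost in this model, so declaring a huge entryCount does not push TeamA queries away from TBIndex.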

Regards.