Is it possible to select a lucene index to perform queries? | Community
Level 2
May 4, 2017
Question

Is it possible to select a lucene index to perform queries?


Hello,

I'm currently working with AEM 6.2 and would like to know whether I can deliberately select a Lucene index for different queries. For example, imagine two teams working on these paths:

  • TeamA works on /content/dam/stuff and defined an index named TAIndex
  • TeamB works on /content/dam/stuff/tbStuff and defined its own index named TBIndex

Depending on index cost, path restrictions, filters and so on, a TeamA query could end up being served by TBIndex and thus return an invalid response. So I would like TeamA to always work with TAIndex and TeamB to always work with TBIndex.

Different options come to mind:

  • Add two new fields to every asset (e.g. belongsTA and belongsTB). All the documents in TAIndex would then include belongsTA, and every TeamA query would include the clause belongsTA:true (likewise for TeamB). I don't like this much, as it means new fields to manage, a bigger index and an extra filter.
  • Use Lucene native queries (I wasn't able to make them work in AEM :( ).

Any help is appreciated.

Regards.


8 replies

smacdonald2008
Level 10
May 5, 2017

I recommend watching this GEMS session on Indexing - it will help you: 

https://docs.adobe.com/ddc/en/gems/oak-lucene-indexes.html

Typically, indexes are used to improve query performance, as discussed in the GEMS session.

Alvaro_C_Author
Level 2
May 5, 2017

Hello,

Thanks for the answer. Sure, that document is my bedtime book :) and according to it: "1. Q: For precise index selection is it advised always use native query? * A: if you can't manage to have the query engine select a specific index based on cost tweaks (recommended way), native query is your only alternative". That statement is more than a year old, so I was expecting a better way to solve the index collision. Anyway, I will keep working on making native queries work.

Regards.

smacdonald2008
Level 10
May 5, 2017

That GEMS session is still the best material we have on indexing. The Apache Jackrabbit Oak docs are also good:

http://jackrabbit.apache.org/oak/docs/query/lucene.html

MC_Stuff
Level 10
May 5, 2017

Hi Alvaro Cabrerizo,

The parent node is common to both teams, and I wouldn't recommend duplicating the index or adding the property. Instead, create a single index and apply a path restriction behind the scenes, based on the team the user belongs to.

Thanks,

Alvaro_C_Author
Level 2
May 5, 2017

Many thanks for the answer.

The main drawback of that solution is that TeamA and TeamB don't need the same index fields. Moreover, the TeamB index definition contains a list of excluded paths. Merging both indexes into one penalizes TeamB because:

  • The reindex process takes longer.
  • Index management is more difficult, as any modification needs to be agreed on by both teams.
  • The index size grows, penalizing both teams.
  • I'm not sure whether AEM allows two Lucene fields derived from the same metadata field, for example if both teams need to index the metadata field myTitle but need different processing pipelines (e.g. lowercase, stopwords, stemming...).

Regards.

MC_Stuff
Level 10
May 6, 2017

Hi Alvaro Cabrerizo,

You might end up needing to use a native query.

Thanks,

Alvaro_C_Author
Level 2
May 6, 2017

I agree with you, MC Stuff.

We finally made native queries work. It took us longer than expected because we had misunderstood how they work. Currently the index cost is calculated with the following formula: cost = costPerExecution + (costPerEntry * entryCount). Thus a native query (e.g. /jcr:root/content/dam//element(*,dam:Asset)[rep:native('myIndex',...)]) doesn't force the selection of myIndex; it tweaks the cost formula by setting entryCount to 1 (i.e. cost('myIndex') = costPerExecution + (costPerEntry * 1)).

The proposed solution was for TeamB to modify its index to include the property entryCount=VeryBigNumberHere. Every query from TeamA will then discard TBIndex because its cost is too high (1 + (1 * VeryBigNumberHere)), while TeamB queries, which include the native clause, will use TBIndex because its cost is 2 (1 + (1 * 1)).

Moreover, there are corner cases if TeamA modifies its index, for example by setting costPerEntry = 0. In that situation, the cost of a TeamB native query is 2 for TBIndex but 1 for TAIndex (1 = 1 + (0 * entryCount)), so the wrong index is selected.
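
To make the arithmetic easier to follow, here is a minimal Java sketch of the formula as I described it. It is not Oak's planner code: the class name, the method name and the sample numbers (a stand-in for VeryBigNumberHere and an entry count of 5000 for TAIndex) are assumptions for illustration only.

```java
// Sketch of the cost formula quoted in this thread; NOT the actual Oak planner.
final class IndexCostSketch {

    // cost = costPerExecution + (costPerEntry * entryCount)
    static double cost(double costPerExecution, double costPerEntry, long entryCount) {
        return costPerExecution + costPerEntry * entryCount;
    }

    public static void main(String[] args) {
        long veryBigNumber = 1_000_000_000L; // hypothetical stand-in for VeryBigNumberHere
        long taEntryCount = 5000L;           // hypothetical entry count for TAIndex

        // TeamA query without a native clause: TBIndex looks very expensive.
        System.out.println("TBIndex, plain query:  " + cost(1, 1, veryBigNumber));

        // TeamB native query: entryCount is treated as 1 for TBIndex.
        System.out.println("TBIndex, native query: " + cost(1, 1, 1));

        // Corner case: TeamA sets costPerEntry = 0, so TAIndex undercuts TBIndex
        // regardless of its entry count.
        System.out.println("TAIndex, costPerEntry=0: " + cost(1, 0, taEntryCount));
    }
}
```

With costPerEntry = 0 the entry count drops out of the formula entirely, which is why the TAIndex cost stays below the native-query cost of TBIndex no matter how many entries it holds.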

Regards.

Alvaro_C_Author
Level 2
May 7, 2017

Hello,

I've verified that setting entryCount=VeryBigNumberHere does not work, because the index planner uses the smaller of entryCount and the number of documents in the index.
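
A small Java sketch of why the override has no effect, assuming the planner simply takes the minimum of the two values as observed. The class and method names and the document count of 2000 are hypothetical; this only illustrates the behaviour and is not the real planner implementation.

```java
// Sketch of the observed capping behaviour; NOT the actual Oak planner.
final class EntryCountCapSketch {

    // The planner appears to use the smaller of the configured entryCount
    // and the number of documents actually in the index.
    static long effectiveEntryCount(long configuredEntryCount, long numDocs) {
        return Math.min(configuredEntryCount, numDocs);
    }

    // Same formula as in my earlier reply, applied to the capped entry count.
    static double cost(double costPerExecution, double costPerEntry,
                       long configuredEntryCount, long numDocs) {
        return costPerExecution
                + costPerEntry * effectiveEntryCount(configuredEntryCount, numDocs);
    }

    public static void main(String[] args) {
        long veryBigNumber = 1_000_000_000L; // hypothetical stand-in for VeryBigNumberHere
        long numDocs = 2000L;                // hypothetical number of documents in TBIndex

        // The configured value is ignored because numDocs is smaller.
        System.out.println("effective entryCount: "
                + effectiveEntryCount(veryBigNumber, numDocs));
        System.out.println("TBIndex cost: " + cost(1, 1, veryBigNumber, numDocs));
    }
}
```

So the cost of TBIndex is bounded by its real size, and inflating entryCount cannot push it out of reach for TeamA queries.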

Regards.