SOLVED

how to configure stop words in searchindex to reduce index size

Former Community Member

Hi,
We have around half a million assets in our repository, and as a result the index size is huge. I have tried various techniques to reduce the index size, such as purging workflow instances, audit entries, etc.
I have also tried almost everything in this doc, following every step: http://www.wemblog.com/2012/01/how-to-reduce-lucene-index-size-in-cq.html

The only thing I am missing is configuring a list of stop words, as it appears that by default CQ does full-text indexing of every word, which causes the large index size.
While digging deeper I found various stop word lists under the lucene-analyzers-3.6.0\org\apache\lucene\analysis structure.

However, I am not sure whether they are used. How can we configure the stop words setting for the search index in CQ 5.5/CQ 5.6?

NB: I am not using SOLR.

Thanks
Shishir Srivastava
1 Accepted Solution

Correct answer by
Level 10

AFAIK CQ 5.5 and above use com.day.crx.query.lucene.LuceneHandler for the SearchIndex by default, which does not have stop word filtering. Check your repository.xml and workspace.xml to see whether Lucene is configured with any custom handler other than the OOB one. If so, check your implementation. If you are using the OOB configuration, what made you conclude that the index space is being used because of the stop words setting?
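
To make the check above concrete, here is a minimal sketch that reads a workspace.xml-style document and reports which handler class the SearchIndex element names, plus its params. The XML fragment, the workspace name and the function name describe_search_index are hypothetical and only for illustration; a real repository.xml or workspace.xml contains more elements, so treat this as a starting point rather than a definitive tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment for illustration only; a real workspace.xml has more elements.
SAMPLE_WORKSPACE_XML = """
<Workspace name="crx.default">
  <SearchIndex class="com.day.crx.query.lucene.LuceneHandler">
    <param name="path" value="${wsp.home}/index"/>
    <param name="supportHighlighting" value="true"/>
  </SearchIndex>
</Workspace>
"""


def describe_search_index(xml_text):
    """Return (handler class, {param name: value}) for the first SearchIndex element.

    Returns (None, {}) when the document has no SearchIndex element.
    """
    root = ET.fromstring(xml_text.strip())
    node = root if root.tag == "SearchIndex" else root.find(".//SearchIndex")
    if node is None:
        return None, {}
    params = {p.get("name"): p.get("value") for p in node.findall("param")}
    return node.get("class"), params


handler, params = describe_search_index(SAMPLE_WORKSPACE_XML)
print("handler class:", handler)
for name, value in params.items():
    print(name, "=", value)
```

If the reported class is anything other than the OOB handler named above, that custom implementation is the place to look for analyzer or stop word behaviour.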

2 Replies

Level 5

Hello Shishir,

I am not sure whether you checked the Tika config file before using it. There was a syntax error in it. I have just fixed it and attached it again. Also make sure that you disable the supportHighlight feature. I have also attached an indexing_config file (with some more node types included) that you can use.
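
As a rough sketch of the highlighting change, assuming the feature is controlled by the Jackrabbit-style `supportHighlighting` param on the SearchIndex element: the snippet below sets that param to false in an XML string, adding it if it is missing. The sample fragment and the function name disable_highlighting are hypothetical; back up the real file before editing it.

```python
import xml.etree.ElementTree as ET

# Hypothetical SearchIndex fragment for illustration only.
SAMPLE_SEARCH_INDEX = """
<SearchIndex class="com.day.crx.query.lucene.LuceneHandler">
  <param name="path" value="${wsp.home}/index"/>
  <param name="supportHighlighting" value="true"/>
</SearchIndex>
"""


def disable_highlighting(xml_text):
    """Return the XML with the SearchIndex supportHighlighting param set to false."""
    root = ET.fromstring(xml_text.strip())
    node = root if root.tag == "SearchIndex" else root.find(".//SearchIndex")
    if node is None:
        raise ValueError("no SearchIndex element found")
    for param in node.findall("param"):
        if param.get("name") == "supportHighlighting":
            param.set("value", "false")
            break
    else:
        # Param was absent, so add it explicitly.
        ET.SubElement(node, "param", {"name": "supportHighlighting", "value": "false"})
    return ET.tostring(root, encoding="unicode")


updated = disable_highlighting(SAMPLE_SEARCH_INDEX)
print(updated)
```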

Yogesh