
SOLVED

How to configure stop words in the SearchIndex to reduce index size

Level 1

Hi,
We have around half a million assets in our repository, and because of this the index is huge. I have tried various techniques to reduce the index size, such as purging workflow instances, audit logs, etc.
I have also tried almost everything, following every step of this doc: http://www.wemblog.com/2012/01/how-to-reduce-lucene-index-size-in-cq.html

The only thing I have not tried is configuring a list of stop words, as it appears that by default CQ full-text indexes every word, which inflates the index size.
While digging deeper, I found various stop word lists under the lucene-analyzers-3.6.0\org\apache\lucene\analysis structure.

However, I am not sure whether they are used. How can we configure stop words for the SearchIndex in CQ 5.5/CQ 5.6?

NB: I am not using SOLR.

Thanks
Shishir Srivastava
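To make the idea behind the question concrete: in Lucene, stop-word removal happens in the analyzer, before tokens reach the index. The sketch below imitates that in plain Java with a crude tokenizer and a tiny, hypothetical stop list. It is NOT the Lucene or CQ API and does not plug into CQ; it only counts how many tokens (position entries) one piece of text would contribute with and without a stop filter. Stop words are few distinct terms, so any real saving would come mostly from positions and frequencies of very common words, not from the term dictionary.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

/**
 * Conceptual sketch, NOT the Lucene or CQ API: a crude tokenizer plus an
 * optional stop-word filter, to show how many tokens a full-text index
 * would record for one piece of text with and without stop words.
 */
public class StopWordFilterDemo {

    // Hypothetical, tiny stop list for illustration only (Lucene ships its own lists).
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "the", "of", "to", "in", "is");

    /** Lower-cases the text and splits it on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    /** Tokens that would reach the index, optionally dropping stop words first. */
    static List<String> indexedTokens(String text, boolean removeStopWords) {
        return tokenize(text).stream()
                .filter(token -> !removeStopWords || !STOP_WORDS.contains(token))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "The size of the index is a function of the terms in the asset metadata.";
        System.out.println("tokens without a stop filter: " + indexedTokens(text, false).size());
        System.out.println("tokens with a stop filter: " + indexedTokens(text, true).size());
    }
}
```

For the sample sentence this prints 15 and 6: the nine dropped tokens are four "the", two "of", and one each of "is", "a" and "in".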
1 Accepted Solution

Correct answer by
Level 10

AFAIK, CQ 5.5 and above use com.day.crx.query.lucene.LuceneHandler for the SearchIndex by default, and it does not do stop-word filtering. Check your repository.xml and workspace.xml to see whether Lucene is configured with a custom handler instead of the OOB one. If so, check your implementation. If you are using the OOB configuration, what led you to conclude that the index space is being used by stop words?
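A minimal sketch of the check suggested above, assuming the Jackrabbit-style layout `<SearchIndex class="..."><param name="..." value="..."/></SearchIndex>`. It parses a repository.xml or workspace.xml with the JDK's DOM parser, prints the class of the first SearchIndex element, flags anything other than com.day.crx.query.lucene.LuceneHandler, and lists its params. The class name SearchIndexInspector is hypothetical, and the thread does not say where these files live on disk. Note that a repository.xml may contain more than one SearchIndex element; this sketch only looks at the first.

```java
import java.io.InputStream;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Sketch: report the SearchIndex handler class and params in a repository.xml / workspace.xml. */
public class SearchIndexInspector {

    // Default handler named in the accepted answer.
    static final String OOB_HANDLER = "com.day.crx.query.lucene.LuceneHandler";

    static Document parse(InputSource source) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            // Do not download the DTD referenced by a DOCTYPE line.
            f.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            return f.newDocumentBuilder().parse(source);
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    static Document parseString(String xml) {
        return parse(new InputSource(new StringReader(xml)));
    }

    /** First SearchIndex element, or null if the document has none. */
    static Element firstSearchIndex(Document doc) {
        NodeList list = doc.getElementsByTagName("SearchIndex");
        return list.getLength() == 0 ? null : (Element) list.item(0);
    }

    /** The class attribute of the first SearchIndex, or null if there is no SearchIndex. */
    static String searchIndexClass(Document doc) {
        Element index = firstSearchIndex(doc);
        return index == null ? null : index.getAttribute("class");
    }

    /** name -> value for every param element under the first SearchIndex. */
    static Map<String, String> searchIndexParams(Document doc) {
        Map<String, String> params = new LinkedHashMap<>();
        Element index = firstSearchIndex(doc);
        if (index == null) return params;
        NodeList list = index.getElementsByTagName("param");
        for (int i = 0; i < list.getLength(); i++) {
            Element p = (Element) list.item(i);
            params.put(p.getAttribute("name"), p.getAttribute("value"));
        }
        return params;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java SearchIndexInspector <repository.xml or workspace.xml>");
            return;
        }
        Document doc;
        // Byte stream input lets the parser honour the file's own encoding declaration.
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            doc = parse(new InputSource(in));
        }
        String cls = searchIndexClass(doc);
        if (cls == null) {
            System.out.println("no <SearchIndex> element in this file");
            return;
        }
        System.out.println("SearchIndex class: " + cls
                + (OOB_HANDLER.equals(cls) ? " (OOB handler)" : " (not the OOB handler: review it)"));
        searchIndexParams(doc).forEach((name, value) -> System.out.println("  " + name + " = " + value));
    }
}
```

Run it once per file; if it reports a class other than the OOB handler, that custom implementation is the place to look for any analyzer or stop-word behaviour.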


2 Replies

Level 5

Hello Shishir,

I am not sure whether you checked the Tika config file before using it; there was a syntax error in it. I have fixed it and attached it again. Also make sure that you disable the supportHighlight feature. I have also attached an indexing_config file (with some more node types included) that you can use.

Yogesh
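A minimal sketch of the "disable highlighting" step, for anyone who prefers to script the config change. It assumes the Jackrabbit SearchIndex parameter spelling supportHighlighting (the reply above says "supportHighlight"; check your CQ version's configuration) and the same `<SearchIndex><param name="..." value="..."/></SearchIndex>` layout. It sets that param to false in the first SearchIndex element, adding the param if it is missing, and re-serializes the XML with the JDK's identity Transformer, carrying over a DOCTYPE system/public id if one is present. The class name DisableHighlighting is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Sketch: force supportHighlighting=false in the first SearchIndex of a workspace.xml-style document. */
public class DisableHighlighting {

    // Assumed Jackrabbit parameter name; verify against your CQ version.
    static final String PARAM = "supportHighlighting";

    static String disable(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            // Do not download the DTD referenced by a DOCTYPE line.
            f.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

            NodeList indexes = doc.getElementsByTagName("SearchIndex");
            if (indexes.getLength() == 0) {
                throw new IllegalArgumentException("no <SearchIndex> element found");
            }
            Element index = (Element) indexes.item(0);

            // Reuse an existing param element if there is one, otherwise append a new one.
            Element target = null;
            NodeList params = index.getElementsByTagName("param");
            for (int i = 0; i < params.getLength() && target == null; i++) {
                Element p = (Element) params.item(i);
                if (PARAM.equals(p.getAttribute("name"))) target = p;
            }
            if (target == null) {
                target = doc.createElement("param");
                target.setAttribute("name", PARAM);
                index.appendChild(target);
            }
            target.setAttribute("value", "false");

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            // The identity transform drops the DOCTYPE unless its ids are set explicitly.
            DocumentType dt = doc.getDoctype();
            if (dt != null && dt.getSystemId() != null) {
                t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, dt.getSystemId());
                if (dt.getPublicId() != null) t.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, dt.getPublicId());
            }
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("could not rewrite XML", e);
        }
    }

    public static void main(String[] args) {
        String sample = "<Workspace name=\"crx.default\">"
                + "<SearchIndex class=\"com.day.crx.query.lucene.LuceneHandler\">"
                + "<param name=\"supportHighlighting\" value=\"true\"/>"
                + "</SearchIndex></Workspace>";
        System.out.println(disable(sample));
    }
}
```

Back up the original file before writing the result over it. The change most likely only affects content indexed afterwards, so a reindex may be needed before the index actually shrinks.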