How to configure stop words in searchindex to reduce index size
October 16, 2015
Solved


Hi,
We have around half a million assets in our repository, and because of this the index size is huge. I have tried various techniques to reduce the index size, such as purging workflow instances, audit data, etc.
I have also tried almost everything by following every step of this doc: http://www.wemblog.com/2012/01/how-to-reduce-lucene-index-size-in-cq.html

The only thing I am missing is configuring a list of stop words, as it appears that by default CQ does full-text indexing of every word, which causes the large index size.
While digging deeper I found various stop word lists under the lucene-analyzers-3.6.0\org\apache\lucene\analysis structure.

However, I am not sure whether they are used or not. How can we configure stop words for the search index in CQ 5.5/CQ 5.6?

NB: I am not using SOLR.

Thanks
Shishir Srivastava
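
Before configuring stop words, it may help to check roughly how much of the indexed text they make up. The following is a minimal sketch only: `SAMPLE_STOP_WORDS` is a small illustrative list, not one of the lists shipped with lucene-analyzers, and the share of tokens is only a loose proxy for index size, not a measurement of it.

```python
import re

# Illustrative stop word list (assumption); NOT the lucene-analyzers list.
SAMPLE_STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "for"}


def stop_word_share(text, stop_words=SAMPLE_STOP_WORDS):
    """Return the fraction of tokens in `text` that are stop words."""
    # Very simple tokenizer: lowercase runs of letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in stop_words)
    return hits / len(tokens)


if __name__ == "__main__":
    # Hypothetical sample text standing in for extracted asset content.
    print(stop_word_share("The size of the index"))
```

Running this over text extracted from a sample of assets would give a rough idea of whether stop word filtering is likely to matter at all.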
2 replies

Sham_HC (Accepted solution)
Level 10
October 16, 2015

AFAIK CQ 5.5 and above uses com.day.crx.query.lucene.LuceneHandler for the SearchIndex by default, which does not have stop word filtering. Check your repository.xml and workspace.xml to see whether Lucene is configured with any custom handler apart from the OOB one. If so, check your implementation. If you are using the OOB configuration, what made you conclude that the index space is being used by stop words?
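
The check described above can be sketched as a small script that lists the handler class of each `SearchIndex` element in a repository.xml or workspace.xml. This is only an illustration: the sample XML is a hypothetical, trimmed-down workspace.xml, and the function names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Handler class named in the reply above as the CQ 5.5+ default.
OOB_HANDLER = "com.day.crx.query.lucene.LuceneHandler"


def search_index_handlers(xml_text):
    """Return the class attribute of every <SearchIndex> element."""
    root = ET.fromstring(xml_text.strip())
    # iter() visits the root element and all of its descendants.
    return [element.get("class") for element in root.iter("SearchIndex")]


def custom_handlers(xml_text):
    """Return only the handler classes that differ from the OOB one."""
    return [cls for cls in search_index_handlers(xml_text) if cls != OOB_HANDLER]


# Hypothetical sample; a real workspace.xml contains more elements.
SAMPLE_WORKSPACE_XML = """
<Workspace name="crx.default">
  <SearchIndex class="com.day.crx.query.lucene.LuceneHandler">
    <param name="path" value="${wsp.home}/index"/>
  </SearchIndex>
</Workspace>
"""

if __name__ == "__main__":
    print(search_index_handlers(SAMPLE_WORKSPACE_XML))
    print(custom_handlers(SAMPLE_WORKSPACE_XML))
```

An empty result from `custom_handlers` would suggest the OOB handler is in use; anything else points at a custom implementation to inspect.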

Yogesh_Upadhyay
Level 6
October 16, 2015

Hello Shishir,

I am not sure if you checked the Tika config file before using it. There was a syntax error in it. I just fixed it and attached it again. Also make sure that you disable the supportHighlighting feature. I have also attached an indexing_config file (with some more node types included) that you can use.
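
To confirm which of these settings are actually in effect, one option is to read the `param` entries of the `SearchIndex` element. The sketch below is an assumption-laden illustration: the sample XML, its file paths, and the `tikaConfigPath` and `indexingConfiguration` values are hypothetical, and a missing `supportHighlighting` param is treated as disabled, which assumes that is the default.

```python
import xml.etree.ElementTree as ET


def search_index_params(xml_text):
    """Return the params of the first <SearchIndex> element as a dict."""
    root = ET.fromstring(xml_text.strip())
    index = next(root.iter("SearchIndex"), None)
    if index is None:
        return {}
    return {p.get("name"): p.get("value") for p in index.findall("param")}


def highlighting_enabled(params):
    """True if supportHighlighting is set to true (absent counts as false)."""
    return params.get("supportHighlighting", "false").strip().lower() == "true"


# Hypothetical sample; paths and values are placeholders for illustration.
SAMPLE_WORKSPACE_XML = """
<Workspace name="crx.default">
  <SearchIndex class="com.day.crx.query.lucene.LuceneHandler">
    <param name="supportHighlighting" value="true"/>
    <param name="indexingConfiguration" value="${rep.home}/indexing_config.xml"/>
    <param name="tikaConfigPath" value="${rep.home}/tika-config.xml"/>
  </SearchIndex>
</Workspace>
"""

if __name__ == "__main__":
    params = search_index_params(SAMPLE_WORKSPACE_XML)
    print(params)
    print(highlighting_enabled(params))
```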

Yogesh