Hi,
We have around half a million assets in our repository, and because of this the index size is huge. I have tried various techniques to reduce the index size, such as purging workflow instances, audit logs, etc.
I have also tried almost everything, following every step of this doc: http://www.wemblog.com/2012/
The only thing I am missing is configuring a list of stop words, as it appears that by default CQ does full-text indexing of every word, which causes the large index size.
While digging deeper, I found various stop-word lists under the lucene-analyzers-3.6.0\org\apache\lucene\analysis structure.
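To make the stop-word reasoning concrete, here is a minimal, library-free Java sketch of how filtering stop words reduces the number of terms and postings in a toy inverted index. It is only an illustration under stated assumptions: it does not use Lucene or CQ classes, the class name `StopWordIndexSketch`, the tiny stop-word set, and the sample documents are all hypothetical, and the real saving in a CQ repository depends on which search handler and analyzer are actually configured.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class StopWordIndexSketch {
    // Hypothetical stop-word list for illustration; NOT the list shipped with Lucene.
    static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("a", "an", "the", "of", "and", "is", "in", "to"));

    // Builds a toy inverted index (term -> ids of documents containing it),
    // optionally skipping stop words while tokenizing.
    static Map<String, Set<Integer>> buildIndex(List<String> docs, boolean filterStopWords) {
        Map<String, Set<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String token : docs.get(docId).toLowerCase().split("\\W+")) {
                if (token.isEmpty()) {
                    continue; // split can yield an empty leading token
                }
                if (filterStopWords && STOP_WORDS.contains(token)) {
                    continue; // stop word: never reaches the index
                }
                index.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    // Counts (term, document) pairs, used here as a rough proxy for index size.
    static int postingCount(Map<String, Set<Integer>> index) {
        int count = 0;
        for (Set<Integer> docIds : index.values()) {
            count += docIds.size();
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "The logo of the company",
                "A photo of the office in London",
                "The annual report is in the archive");

        Map<String, Set<Integer>> full = buildIndex(docs, false);
        Map<String, Set<Integer>> filtered = buildIndex(docs, true);

        System.out.println("Without filtering: " + full.size() + " terms, "
                + postingCount(full) + " postings");
        System.out.println("With filtering:    " + filtered.size() + " terms, "
                + postingCount(filtered) + " postings");
    }
}
```

Running `main` prints the term and posting counts for both variants, so you can compare them directly. Because stop words appear in almost every document, they tend to contribute a disproportionate share of postings, which is the intuition behind expecting a smaller index once they are filtered.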
AFAIK, CQ 5.5 and above uses com.day.crx.query.lucene.LuceneHandler for the SearchIndex by default, and it does not have stop-word filtering. Check your repository.xml and workspace.xml to see whether Lucene is configured with any custom handler apart from the OOB one. If so, check your implementation. If you are using the OOB configuration, what led you to conclude that the index space is being consumed by stop words?
Hello Shishir,
I am not sure whether you checked the Tika config file before using it. There was a syntax error in it; I have fixed it and attached it again. Also make sure that you disable the supportHighlight feature. I have also attached an indexing_config file (with some more node types included) that you can use.
Yogesh