In our application we have a multi-market search that covers both pages and DAM assets. Below is our basic query. Search works fine for most markets, but for the China market we seem to be getting a lot of irrelevant results. We have not been able to find out how Lucene search works for non-English languages. Is any configuration required for this? Also, is there a specific Analyzer configuration for the Chinese language?
By default, AEM indexes content with English in mind. Lucene's standard configuration is also geared towards English, and the indexes are set up to follow English semantics.
AEM/Oak/Lucene/Java does not do any magic. It only crunches your data into numbers (hashes, a.k.a. the inverted index), compares the numbers of the matches, and shows them to you in a certain order. When you get irrelevant results, it means that your indexes contain irrelevant data. Therefore you need to correct:
a) How the data gets into your indexes
b) How you retrieve data from your indexes
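To make points a) and b) concrete, here is a minimal sketch of a toy inverted index in plain Java. It is not the Oak or Lucene API; the class `InvertedIndexSketch` and its methods are hypothetical names made up for illustration. It indexes two short Chinese strings, 北京大学 ("Peking University") and 大家学习 ("everyone studies"), once with one term per character and once with overlapping two-character terms, then searches both for 大学 ("university"). As far as I know, Lucene's standard tokenizer splits Han text into single characters and CJKAnalyzer produces bigrams, but treat that mapping as an assumption and check it against your Lucene version. The strings are written as Unicode escapes so the file compiles regardless of source encoding.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical toy index for illustration only; NOT the Oak or Lucene API.
class InvertedIndexSketch {

    // One term per non-whitespace character (assumes BMP characters only).
    static List<String> unigramTokens(String text) {
        List<String> out = new ArrayList<>();
        for (char c : text.toCharArray()) {
            if (!Character.isWhitespace(c)) out.add(String.valueOf(c));
        }
        return out;
    }

    // Overlapping two-character terms per whitespace-separated chunk.
    static List<String> bigramTokens(String text) {
        List<String> out = new ArrayList<>();
        for (String chunk : text.trim().split("\\s+")) {
            if (chunk.length() == 1) out.add(chunk);
            for (int i = 0; i + 2 <= chunk.length(); i++) {
                out.add(chunk.substring(i, i + 2));
            }
        }
        return out;
    }

    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // The same analyzer is applied at index time and at query time.
    private final Function<String, List<String>> analyzer;

    InvertedIndexSketch(Function<String, List<String>> analyzer) {
        this.analyzer = analyzer;
    }

    // a) How the data gets into the index.
    void add(int docId, String text) {
        for (String term : analyzer.apply(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // b) How the data is retrieved: documents containing every query term.
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : analyzer.apply(query)) {
            Set<Integer> hits = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) result = new TreeSet<>(hits);
            else result.retainAll(hits);
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        String doc1 = "\u5317\u4EAC\u5927\u5B66"; // "Beijing daxue", Peking University
        String doc2 = "\u5927\u5BB6\u5B66\u4E60"; // "dajia xuexi", everyone studies
        String query = "\u5927\u5B66";            // "daxue", university

        InvertedIndexSketch unigram = new InvertedIndexSketch(InvertedIndexSketch::unigramTokens);
        InvertedIndexSketch bigram = new InvertedIndexSketch(InvertedIndexSketch::bigramTokens);
        unigram.add(1, doc1);
        unigram.add(2, doc2);
        bigram.add(1, doc1);
        bigram.add(2, doc2);

        System.out.println("unigram hits: " + unigram.search(query)); // [1, 2]
        System.out.println("bigram hits:  " + bigram.search(query));  // [1]
    }
}
```

With single-character terms, document 2 matches because it happens to contain both characters of the query in unrelated positions, which is exactly the kind of irrelevant hit described in the question. With two-character terms only document 1 matches. The data and the query are identical in both cases; only the way text is turned into index terms differs.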
It's fairly hard to get this 'right' with just the plain Oak-Lucene integration.
Please consider using the Oak Solr extension, which provides support for the Chinese language and a human-readable configuration format.
Also, I can recommend a recent book on relevancy by Doug