Stopwords Not Working as Expected in Custom Lucene Index (e.g. "are", "was", "that")
Hi Everyone,
I'm adding a custom stopwords.txt file to a custom Lucene index in AEM to filter out common stopwords during search. Most of the words in my list are excluded as expected, but some very common ones, such as "are", "was", and "that", are still being indexed and returned in search results.
My stopwords.txt file includes all of these terms (one per line), and I’ve confirmed the file is correctly referenced in the analyzer configuration for the index.
Has anyone else experienced this issue? Is there anything I might be missing related to:
- File encoding or formatting of the stopwords.txt file?
- Analyzer or tokenizer order/configuration?
- Case sensitivity issues even with ignoreCase = true?
- Possible overrides by other filters or analyzers?
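To rule out the file encoding and formatting question, here is a rough Python sketch of the kind of check I could run on the raw bytes of stopwords.txt. The function name check_stopwords and the default list of expected terms are my own placeholders, not part of AEM, Oak, or Lucene. I'm also not sure which of these conditions (if any) the analyzer actually trips over; the script only surfaces them.

```python
import codecs


def check_stopwords(raw: bytes, expected=("are", "was", "that")):
    """Return a list of possible formatting problems in a stopwords file.

    `raw` is the file content as bytes. `expected` is a hypothetical list of
    terms that should each appear on a line of their own.
    """
    problems = []

    # A UTF-8 byte order mark would be glued to the first word if not stripped.
    if raw.startswith(codecs.BOM_UTF8):
        problems.append("file starts with a UTF-8 byte order mark")
        raw = raw[len(codecs.BOM_UTF8):]

    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
        text = raw.decode("latin-1")  # fall back so the remaining checks still run

    if "\r" in text:
        problems.append("file contains carriage returns (Windows line endings)")

    words = set()
    for number, line in enumerate(text.split("\n"), start=1):
        line = line.rstrip("\r")
        word = line.strip()
        if not word:
            continue
        if line != word:
            problems.append(f"line {number} has leading or trailing whitespace: {line!r}")
        if word != word.lower():
            problems.append(f"line {number} contains uppercase letters: {word!r}")
        words.add(word.lower())

    for term in expected:
        if term not in words:
            problems.append(f"expected stopword is missing: {term!r}")

    return problems


# Example usage (the path is a placeholder for wherever the file was exported):
# with open("stopwords.txt", "rb") as handle:
#     for problem in check_stopwords(handle.read()):
#         print(problem)
```

An empty result would suggest the file itself is clean, which would point more toward the analyzer, tokenizer, or filter configuration.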
Any suggestions or shared experiences would be greatly appreciated!
Thanks in advance!