I'm trying to determine if there's a way to create a custom Predicate to handle searches for text that contains accented characters.
The problem I am trying to solve is that I have the string "Montréal" stored in a property on a node in the JCR, and want it to show up in search results if my query contains a search for "Montreal" or even "Montre".
I am trying to use the XPath function fn:replace to do something like this:
replace('Montréal', '[éè]+', 'e')
Here's an example XPath query (run using the query tool in CRX/DE):
/jcr:root/content/dam/mysite/en//* [ (@jcr:primaryType = 'dam:AssetContent' and jcr:like(fn:replace(fn:lower-case(data/master/@city), '[éè]+', 'e'),'%montre%')) ]
However, when I attempt to use it, I get the error:
expected: jcr:like | jcr:contains | jcr:score | xs:dateTime | fn:lower-case | fn:upper-case | fn:name | rep:similar | rep:spellcheck | rep:suggest
Is there some way to enable the replace function? Or any other way to meet this requirement when searching for accented characters?
Hi Jamie,
You would need to adjust how Lucene analyses your query.
Luckily, the latest Lucene version that's embedded in AEM includes stem filter factories for various languages.
For example, Latvian: https://lucene.apache.org/core/8_0_0/analyzers-common/org/apache/lucene/analysis/lv/LatvianStemFilte...
You would need to adjust your index so that it uses the stem filter you need.
Please note: while it's possible to tweak search quite a bit in AEM, it might be easier to bring in a dedicated search engine such as Solr or Elasticsearch for real-world search optimisations.
Regards,
Peter
Hello Jaime, nice to meet you.
Have you tried something like
String value = predicate.get(PREDICATE_VALUE).toLowerCase().replace("é", "e");
String query = String.format("fn:lower-case(%s)='%s'", "data/master/@city", value);
Hi Jean,
Interesting. AFAIK, all custom predicates must boil down to an entirely xpath-based result. I'm unsure if we can do this sort of manipulation in Java, since the predicate needs to result in an XPath query that can run without context of the current value. Otherwise, things like indexes would not be able to work. I'll give this a shot either way and see if it works.
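For reference, here is a minimal sketch of what the Java side of that idea could look like if the accents are folded generically rather than replacing only "é". The class and method names (AccentFolding, fold, likeConstraint) are made up for illustration and are not part of any AEM API. Note that this only normalizes the search term: the stored value "Montréal" is left untouched, so the constraint will only match if the indexed value is folded the same way.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, not an AEM class: folds accents in a search term
// and builds the jcr:like constraint used in the query from the question.
final class AccentFolding {
    private AccentFolding() {}

    // Decomposes accented letters (NFD), drops the combining marks,
    // then lower-cases the result.
    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    // Builds a "contains" style jcr:like constraint for the given property path.
    // Single quotes are doubled so the term stays inside the XPath string literal;
    // the like wildcards % and _ inside the term are not escaped in this sketch.
    static String likeConstraint(String propertyPath, String searchTerm) {
        String folded = fold(searchTerm).replace("'", "''");
        return String.format("jcr:like(fn:lower-case(%s), '%%%s%%')", propertyPath, folded);
    }

    public static void main(String[] args) {
        // "Montr\u00e9al" is "Montréal"
        System.out.println(fold("Montr\u00e9al")); // montreal
        System.out.println(likeConstraint("data/master/@city", "Montr\u00e9"));
        // jcr:like(fn:lower-case(data/master/@city), '%montre%')
    }
}
```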
Hi @jamiec4451712, have you considered using the synonyms mechanism? I think it should solve your issue without any additional Java coding. You can find a nice tutorial at https://medium.com/tech-learnings/how-to-enable-search-synonyms-in-aem-with-lucene-ccb780375eb4
Thanks for the response! I did look into this, but in AEM Cloud, indexes are immutable at runtime and the data that I am searching can change dynamically (as it's stored in content fragments that authors modify). I'm not sure there's a way for me to dynamically update this synonym list at runtime as authors modify content. Let me know if you've got any ideas about how to get past this constraint. Thank you!
I faced a similar issue.
I will explain what I did to overcome it.
The requirement: there is a search bar, and users were entering accented characters in it.
The problem: the same as yours. jcr:like and fn:replace didn't work.
What I did was send the search parameter as-is to the backend (Java) through a servlet, since I was building the queries in a service there. I then encoded it in Base64 and used the encoded value in the query, as AEM keeps non-English characters as Base64-encoded values.
Then I just decoded the results in the front end (but you can do that in Java as well).
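A rough sketch of the encode/decode step described above, using only java.util.Base64 from the JDK. The class name SearchTermCodec and the choice of UTF-8 are assumptions for illustration; the post does not say which charset or Base64 variant was used, nor how the encoded value was placed in the query, so that part is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: Base64-encodes a search term before it is put into
// a query and decodes values coming back. UTF-8 is assumed here.
final class SearchTermCodec {
    private SearchTermCodec() {}

    // Encodes the raw search term (accents included) as standard Base64.
    static String encode(String term) {
        return Base64.getEncoder().encodeToString(term.getBytes(StandardCharsets.UTF_8));
    }

    // Decodes a Base64 value back into the original text.
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String term = "Montr\u00e9al"; // "Montréal"
        String encoded = encode(term);
        System.out.println(encoded);
        System.out.println(decode(encoded).equals(term)); // true
    }
}
```

Decoding on the Java side like this would replace the front-end decoding step, if that is more convenient.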