
Solved

Searching for accented characters - Add support for XPATH replace in JCR (Jackrabbit Oak)

December 23, 2021
4 replies
3652 views

I'm trying to determine if there's a way to create a custom Predicate to handle searches for text that contains accented characters.

The problem I am trying to solve is that I have the string "Montréal" stored in a property on a node in the JCR, and want it to show up in search results if my query contains a search for "Montreal" or even "Montre".

I am trying to use the XPath function fn:replace to do something like this:

replace('Montréal', '[éè]+', 'e')

Here's an example XPath query (run using the query tool in CRXDE):

/jcr:root/content/dam/mysite/en//*
[
(@jcr:primaryType = 'dam:AssetContent' and jcr:like(fn:replace(fn:lower-case(data/master/@city), '[éè]+', 'e'),'%montre%'))
]

However, when I attempt to use it, I get the error:

expected: jcr:like | jcr:contains | jcr:score | xs:dateTime | fn:lower-case | fn:upper-case | fn:name | rep:similar | rep:spellcheck | rep:suggest

Is there some way to enable the replace function? Or any other way to meet this requirement when searching for accented characters?
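The behaviour being asked for amounts to folding accents on both the stored value and the search term before comparing them. Below is a minimal sketch of that comparison in plain Java, outside the JCR. It assumes Unicode NFD decomposition followed by removal of combining marks is an acceptable definition of "remove accents". The class and method names are hypothetical. Unlike the [éè]+ pattern, this approach also covers other accented letters such as à, ç or ô.

```java
import java.text.Normalizer;
import java.util.Locale;

public class AccentFold {
    // Decompose accented letters (é -> e + U+0301), drop the combining marks, lower-case.
    public static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    // Accent- and case-insensitive "contains": both sides are folded the same way.
    public static boolean matches(String stored, String term) {
        return fold(stored).contains(fold(term));
    }

    public static void main(String[] args) {
        System.out.println(matches("Montréal", "Montreal")); // true
        System.out.println(matches("Montréal", "montre"));   // true
        System.out.println(matches("Montréal", "Toronto"));  // false
    }
}
```

The key point is that the stored value also has to be folded. That is the part an XPath query over the raw property cannot do without fn:replace.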


Best answer by Peter_Puzanovs


Jeanmaradiaga

Level 3

Hello Jamie, nice to meet you.

Have you tried something like this?

String value = predicate.get(PREDICATE_VALUE).toLowerCase().replace("é", "e"); // normalize the search term in Java
String query = String.format("fn:lower-case(data/master/@city)='%s'", value); // compare with the lower-cased stored value
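The suggestion above can be sketched as a standalone helper. It normalizes the term, strips all accents rather than only "é", and emits a jcr:like constraint string. The class and method names are hypothetical. The property path is taken from the question, and the quote doubling assumes XPath 2.0-style string literals. This is not the QueryBuilder PredicateEvaluator API itself.

```java
import java.text.Normalizer;
import java.util.Locale;

public class CityConstraint {
    // Strip diacritics and lower-case the user's term (é -> e, È -> e, ...).
    public static String normalizeTerm(String term) {
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    // Build a partial-match constraint; single quotes are doubled inside the literal.
    public static String buildConstraint(String property, String term) {
        String safe = normalizeTerm(term).replace("'", "''");
        return String.format("jcr:like(fn:lower-case(%s), '%%%s%%')", property, safe);
    }

    public static void main(String[] args) {
        // prints jcr:like(fn:lower-case(data/master/@city), '%montre%')
        System.out.println(buildConstraint("data/master/@city", "Montré"));
    }
}
```

This normalizes only the search term. fn:lower-case turns the stored "Montréal" into "montréal", which still contains "é", so '%montre%' would not match it.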


jamiec4451712 (Author)

Level 3

Hi Jean,

Interesting. AFAIK, all custom predicates must ultimately produce a purely XPath-based query. I'm unsure whether we can do this sort of manipulation in Java, since the predicate needs to produce an XPath query that can run without knowing each node's stored value. Otherwise, things like indexes would not work. I'll give this a shot either way and see if it works.

lukasz-m

Community Advisor

Hi @jamiec4451712, have you considered using the synonyms mechanism? I think it should solve your issue without any additional Java coding. You can find a nice tutorial at https://medium.com/tech-learnings/how-to-enable-search-synonyms-in-aem-with-lucene-ccb780375eb4


jamiec4451712 (Author)

Level 3

Thanks for the response! I did look into this, but in AEM Cloud, indexes are immutable at runtime and the data that I am searching can change dynamically (as it's stored in content fragments that authors modify). I'm not sure there's a way for me to dynamically update this synonym list at runtime as authors modify content. Let me know if you've got any ideas about how to get past this constraint. Thank you!

Peter_Puzanovs

Accepted solution

Community Advisor

Hi Jamie,

You would need to adjust how Lucene analyses your query.

Luckily, the recent Lucene version embedded in AEM contains various language stem filter factories.

For example, Latvian: https://lucene.apache.org/core/8_0_0/analyzers-common/org/apache/lucene/analysis/lv/LatvianStemFilterFactory.html

You would need to configure your index to use the stem filter you need.

Please note: while it's possible to tweak search quite a bit in AEM, it might be easier to integrate a proper search engine like Solr or Elasticsearch for real-world search optimisations.

Regards,

Peter
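The answer does not include an index definition, so the following only illustrates the underlying idea. If the analysis chain lower-cases the text and strips diacritics at index time, and the same chain runs on the query, then "Montréal" is indexed as the term "montreal" and "montre" can match it as a prefix. Accent stripping is normally the job of Lucene's ASCIIFoldingFilter, not a language stemmer. The sketch imitates a tokenizer, lower-case and folding chain in plain Java with java.text.Normalizer and no Lucene. The class and method names are hypothetical, and this is not how Oak index analyzers are configured.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class FoldingAnalyzerSketch {
    // Rough stand-in for an analysis chain: fold accents, lower-case, split into word tokens.
    public static List<String> analyze(String text) {
        String folded = Normalizer.normalize(text, Normalizer.Form.NFD)
                .replaceAll("\\p{M}+", "")      // drop combining accents (é -> e)
                .toLowerCase(Locale.ROOT);
        List<String> tokens = new ArrayList<>();
        for (String token : folded.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {             // split() can yield a leading empty string
                tokens.add(token);
            }
        }
        return tokens;
    }

    // The query term goes through the same chain, then is matched as a token prefix.
    public static boolean prefixMatch(String fieldValue, String queryTerm) {
        List<String> query = analyze(queryTerm);
        if (query.size() != 1) {
            return false;                        // keep the sketch to single-word queries
        }
        for (String token : analyze(fieldValue)) {
            if (token.startsWith(query.get(0))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Montréal, Québec"));               // [montreal, quebec]
        System.out.println(prefixMatch("Montréal, Québec", "Montre")); // true
    }
}
```

Because the folding happens inside the index rather than in a synonym list, it keeps working as authors change content.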

Anmol_Bhardwaj

Community Advisor

I faced a similar issue.

I will explain what I did to overcome it.

The requirement: there was a search bar in which users entered accented characters.

The problem: the same. jcr:like and fn:replace didn't work.

What I did was send the search parameter as-is to the backend (Java) through a servlet, since I was building the queries in a service there. Then I encoded it in Base64 and used that value in the query, as AEM keeps non-English characters in Base64-encoded values.

Then I decoded the results on the front end (you could do that in Java as well).
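The reply gives no code, so here is a minimal sketch of just the encode/decode step in plain Java using java.util.Base64. It assumes, as the reply states, that the stored values are Base64-encoded UTF-8 with the standard alphabet. The class and method names are hypothetical, and the servlet and query-building parts are omitted.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Term {
    // Encode the raw search term: UTF-8 bytes, then standard Base64.
    public static String encode(String term) {
        return Base64.getEncoder().encodeToString(term.getBytes(StandardCharsets.UTF_8));
    }

    // Decode a Base64 value (e.g. from the results) back into readable text.
    public static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = encode("Montréal");
        System.out.println(encoded);
        System.out.println(decode(encoded)); // Montréal
    }
}
```

Base64 encodes in 3-byte groups, so the encoding of a partial term such as "Montré" is generally not a substring of the encoding of "Montréal". This approach therefore suits exact-value matching rather than jcr:like partial matches.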
