Level 3
December 23, 2021
Solved

Searching for accented characters - Add support for XPATH replace in JCR (Jackrabbit Oak)

I'm trying to determine if there's a way to create a custom Predicate to handle searches for text that contains accented characters.

The problem I am trying to solve is that I have the string "Montréal" stored in a property on a node in the JCR, and want it to show up in search results if my query contains a search for "Montreal" or even "Montre".

 

I am trying to use the XPATH function fn:replace to do something like this:

replace('Montréal', '[éè]+', 'e')

 

Here's an example XPath query (run using the query tool in CRXDE Lite):

/jcr:root/content/dam/mysite/en//*
[
(@jcr:primaryType = 'dam:AssetContent' and jcr:like(fn:replace(fn:lower-case(data/master/@city), '[éè]+', 'e'),'%montre%'))
]

 

However, when I attempt to use it, I get the error:

expected: jcr:like | jcr:contains | jcr:score | xs:dateTime | fn:lower-case | fn:upper-case | fn:name | rep:similar | rep:spellcheck | rep:suggest

 

Is there some way to enable the replace function? Or any other way to meet this requirement when searching for accented characters?

4 replies

Jeanmaradiaga
Level 3
December 23, 2021

Hello Jamie, nice to meet you.

 

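Have you tried something like this?

// Normalize the search term in Java before building the query.
// "city" is the property name from the question; adjust the relative path as needed.
String value = predicate.get(PREDICATE_VALUE).toLowerCase().replace("é", "e");
String query = String.format("fn:lower-case(@%s)='%s'", "city", value);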

 

Level 3
December 23, 2021

Hi Jean,

 

Interesting. AFAIK, all custom predicates must boil down to an entirely XPath-based result. I'm unsure whether we can do this sort of manipulation in Java, since the predicate needs to result in an XPath query that can run without knowing the stored value. Otherwise, things like indexes would not be able to work. I'll give this a shot either way and see if it works.
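
The point above can be made concrete with a minimal Java sketch (not from the thread). It folds accents in the search term with java.text.Normalizer, which handles any accented Latin letter rather than only "é". As noted, this only changes the query side: the stored value "Montréal" still contains the accent, so the sketch assumes a hypothetical, pre-normalized copy of the property (called cityNormalized here) that would have to be written whenever the content is saved. The class, method, and property names are illustrative assumptions.

```java
import java.text.Normalizer;
import java.util.Locale;

public class AccentFolding {

    // Decompose accented characters (NFD), then drop the combining marks,
    // so "é" becomes "e". Letters with no decomposition (such as "ø") are left as-is.
    public static String foldAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Build a jcr:like constraint against a HYPOTHETICAL pre-normalized property.
    // The wildcards % and _ inside the user's term are not escaped in this sketch.
    public static String buildConstraint(String normalizedProperty, String searchTerm) {
        String term = foldAccents(searchTerm)
                .toLowerCase(Locale.ROOT)
                .replace("'", "''"); // double single quotes for the XPath string literal
        return String.format("jcr:like(%s, '%%%s%%')", normalizedProperty, term);
    }

    public static void main(String[] args) {
        // \u00e9 is "é"; the escape keeps the source file encoding-independent.
        System.out.println(foldAccents("Montr\u00e9al"));
        // prints: Montreal
        System.out.println(buildConstraint("data/master/@cityNormalized", "Montr\u00e9"));
        // prints: jcr:like(data/master/@cityNormalized, '%montre%')
    }
}
```

The same foldAccents call could be used when writing the normalized property, so that both sides of the comparison go through identical normalization.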

lukasz-m
Community Advisor
December 23, 2021

Hi @jamiec4451712, have you considered using the synonyms mechanism? I think it should solve your issue without any additional Java coding. You can find a nice tutorial at https://medium.com/tech-learnings/how-to-enable-search-synonyms-in-aem-with-lucene-ccb780375eb4

Level 3
December 23, 2021

Thanks for the response! I did look into this, but in AEM Cloud, indexes are immutable at runtime and the data that I am searching can change dynamically (as it's stored in content fragments that authors modify). I'm not sure there's a way for me to dynamically update this synonym list at runtime as authors modify content. Let me know if you've got any ideas about how to get past this constraint. Thank you!

Peter_Puzanovs
Community Advisor
Accepted solution
December 24, 2021

Hi Jamie,

 

You would need to play with how you want Lucene to analyse your query.

 

Luckily, the latest Lucene that's embedded in AEM includes stem filter factories for various languages.

For example, Latvian: https://lucene.apache.org/core/8_0_0/analyzers-common/org/apache/lucene/analysis/lv/LatvianStemFilterFactory.html

 

You would need to play with your index to make it use the Stem Filter you need.

 

Please note: while it's possible to tweak search quite a bit in AEM, it might be easier to add a dedicated search engine such as Solr or Elasticsearch to handle real-world search optimisations.

 

Regards,

Peter

Anmol_Bhardwaj
Community Advisor
January 5, 2022

I faced a similar issue, so I'll explain how I worked around it.

The requirement: there is a search bar, and users were entering accented characters in it.

The problem: the same as yours. jcr:like and fn:replace didn't work.

What I did was send the search parameter as-is to the backend (Java) through a servlet, since I was building the queries in a service there. Then I encoded it in Base64 and used the encoded value in the query, as AEM keeps non-English characters in Base64-encoded values.

Then I decoded the results in the front end (but you can do that in Java as well).
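
The reply does not show its code, so the following is only a minimal sketch of the encode/decode step it describes, using java.util.Base64 with UTF-8. The class and method names are assumptions, and how the encoded value is placed in the query (and against which property) is not specified in the thread, so that part is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64SearchTerm {

    // Encode the raw search term (including accented characters) as Base64 over UTF-8 bytes.
    public static String encode(String term) {
        return Base64.getEncoder().encodeToString(term.getBytes(StandardCharsets.UTF_8));
    }

    // Decode a Base64 value back into the original text.
    public static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String term = "Montr\u00e9al"; // \u00e9 is "é"
        String encoded = encode(term);
        System.out.println(encoded);
        System.out.println(decode(encoded).equals(term)); // prints: true
    }
}
```

Note that Base64 only matches when the bytes are identical: the encoded form of "Montreal" differs from that of "Montréal", so this step alone does not make an unaccented search term match an accented stored value.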