SOLVED

Searching for accented characters - Add support for XPATH replace in JCR (Jackrabbit Oak)

Level 4

I'm trying to determine if there's a way to create a custom Predicate to handle searches for text that contains accented characters.

The problem I am trying to solve is that I have the string "Montréal" stored in a property on a node in the JCR, and want it to show up in search results if my query contains a search for "Montreal" or even "Montre".

 

I am trying to use the XPATH function fn:replace to do something like this:

replace('Montréal', '[éè]+', 'e')

 

Here's an example XPath query (run using the query tool in CRXDE):

/jcr:root/content/dam/mysite/en//*
[
(@jcr:primaryType = 'dam:AssetContent' and jcr:like(fn:replace(fn:lower-case(data/master/@city), '[éè]+', 'e'),'%montre%'))
]

 

However, when I attempt to use it, I get the error:

expected: jcr:like | jcr:contains | jcr:score | xs:dateTime | fn:lower-case | fn:upper-case | fn:name | rep:similar | rep:spellcheck | rep:suggest

 

Is there some way to enable the replace function? Or any other way to meet this requirement when searching for accented characters?

1 Accepted Solution

Correct answer by
Community Advisor

Hi Jamie,

 

You would need to adjust how Lucene analyses your query.

 

Luckily, the latest Lucene that's embedded in AEM includes stem filter factories for various languages.

 

For example Latvian: https://lucene.apache.org/core/8_0_0/analyzers-common/org/apache/lucene/analysis/lv/LatvianStemFilte... 

 

You would need to adjust your index definition so that it uses the stem filter you need.

 

Please note: while it's possible to tweak search quite a bit in AEM, it might be easier to bring in a dedicated search engine such as Solr or Elasticsearch for real-world search optimisations.

 

Regards,

Peter
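
To illustrate why an analyzer-level fix can work, here is a minimal Java sketch. It is not Oak or Lucene code: the class AccentFoldingSketch and the helpers fold and matches are hypothetical names, and java.text.Normalizer merely stands in for an accent-folding filter. The assumption is that the same folding is applied to the stored value at index time and to the search term at query time, so both sides are compared in unaccented form.

```java
import java.text.Normalizer;
import java.util.Locale;

public class AccentFoldingSketch {

    // Hypothetical helper: decompose accented characters (NFD), drop the
    // combining marks, then lower-case. This only approximates what an
    // accent-folding filter inside an analyzer would do.
    public static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    // True when the folded stored value contains the folded search term.
    public static boolean matches(String storedValue, String searchTerm) {
        return fold(storedValue).contains(fold(searchTerm));
    }

    public static void main(String[] args) {
        String stored = "Montr\u00e9al"; // "Montréal", escaped to avoid source-encoding issues
        System.out.println(fold(stored));                // montreal
        System.out.println(matches(stored, "Montreal")); // true
        System.out.println(matches(stored, "Montre"));   // true
    }
}
```

Folding only the search term would not be enough, because the stored value would still contain the accent. That is why the change belongs in the index analysis rather than in the query string.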

6 Replies

Level 4

Hello Jamie, nice to meet you.

 

Have you tried something like:

// Normalize the user's search term in Java before building the XPath expression
String value = predicate.get(PREDICATE_VALUE).toLowerCase().replace("é", "e");
String query = String.format("fn:lower-case(%s)='%s'", "data/master/@city", value);

 

Level 4

Hi Jean,

 

Interesting. AFAIK, all custom predicates must boil down to an entirely xpath-based result. I'm unsure if we can do this sort of manipulation in Java, since the predicate needs to result in an XPath query that can run without context of the current value. Otherwise, things like indexes would not be able to work. I'll give this a shot either way and see if it works.

Community Advisor

Hi @jamiec4451712, have you considered using the synonyms mechanism? I think it should solve your issue without any additional Java coding. You can find a nice tutorial at https://medium.com/tech-learnings/how-to-enable-search-synonyms-in-aem-with-lucene-ccb780375eb4

Level 4

Thanks for the response! I did look into this, but in AEM Cloud, indexes are immutable at runtime and the data that I am searching can change dynamically (as it's stored in content fragments that authors modify). I'm not sure there's a way for me to dynamically update this synonym list at runtime as authors modify content. Let me know if you've got any ideas about how to get past this constraint. Thank you!

Community Advisor

I had faced a similar issue.

I will explain what I did to overcome that.

The requirement: there is a search bar, and users were entering accented characters in it.

The problem: the same. jcr:like and fn:replace didn't work.

What I did was send the search parameter as-is to the backend (Java) through a servlet, since I was building the queries in a service there. I then Base64-encoded it and used the encoded value in the query, as AEM keeps non-English characters in Base64-encoded values.

Then I just decoded the results in the front end (but you can do that in Java as well).
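
For anyone who wants to try this, a minimal sketch of the encode and decode steps in plain Java might look like the following. The class Base64TermSketch and its method names are hypothetical, UTF-8 is an assumption, and whether the stored values are actually Base64-encoded depends on your setup, so treat it as a starting point only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64TermSketch {

    // Encode the raw search term (assumed to be UTF-8) so it can be compared
    // against values that are stored in Base64 form.
    public static String encodeTerm(String term) {
        return Base64.getEncoder().encodeToString(term.getBytes(StandardCharsets.UTF_8));
    }

    // Decode a Base64 value taken from a query result back into readable text.
    public static String decodeValue(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String term = "Montr\u00e9al"; // "Montréal", escaped to avoid source-encoding issues
        String encoded = encodeTerm(term);
        System.out.println(encoded);
        System.out.println(decodeValue(encoded).equals(term)); // round trip check
    }
}
```

Keep in mind that comparing encoded values is an exact match on the bytes, so on its own this would not make "Montreal" match "Montréal".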