Assets tagged with tags containing special characters are not returned by the omnisearch bar

TimDonovanUK

26-06-2020

Hi,

If I create a tag with the name "pröva", AEM creates a cq:Tag node with:

  • the node name (jcr:name) set to "prova", i.e. the 'ö' is stripped out and replaced with 'o'
  • the jcr:title set to "pröva", i.e. the 'ö' is kept
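To make the name/title split concrete, here is a small Python sketch that reproduces the observed mapping ("pröva" to "prova") by decomposing each character (Unicode NFKD) and dropping the combining accent marks. This is an assumption for illustration only: it is not AEM's actual node-name logic, and characters that do not decompose (for example 'ø') pass through unchanged.

```python
import unicodedata


def approx_tag_node_name(title: str) -> str:
    """Approximate the diacritic stripping seen above ("pröva" -> "prova").

    Assumption: NOT AEM's real name-mangling code. It only applies NFKD
    decomposition and removes combining marks such as U+0308 (diaeresis).
    """
    decomposed = unicodedata.normalize("NFKD", title)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(approx_tag_node_name("pröva"))  # node name derived from the title
```

The point is that two different strings now describe the same tag: the title the author typed and the name that ends up being referenced from assets.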

If I now tag an asset with this tag, the cq:tags property on the asset's metadata node references the jcr:name of the tag, not its jcr:title, e.g. the asset now has cq:tags set to "prova".

This means that when using the omnisearch bar to retrieve the asset by the native spelling of the tag (i.e. searching for "pröva"), nothing is returned.

This is because the omnisearchbar performs the following query:

(/jcr:root/content/dam//element(*, nt:folder)[(jcr:contains(., 'pröva'))] | /jcr:root/content/dam//element(*, dam:Asset)[(jcr:contains(., 'pröva'))])

And of course jcr:contains only sees the value stored in cq:tags ("prova"), so it won't find "pröva" anywhere.
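To reproduce the mismatch, the Python sketch below rebuilds the query quoted above for a given search term and checks the term against a toy stand-in for jcr:contains. The matching function is an assumption for illustration: it does case-insensitive token matching with no accent folding, roughly what happens when the index analyzer leaves 'ö' untouched. It is not Oak's implementation.

```python
def omnisearch_xpath(term: str) -> str:
    """Rebuild the omnisearch query shape quoted above for a given term.

    Single quotes inside an XPath string literal are escaped by doubling.
    """
    lit = term.replace("'", "''")
    return (
        f"(/jcr:root/content/dam//element(*, nt:folder)[(jcr:contains(., '{lit}'))]"
        f" | /jcr:root/content/dam//element(*, dam:Asset)[(jcr:contains(., '{lit}'))])"
    )


def naive_contains(indexed_values: list[str], term: str) -> bool:
    """Toy jcr:contains: lower-cased token match, no accent folding (assumption)."""
    tokens = {tok for value in indexed_values for tok in value.lower().split()}
    return term.lower() in tokens


if __name__ == "__main__":
    asset_cq_tags = ["prova"]  # what ends up stored on the asset
    print(omnisearch_xpath("pröva"))
    print(naive_contains(asset_cq_tags, "pröva"))  # the native spelling misses
    print(naive_contains(asset_cq_tags, "prova"))  # the node name hits
```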

I have contacted Adobe support, but they told me to "update the index definitions so that tags' jcr:title is also part of the index", which as far as I can tell is a useless response.

Are the only options to:

  • extend the omnisearch functionality so it actually works with foreign-language tags (possibly by stripping out the foreign characters), or
  • every time someone tags an asset, also store the tag's jcr:title (not its name) in a hidden field, so that jcr:contains matches terms with special characters correctly?
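Both options can be sketched in a few lines of Python against in-memory data. Option 1 folds the user's term before querying, which only helps if tag names were derived by the same folding. Option 2 resolves each referenced tag to its jcr:title and keeps the titles in an extra indexed value. The lookup table, the token matcher and the idea of a single space-joined title value are hypothetical stand-ins; in AEM the titles would be read from the cq:Tag nodes and written to a property of your choosing.

```python
import unicodedata


def fold(text: str) -> str:
    """Option 1 (sketch): strip diacritics from the search term before querying.

    Assumes tag node names were derived the same way ("pröva" -> "prova").
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def title_field(cq_tags: list[str], tag_titles: dict[str, str]) -> str:
    """Option 2 (sketch): value for a hypothetical hidden property holding titles.

    tag_titles maps a tag reference (as stored in cq:tags) to its jcr:title;
    unknown references fall back to the reference itself.
    """
    return " ".join(tag_titles.get(tag, tag) for tag in cq_tags)


def token_match(text: str, term: str) -> bool:
    """Toy full-text match: case-insensitive whitespace tokens, no folding."""
    return term.lower() in text.lower().split()


if __name__ == "__main__":
    asset_cq_tags = ["prova"]
    tag_titles = {"prova": "pröva"}  # tag node name -> jcr:title

    print(token_match(" ".join(asset_cq_tags), fold("pröva")))          # option 1
    print(token_match(title_field(asset_cq_tags, tag_titles), "pröva"))  # option 2
```

Option 1 breaks as soon as a tag name was chosen by hand rather than derived from the title; option 2 keeps working in that case but has to be refreshed whenever a tag's title changes.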

I'd appreciate any guidance, as Adobe support was unable to help despite insisting that AEM fully supports all languages.

indexing omnisearchbar special characters tags

Accepted Solutions (1)

Jörg_Hoh
Employee

27-06-2020

Hi Tim,

In other words, the problem is that search only looks at the name of the tag (because the name is part of the reference to the tag stored on the asset), and not at the jcr:title of the tags referenced from the assets. Is that correct?

From a technical point of view this is expected, because JCR imposes some restrictions on node names. That means that if the jcr:title does not match the name, you won't find the asset by searching for the title. A related limitation: if you change the title of a tag but not its name, and then search for assets using the new title, you won't get any results.

In both cases the problem is that the jcr:title of a referenced path is not considered by search, so this is more likely a missing feature than a bug.
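One way to take the referenced tag's jcr:title into account, sketched in Python with in-memory dictionaries, is a two-step lookup: first find the tags whose title matches the term, then find the assets whose cq:tags reference those tags. The dictionaries and asset paths are hypothetical stand-ins for what would be two repository queries in AEM. Because titles are resolved at search time, a renamed title is picked up without touching the assets.

```python
def find_assets_by_tag_title(
    term: str, tag_titles: dict[str, str], assets: dict[str, list[str]]
) -> list[str]:
    """Two-step lookup sketch: title -> tag references -> assets.

    tag_titles: tag reference (as stored in cq:tags) -> current jcr:title.
    assets: asset path -> its cq:tags values. Both are hypothetical stand-ins.
    """
    wanted = term.casefold()
    matching = {tag for tag, title in tag_titles.items() if title.casefold() == wanted}
    return sorted(path for path, tags in assets.items() if matching & set(tags))


if __name__ == "__main__":
    tag_titles = {"prova": "pröva"}
    assets = {"/content/dam/a.jpg": ["prova"], "/content/dam/b.jpg": []}

    print(find_assets_by_tag_title("pröva", tag_titles, assets))

    # Change the title but not the node name: still found by the new title.
    tag_titles["prova"] = "provning"
    print(find_assets_by_tag_title("provning", tag_titles, assets))
```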

Answers (2)

aemmarc
Employee

26-06-2020

Search is built on Lucene.

So this isn't so much an omnisearch issue as a case of the index definitions not tokenizing the characters.

For example, out of the box, AEM will not tokenize Chinese.

You need to add an analyzer to your index definition to handle your Swedish word:

https://lucene.apache.org/core/4_7_0/analyzers-common/overview-summary.html

You can probably get away with the StandardAnalyzer.
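A quick way to see why the analyzer matters: the Python sketch below compares a StandardAnalyzer-like pipeline (tokenize and lower-case, which does not remove accents) with one that adds accent folding, the job Lucene's ASCIIFoldingFilter does. Both functions are rough approximations written for illustration, not Lucene code. The key point is that the same folding has to apply to the indexed value and to the query term.

```python
import re
import unicodedata


def standard_like(text: str) -> list[str]:
    """Rough stand-in for StandardAnalyzer: word tokens, lower-cased, accents kept."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def folding(text: str) -> list[str]:
    """Adds accent folding via NFKD (approximates ASCIIFoldingFilter; misses e.g. 'ø')."""
    out = []
    for tok in standard_like(text):
        decomposed = unicodedata.normalize("NFKD", tok)
        out.append("".join(ch for ch in decomposed if not unicodedata.combining(ch)))
    return out


def matches(analyzer, indexed_text: str, query: str) -> bool:
    """Same analyzer on both sides, as with one default analyzer on the index."""
    return set(analyzer(query)) <= set(analyzer(indexed_text))


if __name__ == "__main__":
    print(matches(standard_like, "prova", "pröva"))  # no folding: no match
    print(matches(folding, "prova", "pröva"))        # folding on both sides: match
```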

There are some rough steps on how to do this most of the way down this article. Look for the section "SPECIFYING THE ANALYZER CLASS DIRECTLY":

https://helpx.adobe.com/ca/experience-manager/6-3/sites/deploying/using/queries-and-indexing.html#Co...

Good luck.

Veena_Vikram
MVP

26-06-2020

@Jörg_Hoh, any help you can give on this question?