SOLVED

SimpleSearch with Unicode Characters Takes Long Time/Crashes Instance

15473203
Level 2

We are using AEM 6.3 SP1 and have had cases where a single user running a SimpleSearch query containing Unicode characters has brought down our AEM instances. A simple example:

import com.day.cq.search.SimpleSearch;
import com.day.cq.search.Predicate;
import com.day.cq.search.result.SearchResult;

SimpleSearch simpleSearch = resource.adaptTo(SimpleSearch.class);
simpleSearch.setQuery("Knights of Columbus (@KofC) | TwitterPlan de Mobilité d Entreprise - PDFAllt om Bilar – Sveriges största motorsajt | Expressen | Allt om BilarStandard Bank Online Banking - Search Results | TimeErrors | Developers");

Predicate p = new Predicate("path", "/content");
simpleSearch.addPredicate(p);

SearchResult searchResult = simpleSearch.getResult();

That is an example of a query that made one of our AEM nodes completely unresponsive. We never ran into this issue on AEM 5.6.1, but it has happened a few times with various Unicode characters on AEM 6.3. Currently we work around it by stripping out all non-ASCII characters before doing the search. With the Unicode characters removed, the queries are very fast.
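For anyone who wants to try the same workaround, here is a minimal sketch of the stripping step. The class and method names (AsciiQuerySanitizer, stripNonAscii) are made up for illustration and are not part of any AEM API; the idea is simply to drop everything outside the 7-bit ASCII range and tidy up the whitespace before passing the result to setQuery.

```java
final class AsciiQuerySanitizer {

    // Hypothetical helper: removes every character outside 7-bit ASCII,
    // then collapses the whitespace runs left behind.
    static String stripNonAscii(String input) {
        if (input == null) {
            return "";
        }
        return input
                .replaceAll("[^\\x00-\\x7F]", "")
                .replaceAll("\\s+", " ")
                .trim();
    }

    public static void main(String[] args) {
        // \u00e9 is e-acute, \u2013 is an en dash
        String raw = "Plan de Mobilit\u00e9 d Entreprise \u2013 PDF";
        System.out.println(stripNonAscii(raw)); // prints "Plan de Mobilit d Entreprise PDF"
    }
}
```

Note that this loses information (accented words no longer match their indexed form), so it is a stopgap rather than a fix.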

Is there a permanent fix for this?

Edit: Sort of off topic, if anyone can let me know how to format the code so it shows on multiple lines I would appreciate it.

1 Accepted Solution
kautuk_sahni
Correct answer by
Employee

Hi,

Please have a look at SimpleSearch ("The Adobe AEM Quickstart and Web Application.")

void setQuery(String query) accepts a String. Try putting the Unicode characters directly in the string in your code, escaping them with \u.

//     The Danish letters Æ Ø Å

       String myString = "\u00C6\u00D8\u00C5" ;

15473203
Level 2

That does seem to work better, but how would I achieve this without a String literal? The input comes from an HTTP query parameter, so I can't just use a String literal like that.
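A note on the literal question: a \u escape is only a source-code notation, so a runtime string that was decoded with the right charset already holds exactly the same characters. The sketch below uses java.net.URLDecoder on a fragment of the percent-encoded query; the class name QueryParamDecoding and its decode helper are hypothetical, and in a real servlet the container normally does this decoding for you, so treat it as an illustration of the principle, not of how AEM handles parameters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class QueryParamDecoding {

    // Hypothetical helper: decodes a percent-encoded value with the given charset name.
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String raw = "Mobilit%c3%a9";

        // Decoded as UTF-8, the value equals the \u-escaped literal.
        String good = decode(raw, "UTF-8");
        System.out.println(good.equals("Mobilit\u00e9")); // prints "true"

        // Decoded with the wrong charset, each UTF-8 byte becomes its own character.
        String bad = decode(raw, "ISO-8859-1");
        System.out.println(bad.equals("Mobilit\u00c3\u00a9")); // prints "true"
    }
}
```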

Kunwar
Employee

Can you enable debug logs for the search and share the query that is generated at the end? Maybe we can tune the index definitions to make it perform better.

15473203
Level 2

I thought the Unicode literal encoding was helping, but I later found that it was doing some weird String conversions and just turned the Unicode characters into question marks, so it wasn't really helping the way I thought it was.

This particular search seems to break AEM in multiple ways. One is the Unicode characters; the other seems to be the sheer number of words in the search. Even after removing the Unicode characters I have to cut it down to "Knights of Columbus KofC TwitterPlan de Mobilit d Entreprise - PDFAllt om Bilar Sveriges strsta motorsajt Expressen Allt om BilarStandard Bank" before the search stops taking down the instance.
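For reference, question marks are what Java produces when a String is encoded to a charset that cannot represent some of its characters. The thread does not say which conversion was involved, so the sketch below is only one plausible cause; the class name QuestionMarkDemo and its roundTrip helper are invented for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class QuestionMarkDemo {

    // Hypothetical helper: encodes to bytes and decodes again with the same charset.
    static String roundTrip(String s, Charset charset) {
        return new String(s.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String query = "Mobilit\u00e9"; // ends in e-acute

        // US-ASCII cannot represent e-acute, so getBytes substitutes '?'.
        System.out.println(roundTrip(query, StandardCharsets.US_ASCII)); // prints "Mobilit?"

        // UTF-8 can represent every character, so the round trip is lossless.
        System.out.println(roundTrip(query, StandardCharsets.UTF_8).equals(query)); // prints "true"
    }
}
```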
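If the number of words really is part of the problem, one possible mitigation is to cap the number of terms before the string reaches setQuery. This is only a sketch: QueryTermLimiter and limitTerms are hypothetical names, and the right limit would have to be found by testing against your own indexes.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

final class QueryTermLimiter {

    // Hypothetical helper: keeps at most maxTerms whitespace-separated terms.
    static String limitTerms(String query, int maxTerms) {
        if (query == null || query.trim().isEmpty() || maxTerms <= 0) {
            return "";
        }
        return Arrays.stream(query.trim().split("\\s+"))
                .limit(maxTerms)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String query = "Knights of Columbus KofC TwitterPlan de Mobilit d Entreprise";
        // The limit of 3 is arbitrary and chosen only for the demonstration.
        System.out.println(limitTerms(query, 3)); // prints "Knights of Columbus"
    }
}
```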

16.05.2018 09:49:15.019 *INFO* [127.0.0.1 [1526482153050] GET /content/website/Search.html HTTP/1.1] com.day.cq.search.ext.impl.SimpleSearchImpl SimpleSearch is searching with the types: [cq:Page, dam:Asset]

16.05.2018 09:49:15.023 *DEBUG* [127.0.0.1 [1526482153050] GET /content/website/Search.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl executing query (URL):
13_group.group.1_path=%2fcontent%2fwebsite&13_group.group.p.or=true&14_group.2_group.1_path=%2fcontent%2fdam%2fotherwebsite%2fcenter%2fresource&14_group.2_group.1_path.self=true&14_group.2_group.2_path=%2fcontent%2fotherwebsite%2fportal%2fauthenticated&14_group.2_group.2_path.self=true&14_group.2_group.p.not=true&14_group.2_group.p.or=true&15_group.p.not=true&15_group.primaryType=jcr%3acontent%2fjcr%3aprimaryType&15_group.primaryType.value=nt%3aunstructured&16_group.hideInNav=jcr%3acontent%2fhideInNav&16_group.hideInNav.value=true&16_group.p.not=true&group.0_fulltext=Knights%20of%20Columbus%20(%40KofC)%20%7c%20TwitterPlan%20de%20Mobilit%c3%a9%20d%20Entreprise%20-%20PDFAllt%20om%20Bilar%20%e2%80%93%20Sveriges%20st%c3%b6rsta%20motorsajt%20%7c%20Expressen%20%7c%20Allt%20om%20BilarStandard%20Bank%20Online%20Banking%20-%20Search%20Results%20%7c%20TimeErrors%20%7c%20Developers&group.0_fulltext.relPath=&group.1_fulltext=Knights%20of%20Columbus%20(%40KofC)%20%7c%20TwitterPlan%20de%20Mobilit%c3%a9%20d%20Entreprise%20-%20PDFAllt%20om%20Bilar%20%e2%80%93%20Sveriges%20st%c3%b6rsta%20motorsajt%20%7c%20Expressen%20%7c%20Allt%20om%20BilarStandard%20Bank%20Online%20Banking%20-%20Search%20Results%20%7c%20TimeErrors%20%7c%20Developers&group.1_fulltext.relPath=%40jcr%3atitle&group.2_fulltext=Knights%20of%20Columbus%20(%40KofC)%20%7c%20TwitterPlan%20de%20Mobilit%c3%a9%20d%20Entreprise%20-%20PDFAllt%20om%20Bilar%20%e2%80%93%20Sveriges%20st%c3%b6rsta%20motorsajt%20%7c%20Expressen%20%7c%20Allt%20om%20BilarStandard%20Bank%20Online%20Banking%20-%20Search%20Results%20%7c%20TimeErrors%20%7c%20Developers&group.2_fulltext.relPath=%40jcr%3adescription&group.p.or=true&languages=&lastModified.lowerBound=&lastModified.property=jcr%3acontent%2fcq%3alastModified&lastModified.upperBound=&mimeTypes=jcr%3acontent%2fjcr%3amimeType&mimeTypes.value=&nodeTypes.p.or=true&nodeTypes.type=dam%3aAsset&orderByScore=%40jcr%3ascore&orderByScore.sort=desc&p.excerpt=true&p.limit=10&p.offset=0&path=%2fcontent&tags=&tags.property=jcr%3acontent%2fcq%3atags

16.05.2018 09:49:15.023 *DEBUG* [127.0.0.1 [1526482153050] GET /content/website/Search.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl executing query (predicate tree):
ROOT=group: limit=10, offset=0, excerpt=true[
    {group=group: or=true[
        {0_fulltext=fulltext: fulltext=Knights of Columbus (@KofC) | TwitterPlan de Mobilité d Entreprise - PDFAllt om Bilar – Sveriges största motorsajt | Expressen | Allt om BilarStandard Bank Online Banking - Search Results | TimeErrors | Developers, relPath=}
        {1_fulltext=fulltext: fulltext=Knights of Columbus (@KofC) | TwitterPlan de Mobilité d Entreprise - PDFAllt om Bilar – Sveriges största motorsajt | Expressen | Allt om BilarStandard Bank Online Banking - Search Results | TimeErrors | Developers, relPath=@jcr:title}
        {2_fulltext=fulltext: fulltext=Knights of Columbus (@KofC) | TwitterPlan de Mobilité d Entreprise - PDFAllt om Bilar – Sveriges största motorsajt | Expressen | Allt om BilarStandard Bank Online Banking - Search Results | TimeErrors | Developers, relPath=@jcr:description}
    ]}
    {path=path: path=/content}
    {languages=language: language=null}
    {tags=tagid: property=jcr:content/cq:tags, tagid=null}
    {mimeTypes=property: property=jcr:content/jcr:mimeType, value=null}
    {lastModified=daterange: property=jcr:content/cq:lastModified, lowerBound=null, upperBound=null}
    {orderByScore=orderby: orderby=@jcr:score, sort=desc}
    {13_group=group: [
        {group=group: or=true[
            {1_path=path: path=/content/website}
        ]}
    ]}
    {14_group=group: [
        {2_group=group: not=true, or=true[
            {1_path=path: path=/content/dam/otherwebsite/center/resource, self=true}
            {2_path=path: path=/content/otherwebsite/portal/authenticated, self=true}
        ]}
    ]}
    {15_group=group: not=true[
        {primaryType=property: property=jcr:content/jcr:primaryType, value=nt:unstructured}
    ]}
    {16_group=group: not=true[
        {hideInNav=property: property=jcr:content/hideInNav, value=true}
    ]}
    {nodeTypes=group: or=true[
        {type=type: type=cq:Page}
        {type=type: type=dam:Asset}
    ]}
]

16.05.2018 09:49:15.040 *DEBUG* [127.0.0.1 [1526482153050] GET /content/website/Search.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl xpath query:
(/jcr:root/content/website//element(*, cq:Page)[(jcr:contains(., 'Knights of Columbus (@KofC) | TwitterPlan de Mobilité d Entreprise - PDFAllt om Bilar – Sveriges största motorsajt | Expressen | Allt om BilarStandard Bank Online Banking - Search Results | TimeErrors | Developers') or jcr:contains(@jcr:title, 'Knights of Columbus (@KofC) | TwitterPlan de Mobilité d Entreprise - PDFAllt om Bilar – Sveriges största motorsajt | Expressen | Allt om BilarStandard Bank Online Banking - Search Results | TimeErrors | Developers') or jcr:contains(@jcr:description, 'Knights of Columbus (@KofC) | TwitterPlan de Mobilité d Entreprise - PDFAllt om Bilar – Sveriges största motorsajt | Expressen | Allt om BilarStandard Bank Online Banking - Search Results | TimeErrors | Developers')) and not(jcr:content/@jcr:primaryType = 'nt:unstructured') and not(jcr:content/@hideInNav = 'true')] | /jcr:root/content/website//element(*, dam:Asset)[(jcr:contains(., 'Knights of Columbus (@KofC) | TwitterPlan de Mobilité d Entreprise - PDFAllt om Bilar – Sveriges största motorsajt | Expressen | Allt om BilarStandard Bank Online Banking - Search Results | TimeErrors | Developers') or jcr:contains(@jcr:title, 'Knights of Columbus (@KofC) | TwitterPlan de Mobilité d Entreprise - PDFAllt om Bilar – Sveriges största motorsajt | Expressen | Allt om BilarStandard Bank Online Banking - Search Results | TimeErrors | Developers') or jcr:contains(@jcr:description, 'Knights of Columbus (@KofC) | TwitterPlan de Mobilité d Entreprise - PDFAllt om Bilar – Sveriges största motorsajt | Expressen | Allt om BilarStandard Bank Online Banking - Search Results | TimeErrors | Developers')) and not(jcr:content/@jcr:primaryType = 'nt:unstructured') and not(jcr:content/@hideInNav = 'true')])/rep:excerpt(.) order by @jcr:score descending, @jcr:score descending

16.05.2018 09:49:15.058 *DEBUG* [127.0.0.1 [1526482153050] GET /content/website/Search.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl xpath query creation took 33 ms

...

16.05.2018 10:19:47.524 *DEBUG* [127.0.0.1 [1526482153050] GET /content/website/Search.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl entire query execution took 1832501 ms

15473203
Level 2

Kunwar, have you had a chance to look at this?

MattDrees
Level 1

We've run into the same problem. In our case the search string was relatively short: классификация+ЮНКТАД .

MattDrees
Level 1

We figured out that, in our case, the problem was that we were first converting the user's string to a byte array (via UTF-8) and then converting it back to a string (via ASCII). The resulting garbage string was what we fed into SimpleSearch. It works if the original string contains only ASCII characters, but if not, the result is a live-lock of some kind. So it is probably a Jackrabbit/Lucene bug, in some sense, but we had bad code that triggered it.
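To make the mismatch concrete, here is a small sketch of that kind of conversion using plain JDK classes. The class name CharsetMismatchDemo and its helper are invented for the example and this is not our production code; it only shows that UTF-8 bytes decoded as ASCII turn into replacement characters, while decoding with the same charset that produced the bytes gives the original string back.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetMismatchDemo {

    // Hypothetical helper: encodes as UTF-8, then decodes with the given charset.
    static String encodeUtf8ThenDecode(String s, Charset decodeWith) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, decodeWith);
    }

    public static void main(String[] args) {
        // The six Cyrillic letters of the second search term from the post above.
        String term = "\u042e\u041d\u041a\u0422\u0410\u0414";

        // Wrong: bytes above 0x7F are not valid ASCII and become U+FFFD.
        String garbled = encodeUtf8ThenDecode(term, StandardCharsets.US_ASCII);
        System.out.println(garbled.equals(term));            // prints "false"
        System.out.println(garbled.indexOf('\uFFFD') >= 0);  // prints "true"

        // Right: decode with the charset the bytes were encoded in.
        String intact = encodeUtf8ThenDecode(term, StandardCharsets.UTF_8);
        System.out.println(intact.equals(term));              // prints "true"
    }
}
```

A pure-ASCII input survives both paths unchanged, which matches the observation that the bug only showed up for non-ASCII searches.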