Solved

SimpleSearch with Unicode Characters Takes a Long Time or Crashes the Instance

May 14, 2018

We are using AEM 6.3 SP1 and have had cases where a single user running a SimpleSearch query containing Unicode characters brought down our AEM instances. A simple example is something like this:

import com.day.cq.search.Predicate;
import com.day.cq.search.SimpleSearch;
import com.day.cq.search.result.SearchResult;

SimpleSearch simpleSearch = resource.adaptTo(SimpleSearch.class); // resource: a Sling Resource
simpleSearch.setQuery("Knights of Columbus (@KofC) | TwitterPlan de Mobilité d Entreprise - PDFAllt om Bilar – Sveriges största motorsajt | Expressen | Allt om BilarStandard Bank Online Banking - Search Results | TimeErrors | Developers");
// Predicate(String type) creates a "path" predicate; its "path" parameter is set separately
Predicate p = new Predicate("path");
p.set("path", "/content");
simpleSearch.addPredicate(p);
SearchResult searchResult = simpleSearch.getResult(); // throws javax.jcr.RepositoryException

That is an example of a query that made one of our AEM nodes completely unresponsive. We never ran into this issue with AEM 5.6.1, but it has happened a few times in AEM 6.3 with various Unicode characters. We are currently working around it by stripping out all non-ASCII characters before running the search; with the Unicode characters removed, the queries are very fast.
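The stripping workaround mentioned above might look roughly like the sketch below. This is an assumption about how it could be done, not the poster's actual code: the class name is hypothetical, and collapsing the leftover whitespace is an extra step added for illustration.

```java
import java.util.regex.Pattern;

public class AsciiOnlyQuery {
    // Matches any character outside the 7-bit ASCII range (U+0000..U+007F).
    private static final Pattern NON_ASCII = Pattern.compile("[^\\x00-\\x7F]");

    // Hypothetical helper: drops non-ASCII characters, then collapses the
    // runs of whitespace left behind (for example, around a removed en dash).
    static String stripNonAscii(String query) {
        String ascii = NON_ASCII.matcher(query).replaceAll("");
        return ascii.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String raw = "Plan de Mobilit\u00e9 d Entreprise - Allt om Bilar \u2013 Sveriges st\u00f6rsta motorsajt";
        System.out.println(stripNonAscii(raw));
        // prints: Plan de Mobilit d Entreprise - Allt om Bilar Sveriges strsta motorsajt
    }
}
```

Note that this destroys information ("Mobilité" becomes "Mobilit"), so it is a stopgap rather than a fix.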

Is there a permanent fix for this?

Edit: Somewhat off topic, but if anyone can tell me how to format code so it shows on multiple lines, I would appreciate it.

kautuk_sahni

Accepted solution

Community Manager

Hi,

Please have a look at the SimpleSearch API documentation ("The Adobe AEM Quickstart and Web Application").

void setQuery(String query) accepts a String. Try putting the Unicode characters directly into string literals in the code by escaping them with \u.

// The Danish letters Æ Ø Å

String myString = "\u00C6\u00D8\u00C5";

Kautuk Sahni

15473203 (Author)

Level 2

That does seem to work better, but how would I achieve this without a String literal? The input comes from an HTTP query parameter, so I can't just use a String literal like that.
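Some background on the question above: a \u escape is translated by the Java compiler before the source is parsed, so it only affects string literals in source code. A String that arrives at runtime (for example, a decoded request parameter) already holds the same UTF-16 characters, so there is nothing to escape. The minimal sketch below builds Æ Ø Å both ways; the class and helper names are hypothetical.

```java
public class UnicodeEscapeDemo {
    // Hypothetical helper: builds a String at runtime from numeric code points,
    // standing in for text that arrives at runtime rather than from a literal.
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        // The compiler replaces each escape below with the character it names.
        String literal = "\u00C6\u00D8\u00C5"; // Danish letters AE, O-slash, A-ring
        String runtime = fromCodePoints(0xC6, 0xD8, 0xC5);

        System.out.println(literal.equals(runtime)); // true
        System.out.println(literal.length());        // 3
    }
}
```

In other words, escaping changes how a literal is written in the source file, not what setQuery receives.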

Kunwarsaluja

Adobe Employee

Can you enable debug logging for the search and share the query that is generated at the end? Maybe we can tune the index definitions to make it perform better.

15473203 (Author)

Level 2

I thought the Unicode escape encoding was helping, but I later found that it was doing some odd String conversions and ended up turning the Unicode characters into question marks, so it wasn't actually helping the way I thought.
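The post doesn't show the conversion that produced the question marks, so this is an assumption: a common cause is encoding text with a charset that cannot represent it. String.getBytes(Charset) replaces every unmappable character with the charset's replacement byte, which for US-ASCII is '?'. A minimal sketch (class and helper names are hypothetical):

```java
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {
    // Hypothetical helper: encodes to US-ASCII and decodes again.
    // getBytes(Charset) silently replaces each character US-ASCII cannot
    // represent with '?', so the original character is lost for good.
    static String throughAscii(String s) {
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(throughAscii("Mobilit\u00e9"));                       // Mobilit?
        System.out.println(throughAscii("Bilar \u2013 Sveriges st\u00f6rsta")); // Bilar ? Sveriges st?rsta
    }
}
```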

This particular search seems to break AEM in multiple ways: one is the Unicode characters, and the other seems to be the sheer number of words in the search. Even after removing the Unicode characters, I have to cut it down to "Knights of Columbus KofC TwitterPlan de Mobilit d Entreprise - PDFAllt om Bilar Sveriges strsta motorsajt Expressen Allt om BilarStandard Bank" before the search stops taking down the instance.
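Since shortening the query helped even after the Unicode characters were gone, one hypothetical mitigation is to cap the number of whitespace-separated terms before calling setQuery. This is only a sketch: the helper name is made up, and the cap passed in is an arbitrary assumption, not a measured safe threshold for AEM.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class TermLimiter {
    // Hypothetical helper: keeps at most maxTerms whitespace-separated terms.
    static String limitTerms(String query, int maxTerms) {
        String trimmed = query.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        return Arrays.stream(trimmed.split("\\s+"))
                .limit(maxTerms)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(limitTerms("Knights of Columbus KofC TwitterPlan de", 3));
        // prints: Knights of Columbus
    }
}
```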

16.05.2018 09:49:15.019 *INFO* [127.0.0.1 [1526482153050] GET /content/website/Search.html HTTP/1.1] com.day.cq.search.ext.impl.SimpleSearchImpl SimpleSearch is searching with the types: [cq:Page, dam:Asset]
16.05.2018 09:49:15.023 *DEBUG* [127.0.0.1 [1526482153050] GET /content/website/Search.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl executing query (URL): 13_group.group.1_path=%2fcontent%2fwebsite&13_group.group.p.or=true&14_group.2_group.1_path=%2fcontent%2fdam%2fotherwebsite%2fcenter%2fresource&14_group.2_group.1_path.self=true&14_group.2_group.2_path=%2fcontent%2fotherwebsite%2fportal%2fauthenticated&14_group.2_group.2_path.self=true&14_group.2_group.p.not=true&14_group.2_group.p.or=true&15_group.p.not=true&15_group.primaryType=jcr%3acontent%2fjcr%3aprimaryType&15_group.primaryType.value=nt%3aunstructured&16_group.hideInNav=jcr%3acontent%2fhideInNav&16_group.hideInNav.value=true&16_group.p.not=true&group.0_fulltext=Knights%20of%20Columbus%20(%40KofC)%20%7c%20TwitterPlan%20de%20Mobilit%c3%a9%20d%20Entreprise%20-%20PDFAllt%20om%20Bilar%20%e2%80%93%20Sveriges%20st%c3%b6rsta%20motorsajt%20%7c%20Expressen%20%7c%20Allt%20om%20BilarStandard%20Bank%20Online%20Banking%20-%20Search%20Results%20%7c%20TimeErrors%20%7c%20Developers&group.0_fulltext.relPath=&group.1_fulltext=Knights%20of%20Columbus%20(%40KofC)%20%7c%20TwitterPlan%20de%20Mobilit%c3%a9%20d%20Entreprise%20-%20PDFAllt%20om%20Bilar%20%e2%80%93%20Sveriges%20st%c3%b6rsta%20motorsajt%20%7c%20Expressen%20%7c%20Allt%20om%20BilarStandard%20Bank%20Online%20Banking%20-%20Search%20Results%20%7c%20TimeErrors%20%7c%20Developers&group.1_fulltext.relPath=%40jcr%3atitle&group.2_fulltext=Knights%20of%20Columbus%20(%40KofC)%20%7c%20TwitterPlan%20de%20Mobilit%c3%a9%20d%20Entreprise%20-%20PDFAllt%20om%20Bilar%20%e2%80%93%20Sveriges%20st%c3%b6rsta%20motorsajt%20%7c%20Expressen%20%7c%20Allt%20om%20BilarStandard%20Bank%20Online%20Banking%20-%20Search%20Results%20%7c%20TimeErrors%20%7c%20Developers&group.2_fulltext.relPath=%40jcr%3adescription&group.p.or=true&languages=&lastModified.lowerBound=&lastModified.property=jcr%3acontent%2fcq%3alastModified&lastModified.upperBound=&mimeTypes=jcr%3acontent%2fjcr%3amimeType&mimeTypes.value=&nodeTypes.p.or=true&nodeTypes.type=dam%3aAsset&orderByScore=%40jcr%3ascore&orderByScore.sort=desc&p.excerpt=true&p.limit=10&p.offset=0&path=%2fcontent&tags=&tags.property=jcr%3acontent%2fcq%3atags
16.05.2018 09:49:15.023 *DEBUG* [127.0.0.1 [1526482153050] GET /content/website/Search.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl executing query (predicate tree): ROOT=group: limit=10, offset=0, excerpt=true[
    {group=group: or=true[
        {0_fulltext=fulltext: fulltext=Knights of Columbus (@KofC) | TwitterPlan de Mobilité d Entreprise - PDFAllt om Bilar – Sveriges största motorsajt | Expressen | Allt om BilarStandard Bank Online Banking - Search Results | TimeErrors | Developers, relPath=}
        {1_fulltext=fulltext: fulltext=Knights of Columbus (@KofC) | TwitterPlan de Mobilité d Entreprise - PDFAllt om Bilar – Sveriges största motorsajt | Expressen | Allt om BilarStandard Bank Online Banking - Search Results | TimeErrors | Developers, relPath=@jcr:title}
        {2_fulltext=fulltext: fulltext=Knights of Columbus (@KofC) | TwitterPlan de Mobilité d Entreprise - PDFAllt om Bilar – Sveriges största motorsajt | Expressen | Allt om BilarStandard Bank Online Banking - Search Results | TimeErrors | Developers, relPath=@jcr:description}
    ]}
    {path=path: path=/content}
    {languages=language: language=null}
    {tags=tagid: property=jcr:content/cq:tags, tagid=null}
    {mimeTypes=property: property=jcr:content/jcr:mimeType, value=null}
    {lastModified=daterange: property=jcr:content/cq:lastModified, lowerBound=null, upperBound=null}
    {orderByScore=orderby: orderby=@jcr:score, sort=desc}
    {13_group=group: [
        {group=group: or=true[
            {1_path=path: path=/content/website}
        ]}
    ]}
    {14_group=group: [
        {2_group=group: not=true, or=true[
            {1_path=path: path=/content/dam/otherwebsite/center/resource, self=true}
            {2_path=path: path=/content/otherwebsite/portal/authenticated, self=true}
        ]}
    ]}
    {15_group=group: not=true[
        {primaryType=property: property=jcr:content/jcr:primaryType, value=nt:unstructured}
    ]}
    {16_group=group: not=true[
        {hideInNav=property: property=jcr:content/hideInNav, value=true}
    ]}
    {nodeTypes=group: or=true[
        {type=type: type=cq:Page}
        {type=type: type=dam:Asset}
    ]}
]
16.05.2018 09:49:15.040 *DEBUG* [127.0.0.1 [1526482153050] GET /content/website/Search.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl xpath query: (/jcr:root/content/website//element(*, cq:Page)[(jcr:contains(., 'Knights of Columbus (@KofC) | TwitterPlan de Mobilité d Entreprise - PDFAllt om Bilar – Sveriges största motorsajt | Expressen | Allt om BilarStandard Bank Online Banking - Search Results | TimeErrors | Developers') or jcr:contains(@jcr:title, 'Knights of Columbus (@KofC) | TwitterPlan de Mobilité d Entreprise - PDFAllt om Bilar – Sveriges största motorsajt | Expressen | Allt om BilarStandard Bank Online Banking - Search Results | TimeErrors | Developers') or jcr:contains(@jcr:description, 'Knights of Columbus (@KofC) | TwitterPlan de Mobilité d Entreprise - PDFAllt om Bilar – Sveriges största motorsajt | Expressen | Allt om BilarStandard Bank Online Banking - Search Results | TimeErrors | Developers')) and not(jcr:content/@jcr:primaryType = 'nt:unstructured') and not(jcr:content/@hideInNav = 'true')] | /jcr:root/content/website//element(*, dam:Asset)[(jcr:contains(., 'Knights of Columbus (@KofC) | TwitterPlan de Mobilité d Entreprise - PDFAllt om Bilar – Sveriges största motorsajt | Expressen | Allt om BilarStandard Bank Online Banking - Search Results | TimeErrors | Developers') or jcr:contains(@jcr:title, 'Knights of Columbus (@KofC) | TwitterPlan de Mobilité d Entreprise - PDFAllt om Bilar – Sveriges största motorsajt | Expressen | Allt om BilarStandard Bank Online Banking - Search Results | TimeErrors | Developers') or jcr:contains(@jcr:description, 'Knights of Columbus (@KofC) | TwitterPlan de Mobilité d Entreprise - PDFAllt om Bilar – Sveriges största motorsajt | Expressen | Allt om BilarStandard Bank Online Banking - Search Results | TimeErrors | Developers')) and not(jcr:content/@jcr:primaryType = 'nt:unstructured') and not(jcr:content/@hideInNav = 'true')])/rep:excerpt(.) order by @jcr:score descending, @jcr:score descending
16.05.2018 09:49:15.058 *DEBUG* [127.0.0.1 [1526482153050] GET /content/website/Search.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl xpath query creation took 33 ms
...
16.05.2018 10:19:47.524 *DEBUG* [127.0.0.1 [1526482153050] GET /content/website/Search.html HTTP/1.1] com.day.cq.search.impl.builder.QueryImpl entire query execution took 1832501 ms

15473203 (Author)

Level 2

Kunwar, have you had a chance to look at this?

MatthewDr1

We've run into the same problem. In our case the search string was relatively short: классификация+ЮНКТАД (Russian for "classification+UNCTAD").

MatthewDr1

We figured out that, in our case, the problem was that we were first converting the user's string to a byte array (as UTF-8) and then converting it back to a string (as ASCII). The resulting garbage string was what we fed into SimpleSearch. This works if the original string contains only ASCII characters, but if it doesn't, the result is some kind of livelock. So it is probably a Jackrabbit/Lucene bug in some sense, but it was our bad code that triggered it.
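The round trip described above can be reproduced in plain Java. This is a sketch, not the poster's code (class and helper names are hypothetical). Decoding UTF-8 bytes as US-ASCII turns every byte of a non-ASCII character into the replacement character U+FFFD, so a Cyrillic query comes out as a run of replacement characters, while pure ASCII input survives unchanged. Decoding with the same charset used for encoding, or skipping the byte round trip entirely, avoids producing the garbage string.

```java
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    // Buggy: encode as UTF-8, then decode those bytes as US-ASCII.
    static String utf8ThenAscii(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.US_ASCII);
    }

    // Correct: decode with the same charset that was used to encode.
    static String utf8ThenUtf8(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String cyrillic = "\u042E\u041D\u041A\u0422\u0410\u0414"; // UNCTAD in Cyrillic
        String garbled = utf8ThenAscii(cyrillic);

        // Every UTF-8 byte of a Cyrillic letter is >= 0x80, so each decodes to U+FFFD.
        System.out.println(garbled.chars().allMatch(c -> c == 0xFFFD)); // true
        System.out.println(utf8ThenUtf8(cyrillic).equals(cyrillic));    // true
        System.out.println(utf8ThenAscii("KofC").equals("KofC"));       // true
    }
}
```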

smacdonald2008

Level 10

Thanks for posting the information!