AEM fulltext search result order

codingStar
Level 3

16-10-2019

I am working on the following scenario:

  1. We have a few PDF files in DAM.
  2. The user searches for a keyword, and every PDF that contains the keyword is shown in the result list.

I was able to achieve this with a fulltext search in DAM. Below is the query:

SELECT * FROM [dam:Asset] AS a WHERE CONTAINS(a.*, '" + searchKeyword+ "') AND [jcr:path] like '/content/dam/mywebsitefolder/%'
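
The query above splices searchKeyword straight into the statement, so a keyword containing a single quote would break it. Below is a minimal sketch of building the same statement with the quote doubled, which is how JCR-SQL2 string literals escape it. It is written in Python purely for illustration, and build_fulltext_query is a hypothetical helper, not part of AEM or Oak:

```python
def build_fulltext_query(search_keyword: str, root_path: str) -> str:
    """Build the fulltext statement from the question as a string.

    Hypothetical helper for illustration only. Single quotes are doubled
    so that user input cannot terminate the string literal early.
    """
    escaped_keyword = search_keyword.replace("'", "''")
    # Drop a trailing slash so the LIKE pattern always ends in exactly "/%".
    escaped_path = root_path.rstrip("/").replace("'", "''")
    return (
        "SELECT * FROM [dam:Asset] AS a "
        f"WHERE CONTAINS(a.*, '{escaped_keyword}') "
        f"AND [jcr:path] LIKE '{escaped_path}/%'"
    )


if __name__ == "__main__":
    print(build_fulltext_query("world", "/content/dam/mywebsitefolder"))
```

The same idea carries over to Java: escape the keyword before concatenating it, or use bind variables if the query API in use supports them.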

The next requirement is:

  3. Sort the result list by the number of occurrences of "searchKeyword" found in each PDF.

For example, I have 3 PDFs in DAM named mypdf-1.pdf, mypdf-2.pdf and mypdf-3.pdf:

PDF name      PDF content text
mypdf-1.pdf   world
mypdf-2.pdf   world world world
mypdf-3.pdf   world world

If I search for "world", the results should be ordered as follows:

/content/dam/mywebsitefolder/mypdf-2.pdf

/content/dam/mywebsitefolder/mypdf-3.pdf

/content/dam/mywebsitefolder/mypdf-1.pdf

Can you please share how I should write the query to get the results in the order shown above?

Accepted Solutions (1)

sunjot16
Employee

22-10-2019

You can add a boost in your index rule, as described here:

Jackrabbit Oak – Lucene Index

You could also add a Search Boost to the AEM asset itself:

Search Boost

Answers (6)

-ash
Employee

20-10-2019

Hi,

The default ordering is by relevance, so you don't have to do anything explicitly.

But "relevance" is calculated in a more elaborate way than just counting how often the word occurs in each document.

The Javadoc for TFIDFSimilarity (Lucene 7.6.0 API) might give you a glimpse of what is happening behind the scenes. There is also a Wikipedia article, "tf–idf", that explains the very basics.

What you have experienced in your test case might be due to normalization: relevance is not based on the raw term frequency but on the term frequency divided by the document length, so that shorter documents also have a chance to be relevant.

That means your three test documents have normalized frequencies of 1/1, 2/2 and 3/3, which all equal 1, and thus the order seems random.
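
Here is a small sketch of the simplified normalization described above (term frequency divided by document length), applied to the three test PDFs from the question. It is not Lucene's actual scoring, which also involves inverse document frequency and damping of the raw counts; normalized_tf is a hypothetical helper used only for illustration:

```python
def normalized_tf(text: str, term: str) -> float:
    """Term frequency divided by document length (simplified illustration)."""
    words = text.lower().split()
    if not words:
        return 0.0
    return words.count(term.lower()) / len(words)


# Contents of the three test PDFs from the question.
test_docs = {
    "mypdf-1.pdf": "world",
    "mypdf-2.pdf": "world world world",
    "mypdf-3.pdf": "world world",
}

scores = {name: normalized_tf(text, "world") for name, text in test_docs.items()}

if __name__ == "__main__":
    # All three documents get the same score, so they tie.
    for name, score in scores.items():
        print(name, score)
```

With documents that contain other words as well, the scores start to differ: a three-word document that mentions "world" once scores 1/3 under this simplified measure.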

If you want to validate the query, I propose you test with real-world examples.

-ash
Employee

21-10-2019

exactly 🙂

codingStar
Level 3

21-10-2019

-ash, do you mean that I don't need to add any extra parameter to my query (shown below) to get results from DAM in relevance order, whether the file is a .docx or a .pdf?

SELECT * FROM [dam:Asset] AS a WHERE CONTAINS(a.*, '" + searchKeyword+ "') AND [jcr:path] like '/content/dam/mywebsitefolder/%'

Bharath_valse
Level 4

18-10-2019

This one's a tricky requirement. I believe it can be achieved via a custom predicate [0], where the sorting has to happen based on the number of occurrences (count) of the search term. Here's another forum thread [1] that is somewhat similar, but for pages, where the requirement was to find a search term that occurs at least twice.
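
To picture the sorting step that such a custom evaluator would need, here is a rough sketch in Python. It assumes the extracted text of each asset is already available as a string (the dictionary below is just the test data from the question), and rank_by_occurrences is a hypothetical helper, not a Query Builder API:

```python
import re


def count_occurrences(text: str, keyword: str) -> int:
    """Count whole-word, case-insensitive matches of keyword in text."""
    # The \b anchors assume the keyword starts and ends with a word character.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))


def rank_by_occurrences(text_by_path: dict, keyword: str) -> list:
    """Return the matching paths, most occurrences first (ties by path)."""
    counts = {path: count_occurrences(text, keyword)
              for path, text in text_by_path.items()}
    hits = [path for path, count in counts.items() if count > 0]
    return sorted(hits, key=lambda path: (-counts[path], path))


# Test data taken from the question; real text would come from the PDFs.
extracted_text = {
    "/content/dam/mywebsitefolder/mypdf-1.pdf": "world",
    "/content/dam/mywebsitefolder/mypdf-2.pdf": "world world world",
    "/content/dam/mywebsitefolder/mypdf-3.pdf": "world world",
}

if __name__ == "__main__":
    for path in rank_by_occurrences(extracted_text, "world"):
        print(path)
```

Re-ranking like this happens after the query has run, so it only makes sense for result sets small enough to post-process.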

Another thought on the requirement itself: relevance is hard to derive from a single search term. However, you could try using boosts [2] for the index, similar to the example below. Hope this helps!

jcr:contains(., 'jelly sandwich^4') 
In this example, the word "sandwich" has weight four times more than the word "jelly."

[0] Implementing a Custom Predicate Evaluator for the Query Builder

[1] How to use QueryBuilder API to search a keyword a minimum of 2 times in the Page content.

[2] Use Boosts | Indexing time and query runtime

codingStar
Level 3

18-10-2019

This is not working.

Let me rephrase my question:

I want to show the most relevant file first, then the next most relevant, and so on.

Suppose the PDFs contain thousands of words and only a few of them match the keyword 'world'. I want to order the list so that the first file is the one with the most matches.

jbrar
Employee

16-10-2019

I believe the query is using the damAssetLucene index. You can add the property ordered=true to make the index ordered. From the Oak documentation [1]:

ordered

If the property is to be used in an order by clause to perform sorting, then this should be set to true. This should be set to true only if the property is to be used to perform sorting, as it increases the index size. Example:

  • //element(*, app:Asset)[jcr:contains(type, 'image')] order by @size
  • //element(*, app:Asset)[jcr:contains(type, 'image')] order by jcr:content/@jcr:lastModified

[1] Jackrabbit Oak – Lucene Index