Boost configuration for jcr-sql2 querying with lucene index

Level 5

Good morning

 

I am modifying a query to boost results on pages when a "Master tag" is found in cq:tags or in a masterTag property. I am running AEM 6.5.14.0.

 

My (simplified) query is as follows:

 

 

 

SELECT * FROM [cq:Page] AS story
WHERE ISDESCENDANTNODE(story, '/content/test-uc/news/articles')
  AND (
    CONTAINS(story.[jcr:content/cq:tags], '"blog:college-prep/paying-for-college"^2 OR "blog:college-prep"^5 OR "blog:college-prep/choosing-a-major"^2')
    OR CONTAINS(story.[jcr:content/masterTag], '"blog:college-prep"^10')
  )
ORDER BY story.[jcr:score] DESC
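For anyone assembling this kind of constraint in code, here is a minimal sketch of how the boosted full-text expression could be built from a map of tag to boost. The class and method names (BoostedContains, fullTextExpression, contains) are hypothetical helpers for illustration, not part of the JCR or Oak API; only the tag IDs and boost values come from the query above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not a JCR/Oak class: it only assembles query text.
class BoostedContains {

    // Joins phrases into a full-text expression such as: "a"^2 OR "b"^5
    static String fullTextExpression(Map<String, Integer> boosts) {
        StringJoiner joiner = new StringJoiner(" OR ");
        for (Map.Entry<String, Integer> e : boosts.entrySet()) {
            joiner.add("\"" + e.getKey() + "\"^" + e.getValue());
        }
        return joiner.toString();
    }

    // Wraps the expression in a CONTAINS() constraint for one property.
    // Single quotes are doubled so the SQL2 string literal stays valid.
    static String contains(String selector, String property, Map<String, Integer> boosts) {
        String expression = fullTextExpression(boosts).replace("'", "''");
        return "CONTAINS(" + selector + ".[" + property + "], '" + expression + "')";
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the terms in insertion order.
        Map<String, Integer> masterTag = new LinkedHashMap<>();
        masterTag.put("blog:college-prep", 10);
        System.out.println(contains("story", "jcr:content/masterTag", masterTag));
    }
}
```

Keeping the boosts in a map makes it easier to experiment with different weights without editing the query string by hand.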

 

 

 

 

I have an index for these pages, and have checked the query performance diagnostic to ensure that the query is using this index. I have also reindexed several times to ensure that the index is up to date with the config.

 
newsSearch: {
  "jcr:primaryType": "oak:QueryIndexDefinition",
  "compatVersion": 2,
  "includedPaths": [
    "/content/uc/news/articles",
    "/content/test-uc/news/articles"
  ],
  "name": "News Search",
  "seed": 7725922112693290843,
  "type": "lucene",
  "async": "async",
  "evaluatePathRestrictions": true,
  "reindex": false,
  "reindexCount": 5,
  "indexRules": {
    "jcr:primaryType": "nt:unstructured",
    "cq:Page": {
      "jcr:primaryType": "nt:unstructured",
      "includePropertyTypes": "all",
      "properties": {
        "jcr:primaryType": "nt:unstructured",
        "jcrTitle": {
          "jcr:primaryType": "nt:unstructured",
          "ordered": false,
          "propertyIndex": true,
          "analyzed": true,
          "name": "jcr:content/jcr:title",
          "type": "String"
          },
        "author": {
          "jcr:primaryType": "nt:unstructured",
          "ordered": true,
          "propertyIndex": true,
          "name": "jcr:content/author",
          "type": "String"
          },
        "isEmergency": {
          "jcr:primaryType": "nt:unstructured",
          "nullCheckEnabled": true,
          "ordered": false,
          "propertyIndex": true,
          "name": "jcr:content/isEmergency",
          "type": "Boolean"
          },
        "isEvent": {
          "jcr:primaryType": "nt:unstructured",
          "nullCheckEnabled": true,
          "ordered": false,
          "propertyIndex": true,
          "name": "jcr:content/isEvent",
          "type": "Boolean"
          },
        "hideInNav": {
          "jcr:primaryType": "nt:unstructured",
          "nullCheckEnabled": true,
          "ordered": false,
          "propertyIndex": true,
          "name": "jcr:content/hideInNav",
          "type": "Boolean"
          },
        "slingresourceType": {
          "jcr:primaryType": "nt:unstructured",
          "ordered": false,
          "propertyIndex": true,
          "name": "jcr:content/sling:resourceType",
          "type": "String"
          },
        "nodeName": {
          "jcr:primaryType": "nt:unstructured",
          "ordered": true,
          "propertyIndex": true,
          "analyzed": true,
          "name": ":nodeName",
          "type": "String"
          },
        "dispDate": {
          "jcr:primaryType": "nt:unstructured",
          "ordered": true,
          "propertyIndex": true,
          "name": "jcr:content/dispDate",
          "type": "Date"
          },
        "cqModified": {
          "jcr:primaryType": "nt:unstructured",
          "ordered": false,
          "propertyIndex": true,
          "name": "jcr:content/jcr:lastModified",
          "type": "Date"
          },
        "cqTags": {
          "jcr:primaryType": "nt:unstructured",
          "nodeScopeIndex": true,
          "ordered": false,
          "propertyIndex": true,
          "analyzed": true,
          "name": "jcr:content/cq:tags",
          "type": "String"
          },
        "masterTag": {
          "jcr:primaryType": "nt:unstructured",
          "nodeScopeIndex": true,
          "ordered": false,
          "propertyIndex": true,
          "analyzed": true,
          "name": "jcr:content/masterTag",
          "type": "String"
          },
        "onTime": {
          "jcr:primaryType": "nt:unstructured",
          "ordered": false,
          "propertyIndex": true,
          "name": "jcr:content/onTime",
          "type": "Date"
          },
        "offTime": {
          "jcr:primaryType": "nt:unstructured",
          "ordered": false,
          "propertyIndex": true,
          "name": "jcr:content/offTime",
          "type": "Date"
          }
        }
      }
    }
  }

When I run the query, it finds the correct set of stories based on CONTAINS(), but none of the boosts in the query are applied. All of the results end up with the same score, as in this screenshot of the list component with my debug data exposed:

B_Stockwell_0-1671030936570.png

Based on the query above, with "College Prep" being queried as the masterTag, I believe that the #2 story should appear first (in addition to the scores being different).

I have never used boosting before, so I am thinking this is developer error, but I can't determine what I'm missing. My index has analyzed and nodeScopeIndex set to true for both cq:tags and masterTag, and Explain Query shows that full-text queries are run against the CONTAINS constraints in the query.

As an alternative to in-query boosting, I have also tried boosting the properties on the index, but no dice. I've combed through just about every piece of documentation and discussion from the last decade on this, and I've hit a wall. I would appreciate some advice.

3 Replies

Level 5

I made a little progress on this this morning.

 

I found that the reported 0.01 score was incorrect. For my debug output, I was reading the value with row.getScore(), but Oak does not implement this yet (https://github.com/apache/jackrabbit-oak/blob/c6ddcc55bee3de915459af01e91edad32d538f3d/oak-jcr/src/m...).

I modified my query to:

"SELECT story.[jcr:score]...."

and I modified how I read the score by using

row.getValue("story.jcr:score").getDouble();


This didn't affect the order, but I do get a more complete output at least:

B_Stockwell_0-1671039243512.png

At this point, I have tried modifying the boosts in the CONTAINS expressions, but this does not change the score values at all, so I am thinking they have no effect as I have things currently configured.

Community Advisor

Hi,

I think boosting requires configuration in the index rules:

 

It is also possible to configure a boost value for the nodes that match the index rule. The default boost value is 1.0. Higher boost values (a reasonable range is 1.0 - 5.0) will yield a higher score value and appear as more relevant.

<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <index-rule nodeType="nt:unstructured"
              boost="2.0">
    <property>Text</property>
  </index-rule>
</configuration>

 

If you do not wish to boost the complete node but only certain properties, you can also provide a boost value for the listed properties:

<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <index-rule nodeType="nt:unstructured">
    <property boost="3.0">Title</property>
    <property boost="1.5">Text</property>
  </index-rule>
</configuration>

 

I am no expert in queries, but I am sharing some inputs.



Arun Patidar

Level 5

Thanks, I'm getting closer.

 

I took a detour into trying JCR-JQOM, but it doesn't seem like that library is maintained (it really needs factory methods like factory.and(Constraint... constraints), but that's for a different discussion).

Anyway, here's where I landed with my query and index. This is good enough, as it returns the stories I'm looking for in the top few, even if they aren't in the exact order I want. Adding [oak:scoreExplanation] to my query helped me understand what was happening.

 

https://gist.github.com/stockwellbm/2d8e089444e1b7302bba14c60e561d52

 

I found that my scoring was actually working, but the first result was hitting a partial match on my master tag, and since it only has 2 tags, it was weighted much higher than a result with an exact match and 3 tags. I need this fuzziness because I need to fill pages with 5-10 related stories, based mostly on the tags (so a query with story.tag=a,b,c may return only 3 items, but I have 5 slots to fill in the front end). I could get the correct ordering of my stories with IN constraints, but I'd have to do a union with another query if it doesn't return enough results.
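The 2-tag versus 3-tag effect is consistent with field length normalization in classic Lucene TF-IDF scoring, where a match in a shorter field counts for more. The sketch below is a simplified illustration only: it assumes the index scores with that classic similarity, it ignores tf, idf, coord and queryNorm, and it treats each tag as one term (an analyzed tag may actually be split into several tokens). The class and method names are hypothetical.

```java
// Hypothetical illustration class, not part of Lucene or Oak.
class LengthNormSketch {

    // Classic TF-IDF length normalization: 1 / sqrt(number of terms in the field).
    static double lengthNorm(int termCount) {
        return 1.0 / Math.sqrt(termCount);
    }

    // Simplified contribution of one matching clause: boost times length norm.
    // Real scores include further factors that are deliberately left out here.
    static double simplifiedScore(double boost, int termCount) {
        return boost * lengthNorm(termCount);
    }

    public static void main(String[] args) {
        // Same boost, different number of tags on the page.
        System.out.println("2 tags: " + simplifiedScore(5.0, 2));
        System.out.println("3 tags: " + simplifiedScore(5.0, 3));
    }
}
```

Under these assumptions, a page with fewer tags gets a larger share of the score for the same boost, which matches what the [oak:scoreExplanation] output suggested.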

If anyone has any suggestions on how to fine-tune what's in my gist, then I'm all ears. Thanks.
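For reference, the IN-plus-second-query fallback mentioned above could be merged on the Java side rather than with a UNION. This is a minimal sketch, assuming both queries have already been run and reduced to ordered lists of page paths; the class name, method name and paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: merges two already-ordered result lists.
class RelatedStoryFill {

    // Keeps exact matches first, then tops up with fuzzy matches,
    // skipping duplicates, until the requested number of slots is filled.
    static List<String> fill(List<String> exact, List<String> fuzzy, int slots) {
        Set<String> merged = new LinkedHashSet<>(exact); // preserves insertion order
        merged.addAll(fuzzy);                            // duplicates are ignored
        List<String> result = new ArrayList<>(merged);
        return new ArrayList<>(result.subList(0, Math.min(slots, result.size())));
    }

    public static void main(String[] args) {
        List<String> exact = Arrays.asList("/content/a", "/content/b");
        List<String> fuzzy = Arrays.asList("/content/b", "/content/c", "/content/d");
        System.out.println(fill(exact, fuzzy, 3));
    }
}
```

The trade-off is a second query whenever the exact-match query comes up short, in exchange for a predictable order at the top of the list.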