SOLVED

Optimizing Retrieving of Nodes from Iterator after Query

Hi Everyone,

We are in a situation where we have to perform up to 1000 queries per request to look up asset nodes. The queries we use are similar to this one:

SELECT * FROM [dam:Asset] As s WHERE ISDESCENDANTNODE ([/content/dam/msi-dam]) AND s.[jcr:content/metadata/cq:productReference] IN ("/etc/commerce/products/msi/smo/smot-wqsav-hon10mm")

With an index in place, the query execution itself seems to be quick (1-2 ms). Here's what the query performance monitor shows for these queries:

Query execution time: 2 ms

Get nodes time: 992 ms

Result node count time: 2421 ms

Number of nodes in result: 2

Things get slow when we start working with the nodes returned in the result. Here's an example of our code:

QueryResult result = query.execute();
@SuppressWarnings("unchecked")
Iterator<Node> iterator = result.getNodes();
while (iterator.hasNext()) {
    Node assetNode = iterator.next();
    String path = assetNode.getPath();
    if (!assetsUrls.contains(path)) {
        assetsUrls.add(path);
    }
}

Two lines of this code are what slow the whole process down, to a little more than 1 s for each run of this snippet. With 1000 queries, that eventually adds up to around 1200 s for the whole process. We understand that there are some limitations when accessing the repository this way, but we are really trying to find a way to optimize the process. In another piece of functionality we built, we were able to use one long query for all 1000 things we are searching for, but we cannot do that here unless there is a way to retrieve the number of resulting nodes without getting the nodes themselves. That would be very helpful as well.
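For readers wondering what the "one long query" idea could look like, here is a minimal sketch that builds a single JCR-SQL2 statement with an IN list covering many product paths. The class and method names (`ProductReferenceQuery`, `build`, `quote`) are hypothetical, the statement shape simply mirrors the query quoted above, and whether a statement with around 1000 values performs acceptably on a given index is not something this sketch can confirm.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: builds ONE statement for many product references
// instead of issuing one query per reference.
class ProductReferenceQuery {

    // Wraps a value in a single-quoted string literal,
    // doubling any embedded single quote to escape it.
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // damRoot and productPaths are supplied by the caller; nothing is queried here.
    static String build(String damRoot, List<String> productPaths) {
        if (productPaths.isEmpty()) {
            throw new IllegalArgumentException("at least one product path is required");
        }
        String inList = productPaths.stream()
                .map(ProductReferenceQuery::quote)
                .collect(Collectors.joining(", "));
        return "SELECT * FROM [dam:Asset] AS s"
                + " WHERE ISDESCENDANTNODE(s, [" + damRoot + "])"
                + " AND s.[jcr:content/metadata/cq:productReference] IN (" + inList + ")";
    }

    public static void main(String[] args) {
        System.out.println(build("/content/dam/msi-dam",
                List.of("/etc/commerce/products/msi/smo/smot-wqsav-hon10mm")));
    }
}
```

With a combined statement, the per-product grouping (and therefore the per-product count) would have to be done by the caller while iterating the result, for example by reading the reference property of each hit.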

My questions are: is there a way to optimize this whole process, and is there a way to retrieve the number of resulting nodes without accessing them?

We are using AEM 6.3.2.1

Thank you very much for your help in advance,

Bobby

1 Accepted Solution

Correct answer by
Employee Advisor

The query engine works lazily: it doesn't load all results when the query is executed, but only when you explicitly request them (via the iterator). This explains why the nodeIterator.next() call is actually an operation that can take a significant amount of time.
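To illustrate the lazy behaviour described above, here is a toy model in plain Java. It is not the repository's implementation: `loadNode` is a hypothetical stand-in for an expensive repository read, and the only point is that the cost is paid in `next()`, not when the result object is created.

```java
import java.util.Iterator;
import java.util.List;

// Toy model of a lazy result set (assumption: plain Java, no JCR classes involved).
class LazyResultDemo {

    // Counts how many simulated loads have happened so far.
    static int loads = 0;

    // Hypothetical stand-in for an expensive read of one result.
    static String loadNode(String path) {
        loads++;
        return "node:" + path;
    }

    // "Executes" the query: returns immediately, loading nothing yet.
    static Iterator<String> execute(List<String> matchingPaths) {
        Iterator<String> paths = matchingPaths.iterator();
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                return paths.hasNext();
            }

            @Override
            public String next() {
                // The expensive work happens here, once per requested element.
                return loadNode(paths.next());
            }
        };
    }

    public static void main(String[] args) {
        Iterator<String> result = execute(List.of("/content/dam/a", "/content/dam/b"));
        System.out.println("loads after execute: " + loads);   // prints 0
        while (result.hasNext()) {
            result.next();
        }
        System.out.println("loads after iteration: " + loads); // prints 2
    }
}
```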

1000 queries per request will never perform well; you should really change your approach! It looks like your query can be changed quite easily into a traversal, which will probably perform much better.

But even then I would definitely think about the content model. Looking at the query, assets link to products, and it seems to me that you want to display all assets which belong to a product. Why don't you add the references to the product and link all assets from there (actually reversing the relation)? Or create a folder for each product in DAM and place the related assets there. If you build your content model in a clever way, you can often reduce the amount of searching or querying and replace it with direct lookups at known locations.

Jörg

4 Replies

Community Advisor

Hi,

I am not sure whether the code below will help, but you can try it:

NodeIterator searchResults = query.execute().getNodes();
if (searchResults != null) {
    while (searchResults.hasNext()) {
        String path = searchResults.nextNode().getPath();
        if (!assetsUrls.contains(path)) {
            assetsUrls.add(path);
        }
    }
}



Arun Patidar

Hi Jorg,

Thank you very much for your reply. I was wondering if there's a way to retrieve the number of results without retrieving the nodes and thus speed it up a bit.

Unfortunately, we cannot display all assets related to a product directly, because there can be additional asset criteria in that query that I left out of the example. Can you give us more details on how to reference assets from products? Is it by adding a property like cq:productReference that holds the reverse relation?

Thank you again,

Bobby

Employee Advisor

As said, the query engine is lazy; that means you have to iterate through the whole result set to get the exact number.
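As a small sketch of what "iterate through the whole result set" means in code: the JCR `RangeIterator.getSize()` contract allows a return value of -1 when the size is unknown, so an exact count generally comes from exhausting the iterator. The helper below (`ResultCounter.count`, a hypothetical name) works on any `java.util.Iterator`; since `NodeIterator` and `RowIterator` are iterators too, a call such as `count(result.getRows())` should work the same way, although that part is not exercised here.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical helper: an exact count requires visiting every element.
class ResultCounter {

    // Walks the iterator to its end and returns how many elements it produced.
    // The iterator is consumed and cannot be reused afterwards.
    static long count(Iterator<?> results) {
        long n = 0;
        while (results.hasNext()) {
            results.next();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // A plain list stands in for a query result.
        Iterator<String> results = List.of("/content/dam/a", "/content/dam/b").iterator();
        System.out.println(count(results)); // prints 2
    }
}
```

Note that counting this way still pays the per-element cost of the lazy result, so it does not make the count cheaper than reading the results.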

Regarding your use case, your requirement is not a scenario unique to commerce. It's rather a pretty common case where you have a 1:n relation between entities. In the JCR world you would use the hierarchy and place all references as siblings below a common parent. When dealing with products this could look like this:

* /content/products/productA

* /content/dam/products/productA

In this case it's quite obvious where you need to search for the assets for productA. Of course this is a very simplified version and it's unlikely that you can apply it directly to your case. But I hope it illustrates that you have more options than just querying the repository to find your references.
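A minimal sketch of such a direct lookup, assuming the two roots are `/content/products` and `/content/dam/products` and that both trees use the same relative structure. The class and method names are hypothetical; the helper only computes where to look, and the caller would then read the children of that folder instead of running a query.

```java
// Hypothetical helper: derives the DAM folder of a product purely by convention.
class ProductAssetPaths {

    // Assumed roots, matching the simplified layout in the example above.
    static final String PRODUCT_ROOT = "/content/products";
    static final String DAM_PRODUCT_ROOT = "/content/dam/products";

    // Maps a product path to the DAM folder that holds its assets.
    static String assetFolderFor(String productPath) {
        if (!productPath.startsWith(PRODUCT_ROOT + "/")) {
            throw new IllegalArgumentException("not a product path: " + productPath);
        }
        // Keep the part below the product root and re-anchor it under the DAM root.
        return DAM_PRODUCT_ROOT + productPath.substring(PRODUCT_ROOT.length());
    }

    public static void main(String[] args) {
        // prints /content/dam/products/productA
        System.out.println(assetFolderFor("/content/products/productA"));
    }
}
```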

A more complex case would be one where it is not a 1:n relation but rather an n:m relation (assets are shared between multiple products). That's much harder to implement efficiently in the JCR, and using queries may be the only way to implement it.

(But in the end that's probably a very efficient way of misusing AEM ...)

Jörg