Level 2

Solved

Best ways to query large AEM repository with query builder

Forum|Forum|2 years ago
June 5, 2023
6 replies
3859 views

Hello Everyone,

I'm looking for the best possible options to query AEM repository with query builder Query. I have a requirement to get all the assets from AEM DAM in JSON format. I have a big repository with Large assets of 400k in my AEM DAM instance.

AEM has a configurable limit on the number of nodes that can be visited in a single query run. By default, it is set to 100,000 nodes. I changed in System Console > OSGi Configuration > Apache Jackrabbit Query Engine Settings Service. The parameter name is queryLimitReads to 200k. I used the below query to fetch all assets under my site but the query is unable to run in AEM it throws error: "The query read or traversed more than 200000 nodes. To avoid affecting other tasks, processing was stopped."

p.hits=selective

p.limit=1000

p.offest=0

p.guessTotal=true

type=dam:Asset

path=/content/dam/mysite/

p.properties=jcr: content/cq:name

orderby=@jcr:path

Any help or suggestion on resolving the error would be really helpful.

Any alternative options if you have please let me know.

This post is no longer active and is closed to new replies. Need help? Start a new post to ask your question.

Best answer by EstebanBustamante

YES, it sounds to me that you can actually avoid the query if you really need all the assets' info. So, it is better to traverse the nodes for that, you could use one of the techniques I suggested. See below the best practice suggested in such scenarios

https://experienceleague.adobe.com/docs/experience-manager-65/deploying/practices/best-practices-for-queries-and-indexing.html?lang=en

EstebanBustamante

Community Advisor and Adobe Champion

Always avoid as much as you can query builder, if you need all the assets listed you could safely traverse the repository instead.

Do it in small chunks and using multithreading, if you can deliver small pieces the better otherwise you could assemble small output pieces in a separate operation.

For the multithreading, you could use Bulk Workflow Manager or Managed Controlled Process

Esteban Bustamante

A

ArunaSAuthor

Level 2

Thank you @estebanbustamante.

Do you mean, instead of querying all the assets we need to implement workflow or MCP process to crawl the repository to fetch all the assets?

EstebanBustamante

Accepted solution

Community Advisor and Adobe Champion

YES, it sounds to me that you can actually avoid the query if you really need all the assets' info. So, it is better to traverse the nodes for that, you could use one of the techniques I suggested. See below the best practice suggested in such scenarios

https://experienceleague.adobe.com/docs/experience-manager-65/deploying/practices/best-practices-for-queries-and-indexing.html?lang=en

Esteban Bustamante

ManviSharma

Adobe Employee

Hi,

When you encounter the error message "The query read or traversed more than 200000 nodes. To avoid affecting other tasks, processing was stopped," it indicates that the configured query limit of 200,000 nodes has been exceeded. This limit is in place to prevent excessive resource consumption and potential performance issues.

Your use case requires fetching a large number of assets and you have assessed the impact on system resources, you can try increasing the query limit further. However, keep in mind that this can potentially impact system performance, so it's important to test and monitor the system behavior after making such changes.

DPrakashRaj

Community Advisor

I think above issue occurs when the query is not using any indexing and it’s a traversal query. I believe jcr:content/cq:name is not a property in damAssetLucene indexing(available OOTB). If you have to specifically asked to add cq:name in query the you need to add indexes for that property in oak indexes. Once created the indexing you need to run the indexing so that it completes it then you should be able to get the query result.

you can also try with query that uses indexes. Aem provides you the console to check how optimize your query is. You can check there if it’s using any indexes or not

aanchal-sikka

Community Advisor

Hello @arunas

There are multiple areas where an improvement can be made:

Indexes: Assure that "jcr: content/cq:name" is indexed
Consider removing orderby, if you don't need it.
Use tree traversal to generate results. Since, you not filtering results, a tree traversal might be more apt.
- Then you can avoid touching other system params and affecting system performance in long run
The Json will be huge. Once you are able to compile the data, you will hit the block on the size of json returned. Consider breaking Json into multiple chunks. Then you would also be able to break query also in chunks
Caching: Consider caching the json, if this query is generated frequently.