Hello Everyone,
I'm looking for the best way to query the AEM repository with Query Builder. I have a requirement to get all the assets from AEM DAM in JSON format. My AEM DAM instance is a large repository with about 400k assets.
AEM has a configurable limit on the number of nodes that can be read in a single query run. By default, it is set to 100,000 nodes. I raised it to 200k in System Console > OSGi Configuration > Apache Jackrabbit Query Engine Settings Service; the parameter name is queryLimitReads. I used the query below to fetch all assets under my site, but it fails in AEM with the error: "The query read or traversed more than 200000 nodes. To avoid affecting other tasks, processing was stopped."
p.hits=selective
p.limit=1000
p.offset=0
p.guessTotal=true
type=dam:Asset
path=/content/dam/mysite/
p.properties=jcr:content/cq:name
orderby=@jcr:path
Any help or suggestions on resolving the error would be really appreciated.
If you know of any alternative options, please let me know.
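For reference, here is a minimal sketch of how the predicates above could be sent page by page to the QueryBuilder JSON servlet at /bin/querybuilder.json. The host is an assumed local author instance, and the helper names `querybuilder_url` and `page_urls` are made up for illustration. It only builds the request URLs; authentication and the HTTP call itself are left out.

```python
from urllib.parse import urlencode

# Assumed local author instance; adjust for your environment.
AEM_HOST = "http://localhost:4502"


def querybuilder_url(offset, limit=1000, host=AEM_HOST):
    """Build the QueryBuilder JSON servlet URL for one page of results."""
    predicates = {
        "path": "/content/dam/mysite/",
        "type": "dam:Asset",
        "p.hits": "selective",
        "p.properties": "jcr:content/cq:name",
        "p.limit": limit,
        "p.offset": offset,
        "p.guessTotal": "true",
        "orderby": "@jcr:path",
    }
    # Keep ':', '/' and '@' unescaped so the URL stays readable.
    return f"{host}/bin/querybuilder.json?" + urlencode(predicates, safe=":/@")


def page_urls(total, limit=1000):
    """Return one URL per page needed to cover `total` results."""
    return [querybuilder_url(offset, limit) for offset in range(0, total, limit)]
```

Calling `page_urls(400000)` would give 400 URLs, one per page of 1000 results.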
Avoid Query Builder as much as you can. If you need all the assets listed, you can safely traverse the repository instead.
Do it in small chunks and use multithreading. If you can deliver the output in small pieces, so much the better; otherwise you can assemble the small output pieces in a separate operation.
For the multithreading, you could use Bulk Workflow Manager or Managed Controlled Processes (MCP).
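To illustrate the chunk-and-parallelize idea, here is a minimal sketch in Python. It is not how Bulk Workflow Manager or MCP work internally. It assumes the folder tree is already available as a nested dict shaped like Sling's JSON rendering of a node, where child nodes are nested objects and each carries a `jcr:primaryType`. The function names `collect_assets` and `export_in_chunks` are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def collect_assets(tree, base_path):
    """Walk a dict shaped like Sling's JSON rendering of a folder and
    return the paths of all dam:Asset nodes found beneath it."""
    found = []
    stack = [(base_path, tree)]
    while stack:
        path, node = stack.pop()
        if node.get("jcr:primaryType") == "dam:Asset":
            found.append(path)
            continue  # do not descend into the asset's own subtree
        for name, child in node.items():
            if isinstance(child, dict):  # child nodes are nested objects
                stack.append((f"{path}/{name}", child))
    return found


def export_in_chunks(root_path, root_tree, workers=4):
    """Treat each direct child of the root as one chunk and walk the
    chunks in parallel, then merge the partial results."""
    chunks = [
        (f"{root_path}/{name}", child)
        for name, child in root_tree.items()
        if isinstance(child, dict)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(lambda c: collect_assets(c[1], c[0]), chunks))
    return sorted(path for chunk in partial for path in chunk)
```

Each chunk yields its own small list, so the pieces can be written out separately or merged afterwards, as described above. Threads pay off mainly when each chunk involves I/O, such as reading from the repository.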
Thank you @EstebanBustamante.
Do you mean that, instead of querying all the assets, we need to implement a workflow or MCP process to crawl the repository and fetch all the assets?
Yes. It sounds to me like you can avoid the query altogether if you really need the info for all the assets. It is better to traverse the nodes for that, and you could use one of the techniques I suggested. See below for the best practice suggested in such scenarios.
Thanks @EstebanBustamante. Yes, MCP is really cool and works the way we need. I did implement this; however, it is taking a long time to export the results into a spreadsheet. Your messages helped me troubleshoot the issue.
Hi,
When you encounter the error message "The query read or traversed more than 200000 nodes. To avoid affecting other tasks, processing was stopped," it indicates that the configured query limit of 200,000 nodes has been exceeded. This limit is in place to prevent excessive resource consumption and potential performance issues.
If your use case requires fetching a large number of assets and you have assessed the impact on system resources, you can try increasing the query limit further. However, keep in mind that this can impact system performance, so it's important to test and monitor the system behavior after making such changes.
I think the above issue occurs when the query does not use an index and runs as a traversal query. I believe jcr:content/cq:name is not a property indexed by the OOTB damAssetLucene index. If you have specifically been asked to include cq:name in the query, then you need to add that property to the Oak index definition. Once the index definition is updated, you need to run a reindex and let it complete; then you should be able to get the query result.
You can also try a query that uses indexes. AEM provides a console to check how optimized your query is; you can check there whether it is using an index or not.
Hello @ArunaS
There are multiple areas where an improvement can be made:
Hi @ArunaS ,
Make queries more efficient with indexing. You can also monitor query performance with the tool linked below.
https://experienceleague.adobe.com/docs/experience-manager-65/deploying/deploying/queries-and-indexi...
https://experienceleague.adobe.com/docs/experience-manager-65/deploying/practices/best-practices-for...
http://localhost:4502/libs/granite/operations/content/diagnosistools/queryPerformance.html
Why don't you use the already existing metadata exporter tooling?
If you follow https://experienceleague.adobe.com/docs/experience-manager-cloud-service/content/assets/admin/metada... you should be able to capture the HTTP request, analyze its parameters, and then call it externally.
Or do you need the Java API to access this feature?