Adobe Experience Manager Sites & More

ArunaS · 6/5/23

Hello Everyone,

I'm looking for the best way to query the AEM repository with QueryBuilder. I have a requirement to get all the assets from the AEM DAM in JSON format. The repository is large: my DAM instance holds about 400k assets.

AEM has a configurable limit on the number of nodes that a single query run may read. By default, it is set to 100,000 nodes. I raised it to 200k in System Console > OSGi Configuration > Apache Jackrabbit Query Engine Settings Service (the parameter name is queryLimitReads). I used the query below to fetch all assets under my site, but it fails in AEM with the error: "The query read or traversed more than 200000 nodes. To avoid affecting other tasks, processing was stopped."

p.hits=selective

p.limit=1000

p.offset=0

p.guessTotal=true

type=dam:Asset

path=/content/dam/mysite/

p.properties=jcr:content/cq:name

orderby=@jcr:path

Any help or suggestion on resolving the error would be really helpful.

If you know of any alternative approaches, please let me know.
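
For readers who want to try the predicates above page by page, here is a minimal sketch of building the request URLs for the QueryBuilder JSON servlet. The host `localhost:4502` and the helper names `page_url` and `page_urls` are assumptions for illustration, not part of the original post. Note that paging alone may not avoid the read limit, since ordering and deep offsets can still cause many nodes to be read.

```python
from urllib.parse import urlencode

# Assumed local author instance; /bin/querybuilder.json is the QueryBuilder JSON servlet.
BASE_URL = "http://localhost:4502/bin/querybuilder.json"


def page_url(offset, limit=1000, path="/content/dam/mysite"):
    """Build the request URL for one page of results (hypothetical helper)."""
    params = {
        "type": "dam:Asset",
        "path": path,
        "p.hits": "selective",
        "p.properties": "jcr:content/cq:name",
        "p.guessTotal": "true",
        "p.limit": limit,
        "p.offset": offset,
        "orderby": "@jcr:path",
    }
    return BASE_URL + "?" + urlencode(params)


def page_urls(total, limit=1000):
    """Return one URL per page needed to cover `total` results."""
    return [page_url(offset, limit) for offset in range(0, total, limit)]
```

Each URL can then be requested with any authenticated HTTP client, one page at a time.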

EstebanBustamante · 6/5/23

Avoid QueryBuilder as much as you can. If you need all the assets listed, you can safely traverse the repository instead.

Do it in small chunks and use multithreading. If you can deliver the output in small pieces, all the better; otherwise, you can assemble the small output pieces in a separate operation.

For the multithreading, you could use Bulk Workflow Manager or Managed Controlled Processes (MCP).

Esteban Bustamante
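
The traversal idea above can be sketched roughly as follows. This is only an illustration of the shape of the approach: on an AEM instance it would normally be written in Java against the repository APIs, and `list_children`, `FAKE_REPO`, and `fake_children` are hypothetical stand-ins, not AEM APIs. The walk visits folders one at a time and yields assets lazily, so no query (and no query read limit) is involved.

```python
from collections import deque


def iter_assets(root_path, list_children):
    """Yield the paths of dam:Asset nodes below root_path, breadth-first.

    list_children(path) is a hypothetical callable that returns
    (child_path, primary_type) tuples for the direct children of path.
    """
    queue = deque([root_path])
    while queue:
        path = queue.popleft()
        for child_path, primary_type in list_children(path):
            if primary_type == "dam:Asset":
                yield child_path          # an asset: report it, do not descend
            else:
                queue.append(child_path)  # a folder: visit it later


# Tiny in-memory stand-in for a repository, for illustration only.
FAKE_REPO = {
    "/content/dam/mysite": [
        ("/content/dam/mysite/a.png", "dam:Asset"),
        ("/content/dam/mysite/docs", "sling:Folder"),
    ],
    "/content/dam/mysite/docs": [
        ("/content/dam/mysite/docs/b.pdf", "dam:Asset"),
    ],
}


def fake_children(path):
    return FAKE_REPO.get(path, [])


print(list(iter_assets("/content/dam/mysite", fake_children)))
```

Because each folder is processed independently, top-level subfolders could be handed to separate workers, which is roughly the kind of splitting the multithreading suggestion refers to.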

ArunaS · 6/5/23

Thank you @EstebanBustamante.

Do you mean that, instead of querying for all the assets, we need to implement a workflow or an MCP process that crawls the repository to fetch them?

EstebanBustamante · 6/6/23

Yes. It sounds to me like you can avoid the query altogether if you really need the info for all the assets. In that case it is better to traverse the nodes, and you could use one of the techniques I suggested. See the best practice recommended for such scenarios below:

https://experienceleague.adobe.com/docs/experience-manager-65/deploying/practices/best-practices-for...

Esteban Bustamante

ArunaS · 6/8/23

Thanks @EstebanBustamante. Yes, MCP is really cool and works the way we need. I implemented it; however, it takes a long time to export the results into a spreadsheet. Your messages are helping me troubleshoot the issue.

ManviSharma · 6/5/23

Hi,

When you encounter the error message "The query read or traversed more than 200000 nodes. To avoid affecting other tasks, processing was stopped," it indicates that the configured query limit of 200,000 nodes has been exceeded. This limit is in place to prevent excessive resource consumption and potential performance issues.

If your use case requires fetching a large number of assets and you have assessed the impact on system resources, you can try increasing the query limit further. However, keep in mind that this can impact system performance, so it's important to test and monitor the system's behavior after making such changes.

DPrakashRaj · 6/5/23

I think this issue occurs when the query does not use an index and runs as a traversal query. I believe jcr:content/cq:name is not a property covered by the damAssetLucene index (available OOTB). If you specifically need cq:name in the query, then you need to add that property to an Oak index. Once the index definition is created, run the reindexing and let it complete; you should then be able to get the query result.

You can also try a query that uses the existing indexes. AEM provides a console to check how well optimized your query is. You can check there whether it is using an index or not.

aanchal-sikka · 6/5/23

Hello @ArunaS

There are multiple areas where an improvement can be made:

Indexes: Ensure that "jcr:content/cq:name" is indexed.
Consider removing orderby if you don't need it.
Use tree traversal to generate the results. Since you are not filtering the results, a tree traversal might be more apt.
- Then you can avoid touching other system parameters and affecting system performance in the long run.
The JSON will be huge. Once you are able to compile the data, you will hit the limit on the size of the JSON returned. Consider breaking the JSON into multiple chunks; then you would also be able to break the query into chunks.
Caching: Consider caching the JSON if this query is run frequently.

Aanchal Sikka
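
The suggestion above to break the JSON into multiple chunks could look something like this minimal sketch. The function name `json_chunks` and the record shape are assumptions for illustration; the records could come from a traversal or from paged query results.

```python
import json
from itertools import islice


def json_chunks(records, chunk_size):
    """Yield JSON array strings, each holding at most chunk_size records."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield json.dumps(chunk)


# Example: five hypothetical asset records split into chunks of two.
records = ({"path": f"/content/dam/mysite/asset-{i}"} for i in range(5))
for number, payload in enumerate(json_chunks(records, 2)):
    print(number, payload)
```

Since the input is consumed lazily, only one chunk is held in memory at a time, and each chunk can be written to its own file or response.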

Vaibhavi_J · 6/6/23

Hi @ArunaS ,

Make queries more efficient with indexing. You can also monitor query performance with the tool below.

https://experienceleague.adobe.com/docs/experience-manager-65/deploying/deploying/queries-and-indexi...
https://experienceleague.adobe.com/docs/experience-manager-65/deploying/practices/best-practices-for...

http://localhost:4502/libs/granite/operations/content/diagnosistools/queryPerformance.html

Jörg_Hoh · 6/8/23

Why don't you use the already existing metadata exporter tooling?

https://experienceleague.adobe.com/docs/experience-manager-learn/assets/metadata/metadata-import-exp...

ArunaS · 6/8/23

Thanks, @Jörg_Hoh for the response.

It really helps me export the data to CSV. However, I'm looking to automate this process rather than clicking through it manually.

Jörg_Hoh · 6/10/23

If you follow https://experienceleague.adobe.com/docs/experience-manager-cloud-service/content/assets/admin/metada... you should be able to capture the HTTP request, analyze its parameters, and then call it externally.

Or do you need the Java API to access this feature?