SOLVED

Is it recommended to store huge datasets in AEM?

Level 3

Hi Team,

Is it recommended to store huge datasets of non-web content (around 20,000 records) in AEM, or is it better to use an external database and store and retrieve the data through RESTful service calls?

Can you please provide the pros and cons of both approaches?


Regards, 
Akash

1 Accepted Solution

Correct answer by
Community Advisor

@akashkriz005 

Sharing a few related excerpts:

Oak scales to large number of direct child nodes of a node as long as those are not orderable. For orderable child nodes Oak keeps the order in an internal property, which will lead to a performance degradation when the list grows too large. For such scenarios Oak provides the oak:Unstructured node type, which is equivalent to nt:unstructured except that it is not orderable.

 

Reference: https://jackrabbit.apache.org/oak/docs/dos_and_donts.html

 

  • Reading nodes has no performance impact. But if the child nodes are orderable, the time to add or remove nodes degrades as the list grows. Also, browsing a large number of child nodes in the UI is slow, because the browser and JavaScript have to render all of them.

Ref: https://cqdump.joerghoh.de/2015/07/09/1000-nodes-per-folder-and-oak-orderable-nodes/ 
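
To get a feel for why orderable children get slower on writes, here is a toy model in plain Java. It is not Oak code: it only assumes that the child order sits in a single list-valued property that gets rewritten in full on every append, and the class and method names are made up for illustration.

```java
import java.util.Arrays;

public class OrderedChildrenCost {

    // Toy model only (an assumption, not Oak's implementation): the ordered
    // child list lives in one array, and each append rewrites the whole array.
    public static long copiesForOrderedAppends(int children) {
        String[] order = new String[0];
        long copied = 0;
        for (int i = 0; i < children; i++) {
            // Rewriting the "order property" copies every existing entry.
            String[] next = Arrays.copyOf(order, order.length + 1);
            copied += order.length;
            next[order.length] = "child-" + i;
            order = next;
        }
        return copied;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 20_000}) {
            System.out.println(n + " children -> "
                    + copiesForOrderedAppends(n) + " entries rewritten");
        }
    }
}
```

Under that assumption the total work grows roughly with the square of the number of children, which is the kind of degradation the excerpts describe. A non-orderable parent does not need to maintain such a list.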

 

 

 


Aanchal Sikka


5 Replies

Community Advisor

@akashkriz005

The immediate issue with storing anything like this is going to be data organization.

Based on Adobe's recommendations, if you have more than 1000 immediate child nodes under a single parent node, you will start to see performance issues when working with that data. You will have to organize these 20,000 records so that you don't end up in that situation.
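
One way to keep each parent well under that limit is to spread the records over intermediate "bucket" nodes. Below is a minimal sketch in plain Java, with no AEM or JCR calls. The root path /var/mydata/records, the bucket count of 256, and the class and method names are all assumptions made for illustration, not anything Adobe prescribes.

```java
import java.util.HashMap;
import java.util.Map;

public class RecordPaths {

    // Hypothetical root path; use whatever fits your content model.
    static final String ROOT = "/var/mydata/records";

    // Assumed bucket count: 20,000 records over 256 buckets averages
    // about 78 children per bucket node.
    static final int BUCKETS = 256;

    // Maps a record ID to ROOT/<two-hex-digit bucket>/<recordId>.
    public static String pathFor(String recordId) {
        int bucket = Math.floorMod(recordId.hashCode(), BUCKETS);
        return String.format("%s/%02x/%s", ROOT, bucket, recordId);
    }

    public static void main(String[] args) {
        // Count how many records land under each bucket node.
        Map<String, Integer> perBucket = new HashMap<>();
        for (int i = 0; i < 20_000; i++) {
            String path = pathFor("record-" + i);
            String parent = path.substring(0, path.lastIndexOf('/'));
            perBucket.merge(parent, 1, Integer::sum);
        }
        int largest = perBucket.values().stream()
                .mapToInt(Integer::intValue).max().orElse(0);
        System.out.println("buckets used: " + perBucket.size()
                + ", largest bucket: " + largest);
    }
}
```

Because the bucket is derived from the record ID, the same ID always resolves to the same path, so a lookup by ID does not need a query. If your records have a natural grouping (date, category, region), using that as the intermediate level may be easier to browse than hash buckets.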

The other issue is scalability. AEM's repository may not scale as efficiently as a dedicated database for handling large volumes of data.

The upside of storing this data in AEM is that you get access to AEM features like versioning, workflows, and permissions, which can be leveraged for this data. Also, in this case you don't need to build any REST services and integrate them with AEM. Since everything is in AEM, you save the network calls that you would otherwise make when using REST services.

Hope this helps.

 

 

 

Level 3

Thanks for the info, @Harwinder-singh.

Is there any supporting documentation from Adobe covering this scenario, i.e. that having more than 1000 immediate child nodes under a single parent node causes scalability and other issues?

Cheers !


Community Advisor

@akashkriz005 This is something that we ran into recently with our existing data model in AEM as a Cloud Service.

When we contacted Adobe support, their recommendation was not to have more than 1000 direct child nodes.

While this may not be a problem when reading the node data, it will impact query performance when you invoke a write operation in this content tree.

Administrator

Worth Checking this:



Kautuk Sahni