I'm supposed to fetch data from a REST endpoint and convert it into AEM nodes. Basically there are multiple categories (max 10) and every category has approx. 1000-1200+ objects (which I need to convert to JCR nodes). I have read that I shouldn't save more than 1000 nodes under one node, and that is my major concern here. Once I save the data in AEM (as nodes), I'm supposed to do a full-text search, and I need to incorporate filters (approx. 7) along with it.
Could someone please help me define the best architecture for this requirement? I'm going to have a total of ~3000 nodes in the beginning (across all categories); the number may increase going forward (but won't exceed ~10000).
Should I save the nodes in AEM in the first place, or think about storing this data in an external DB?
If I save this as JCR nodes, how do I make sets of 1000 nodes (or fewer) for every category? What would be the most efficient way to save and utilize this data?
As you mentioned in your question, my first concern would be to determine whether the JCR is really the best place for this kind of data. If it's not AEM-related, then I think it should probably be stored elsewhere.
Judging by your mockup, it seems like you want to use AEM to create an interface for some CRUD actions or something like that. To me this seems more like a job for SQL, Mongo or another external database.
JCR is used in many CMS solutions, not just AEM (e.g. Magnolia, Bloomreach, etc.) for a number of reasons, but one of them is hidden in the name: JCR = Java Content Repository. It's the fact that JCR models hierarchical content that makes it such a good fit for a web content repository 🙂 Just like a relational DB (Oracle) would be a great way to model a library's book collection, or a graph DB (Neo4j) would be great to model a social network. So if the data you want to import is not hierarchical in nature and not related to AEM itself, then I would put it in a different place.
Regarding performance, there is no hard limit on the number of child nodes you can store under one node, but you will run into performance issues if you try to manipulate large numbers of sibling nodes, especially if they are ordered. Reading is not so problematic. You can find a more detailed explanation and some benchmark figures here.
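If you do end up keeping the data in the JCR, one common way to avoid very large flat lists of siblings is to put an extra "bucket" level between the category node and the items. Below is a minimal sketch of how such a path could be derived. It is plain Java with no JCR calls, and the root path /content/imported, the bucket count of 16, the "item-N" IDs and the class and method names are all assumptions for illustration, not anything AEM prescribes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: derives a bucketed JCR path for an imported item.
class BucketPaths {

    // Assumed bucket count; 16 buckets keeps ~1200 items per category
    // far below 1000 siblings per parent node.
    static final int BUCKETS = 16;

    // Builds <root>/<category>/<bucket>/<itemId>, where the bucket is a
    // two-digit hex value derived from the item ID's hash code.
    static String bucketedPath(String root, String category, String itemId) {
        // floorMod keeps the result non-negative even for negative hash codes.
        int bucket = Math.floorMod(itemId.hashCode(), BUCKETS);
        return String.format("%s/%s/%02x/%s", root, category, bucket, itemId);
    }

    public static void main(String[] args) {
        // Count how many items would land under each bucket node.
        Map<String, Integer> childrenPerParent = new HashMap<>();
        for (int i = 0; i < 1200; i++) {
            String path = bucketedPath("/content/imported", "category-a", "item-" + i);
            String parent = path.substring(0, path.lastIndexOf('/'));
            childrenPerParent.merge(parent, 1, Integer::sum);
        }
        int max = childrenPerParent.values().stream()
                .mapToInt(Integer::intValue).max().orElse(0);
        System.out.println("bucket nodes: " + childrenPerParent.size());
        System.out.println("largest bucket: " + max);
    }
}
```

Because the bucket is computed from the item ID alone, the same ID always maps to the same path, so a re-import can update an existing node instead of creating a duplicate. A query restricted to the category path would still cover all buckets underneath it.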