SOLVED

AEM support for 1000+ JCR nodes - user profiles/products

Level 5

Please share your thoughts on these questions:

1) I see a recommendation for JCR content models to keep the number of child nodes under 1,000 per parent. Does the same apply to JCR user profile nodes? If we need to support a public site that can have 300K end users, does AEM support 300K JCR user nodes (while respecting the 1,000-node limit and keeping good performance), assuming these users are either provisioned all at once (existing users) or created after custom authentication (new users)?

2) Can we also assume this limitation (1,000+ JCR child nodes) applies irrespective of TarMK or MongoMK storage, given that it concerns JCR API-based content access, which is common to both?

3) If we need to use Personalization / AEM Communities with end-user ACLs, these user profiles would need to be reverse-replicated to author and synchronized across publish clusters. Are there any possible or known risks in maintaining end-user profiles in AEM?

4) If there is a limitation, please let me know the maximum number of user profile nodes / content (product) nodes supported in AEM with CRX2/CRX3 (Oak repository).

3) If we need to support an e-commerce portal, what are the limitations on product nodes and the possible risks, assuming author syncs with a PIM to import/create product nodes, which could exceed 150K?

4) Even if we assume that products and UGC can be bucketed by some timestamp, how do we decide on a structure for 300K user profile nodes, which may vary widely and might not fit any particular bucketing pattern?

5) Overall, I'm looking for best practices for end-user profile nodes and product data nodes within AEM.
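For the bucketing question above, one common approach for nodes that lack a natural key (an assumption on our part, not something prescribed in this thread) is to derive intermediate folders from a stable hash of the node name, while time-stamped content such as UGC can go into date folders. This is a minimal sketch; the root paths `/home/users/example` and `/var/example/ugc` and the two-level hex layout are hypothetical.

```python
import hashlib
from datetime import date


def hashed_bucket_path(root: str, node_name: str, levels: int = 2, width: int = 2) -> str:
    """Spread nodes over intermediate folders derived from a stable hash.

    With levels=2 and width=2 (hex digits), there are 256 * 256 = 65,536 leaf
    folders, so 300K nodes average about 4 to 5 per leaf folder.
    """
    digest = hashlib.sha1(node_name.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return "/".join([root.rstrip("/"), *parts, node_name])


def dated_bucket_path(root: str, created: date, node_name: str) -> str:
    """Bucket time-stamped content (e.g. UGC) into year/month/day folders."""
    return f"{root.rstrip('/')}/{created:%Y/%m/%d}/{node_name}"


if __name__ == "__main__":
    print(hashed_bucket_path("/home/users/example", "jdoe"))
    print(dated_bucket_path("/var/example/ugc", date(2015, 6, 12), "comment-1"))
```

The hash gives an even spread regardless of how user IDs are distributed, and the same ID always maps to the same path, so a profile can be located without a query.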

 
Employee
  1. OAK should be able to manage 300K users . By default, it will create 300K users in 64 sub trees. So it wil be some 5000 users in a folder.for ex.,(/home/users/(A-Z,a-z) etc.,
  2. This 1000 node limitation is only if the parent node can have only ordered child nodes. This is irrespective of tarmk or mongomk. The rep:authorizable fodler(the immediate folder under/home/users ) can have unordered child nodes. So it doesnt have that problem
  3. As always, do not store any PII info. Do not store and sensitive info.
  4. As mentioned in 2., as long as the folder can have unordered child nodes(for ex., nt:unstructured will only have ordered child nodes where as oak:unstructured can have unordered child nodes), we wont have this 1000 node children limitation
  5. Please use unordered node types.
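To sanity-check the arithmetic in point 1: 300,000 / 64 ≈ 4,688 users per folder, in line with the "roughly 5,000" estimate. The sketch below assumes (this is an assumption, not Oak's actual algorithm) that user node names are random strings over a hypothetical 64-character alphabet and that the intermediate folder is simply the first character of the name, then counts how many users land in each folder.

```python
import random
import string
from collections import Counter

# Hypothetical 64-character alphabet: A-Z, a-z, 0-9, plus "-" and "_".
ALPHABET = string.ascii_letters + string.digits + "-_"


def random_node_name(rng: random.Random, length: int = 8) -> str:
    """Generate a random node name over ALPHABET (illustration only)."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def folder_sizes(user_count: int, seed: int = 42) -> Counter:
    """Count users per first-character folder, i.e. /home/users/<c>/<name>."""
    rng = random.Random(seed)
    return Counter(random_node_name(rng)[0] for _ in range(user_count))


if __name__ == "__main__":
    sizes = folder_sizes(300_000)
    print(len(sizes), "folders")
    print("expected average:", 300_000 / len(ALPHABET))
    print("smallest/largest:", min(sizes.values()), max(sizes.values()))
```

With random names the folders come out close to the average, which is why a single level of 64 folders already keeps each folder in the low thousands.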

Level 5

Hi Kalyan,

Thanks for your input. A few more questions to get more clarity:

1) Are there any technical references/links on the nature of unordered child nodes and the 1,000+ ordered-node limitation?

2) How does the 1,000-node limitation impact DAM assets and their nodes?

3) Also, are there any specific references on the maximum node depth supported by JCR with TarMK/MongoMK, such as load-test data for CRX2/CRX3?

4) What is the optimal depth recommended for user profile nodes to get better read and write performance?

5) Overall, I'm looking for best practices for end-user profile nodes and product data nodes within AEM.

 

Level 10

I don't think there is any restriction on node depth! Node depth is just for better organizing and grouping.
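Even without a hard depth limit, depth and per-folder fan-out trade off against each other. A minimal sketch of that arithmetic, assuming each intermediate level splits nodes evenly into `fanout` folders (an idealization), computes the smallest number of intermediate levels that keeps every folder at or below a chosen child-count ceiling. The 1,000 default simply mirrors the figure discussed in this thread.

```python
import math


def levels_needed(node_count: int, fanout: int, max_children: int = 1000) -> int:
    """Smallest number of intermediate folder levels so that leaf folders hold
    at most max_children nodes, assuming an even split of `fanout` per level.

    Requires fanout <= max_children so intermediate folders stay under the
    ceiling as well.
    """
    if fanout < 2 or fanout > max_children:
        raise ValueError("fanout must be between 2 and max_children")
    levels = 0
    while node_count / fanout ** levels > max_children:
        levels += 1
    return levels


if __name__ == "__main__":
    for fanout in (16, 64, 256):
        d = levels_needed(300_000, fanout)
        print(f"fanout={fanout}: {d} level(s), ~{math.ceil(300_000 / fanout ** d)} nodes per leaf")
```

For 300K users, a fan-out of 64 needs two intermediate levels (about 74 users per leaf folder), so depth stays shallow for any realistic user count.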

Employee Advisor

Hi

1) I haven't found a good source of documentation for this.

2) It mostly affects the performance of the UI. Displaying thousands of assets isn't fun for the browser ...

3) I am not aware of any limits beyond the JCR standard.

kind regards,
Jörg

Level 5

Hi all,

I just came across these goals for Jackrabbit 3 (Oak):

http://wiki.apache.org/jackrabbit/Goals%20and%20non%20goals%20for%20Jackrabbit%203

They include some details on what is supported, though I'm not sure whether they are current (done) or future (in progress):

- 10M direct child nodes

- Number of users: 200M / 20M per group

Does this mean the above node statistics are also supported in AEM, since it uses Oak? Or are they still goals in progress?

An answer to this might give more clarity on AEM support for 300K+ users / product / content nodes.

Correct answer by
Employee Advisor

Please consider that these numbers are OK for the repository itself. But when you have 300k+ users, you need the right user interface to manage such massive numbers, and that's a case the AEM /useradmin isn't really designed for.

Jörg

Employee

As I pointed out, the 1,000-node limitation applies only to ordered children in Oak. If you don't need to maintain order, you can use node types like oak:Unstructured, which allow you to store a large number of children. Also, for your use case of 300K users, Oak handles it.
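A rough way to see why ordered children get expensive: as we understand it, Oak keeps the child order of an orderable parent in a hidden multi-valued property on that parent (`:childOrder`), so the whole list of names is rewritten whenever a child is added, while an unordered parent only records the new child. The toy model below counts names written under that simplification, assuming one child is added per commit. It is not Oak code and ignores batching of several additions into a single commit.

```python
from typing import List


def simulate_ordered(children: int) -> int:
    """Toy model: each commit appends one child and rewrites the full order list."""
    order: List[str] = []
    written = 0
    for i in range(children):
        order.append(f"child-{i}")
        written += len(order)  # the entire child-order list is written again
    return written


def names_written_ordered(children: int) -> int:
    """Closed form of simulate_ordered: 1 + 2 + ... + n."""
    return children * (children + 1) // 2


def names_written_unordered(children: int) -> int:
    """Toy model: each commit writes only the new child entry."""
    return children


if __name__ == "__main__":
    for n in (1_000, 10_000, 300_000):
        print(n, names_written_ordered(n), names_written_unordered(n))
```

The ordered case grows quadratically with the number of children while the unordered case grows linearly, which is the intuition behind preferring unordered node types for large folders.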

Level 5

Hi, thanks.

Also, http://docs.adobe.com/docs/en/aem/6-0/develop/platform/custom-authentication-scenarios.html describes multiple user scenarios. But for millions of users, the design suggests a UserManager implementation backed by an authorization table in a database. Would MongoMK be a better out-of-the-box alternative for large-scale end-user profile storage and personalization, considering that profile sync would be handled in the MongoDB cluster (data layer)?

Also, please share your thoughts on product/PIM nodes in the JCR: what is the recommended scale (number of product nodes) for PIM sync, with e-commerce/PIM importers running on author and large product catalog imports? Please also share any reference numbers for large product catalogs imported with PIM importers.

 

Level 1

We come across this question time and again while creating solutions for enterprises: profile management along with identity. Before I churn out my set of questions for the audience, I would love to know what decision was taken on this.

Employee

The 1,000-child-node limitation applied to CRX2. In Oak, as Joerg pointed out, it is more that having so many children under a single node doesn't make for a good user experience.

As for having 300k users in the repository, is this really necessary? What kind of site is this? For large numbers of users, you should consider an external authentication mechanism such as LDAP or SAML. If the personal details are already stored in a back-end system, leave them there: you can fetch them after authentication and store them in the browser, which allows you to personalise the content. Rather than persisting the user permanently, you can create the user on authentication and then have a job delete users that have not logged in for a certain period. It all comes down to the use case. This sounds like a theoretical question rather than a specific use case, which makes it hard to give advice without concrete requirements.
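The create-on-login / purge-later approach described above can be sketched as a pure selection function that a scheduled job would call. Everything here is hypothetical: the `UserRecord` type, the 90-day threshold and the protected-ID set are illustrative, and in AEM the last-login data and the actual deletion would come from whatever user store and scheduler you use.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class UserRecord:
    user_id: str
    last_login: Optional[datetime]  # None: never logged in after creation
    created: datetime


def users_to_purge(users: Iterable[UserRecord], now: datetime,
                   max_idle: timedelta = timedelta(days=90),
                   protected: frozenset = frozenset({"admin", "anonymous"})) -> List[str]:
    """Return IDs of users whose last activity is older than max_idle."""
    cutoff = now - max_idle
    stale = []
    for user in users:
        if user.user_id in protected:
            continue  # never purge system or service accounts
        last_seen = user.last_login or user.created
        if last_seen < cutoff:
            stale.append(user.user_id)
    return stale


if __name__ == "__main__":
    now = datetime(2016, 1, 1, tzinfo=timezone.utc)
    users = [
        UserRecord("alice", now - timedelta(days=10), now - timedelta(days=400)),
        UserRecord("bob", now - timedelta(days=200), now - timedelta(days=300)),
        UserRecord("carol", None, now - timedelta(days=100)),
        UserRecord("admin", None, now - timedelta(days=1000)),
    ]
    print(users_to_purge(users, now))  # → ['bob', 'carol']
```

Keeping the selection separate from the deletion makes the policy easy to test and lets the job log or review candidates before removing anything.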

Reverse replication for users has been deprecated; you should use SCD (Sling Content Distribution) to sync users across a TarMK farm.

Regards,

Opkar

Level 1

Good points, Opkar. "What kind of site is this?" is exactly the question that pops into mind when we hear requests from clients/business.

As architects/consultants, we always come across the question of not using AEM user repositories. Doesn't AEM support user sessions, user management, access permissions, personalization, and SAML/OAuth-based login handling? Yes, AEM supports everything mentioned above. In fact, for SAML/OAuth-based login, the user needs to be present in the Oak repository. Core personalization depends on users persisted in the repository. These features can also be realized using social communities. Maybe we can tweak the platform to pull those features out of social communities, but it is customization nonetheless.

In my mind, AEM is a great WCM and part of the larger Marketing Cloud. But it is certainly not on par with application servers like Oracle/JBoss. CRX/TarMK doesn't replace an Oracle database (RDBMS) or MongoDB. Though it can play the role of service provider in an SSO solution using the SAML authentication handler, it can't replace complete SSO solutions such as those from IBM or SiteMinder. Nor does it replace an ESB for service orchestration. Every engagement starts with demystifying the above points, sometimes to the extent of proving them.

Some of the demerits that are evident if we choose to manage users in AEM, and that should be highlighted, are:

1. Session management across multiple publish nodes

2. User synchronization across multiple repositories

3. Integrating user repositories with a DMP for constantly evolving segments (Audience Manager)

4. Integrating user repositories with Target / Campaign management for personalization

There is no single article from Adobe on managing users and the recommended practice. I see there are architects at Adobe who could consolidate recommendations.

Level 5

Opkar Gill wrote...

Reverse replication for users has been deprecated and you should use SCD to sync users across a TarMK farm.

 

@Opkar Gill - I took a look at the documentation for User Sync with SCD. How is this different from forward replication/reverse replication? It doesn't use replication agents, so I'm guessing it's closer to the Sling layer and maybe takes fewer resources to process?

Flow-wise, it sounds exactly the same, just done through OSGi console configs instead of an admin/authoring interface. It still uses the author instance (and its resources, which are already being shared with the currently active authors) as the arbiter to distribute content across a horizontally scaled publish farm.
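To make the load concern concrete: if every profile change on one publish instance travels to author, and author then pushes it to each of the other publish instances (the hub-and-spoke flow described above), each change costs author one inbound transfer plus one outbound transfer per other publish instance. A back-of-the-envelope sketch; the update rate in the example is a hypothetical input, not a measured figure from this thread.

```python
def author_transfers_per_update(publish_count: int) -> int:
    """One transfer publish -> author, then author -> every other publish."""
    if publish_count < 1:
        raise ValueError("need at least one publish instance")
    return 1 + (publish_count - 1)


def author_transfers_per_hour(publish_count: int, updates_per_hour: int) -> int:
    """Total transfers author must handle per hour for profile updates."""
    return author_transfers_per_update(publish_count) * updates_per_hour


if __name__ == "__main__":
    # Hypothetical load: 4 publish instances, 5,000 profile updates per hour.
    print(author_transfers_per_hour(4, 5_000))  # → 20000
```

Author's work grows with both the number of publish instances and the profile update rate, on top of normal authoring, which matches the experience described below.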

This line in the above documentation kind of scares me away from even trying to use User Sync/SCD:

"With infrequent updates, it is reasonable for user data to be synchronized with other publish instances using Sling Content Distribution (Sling distribution)."

I say that mostly because right now we've got some custom FR/RR agents working to keep 4 publish instances in sync, with 10k+ active users. Trying to keep profile preferences and the like in sync is basically killing our single author instance right now, making the content authoring experience horrible. I really hope SCD is a better solution. :(