MongoDB (MSRP) - "Social Data"

Gdubz-57m2mu

16-11-2016

Has anyone at Adobe read this article[1] about MongoDB? Isn't this a problem, given that pretty much all use cases for MongoDB with AEM revolve around "social data"?

This is just a guess after looking at what's generated in MongoDB when I leave a test comment on /content/community-components/en/comments.html, but it appears to suffer from some of the same problems mentioned in that article:

  1. You need to perform a lookup of the "authorizableId_t" or "author_username" (whichever is the most consistent) if you want more information about that user.
  2. You are storing "author_display_name" on the comments as well, so maybe you don't need to do any extra lookups when displaying comments, but if that display name changes for any reason, that's now wrong for this comment. If the user left thousands of comments, they would all need to be updated?
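To make the second point concrete, here is a rough sketch of what I mean. It uses plain in-memory maps as stand-ins, not MongoDB or the AEM Communities API; the class name, the "message" field, and the sample user are made up for illustration. Only "authorizableId_t" and "author_display_name" come from what I saw in the generated document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-ins; this is not the AEM Communities or MongoDB API.
class DenormalizationSketch {

    // Builds a comment document that stores the user id AND a copy of the display name.
    static Map<String, String> newComment(String authorId, String displayName, String text) {
        Map<String, String> doc = new HashMap<>();
        doc.put("authorizableId_t", authorId);
        doc.put("author_display_name", displayName); // duplicated copy of profile data
        doc.put("message", text);                    // "message" is an assumed field name
        return doc;
    }

    // Counts comments whose stored display name no longer matches the current profile.
    static int countStale(List<Map<String, String>> comments, Map<String, String> profiles) {
        int stale = 0;
        for (Map<String, String> comment : comments) {
            String current = profiles.get(comment.get("authorizableId_t"));
            if (current != null && !current.equals(comment.get("author_display_name"))) {
                stale++;
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        Map<String, String> profiles = new HashMap<>();
        profiles.put("jdoe", "Jane Doe");

        List<Map<String, String>> comments = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            comments.add(newComment("jdoe", profiles.get("jdoe"), "comment " + i));
        }

        profiles.put("jdoe", "Jane Smith"); // the user changes their display name
        System.out.println(countStale(comments, profiles)); // prints 3
    }
}
```

Every comment written before the rename now carries the old name, which is why I'm asking whether they would all need to be updated.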

Also looks like an activity stream document is generated and stored in MongoDB as well when you comment, so that's yet another piece of data that's got all information about that comment duplicated and stored within MongoDB. Though I suppose this one isn't such a big deal, given that it's more of a notification/snapshot of a specific comment at a specific time.

I can only imagine that a lot of the "join" work, putting all of the data together, is being done in Java? Is that not much of a concern, given that everything revolves around a user session/resolver, so that information isn't terribly difficult to look up?

Before we dive head first into MSRP, I'm just concerned about what sort of headaches we'll have down the road.

[1] http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/

Gdubz-57m2mu

21-11-2016

dwalling wrote...

I hadn't read this article until you pointed it out, but it's discussing well-known concepts, IMO.

Regarding MSRP, the author_display_name field is deprecated and no longer being used. It will likely be removed in a future update. The reason is that architecturally we use the authorizableId as the handle to the user info, so PII that might be in a user profile can be handled separately from the UGC, which doesn't contain PII. The authorizableId is the link between the two, and we do this consistently for MSRP, ASRP, and future SRPs. This allows UGC to be stored in remote locations such as ASRP while mitigating concerns about off-site storage of PII. Performance concerns are mitigated by using a simple application-level cache inside of AEM.

What sort of headaches do you anticipate?

 

I'm working on my first project to incorporate the Communities codebase and unfortunately it isn't a new project. My team is converting a lot of custom-code components to store content in MSRP now (instead of forward/reverse replicating everywhere) and just trying to learn as much as I can. Came across that article and it had me a little worried, primarily because I don't know much about databases.

Thank you so much for the info about the authorizableId. The bit about PII makes complete sense; we wouldn't want to store anything there either, even though we're going to be hosting our own database servers.
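Just to check my understanding of the "authorizableId as the handle, plus an application-level cache" idea, here is a minimal sketch of how I picture it. This is only my assumption of the general pattern, not how AEM actually implements its cache; DisplayNameCache, its methods, and the profile lookup function are all hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch of an application-level cache keyed by authorizableId.
// AEM's real cache is not described in this thread; this only shows the pattern.
class DisplayNameCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> profileLookup; // stand-in for a profile store query
    private final AtomicInteger lookups = new AtomicInteger();

    DisplayNameCache(Function<String, String> profileLookup) {
        this.profileLookup = profileLookup;
    }

    // Resolves a display name at read time; the UGC itself only stores the id.
    String displayNameFor(String authorizableId) {
        return cache.computeIfAbsent(authorizableId, id -> {
            lookups.incrementAndGet(); // count trips to the profile store
            return profileLookup.apply(id);
        });
    }

    // Drops a cached entry, e.g. after the user edits their profile.
    void invalidate(String authorizableId) {
        cache.remove(authorizableId);
    }

    int lookupCount() {
        return lookups.get();
    }

    public static void main(String[] args) {
        Map<String, String> profiles = new ConcurrentHashMap<>();
        profiles.put("jdoe", "Jane Doe");

        DisplayNameCache names = new DisplayNameCache(profiles::get);
        names.displayNameFor("jdoe");
        names.displayNameFor("jdoe");
        System.out.println(names.lookupCount()); // prints 1

        profiles.put("jdoe", "Jane Smith");
        names.invalidate("jdoe");
        System.out.println(names.displayNameFor("jdoe")); // prints Jane Smith
    }
}
```

If that's roughly right, a name change only touches the profile and one cache entry, instead of every comment the user ever left.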

So when storing a custom map of properties and property values, should we be using PropConstants.AUTHORIZABLE_ID (com.adobe.granite.security.user.util.PropConstants) instead of CollabUser.PROP_NAME (com.adobe.cq.social.ugcbase.CollabUser)? I ask because most (if not all) of the AEM SCF examples on GitHub use CollabUser.PROP_NAME. Then again, a lot of that code is already a little outdated, and some of it is even deprecated.

As far as headaches go, nothing in particular comes to mind. We just want whatever we end up using (or developing) to perform well, which is most likely going to be the case, given that we're currently heavily using forward/reverse replication (which is awful).