Managing a large number of collection items

Aaronius9er9er

18-06-2009

Hey folks,

I'm just starting an application and I need some help understanding how to architect it.  My application could potentially have thousands of users, so I'm trying to design for that scale.  Let's say there's User A, User B, User C, User D, etc.  User A may be "following" (the word "subscribed" could work, but I don't want to get anyone confused) User C, but not User B or User D.  If User A is logged in and then User C logs in, User A needs to retrieve an attribute from User C.  Likewise, User C needs to retrieve an attribute from User A.  User A doesn't care about User B or User D's attribute.  A single user can be following multiple friends.  So User A could follow both User C and User X.  On the other hand, User C and User X aren't required to follow User A in return.

So, right now I have a Users shared collection and when a user logs in he adds himself to the collection.  That's working fine and all the users are getting added when they log in as expected, but I'm not sure what to do when the application hits thousands of users.  If User A is subscribed to the users shared collection (which it needs to be in order to see when User C logs in), isn't it also getting ALL the thousands of user objects that are on the Users shared collection?

Here's the second part of my question: Each user has a "library" of information (this is different from the "attribute" I mentioned earlier) which is an array of objects that will likely change over the course of the user's session.  So User C has a library, and when an item gets added to his library User A needs to know about it.  If I have a shared collection of users and the library is an attribute of the user end node, I don't think User A would receive notification when an item gets added to User C's library unless User C re-adds himself to the Users shared collection.  I haven't actually verified that, but I read about it somewhere, so I could be wrong.  Here's my workaround: User C creates a UID for his library, creates a shared collection for his library using the UID for the node name, and then stores the UID on his user node.  So when User C logs in, User A hears about it, pulls the library UID from User C's user node, then subscribes to User C's shared collection, which is named with that same UID.
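To make the workaround concrete, here is a minimal in-memory sketch in Python. It is not AFCS code: `Hub`, `login`, `subscribe`, and `publish` are hypothetical stand-ins I made up to show the shape of the idea, which is that a follower only hears about the one library collection whose UID it read from the friend's user node.

```python
import uuid
from collections import defaultdict


class Hub:
    """Toy stand-in for the service (hypothetical, not the AFCS API)."""

    def __init__(self):
        self.user_nodes = {}                  # user name -> attributes on the user node
        self.subscribers = defaultdict(list)  # collection name -> list of callbacks

    def login(self, user):
        # Each user creates a UID for his library and stores it on his user node.
        library_uid = uuid.uuid4().hex
        self.user_nodes[user] = {"library_uid": library_uid}
        return library_uid

    def subscribe(self, collection, callback):
        self.subscribers[collection].append(callback)

    def publish(self, collection, item):
        # Only subscribers of this one collection are notified.
        for callback in self.subscribers[collection]:
            callback(item)


hub = Hub()
uid_c = hub.login("C")
uid_d = hub.login("D")

# User A follows User C only: read the UID off C's node, subscribe to that collection.
received_by_a = []
hub.subscribe(hub.user_nodes["C"]["library_uid"], received_by_a.append)

hub.publish(uid_c, "report.pdf")  # A is notified
hub.publish(uid_d, "other.txt")   # A is not notified
```

In this toy model User A never sees User D's library changes, which is the property I'm after; whether the real service behaves the same way at scale is exactly what I'm asking about.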

So both of these strategies are working, but I'm concerned about scalability, and I suspect there may be a better way to do what I'm trying to do.  Any tips?

Thanks,

Aaron

Accepted Solutions (0)

Answers (4)

Tony_Sykes

23-06-2009

Hi Aaron,

Here is another approach that came to mind and might help. I assume your app authenticates users as they log in, so you could look up the user's friend tree in your DB to determine whether any friends are logged in. If no friends are logged in yet, you could dynamically create an AFCS room using createRoom and record that room name against the user tree in the DB. Then, when other friends arrive, you could enter them into that room and they could interact. If a user needs multiple sets of friends, you could check whether, say, Group1Friends and Group2Friends both have active rooms, and then the user could choose which room to be in. There is obviously more to do on the housekeeping side for active rooms/users, but as a starting idea it might help.
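The lookup could be sketched roughly like this in Python. Everything here is an assumption for illustration: `friends_of` stands in for the user tree in your DB, `active_rooms` for the recorded room names, and `create_room` is a hypothetical wrapper around whatever call actually creates the room.

```python
def assign_room(user, friends_of, active_rooms, create_room):
    """Return the room `user` should join, creating one if no friend is online.

    friends_of:   dict of user -> set of friend names (stand-in for the DB user tree)
    active_rooms: dict of logged-in user -> room name (stand-in for the DB record)
    create_room:  callable taking a room name (hypothetical wrapper, not a real API)
    """
    for friend in sorted(friends_of.get(user, ())):
        if friend in active_rooms:
            # A friend is already logged in: join that friend's room.
            room = active_rooms[friend]
            break
    else:
        # No friends logged in yet: create a new room for this user.
        room = f"room-{user}"
        create_room(room)
    active_rooms[user] = room  # record the room against the user
    return room


created = []
friends = {"A": {"C"}, "C": {"A"}}
rooms = {}
first = assign_room("A", friends, rooms, created.append)
second = assign_room("C", friends, rooms, created.append)
```

The sketch leaves out the housekeeping (removing users from `active_rooms` on logout, closing empty rooms) and the case where friends are spread over several rooms.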

Regards,

Tony

Aaronius9er9er

23-06-2009

Nigel, thank you for the great response.  I really appreciate it.  I'd be glad to share my app details and see what you think.

At its core the app is essentially a peer-to-peer file-sharing application.  A user can add other users as friends/co-workers and pull files off their machines.  AFCS would be used for:

(1) Letting a user know when his/her friends have logged into the service.

(2) Providing details about the files the user's friends have made available.  If a friend adds/removes/renames a file, the user needs to know about it.

(3) Providing a means for swapping Stratus IDs.

That's pretty much it.  The actual file transfer will be done p2p using Stratus.

The nice thing about using AFCS is the push strategy.  The app doesn't have to continually poll the server for new info.

What ideas does that bring to mind?  I can't figure out how one user would know if one of his/her friends is connected to the service unless everyone is in the same room and the user is analyzing each new user that connects to see if he/she is a friend.

Thanks for any guidance.

Aaron

Nigel_Pegg

22-06-2009

Hi Aaron,

(Apologies for the late reply)

For 1), I don't know if AFCS is a great fit. If what you're looking for is really simple presence (not much "collaboration" per se) but on a very large scale, AFCS is designed more with the inverse case in mind. Our design center tends toward fewer users in a room (which can scale into the 1000s, but 10000 is probably more than would be comfortable), with more participation from each user to the others. I think I liked your previous approach a little more, because it didn't rely as much on looping over a property in the set of UserDescriptors.

For 2) You definitely can subscribe to 1 or more nodes within a Collection, but not the entire thing. Note that this is a code path less-traveled, so it might have some bugs here and there, but we're committed (as always) to fixing whatever you might find.

My overall reaction to the problem you're presenting kinda comes down to wondering what all ~10000 users are doing in that room together =). Are they working on something together? Or are you mostly thinking just presence, with some status updates? For large-scale presence, I think once we have HTTP APIs, you could do some more interesting things here (for example, perhaps have users spread over multiple rooms with HTTP messaging between rooms). We are also considering some new ways of supporting much larger rooms - a lot of this has to do with not broadcasting UserDescriptors until they're definitely necessary (or perhaps not at all).

Can you tell us more about what you're trying to do? If it's too sensitive to discuss publicly, afcs@adobe.com can work too.

Thanks!

nigel

Aaronius9er9er

19-06-2009

I've made some advances since my post.  I figured out how to use the UserManager to manage users including custom attributes so I no longer have a Users SharedCollection.  So I guess there are two main things I need to figure out.  I'll probably figure them out eventually but here they are if anyone wants to chime in:

(1) What happens with the UserManager when there are 10,000 users in a room?  Currently User A has an array of his friends' IDs, which were pulled from a database.  These user IDs have no real relation to AFCS; they're stored in a database specific to my application and are used to match up which user has which friends.  In order to see if User C is in the room, User A must loop through all the UserDescriptors and see if the user's custom attribute "userId" matches.  That way User A knows which UserDescriptor belongs to User C.  So if User A has to loop through 10,000 users to find which UserDescriptors belong to his friends, that's a heavy load.  I'm not sure if I can do what I need to with AFCS here.

(2) Can a user publish a node to a CollectionNode without subscribing to the CollectionNode?  Likewise, can a user subscribe to a node within a CollectionNode without subscribing to the CollectionNode itself?  If either of those answers is no, then that presents a scalability problem as well.  Each user will be creating his own "library" node to which other users can subscribe.  If the user must subscribe to the CollectionNode in order to publish a child node or subscribe to a child node, then it sounds like they would be hearing about all child nodes that get added/removed, which in my scenario could be ~10,000 different child nodes.
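For what it's worth, here is roughly how I picture the matching in (1), as a Python sketch rather than real client code. The descriptors are plain dicts with a custom "userId" field standing in for UserDescriptors, and `find_online_friends` is a name I made up. Putting the friend IDs in a set makes each check constant-time, so the cost is one pass over the room rather than one pass per friend.

```python
def find_online_friends(descriptors, friend_ids):
    """Map each online friend's userId to that friend's descriptor.

    descriptors: iterable of dicts with a custom "userId" field
                 (hypothetical stand-in for the room's UserDescriptors)
    friend_ids:  IDs pulled from the application's own database
    """
    wanted = set(friend_ids)  # O(1) membership test per descriptor
    return {d["userId"]: d for d in descriptors if d.get("userId") in wanted}


# 10,000 users in the room; two of the three friends are online.
descriptors = [{"userId": i, "name": f"user{i}"} for i in range(10_000)]
online = find_online_friends(descriptors, [42, 9_999, 123_456])
```

That is still a full scan of 10,000 descriptors at login. After that, assuming the client is told about each user as he joins, only the one new descriptor needs to be checked against the set.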

Am I as clear as mud?  I'm sure I'll run into these answers as I move along, but if anyone has any insight that might steer my architecture beforehand, that would be fantastic.  Thanks!

Aaron