Approach for Storing a Large Number of Nodes in JCR
__96
Level 4
February 16, 2016
Solved


  • February 16, 2016
  • 13 replies
  • 6118 views

Hi,

I am working on a peculiar requirement where I need to store analytical data in the JCR as nodes and properties. We are avoiding Adobe Analytics because we have our own analytics built on an SAP-related tool, so all CQ needs to do is post the analytical data to SAP through a web service. As the requirement suggests, the amount of data we are storing will be huge: a scheduler will run every day at a set time and post the recorded data from the JCR nodes to SAP.

The data stored in the nodes relates to downloads of executables that users click: for each download we record details such as the username, user type, executable name, and download start time. The issue is that, as I understand it, a node can only have 1000 child nodes. How can I arrange the storage of these records in the JCR to overcome this 1000-child-node limitation, and where should such records live (under /etc, /content, or somewhere else)? I would also like to know whether there is any way to optimize the retrieval of values from these nodes.

Thanks,

Samir 

Best answer by joerghoh

Hi Samir,

so you collect tracking data inside of AEM and then export it regularly to SAP? In that case you are using the JCR repo as storage for quite transient data, and I don't think that this is a good idea from a conceptual point of view.

* Do you want to collect this data on publish systems? In that case you will probably store this data on each publish instance, and then your export (or SAP) needs to consolidate it. Not a real problem, but you might lose data unless you make all publish instances highly available.

* You put a lot of pressure on the repo. Write performance has improved with TarMK, but do you really want to store each data point in the repo? Please do a performance test upfront and check it against your KPIs.

* Your incoming data is not structured at all, so ordering doesn't matter. Just use an oak:unstructured node as the parent and you're fine. Just don't expect to be able to browse that folder with CRXDE Lite :-)
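If you do write into a flat oak:unstructured parent as described above, each child still needs a unique, JCR-safe name. A minimal sketch of one naming scheme (the `download-` prefix and name shape are just an illustration, not anything prescribed by Oak): a timestamp keeps the names roughly time-ordered, and a random suffix avoids collisions between concurrent writers.

```java
import java.util.UUID;

public class NodeNames {
    // Builds a collision-free, JCR-safe child node name for a flat
    // oak:unstructured parent: the epoch-millis timestamp keeps names
    // roughly sortable by creation time, and the short UUID suffix
    // prevents clashes when two downloads are recorded at the same instant.
    static String downloadNodeName(long epochMillis) {
        return "download-" + epochMillis + "-"
                + UUID.randomUUID().toString().substring(0, 8);
    }

    public static void main(String[] args) {
        // Two calls at the same millisecond still yield distinct names.
        System.out.println(downloadNodeName(System.currentTimeMillis()));
        System.out.println(downloadNodeName(System.currentTimeMillis()));
    }
}
```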

I would choose a different approach, maybe setting up a queueing service (e.g. RabbitMQ) to which each download event is submitted. Then your AEM instances are stateless again and are not loaded with storing and exporting this transient data. And you have an application which can fetch the data points from the queue and feed them directly to SAP (either live or batched, as you like).
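The producer/consumer shape Jörg describes can be sketched with a plain in-memory queue; in production this role would be played by RabbitMQ, but the pattern is the same: the AEM side submits events and returns immediately, and an exporter drains batches toward SAP. All class, field, and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class DownloadQueueSketch {
    // Hypothetical download event; in a real setup this would be
    // serialized (e.g. as JSON) before being published to RabbitMQ.
    record DownloadEvent(String userName, String userType,
                         String executable, long startTime) {}

    private final BlockingQueue<DownloadEvent> queue = new LinkedBlockingQueue<>();

    // Called from the AEM side: enqueue the event and return immediately,
    // so the request thread never waits on SAP.
    void submit(DownloadEvent e) {
        queue.add(e);
    }

    // Called by the exporter: drain up to batchSize events, which would
    // then be pushed to SAP in one web-service call.
    List<DownloadEvent> drainBatch(int batchSize) {
        List<DownloadEvent> batch = new ArrayList<>();
        queue.drainTo(batch, batchSize);
        return batch;
    }

    public static void main(String[] args) {
        DownloadQueueSketch q = new DownloadQueueSketch();
        q.submit(new DownloadEvent("samir", "internal", "tool.exe", 1L));
        q.submit(new DownloadEvent("mani", "external", "setup.exe", 2L));
        System.out.println(q.drainBatch(10).size()); // prints 2
    }
}
```

The key design point is the decoupling: the producer never blocks on the consumer, and a batch size on the draining side lets you trade latency against the number of web-service calls to SAP.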

Jörg

13 replies

joerghoh
Adobe Employee
February 18, 2016

Hi Samir,

don't store the data inside the JCR; write it directly into the queue instead.

Jörg

__96 (Author)
Level 4
February 19, 2016

Manikumar wrote...

Hi Samir,

If you really want to store the data under the JCR and delete it once it is sent to the SAP team, then you can use the strategy Adobe uses for storing users.

As we know, users under /home/users are stored in folders organized alphabetically. You can use the same storing strategy: create a folder structure based on some parameter, with each folder holding up to 1000 nodes.

I think this may help you :)

Thanks 

Mani Kumar K

 

Thanks, Mani. Yes, that's what I have been doing for saving product data under /etc: breaking the product name down character by character and creating nodes down to the fifth character, thereby reducing the number of immediate child nodes.
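The character-by-character bucketing described above can be sketched as a small helper. The /etc/products root and depth of 5 are just the values mentioned in this thread, and the method name is hypothetical; the idea is that each prefix level fans the children out so no single parent accumulates too many direct children.

```java
public class BucketPath {
    // Builds a bucketed JCR path such as
    //   /etc/products/p/ph/pho/phot/photo/photoshop
    // by nesting one folder per name prefix, up to `depth` levels.
    static String bucketedPath(String root, String name, int depth) {
        // Normalize to characters that are safe in a JCR node name.
        String key = name.toLowerCase().replaceAll("[^a-z0-9]", "");
        StringBuilder path = new StringBuilder(root);
        int levels = Math.min(depth, key.length());
        for (int i = 1; i <= levels; i++) {
            path.append('/').append(key, 0, i); // /p, /p/ph, /p/ph/pho, ...
        }
        return path.append('/').append(name).toString();
    }

    public static void main(String[] args) {
        System.out.println(bucketedPath("/etc/products", "photoshop", 5));
        // prints /etc/products/p/ph/pho/phot/photo/photoshop
    }
}
```

With five prefix levels over a 36-character alphabet, the fan-out per level keeps every parent's immediate child count small even for very large product catalogs.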

__96 (Author)
Level 4
February 19, 2016

Jörg Hoh wrote...

Hi Samir,

don't store the data inside the JCR; write it directly into the queue instead.

Jörg

 

Hi Jörg,

Thanks for replying. I will only fall back to storing in the JCR if I am unable to implement RabbitMQ; otherwise I will definitely try the RabbitMQ queuing approach. Do you have any document or reference for the queuing setup?

Thanks,

Samir