SOLVED

Import number of small package versus importing one big package

Level 4

We have a third-party application from which we ingest data into CRX. It gives us driver information such as first name, last name, score, bio, etc. We create one CQ package per driver and ingest it into CRX. We have 200 drivers, and each driver's data creates 30+ nodes and sets some properties.

We are thinking of changing this logic to create one big package containing all 200 drivers and ingest that single CQ package instead of 200 small ones.

Which is the best way to solve this problem, and which will be faster: 200 CQ packages or one big CQ package? Our thinking was that with 200 CQ packages, CQ has to track events for all 200 packages, which might take more time than one big CQ package, for which CQ has to track only one event.

Employee

Hi Rohat,

The one big package will be faster to install; how much faster depends a lot on the specifics. With 200 packages of 30-50 nodes each, installing all 200 packages will result in 200 save() calls. By comparison, one package with 6,000-10,000 nodes (200 × 30 to 200 × 50) needs only between 6 and 10 save() calls (assuming the default save threshold of 1024 nodes is used).
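A back-of-envelope sketch of the save() arithmetic above, assuming (as the reply does) one save per 1024 imported nodes, one extra save for any remainder, and at least one save per package. The function names and the `max(1, ...)` rule are illustrative only, not taken from the CRX or FileVault code. It counts save() calls, not time: each save's cost also grows with its size, so fewer saves do not translate linearly into a shorter import.

```python
import math
from typing import Iterable

# Assumed threshold, mirroring the default of 1024 nodes quoted in the reply.
DEFAULT_THRESHOLD = 1024


def saves_for_package(node_count: int, threshold: int = DEFAULT_THRESHOLD) -> int:
    """Estimated save() calls to install one package (hypothetical model)."""
    # One save per full or partial batch of `threshold` nodes, minimum one.
    return max(1, math.ceil(node_count / threshold))


def total_saves(package_sizes: Iterable[int], threshold: int = DEFAULT_THRESHOLD) -> int:
    """Estimated save() calls to install several packages one after another."""
    return sum(saves_for_package(n, threshold) for n in package_sizes)


if __name__ == "__main__":
    drivers = 200
    for nodes_per_driver in (30, 50):
        small = total_saves([nodes_per_driver] * drivers)  # 200 small packages
        big = total_saves([nodes_per_driver * drivers])    # one combined package
        print(f"{nodes_per_driver} nodes/driver: 200 packages -> {small} saves, "
              f"1 package -> {big} saves")
```

With 30 nodes per driver this gives 200 saves versus 6; with 50 nodes per driver, 200 versus 10, matching the range in the reply.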

Regarding events, that really shouldn't be an issue either way as the number of modifications would be identical. And events are async anyway, so they shouldn't impact import time.

HTH,

Justin

Level 4

Hi Justin,

Thanks so much for replying to this thread. Appreciated.

Yesterday I tried importing one big package instead of the small ones into my local CRX. Every time, it failed with a Java heap error at the end. What do you recommend for the production system: should we use one big package, or break it into small pieces and import those?

Each small package is around 16 KB. One big package would be roughly 3.5 MB (16 KB × 216 = 3,456 KB). Do you suggest importing that one package instead of small chunks? My main objective is to make things faster, whichever way I choose.

Thanks in advance for your quick response.

Employee

I can't speak to the specifics of your memory issue, but you can obviously give your instance a larger heap. Personally, I run local instances with a 2 GB heap, which might be overkill, but I never have problems :)

A 3.6MB package is nothing. We routinely see packages 100 times (or more) that size.

Level 4

Hi Justin,

Thanks again for the quick reply. Apart from the save() call, are there any other events that get called when we upload a package? Is there any documentation where I can read about the different operations performed when we import a package?

Employee (accepted answer)

Events are produced, not called. When you execute save(), the data is synchronously written to the persistence manager (the Tar PM by default) and reindexed by the search engine (Lucene by default). Everything else should happen asynchronously, driven by events. And those events wouldn't be package-specific: event listeners shouldn't care whether a node was updated because a package was installed or because someone edited it manually in CRXDE Lite (or by some other means).

It is possible to write synchronous listeners (see http://wiki.apache.org/jackrabbit/Observation), but this really shouldn't be used in most cases.
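To make the distinction concrete, here is a toy, in-memory model of the behaviour described above. It is not the JCR or CRX API: `ToyRepository`, `add_listener`, `wait_for_listeners`, the `"added"`/`"updated"` event names and the example path are all hypothetical. In the model, save() persists and indexes synchronously, ordinary listeners receive events later on a background thread, and a synchronous listener runs inside save() and therefore adds to its duration. The listener reacts to the change itself, not to whatever caused it. The real registration API is `ObservationManager`, described in the observation chapter of the JCR 2.0 specification.

```python
import queue
import threading
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical, simplified event: (event type, node path).
Event = Tuple[str, str]
Listener = Callable[[Event], None]


class ToyRepository:
    """Toy model only: save() persists and indexes synchronously; events are
    delivered to ordinary listeners later, on a background thread."""

    def __init__(self) -> None:
        self.storage: Dict[str, dict] = {}  # stand-in for the persistence manager
        self.index: Set[str] = set()        # stand-in for the search index
        self._pending: Dict[str, dict] = {}
        self._sync_listeners: List[Listener] = []
        self._async_listeners: List[Listener] = []
        self._events: "queue.Queue[Event]" = queue.Queue()
        threading.Thread(target=self._dispatch, daemon=True).start()

    def add_listener(self, listener: Listener, synchronous: bool = False) -> None:
        target = self._sync_listeners if synchronous else self._async_listeners
        target.append(listener)

    def set_node(self, path: str, props: dict) -> None:
        self._pending[path] = props  # transient change, not yet saved

    def save(self) -> None:
        events: List[Event] = []
        for path, props in self._pending.items():
            kind = "updated" if path in self.storage else "added"
            self.storage[path] = props  # synchronous write
            self.index.add(path)        # synchronous reindex
            events.append((kind, path))
        self._pending.clear()
        for event in events:
            for listener in self._sync_listeners:
                listener(event)         # runs inside save(), so it slows save() down
            self._events.put(event)     # queued; save() does not wait for these

    def _dispatch(self) -> None:
        while True:
            event = self._events.get()
            try:
                for listener in self._async_listeners:
                    listener(event)
            finally:
                self._events.task_done()

    def wait_for_listeners(self) -> None:
        """Block until every queued event has been handled (useful in tests)."""
        self._events.join()


if __name__ == "__main__":
    repo = ToyRepository()
    received: List[Event] = []
    repo.add_listener(received.append)  # ordinary (asynchronous) listener
    repo.set_node("/content/drivers/driver-1", {"firstName": "Ada"})  # hypothetical path
    repo.save()
    print("/content/drivers/driver-1" in repo.storage)  # already True when save() returns
    repo.wait_for_listeners()
    print(received)
```

The design choice mirrors the advice above: keeping listeners asynchronous means the cost of reacting to a change is paid outside save(), so it does not lengthen a package import.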

Section 12 of the JCR Specification (http://www.day.com/specs/jcr/2.0/12_Observation.html) describes repository events in detail.