Level 4
October 16, 2015
Solved

Importing a number of small packages versus importing one big package

  • October 16, 2015
  • 6 replies
  • 1097 views

We have a 3rd party application from which we ingest data into CRX. This 3rd party application gives us driver information like first name, last name, score, bio, etc. We create one CQ package per driver and ingest it into CRX. We have 200 drivers. Each driver's data creates 30+ nodes and sets some properties.

We are thinking of changing this logic to create one big package with all 200 drivers and ingest that single big CQ package instead of ingesting 200 small CQ packages.

Which is the best way to solve this problem, and which will be faster (200 CQ packages or just one big CQ package)? We were thinking that with 200 CQ packages, CQ has to track events for all 200 packages, which might take more time than one big CQ package, for which CQ has to track only one event.

Best answer by JustinEd3

6 replies

Adobe Employee
October 16, 2015

Hi Rohat,

The one big package will be faster to install; how much faster depends a lot on the specifics. With 200 packages of 30-50 nodes each, installing all 200 packages will result in 200 save() calls. Comparatively, if you have one package with 6000-10000 nodes (200*30 to 200*50), that will be between 6 and 10 save() calls (assuming the default save threshold of 1024 is used).
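
To illustrate the difference, here is a rough sketch of what a batched import looks like at the JCR level (this is not the actual package-installer code; DriverData, the /content/drivers path, and the property names are just placeholders for your data):

    import javax.jcr.Node;
    import javax.jcr.Session;

    public class DriverImporter {

        // Default autosave threshold referenced above
        private static final int SAVE_THRESHOLD = 1024;

        // Hypothetical shape of the data coming from the 3rd party application
        public interface DriverData {
            String getId();
            String getFirstName();
            String getLastName();
            long getScore();
        }

        public void importDrivers(Session session, Iterable<DriverData> drivers) throws Exception {
            Node root = session.getNode("/content/drivers");
            int pendingChanges = 0;

            for (DriverData driver : drivers) {
                Node driverNode = root.addNode(driver.getId(), "nt:unstructured");
                driverNode.setProperty("firstName", driver.getFirstName());
                driverNode.setProperty("lastName", driver.getLastName());
                driverNode.setProperty("score", driver.getScore());
                pendingChanges += 4; // roughly one node plus three properties per driver in this sketch

                if (pendingChanges >= SAVE_THRESHOLD) {
                    session.save(); // one synchronous write to the persistence manager (plus reindex) per batch
                    pendingChanges = 0;
                }
            }
            session.save(); // flush whatever is left
        }
    }

The point is simply that the cost scales with the number of save() calls, not with the number of packages.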

Regarding events, that really shouldn't be an issue either way as the number of modifications would be identical. And events are async anyway, so they shouldn't impact import time.

HTH,

Justin

Level 4
October 16, 2015

Hi Justin,

Thanks so much for replying to this thread. Appreciated.

Yesterday I was trying to import one big package instead of the small ones into my local CRX. Every time I tried, it was not successful and gave me a Java heap error at the end. On the production system, what do you recommend - should we use one big package, or break that package into small pieces and import them?

Each small package is around 16 KB in size. If I go with one big package, it will be a 3.6 MB package (16 KB * 216 ≈ 3.6 MB). Do you suggest importing the 3.6 MB package instead of small chunks? My main objective is to make things faster - whichever way I choose.

Thanks in advance for your quick response.

Adobe Employee
October 16, 2015

I can't speak to your memory issue specifically, but you can obviously give your instance a larger heap. Personally, I run local instances with a 2GB heap, which might be overkill, but I never have problems :)
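
If it helps, assuming you start the quickstart jar directly (the jar file name will differ in your install), a 2GB heap looks like:

    java -Xmx2048m -jar cq-quickstart.jar

If you start the instance via crx-quickstart/bin/start instead, the same -Xmx flag goes into the CQ_JVM_OPTS variable in that script.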

A 3.6MB package is nothing. We routinely see packages 100 times (or more) that size.

Level 4
October 16, 2015

Hi Justin,

Thanks again for the quick response & reply. Apart from the save() call, are there any other events which get called when we try uploading a package? Is there any documentation where I can read about the different operations which are called when we import a package?

smacdonald2008
Level 10
October 16, 2015
JustinEd3
Adobe Employee
October 16, 2015
Accepted solution

Events are produced, not called. When you execute a save(), the data is synchronously written to the persistence manager (Tar PM in default cases) and reindexed by the search engine (Lucene in the default case). Everything else should be async based on events. And those wouldn't be package-specific - event listeners shouldn't care that a node was updated because a package was installed or someone edited a node manually in CRXDE Lite (or something else).

It is possible to write synchronous listeners (see http://wiki.apache.org/jackrabbit/Observation), but this really shouldn't be used in most cases.
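
For illustration, a plain JCR observation listener looks roughly like this (the watched path and event types here are just examples); note that it only sees generic node/property events and has no idea whether a package install or a manual edit produced them:

    import javax.jcr.Session;
    import javax.jcr.observation.Event;
    import javax.jcr.observation.EventIterator;
    import javax.jcr.observation.EventListener;
    import javax.jcr.observation.ObservationManager;

    public class DriverChangeListener implements EventListener {

        @Override
        public void onEvent(EventIterator events) {
            // Invoked asynchronously by the repository after the changes are persisted
            while (events.hasNext()) {
                Event event = events.nextEvent();
                try {
                    System.out.println("Changed: " + event.getPath());
                } catch (Exception e) {
                    // ignore in this sketch
                }
            }
        }

        public void register(Session session) throws Exception {
            ObservationManager om = session.getWorkspace().getObservationManager();
            om.addEventListener(this,
                    Event.NODE_ADDED | Event.PROPERTY_ADDED | Event.PROPERTY_CHANGED,
                    "/content/drivers", // example path to watch
                    true,               // isDeep: include the whole subtree
                    null, null,         // no UUID or node type filter
                    true);              // noLocal: skip events from this session
        }
    }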

Section 12 of the JCR Specification (http://www.day.com/specs/jcr/2.0/12_Observation.html) describes repository events in detail.