SOLVED

Issues: Bulk Image Upload to AEM DAM -> 8000 Assets

Level 10

All,

I am facing issues with uploading assets to AEM via a bundle. I have close to 9000 assets, and I notice the following issues:

1. Disk space usage increases by around 9-10 GB, and after Tar optimization I see around 5 GB released.

Question: What is the potential cause of this disk space increase?

Approach taken: I invoke the save operation for every 20-22 assets (I have also tried batches of 10-15). After creating each asset and setting its metadata values and tags, the code checks whether the workflow for the asset is complete and whether the 48x48 rendition has been generated; if the workflow is still running or the rendition has not been generated, I invoke Thread.sleep and wait for it to complete. This wait is done for batches of 15 assets, and once they are complete, the save operation is invoked when the image counter reaches 20 (or 15). A sketch of this loop appears after the note below.

I also tried asset.setBatchMode(true), but in vain.

Question: Is there any caching of the InputStream in AEM?

Approach taken: I open the InputStream, upload the asset using the AssetManager.createAsset method, and close the stream after streaming the content.

Note: Waiting for the workflow completion status and the rendition for each batch of assets avoids consuming as much disk space, but I noticed it still consumed 7.5 GB.
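For reference, here is a minimal sketch of the loop I described, not my exact code. PendingAsset is a hypothetical holder for the upload source; the batch size, timeout, and rendition name are the values mentioned above:

```java
import java.io.InputStream;
import javax.jcr.Session;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;
import com.day.cq.dam.api.Asset;
import com.day.cq.dam.api.AssetManager;

public class BulkUploadLoop {

    /** Hypothetical upload source; not part of the AEM API. */
    public interface PendingAsset {
        String damPath();
        String mimeType();
        InputStream open() throws Exception;
    }

    private static final int SAVE_EVERY = 20;                 // tried 10-22 per batch
    private static final String THUMB = "cq5dam.thumbnail.48.48.png";

    public void run(ResourceResolver resolver, Iterable<PendingAsset> assets)
            throws Exception {
        AssetManager assetManager = resolver.adaptTo(AssetManager.class);
        Session session = resolver.adaptTo(Session.class);
        int count = 0;
        for (PendingAsset pending : assets) {
            try (InputStream in = pending.open()) {           // closed after streaming
                assetManager.createAsset(pending.damPath(), in,
                        pending.mimeType(), false);           // doSave=false, batch saves
            }
            if (++count % SAVE_EVERY == 0) {
                session.save();                               // persist the batch
                // Simplification: wait on the last asset of the batch as a proxy;
                // the real code waited on each asset in the batch.
                waitForThumbnail(resolver, pending.damPath(), 60_000);
            }
        }
        if (session.hasPendingChanges()) {
            session.save();
        }
    }

    /** Poll until the 48x48 thumbnail exists or the timeout elapses. */
    private boolean waitForThumbnail(ResourceResolver resolver, String damPath,
            long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            resolver.refresh();                               // pick up workflow output
            Resource res = resolver.getResource(damPath);
            Asset asset = (res == null) ? null : res.adaptTo(Asset.class);
            if (asset != null && asset.getRendition(THUMB) != null) {
                return true;                                  // DAM Update Asset finished
            }
            Thread.sleep(2000);                               // back off and retry
        }
        return false;                                         // timed out; flag for review
    }
}
```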

2. Batch wait-and-save approach: I noticed that the disk space increase could be reduced by a few GB by waiting for the workflow completion status and the rendition for a set of assets (15 in particular is what I tried).

With this approach, however, I noticed the issues below.

Issue 1: I see exceptions at times even though the workflow status reports complete, so the workflow status does not help me identify which asset's processing is incomplete.

  • com.day.cq.dam.commons.handler.StandardImageHandler failed to extract image using Layer will try the fallback java.util.ConcurrentModificationException
  • Image Read Exception : Invalid marker found in entropy data
  • "failure to create asset rendition"

Issue 2: The instance simply stops processing further when it reaches some limit, logging cache statistics and a Tar journal thread-lock message. The instance becomes unresponsive.

Messages observed in the log:

02.10.2014 16:32:55.804 *INFO* [pool-6-thread-4] org.apache.jackrabbit.core.persistence.bundle.AbstractBundlePersistenceManager cachename=crx.defaultBundleCache[ConcurrentCache@26178eac], elements=953, usedmemorykb=8186, maxmemorykb=8192, access=8607679, miss=1240177
02.10.2014 16:35:47.489 *WARN* [Tar PM Optimization] com.day.crx.persistence.tar.ReentrantLockWithInfo Lock on tarJournal still held by Thread[pool-6-thread-4,5,main]: 0

Issue 3: Is there a way we could capture the errors in Issue 1 and skip processing those assets? I hoped the workflow would have failed, but it does not look like it does.

3. I would like to disable workflow processing before uploading assets and re-enable it after the upload is complete; I hear many mention that this helps.

Question: Does this still create the necessary renditions and versions?

Question: Which is the workflow or workflows that i am supposed to stop temporarily?

I feel this would help avoid many of the issues with uploading approximately 9000 assets.

Regards,

4 Replies

Correct answer by
Level 10

Hi NitroHazeDev,

It is my personal opinion that splitting the questions into smaller pieces might attract more responses.

1) Always plan for three times the actual size of the assets. If you are noticing more than that, I would debug further. The cases I have generally seen are custom implementations that repeatedly update assets and increase disk usage. If you feel it is a product bug, engage the official support team.

2) There is no caching of the InputStream. Re-uploading an asset causes the workflow to rerun.

2a) The workflow engine has a built-in option to retry failed jobs. As long as the workflow is complete, you should be fine. Look into stale or failed workflows to find assets whose processing is incomplete.
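Something along these lines can help spot stuck instances; a minimal sketch using the com.day.cq.workflow API, with an illustrative age threshold:

```java
import com.day.cq.workflow.WorkflowException;
import com.day.cq.workflow.WorkflowSession;
import com.day.cq.workflow.exec.Workflow;

public final class StaleWorkflowReport {

    private StaleWorkflowReport() {
    }

    /** List RUNNING instances older than maxAgeMs; their payloads point at
     *  assets whose processing may be incomplete or stale. */
    public static void report(WorkflowSession wfSession, long maxAgeMs)
            throws WorkflowException {
        long now = System.currentTimeMillis();
        for (Workflow wf : wfSession.getWorkflows(new String[] { "RUNNING" })) {
            long ageMs = now - wf.getTimeStarted().getTime();
            if (ageMs > maxAgeMs) {
                System.out.printf("Possibly stale: %s payload=%s age=%d min%n",
                        wf.getId(), wf.getWorkflowData().getPayload(), ageMs / 60000);
            }
        }
    }
}
```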
    
2b) Tar journal thread lock: disable Tar optimization while the bulk upload is running if it takes more than a day, and re-enable it once the job is done.

3a) Re-enabling workflow processing after uploading the assets does not create renditions for them. You need to trigger the workflow externally.

3b) Depends on your use case.

Looking at the use case, and if time is short, I would have gone with the offloading approach.
    
Thanks,
Sham
Tweet: @adobe_sham

Level 10

To all who may find it helpful: after monitoring this closely, I did some performance tuning of the AEM instance, invoked the save operation for around 23-25 assets, and used a batch wait (11-15 assets) for the workflows to complete. Workflow completion can be determined by using a combination of the workflow status and the availability of any rendition.
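A minimal sketch of that completion check, assuming the asset resource can be adapted to com.day.cq.workflow.status.WorkflowStatus; verify the adaptation and the boolean flag's exact semantics against your AEM version's Javadoc:

```java
import org.apache.sling.api.resource.Resource;
import com.day.cq.dam.api.Asset;
import com.day.cq.workflow.status.WorkflowStatus;

public final class AssetCompletionCheck {

    private AssetCompletionCheck() {
    }

    public static boolean isProcessed(Resource assetResource) {
        WorkflowStatus status = assetResource.adaptTo(WorkflowStatus.class);
        // Flag semantics (include sub-resources) vary by version; check the Javadoc.
        if (status != null && status.isInRunningWorkflow(true)) {
            return false;                       // DAM Update Asset still running
        }
        Asset asset = assetResource.adaptTo(Asset.class);
        // The "original" rendition is just the uploaded binary, so more than one
        // rendition means the workflow actually produced output.
        return asset != null && asset.getRenditions().size() > 1;
    }
}
```

This also gives a way to flag the assets from Issue 1: anything that never becomes processed within a timeout can be queued for reprocessing.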

Observations: 6 GB consumed before Tar optimization for 8654 assets with renditions, and 1 GB after. If Tar optimization does not complete, you might have to rerun it.

Level 10

Thank you Sham, point noted. :)

I do see close to 7 GB consumed, and after Tar optimization and index merge, 3 GB gets released. On packaging the assets in CRX Package Manager, I see the size is approximately 559 MB. Is this normal?

Tar journal thread lock: I see this happening when the number of images is approximately 7000+. I am going to disable Tar optimization, though I must say the Tar optimizer is scheduled to run at midnight; I wonder what is triggering it to run during the day. Size, maybe? This bulk upload takes two hours due to image processing.

Is disabling the DAM workflow done by disabling the "DAM Update Asset" launchers for the node-created and node-modified events?

Is there a way to trigger the workflow again for all the assets under different folders? I don't want to do this for individual assets but at a parent folder which consists of folders and assets.

Please let me know

Level 10

NitroHazeDev wrote...

Is there a way to trigger the workflow again for all the assets under different folders? I don't want to do this for individual assets but at a parent folder which consists of folders and assets.

Not OOTB; you need to write a custom workflow step or service that runs the workflow for all assets in the folder.
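A minimal sketch of such a custom service, assuming the classic DAM Update Asset model path (adjust the path for your AEM version); error handling is omitted for brevity:

```java
import javax.jcr.Session;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;
import com.day.cq.dam.api.Asset;
import com.day.cq.workflow.WorkflowException;
import com.day.cq.workflow.WorkflowService;
import com.day.cq.workflow.WorkflowSession;
import com.day.cq.workflow.exec.WorkflowData;
import com.day.cq.workflow.model.WorkflowModel;

public class ReprocessFolder {

    // Classic model path; verify under /etc/workflow/models on your instance.
    private static final String MODEL =
        "/etc/workflow/models/dam/update_asset/jcr:content/model";

    public void run(WorkflowService workflowService, ResourceResolver resolver,
                    String parentFolder) throws WorkflowException {
        WorkflowSession wfSession =
            workflowService.getWorkflowSession(resolver.adaptTo(Session.class));
        WorkflowModel model = wfSession.getModel(MODEL);
        startRecursively(wfSession, model, resolver.getResource(parentFolder));
    }

    private void startRecursively(WorkflowSession wfSession, WorkflowModel model,
                                  Resource resource) throws WorkflowException {
        if (resource == null) {
            return;
        }
        if (resource.adaptTo(Asset.class) != null) {
            // Payload is the JCR path of the asset node.
            WorkflowData data = wfSession.newWorkflowData("JCR_PATH", resource.getPath());
            wfSession.startWorkflow(model, data);
            return;                              // do not recurse into the asset itself
        }
        for (Resource child : resource.getChildren()) {
            startRecursively(wfSession, model, child);     // descend into subfolders
        }
    }
}
```

For ~9000 assets, consider throttling the starts (e.g., in batches) so the workflow queue does not get flooded.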