SOLVED

java.io.InputStream Use for AEM Direct Binary Upload


Level 2

Hi,

 

Our recent Best Practices Analyzer (BPA) report highlighted multiple instances where we use java.io.InputStream APIs for general I/O operations, including file reading (e.g., using BufferedReader or InputStreamReader) and handling network streams (e.g., processing HTTP responses).

 

We are preparing to implement the Direct Binary Upload mechanism (documented in the AEM Developer Guide) to replace our current use of java.io.InputStream.

We would appreciate guidance or shared experiences from anyone who has already successfully transitioned to this cloud-native, recommended approach.

 

TIA

1 Accepted Solution


Correct answer by
Employee

Hi @ShikhaSharma 

 

  • Deprecation: java.io.InputStream usage and legacy APIs such as AssetManager.createAsset* are deprecated in AEM as a Cloud Service; streaming binaries through the JVM is an anti-pattern.
  • Direct Binary Upload: AEM CS uses a three-step REST pattern (initiateUpload, PUT the binary to a signed URL, completeUpload) to upload assets directly to cloud storage (Azure/S3), bypassing the AEM JVM.
  • Recommended Tools: Use the open-source aem-upload Node.js library or equivalent REST API calls; there is currently no official Java SDK version.
  • Bulk/Mass Migration: For large-scale ingestion, use Bulk Import (cloud-to-cloud transfers from S3/Azure) to import assets and metadata mappings efficiently without streaming.
  • Performance & Reliability: Implement retries with backoff for eventual consistency (folder creation can take time), and avoid large synchronous file uploads.
  • Post-Processing: After upload completion, AEM triggers Asset Compute microservices for renditions and metadata extraction, replacing local workflows.
  • Best Practices:
    • Run ingestion jobs outside the AEM JVM (e.g., App Builder or external integrations).
    • Avoid raising LimitRequestBody or posting large (>1 GB) binaries to AEM directly.
    • Store only binary references, not file data.
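The three-step pattern above can be sketched in plain Java with java.net.http. Note this is a rough illustration, not the official client: the endpoint path (`<folder>.initiateUpload.json`), form parameters, and response fields are assumptions based on the documented pattern, and the JSON parsing of the initiate response is omitted. Verify everything against the Direct Binary Upload documentation before using it.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

// Hypothetical sketch of the 3-step Direct Binary Upload pattern.
public class DirectUploadSketch {

    // Values returned by initiateUpload (parsed from its JSON response;
    // JSON parsing is omitted here for brevity).
    record InitiateResult(List<URI> uploadUris, String uploadToken,
                          URI completeUri, long minPartSize, long maxPartSize) {}

    static final HttpClient HTTP = HttpClient.newHttpClient();

    // Step 1: POST <folder>.initiateUpload.json with fileName/fileSize.
    static HttpResponse<String> initiateUpload(String aemHost, String folder,
                                               String fileName, long fileSize,
                                               String bearerToken) throws Exception {
        String form = "fileName=" + fileName + "&fileSize=" + fileSize;
        HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create(aemHost + folder + ".initiateUpload.json"))
                .header("Authorization", "Bearer " + bearerToken)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(form))
                .build();
        return HTTP.send(req, HttpResponse.BodyHandlers.ofString());
    }

    // Step 2: PUT each part of the binary straight to the signed
    // cloud-storage URI, so the bytes never pass through the AEM JVM.
    static void uploadPart(URI signedUri, byte[] part) throws Exception {
        HttpRequest req = HttpRequest.newBuilder()
                .uri(signedUri)
                .PUT(HttpRequest.BodyPublishers.ofByteArray(part))
                .build();
        HTTP.send(req, HttpResponse.BodyHandlers.discarding());
    }

    // Step 3: POST the completeURI with the uploadToken so AEM registers
    // the asset and triggers Asset Compute processing.
    static void completeUpload(InitiateResult init, String fileName,
                               String mimeType, String bearerToken) throws Exception {
        String form = "fileName=" + fileName + "&mimeType=" + mimeType
                + "&uploadToken=" + init.uploadToken();
        HttpRequest req = HttpRequest.newBuilder()
                .uri(init.completeUri())
                .header("Authorization", "Bearer " + bearerToken)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(form))
                .build();
        HTTP.send(req, HttpResponse.BodyHandlers.discarding());
    }

    // Helper: split a file of `total` bytes as evenly as possible across
    // the signed part URIs returned by initiateUpload.
    static long[] partSizes(long total, int numUris) {
        long[] sizes = new long[numUris];
        long base = total / numUris, rem = total % numUris;
        for (int i = 0; i < numUris; i++) sizes[i] = base + (i < rem ? 1 : 0);
        return sizes;
    }
}
```

For production use, prefer the aem-upload library mentioned above; it already handles part sizing, retries, and the response parsing this sketch leaves out.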

References:
Developer references for Assets Bulk Import Guide


3 Replies



Level 2

Thanks @PavanGaddam 


Employee Advisor

In my personal opinion, the general discouragement of InputStream handling is a bit far-fetched. It is true that you should avoid reading/writing binary streams from/to the repository when you don't know how large these binaries can get. Too often the implementation consumes the entire stream and holds the result in a single byte array, which can exhaust the heap and lead to stability issues.

 

On the other hand, reading the input stream of requests and writing to the output stream of responses is possible and unproblematic.
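To illustrate the distinction: the problem is not InputStream itself but materializing an unbounded stream as one byte array (e.g., via readAllBytes()). A minimal sketch of the safe alternative, copying in fixed-size chunks so heap usage stays bounded regardless of binary size:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Bounded-memory stream copy: the JVM never holds more than one 8 KiB
// buffer, no matter how large the source binary is.
public class BoundedCopy {

    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```

This is the pattern that is fine for request/response streams; it is the "read everything into a byte array" variant that risks heap exhaustion.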