
Best practices for importing Salsify data (CSV/JSON) to create Content Fragments in AEM

  • March 30, 2026
  • 6 replies
  • 97 views

Hi Community,

I’m looking for suggestions and best practices around importing Salsify product data into AEM, specifically for creating or updating Content Fragments.

Current Architecture

  • Salsify product data is exported and stored in AEM DAM as .txt files.
  • AEM Workflow is triggered on these DAM assets
  • The workflow:
    • Reads the Salsify JSON
    • Creates Product Pages
    • Sets product properties on the pages based on the JSON
  • This approach is currently page‑centric, driven by workflow logic

New / Evolving Requirement

We are exploring a shift to:

  • Content Fragments as the primary product data source
  • Importing product data via:
    • CSV files where one or more columns contain Salsify JSON, or
    • Directly from JSON assets
  • Mapping Salsify attributes to Content Fragment Models
  • Supporting re‑imports / updates (idempotent behavior)

What I’m Looking For

I’d appreciate guidance on:

  • Recommended patterns for CSV/JSON → Content Fragment creation
  • Whether workflows are still the preferred approach, or if:
    • Custom servlets
    • Schedulers
    • Asset processing
    • Or other AEM-native mechanisms are better suited
  • How teams are handling:
    • Large Salsify JSON payloads
    • Partial updates vs full re‑imports
    • Storing raw source JSON for traceability
  • Any lessons learned from real-world Salsify + AEM integrations

We’re currently on AEM as a Cloud Service and dealing with product data at scale, so performance and maintainability are important.

Looking forward to hearing how others have approached this.
Thanks in advance!

6 replies

Harwinder-singh
Community Advisor
March 30, 2026

@Meghana_N How big is the product catalog? What is the average size of your SKU data payload?

Based on that, we can explore a couple of design patterns. You can go with a pure dynamic API route, where you have AEM templates for different PDP layouts and a Sling Model feeds the SKU data directly from the PIM APIs (a minimal sketch of this route follows). If you are looking for more content control, localization, or performance gains, you might want to stick with the CFM route. One thing you can do to make your life easier is to split the CFM models based on the type of SKU data: core product data, SEO attributes, campaign data, cross-sell/upsell data.
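For the dynamic route, the Sling Model could look roughly like the sketch below. The PIM endpoint, the SKU-in-URL-suffix convention, and the class names are all assumptions, not part of any product API; a production version would reuse a pooled HTTP client behind an OSGi service and add caching.

```java
package com.example.core.models;

import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import javax.annotation.PostConstruct;

import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.models.annotations.Model;
import org.apache.sling.models.annotations.injectorspecific.Self;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical model backing a dynamic PDP: no product data is persisted in AEM;
// the SKU payload is fetched from the PIM at request time.
@Model(adaptables = SlingHttpServletRequest.class)
public class DynamicProductModel {

    private static final Logger LOG = LoggerFactory.getLogger(DynamicProductModel.class);

    // Assumed PIM endpoint; replace with your tenant's API.
    private static final String PIM_ENDPOINT = "https://pim.example.com/api/v1/products/";

    @Self
    private SlingHttpServletRequest request;

    private String productJson;

    @PostConstruct
    protected void init() {
        // Assumed URL convention: /products/pdp.html/<sku>
        String suffix = request.getRequestPathInfo().getSuffix();
        if (suffix == null) {
            return;
        }
        String sku = suffix.replace("/", "");
        try {
            HttpResponse<String> response = HttpClient.newHttpClient().send(
                    HttpRequest.newBuilder()
                            .uri(URI.create(PIM_ENDPOINT + sku))
                            .header("Accept", "application/json")
                            .build(),
                    HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() == 200) {
                productJson = response.body();
            }
        } catch (IOException | InterruptedException e) {
            LOG.error("Could not fetch SKU {} from PIM", sku, e);
        }
    }

    public String getProductJson() {
        return productJson;
    }
}
```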

Meghana_N (Author)
Level 2
March 31, 2026

@Harwinder-singh 

Thanks for your suggestions.

The product catalog is enterprise-scale:

  • Product ranges (parents): ~150–300
  • SKUs / variants (children): ~8,000–15,000
  • Digital assets (images, PDFs, line art): 20,000+
  • Typical SKU payload (most fields, 2–3 images): ~7–9 KB


Meghana_N (Author)
Level 2
March 31, 2026

Adding to the above, I have the four options below. Please suggest which one is more feasible in this scenario:

  • Option 1: Build a cloud‑native importer on AEMaaCS (using custom jobs or Adobe App Builder) to ingest Salsify data and create/update Content Fragments via AEM APIs.

  • Option 2: Use AEM Content Fragment HTTP APIs directly from external services (Node/Java/Python) to programmatically create and maintain product CFs without relying on ACS Commons MCP.

  • Option 3: Implement an external/CI‑CD–driven ingestion pipeline that transforms PIM data outside AEM and syncs it into AEM using Content Fragment and Assets APIs.

  • Option 4: Skip persisting product data in AEM entirely and use a pure dynamic PDP approach where AEM templates render layouts and Sling Models fetch SKU data live from PIM APIs at request time.

AmitVishwakarma
Community Advisor
April 1, 2026

Hi @Meghana_N

At your scale, the most robust pattern is:

  • Salsify stays the system of record. AEM holds a curated, editor‑friendly "cache" of product data in Content Fragments, kept in sync by an external ingestion service using the CF Management APIs.

1. Model the data correctly in AEM

  • Design CF Models by concern, not "one giant product":
    • Product Core – SKU, title, base description, key attributes.
    • SEO/Discovery – SEO title/description, canonical URL, search facets.
    • Merch/Marketing – badges, feature bullets, banners, storytelling copy.
    • Relationships – cross‑sell/upsell references, collections.
  • Use product/SKU ID as the stable key:
    • Either as the CF name, or a dedicated sku field + index.

This keeps the payload per fragment small, reduces churn, and lets marketing/localization own what AEM is good at.

2. Build an external "importer service" (not workflows)

Implement a small service (Node/Java/Python, or an App Builder app) that lives outside AEM and does the heavy lifting; sections 3-6 below describe what it needs to handle.

3. Make the ingestion incremental, idempotent, and safe

For 8k–15k SKUs:

  • Never full‑reimport blindly.
    • Use Salsify change feeds or timestamps to build "delta batches".
  • Batch & throttle:
    • E.g., 500–1,000 SKUs per run, with retry + backoff.
  • Idempotent updates:
    • The importer is allowed to re-run the same batch; it should converge to the same CF state (a minimal sketch of this loop follows this list).
  • Partial updates:
    • Separate jobs/pipelines per concern:
      • Core data job > updates Product Core CFs only.
      • SEO job > touches only SEO CFs.
      • Merch job > touches campaign/merch CFs.
    • This keeps marketing/localization edits safe.
  • Keep a lightweight "import job" log (in your service DB or as a simple log index) with:
    • batch id, time, number of SKUs, success/error count, and a replay mechanism.
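A minimal sketch of that batch loop in Java, with the actual AEM write hidden behind a placeholder: AemClient and upsertFragment are hypothetical names, to be wired to whichever channel you choose (Assets HTTP API, CF management APIs, GraphQL mutations):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Batch-loop sketch for the external importer service. */
public class DeltaImporter {

    /** Placeholder for whatever client talks to AEM. */
    public interface AemClient {
        /** Create-or-update one fragment; must be safe to call repeatedly. */
        void upsertFragment(String sku, String payloadJson) throws Exception;
    }

    private static final int MAX_RETRIES = 3;
    private final AemClient aem;
    // Hash of the last successfully imported payload per SKU; persist this in your service DB.
    private final Map<String, String> importedHashes = new ConcurrentHashMap<>();

    public DeltaImporter(AemClient aem) {
        this.aem = aem;
    }

    /** Processes one delta batch (e.g. 500-1,000 SKUs) with retry + exponential backoff.
     *  Re-running the same batch converges to the same CF state (idempotent). */
    public void importBatch(Map<String, String> skuToJson) throws InterruptedException {
        for (Map.Entry<String, String> entry : skuToJson.entrySet()) {
            String sku = entry.getKey();
            String hash = sha256(entry.getValue());
            if (hash.equals(importedHashes.get(sku))) {
                continue; // unchanged since the last run: no write at all
            }
            for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
                try {
                    aem.upsertFragment(sku, entry.getValue());
                    importedHashes.put(sku, hash);
                    break;
                } catch (Exception e) {
                    if (attempt == MAX_RETRIES) {
                        // surface to the import-job log (with a replay mechanism) instead of failing the batch
                        System.err.println("Giving up on SKU " + sku + ": " + e.getMessage());
                    } else {
                        Thread.sleep((1L << attempt) * 1000); // backoff: 2s, 4s, ...
                    }
                }
            }
        }
    }

    private static String sha256(String payload) {
        try {
            return HexFormat.of().formatHex( // HexFormat requires Java 17+
                    MessageDigest.getInstance("SHA-256").digest(payload.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the loop skips unchanged hashes and tolerates re-runs, replaying a failed batch or a single SKU is safe by construction.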

4. Store raw Salsify JSON for traceability (optional but useful)

For audit/debugging:

  • Store the raw Salsify payload either:
    • in a sourceJson long‑text field on the CF (for small-ish payloads), or
    • as a JSON file in DAM (e.g., /content/dam/pim-dumps/{sku}.json) with a reference field on the CF.

This lets you prove "what Salsify said" when authors or QA see mismatches.

5. Connect CFs to pages and headless delivery

On the delivery side:

  • Use GraphQL or CF Delivery APIs to fetch product CFs by SKU for:
    • PDPs (single SKU).
    • PLPs or carousels (multi‑SKU queries).
  • PDP page pattern:
    • AEM page or SPA template reads the SKU from URL.
    • Sling Model / React calls GraphQL to fetch CF data for that SKU.
    • Merge in live commerce data (price, stock) from PIM/commerce APIs as needed.

Keep very volatile values (price, inventory) out of CFs and resolve them live from the commerce/PIM APIs at render time.

6. Operational best practices

  • Error isolation: don't run as AEM workflows; keep failures in your external service and surface a concise error report to ops.
  • Re‑runs: design every import job so you can re‑run a specific batch or a single SKU safely.
  • Performance: avoid storing huge blobs or completely duplicating Salsify; only bring what authors/localization really need in AEM.
Amit Vishwakarma - Adobe Commerce Champion 2025 | 16x Adobe certified | 4x Adobe SME
Meghana_N (Author)
Level 2
April 1, 2026

Hi @AmitVishwakarma

Thanks for the detailed solutions. We are exploring the App Builder approach as well; your suggestions will help us in that area.

giuseppebaglio
Level 10
April 1, 2026

Hi @Meghana_N,

I would be cautious about using Experience Fragments as the destination for Salsify imports if Salsify is intended to remain the source of truth. An XF introduces presentation and layout into what is fundamentally product data, which can create a second system of record inside AEM.

If the requirement is structured product content, Content Fragments are a better fit because they are presentation-agnostic and can be delivered across channels via JSON/GraphQL or used in AEM Sites. If the requirement is to manage a composed, branded experience, then Experience Fragments are appropriate.

So my preference would be: keep Salsify as the master, sync into AEM only what AEM needs, and use CFs for data and XFs only for reusable experiences/layout.


partyush
Community Advisor
April 1, 2026

Hi @Meghana_N

Dealing with Salsify and AEMaaCS at scale is tricky, but here is what usually works best in production.

First off, definitely step away from traditional DAM workflows for bulk data ingestion. On Cloud Service, using workflows for this is an anti-pattern. You'll hit JVM memory spikes, Oak repository bloat, and execution timeouts pretty quickly if you're dealing with a large product catalog.

If possible, take the heavy lifting off the AEM JVM entirely. Use Adobe App Builder (I/O) as middleware. You can have Salsify push to an App Builder webhook, let App Builder parse the JSON and map it to your CF Models, and then push the fragments directly into AEM using the Content Fragment OpenAPI.

If you strictly need to keep it AEM-native: Drop the Salsify file into a specific DAM folder, use a Sling Event Listener to detect it, and immediately hand the processing off to a Sling Job (via JobManager API). This gives you guaranteed async processing and retries without tying up authoring resources.
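A minimal sketch of that AEM-native hand-off; the DAM folder path and job topic below are illustrative, not fixed names:

```java
package com.example.core.listeners;

import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.sling.api.resource.observation.ResourceChange;
import org.apache.sling.api.resource.observation.ResourceChangeListener;
import org.apache.sling.event.jobs.JobManager;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;

// Detect new Salsify drops in a dedicated DAM folder and hand off to a Sling Job.
@Component(service = ResourceChangeListener.class, property = {
        ResourceChangeListener.PATHS + "=/content/dam/salsify/landing",
        ResourceChangeListener.CHANGES + "=ADDED"
})
public class SalsifyDropListener implements ResourceChangeListener {

    public static final String JOB_TOPIC = "com/example/salsify/import";

    @Reference
    private JobManager jobManager;

    @Override
    public void onChange(List<ResourceChange> changes) {
        for (ResourceChange change : changes) {
            Map<String, Object> props = new HashMap<>();
            props.put("assetPath", change.getPath());
            // Queued asynchronously; the Sling Jobs contract guarantees processing and retries.
            jobManager.addJob(JOB_TOPIC, props);
        }
    }
}
```

A JobConsumer registered for the same topic then performs the actual parsing; returning JobResult.FAILED triggers Sling's automatic retries per the queue configuration.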

For handling the scale and logic you mentioned:

  • Memory Management: Never load the full Salsify JSON or CSV into memory. Use the Jackson Streaming API (JsonParser) or a streaming CSV reader to process the file node-by-node (see the sketch after this list).

  • Batching: Save your JCR session in strict batches (e.g., every 500 to 1000 nodes). Trying to commit 10k+ nodes in a single save will crash the JVM.

  • Idempotency & Partial Updates: Don't do blind overwrites. Generate a SHA-256 hash of the Salsify product payload and save it as a hidden property (like salsifyHash) on the Content Fragment. On your next import, hash the incoming payload and compare it. If the hashes match, skip the JCR write entirely. This drastically reduces repository overhead.

  • Traceability: Don't dump raw JSON strings into a CF property—it's bad for the JCR. Instead, archive the ingested .txt or .json file in a dedicated DAM folder (e.g., /content/dam/salsify-archive/) and just add a sourceAssetPath property to your CF Model that links back to that specific file.
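Putting the first three bullets together, here is a rough Java sketch. The DAM paths, the salsify:id attribute name, and the abstract upsertFragment helper are assumptions, not a drop-in implementation:

```java
package com.example.core.jobs;

import java.io.InputStream;

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import org.apache.commons.codec.digest.DigestUtils;
import org.apache.sling.api.resource.ModifiableValueMap;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;

// Streaming import sketch: Jackson streaming, batched session saves, salsifyHash skip.
public abstract class SalsifyStreamImporter {

    private static final int BATCH_SIZE = 500;               // commit every N fragments
    private static final String DAM_ROOT = "/content/dam/products/"; // illustrative target

    /** Hypothetical helper: create-or-update the CF for this record and return its
     *  jcr:content resource (e.g. via FragmentTemplate, shown further down this thread). */
    protected abstract Resource upsertFragment(ResourceResolver resolver, String path, JsonNode product);

    public void importFile(ResourceResolver resolver, InputStream salsifyJson) throws Exception {
        JsonFactory factory = new JsonFactory();
        ObjectMapper mapper = new ObjectMapper(factory);
        int pending = 0;
        try (JsonParser parser = factory.createParser(salsifyJson)) {
            if (parser.nextToken() != JsonToken.START_ARRAY) {
                throw new IllegalArgumentException("Expected a top-level JSON array of products");
            }
            // Only one product object is materialized in memory at a time.
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                JsonNode product = mapper.readTree(parser);
                String sku = product.path("salsify:id").asText(); // assumed ID attribute
                String hash = DigestUtils.sha256Hex(product.toString());

                Resource existing = resolver.getResource(DAM_ROOT + sku + "/jcr:content");
                if (existing != null
                        && hash.equals(existing.getValueMap().get("salsifyHash", String.class))) {
                    continue; // payload unchanged since the last import: skip the JCR write
                }
                Resource content = upsertFragment(resolver, DAM_ROOT + sku, product);
                content.adaptTo(ModifiableValueMap.class).put("salsifyHash", hash);

                if (++pending >= BATCH_SIZE) {
                    resolver.commit(); // batch saves; one giant commit would blow the heap
                    pending = 0;
                }
            }
        }
        if (pending > 0) {
            resolver.commit();
        }
    }
}
```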

Looking forward to hearing back from you on this approach. Let me know if this helps you in any way.


Thanks,

Partyush :)

Meghana_N (Author)
Level 2
April 2, 2026

Hi @partyush

Thanks for the solution. We are exploring the App Builder approach as well; your suggestions will help us in that area.

partyush
Community Advisor
April 2, 2026

Noted @Meghana_N

sarav_prakash
Community Advisor
April 1, 2026

Hi @Meghana_N, I worked on exactly this Salsify-AEM integration, bringing in assets + text content. I explained my process in my articles. And use JSON, because with CSV we ran into UTF-8 encoding issues with strings containing commas.

Attempt 1: AEM workflow

  1. The Salsify JSON is stored in /landing folders in the DAM using the Assets Bulk Importer. Salsify ships to SFTP; we had a cloud job that moves files from the Salsify SFTP to Azure Blob, and the Asset Bulk Importer periodically pulls from Azure Blob into the DAM.
  2. A scheduled workflow checks if files exist in the /landing folder, processes them, and moves them to the /complete folder. A workflow because it is a series of steps: check availability of landing files, process each record, reprocess skipped/failed records, and finally move to the /complete folder.
  3. For each record in the JSON, we use FragmentTemplate to create fragments and ContentElement to populate values (a minimal sketch follows this list).
  4. For assets (this was 2024), we wrote the 3-way AssetCompute microservice in Java and imported assets into the right DAM folder.
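A minimal sketch of step 3, assuming illustrative model and folder paths (null checks and error handling omitted for brevity):

```java
package com.example.core.services;

import java.util.Map;

import com.adobe.cq.dam.cfm.ContentElement;
import com.adobe.cq.dam.cfm.ContentFragment;
import com.adobe.cq.dam.cfm.ContentFragmentException;
import com.adobe.cq.dam.cfm.FragmentTemplate;

import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;

// Create-or-update one fragment per Salsify record; paths are illustrative.
public class FragmentWriter {

    private static final String MODEL_PATH = "/conf/myproject/settings/dam/cfm/models/product";
    private static final String PARENT_PATH = "/content/dam/products";

    public ContentFragment upsert(ResourceResolver resolver, String sku, Map<String, String> values)
            throws ContentFragmentException {
        // Reuse the existing fragment when the SKU was imported before (idempotent re-import).
        Resource existing = resolver.getResource(PARENT_PATH + "/" + sku);
        ContentFragment fragment;
        if (existing != null) {
            fragment = existing.adaptTo(ContentFragment.class);
        } else {
            FragmentTemplate template = resolver.getResource(MODEL_PATH).adaptTo(FragmentTemplate.class);
            fragment = template.createFragment(resolver.getResource(PARENT_PATH), sku, sku);
        }
        // Populate each mapped attribute via ContentElement.
        for (Map.Entry<String, String> entry : values.entrySet()) {
            ContentElement element = fragment.getElement(entry.getKey());
            if (element != null) {
                element.setContent(entry.getValue(), "text/plain");
            }
        }
        return fragment;
    }
}
```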

Cons:

  1. AssetCompute is a very expensive operation to run in the JVM. Whenever the payload exceeded ~200 MB, the server crashed in production.
  2. There is no good Java library to find Content Fragments. We did a search to find the right content fragment, but it is still a slow query in production. We have 1.8M CFs; even with all Lucene indexing done correctly, queries are slow, and so are GraphQL queries.


Attempt 2: AIO Journaling

  1. To solve problems #1 and #2 above, we started running expensive operations outside AEM. My article explains eventing from Workfront, but we kept the same approach for Salsify events as well.
  2. An AIO Runtime action reads the JSON from Azure Blob, converts each record into an event, and posts it to the Journaling queue.
  3. The next action subscribes to the queue, reads each event, and performs the same operations: creating folders/CFs, importing assets, and updating CFs.
  4. We used the OpenAPI to update. This is more secure and guaranteed.
  5. And the ImportFromUrl API to upload assets. This performs the same 3-way AssetCompute, except it runs on Adobe's cloud, so it does not overload AEM.
  6. The main benefit was the Journaling (Kafka) queue. Whenever events get stuck in the queue or a record fails, we simply push the event back into the queue, so an automatic retry happens after the current queue is flushed. We could manage incidents in prod by simply blocking the queue and unblocking it when we had an emergency on the AEM server.

Conclusion: We specifically had volume issues. On certain days a job runs to bulk-update 50K records or upload 20K assets; normal-day volume is 2,000 records. So running an all-JVM Java job was not scalable for us, and we switched to cloud-native outside AEM. If you too smell volume/scalability issues, better to go with AIO Journaling.

Overall, CF with GraphQL is pretty successful for us, for 1.8M products across multiple catalogs.

PGURUKRISHNA
Level 5
April 7, 2026

Hey @Meghana_N, for your Salsify → Content Fragment migration on AEMaaCS:

Drop the DAM workflow approach. Use a custom Sling Job + servlet instead. Expose a POST endpoint that your middleware (or a Sling scheduler) calls with the Salsify payload. This gives you control over batching, retries, and error handling — things DAM workflows aren't designed for.

For CSV with embedded JSON: Parse CSV with Apache Commons CSV, then deserialize the JSON columns with Jackson. Map Salsify attributes to CF fields via an externalized OSGi config so you're not hardcoding mappings.
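One way to externalize that mapping is an OSGi metatype along these lines; all names and defaults here are illustrative:

```java
package com.example.core.config;

import org.osgi.service.metatype.annotations.AttributeDefinition;
import org.osgi.service.metatype.annotations.ObjectClassDefinition;

// Salsify-attribute -> CF-element mapping kept in OSGi config (per environment),
// so adding or renaming a field is a config change, not a code change.
@ObjectClassDefinition(name = "Salsify Import - Attribute Mapping")
public @interface SalsifyMappingConfig {

    @AttributeDefinition(
            name = "Attribute mappings",
            description = "One entry per field, in the form salsifyAttribute=cfElementName")
    String[] mappings() default {
            "salsify:id=sku",
            "Product Name=title",
            "Long Description=description"
    };

    @AttributeDefinition(name = "Content Fragment model path")
    String modelPath() default "/conf/myproject/settings/dam/cfm/models/product";
}
```

A component annotated with @Designate(ocd = SalsifyMappingConfig.class) can then split each mappings entry on "=" into a lookup map at activation.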

For idempotent re-imports: Use the Salsify product ID as the Content Fragment node name. Do a resolver.getResource(path); if it exists, update it; if not, create it. Store an MD5 hash of the incoming payload on the CF and skip writes if unchanged. This alone will save you massive write overhead at scale.

For large payloads: Chunk into batches of 100-500 products, each processed as a separate Sling Job on a dedicated ordered queue (2-4 threads). Use Jackson's streaming parser for large JSON files. Never commit after every single fragment; batch 50-100 per resolver.commit().

For traceability: Store the raw Salsify JSON as a multi-line text field on the Content Fragment itself. Move processed source files to a /content/dam/salsify-imports/processed/ folder, failed ones to /failed/.

For delta syncs: Store a lastSyncTimestamp and use Salsify's change log API to fetch only updated products instead of full re-imports every time.

The core pattern is: Servlet receives data → Sling Job queue processes batches → CF API creates/updates fragments → hash comparison prevents unnecessary writes. This scales to 10K+ SKUs reliably on AEMaaCS.


Pagidala GuruKrishna