
Best practices for importing Salsify data (CSV/JSON) to create Content Fragments in AEM

  • March 30, 2026
  • 6 replies
  • 97 views

Hi Community,

I’m looking for suggestions and best practices around importing Salsify product data into AEM, specifically for creating or updating Content Fragments.

Current Architecture

  • Salsify product data is exported and stored in AEM DAM as .txt files.
  • AEM Workflow is triggered on these DAM assets
  • The workflow:
    • Reads the Salsify JSON
    • Creates Product Pages
    • Sets product properties on the pages based on the JSON
  • This approach is currently page‑centric, driven by workflow logic

New / Evolving Requirement

We are exploring a shift to:

  • Content Fragments as the primary product data source
  • Importing product data via:
    • CSV files where one or more columns contain Salsify JSON, or
    • Directly from JSON assets
  • Mapping Salsify attributes to Content Fragment Models
  • Supporting re‑imports / updates (idempotent behavior)

What I’m Looking For

I’d appreciate guidance on:

  • Recommended patterns for CSV/JSON → Content Fragment creation
  • Whether workflows are still the preferred approach, or if:
    • Custom servlets
    • Schedulers
    • Asset processing
    • Or other AEM-native mechanisms are better suited
  • How teams are handling:
    • Large Salsify JSON payloads
    • Partial updates vs full re‑imports
    • Storing raw source JSON for traceability
  • Any lessons learned from real-world Salsify + AEM integrations

We’re currently on AEM as a Cloud Service and dealing with product data at scale, so performance and maintainability are important.

Looking forward to hearing how others have approached this.
Thanks in advance!

6 replies

Harwinder-singh
Community Advisor
March 30, 2026

@Meghana_N How big is the product catalog? What is the average size of your SKU data payload?

Based on that, we can explore a couple of design patterns. You can go with a pure dynamic API route, where you have AEM templates for different PDP layouts and a Sling Model feeds the SKU data directly from the PIM APIs (a minimal sketch of this route follows). If you are looking for more content control, localization, or performance gains, you might want to stick with the CFM route. One thing you can do to make your life easier is to split the CFM models based on the type of SKU data: core product data, SEO attributes, campaign data, cross-sell/upsell data.
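For the dynamic route, the Sling Model could look roughly like the sketch below. The PIM endpoint, the SKU-in-URL-suffix convention, and the class names are all assumptions, not part of any product API; a production version would reuse a pooled HTTP client behind an OSGi service and add caching.

```java
package com.example.core.models;

import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import javax.annotation.PostConstruct;

import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.models.annotations.Model;
import org.apache.sling.models.annotations.injectorspecific.Self;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical model backing a dynamic PDP: no product data is persisted in AEM;
// the SKU payload is fetched from the PIM at request time.
@Model(adaptables = SlingHttpServletRequest.class)
public class DynamicProductModel {

    private static final Logger LOG = LoggerFactory.getLogger(DynamicProductModel.class);

    // Assumed PIM endpoint; replace with your tenant's API.
    private static final String PIM_ENDPOINT = "https://pim.example.com/api/v1/products/";

    @Self
    private SlingHttpServletRequest request;

    private String productJson;

    @PostConstruct
    protected void init() {
        // Assumed URL convention: /products/pdp.html/<sku>
        String suffix = request.getRequestPathInfo().getSuffix();
        if (suffix == null) {
            return;
        }
        String sku = suffix.replace("/", "");
        try {
            HttpResponse<String> response = HttpClient.newHttpClient().send(
                    HttpRequest.newBuilder()
                            .uri(URI.create(PIM_ENDPOINT + sku))
                            .header("Accept", "application/json")
                            .build(),
                    HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() == 200) {
                productJson = response.body();
            }
        } catch (IOException | InterruptedException e) {
            LOG.error("Could not fetch SKU {} from PIM", sku, e);
        }
    }

    public String getProductJson() {
        return productJson;
    }
}
```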

Meghana_N (Author)
Level 2
March 31, 2026

@Harwinder-singh 

Thanks for your suggestions.

The product catalog is enterprise-scale:

  • Product ranges (parents): ~150–300
  • SKUs / variants (children): ~8,000–15,000
  • Digital assets (images, PDFs, line art): 20,000+
  • Typical SKU payload (most fields, 2–3 images): ~7–9 KB


Meghana_N (Author)
Level 2
March 31, 2026

Adding to the above, I have the four options below. Please suggest which one is more feasible in this scenario:

  • Option 1: Build a cloud‑native importer on AEMaaCS (using custom jobs or Adobe App Builder) to ingest Salsify data and create/update Content Fragments via AEM APIs.

  • Option 2: Use AEM Content Fragment HTTP APIs directly from external services (Node/Java/Python) to programmatically create and maintain product CFs without relying on ACS Commons MCP.

  • Option 3: Implement an external/CI‑CD–driven ingestion pipeline that transforms PIM data outside AEM and syncs it into AEM using Content Fragment and Assets APIs.

  • Option 4: Skip persisting product data in AEM entirely and use a pure dynamic PDP approach where AEM templates render layouts and Sling Models fetch SKU data live from PIM APIs at request time.

AmitVishwakarma
Community Advisor
April 1, 2026

Hi @Meghana_N

At your scale, the most robust pattern is:

  • Salsify stays the system of record. AEM holds a curated, editor‑friendly "cache" of product data in Content Fragments, kept in sync by an external ingestion service using the CF Management APIs.

1. Model the data correctly in AEM

  • Design CF Models by concern, not "one giant product":
    • Product Core – SKU, title, base description, key attributes.
    • SEO/Discovery – SEO title/description, canonical URL, search facets.
    • Merch/Marketing – badges, feature bullets, banners, storytelling copy.
    • Relationships – cross‑sell/upsell references, collections.
  • Use product/SKU ID as the stable key:
    • Either as the CF name, or a dedicated sku field + index.

This keeps the payload per fragment small, reduces churn, and lets marketing/localization own what AEM is good at.

2. Build an external "importer service" (not workflows)

Implement a small service (Node/Java/Python, or an App Builder app) that lives outside AEM and does the heavy lifting; sections 3-6 below describe what it needs to handle.

3. Make the ingestion incremental, idempotent, and safe

For 8k–15k SKUs:

  • Never full‑reimport blindly.
    • Use Salsify change feeds or timestamps to build "delta batches".
  • Batch & throttle:
    • E.g., 500–1,000 SKUs per run, with retry + backoff.
  • Idempotent updates:
    • The importer is allowed to re-run the same batch; it should converge to the same CF state (a minimal sketch of this loop follows this list).
  • Partial updates:
    • Separate jobs/pipelines per concern:
      • Core data job > updates Product Core CFs only.
      • SEO job > touches only SEO CFs.
      • Merch job > touches campaign/merch CFs.
    • This keeps marketing/localization edits safe.
  • Keep a lightweight "import job" log (in your service DB or as a simple log index) with:
    • batch id, time, number of SKUs, success/error count, and a replay mechanism.
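A minimal sketch of that batch loop in Java, with the actual AEM write hidden behind a placeholder: AemClient and upsertFragment are hypothetical names, to be wired to whichever channel you choose (Assets HTTP API, CF management APIs, GraphQL mutations):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Batch-loop sketch for the external importer service. */
public class DeltaImporter {

    /** Placeholder for whatever client talks to AEM. */
    public interface AemClient {
        /** Create-or-update one fragment; must be safe to call repeatedly. */
        void upsertFragment(String sku, String payloadJson) throws Exception;
    }

    private static final int MAX_RETRIES = 3;
    private final AemClient aem;
    // Hash of the last successfully imported payload per SKU; persist this in your service DB.
    private final Map<String, String> importedHashes = new ConcurrentHashMap<>();

    public DeltaImporter(AemClient aem) {
        this.aem = aem;
    }

    /** Processes one delta batch (e.g. 500-1,000 SKUs) with retry + exponential backoff.
     *  Re-running the same batch converges to the same CF state (idempotent). */
    public void importBatch(Map<String, String> skuToJson) throws InterruptedException {
        for (Map.Entry<String, String> entry : skuToJson.entrySet()) {
            String sku = entry.getKey();
            String hash = sha256(entry.getValue());
            if (hash.equals(importedHashes.get(sku))) {
                continue; // unchanged since the last run: no write at all
            }
            for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
                try {
                    aem.upsertFragment(sku, entry.getValue());
                    importedHashes.put(sku, hash);
                    break;
                } catch (Exception e) {
                    if (attempt == MAX_RETRIES) {
                        // surface to the import-job log (with a replay mechanism) instead of failing the batch
                        System.err.println("Giving up on SKU " + sku + ": " + e.getMessage());
                    } else {
                        Thread.sleep((1L << attempt) * 1000); // backoff: 2s, 4s, ...
                    }
                }
            }
        }
    }

    private static String sha256(String payload) {
        try {
            return HexFormat.of().formatHex( // HexFormat requires Java 17+
                    MessageDigest.getInstance("SHA-256").digest(payload.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the loop skips unchanged hashes and tolerates re-runs, replaying a failed batch or a single SKU is safe by construction.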

4. Store raw Salsify JSON for traceability (optional but useful)

For audit/debugging:

  • Store the raw Salsify payload either:
    • in a sourceJson long‑text field on the CF (for small-ish payloads), or
    • as a JSON file in DAM (e.g., /content/dam/pim-dumps/{sku}.json) with a reference field on the CF.

This lets you prove "what Salsify said" when authors or QA see mismatches.

5. Connect CFs to pages and headless delivery

On the delivery side:

  • Use GraphQL or CF Delivery APIs to fetch product CFs by SKU for:
    • PDPs (single SKU).
    • PLPs or carousels (multi‑SKU queries).
  • PDP page pattern:
    • AEM page or SPA template reads the SKU from URL.
    • Sling Model / React calls GraphQL to fetch CF data for that SKU.
    • Merge in live commerce data (price, stock) from PIM/commerce APIs as needed.

Keep very volatile values (price, inventory) out of CFs and resolve them live from the commerce/PIM APIs at render time.

6. Operational best practices

  • Error isolation: don't run as AEM workflows; keep failures in your external service and surface a concise error report to ops.
  • Re‑runs: design every import job so you can re‑run a specific batch or a single SKU safely.
  • Performance: avoid storing huge blobs or completely duplicating Salsify; only bring what authors/localization really need in AEM.
Amit Vishwakarma - Adobe Commerce Champion 2025 | 16x Adobe certified | 4x Adobe SME
Meghana_N (Author)
Level 2
April 1, 2026

Hi @AmitVishwakarma

Thanks for the detailed solutions. We are exploring the App Builder approach as well; your suggestions will help us in that area.

giuseppebaglio
Level 10
April 1, 2026

Hi @Meghana_N,

I would be cautious about using Experience Fragments as the destination for Salsify imports if Salsify is intended to remain the source of truth. An XF introduces presentation and layout into what is fundamentally product data, which can create a second system of record inside AEM.

If the requirement is structured product content, Content Fragments are a better fit because they are presentation-agnostic and can be delivered across channels via JSON/GraphQL or used in AEM Sites. If the requirement is to manage a composed, branded experience, then Experience Fragments are appropriate.

So my preference would be: keep Salsify as the master, sync into AEM only what AEM needs, and use CFs for data and XFs only for reusable experiences/layout.


partyush
Community Advisor
April 1, 2026

Hi @Meghana_N

Dealing with Salsify and AEMaaCS at scale is tricky, but here is what usually works best in production.

First off, definitely step away from traditional DAM workflows for bulk data ingestion. On Cloud Service, using workflows for this is an anti-pattern. You'll hit JVM memory spikes, Oak repository bloat, and execution timeouts pretty quickly if you're dealing with a large product catalog.

If possible, take the heavy lifting off the AEM JVM entirely. Use Adobe App Builder (I/O) as middleware. You can have Salsify push to an App Builder webhook, let App Builder parse the JSON and map it to your CF Models, and then push the fragments directly into AEM using the Content Fragment OpenAPI.

If you strictly need to keep it AEM-native: Drop the Salsify file into a specific DAM folder, use a Sling Event Listener to detect it, and immediately hand the processing off to a Sling Job (via JobManager API). This gives you guaranteed async processing and retries without tying up authoring resources.
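A minimal sketch of that AEM-native hand-off; the DAM folder path and job topic below are illustrative, not fixed names:

```java
package com.example.core.listeners;

import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.sling.api.resource.observation.ResourceChange;
import org.apache.sling.api.resource.observation.ResourceChangeListener;
import org.apache.sling.event.jobs.JobManager;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;

// Detect new Salsify drops in a dedicated DAM folder and hand off to a Sling Job.
@Component(service = ResourceChangeListener.class, property = {
        ResourceChangeListener.PATHS + "=/content/dam/salsify/landing",
        ResourceChangeListener.CHANGES + "=ADDED"
})
public class SalsifyDropListener implements ResourceChangeListener {

    public static final String JOB_TOPIC = "com/example/salsify/import";

    @Reference
    private JobManager jobManager;

    @Override
    public void onChange(List<ResourceChange> changes) {
        for (ResourceChange change : changes) {
            Map<String, Object> props = new HashMap<>();
            props.put("assetPath", change.getPath());
            // Queued asynchronously; the Sling Jobs contract guarantees processing and retries.
            jobManager.addJob(JOB_TOPIC, props);
        }
    }
}
```

A JobConsumer registered for the same topic then performs the actual parsing; returning JobResult.FAILED triggers Sling's automatic retries per the queue configuration.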

For handling the scale and logic you mentioned:

  • Memory Management: Never load the full Salsify JSON or CSV into memory. Use the Jackson Streaming API (JsonParser) or a streaming CSV reader to process the file node-by-node (see the sketch after this list).

  • Batching: Save your JCR session in strict batches (e.g., every 500 to 1000 nodes). Trying to commit 10k+ nodes in a single save will crash the JVM.

  • Idempotency & Partial Updates: Don't do blind overwrites. Generate a SHA-256 hash of the Salsify product payload and save it as a hidden property (like salsifyHash) on the Content Fragment. On your next import, hash the incoming payload and compare it. If the hashes match, skip the JCR write entirely. This drastically reduces repository overhead.

  • Traceability: Don't dump raw JSON strings into a CF property—it's bad for the JCR. Instead, archive the ingested .txt or .json file in a dedicated DAM folder (e.g., /content/dam/salsify-archive/) and just add a sourceAssetPath property to your CF Model that links back to that specific file.
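Putting the first three bullets together, here is a rough Java sketch. The DAM paths, the salsify:id attribute name, and the abstract upsertFragment helper are assumptions, not a drop-in implementation:

```java
package com.example.core.jobs;

import java.io.InputStream;

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

import org.apache.commons.codec.digest.DigestUtils;
import org.apache.sling.api.resource.ModifiableValueMap;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;

// Streaming import sketch: Jackson streaming, batched session saves, salsifyHash skip.
public abstract class SalsifyStreamImporter {

    private static final int BATCH_SIZE = 500;               // commit every N fragments
    private static final String DAM_ROOT = "/content/dam/products/"; // illustrative target

    /** Hypothetical helper: create-or-update the CF for this record and return its
     *  jcr:content resource (e.g. via FragmentTemplate, shown further down this thread). */
    protected abstract Resource upsertFragment(ResourceResolver resolver, String path, JsonNode product);

    public void importFile(ResourceResolver resolver, InputStream salsifyJson) throws Exception {
        JsonFactory factory = new JsonFactory();
        ObjectMapper mapper = new ObjectMapper(factory);
        int pending = 0;
        try (JsonParser parser = factory.createParser(salsifyJson)) {
            if (parser.nextToken() != JsonToken.START_ARRAY) {
                throw new IllegalArgumentException("Expected a top-level JSON array of products");
            }
            // Only one product object is materialized in memory at a time.
            while (parser.nextToken() == JsonToken.START_OBJECT) {
                JsonNode product = mapper.readTree(parser);
                String sku = product.path("salsify:id").asText(); // assumed ID attribute
                String hash = DigestUtils.sha256Hex(product.toString());

                Resource existing = resolver.getResource(DAM_ROOT + sku + "/jcr:content");
                if (existing != null
                        && hash.equals(existing.getValueMap().get("salsifyHash", String.class))) {
                    continue; // payload unchanged since the last import: skip the JCR write
                }
                Resource content = upsertFragment(resolver, DAM_ROOT + sku, product);
                content.adaptTo(ModifiableValueMap.class).put("salsifyHash", hash);

                if (++pending >= BATCH_SIZE) {
                    resolver.commit(); // batch saves; one giant commit would blow the heap
                    pending = 0;
                }
            }
        }
        if (pending > 0) {
            resolver.commit();
        }
    }
}
```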

Looking forward to hearing back from you on this approach. Let me know if this helps you in any way.


Thanks,

Partyush :)

Meghana_N (Author)
Level 2
April 2, 2026

Hi @partyush

Thanks for the solution. We are exploring the App Builder approach as well; your suggestions will help us in that area.

partyush
Community Advisor
April 2, 2026

Noted @Meghana_N

sarav_prakash
Community Advisor
April 1, 2026

Hi @Meghana_N, I worked on exactly this Salsify-AEM integration, bringing in assets + text content. I explained my process in my articles. And use JSON, because with CSV we ran into UTF-8 encoding issues with strings containing commas.

Attempt 1: AEM workflow

  1. The Salsify JSON is stored in /landing folders in the DAM using the Assets Bulk Importer. Salsify ships to SFTP; we had a cloud job that moves files from the Salsify SFTP to Azure Blob, and the Asset Bulk Importer periodically pulls from Azure Blob into the DAM.
  2. A scheduled workflow checks if files exist in the /landing folder, processes them, and moves them to the /complete folder. A workflow because it is a series of steps: check availability of landing files, process each record, reprocess skipped/failed records, and finally move to the /complete folder.
  3. For each record in the JSON, we use FragmentTemplate to create fragments and ContentElement to populate values (a minimal sketch follows this list).
  4. For assets (this was 2024), we wrote the 3-way AssetCompute microservice in Java and imported assets into the right DAM folder.
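A minimal sketch of step 3, assuming illustrative model and folder paths (null checks and error handling omitted for brevity):

```java
package com.example.core.services;

import java.util.Map;

import com.adobe.cq.dam.cfm.ContentElement;
import com.adobe.cq.dam.cfm.ContentFragment;
import com.adobe.cq.dam.cfm.ContentFragmentException;
import com.adobe.cq.dam.cfm.FragmentTemplate;

import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;

// Create-or-update one fragment per Salsify record; paths are illustrative.
public class FragmentWriter {

    private static final String MODEL_PATH = "/conf/myproject/settings/dam/cfm/models/product";
    private static final String PARENT_PATH = "/content/dam/products";

    public ContentFragment upsert(ResourceResolver resolver, String sku, Map<String, String> values)
            throws ContentFragmentException {
        // Reuse the existing fragment when the SKU was imported before (idempotent re-import).
        Resource existing = resolver.getResource(PARENT_PATH + "/" + sku);
        ContentFragment fragment;
        if (existing != null) {
            fragment = existing.adaptTo(ContentFragment.class);
        } else {
            FragmentTemplate template = resolver.getResource(MODEL_PATH).adaptTo(FragmentTemplate.class);
            fragment = template.createFragment(resolver.getResource(PARENT_PATH), sku, sku);
        }
        // Populate each mapped attribute via ContentElement.
        for (Map.Entry<String, String> entry : values.entrySet()) {
            ContentElement element = fragment.getElement(entry.getKey());
            if (element != null) {
                element.setContent(entry.getValue(), "text/plain");
            }
        }
        return fragment;
    }
}
```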

Cons:

  1. AssetCompute is a very expensive operation to run in the JVM. Whenever the payload exceeded ~200 MB, the server crashed in production.
  2. There is no good Java library to find Content Fragments. We did a search to find the right content fragment, but it is still a slow query in production. We have 1.8M CFs; even with all Lucene indexing done correctly, queries are slow, and so are GraphQL queries.


Attempt 2: AIO Journaling

  1. To solve problems #1 and #2 above, we started running expensive operations outside AEM. My article explains eventing from Workfront, but we kept the same approach for Salsify events as well.
  2. An AIO Runtime action reads the JSON from Azure Blob, converts each record into an event, and posts it to the Journaling queue.
  3. The next action subscribes to the queue, reads each event, and performs the same operations: creating folders/CFs, importing assets, and updating CFs.
  4. We used the OpenAPI to update. This is more secure and guaranteed.
  5. And the ImportFromUrl API to upload assets. This performs the same 3-way AssetCompute, except it runs on Adobe's cloud, so it does not overload AEM.
  6. The main benefit was the Journaling (Kafka) queue. Whenever events get stuck in the queue or a record fails, we simply push the event back into the queue, so an automatic retry happens after the current queue is flushed. We could manage incidents in prod by simply blocking the queue and unblocking it when we had an emergency on the AEM server.

Conclusion: We specifically had volume issues. On certain days a job runs to bulk-update 50K records or upload 20K assets; normal-day volume is 2,000 records. So running an all-JVM Java job was not scalable for us, and we switched to cloud-native outside AEM. If you too smell volume/scalability issues, better to go with AIO Journaling.

Overall, CF with GraphQL is pretty successful for us, for 1.8M products across multiple catalogs.

PGURUKRISHNA
Level 5
April 7, 2026

Hey @Meghana_N, for your Salsify → Content Fragment migration on AEMaaCS:

Drop the DAM workflow approach. Use a custom Sling Job + servlet instead. Expose a POST endpoint that your middleware (or a Sling scheduler) calls with the Salsify payload. This gives you control over batching, retries, and error handling — things DAM workflows aren't designed for.

For CSV with embedded JSON: Parse CSV with Apache Commons CSV, then deserialize the JSON columns with Jackson. Map Salsify attributes to CF fields via an externalized OSGi config so you're not hardcoding mappings.
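One way to externalize that mapping is an OSGi metatype along these lines; all names and defaults here are illustrative:

```java
package com.example.core.config;

import org.osgi.service.metatype.annotations.AttributeDefinition;
import org.osgi.service.metatype.annotations.ObjectClassDefinition;

// Salsify-attribute -> CF-element mapping kept in OSGi config (per environment),
// so adding or renaming a field is a config change, not a code change.
@ObjectClassDefinition(name = "Salsify Import - Attribute Mapping")
public @interface SalsifyMappingConfig {

    @AttributeDefinition(
            name = "Attribute mappings",
            description = "One entry per field, in the form salsifyAttribute=cfElementName")
    String[] mappings() default {
            "salsify:id=sku",
            "Product Name=title",
            "Long Description=description"
    };

    @AttributeDefinition(name = "Content Fragment model path")
    String modelPath() default "/conf/myproject/settings/dam/cfm/models/product";
}
```

A component annotated with @Designate(ocd = SalsifyMappingConfig.class) can then split each mappings entry on "=" into a lookup map at activation.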

For idempotent re-imports: Use the Salsify product ID as the Content Fragment node name. Do a resolver.getResource(path); if it exists, update it; if not, create it. Store an MD5 hash of the incoming payload on the CF and skip writes if unchanged. This alone will save you massive write overhead at scale.

For large payloads: Chunk into batches of 100-500 products, each processed as a separate Sling Job on a dedicated ordered queue (2-4 threads). Use Jackson's streaming parser for large JSON files. Never commit after every single fragment; batch 50-100 per resolver.commit().

For traceability: Store the raw Salsify JSON as a multi-line text field on the Content Fragment itself. Move processed source files to a /content/dam/salsify-imports/processed/ folder, failed ones to /failed/.

For delta syncs: Store a lastSyncTimestamp and use Salsify's change log API to fetch only updated products instead of full re-imports every time.

The core pattern is: Servlet receives data → Sling Job queue processes batches → CF API creates/updates fragments → hash comparison prevents unnecessary writes. This scales to 10K+ SKUs reliably on AEMaaCS.


Pagidala GuruKrishna