Best Approach for Sitemap Generation in AEM Headless CMS Using Node.js/React

Level 2

Hi Adobe Community,

We are currently transitioning from traditional AEM Sites to a headless architecture where AEM serves as the CMS, and the frontend is built using React with a Node.js backend.

In this setup, we're using Content Fragments delivered via AEM’s GraphQL APIs and no longer rely on traditional jcr:content-based pages.

I have the following questions regarding sitemap generation in this headless architecture:

  1. Since there are no traditional pages, is it still possible to generate a sitemap using content fragment paths?

  2. What is the recommended best practice for sitemap generation in a headless AEM setup?

    • Should the sitemap be generated within AEM (using a custom service)?

    • Or is it better to handle sitemap generation externally via Node.js by consuming AEM’s GraphQL APIs?

  3. How should we handle alternate language URLs (hreflang tags) in this architecture?

  4. Are there any Adobe-recommended tools, frameworks, or approaches for generating XML sitemaps in a headless setup?

5 Replies

Community Advisor

Hi @Abdul_Wajeed,

If the page information is available in your AEM GraphQL data, you can write a custom AEM service that queries the required Content Fragments or structured data and builds the sitemap dynamically.

On the Node.js side, a library such as sitemap can generate XML files from data fetched via REST APIs or GraphQL.

 

Hope this helps

 

Thanks

Level 2

Hi @PRATHYUSHA_VP,

 

Thank you for the detailed suggestion.

You're absolutely right—if the page data is available via AEM GraphQL, generating the sitemap dynamically makes sense. However, in our current setup, Node.js is used only as a lightweight wrapper layer and not as a full backend service.

Given this constraint, generating the sitemap purely within the Node.js layer isn’t feasible, because that layer doesn't handle business logic or data processing.

 

Appreciate your insights—please let me know if there are any best practices around implementing this directly within AEM.

Level 4

You can generate the sitemap inside AEM by creating a custom OSGi service or Sling servlet that queries the relevant Content Fragments via JCR or GraphQL and outputs URLs mapped to the frontend. This approach centralises sitemap generation within AEM, the content source of truth. It can use AEM’s caching and Dispatcher infrastructure and include metadata such as last-modified dates, but it requires custom development and careful synchronisation with the frontend URL structure.

Alternatively, you can generate the sitemap externally in the Node.js backend by querying AEM’s GraphQL APIs for published fragments, constructing URLs based on the React frontend routing, and producing the sitemap XML either dynamically or at build time. This method keeps sitemap URLs closely aligned with frontend logic, integrates smoothly into deployment pipelines, and supports static generation, but it introduces API querying overhead and requires managing rate limits and performance.
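Whichever side generates the sitemap, both approaches share one step: mapping a Content Fragment path to the URL the React frontend actually serves. Here is a minimal Java sketch of that mapping. The DAM root `/content/dam/my-site` and the origin `https://www.example.com` are hypothetical placeholders, not values from this thread, and the sketch assumes the frontend route simply mirrors the fragment path below the root. Real routing rules may well differ.

```java
import java.util.Optional;

class FragmentUrlMapper {

    // Hypothetical values: replace with your own DAM root and frontend origin.
    private static final String FRAGMENT_ROOT = "/content/dam/my-site";
    private static final String FRONTEND_ORIGIN = "https://www.example.com";

    /**
     * Maps a Content Fragment path to a frontend URL.
     * Returns an empty Optional when the path is null or outside the root,
     * so such fragments can be skipped instead of producing broken URLs.
     */
    static Optional<String> toFrontendUrl(String fragmentPath) {
        if (fragmentPath == null || !fragmentPath.startsWith(FRAGMENT_ROOT + "/")) {
            return Optional.empty();
        }
        // Assumption: the route is the path below the root, unchanged.
        String relativePath = fragmentPath.substring(FRAGMENT_ROOT.length());
        return Optional.of(FRONTEND_ORIGIN + relativePath);
    }

    public static void main(String[] args) {
        // Prints https://www.example.com/en/articles/launch-notes
        System.out.println(
            toFrontendUrl("/content/dam/my-site/en/articles/launch-notes")
                .orElse("(not mapped)"));
    }
}
```

Keeping this mapping in one small function makes the "synchronisation with the frontend URL structure" concern easier to manage, because there is a single place to update when routes change.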

Level 2

Hi @ButhpurKiran,

 

Thank you for the comprehensive explanation and the thoughtful comparison of the two approaches.

In our case, Node.js is not functioning as a full backend service—it acts solely as a lightweight wrapper layer to render and serve the React application. It doesn't handle business logic, API orchestration, or data processing. Because of this, generating the sitemap in the Node.js layer (either dynamically or during build time) isn't a viable option in our architecture.

Given that AEM is our system of record and already manages content via GraphQL and JCR, we're leaning toward implementing sitemap generation directly within AEM. We are considering a custom OSGi service or Sling servlet that:

  • Queries the relevant Content Fragments via GraphQL,

  • Constructs sitemap URLs based on the actual frontend routing structure,

  • Embeds metadata like lastmod and changefreq, and

  • Leverages AEM's built-in caching and dispatcher layers for performance and scalability.

This approach keeps sitemap generation close to the content source, ensures consistency, and avoids unnecessary overhead on the wrapper layer.
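As a rough illustration of the XML-building part of such a servlet, here is a self-contained Java sketch that emits `<loc>`, `<lastmod>`, and `xhtml:link` hreflang alternates in the standard sitemap format. The class and field names (`SitemapXmlBuilder`, `SitemapEntry`) and the example URLs are hypothetical. The Content Fragment query and the Sling servlet wiring are deliberately left out, so treat this as a sketch under those assumptions, not a finished implementation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SitemapXmlBuilder {

    /** One sitemap entry: frontend URL, last-modified date, language alternates. */
    static final class SitemapEntry {
        final String loc;
        final String lastmod;                 // W3C date such as 2024-05-01, or null
        final Map<String, String> alternates; // hreflang code -> alternate URL

        SitemapEntry(String loc, String lastmod, Map<String, String> alternates) {
            this.loc = loc;
            this.lastmod = lastmod;
            this.alternates = alternates;
        }
    }

    /** Escapes the five characters that are special in XML. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    /** Builds the complete sitemap document as a string. */
    static String build(List<SitemapEntry> entries) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\"\n");
        xml.append("        xmlns:xhtml=\"http://www.w3.org/1999/xhtml\">\n");
        for (SitemapEntry e : entries) {
            xml.append("  <url>\n");
            xml.append("    <loc>").append(escape(e.loc)).append("</loc>\n");
            if (e.lastmod != null) {
                xml.append("    <lastmod>").append(escape(e.lastmod)).append("</lastmod>\n");
            }
            // One xhtml:link per language version of this URL.
            for (Map.Entry<String, String> alt : e.alternates.entrySet()) {
                xml.append("    <xhtml:link rel=\"alternate\" hreflang=\"")
                   .append(escape(alt.getKey())).append("\" href=\"")
                   .append(escape(alt.getValue())).append("\"/>\n");
            }
            xml.append("  </url>\n");
        }
        xml.append("</urlset>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        Map<String, String> alternates = new LinkedHashMap<>();
        alternates.put("en", "https://www.example.com/en/articles/launch-notes");
        alternates.put("de", "https://www.example.com/de/articles/launch-notes");
        System.out.print(build(List.of(new SitemapEntry(
            "https://www.example.com/en/articles/launch-notes", "2024-05-01", alternates))));
    }
}
```

In a servlet, the returned string would be written to the response with an XML content type so the Dispatcher can cache it. For hreflang, each language version normally lists all of its alternates, including itself.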

Appreciate your insights—this helps in validating our direction and ensuring alignment with AEM best practices.

Administrator

@Abdul_Wajeed Just checking in — were you able to resolve your issue?
We’d love to hear how things worked out. If the suggestions above helped, marking a response as correct can guide others with similar questions. And if you found another solution, feel free to share it — your insights could really benefit the community. Thanks again for being part of the conversation!



Kautuk Sahni