AEM Sitemap Creation (Complete Guide)
In this blog, we will look at what sitemaps are, why they matter for SEO, and how to implement them in Adobe Experience Manager (AEM) for both AEM On-Premise and AEM as a Cloud Service.
A sitemap tells search engines which URLs on a site are eligible for crawling. Once these URLs are crawled and indexed, the pages can appear in search results.
Typically, sitemaps are accessed using the following URL format:
<site_domain>/sitemap.xml
Example:
https://www.google.com/sitemap.xml
Open such a URL in a browser to see the URLs (or, for large sites, the child sitemaps) that the site exposes for crawling.
What is a Sitemap?
A Sitemap is an XML file that lists all the important pages of a website. It helps search engines understand:
- Site structure
- Page relationships
- Last modified dates
- Alternate language versions
Search engines such as Google and Bing use sitemaps to efficiently crawl websites.
Sample Sitemap XML Structure
Below is an example sitemap containing the page URL and last modified date.
<urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml"
xmlns:video="http://www.google.com/schemas/sitemap-video/1.1"
xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
<url>
<loc>/content/practice/us/en/things-to-do.html</loc>
<lastmod>2023-09-11T13:01:03.478Z</lastmod>
</url>
<url>
<loc>/content/practice/us/en/things-to-do/all-experiences.html</loc>
<lastmod>2023-09-11T13:01:04.38Z</lastmod>
</url>
</urlset>
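For reference, the same urlset structure can be produced with the JDK's built-in StAX API. This is a plain-Java sketch of the output format only, not the AEM/Sling sitemap API; the class name SitemapXmlWriter and the host https://www.example.com are assumptions. Unlike the sample above, which shows repository paths, it only accepts absolute URLs in loc, because the sitemaps protocol expects fully qualified URLs.

```java
import java.io.StringWriter;
import java.time.Instant;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper for illustration; not part of AEM or Apache Sling.
final class SitemapXmlWriter {
    static final String NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Writes a <urlset> with one <url> (loc + lastmod) per entry, in the map's iteration order.
    static String write(Map<String, Instant> lastModifiedByUrl) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement("urlset");
            xml.writeDefaultNamespace(NS);
            for (Map.Entry<String, Instant> entry : lastModifiedByUrl.entrySet()) {
                String loc = entry.getKey();
                // The protocol expects fully qualified URLs, so reject repository paths.
                if (!loc.startsWith("https://") && !loc.startsWith("http://")) {
                    throw new IllegalArgumentException("loc must be an absolute URL: " + loc);
                }
                xml.writeStartElement("url");
                xml.writeStartElement("loc");
                xml.writeCharacters(loc); // escapes XML special characters such as & and <
                xml.writeEndElement();
                xml.writeStartElement("lastmod");
                xml.writeCharacters(entry.getValue().toString()); // ISO-8601 UTC, e.g. 2023-09-11T13:01:03.478Z
                xml.writeEndElement();
                xml.writeEndElement(); // </url>
            }
            xml.writeEndElement(); // </urlset>
            xml.flush();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write sitemap", e);
        }
        return out.toString();
    }
}
```

Passing a LinkedHashMap keeps the URLs in the order they were added, which makes the output easy to compare with a generated sitemap.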
Key Benefits of Sitemap
- Helps search engines crawl pages faster
- Improves SEO discoverability
- Signals when content was last updated
- Helps large websites manage crawling
- Supports multilingual alternate links
Types of Sitemap in AEM
In AEM, we generally implement two types:
| Sitemap Type | Description |
|---|---|
| Author Sitemap | Generated on-demand mainly for testing |
| Publish Sitemap | Generated using a scheduler and served to search engines |
Author Sitemap Implementation
To generate a sitemap on the Author instance, configure the following OSGi configuration.
Create the config file under:
config.author
OSGi Configuration
org.apache.sling.sitemap.impl.SitemapGeneratorManagerImpl~practice.cfg.json
{
"allOnDemand": true
}
Important Note
The allOnDemand configuration has a drawback.
Each time the sitemap URL is accessed, AEM generates the sitemap dynamically. This can impact performance if used in production environments.
Therefore, this configuration should only be used in Author environments.
Common Configuration for Author and Publish
The following configuration enables:
- Last modified date
- XML sitemap generation
Configuration file:
com.adobe.aem.wcm.seo.impl.sitemap.PageTreeSitemapGeneratorImpl.cfg.json
{
"enableLastModified": true,
"lastModifiedSource": "cq:lastModified",
"enableLanguageAlternates": false
}
Optional Recommended Properties
You may also include:
includeInheritValue = true
excludePagesWithNoIndex = true
This helps exclude pages with noindex robot tags automatically.
Page Properties Configuration
To enable sitemap generation, enable the Generate Sitemap option in the root page properties.
Steps:
- Open the root page
- Go to Page Properties
- Navigate to the Advanced tab
- Enable Generate Sitemap
This allows AEM to treat the page as a sitemap root.
Excluding Pages from Sitemap
If certain pages should not be indexed by search engines, configure Robot Tags.
Example robot tags:
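- noindex
- nofollow
- noarchive
noindex keeps a page out of search results, nofollow tells crawlers not to follow the links on the page, and noarchive stops search engines from showing a cached copy of it.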
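The rule behind leaving noindex pages out of a sitemap can be sketched in plain Java. PageInfo and NoIndexFilter are hypothetical stand-ins, not the AEM Page API, and AEM's actual behaviour depends on the generator configuration described earlier. Only noindex removes a page; pages tagged nofollow or noarchive can still be indexed, so they stay in the list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration; not AEM's sitemap generator.
final class NoIndexFilter {
    // Hypothetical page model: a public URL plus the robots tags set in page properties.
    static final class PageInfo {
        final String url;
        final List<String> robotsTags;

        PageInfo(String url, List<String> robotsTags) {
            this.url = url;
            this.robotsTags = robotsTags;
        }
    }

    // Keeps only pages that search engines may index, i.e. pages without a "noindex" tag.
    static List<String> sitemapUrls(List<PageInfo> pages) {
        List<String> urls = new ArrayList<>();
        for (PageInfo page : pages) {
            boolean noIndex = page.robotsTags.stream()
                    .anyMatch(tag -> tag.trim().toLowerCase(Locale.ROOT).equals("noindex"));
            if (!noIndex) {
                urls.add(page.url);
            }
        }
        return urls;
    }
}
```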
Generate Sitemap on Author
Once configuration is completed, the sitemap can be generated using the following URL:
http://localhost:4502/content/practice/en.sitemap.xml
Publish Sitemap Implementation
Unlike Author, the Publish environment should not use allOnDemand, because regenerating the sitemap on every request is expensive.
Instead, use a scheduled job to generate sitemaps periodically.
Sitemap Scheduler Configuration
Create the following configuration in:
config.publish
File:
org.apache.sling.sitemap.impl.SitemapScheduler~practice.cfg.json
{
"scheduler.name": "Practice Daily Sitemap Scheduler",
"scheduler.expression": "0 0 2 1/1 * ? *",
"searchPath": "/content/practice/us"
}
Scheduler Explanation
This cron expression means:
- Runs daily
- Executes at 2:00 AM (server time)
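The expression uses Quartz syntax with seven fields: seconds (0), minutes (0), hours (2), day-of-month (1/1, every day starting on the 1st), month (*), day-of-week (?, no specific value, because Quartz requires ? in one of the two day fields) and year (*). The sketch below labels those fields and computes the next 02:00 run for this one schedule. DailyCronSketch is a hypothetical helper, not a general cron evaluator and not the Sling scheduler API.

```java
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for explaining "0 0 2 1/1 * ? *"; not a cron engine.
final class DailyCronSketch {
    // Quartz cron expressions have 6 or 7 space-separated fields; the 7th (year) is optional.
    private static final String[] FIELDS = {
        "seconds", "minutes", "hours", "day-of-month", "month", "day-of-week", "year"
    };

    // Labels each field of a Quartz expression, in order.
    static Map<String, String> describe(String expression) {
        String[] parts = expression.trim().split("\\s+");
        if (parts.length < 6 || parts.length > 7) {
            throw new IllegalArgumentException("Expected 6 or 7 fields: " + expression);
        }
        Map<String, String> labelled = new LinkedHashMap<>();
        for (int i = 0; i < parts.length; i++) {
            labelled.put(FIELDS[i], parts[i]);
        }
        return labelled;
    }

    // Next 02:00:00 strictly after 'now', in now's time zone. Hard-coded for this schedule only.
    static ZonedDateTime nextDailyRunAt2(ZonedDateTime now) {
        ZonedDateTime candidate = now.truncatedTo(ChronoUnit.DAYS).withHour(2);
        return candidate.isAfter(now) ? candidate : candidate.plusDays(1);
    }
}
```

For example, at 01:30 the next run is 02:00 the same day; at exactly 02:00 the next run is 02:00 the following day.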
Sitemap Storage Location
When the scheduler runs, AEM generates sitemap files inside:
/var/sitemaps
Example structure:
/var/sitemaps/content/practice/us/sitemap.xml
Sitemap Servlet Configuration
Another required configuration is the Sitemap Servlet.
org.apache.sling.sitemap.impl.SitemapServlet~practice.cfg.json
{
"sling.servlet.extensions": "xml",
"sling.servlet.resourceTypes": [
"practice/components/structure/homepage",
"practice/components/structure/profile",
"practice/components/ea/structure/search"
],
"sling.servlet.selectors": [
"sitemap",
"sitemap-index"
]
}
This configuration allows the sitemap servlet to respond to requests like:
/content/practice/en.sitemap.xml
Generate Sitemap on Publish
Once the configurations are in place:
- Enable sitemap on the root page
- Publish the page
- Access the sitemap via http://localhost:4503/content/practice/en.sitemap.xml
Dispatcher Configuration
To allow sitemap requests through the Dispatcher, update the following configurations.
1. Dispatcher Filter
File:
dispatcher/src/conf.dispatcher.d/filters/filters.any
Add the following rule:
/0200 {
/type "allow"
/path "/content/*"
/selectors "(sitemap-index|sitemap)"
/extension "xml"
}
2. Rewrite Rule
File:
dispatcher/src/conf.d/rewrites/rewrite.rules
Update rule:
RewriteCond %{REQUEST_URI} (.html|.jpe?g|.png|.svg|.xml)$
This makes the rewrite rule that follows this condition apply to .xml requests as well, so sitemap URLs are handled the same way as HTML pages and images.
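To see what the updated RewriteCond actually tests, the pattern can be exercised with java.util.regex, which treats this expression the same way as the PCRE engine used by Apache mod_rewrite. The sketch (class name hypothetical) also shows that the unescaped dots match any character, so a URI ending in "sitemapxml" matches too; escaping them is an optional tightening, not part of the rule above.

```java
import java.util.regex.Pattern;

// Hypothetical test harness for the RewriteCond pattern; not a Dispatcher API.
final class RewriteCondCheck {
    // The condition exactly as written in rewrite.rules (dots are NOT escaped).
    static final Pattern AS_WRITTEN = Pattern.compile("(.html|.jpe?g|.png|.svg|.xml)$");
    // Optional stricter variant: a literal dot followed by one of the extensions.
    static final Pattern ESCAPED_DOTS = Pattern.compile("\\.(html|jpe?g|png|svg|xml)$");

    // mod_rewrite searches the URI (no implicit ^ anchor), which find() mirrors.
    static boolean matches(Pattern condition, String requestUri) {
        return condition.matcher(requestUri).find();
    }
}
```

Both patterns accept /us/en.sitemap.xml and reject /us/en.json; only the escaped variant rejects /us/en/sitemapxml.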
Additional Best Practices (Important)
1. Use Sitemap Index for Large Sites
If your website has more than 50,000 URLs, create multiple sitemaps and reference them in a sitemap index.
Example:
sitemap-index.xml
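Splitting can be sketched in plain Java: partition the URL list into chunks of at most 50,000 entries and give each chunk its own file name, which the index then references. SitemapSplitter and the sitemap-N.xml naming are assumptions for illustration; the sketch counts URLs only and does not enforce the 50 MB size limit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; AEM/Sling handle sitemap files themselves.
final class SitemapSplitter {
    // Sitemaps protocol limit on entries per sitemap file.
    static final int MAX_URLS_PER_SITEMAP = 50_000;

    // Splits URLs into consecutive chunks of at most maxPerFile entries, preserving order.
    static List<List<String>> split(List<String> urls, int maxPerFile) {
        if (maxPerFile <= 0) {
            throw new IllegalArgumentException("maxPerFile must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int start = 0; start < urls.size(); start += maxPerFile) {
            int end = Math.min(start + maxPerFile, urls.size());
            chunks.add(new ArrayList<>(urls.subList(start, end)));
        }
        return chunks;
    }

    // Hypothetical naming scheme: sitemap-1.xml, sitemap-2.xml, ...
    static List<String> fileNames(int chunkCount) {
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= chunkCount; i++) {
            names.add("sitemap-" + i + ".xml");
        }
        return names;
    }
}
```

For 120,001 URLs this yields three files holding 50,000, 50,000 and 20,001 entries.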
2. Add Sitemap in robots.txt
Example:
Sitemap: https://example.com/sitemap.xml
This helps search engines discover the sitemap faster.
3. Submit Sitemap to Search Engines
Submit your sitemap to:
- Google Search Console
- Bing Webmaster Tools
4. Cache Sitemap in Dispatcher
To improve performance, allow caching of sitemap files in the dispatcher cache.
5. Use Canonical URLs
Ensure that the URLs generated in the sitemap match your canonical URLs to avoid duplicate indexing.
Final Sitemap URL
Your final sitemap URL should look like:
https://www.example.com/sitemap.xml
or
https://www.example.com/sitemap-index.xml
Conclusion
Sitemaps play a crucial role in improving website SEO and ensuring search engines can efficiently crawl your content.
In this guide, we covered:
- Sitemap fundamentals
- Author and Publish implementation
- Required OSGi configurations
- Dispatcher updates
- Scheduler setup
- SEO best practices
Implementing sitemaps correctly in AEM ensures better indexing and improved search visibility.
If you found this article helpful, feel free to share it and help others learn about AEM Sitemap implementation.
Advanced Sitemap Topics in AEM
1. Dynamic Sitemap Generation using Sling Models
In some cases, websites contain dynamic pages generated from external systems, APIs, or Content Fragments. These pages may not exist as traditional AEM pages under /content. In such scenarios, you may need to generate sitemap entries dynamically.
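A common approach is to register a custom OSGi service that implements the Apache Sling SitemapGenerator interface, alongside the default page-tree generator. Sling Models or other services can supply the data behind the dynamic URLs.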
Steps
- Create a custom sitemap generator service.
- Implement the SitemapGenerator interface.
- Register the service using OSGi.
- Add dynamic URLs programmatically.
Example Custom Sitemap Generator
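import java.time.Instant;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.sitemap.SitemapException;
import org.apache.sling.sitemap.builder.Sitemap;
import org.apache.sling.sitemap.builder.Url;
import org.apache.sling.sitemap.spi.generator.SitemapGenerator;
import org.jetbrains.annotations.NotNull;
import org.osgi.service.component.annotations.Component;

// Registered as an OSGi service so the Sling sitemap framework picks it up.
@Component(service = SitemapGenerator.class)
public class CustomSitemapGenerator implements SitemapGenerator {
    // Signature of the Apache Sling Sitemap SPI: root resource, sitemap name, builder, context.
    @Override
    public void generate(@NotNull Resource sitemapRoot, @NotNull String name,
            @NotNull Sitemap sitemap, @NotNull GenerationContext context) throws SitemapException {
        // Placeholder URL; real entries would come from an external system, API, or Content Fragments.
        String dynamicUrl = "https://example.com/products/sample-product";
        Url url = sitemap.addUrl(dynamicUrl);
        url.setLastModified(Instant.now());
    }
}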
This approach is useful when generating sitemap entries for:
- Product pages
- Search results pages
- Content Fragment pages
- Headless API driven pages
2. Multisite (MSM) Sitemap Strategy
Many enterprise AEM implementations use multi-site or multi-language architectures. Each site should typically have its own sitemap.
For example:
/content/practice/us
/content/practice/uk
/content/practice/in
Each site should generate a separate sitemap:
/content/practice/us.sitemap.xml
/content/practice/uk.sitemap.xml
/content/practice/in.sitemap.xml
You can also generate a global sitemap index referencing all regional sitemaps.
Example Sitemap Index
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>https://example.com/us.sitemap.xml</loc>
</sitemap>
<sitemap>
<loc>https://example.com/uk.sitemap.xml</loc>
</sitemap>
</sitemapindex>
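A global index like the one above can be generated with the JDK StAX API. In this sketch, SitemapIndexWriter is a hypothetical class name and the example.com URLs are placeholders; in AEM, the sitemap servlet configured earlier also answers the sitemap-index selector, so this only illustrates the output format.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper for illustration; not part of AEM or Apache Sling.
final class SitemapIndexWriter {
    static final String NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Emits <sitemapindex> with one <sitemap><loc>...</loc></sitemap> per child sitemap URL.
    static String write(List<String> sitemapUrls) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement("sitemapindex");
            xml.writeDefaultNamespace(NS);
            for (String url : sitemapUrls) {
                xml.writeStartElement("sitemap");
                xml.writeStartElement("loc");
                xml.writeCharacters(url);
                xml.writeEndElement(); // </loc>
                xml.writeEndElement(); // </sitemap>
            }
            xml.writeEndElement(); // </sitemapindex>
            xml.flush();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write sitemap index", e);
        }
        return out.toString();
    }
}
```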
This approach helps search engines efficiently crawl multi-region sites.
3. Language Alternate Support (hreflang)
For multilingual websites, sitemaps can include language alternate references using hreflang.
These references help search engines understand the relationship between translated pages.
Example:
<url>
<loc>https://example.com/us/en/home.html</loc>
<xhtml:link
rel="alternate"
hreflang="en-us"
href="https://example.com/us/en/home.html"/>
<xhtml:link
rel="alternate"
hreflang="fr-fr"
href="https://example.com/fr/fr/home.html"/>
</url>
This improves international SEO and ensures the correct language page appears in search results.
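The key rule with hreflang is that every page in a translation group lists all alternates, including itself, so the annotations are reciprocal. The plain-Java sketch below builds one url fragment per page from a hreflang-to-URL map; HreflangCluster is a hypothetical name, and the enclosing urlset must declare xmlns:xhtml, as in the sample at the top of this article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; AEM's generator emits alternates itself when enabled.
final class HreflangCluster {
    // One <url> per page; each lists every member of the cluster (itself included).
    static List<String> urlEntries(Map<String, String> urlByHreflang) {
        List<String> entries = new ArrayList<>();
        for (String loc : urlByHreflang.values()) {
            StringBuilder sb = new StringBuilder("<url>\n  <loc>").append(escape(loc)).append("</loc>\n");
            for (Map.Entry<String, String> alt : urlByHreflang.entrySet()) {
                sb.append("  <xhtml:link rel=\"alternate\" hreflang=\"").append(escape(alt.getKey()))
                  .append("\" href=\"").append(escape(alt.getValue())).append("\"/>\n");
            }
            entries.add(sb.append("</url>").toString());
        }
        return entries;
    }

    // Minimal XML escaping for element text and double-quoted attribute values.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

With the en-us and fr-fr pages from the example, both entries carry the same two xhtml:link lines.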
4. Sitemap Implementation in AEM as a Cloud Service
When working with Adobe Experience Manager as a Cloud Service, sitemap generation follows the same architecture but with a few considerations:
Key Differences
| Area | Cloud Service |
|---|---|
| Configuration | Managed via Git repository |
| Deployment | Through CI/CD pipeline |
| Scheduler | Runs in publish tier |
| Storage | Generated under /var/sitemaps |
Best Practices
- Always use scheduled sitemap generation
- Avoid on-demand generation
- Keep sitemap generation limited to root site paths
- Cache sitemap files via dispatcher
5. Handling Large Websites (Sitemap Splitting)
The sitemaps protocol, which Google and Bing follow, limits each sitemap file to:
| Limit | Value |
|---|---|
| Maximum URLs | 50,000 |
| Maximum file size | 50 MB (uncompressed) |
If a site exceeds these limits, AEM automatically generates multiple sitemaps and a sitemap index file.
Example:
sitemap-index.xml
sitemap-1.xml
sitemap-2.xml
sitemap-3.xml
6. Verifying Sitemap Using Search Tools
After deployment, verify your sitemap using:
- Google Search Console
- Bing Webmaster Tools
Steps:
- Open Search Console
- Navigate to Sitemaps
- Submit your sitemap URL
Example:
https://example.com/sitemap.xml
This allows search engines to crawl your site more efficiently.
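Before submitting, a quick local check can catch common problems early. This plain-JDK sketch (SitemapPreflight is a hypothetical name) parses a urlset and flags more than 50,000 entries, relative URLs, and URLs of 2,048 characters or more, following the sitemaps protocol. It is not a full schema validation and does not replace the reports in Search Console.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical preflight check; not a Search Console or AEM API.
final class SitemapPreflight {
    static final String NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Returns human-readable problems; an empty list means none of these checks failed.
    static List<String> check(String xml) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DOCTYPE declarations (guards against XXE when checking untrusted files).
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList locs = doc.getElementsByTagNameNS(NS, "loc");
            if (locs.getLength() > 50_000) {
                problems.add("More than 50,000 URLs; split into several sitemaps");
            }
            for (int i = 0; i < locs.getLength(); i++) {
                String loc = locs.item(i).getTextContent().trim();
                URI uri = URI.create(loc);
                if (!uri.isAbsolute() || uri.getHost() == null) {
                    problems.add("Not an absolute URL: " + loc);
                } else if (loc.length() >= 2048) {
                    problems.add("URL too long: " + loc);
                }
            }
        } catch (Exception e) {
            problems.add("Could not check sitemap: " + e.getMessage());
        }
        return problems;
    }
}
```

Running it against a sitemap that still contains repository paths, like the sample near the top of this article, reports each relative loc.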
7. Common Sitemap Issues and Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| Sitemap not loading | Dispatcher blocked XML | Update dispatcher filters |
| Pages missing | Sitemap not enabled on root page | Enable sitemap in page properties |
| Incorrect URLs | Improper domain mapping | Configure externalizer |
| Large sitemap delay | On-demand generation | Use scheduled generation |
Final Recommended Sitemap Architecture
For a production AEM site:
sitemap-index.xml
├── us.sitemap.xml
├── uk.sitemap.xml
└── in.sitemap.xml
Each sitemap should contain pages belonging to that site only.
Final Thoughts
Implementing sitemaps correctly in Adobe Experience Manager improves SEO, discoverability, and crawl efficiency.
A production-ready sitemap implementation should include:
- Scheduler-based generation
- Dispatcher caching
- Multisite support
- hreflang alternates
- Search console verification
Following these best practices helps ensure that search engines can discover and index your AEM site efficiently.