I want to add pdf urls (pdf are present inside my dam folder) in my sitemap xml

Hi,

I have a use case: I want to add PDF URLs to my sitemap.xml. The PDFs are stored as assets in my DAM folder. Please help me with a working approach for this. For reference, I am working on AEM as a Cloud Service.

3 Replies

Hi @YashSi7 ,

Adding PDF URLs from DAM to your sitemap.xml in AEM is doable, but it requires customizing the sitemap generation logic, because the default AEM sitemap feature typically includes only page URLs.
  1. Implement a custom sitemap generator service that implements org.apache.sling.sitemap.spi.generator.SitemapGenerator. Query (or otherwise fetch) the paths of all PDFs under your DAM folder and add them to the sitemap as URLs.
  2. If you're using ACS AEM Commons, explore https://adobe-consulting-services.github.io/acs-aem-commons/features/sitemap/index.html

Hi @YashSi7,

ACS AEM Commons’ SiteMapServlet can be configured to include assets from specific DAM folders, filtered by MIME type:

  • Install ACS AEM Commons compatible with AEM as a Cloud Service and enable the Sitemap feature.

  • On your site’s root page (or another config page), add a page property that holds one or more DAM folder paths to include in the sitemap (for example /content/dam/my-site/pdfs). The property name must match the servlet’s damassets.property OSGi configuration.

  • In the OSGi configuration for com.adobe.acs.commons.wcm.impl.SiteMapServlet, set:

    • damassets.property to the page property name you chose (for example damSitemapFolders).

    • damassets.types to application/pdf (and any other asset MIME types you want indexed).

  • Access your sitemap (for example /content/my-site/en.sitemap.xml) and you will see <url> entries for each PDF asset in the configured folders with their published URLs.

This approach is good if you already rely on ACS AEM Commons or want a configuration‑driven solution with minimal custom code.
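As a rough sketch, the OSGi configuration described above could look like the fragment below. It assumes a factory configuration file named along the lines of com.adobe.acs.commons.wcm.impl.SiteMapServlet~mysite.cfg.json. The resource type mysite/components/page and the property name damSitemapFolders are placeholders, so replace them with your own page component and the page property you actually use.

```json
{
  "sling.servlet.resourceTypes": ["mysite/components/page"],
  "damassets.property": "damSitemapFolders",
  "damassets.types": ["application/pdf"]
}
```

With this in place, the root page would carry a damSitemapFolders property whose value is the DAM folder path (for example /content/dam/my-site/pdfs).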

Another approach is to implement a custom Sling SitemapGenerator that serves the PDFs as a separate sitemap under its own URL, which you can then integrate into your sitemap-index.xml:

package com.mysite.core;

import com.day.cq.commons.Externalizer;
import com.day.cq.dam.api.Asset;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;
import org.apache.sling.sitemap.SitemapException;
import org.apache.sling.sitemap.builder.Sitemap;
import org.apache.sling.sitemap.builder.Url;
import org.apache.sling.sitemap.spi.generator.SitemapGenerator;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.jcr.query.Query;
import java.util.Arrays;
import java.util.Calendar;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

@Component(
        service = SitemapGenerator.class,
        property = {
                "service.ranking:Integer=90000",
        }
)
public class PDFSitemapGenerator implements SitemapGenerator {

    private static final Logger LOG = LoggerFactory.getLogger(PDFSitemapGenerator.class);

    // Configure your DAM path
    private static final String DAM_ROOT_PATH = "/content/dam/wknd";

    @Reference
    private Externalizer externalizer;

    @Override
    public void generate(Resource sitemapRoot, String name, Sitemap sitemap, Context context)
            throws SitemapException {

        ResourceResolver resolver = sitemapRoot.getResourceResolver();

        try {
            // Query to find all PDF assets in DAM
            String query = "SELECT * FROM [dam:Asset] AS asset " +
                    "WHERE ISDESCENDANTNODE(asset, '" + DAM_ROOT_PATH + "') " +
                    "AND [jcr:content/metadata/dc:format] = 'application/pdf'";

            Iterator<Resource> pdfResources = resolver.findResources(query, Query.JCR_SQL2);

            while (pdfResources.hasNext()) {
                Resource pdfResource = pdfResources.next();
                Asset asset = pdfResource.adaptTo(Asset.class);

                if (asset != null && shouldIncludeInSitemap(asset)) {
                    addPdfToSitemap(sitemap, asset, resolver);
                }
            }

        } catch (Exception e) {
            LOG.error("Error generating PDF sitemap entries", e);
            throw new SitemapException("Failed to generate PDF sitemap", e);
        }
    }

    private void addPdfToSitemap(Sitemap sitemap, Asset asset, ResourceResolver resolver)
            throws SitemapException {

        String assetPath = asset.getPath();

        // Externalize the URL (converts internal path to external URL)
        String externalUrl = externalizer.publishLink(resolver, assetPath);

        // Add to sitemap with metadata
        Url url = sitemap.addUrl(externalUrl)
                .setChangeFrequency(Url.ChangeFrequency.MONTHLY)
                .setPriority(0.5);

        // Set lastmod only when the asset reports a modification time,
        // instead of passing null to setLastModified
        long lastModifiedMillis = asset.getLastModified();
        if (lastModifiedMillis > 0) {
            Calendar lastModified = Calendar.getInstance();
            lastModified.setTimeInMillis(lastModifiedMillis);
            url.setLastModified(lastModified.toInstant());
        }

        LOG.debug("Added PDF to sitemap: {}", externalUrl);
    }

    private boolean shouldIncludeInSitemap(Asset asset) {
        // Add your business logic here to filter PDFs
        // For example, check metadata, activation status, etc.

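        // Read the asset's metadata node; adaptTo() may return null, so guard against it
        Resource assetResource = asset.adaptTo(Resource.class);
        Resource metadataResource = assetResource != null
                ? assetResource.getChild("jcr:content/metadata")
                : null;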

        if (metadataResource != null) {
            // Example: Check for a custom property
            Boolean includeInSitemap = metadataResource.getValueMap()
                    .get("includeInSitemap", Boolean.class);

            if (includeInSitemap != null && !includeInSitemap) {
                return false;
            }
        }

        return true;
    }

    @Override
    public Set<String> getNames(Resource sitemapRoot) {
        return new HashSet<>(Arrays.asList("pdfs"));
    }

    @Override
    public Set<String> getOnDemandNames(Resource sitemapRoot) {
        return getNames(sitemapRoot);
    }
}
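If the DAM root or MIME type ever comes from configuration instead of a hard-coded constant, the query string built in generate() should escape single quotes before concatenating. Below is a minimal, standalone sketch of such a helper. The class name PdfQueryBuilder and its methods are hypothetical and not part of any AEM or Sling API; the query text itself is the same as in the generator above.

```java
public class PdfQueryBuilder {

    /** Escapes single quotes for use inside a JCR-SQL2 string literal (' becomes ''). */
    public static String escape(String value) {
        return value.replace("'", "''");
    }

    /** Builds the JCR-SQL2 query that finds assets of one MIME type below a DAM folder. */
    public static String buildQuery(String damRoot, String mimeType) {
        return "SELECT * FROM [dam:Asset] AS asset "
                + "WHERE ISDESCENDANTNODE(asset, '" + escape(damRoot) + "') "
                + "AND [jcr:content/metadata/dc:format] = '" + escape(mimeType) + "'";
    }

    public static void main(String[] args) {
        // Prints the query for the WKND DAM root used in the generator
        System.out.println(buildQuery("/content/dam/wknd", "application/pdf"));
    }
}
```

In the generator, the result of buildQuery(...) would simply be passed to resolver.findResources(query, Query.JCR_SQL2) in place of the inline string.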

With the generator deployed, you can see the PDF sitemap at the URL http://localhost:450X/content/wknd/us.sitemap.pdfs-sitemap.xml.

You also have to select the AEM site tree for which the sitemap should be generated, as shown in the image below:

giuseppebaglio_0-1765992879671.png

Enable the "All on-demand" option in the OSGi configuration "Apache Sling Sitemap - Sitemap Generator Manager". This is useful for local testing.

Have a look at this page for more info about a Sitemap scheduler via OSGi configuration: https://experienceleague.adobe.com/en/docs/experience-manager-learn/sites/seo/sitemaps#sitemap-sched... 
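For orientation, a scheduler configuration of that kind might look roughly like the fragment below. This is a sketch, assuming the PID org.apache.sling.sitemap.impl.SitemapScheduler (for example in a file named org.apache.sling.sitemap.impl.SitemapScheduler~wknd.cfg.json) and the WKND content path used above. The scheduler name and the cron expression (daily at 02:00) are illustrative values, so check the property names against the linked page before relying on them.

```json
{
  "scheduler.name": "WKND Sitemaps",
  "scheduler.expression": "0 0 2 1/1 * ? *",
  "searchPath": "/content/wknd"
}
```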

@YashSi7 - PDFs from DAM can be added to the sitemap either by extending sitemap generation with a custom service or by using ACS AEM Commons' sitemap configuration. Both approaches work in AEM as a Cloud Service, but a custom sitemap generator is generally recommended for better control over filtering, URL handling, and long-term maintainability.