AEM 6.5 sitemap remove page extension (.html) | Community
Clodo
Adobe Employee
July 6, 2023
Solved

  • July 6, 2023
  • 6 replies
  • 4377 views

Hi AEM Community,

How can the .html extension be removed from the generated sitemap.xml?

Is there an out-of-the-box feature that supports this, or does it have to be customized? Any guidance to share?

Best answer by EstebanBustamante

6 replies

Preetpal_Bindra
Community Advisor
July 6, 2023

Hello @clodo ,

As you might know, Adobe provides a few ways to generate a sitemap, and the answer depends on which one you are using.

For example, ACS Commons offers a sitemap generator (now deprecated); see link [1] below.

Look for the property:

extensionless.urls

This property lets the AEM developer or administrator choose, through a simple configuration, whether pages in the sitemap are generated with or without an extension. It does not call for code customization.

Adobe's recommended approach is [2]. I haven't seen a similar option for it, though. It may be available in the latest version of that module; if it's not, removing the extension means customization.

So, again, it really depends on which solution is being used or is planned.

 

[1]: https://adobe-consulting-services.github.io/acs-aem-commons/features/sitemap/index.html 

[2]: https://experienceleague.adobe.com/docs/experience-manager-learn/sites/seo/sitemaps.html?lang=en#absolute-sitemap-urls 

 

thanks,

Preetpal

Clodo
Adobe Employee (Author)
July 6, 2023

Hi Preetpal,

Yes, my team is using approach 2 ( https://experienceleague.adobe.com/docs/experience-manager-learn/sites/seo/sitemaps.html ), and it seems we need to customize it to remove the file extension from the sitemap.xml.

rawvarun
Community Advisor
July 6, 2023
Clodo
Adobe Employee (Author)
July 6, 2023

Thank you, Rawvarun. I really appreciate that.

Pallavi_Shukla_
Community Advisor
July 6, 2023

Hi @clodo, are you using the ACS Commons sitemap generator ( https://adobe-consulting-services.github.io/acs-aem-commons/features/sitemap/index.html ) or the one offered by Adobe Core ( https://experienceleague.adobe.com/docs/experience-manager-learn/sites/seo/sitemaps.html )?

In the ACS Commons one, I see the setting below, which could be useful to you:

Since v3.14.0

  • extensionless.urls This property controls whether page links included in sitemap should be generated with or without .html extension. If not specified or specified as false (default), page links will end with .html. If specified as true, path is included with a trailing slash, e.g. /content/geometrixx/en/
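The documented behavior of extensionless.urls can be illustrated with a small sketch. This is not the ACS Commons implementation; the class and method names (SitemapLinkFormat, formatLink) are hypothetical and only mirror the description above: the default appends .html, and the extensionless mode ends the path with a trailing slash.

```java
// Hypothetical helper that mirrors the documented extensionless.urls behavior.
// It is NOT taken from ACS Commons; names and structure are assumptions.
final class SitemapLinkFormat {

    private SitemapLinkFormat() {
    }

    // Returns the link for a page path, with or without the .html extension.
    static String formatLink(String pagePath, boolean extensionlessUrls) {
        if (extensionlessUrls) {
            // Extensionless mode: the path is included with a trailing slash.
            return pagePath.endsWith("/") ? pagePath : pagePath + "/";
        }
        // Default mode: page links end with .html.
        return pagePath + ".html";
    }

    public static void main(String[] args) {
        System.out.println(formatLink("/content/geometrixx/en", false));
        System.out.println(formatLink("/content/geometrixx/en", true));
    }
}
```

With the example path from the documentation, the two calls print /content/geometrixx/en.html and /content/geometrixx/en/ respectively.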
Clodo
Adobe Employee (Author)
July 6, 2023
EstebanBustamante
Community Advisor and Adobe Champion (Accepted solution)
July 6, 2023

Hi @clodo,

As mentioned by others, this is not possible if you are using the AEM WCM Core Components Sitemap feature. In that scenario you will need to extend and customize it to remove the extension. Here is an example of that; hope it helps: https://www.theaemmaven.com/post/aem-apache-sling-sitemap


Esteban Bustamante
Clodo
Adobe Employee (Author)
July 7, 2023

Thank you Esteban 

Tanika02
Level 7
July 6, 2023

Hello @clodo,

Unfortunately, there is no out-of-the-box (OOTB) solution in AEM to remove the ".html" extension from the URLs in the sitemap.xml.

However, here are two approaches that you may consider:

 

FIRST

 

  • In AEM, you can enable URL rewriting by configuring the Apache Sling Rewriter. This can be done through the Apache Sling Rewriter configuration files, such as org.apache.sling.rewriter.impl.HtmlParserFactory and org.apache.sling.rewriter.impl.RewriterTransformerFactory.
  • Once URL rewriting is enabled, you can define a rewrite rule to remove the ".html" extension from the URLs. This can be done using regular expressions or a similar pattern matching mechanism. The rewrite rule should target the sitemap.xml URLs specifically and remove the ".html" extension from them.
  • After setting up the rewrite rule, test to ensure that it correctly removes the ".html" extension from the sitemap.xml URLs.
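However the rule ends up being wired in, the pattern itself might look like the sketch below. It is a plain-Java illustration of the regular-expression idea only, not a Sling Rewriter configuration; the class and method names (SitemapLocRewriter, stripHtmlExtension) are hypothetical, and it assumes the sitemap is available as a string with standard <loc> elements.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the rewrite rule: drop a trailing ".html" from <loc> values.
// Assumption: the sitemap XML is available as a String; this is not a Sling API.
final class SitemapLocRewriter {

    // Matches ".html" only when it is the last thing before the closing </loc> tag,
    // so ".html" in the middle of a URL (or before a query string) is left alone.
    private static final Pattern HTML_BEFORE_LOC_END =
            Pattern.compile("\\.html(?=\\s*</loc>)");

    private SitemapLocRewriter() {
    }

    static String stripHtmlExtension(String sitemapXml) {
        return HTML_BEFORE_LOC_END.matcher(sitemapXml).replaceAll("");
    }

    public static void main(String[] args) {
        String xml = "<url><loc>https://www.customerdomain.com/products/creditcard.html</loc></url>";
        System.out.println(stripHtmlExtension(xml));
    }
}
```

Anchoring the match to the closing </loc> tag keeps the rule from touching other occurrences of ".html" in the document, which is the "target the sitemap.xml URLs specifically" part of the approach.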

 

SECOND

 

  • You'll need to create a custom component that extends the default SitemapGenerator component in AEM. This component will be responsible for generating the sitemap.xml file.
  • In your custom SitemapGenerator component, modify the logic for generating URLs to exclude the ".html" extension. This might involve using AEM's resource resolver or rewriting mechanisms to generate clean URLs without the extension.
  • Update the AEM configuration to use your custom SitemapGenerator component instead of the default one.
  • Generate & validate the sitemap.xml.
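The extend-and-override idea in these steps can be sketched with stand-in classes. Nothing below is the Sling or Core Components API: DefaultSitemapGeneratorSketch, ExtensionlessSitemapGeneratorSketch, and locationFor are hypothetical names, and a real implementation would also need to be registered as an OSGi service. The sketch only shows where the URL-shaping logic would change.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a default generator; NOT the Sling or Core Components API.
class DefaultSitemapGeneratorSketch {

    // Builds the sitemap location for one page; the default keeps the .html extension.
    protected String locationFor(String externalBase, String pagePath) {
        return externalBase + pagePath + ".html";
    }

    // Collects one location per page path, delegating URL shaping to locationFor.
    final List<String> generate(String externalBase, List<String> pagePaths) {
        List<String> locations = new ArrayList<>();
        for (String pagePath : pagePaths) {
            locations.add(locationFor(externalBase, pagePath));
        }
        return locations;
    }
}

// Custom variant: reuses the default logic, then drops a trailing .html.
class ExtensionlessSitemapGeneratorSketch extends DefaultSitemapGeneratorSketch {

    private static final String EXTENSION = ".html";

    @Override
    protected String locationFor(String externalBase, String pagePath) {
        String location = super.locationFor(externalBase, pagePath);
        return location.endsWith(EXTENSION)
                ? location.substring(0, location.length() - EXTENSION.length())
                : location;
    }
}
```

Keeping the change inside a single overridden method means the rest of the generation logic stays untouched, which is the main appeal of extending rather than replacing the default generator.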

 

Clodo
Adobe Employee (Author)
July 7, 2023

Hi Tanika,

This really helps.

Preetpal_Bindra
Community Advisor
July 7, 2023

Hello @clodo, I'm curious to know why the business team wants to strip the extension from the pages in the sitemap.

How do they plan to handle incoming page requests that could be sourced from the sitemap, let's say, when Google or another search bot picks them up from the sitemap, or when the company's internal search engine looks at the sitemap to index pages?

How would the incoming requests without the extension be mapped to AEM pages with the extension?

Pardon my ignorance if it's just me wondering about this. I wish there was a 1-to-1 chat option, but there isn't one to my knowledge; I would've definitely avoided spamming 😞 and sorry for the multiple edits. 🙂

 

thx,

Preetpal

Clodo
Adobe Employee (Author)
July 8, 2023

Hi Preetpal,

The customer's page URLs are all rendered without .html today. It is a React SPA in AEM, and all the links look like:

www.customerdomain.com/home    and www.customerdomain.com/products/creditcard

That is why the requirement is to have no .html in the sitemap.xml.

My team found an example that extends the abstract class ResourceTreeSitemapGenerator from Sling, so we need to customize this part. Let's see.