SOLVED

AEM 6.5 sitemap remove page extension (.html)

Level 3

Hi AEM Community,

How can the .html extension be removed from the URLs in the generated sitemap.xml?

Is there an out-of-the-box feature that supports this, or does it require customization? Any guidance would be appreciated.

1 Accepted Solution

Correct answer by
Community Advisor

Hi @_Clodo_ 

 

As mentioned by others, this is not possible out of the box if you are using the AEM WCM Core Components sitemap feature. In that scenario you will need to extend and customize the sitemap generation to remove the extension. Here is an example of that, hope it helps: https://www.theaemmaven.com/post/aem-apache-sling-sitemap


Esteban Bustamante

15 Replies

Community Advisor

Hello @_Clodo_ ,

As you might know, Adobe provides a few ways to generate a sitemap, and the answer depends on which one you are using.

For example, ACS AEM Commons offers a sitemap generator (now deprecated); see link [1] below. Look for the property:

extensionless.urls

This property lets the AEM developer or administrator choose, through a simple configuration, whether page links in the sitemap are generated with or without the extension. No code customization is needed.

The approach Adobe recommends is [2]. I haven't seen a similar option there, though. It may exist in the latest version of that module; if it does not, customization is required.

So, again, it really depends on which solution is being used or planned.

 

[1]: https://adobe-consulting-services.github.io/acs-aem-commons/features/sitemap/index.html 

[2]: https://experienceleague.adobe.com/docs/experience-manager-learn/sites/seo/sitemaps.html?lang=en#abs... 

 

thanks,

Preetpal

Level 3

Hi Preetpal

 

Yes, my team is using approach [2] ( https://experienceleague.adobe.com/docs/experience-manager-learn/sites/seo/sitemaps.html ), and it seems we need to customize it to remove the file extension from the sitemap.xml.

Community Advisor

Hi @_Clodo_, are you using the ACS Commons sitemap generator ( https://adobe-consulting-services.github.io/acs-aem-commons/features/sitemap/index.html ) or the one offered by Adobe ( https://experienceleague.adobe.com/docs/experience-manager-learn/sites/seo/sitemaps.html )?

In the ACS Commons one, I see the setting below, which could be useful to you:

Since v3.14.0

  • extensionless.urls: This property controls whether page links included in the sitemap are generated with or without the .html extension. If not specified or set to false (default), page links end with .html. If set to true, the path is included with a trailing slash, e.g. /content/geometrixx/en/
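
To make the effect of that flag concrete, here is a minimal Java sketch of the behavior described above. It is not the ACS Commons implementation; the class name `SitemapLinkFormat` and the method `pageLink` are hypothetical and exist only for illustration.

```java
// Hypothetical helper illustrating the documented behavior of extensionless.urls.
// This is NOT the ACS AEM Commons source code.
final class SitemapLinkFormat {

    private SitemapLinkFormat() {
    }

    /**
     * Builds the sitemap link for a page path.
     *
     * @param pagePath          repository path of the page, e.g. /content/geometrixx/en
     * @param extensionlessUrls value of the extensionless.urls setting
     * @return the path with a trailing slash when extensionless, otherwise with .html
     */
    static String pageLink(String pagePath, boolean extensionlessUrls) {
        if (extensionlessUrls) {
            // Extensionless mode: end the path with a slash (and avoid doubling it).
            return pagePath.endsWith("/") ? pagePath : pagePath + "/";
        }
        // Default mode: append the .html extension.
        return pagePath + ".html";
    }
}
```

With this sketch, `pageLink("/content/geometrixx/en", true)` yields a path ending in a slash, while passing `false` yields the same path ending in `.html`.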

Employee

Hello, could you give us a hint about where in the repository we should create the new class? @EstebanBustamante

 

 

Community Advisor

@ffriaslopez Sorry, I am not sure I understand your question. Are you asking where in your code base you need to implement a new custom sitemap?



Esteban Bustamante

Community Advisor

Hello @_Clodo_  - 

 

Unfortunately, there is no out-of-the-box (OOTB) solution in AEM to remove the ".html" extension from the URLs in the sitemap.xml.

 

However, here are two approaches that you may consider:

 

FIRST

 

  • In AEM, you can enable URL rewriting by configuring the Apache Sling Rewriter. This can be done through the Apache Sling Rewriter configuration files, such as org.apache.sling.rewriter.impl.HtmlParserFactory and org.apache.sling.rewriter.impl.RewriterTransformerFactory.
  • Once URL rewriting is enabled, you can define a rewrite rule to remove the ".html" extension from the URLs. This can be done using regular expressions or a similar pattern matching mechanism. The rewrite rule should target the sitemap.xml URLs specifically and remove the ".html" extension from them.
  • After setting up the rewrite rule, test to ensure that it correctly removes the ".html" extension from the sitemap.xml URLs.

 

SECOND

 

  • You'll need to create a custom component that extends the default SitemapGenerator component in AEM. This component will be responsible for generating the sitemap.xml file.
  • In your custom SitemapGenerator component, modify the logic for generating URLs to exclude the ".html" extension. This might involve using AEM's resource resolver or rewriting mechanisms to generate clean URLs without the extension.
  • Update the AEM configuration to use your custom SitemapGenerator component instead of the default one.
  • Generate & validate the sitemap.xml.
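
The URL change at the heart of the second approach can be sketched in plain Java. This is only an illustration under assumptions: the class `ExtensionlessLocation` and the method `stripHtmlExtension` are hypothetical names, and the sketch deliberately leaves out the Sling/AEM wiring. In a real project, a helper like this would be called from the custom generator at the point where the page location is computed.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper; the name and its placement in a custom generator are assumptions.
final class ExtensionlessLocation {

    private static final String HTML_EXTENSION = ".html";

    private ExtensionlessLocation() {
    }

    /**
     * Removes a trailing ".html" from the path of a URL, keeping scheme, host,
     * query and fragment. URLs that do not end in ".html" are returned unchanged.
     */
    static String stripHtmlExtension(String location) {
        URI uri = URI.create(location);
        String path = uri.getPath();
        if (path == null || !path.endsWith(HTML_EXTENSION)) {
            return location;
        }
        String trimmedPath = path.substring(0, path.length() - HTML_EXTENSION.length());
        try {
            // Rebuild the URI with the shortened path and all other parts untouched.
            return new URI(uri.getScheme(), uri.getAuthority(), trimmedPath,
                    uri.getQuery(), uri.getFragment()).toString();
        } catch (URISyntaxException e) {
            // Fall back to the original location if the URI cannot be rebuilt.
            return location;
        }
    }
}
```

Only the path component is inspected, so a ".html" that appears inside a query string is left alone. Whatever produces extensionless URLs in the sitemap, the dispatcher or Sling mappings still have to resolve those URLs back to the pages.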

 

Community Advisor

Hello @_Clodo_, I'm curious to know why the business team wants to strip the extension from the pages in the sitemap.

How do they plan to handle incoming page requests sourced from the sitemap, for example when Google or another search bot picks up URLs from the sitemap, or when the company's internal search engine reads the sitemap to index pages?

How would incoming requests without the extension be mapped to AEM pages with the extension?

Pardon my ignorance if it's just me wondering about this. I wish there were a one-to-one chat option, but there isn't one to my knowledge; I would have definitely avoided spamming. Sorry for the multiple edits.

 

thx,

Preetpal

Level 3

Hi Preetpal

 

The customer's pages are all rendered without the .html extension today. It is a React SPA in AEM, and all the links look like

www.customerdomain.com/home and www.customerdomain.com/products/creditcard

That is why the requirement is to omit .html in the sitemap.xml.

My team found an example that extends the abstract class ResourceTreeSitemapGenerator from Sling, so we need to customize this part. Let's see.