AEM as a Cloud Service sitemap subdomains

Level 3

Hi AEM Community

 

I have the following scenario

 

Brief description:

Customer has AEM as a Cloud Service with multiple subdomains configured.

 

The sitemap feature is enabled, along with a custom class (MySiteMap.java) that extends the ResourceTreeSitemapGenerator abstract class from Sling.

 

One of the responsibilities of the MySiteMap.java class is to remove the www. from the subdomains in code, using a custom regular expression.
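
For readers following along, the kind of regular-expression cleanup described above might look roughly like the sketch below. The actual MySiteMap.java is not shown in this thread, so the class name WwwStripper and the method stripWww are hypothetical and are not part of Sling or AEM. The sketch also assumes that any leading www. should be dropped from an absolute URL; whether the main domain should be exempt is not stated in the post.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Sling or AEM: removes a leading "www."
// from the host of an absolute http(s) URL.
final class WwwStripper {

    // Matches the scheme ("http://" or "https://") followed by "www.",
    // case-insensitively, only at the start of the string.
    private static final Pattern LEADING_WWW =
            Pattern.compile("^(https?://)www\\.", Pattern.CASE_INSENSITIVE);

    private WwwStripper() {
    }

    // Returns the URL without the leading "www."; URLs that do not start
    // with it are returned unchanged, and null is passed through.
    static String stripWww(String url) {
        if (url == null) {
            return null;
        }
        // "$1" keeps the captured scheme and drops the "www." that follows it.
        return LEADING_WWW.matcher(url).replaceFirst("$1");
    }

    public static void main(String[] args) {
        System.out.println(stripWww("https://www.ensurance.mycompany.com"));
        System.out.println(stripWww("https://supplychain.mycompany.com/products.html"));
    }
}
```

In a custom generator, a helper like this would presumably be applied to the externalized location before it is written to the sitemap. That wiring depends on the real implementation and is not shown here.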

 

The customer has the following domain and subdomains configured:

1 - www.mycompany.com 

2 - ensurance.mycompany.com

3 - supplychain.mycompany.com

 

Problem:

When the sitemaps are generated for these domains, the following output is produced:

 

for: 1 - www.mycompany.com  

 

<url>
<lastmod>YYYY-MM-DDT HH:MM:SS</lastmod>
<priority>1.0</priority>
</url>

 

for: 2 - ensurance.mycompany.com

 

<url>
<loc>https://www.ensurance.mycompany.com</loc>
<lastmod>YYYY-MM-DDT HH:MM:SS</lastmod>
<priority>1.0</priority>
</url>

 

for: 3 - supplychain.mycompany.com

 

<url>
<loc>https://www.supplychain.mycompany.com</loc>
<lastmod>YYYY-MM-DDT HH:MM:SS</lastmod>
<priority>1.0</priority>
</url>
 
Note that for 2 and 3 the substring www. is being added.
 
Ask:
Where is this www. coming from?
Is it possible to remove the www. substring through AEM configuration instead of doing it in code, as mentioned earlier?
 
Thanks community
 

 

7 Replies

Employee

Hi arunpatidar

 

I have 2 questions though 

 

1 - 

Do you think that configuring a regular expression to remove the www. for the two subdomains mentioned previously, under /etc/map/ensurance.mycompany.com and /etc/map/supplychain.mycompany.com, would resolve the issue in the generated sitemap, so that it prints the expected output below?

 

for: 2 - ensurance.mycompany.com

 

<url>
<lastmod>YYYY-MM-DDT HH:MM:SS</lastmod>
<priority>1.0</priority>
</url>

 

for: 3 - supplychain.mycompany.com

 

<url>
<lastmod>YYYY-MM-DDT HH:MM:SS</lastmod>
<priority>1.0</priority>
</url>
 
2 - Where is the www. substring coming from, regardless of any configuration that has been set up?

Community Advisor

Hi,

I believe it is coming from some configuration. Can you check whether your implementation is adding it?

Example:

https://github.com/arunpatidar02/aemaacs-aemlab/blob/f96ce5316dfa4798c72d2e87d3a0b41fc49791a4/core/s... 



Arun Patidar

Administrator

@_Clodo_ Did you find the suggestions from Arun helpful? Please let us know if more information is required. Otherwise, please mark the answer as correct for posterity. If you have found a solution yourself, please share it with the community.



Kautuk Sahni

Level 3

Hi @arunpatidar 

 

Here is the structure of the /etc/map from the scenario described previously.

 

/etc/map

/etc/map/www.mycompany.com 

------>sling:match = (.+)$

 

/etc/map/ensurance.mycompany.com

------>sling:match = (.+)$

 

/etc/map/supplychain.mycompany.com

------>sling:match = (.+)$

 

 

Administrator

@_Clodo_ 

I have not tried this myself, but can you try the following:

To remove the "www." from the subdomain URLs, you can configure the "Apache Sling Sitemap - Site Configuration" OSGi configuration to use a different hostname for each subdomain:

ensurance.mycompany.com = https://ensurance.mycompany.com
supplychain.mycompany.com = https://supplychain.mycompany.com
Reference links: https://www.theaemmaven.com/post/aem-apache-sling-sitemap or https://www.tothenew.com/blog/exploring-apache-sling-sitemap-generator-with-customization-in-aem/
 


Kautuk Sahni