Hi All,
I am using the sling:sitemapRoot property to generate the sitemap, and the target root is set to the same page, /content/brandA/us/en.
However, the sitemap.xml for the two environments shows different output -
https://dev.brandA.com/sitemap.xml
Output -
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:video="http://www.google.com/schemas/sitemap-video/1.1" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
<url>
<loc>https://dev.brandA.com/</loc>
</url>
<url>
<loc>https://dev.brandA.com/support/</loc>
</url>
</urlset>
https://stage.brandA.com/sitemap.xml
Output -
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url>
<loc>https://stage.brandA.com/home.html</loc>
</url>
</urlset>
Query 1 - With the same configuration, why would the output be different?
We are targeting extensionless URLs in sitemap.xml. This works fine on DEV, but STAGE does not produce the same output.
Any pointers?
@Gaurav-Behl, @arunpatidar, @B_Sravan , @Mohit_KBansal
Thanks,
Rohan Garg
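To make the difference between the two outputs explicit, here is a minimal Python sketch that extracts the `<loc>` paths from each sitemap and flags the ones that still carry a file extension. The XML is copied from the post above with the unused namespace declarations trimmed; the helper names `sitemap_paths` and `has_extension` are made up for this illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace of the sitemap protocol, as declared in both outputs above.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Sitemap bodies copied from the post (extra xmlns declarations trimmed).
DEV_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>https://dev.brandA.com/</loc></url>
<url><loc>https://dev.brandA.com/support/</loc></url>
</urlset>"""

STAGE_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>https://stage.brandA.com/home.html</loc></url>
</urlset>"""


def sitemap_paths(xml_text):
    """Return the URL path of every <loc> entry in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [urlparse(loc.text.strip()).path
            for loc in root.findall("sm:url/sm:loc", NS)]


def has_extension(path):
    """True if the last path segment contains a dot, e.g. 'home.html'."""
    return "." in path.rsplit("/", 1)[-1]


for env, xml_text in (("DEV", DEV_XML), ("STAGE", STAGE_XML)):
    paths = sitemap_paths(xml_text)
    print(env, "paths:", paths)
    print(env, "paths with an extension:", [p for p in paths if has_extension(p)])
```

Run against the two outputs, this shows that STAGE differs in two ways: it lists fewer pages, and the one page it lists keeps its .html extension.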
Hi Rohan_Garg,
I ran into a similar issue in the past and observed that when the 'Robots Tags' dropdown under the Advanced tab (Page Properties) has the 'noindex' value selected (and maybe a couple more combinations that I'm unable to recollect at the moment), the page gets filtered out of the sitemap, which is correct behavior. You can check the Java code to validate its implementation. That said, please validate whether the URLs missing from the sitemap have exactly the same meta in both environments.
Extensionless URLs can be configured by extending org.apache.sling.sitemap.spi.generator.ResourceTreeSitemapGenerator, SitemapLinkExternalizer and/or com.adobe.aem.wcm.seo.sitemap.PageTreeSitemapGenerator, based on your use case.
HTH
Hi Gaurav,
Thanks for the quick response. We have not configured any Robots Tags for any of the pages.
The robots.txt on both environments returns the same content, shown below -
#Any search crawler can crawl our site
User-agent: *
#Disallow everything else
Disallow: /
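As a side note on what these rules mean for a crawler: robots.txt is a separate mechanism from the per-page 'Robots Tags' property discussed earlier, and as far as this thread shows it is not what drives the sitemap filtering. The sketch below uses Python's standard `urllib.robotparser` to evaluate the file exactly as pasted; the `allowed` helper and the test URL are chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# robots.txt content exactly as pasted above.
ROBOTS_TXT = """\
#Any search crawler can crawl our site
User-agent: *
#Disallow everything else
Disallow: /
"""


def allowed(robots_txt, url, agent="*"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# "Disallow: /" under "User-agent: *" blocks every path for every crawler.
print(allowed(ROBOTS_TXT, "https://dev.brandA.com/support/"))

# For contrast, an empty Disallow value permits everything.
print(allowed("User-agent: *\nDisallow:\n", "https://dev.brandA.com/support/"))
```

So the first comment in the file ("Any search crawler can crawl our site") does not match the effective rule, which disallows crawling of the whole site on both environments.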
The problem is not pages being filtered out of sitemap.xml. The issue is with the extensions of the URLs.
My reverse mapping is also configured such that home.html is exposed as /.
Hence DEV exposes /content/brandA/us/en/home.html as https://brandA.com/
The same behavior on STAGE was expected but is not happening.
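The expected mapping can be written down as a small Python sketch. This is not Sling's resource resolver or any AEM API, only an illustration of the rule described above. The function names are hypothetical, the trailing slash is inferred from the DEV sitemap output, and the repository path of the support page is an assumption because the thread does not state it.

```python
CONTENT_ROOT = "/content/brandA/us/en"  # sitemap root from the post
HOME_PAGE = "home"                      # page exposed as "/" per the post


def expected_public_path(resource_path):
    """Hypothetical helper: the extensionless public path DEV produces."""
    if not resource_path.startswith(CONTENT_ROOT + "/"):
        raise ValueError("path is outside the sitemap root: " + resource_path)
    relative = resource_path[len(CONTENT_ROOT):]   # e.g. "/home.html"
    if relative.endswith(".html"):
        relative = relative[:-len(".html")]        # drop the extension
    if relative == "/" + HOME_PAGE:
        return "/"                                 # home page maps to the site root
    return relative + "/"                          # trailing slash as in the DEV output


def observed_stage_path(resource_path):
    """Hypothetical helper: what STAGE appears to do, stripping only the content root."""
    return resource_path[len(CONTENT_ROOT):]


home = CONTENT_ROOT + "/home.html"
support = CONTENT_ROOT + "/support.html"  # assumed location of the support page

print(expected_public_path(home), expected_public_path(support))
print(observed_stage_path(home))
```

The STAGE result for the home page is /home.html, which suggests the content root is shortened there but the home page and extension rules are not applied.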
Your example showed 2 URLs in the DEV sitemap and just 1 URL in the STG sitemap, which is why I thought this was a robots issue.
You mentioned reverse mapping, so I am assuming that your current solution depends on /etc mappings.
For the STG env., if your mapping works fine for everything except the URLs in sitemap.xml, then the issue could be the ranking order of your filters/services and how they get applied. You haven't mentioned which version you are using, but a common issue back in the day was how bundles are resolved. Here you're dealing with 3+ bundles (the Sling resolver for mapping, the sitemap impl bundle and your custom code bundle), so make sure you enforce the bindings in the correct order.
It is possible that the mapping gets applied after the sitemap extracts the URLs, hence the long URLs in the sitemap. If the OOB implementation doesn't suffice, you may write custom code to ensure that the URLs fed to sitemap generation have the mapping (or the transformation) applied beforehand.
You may compare the ranking/execution order in the debug logs of Externalizer, LinkTransformer or SitemapLinkExternalizer (whichever your solution uses) from both envs.
A dirty hack - try to bounce the server (or restart bundles) and see if it helps.
Hi Gaurav,
The Cloud SDK versions on DEV and STAGE were different, which caused this issue.
The STAGE version was 2022.1 while DEV was at 2022.9.8722.20220912T101352Z.
Updating the SDK version solved the issue.
Thanks,
Rohan Garg