Sitemap (SEO) best practices | Community
gmisura
December 14, 2016

Sitemap (SEO) best practices


Per: https://docs.adobe.com/docs/en/aem/6-1/manage/seo-and-url-management.html

"To programmatically generate a sitemap, register a Sling Servlet listening for a sitemap.xml call. The servlet can then use the resource provided via the servlet API to look at the current page and its children, outputting XML."

I believe this is talking about mysite.com/sitemap.xml (or whatever other specific sitemap XML file I want to use). Someone else thinks that is supposed to mean that every page should be able to change from

mysite.com/home.html to mysite.com/home.sitemap.xml

And then you can see the sitemap for the entire site.

That makes no sense to me. What is really meant by the quote?


2 replies

kautuk_sahni
Community Manager
December 15, 2016

Hi

I have asked the documentation team to have a look at this.

Thanks and regards

Kautuk Sahni

kautuk_sahni
Community Manager
December 19, 2016

Hi

Ian Reasor is our internal expert. 

I have again asked him to get back to you soon on this.

~kautuk

ireasor-adobe
Adobe Employee
December 15, 2016

"Someone else thinks that is supposed to mean that every page should be able to change from mysite.com/home.html to mysite.com/home.sitemap.xml And then you can see the sitemap for the entire site."

This is accurate.  You can register a Sling Servlet to listen for the selector 'sitemap' with the extension 'xml'.  This will cause the servlet to process the request any time a URL is requested that ends in /path/to/page.sitemap.xml.  You can then get the requested resource from the request and generate a sitemap from that point in the content tree by using the JCR APIs.  The benefit to an approach like this is when you have multiple sites being served from the same instance.  A request to /content/siteA.sitemap.xml would generate a sitemap for siteA while a request for /content/siteB.sitemap.xml would generate a sitemap for siteB without the need for writing additional code.
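As a rough illustration of "generate a sitemap from that point in the content tree", here is a minimal sketch in plain Java. It deliberately leaves out the Sling and JCR wiring: the `Page` class is a hypothetical stand-in for the requested resource and its children, and `buildSitemap`, the host name, and the `.html` suffix are assumptions made for the example. In a real servlet bound to the 'sitemap' selector and 'xml' extension, the root would come from the request's resource instead.

```java
import java.util.ArrayList;
import java.util.List;

public class SitemapSketch {

    /** Hypothetical stand-in for a page node in the content tree (not a Sling or JCR type). */
    static final class Page {
        final String path;
        final List<Page> children = new ArrayList<>();

        Page(String path) { this.path = path; }

        Page add(Page child) { children.add(child); return this; }
    }

    /** Escapes the characters that must not appear raw in XML text. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    /** Appends one <url> entry for the page, then recurses into its children. */
    static void appendUrls(Page page, String host, StringBuilder out) {
        out.append("  <url><loc>")
           .append(escapeXml(host + page.path + ".html"))
           .append("</loc></url>\n");
        for (Page child : page.children) {
            appendUrls(child, host, out);
        }
    }

    /** Builds a sitemap for the subtree rooted at the requested page. */
    static String buildSitemap(Page root, String host) {
        StringBuilder out = new StringBuilder();
        out.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        out.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        appendUrls(root, host, out);
        out.append("</urlset>\n");
        return out.toString();
    }

    public static void main(String[] args) {
        // A request for /content/siteA.sitemap.xml would start from this root.
        Page siteA = new Page("/content/siteA")
                .add(new Page("/content/siteA/products"))
                .add(new Page("/content/siteA/about"));
        System.out.print(buildSitemap(siteA, "https://mysite.com"));
    }
}
```

Because the root is whatever page was requested, the same code serves siteA and siteB without any per-site logic, which is the benefit described above.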

gmisura
Author
December 17, 2016

ireasor, SEO says:

robots.txt points to your 'main' sitemap.xml (usually /sitemap.xml)

If you have "sub sitemaps" you can point to those via loc.
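To make the "sub sitemaps via loc" arrangement concrete, here is a small sketch in plain Java that builds a sitemap index in the sitemaps.org format, where each sub sitemap gets its own `<sitemap><loc>` entry. The class name, the `buildIndex` helper, and the per-locale URLs are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.List;

public class SitemapIndexSketch {

    /**
     * Builds a sitemap index listing each sub sitemap in a <loc> element.
     * Assumes the URLs contain no characters that need XML escaping.
     */
    static String buildIndex(List<String> sitemapUrls) {
        StringBuilder out = new StringBuilder();
        out.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        out.append("<sitemapindex xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (String url : sitemapUrls) {
            out.append("  <sitemap><loc>").append(url).append("</loc></sitemap>\n");
        }
        out.append("</sitemapindex>\n");
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical per-locale sub sitemaps referenced from the main /sitemap.xml.
        System.out.print(buildIndex(Arrays.asList(
                "https://mysite.com/en_us/sitemap.xml",
                "https://mysite.com/es_es/sitemap.xml")));
    }
}
```

robots.txt would then carry a single `Sitemap: https://mysite.com/sitemap.xml` line pointing at this index.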

At no point in time does anything SEO (and therefore sitemap) related say that _every single page_ on your site should be able to return a sitemap.xml

We have multiple (AEM) sites, for locales: en_us, es_es, etc, etc so it makes sense to have those 'sites' return a unique sitemap.xml via es_es.sitemap.xml 

But why in the world allow every html to also return the same sitemap content?

You've doubled the number of files dispatcher could cache (.html and sitemap.xml for every page). You've nearly doubled your storage requirements (depending on the number of pages you have, your sitemap.xml might be as big as or bigger than your page HTML).

I can't understand the rationale for that interpretation of the documentation. I'm _very_ interested in kautuksahni's response.

ireasor-adobe
Adobe Employee
December 20, 2016

The dispatcher will only cache a page if it has actually been requested.  The way that I have managed this in the past is by using Apache mod_rewrite rules to redirect sitemap requests to the AEM paths that will handle the request.  Following your example of a multilingual site with en_us and es_es, I would configure Apache such that a request for mysite.com/en_us/sitemap.xml would rewrite to /content/my_company/en_us.sitemap.xml and mysite.com/es_es/sitemap.xml would rewrite to /content/my_company/es_es.sitemap.xml.  In theory, nobody would ever request /content/my_company/en_us/some_other_page.sitemap.xml directly and thus this content would never be cached.  If this was a concern, however, you could rewrite any sitemap.xml requests to the language root node or block them entirely at the dispatcher.
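The mapping described above can be modeled in a few lines of plain Java. This is not mod_rewrite configuration, only a sketch of the decision the rewrite rules would make: public locale sitemap URLs map to the language root with the 'sitemap' selector, direct `.sitemap.xml` requests are blocked, and everything else passes through. The class name, the `rewrite` helper, and the two-letter locale pattern are assumptions; the paths follow the example in this thread.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SitemapRewriteSketch {

    // Public form: /<locale>/sitemap.xml, e.g. /en_us/sitemap.xml
    private static final Pattern PUBLIC_SITEMAP =
            Pattern.compile("^/([a-z]{2}_[a-z]{2})/sitemap\\.xml$");

    // Any direct request that uses the 'sitemap' selector with the 'xml' extension
    private static final Pattern DIRECT_SELECTOR =
            Pattern.compile("\\.sitemap\\.xml$");

    /**
     * Returns the internal path to serve, or an empty Optional when the
     * request should be blocked. Non-sitemap requests pass through unchanged.
     */
    static Optional<String> rewrite(String requestPath) {
        Matcher m = PUBLIC_SITEMAP.matcher(requestPath);
        if (m.matches()) {
            return Optional.of("/content/my_company/" + m.group(1) + ".sitemap.xml");
        }
        if (DIRECT_SELECTOR.matcher(requestPath).find()) {
            return Optional.empty(); // blocked, so it is never rendered or cached
        }
        return Optional.of(requestPath);
    }

    public static void main(String[] args) {
        System.out.println(rewrite("/en_us/sitemap.xml"));
        System.out.println(rewrite("/content/my_company/en_us/some_other_page.sitemap.xml"));
        System.out.println(rewrite("/en_us/home.html"));
    }
}
```

With rules equivalent to this in front of AEM, only the language-root sitemaps are ever requested from the publish instance, so only those can end up in the dispatcher cache.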