SOLVED

Apache Sling sitemap Oak index

Level 3

To use the OOTB functionality for generating a sitemap with the Apache Sling Sitemap module, should I create some Oak indexes?

Right now only the on-demand option works for me; scheduled generation shows up only as this warning:

14.04.2022 16:20:00.120 *WARN* [sling-default-4-we-retail en-US Sitemaps] org.apache.jackrabbit.oak.query.QueryImpl Traversal query (query without index): select [jcr:path], [jcr:score], * from [nt:base] as a where [sling:sitemapRoot] = true and isdescendantnode(a, '/content/we-retail/global/en') option(index tag [slingSitemaps]) /* xpath: /jcr:root/content/we-retail/global/en//*[@sling:sitemapRoot=true] option(index tag slingSitemaps) */; consider creating an index

Is there a suggested index that I should create to make it work?

I added this one, with the name as suggested:

{
  "jcr:primaryType": "oak:QueryIndexDefinition",
  "compatVersion": 2,
  "includedPaths": [
    "/content/we-retail"
  ],
  "seed": -8084877133496368591,
  "type": "lucene",
  "async": [
    "async"
  ],
  "evaluatePathRestrictions": true,
  "reindex": false,
  "reindexCount": 3,
  "indexRules": {
    "jcr:primaryType": "nt:unstructured",
    "nt:base": {
      "jcr:primaryType": "nt:unstructured",
      "properties": {
        "jcr:primaryType": "nt:unstructured",
        "sitemapRoot": {
          "jcr:primaryType": "nt:unstructured",
          "propertyIndex": true,
          "name": "sling:sitemapRoot"
        }
      }
    }
  }
}

but nothing changed.

9 Replies

Employee Advisor

After seeing your post, I tried it myself and ran into a similar issue.

I referred to the following article: Apache Sling Sitemap for AEM 6.5.11 and AEMaaCs – AEM Queries & Solutions (wordpress.com), created a scheduler configuration at /apps/weretail/config.publish/org.apache.sling.sitemap.impl.SitemapScheduler~weretail.cfg.json, and published it.

 

DEBAL_DAS_0-1650027149458.png
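
For readers who cannot see the screenshot, a scheduler configuration of this kind might look like the sketch below. The property names (scheduler.name, scheduler.expression, searchPath) are the ones the Sling Sitemap scheduler configuration uses as far as I know; the scheduler name, the cron expression and the search path are assumptions picked for the we-retail example, not values read from the screenshot.

```json
{
  "scheduler.name": "we-retail sitemaps",
  "scheduler.expression": "0 0 2 * * ?",
  "searchPath": "/content/we-retail"
}
```

With a Quartz-style expression like the one assumed here, generation would run once a day at 02:00; adjust it to whatever schedule fits your site.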

 

Now I no longer get the above warning in the error.log file, and here is my sitemap.xml file:

 

DEBAL_DAS_1-1650027210614.png

 

Hope this will help. Please review.

Level 3

The thing is that in your example the sitemap is generated on demand only. When you disable this option in "Apache Sling Sitemap - Sitemap Generator Manager", you will stop seeing your sitemap.

Sitemaps generated by the scheduler should be visible in /var/sitemaps.

Correct answer by
Employee Advisor

I used the Sitemap Scheduler here, and Apache Sling Sitemap - Sitemap Generator Manager is disabled, as shown below:

DEBAL_DAS_2-1650081476612.png

 

DEBAL_DAS_1-1650079952072.png

 

The generated sitemap is available at /var/sitemaps/content/we-retail/us/sitemap.xml on the publish instance:

 

DEBAL_DAS_0-1650079782906.png

 

Level 3

Thanks, it is working on publish. Should it work the same way on author?

Is there a suggested Oak index we should apply to avoid those traversal query warnings?

And on the dispatcher, should we allow access to paths in the /var/sitemaps folder, such as /var/sitemaps/content/we-retail/us/es/sitemap.xml?

Thanks

 

Employee Advisor

When I did this exercise, I didn't see a sitemap on author.

 

If we access the sitemap index at localhost:4503/content/we-retail/us.sitemap-index.xml, it returns a sitemap location such as <loc>http://localhost:4503/content/we-retail/us.sitemap.xml</loc>, as shown below:

DEBAL_DAS_0-1650513308682.png

Here are my sitemap details:

DEBAL_DAS_1-1650513381947.png

 

Next, to allow HTTP requests for the sitemap index and sitemap files, add the following configuration to the dispatcher/src/conf.dispatcher.d/filters/filters.any file:

...

# Allow AEM WCM Core Components sitemaps
/0200 { /type "allow" /path "/content/*" /selectors '(sitemap-index|sitemap)' /extension "xml" }

As a next step, put rewrite rules in place to ensure that .xml sitemap HTTP requests are routed to the correct underlying AEM page. If URL shortening is not used, or if Sling Mappings are used to achieve URL shortening, this configuration is not needed.

Rewrite rule in dispatcher/src/conf.d/rewrites/rewrite.rules:

...
RewriteCond %{REQUEST_URI} (.html|.jpe?g|.png|.svg|.xml)$
RewriteRule ^/(.*)$ /content/${CONTENT_FOLDER_NAME}/$1 [PT,L]

The article Apache Sling Sitemap for AEM 6.5.11 and AEMaaCs – AEM Queries & Solutions (wordpress.com) has all the details.

Level 3

Hi @broman__pl and @DEBAL_DAS,

I'm also facing the same issue. Can you please share details on how these traversal queries got fixed in your case?

*WARN* [sling-default-1-My Scheduler] org.apache.jackrabbit.oak.plugins.index.Cursors$TraversingCursor Traversed 10000 nodes with filter Filter(query=select [jcr:path], [jcr:score], * from [nt:base] as a where [sling:sitemapRoot] = true and isdescendantnode(a, '/content/we-retail') option(index tag [slingSitemaps]) /* xpath: /jcr:root/content/we-retail//*[@sling:sitemapRoot=true] option(index tag slingSitemaps) */, path=/content/we-retail//*, property=[:indexTag=[slingSitemaps], sling:sitemapRoot=[true]]); consider creating an index or changing the query



I've created a scheduler configuration at /apps/weretail/config.publish/org.apache.sling.sitemap.impl.SitemapScheduler~weretail.cfg.json on my publish instance, but I still see these warnings in error.log.

Also, I see that sitemaps are created under the /var/sitemaps folder, but http://localhost:4503/content/we-retail/us.sitemap.xml is still not accessible. Are there any other configurations we have to add to make it work?
Regards,
Radha

Level 5

Please use an index if the number of traversed nodes is large. An index is now available with the service pack (SP) setup, and it is covered in the release notes as well.
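
If you need to define such an index yourself, the sketch below is one possible starting point, not a verified fix. It assumes the traversal warning persists because the query asks for `option(index tag slingSitemaps)`: Oak then only considers indexes whose `tags` property contains that tag, and the index posted at the top of this thread has no `tags` property. The included path and the decision to set `reindex` to true are assumptions for the we-retail example.

```json
{
  "jcr:primaryType": "oak:QueryIndexDefinition",
  "type": "lucene",
  "compatVersion": 2,
  "async": ["async"],
  "tags": ["slingSitemaps"],
  "includedPaths": ["/content/we-retail"],
  "evaluatePathRestrictions": true,
  "reindex": true,
  "indexRules": {
    "jcr:primaryType": "nt:unstructured",
    "nt:base": {
      "jcr:primaryType": "nt:unstructured",
      "properties": {
        "jcr:primaryType": "nt:unstructured",
        "sitemapRoot": {
          "jcr:primaryType": "nt:unstructured",
          "name": "sling:sitemapRoot",
          "propertyIndex": true
        }
      }
    }
  }
}
```

After the async reindex completes, the query plan can be checked to see whether the index is actually picked up before relying on it.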