Dear Community members,
I have a situation where there is a large number of sites, and each site contains a significant number of pages. Additionally, some sites follow a specific content path pattern (/content/mySite/americas/us/en_us/home), while others have a different pattern (/content/myOtherSite/brand1/us/en_us/home).
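Both example paths have the same depth (`/content/<site>/<segment>/<country>/<locale>/home`, where `<segment>` is a region in one case and a brand in the other). If that holds for all sites, a single pattern can map any page to the sitemap root it belongs to. This is a minimal sketch that assumes each sitemap is rooted at the `home` page. `SiteRootResolver` and `resolveSitemapRoot` are hypothetical names, and the pattern would need adjusting for sites with a different depth:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SiteRootResolver {

    // Hypothetical: matches /content/<site>/<segment>/<country>/<locale>/home
    // plus any descendant path. Group 1 is the assumed sitemap root (the "home" page).
    private static final Pattern SITE_ROOT = Pattern.compile(
            "^(/content/[^/]+/[^/]+/[^/]+/[^/]+/home)(/.*)?$");

    // Returns the sitemap root that contains pagePath, or empty if the path
    // does not follow either known content structure.
    public static Optional<String> resolveSitemapRoot(String pagePath) {
        Matcher m = SITE_ROOT.matcher(pagePath);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(resolveSitemapRoot("/content/mySite/americas/us/en_us/home/products/a"));
        System.out.println(resolveSitemapRoot("/content/myOtherSite/brand1/us/en_us/home"));
        System.out.println(resolveSitemapRoot("/content/dam/image.png"));
    }
}
```

Grouping pages by their resolved root gives one sitemap per site without hard-coding each site's path.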
I am looking for ideas or suggestions for a solution that can generate a sitemap for each site. I have already tried one approach: making requests to the publisher, querying the data, and dynamically populating the sitemap. However, this approach is costly because it consumes a considerable amount of server resources. I am therefore looking for an alternative that would prevent the request from reaching the publisher altogether.
Any ideas or suggestions are greatly appreciated.
Thank you,
-Bilal
Hello @bilal_ahmad -
Here's an alternative approach that can help improve performance:
1. Use a Scheduled Job:
- Create a scheduled job in AEM that runs at a specific interval (e.g., daily, hourly) to generate the sitemap.
- The scheduled job will execute the sitemap generation logic in a background process, reducing the impact on server resources during regular user requests.
2. Maintain a Sitemap Cache:
- Implement a caching mechanism to store the generated sitemap XML.
- Upon generating the sitemap, store it in the cache and set an expiration time based on your requirements.
3. Serve Sitemap from Cache:
- Configure the servlet that handles sitemap requests to serve the sitemap XML directly from the cache, rather than regenerating it for each request.
- Check if the sitemap XML exists in the cache and is still valid based on the expiration time.
- If the cached sitemap is valid, retrieve it from the cache and serve it as the servlet response. Otherwise, proceed to regenerate the sitemap.
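The three steps above can be sketched in plain Java. This is a minimal, framework-free sketch: `SitemapCache`, the `generator` function (site root to sitemap XML), and the TTL are hypothetical. In AEM the periodic call would more likely be wired to the Sling scheduler than to a `ScheduledExecutorService`, and the servlet would call `get`:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.function.Supplier;

public class SitemapCache {

    // One cached sitemap per site root, with the instant after which it is stale.
    private static final class Entry {
        final String xml;
        final Instant expiresAt;

        Entry(String xml, Instant expiresAt) {
            this.xml = xml;
            this.expiresAt = expiresAt;
        }
    }

    private final ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<>();
    private final Function<String, String> generator; // hypothetical: site root -> sitemap XML
    private final Duration ttl;
    private final Supplier<Instant> now;              // injectable clock, e.g. Instant::now

    public SitemapCache(Function<String, String> generator, Duration ttl, Supplier<Instant> now) {
        this.generator = generator;
        this.ttl = ttl;
        this.now = now;
    }

    // Steps 1 and 2: regenerate one site's sitemap and store it with a fresh expiry.
    public String refresh(String siteRoot) {
        Entry fresh = new Entry(generator.apply(siteRoot), now.get().plus(ttl));
        cache.put(siteRoot, fresh);
        return fresh.xml;
    }

    // Step 3: serve from the cache while the entry is valid; otherwise regenerate.
    public String get(String siteRoot) {
        Entry e = cache.get(siteRoot);
        if (e != null && now.get().isBefore(e.expiresAt)) {
            return e.xml;
        }
        return refresh(siteRoot);
    }

    // Step 1: refresh every site on a background thread at a fixed interval.
    // The caller is responsible for calling shutdown() on the returned executor.
    public ScheduledExecutorService scheduleRefresh(List<String> siteRoots, Duration interval) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        ses.scheduleAtFixedRate(() -> {
            for (String root : siteRoots) {
                try {
                    refresh(root);
                } catch (RuntimeException ex) {
                    // An uncaught exception would cancel all future runs, so log and continue.
                    System.err.println("Sitemap refresh failed for " + root + ": " + ex);
                }
            }
        }, 0, interval.toMillis(), TimeUnit.MILLISECONDS);
        return ses;
    }
}
```

If the refresh interval is shorter than the TTL, requests are normally answered from the cache. Two requests that miss at the same moment may both regenerate; `ConcurrentHashMap.compute` could serialize that if it matters.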
Summarizing:
By utilizing a scheduled job and caching mechanism, you can reduce the performance impact of generating the sitemap on each request. The sitemap generation is performed at regular intervals, and the cached version is served for subsequent requests within the validity period. This helps minimize the traversal of the site hierarchy and improves the overall performance of serving the sitemap.
Hi,
One possible solution is to leverage the AEM Dispatcher module to serve the sitemaps. You can configure the Dispatcher to intercept specific URLs matching the sitemap paths and respond with pre-generated static files instead of forwarding the request to the publisher. This approach reduces the load on the publisher and improves performance.
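The sitemaps still have to be generated somewhere (for example, by the scheduled job described in the previous reply), but the output can be written as plain files that the web server returns without contacting the publisher. This is a minimal sketch that assumes a hypothetical docroot directory served from disk by the web server in front of the Dispatcher, and a hypothetical `<site root>.sitemap.xml` naming scheme. `StaticSitemapWriter` is an illustrative name. It builds a minimal sitemaps.org `<urlset>` document and writes it through a temporary file followed by a move:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

public class StaticSitemapWriter {

    // Builds a minimal sitemaps.org <urlset> document from absolute page URLs.
    public static String toXml(List<String> urls) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (String url : urls) {
            sb.append("  <url><loc>").append(escape(url)).append("</loc></url>\n");
        }
        sb.append("</urlset>\n");
        return sb.toString();
    }

    // Escapes XML special characters; '&' must be replaced first.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Writes <docroot>/<siteRoot>.sitemap.xml (hypothetical naming), e.g.
    // <docroot>/content/mySite/americas/us/en_us/home.sitemap.xml.
    public static Path write(Path docroot, String siteRoot, List<String> urls) throws IOException {
        String relative = siteRoot.startsWith("/") ? siteRoot.substring(1) : siteRoot;
        Path target = docroot.resolve(relative + ".sitemap.xml");
        Files.createDirectories(target.getParent());
        // Write to a temp file in the same directory, then rename it into place,
        // so the web server never serves a partially written sitemap.
        Path tmp = Files.createTempFile(target.getParent(), "sitemap", ".tmp");
        Files.writeString(tmp, toXml(urls), StandardCharsets.UTF_8);
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        return target;
    }
}
```

Per the `Files.move` documentation, whether an atomic move replaces an existing target is platform-dependent; on typical Linux file systems it does.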
To implement this solution, you would need to:
By following this approach, you can offload the sitemap generation process from the publisher and serve the pre-generated sitemap files directly from the Dispatcher, resulting in improved performance and reduced resource consumption.
Hey @Tanika02, I followed this approach and it seems to be the best so far. However, to further reduce the load on the publish server, I'm running the scheduled job on the author instance and publishing the result. Thanks!