SOLVED

Need inputs for Sitemap.xml creation

Level 7

Dear Community members,

 

I have a situation where there is a large number of sites, and each site contains a significant number of pages. Additionally, some sites follow a specific content path pattern (/content/mySite/americas/us/en_us/home), while others have a different pattern (/content/myOtherSite/brand1/us/en_us/home).

 

I am looking for ideas or suggestions for a solution that can generate a sitemap for each site. I have already tried one approach: sending requests to the publisher, querying the data, and populating the sitemap dynamically. However, this approach is costly because it consumes a considerable amount of server resources. I am therefore looking for an alternative that keeps the request from reaching the publisher at all.

 

Any ideas or suggestions are greatly appreciated.

 

Thank you,

-Bilal

1 Accepted Solution

Correct answer by
Level 10

Hello @bilal_ahmad - 

 

Here's an alternative approach that can help improve performance:

 

1. Use a Scheduled Job:
- Create a scheduled job in AEM that runs at a specific interval (e.g., daily, hourly) to generate the sitemap.
- The scheduled job will execute the sitemap generation logic in a background process, reducing the impact on server resources during regular user requests.

 

2. Maintain a Sitemap Cache:
- Implement a caching mechanism to store the generated sitemap XML.
- Upon generating the sitemap, store it in the cache and set an expiration time based on your requirements.

 

3. Serve Sitemap from Cache:
- Configure the servlet to serve the sitemap XML directly from the cache, rather than regenerating it for each request.
- Check if the sitemap XML exists in the cache and is still valid based on the expiration time.
- If the cached sitemap is valid, retrieve it from the cache and serve it as the servlet response. Otherwise, proceed to regenerate the sitemap.
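
Steps 2 and 3 can be sketched together in plain Java. This is only an illustration under assumptions, not an AEM or Sling API: the class name `SitemapCacheSketch`, the `serve` method, and the `generator` function (site root in, sitemap XML out) are all hypothetical, and the cache is a simple in-memory map with a fixed time-to-live.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch only: an in-memory sitemap cache with a fixed time-to-live.
// All names here are hypothetical; none of them are AEM or Sling APIs.
class SitemapCacheSketch {

    // One cached sitemap plus the instant from which it counts as stale.
    private static final class Entry {
        final String xml;
        final Instant expiresAt;

        Entry(String xml, Instant expiresAt) {
            this.xml = xml;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final Duration timeToLive;
    private final Function<String, String> generator; // site root -> sitemap XML

    SitemapCacheSketch(Duration timeToLive, Function<String, String> generator) {
        this.timeToLive = timeToLive;
        this.generator = generator;
    }

    // Returns the cached XML while it is still valid, otherwise regenerates it.
    // `now` is a parameter so the expiry logic can be exercised without waiting.
    String serve(String siteRoot, Instant now) {
        Entry entry = entries.compute(siteRoot, (key, existing) -> {
            if (existing != null && now.isBefore(existing.expiresAt)) {
                return existing; // cache hit: reuse the stored XML
            }
            // cache miss or expired entry: regenerate and store a fresh copy
            return new Entry(generator.apply(key), now.plus(timeToLive));
        });
        return entry.xml;
    }
}
```

A servlet would call something like `cache.serve(siteRoot, Instant.now())` and write the returned string to the response. Note that `compute` blocks other updates to the same key while the generator runs, which keeps concurrent requests from regenerating the same sitemap twice but also means a slow generator delays those requests.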

 

Summarizing

 

By utilizing a scheduled job and caching mechanism, you can reduce the performance impact of generating the sitemap on each request. The sitemap generation is performed at regular intervals, and the cached version is served for subsequent requests within the validity period. This helps minimize the traversal of the site hierarchy and improves the overall performance of serving the sitemap.
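
The "generated at regular intervals" part can be illustrated with a small JDK-only sketch. It uses `ScheduledExecutorService` as a stand-in for whatever scheduler the AEM instance provides, and the names `SitemapJobSketch`, `buildSitemap`, `start`, and `pageSource` are hypothetical. How the list of page URLs is collected is deliberately left to the caller.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Sketch only: a plain-JDK stand-in for a scheduled sitemap job.
class SitemapJobSketch {

    // Holds the most recently generated sitemap XML (empty until the first run).
    static final AtomicReference<String> latestSitemap = new AtomicReference<>("");

    // Builds a minimal sitemap document from a list of absolute page URLs.
    static String buildSitemap(List<String> pageUrls) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n");
        for (String url : pageUrls) {
            xml.append("  <url><loc>").append(escape(url)).append("</loc></url>\n");
        }
        xml.append("</urlset>\n");
        return xml.toString();
    }

    // Escapes the five characters that are special in XML text and attributes.
    static String escape(String value) {
        return value.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    // Starts a background task that regenerates the sitemap at a fixed interval.
    // The caller is responsible for shutting the returned scheduler down.
    static ScheduledExecutorService start(Supplier<List<String>> pageSource,
                                          long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                latestSitemap.set(buildSitemap(pageSource.get()));
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs,
                // so keep the previous sitemap and try again next interval.
                System.err.println("Sitemap generation failed: " + e);
            }
        }, 0, period, unit);
        return scheduler;
    }
}
```

For example, `SitemapJobSketch.start(() -> urls, 1, TimeUnit.DAYS)` would refresh the stored XML once a day, and request handling would only ever read `latestSitemap`.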

 

5 Replies

Employee Advisor

Hi,

 

One possible solution is to use the AEM Dispatcher module to serve the sitemaps. You can configure the Dispatcher to intercept URLs matching the sitemap paths and respond with pre-generated static files instead of forwarding the request to the publisher. This approach reduces the load on the publisher and improves performance.

To implement this solution, you would need to:

  1. Configure the Dispatcher to intercept requests for sitemap URLs.
  2. Set up a separate process to generate the sitemaps periodically or on demand.
  3. Save the generated sitemap files in a specific location that is accessible to the Dispatcher.
  4. Ensure that the sitemap URLs are mapped to the corresponding static files in the Dispatcher configuration.
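
Step 3 could look roughly like the following Java sketch. It assumes, purely for illustration, that the files are laid out under an output directory that mirrors the content path (for example `<outputDir>/content/mySite/americas/us/en_us/sitemap.xml`); the class name, method names, and that layout are hypothetical and would need to match whatever the Dispatcher configuration in step 4 expects.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch only: writes pre-generated sitemap files to a directory that a
// web server could serve. The directory layout is an assumption.
class SitemapFileWriterSketch {

    // Maps a site root such as /content/mySite/americas/us/en_us to
    // <outputDir>/content/mySite/americas/us/en_us/sitemap.xml.
    static Path targetFor(Path outputDir, String siteRoot) {
        String relative = siteRoot.replaceAll("^/+", "");
        Path base = outputDir.normalize();
        Path target = base.resolve(relative).resolve("sitemap.xml").normalize();
        if (!target.startsWith(base)) {
            // Reject site roots that would escape the output directory.
            throw new IllegalArgumentException("Site root escapes output dir: " + siteRoot);
        }
        return target;
    }

    // Writes the XML to a temporary file first and then renames it, so a
    // reader should not observe a half-written sitemap.
    static Path write(Path outputDir, String siteRoot, String xml) throws IOException {
        Path target = targetFor(outputDir, siteRoot);
        Files.createDirectories(target.getParent());
        // Same directory as the target, so the rename stays on one file system.
        Path temp = Files.createTempFile(target.getParent(), "sitemap", ".tmp");
        Files.write(temp, xml.getBytes(StandardCharsets.UTF_8));
        // Per the Javadoc, whether ATOMIC_MOVE replaces an existing target is
        // implementation specific; common platforms replace it.
        Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
        return target;
    }
}
```

Writing to a temporary file and renaming it is a design choice for the case where the web server may read the file while the generation process is still writing it.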

By following this approach, you can offload the sitemap generation process from the publisher and serve the pre-generated sitemap files directly from the Dispatcher, resulting in improved performance and reduced resource consumption.

Level 7

Thank you @ManviSharma, I appreciate your inputs here. I'll give this a try!

 

-Bilal

Level 7

Hey @Tanika02, I followed this approach and it seems to be the best so far. However, to further lighten the load on the publish server, I'm running the scheduled job on author and publishing the result. Thanks!