SOLVED

Optimizing Daily API Calls in AEM: A Better Solution?

Level 5

In my current AEMaaCS setup, I’ve created a scheduled Sling job to make daily calls to an external API, storing the API response in the repository at /var/api-data. Here’s a breakdown of how it works:

  1. A scheduled job runs once a day to call the external API.
  2. The API response is stored under /var/api-data, which minimizes the daily load on the API server.
  3. A custom service reads this data from /var/api-data, making it accessible to the Sling Model, which then provides the data to the frontend components.

This setup has effectively reduced API calls and ensured data availability throughout the day.
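For reference, here is a simplified sketch of the job (class names, the cron expression, and the "api-import" service user mapping are illustrative, not the exact production code):

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.sling.api.resource.ModifiableValueMap;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;
import org.apache.sling.api.resource.ResourceResolverFactory;
import org.apache.sling.api.resource.ResourceUtil;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;

@Component(service = Runnable.class, property = {
        "scheduler.expression=0 0 2 * * ?",   // run once a day at 02:00
        "scheduler.concurrent:Boolean=false"  // never overlap two runs
})
public class DailyApiImportJob implements Runnable {

    @Reference
    private ResourceResolverFactory resolverFactory;

    @Override
    public void run() {
        Map<String, Object> authInfo = new HashMap<>();
        // "api-import" is an illustrative service user mapping
        authInfo.put(ResourceResolverFactory.SUBSERVICE, "api-import");
        try (ResourceResolver resolver = resolverFactory.getServiceResourceResolver(authInfo)) {
            String json = fetchFromExternalApi();
            // create /var/api-data if it does not exist, then store the payload
            Resource node = ResourceUtil.getOrCreateResource(
                    resolver, "/var/api-data", "sling:Folder", "sling:Folder", false);
            ModifiableValueMap props = node.adaptTo(ModifiableValueMap.class);
            props.put("payload", json);
            props.put("lastUpdated", java.util.Calendar.getInstance());
            resolver.commit();
        } catch (Exception e) {
            // log the failure; the next scheduled run will retry
        }
    }

    private String fetchFromExternalApi() {
        // placeholder for the actual HTTP call (e.g. java.net.http.HttpClient)
        return "{}";
    }
}
```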

However, I’m curious if there might be a more efficient or best-practice approach within AEM for this type of implementation. Are there other strategies for handling scheduled API data updates in AEM that could offer improved performance, persistence, or scalability?


5 Replies

Correct answer by
Community Advisor

@narendiran_ravi In my opinion it's the right approach, as long as you know the data you are serving from these API calls only needs to be updated every 24 hours. That's the SLA you need to define. Another thing to think about after every daily run: are you clearing the cache of the pages that serve this data? Otherwise those pages might show stale API data. Are you setting a TTL for the pages? Think about your caching strategy (one invalidation pattern is sketched below).
If you want the latest data from the API at all times, consider client-side API calls instead of server-side ones, but then an API call will happen every time the page loads.

You need to find the balance between data consistency on one side and performance and correctness of the page on the other, based on the importance of the data, which only you and your business know. My 2 cents.
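To illustrate the cache-invalidation point: one common pattern is to re-activate the pages that render this data after each import, so the Dispatcher flush agents invalidate their cached copies. A rough sketch (the page path is hypothetical, and the session would come from the import job's service resolver):

```java
import javax.jcr.Session;

import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;

import com.day.cq.replication.ReplicationActionType;
import com.day.cq.replication.Replicator;

// Illustrative helper, called by the daily import after it commits new data,
// so that pages rendering the API data are flushed from the Dispatcher cache.
@Component(service = ApiPageFlusher.class)
public class ApiPageFlusher {

    // hypothetical list of pages that consume the imported data
    private static final String[] AFFECTED_PAGES = {
            "/content/myproject/en/dashboard"
    };

    @Reference
    private Replicator replicator;

    public void flushAffectedPages(Session session) {
        for (String path : AFFECTED_PAGES) {
            try {
                // re-activation triggers the Dispatcher flush agents
                replicator.replicate(session, ReplicationActionType.ACTIVATE, path);
            } catch (Exception e) {
                // log and continue; a stale page is better than a failed import
            }
        }
    }
}
```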


Thanks for your feedback. Initially, the API calls were made client-side, but we were asked to move them to the backend to reduce the load on the API server. Now, a scheduled Sling job updates the data once a day, which has proven to be efficient. Since the API data doesn't change frequently, cache staleness isn't a concern, and we've set a TTL for the pages, ensuring that any cached content remains fresh and consistent.

Community Advisor

Perfect! In my experience, you have all bases covered. You can always revisit as and when the use case changes.

Level 5

My two cents on this:

  1. I would usually avoid using AEM as a database to store imported data, especially when the data is very large or deeply nested. It can hurt your code's performance when you look it up, can easily become a nightmare to manage, and can even block AEM. I would use MongoDB or Redis to store the data instead. (Persistence, Portability, Reusability, Scalability)
  2. I would investigate whether the external API can provide delta data. This would improve your import execution time. (Performance)
  3. I would schedule the import during low-traffic time windows, maybe outside business hours, possibly during the night. (Performance)
  4. I would keep the imported data and the places where it is used (sites, components, CFs, etc.) decoupled. But I guess this is common sense and you have already considered it. (Maintainability)
  5. I would analyze, based on requirements, whether the import job should run on both the author and publish tiers, or only on author with the data then replicated to publish (in case it needs some business validation). (Maintainability)
  6. I would think about exposing the data to FE components through REST APIs, to be headless-ready in case the plan is to move in that direction in the future; see the sketch after this list. (Extensibility)

But this is only my reasoning when I have to design a solution for something like this. Of course I can think of more aspects, because it always depends on the context (type of data, amount, consumers, etc.).
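For point 6, a minimal sketch of exposing the stored payload over a JSON endpoint (the endpoint path and the "payload" property name are assumptions carried over from the setup above, not a definitive implementation):

```java
import java.io.IOException;

import javax.servlet.Servlet;

import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.SlingHttpServletResponse;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.servlets.SlingSafeMethodsServlet;
import org.osgi.service.component.annotations.Component;

// Illustrative read-only endpoint serving the payload stored by the daily job.
@Component(service = Servlet.class, property = {
        "sling.servlet.paths=/bin/api-data",  // hypothetical endpoint path
        "sling.servlet.methods=GET"
})
public class ApiDataServlet extends SlingSafeMethodsServlet {

    @Override
    protected void doGet(SlingHttpServletRequest request, SlingHttpServletResponse response)
            throws IOException {
        Resource data = request.getResourceResolver().getResource("/var/api-data");
        response.setContentType("application/json");
        // TTL aligned with the daily refresh, so Dispatcher/CDN can cache it
        response.setHeader("Cache-Control", "max-age=86400");
        String payload = data != null ? data.getValueMap().get("payload", "{}") : "{}";
        response.getWriter().write(payload);
    }
}
```

This also keeps the FE components decoupled from the repository structure: they only depend on the JSON contract.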

Level 8

Hi @narendiran_ravi,

the solution you built is solid and has been used in AEM for years. Just one more thought: if the data set is not too large, I would store it in memory rather than in the JCR. A more efficient approach that comes to mind is to implement the caching logic in the CDN rather than in AEM, in case you have your own CDN and access to Edge Workers/Compute.
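For the in-memory variant, a minimal sketch, assuming the scheduled job calls refresh() after each API fetch instead of writing to /var/api-data (class and method names are illustrative):

```java
import java.util.concurrent.atomic.AtomicReference;

import org.osgi.service.component.annotations.Component;

// Illustrative in-memory holder: the scheduled job pushes the latest payload
// here, and Sling Models read from it, avoiding JCR reads and writes entirely.
@Component(service = ApiDataCache.class)
public class ApiDataCache {

    // AtomicReference gives safe publication of the payload across request threads
    private final AtomicReference<String> payload = new AtomicReference<>("{}");

    public void refresh(String latestJson) {
        payload.set(latestJson);
    }

    public String getPayload() {
        return payload.get();
    }
}
```

Keep in mind that in-memory data is per-instance and lost on restart, so in a multi-instance publish tier each instance has to run its own refresh (or fetch on first read).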

Daniel