SOLVED

AEMaaCS - dynamic page generation and robots/sitemap

Level 10

Hi all,

I was hoping to get suggestions from the group on the following.

I need pages generated dynamically from an external source. Authoring is not a requirement, and the source can be updated as often as every minute as part of a content refresh.

Given this, I see one option and have a few questions.

On loading a page with a query param:

 

1. Have a single template laid out and replicated.

2. Use the query param to fetch a response, then use sly includes to include resources, passing models as request objects to the individual page components so they render the content.

3. I don't believe selectors are used these days? For example a.12344.html, where the selector 12344 is used to generate the content?
4. What happens to the OOTB sitemap generator? It would not show the query-param-based URLs in the sitemap, since these are dynamic pages. Or am I mistaken?

Kind regards,

1 Accepted Solution

Correct answer by
Community Advisor

Hi @NitroHazeDev, you should be able to achieve this with the steps below:

1. Automated Data Retrieval:

To ensure your content stays up to date, consider setting up a scheduled service. This service will run at regular intervals (e.g., hourly, daily), automatically fetch the latest data from your external source, and store it in the AEM repository.

2. Event-Driven Updates:

Use a workflow launcher or event listener to trigger a workflow or servlet when the source property is added or modified in AEM.

3. Dynamic Page Management:

The servlet or workflow process will take care of the following:
- Creating new AEM pages based on the retrieved data.
- Updating existing pages to reflect any changes in the data.

21 Replies

Community Advisor

Hi,

 

Sorry, I don’t fully understand your question, but a query string is definitely not the best approach. I suggest creating a servlet that receives a JSON payload containing all the details for page creation. The page creation process should be straightforward. As for the sitemap, it should be updated using a scheduled job to ensure that your new page is added the next time it runs, for example, at midnight.

 

Hope this helps.



Esteban Bustamante

Level 10

No worries. Why would a query param not be the right approach?
The requirement is to not store anything in AEM; no authoring is needed. Also, the content needs a refresh every second, so it is not worth running a scheduler on the publisher (or author) and persisting nodes or creating pages. Think of it as a single page that displays dynamic data. Imagine components that retrieve dynamic data on a page. Given this, the question about the sitemap arises.

Level 10

The requirement is to not store anything in AEM; no authoring is the ask. Also, the content needs a refresh every second, so it is not worth running a scheduler on the publisher (or author) and persisting nodes. Besides, I am not sure what caveats we would run into on cloud when running a scheduler on publish.

Employee Advisor

I understand that you have an external source, and AEM needs to deliver the content based on that source. And as that source can receive changes at any time, these changes need to be reflected on the content generated by AEM as well.

 

A few questions:

  1. Is there any personalization on top of that content source? In other words: Is there only 1 valid version of the content AEM needs to generate at any time?
  2. What is the requirement regarding the latency from "change on source" to "AEM displays this change"? Real-time (that means less than 5 seconds)? 5 minutes? 1 hour?
  3. What's the expected traffic hitting for this content? Total number of requests per day plus number of requests per minute in peak?

With this additional information it's possible to come up with a few ideas and build an application design for it.

 

 

 

Level 10

Thanks @Jörg_Hoh, as always. Please find my responses below. I have used selectors in the past, combined with a resource type, to make sure the component design is applied to dynamic data, but please let me know what you think. The components already exist in AEM. My plan was to load a model on page load that has child model resources (text, title, etc.), include these on the main page via includes and resource types, and feed in the data from the backend via setters. I know AEMaaCS can do this and want to suggest using it.

 

 

Is there any personalization on top of that content source? In other words: Is there only 1 valid version of the content AEM needs to generate at any time?
- [NZ] No personalization is needed. The page might be reached from a search results link carrying an id as a query param; the page queries with that id and displays the result, and all of the data on it is dynamic. The layout is fixed, with placeholders for content that might or might not exist in the JSON that gets fed into it. A search can return 60, 80, or 200 results, and clicking some or all of these leads to a page that displays content dynamically. There will be analytics tracking, and for SEO (robots or sitemap) I guess I will have to write a separate sitemap XML containing these query-param URLs for the crawler to crawl and track.

 

What is the requirement regarding the latency from "change on source" to "AEM displays this change"? Real-time (that means less than 5 seconds)? 5 minutes? 1 hour?
[NZ] - Real time is the ask: immediate on load for the user. So if a user clicks through from the search page and the JSON has been updated, the change should be reflected even if the update happened as the click occurred.

 

What's the expected traffic hitting for this content? Total number of requests per day plus number of requests per minute in peak?

[NZ] - I don't have the answer yet since the team hasn't gotten back to me, but it could be huge; I can follow up on this. Since it is a public site, traffic is assumed to be heavy in terms of requests. There appears to be no limit on API requests. It's a new page that didn't exist before, and migrating it to AEM is the ask.

Employee Advisor

Thanks.

 

Having everything public definitely makes things easier.

 

Using a selector instead of a query string is definitely good to have, it gives you at least the chance to cache the resulting page.

 

The requirement that changes must be updated and then delivered in real time to a potentially very large audience is definitely challenging. If you implement it in a naive way, AEM is just the proxy, and every end-user request creates a direct request to your backend system. That's definitely not a good idea, and to reduce the number of backend calls you need to implement a caching layer.
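As a rough illustration of what such a caching layer does, here is a minimal sketch of an in-memory cache with a short time-to-live. It is written in TypeScript purely for illustration (inside AEM this would more likely live in Java), and the class name `TtlCache`, the `load` callback, and the injectable clock are all hypothetical and not part of any AEM API.

```typescript
// Hypothetical helper: caches one in-flight or completed lookup per key
// for a limited time, so repeated requests do not all reach the backend.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: Promise<V>; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so the expiry logic can be exercised without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached promise if it is still fresh, otherwise calls `load`.
  get(key: string, load: (key: string) => Promise<V>): Promise<V> {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) {
      return hit.value;
    }
    const value = load(key);
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    // Do not keep failed lookups around: evict the entry if the load rejects.
    value.catch(() => {
      if (this.entries.get(key)?.value === value) {
        this.entries.delete(key);
      }
    });
    return value;
  }
}
```

With a TTL of one second, for example, a burst of requests for the same id within that second would cause a single backend call, at the cost of data that can be up to one second old.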

 

On the other hand, can't you implement this using a frontend application, deliver the static frame via AEM, and then have JavaScript fetch this data directly from the backend and embed it live? Using AEM as a proxy for a potentially very large number of requests is something I would avoid as much as I can.

 

 

 

Level 10

There's not going to be any caching here, unfortunately, so we can't implement a cache layer. Given that, a query param was the route I had in mind.

If the thinking is that selectors help with caching, in that caching can be enabled or disabled for them via the dispatcher, I see that the same can be done for query params as well. So would it still be OK to use a query param? Or are there other benefits to selectors that I am missing (sorry)?

Update - if the latest content is to be placed into placeholders, do we still need caching? I think the layout alone can be cached, but I guess there is no way in AEM to cache a layout with empty component containers and still have the data displayed dynamically?

 

 

 

can't you implement this using a frontend application, deliver the static frame via AEM and then have Javascript which is fetching this data directly from the backend and embedding it live? Using AEM as a proxy for a potentially very large amount of requests is something I would avoid as good as I can.

 

NZ - Front-end app: would this be something like an SPA, React-based?

Wouldn't it be the same as getting the content from the API via JS within AEM and putting it into components within AEM?

 

Sorry if I am missing something here.

Level 10

@Jörg_Hoh I am posting this in case I unknowingly gave a different impression. I didn't include selectors in my original thinking because of this:

Selectors (from a blog I was reading; trying to avoid):

http://test.com/content/abc/test.12345.html

Query param:

http://test.com/content/abc/test.html?ssid=12345

Use this ssid to query the JSON from the 3rd party and display the data.

All of the data is dynamic; only the placeholders on the template are static.

The template can have hero, carousel, text, etc. as blank placeholders.

Employee Advisor

Yes, using React (or any other JavaScript/frontend framework) could be the solution. Then the integration between "static" (AEM) and "dynamic" (backend) content would happen in the browser, not in AEM anymore. Then each system can excel at what it is good at: AEM can deliver the authorable frame of the page, and the backend delivers the actual data.

(Also, the page does not need to be a full-blown SPA. If you just need to make a JSON request to the backend, parse the data, and fill it into defined parts of the DOM, a few lines of pure JavaScript might already be enough.)

This is just one option, but it could solve a number of problems. Of course it requires you to make your backend available to endusers and deliver at the expected request numbers.
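To make the "few lines of JavaScript" idea concrete, here is a minimal sketch in TypeScript. It assumes the backend returns a flat JSON object and that each placeholder on the page has a name matching a JSON key; the `Slot` interface, `fillSlots`, `loadAndFill`, and the `fetchJson` callback are hypothetical names introduced for illustration only.

```typescript
// Minimal shape of a DOM element, so the logic can also run outside a browser.
interface Slot {
  textContent: string | null;
}

// Copies each value from the backend JSON into the slot with the same name.
// Slots without matching data are left untouched.
// Returns the names of the slots that were filled.
function fillSlots(slots: Record<string, Slot>, data: Record<string, unknown>): string[] {
  const filled: string[] = [];
  for (const [name, slot] of Object.entries(slots)) {
    const value = data[name];
    if (value === undefined || value === null) {
      continue; // the JSON has no content for this placeholder
    }
    slot.textContent = String(value);
    filled.push(name);
  }
  return filled;
}

// Fetches the JSON through the supplied function and fills the slots with it.
async function loadAndFill(
  url: string,
  slots: Record<string, Slot>,
  fetchJson: (url: string) => Promise<Record<string, unknown>>,
): Promise<string[]> {
  return fillSlots(slots, await fetchJson(url));
}
```

In a browser, the slots could be collected from elements carrying, say, a `data-slot` attribute, and `fetchJson` could wrap `fetch(url).then((r) => r.json())`. Structured content such as a carousel would need more than `textContent`, so this only covers simple key-value data.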

 

Level 10

But the question would be how I would send the id of the data to be retrieved.

By backend, do you mean the source serving the JSON?


The approach would be something like a search results page that shows dynamic content fetched from sources via an API.

If I skip React and use the approach below to keep a single AEM app:

- How do I retrieve the id to get the data for display, if a query param is not the way? This is still on my mind.

- Have JS as part of a clientlib.

- Have the JS load the content via Ajax.

- Place the content in placeholders using either:

      a. A servlet that fetches the JSON, puts it into models, and uses resource types to include resources, passing the data for display

 

OR

      b. JS that places the retrieved content into elements by id (similar to React, I would imagine?)

 

@Jörg_Hoh With the above, the AEM JS (or the server) fetches the content the same way a front-end app would?
Both would get up-to-date content, like search does.

Employee Advisor

I am not a frontend architect, so I cannot give you good recommendations on how to integrate that dynamic data into the resulting DOM. It probably depends quite a bit on the nature of the data you integrate, and on whether it is simple key-value pairs or needs to include structure of its own.

Of course if you do the frontend integration, the URL should look like this: /path/to/page.html#id=value

In this case the CDN/dispatcher will always cache the "/path/to/page.html" response, and the anchor ("id=value") is only handled client-side. That means that

/path/to/page.html#id=1
/path/to/page.html#id=2

will always only fetch /path/to/page.html from the CDN/dispatcher and not cause any load to AEM.
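The reason is that browsers do not send the fragment (everything from "#" onward) as part of the HTTP request. A minimal TypeScript sketch of that split, using the standard `URL` and `URLSearchParams` classes; the function name `splitUrl` and the `id` key are illustrative assumptions:

```typescript
// Splits a URL into the part that is requested from the CDN/dispatcher
// and the key=value pairs that stay in the browser as the fragment.
function splitUrl(href: string): { requestTarget: string; fragmentParams: URLSearchParams } {
  const url = new URL(href);
  return {
    // Path plus query string: this is all the server side ever sees.
    requestTarget: url.pathname + url.search,
    // url.hash includes the leading "#"; parse the rest as key=value pairs.
    fragmentParams: new URLSearchParams(url.hash.slice(1)),
  };
}

const first = splitUrl("https://www.site.com/path/to/page.html#id=1");
const second = splitUrl("https://www.site.com/path/to/page.html#id=2");

// Both URLs request the same cacheable resource...
console.log(first.requestTarget === second.requestTarget);
// ...while the id is only available to client-side code.
console.log(first.fragmentParams.get("id"), second.fragmentParams.get("id"));
```

In the page itself, client-side code could read the id with `new URLSearchParams(window.location.hash.slice(1)).get("id")` and then request the data for that id from the backend.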

Level 10

@Jörg_Hoh Apparently I can't message you directly because of the limit set on private messages.

 

Totally understood. I am not a front-end person either, which is why I am trying to do this within AEM.

I planned to use the backend only so that I can reuse the existing components, which is the ask, and pass data massaged in the backend into the respective component placeholders via resource includes, which wouldn't necessarily be heavy. The ask is that these pages, reached from search with a unique id needed to render them from AEM, do not need to be cached.

I would imagine that if it is a single static template with no content, and the content is brought in by a script or a small backend piece, we do not need caching at all. I understand the concern about cache hits, and that it can be sidestepped with #id=, but wouldn't that still be something like components independently making Ajax calls where the content is dynamic?

This page is rendered (dedicated to the request id) as the result of a click on some page (there can be around 20-30 page variations in terms of content, not layout). Considering this rendering, would a query param be ideal, like search, which is dynamic in nature and not cached?
The link contains an id like ?id=1234 or #id=1234, Ajax does the work of getting the data from the source, and either option would make it dynamic. Wouldn't this function like the front-end approach, except that JS would get the data and place it into the placeholders?
What if I used, say, #id=1234 to satisfy caching and still did it all in AEM?

Level 10

@Jörg_Hoh I guess I am confused because I feel this would function like dynamic components with backend integration on the page, of course minus the query param in the URL. I can use # to satisfy caching, or use a selector for the id based on your idea, but would a query param really be that bad for the cache hit ratio in this use case, given the logic above?

Update - we could enable caching for a URL with a defined query param, say id=, so the page is cached while the data is rendered dynamically from code via Ajax. Another idea, perhaps? @Jörg_Hoh

Employee Advisor

With your idea of merging the data on the backend: the problem is not necessarily that this is a hard task or takes a lot of time. It's rather that each of these requests cannot be cached, so you have additional latency and therefore an impact on the end user.
Also, this approach does not scale, because when you have 2 times the traffic you need 2 times the AEM backend capacity. That is not necessarily your problem, but it might have side effects on other pages: if there's a problem with the backend connectivity, AEM can get stuck on blocked requests pretty fast, especially at a high rate of incoming requests.

 

For that reason my idea is to look into the possibility of a frontend integration as well, next to the backend integration.

Now you need to go into proper solution design, taking into account the details and specifics of your requirements, and weighing the pros and cons. Not sure if this forum is the right place for it, as you would need to share a lot of context to make the right decision. 

Level 10

@Jörg_Hoh Thanks for your patience, and agreed on the context bit. Answers to the questions below, from a backend and dispatcher standpoint, would help me get clarity on this.

 

1. Is it possible to cache the page for a URL in the scenarios below and still have the data be dynamic, like the front-end approach but as part of a clientlib?

e.g. http://../test.<id>.html or http://../test.html#<id>

http://../test.html?id=<id> (ignoring the id query param for caching)

2. If the answer above is yes, what part is cached: just the static HTML, or the content for id=x? The id value can be anything.

 

3. Which option works best, in your view, in a typical scenario where data comes in partially from the backend, say for 2 components on the page where we have typically used Ajax?

 

Assume connectivity would not necessarily be a concern and will be handled to avoid bottlenecks.

Employee Advisor

https://www.site.com/page.id1.html is cacheable, but of course you have a cache copy per ID.

https://www.site.com/page.html?id=1 is not cacheable on CDNs because of the query string
https://www.site.com/page.html?#id=1 is cacheable; you have also just 1 copy of the page.html in the CDN cache (as the dynamic content is filled in on the client).

 

I would try to avoid using both backend and frontend integration, as you then have the complexity and problems of both.

 

Level 10

Thanks @Jörg_Hoh, I reread your post and have updated mine below. This is helpful. One last thought on the caching input you've provided:

Yes, clientlibs (not a front-end app, since we might have to write new React-based components) would be used to retrieve the content for both options below, i.e. JS in AEM client libs. By the way, is SDI perhaps another option for each component?

 

Options:

https://www.site.com/page.id.html

https://www.site.com/page.html?#id=1

 

Sitemap / SEO

- How would sitemap/SEO respond to either? A custom sitemap with the URLs hardcoded, referenced in robots, with the rest generated by the Adobe sitemap?

For the selector, a cached copy per id is stored, so any id retrieves the data for it.
Questions for the selector:

1. If the data at the source changes, would the cached copy get updated with the fresh response from the JSON?
2. Is it good practice to pass the id in a selector as a parameter for displaying data?

 

For the one with #, suppose id=123 is triggered first and the data for id=123 gets cached at the CDN or dispatcher or both.

 

Questions for #:

1. If a new id=456 is now keyed in, would the fresh data retrieved by, say, Ajax for id=456 be displayed to the user, or the cached version for id=123?

 

2. If the data for id=123 that was originally cached has since changed at the source and someone accesses id=123 again, what is displayed: the cached data, or the latest data retrieved by Ajax?

 

I believe the latest content retrieved by Ajax is displayed, since it queries every time the page or component loads and is dynamic, and the cache now contains the latest version if that id is persisted?

 

Update - added a point to the original SEO question asked.

 

Employee Advisor

When you add the topic sitemap and search engines, I think that you might end up with implementing both approaches:

* Use the selector-based approach to render a version for the search engines. It will also include a JS snippet, which switches to the "anchor-based approach" ("#") if a browser visits that URL (and it will silently ignore the values in the page).

* Use the "anchor-based approach" for browsers, which then can directly reach out to the backend.
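One possible shape for the JS snippet on the selector-based page is sketched below in TypeScript. It assumes the id is the first selector in the last path segment (as in page.id1.html) and that the anchor-based URL uses an `id` key; the function names `idFromSelectorPath` and `toAnchorUrl` are hypothetical.

```typescript
// Extracts the first selector from a path such as "/page.id1.html".
// Returns null when the last path segment has no selector.
function idFromSelectorPath(pathname: string): string | null {
  const lastSegment = pathname.slice(pathname.lastIndexOf("/") + 1);
  const parts = lastSegment.split(".");
  // name.selector.extension needs at least three dot-separated parts.
  return parts.length >= 3 && parts[1] !== "" ? parts[1] : null;
}

// Builds the equivalent anchor-based URL,
// e.g. "/page.id1.html" becomes "/page.html#id=id1".
function toAnchorUrl(pathname: string): string | null {
  const id = idFromSelectorPath(pathname);
  if (id === null) {
    return null;
  }
  const dir = pathname.slice(0, pathname.lastIndexOf("/") + 1);
  const parts = pathname.slice(dir.length).split(".");
  const name = parts[0];
  const extension = parts[parts.length - 1];
  return `${dir}${name}.${extension}#id=${encodeURIComponent(id)}`;
}
```

In a browser, the script could call `idFromSelectorPath(window.location.pathname)` and, if an id is found, fetch the live data for it and overwrite the values embedded in the page. Whether it also navigates to the URL from `toAnchorUrl` is a design choice the thread leaves open.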

 

Then you can reference the selector-based URLs in the sitemap. Depending on the capabilities of the search engine crawler it either behaves like a browser and fetches live data as well from the backend, or it will fetch the embedded values (as determined by the selector).
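As a rough sketch of what listing the selector-based URLs could look like, the TypeScript below builds a sitemap in the standard sitemaps.org XML format from a list of ids. Where the ids come from, and how the file is served alongside the OOTB sitemap, is not covered here; `buildSitemap`, `escapeXml`, and the parameters are hypothetical names.

```typescript
// Escapes the characters that are special in XML text.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Builds a sitemap with one selector-based URL per id,
// e.g. https://www.site.com/page.id1.html for the id "id1".
function buildSitemap(pageUrlWithoutExtension: string, ids: string[]): string {
  const urls = ids.map((id) => {
    const loc = `${pageUrlWithoutExtension}.${encodeURIComponent(id)}.html`;
    return `  <url><loc>${escapeXml(loc)}</loc></url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    "</urlset>",
  ].join("\n");
}

console.log(buildSitemap("https://www.site.com/page", ["id1", "id2"]));
```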

 

For the rest of your question: I don't get what you mean by the different ID values, switching between them, and what to display to the user.

 

When you use the anchor-based approach ("#"), AEM will never know what kind of value you will display, and for that reason it only renders the frame of the page. It's then the task of the JavaScript in the browser to obtain these values from the backend and display them within the page. No page with values is cached anywhere.

 

 

 

Level 10

Thank you @Jörg_Hoh 

That's the approach I think works best. Having a selector makes things a little easier on the backend.

 

My question was more about the statement below, but I guess I get the gist.

 

With https://www.site.com/page.id1.html#id=id1

the above caches the id1 version because of the selector, but JS silently switches to the values for #id1; however, id1 is not going to do anything like produce a different rendition.

 

You, sir, are the best! I just didn't think of using both.