Hi,
I have a requirement where I need to call an API to trigger a crawler when a page is published. The purpose is to trigger the crawler as soon as the page is published so that the content changes can be crawled.
I can write a custom workflow to trigger the crawler, but I am not sure how to start that custom workflow when an editor hits Publish.
I have read about two ways of doing it, but it is not clear which option is better.
The first is to write a custom event handler that is triggered on page activation and executes the custom workflow.
The second is to create a workflow model containing the custom workflow step, and then trigger it by adding a launcher on the Modified event with the condition "jcr:content/cq:lastReplicationAction == Activate".
I am not sure whether both are doable, and which is the right way to get the desired result without affecting performance.
Any help on this would be greatly appreciated.
Thanks in advance!
Hello @hptarora,
I prefer using a workflow solution (where a launcher is triggered by events to execute logic via a custom model under a specific path on the publish instance) instead of an event listener, handler, or preprocessor for the following reasons:
The workflow solution is inherently scalable and works seamlessly on both AEM as a Cloud Service (AEMaaCS) and Adobe Managed Services (AMS), regardless of the number of instances from which it needs to trigger. I also assume that the crawler can manage or bypass any already running scheduled jobs.
Based on my experience, the AEM workflow engine is significantly more stable than event handlers or listeners.
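To make the "custom workflow" part concrete: the process step mostly has to turn the workflow payload (the page path) into a call to the crawler API. Below is a minimal sketch in plain Java (Java 11+ `java.net.http`) so it runs on its own; in a real bundle this logic would sit inside the `execute(WorkItem, WorkflowSession, MetaDataMap)` method of a `com.adobe.granite.workflow.exec.WorkflowProcess`. The crawler endpoint, its JSON body with a `url` field, the public host, and the path-to-URL mapping are all assumptions for illustration; your crawler's API will differ.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical helper: builds and sends a crawl request for one published page. */
final class CrawlTrigger {
    // Assumed values for illustration only; replace them with your crawler's real API.
    static final URI CRAWLER_ENDPOINT = URI.create("https://crawler.example.com/api/crawl");
    static final String PUBLIC_HOST = "https://www.example.com";

    private final HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(5))
            .build();

    /** Maps a repository path such as /content/site/en/page to its public URL. */
    static String publicUrl(String pagePath) {
        // Simplest possible mapping; real sites often shorten paths via Sling mappings
        // or Dispatcher rewrites.
        return PUBLIC_HOST + pagePath + ".html";
    }

    /** Builds the POST request without sending it, so it can be inspected or tested. */
    static HttpRequest buildRequest(String pagePath) {
        // No JSON escaping: assumes the path contains no quotes or backslashes.
        String body = "{\"url\":\"" + publicUrl(pagePath) + "\"}";
        return HttpRequest.newBuilder(CRAWLER_ENDPOINT)
                .timeout(Duration.ofSeconds(10))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    /** Sends the request and returns the HTTP status so the caller can fail the step on non-2xx. */
    int send(String pagePath) throws IOException, InterruptedException {
        HttpResponse<Void> response =
                client.send(buildRequest(pagePath), HttpResponse.BodyHandlers.discarding());
        return response.statusCode();
    }
}
```

Returning the status instead of swallowing errors lets the process step throw a `WorkflowException` on a non-2xx response, so the failure is visible in the workflow console and can be retried.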
Let me know if this approach aligns with your understanding.
Hi @hptarora,
Both of the options you mentioned are viable, but there are key differences in terms of flexibility, complexity, and performance.
For most use cases, Option 2 (Workflow Launcher) is the preferred option because it leverages AEM's optimized workflow engine for asynchronous processing, which is better suited for handling triggers like page activation with minimal performance impact.
Option 1 (Custom Event Handler) can be used when you need more control over the event trigger, but it may require more careful performance management, especially with high traffic or frequent page activations.
Thanks,
Madhur
Hi @hptarora
If you are using AEMaaCS, you could leverage Cloud Events: https://developer.adobe.com/experience-cloud/experience-manager-apis/guides/events/. I believe one of its use cases is exactly the one you need: among others, there is a Publish event webhook specifically for Sites that you can use: https://developer.adobe.com/experience-cloud/experience-manager-apis/api/stable/sites/
If you are not on AEMaaCS (for example, on AEM on-prem), you can implement something that observes page changes, such as publishing activity in your case, using either a listener or a handler: https://medium.com/@toimrank/aem-handler-and-listener-12b6c8b5a3d3
Hi @hptarora,
To trigger an event when a page is published in AEM, you have three main approaches:
Thanks
Ritesh Mittal
@hptarora Did you find the suggestions helpful? Please let us know if you require more information. Otherwise, please mark the answer as correct for posterity. If you've discovered a solution yourself, we would appreciate it if you could share it with the community. Thank you!
Regarding "jcr:content/cq:lastReplicationAction == Activate":
My guess is that the workflow would trigger on any Modified event while lastReplicationAction is Activate. So it basically fires on the first publish plus any modification made after publishing.
Thus, it fires more often than you need.
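If you keep the launcher condition as-is, one way to compensate is to let the workflow step itself decide whether the event really corresponds to a new publish. The sketch below is plain Java and assumes you have already read `cq:lastReplicationAction`, `cq:lastReplicated` and `cq:lastModified` from the page's `jcr:content` (in AEM, typically through the resource's `ValueMap`). The rule it applies is my own heuristic, not built-in AEM behaviour: crawl only if the page is activated, has not been edited since that activation, and this particular activation has not been crawled yet.

```java
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical guard for the workflow step: crawl once per activation, skip plain edits. */
final class ActivationGuard {
    // Last cq:lastReplicated value already crawled, per page path.
    // In-memory only: a real step would need to persist this to survive restarts
    // and to be shared across multiple author instances.
    private final Map<String, Instant> crawledActivations = new ConcurrentHashMap<>();

    /**
     * @param lastReplicationAction value of jcr:content/cq:lastReplicationAction
     * @param lastReplicated        value of jcr:content/cq:lastReplicated (null if never published)
     * @param lastModified          value of jcr:content/cq:lastModified (null if absent)
     */
    boolean shouldCrawl(String path, String lastReplicationAction,
                        Instant lastReplicated, Instant lastModified) {
        if (!"Activate".equals(lastReplicationAction) || lastReplicated == null) {
            return false; // never published, or deactivated
        }
        if (lastModified != null && lastModified.isAfter(lastReplicated)) {
            return false; // edited after the last publish: publish still serves the older version
        }
        // put() is atomic and returns the previous value: true only the first time
        // this specific activation timestamp is seen for the path.
        Instant previous = crawledActivations.put(path, lastReplicated);
        return !lastReplicated.equals(previous);
    }
}
```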
Event handler:
A sample is available here: https://experienceleaguecommunities.adobe.com/t5/adobe-experience-manager/aem-as-cs-handle-event-on-...
It should work for your requirements, but event handlers can slow down the system when many events are processed in bulk.
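A common way to keep a handler from slowing things down in bulk is to do as little as possible on the event thread: filter for activations, queue the paths, and make the crawler call elsewhere. The sketch below models only that filtering logic in plain Java. In a real bundle it would live in an OSGi `EventHandler` subscribed to the replication topic (for example via `ReplicationAction.EVENT_TOPIC` and `ReplicationAction.fromEvent(event)`; the linked sample may do it differently), and the queue would normally be a Sling job (`JobManager.addJob`) rather than an in-memory queue. The `/content/` and `/content/dam/` path filter is an assumption about which paths you want crawled.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical stand-in for the handler body: filter fast, defer the slow API call. */
final class PublishEventFilter {
    // Drained by a separate worker (not shown) that calls the crawler API.
    final BlockingQueue<String> pendingCrawls = new LinkedBlockingQueue<>();

    /** Called once per replication event; must return quickly even during bulk activation. */
    void onReplicationEvent(String actionType, List<String> paths) {
        if (!"ACTIVATE".equals(actionType)) {
            return; // ignore DEACTIVATE, DELETE, TEST, ...
        }
        for (String path : paths) {
            // Assumed scope: site pages only, not DAM assets.
            if (path.startsWith("/content/") && !path.startsWith("/content/dam/")) {
                pendingCrawls.offer(path); // no network I/O on the event thread
            }
        }
    }
}
```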
Replication Preprocessor:
If you can call the API before replication, then you can also use a replication Preprocessor. A sample is available at https://medium.com/tech-learnings/aem-as-a-cloud-managing-and-tracking-asset-metadata-changes-over-t...
A preprocessor also has the benefit that if the API call fails, the replication does not go through. Details here: https://dileepakv.blogspot.com/2018/01/aem-replication-preprocessor.html
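One caveat: a preprocessor runs before the content reaches publish, so if it triggered the crawl itself, the crawler could still see the old version. It fits better as a gate (for example, checking that the crawler API is reachable), with the actual crawl triggered after replication. It also runs for every replication, so filter on action type and path, and keep in mind that a crawler outage would then block publishing. A plain-Java sketch of such a gate follows. In AEM the check would go inside `preprocess(ReplicationAction, ReplicationOptions)` of a `com.day.cq.replication.Preprocessor` and throw a `ReplicationException`; `PreflightFailedException` here is a hypothetical stand-in for it, and the status probe (e.g. a crawler health endpoint) is an assumption.

```java
import java.util.function.IntSupplier;

/** Hypothetical stand-in for com.day.cq.replication.ReplicationException. */
final class PreflightFailedException extends Exception {
    PreflightFailedException(String message) { super(message); }
}

/** Hypothetical gate run before replication. */
final class CrawlerPreflight {
    private final IntSupplier statusProbe; // e.g. HTTP status of a crawler health endpoint

    CrawlerPreflight(IntSupplier statusProbe) { this.statusProbe = statusProbe; }

    /** Throws when the crawler is unavailable; in a preprocessor this aborts the replication. */
    void check(String path) throws PreflightFailedException {
        int status = statusProbe.getAsInt();
        if (status < 200 || status >= 300) {
            throw new PreflightFailedException(
                    "Crawler unavailable (HTTP " + status + "), blocking replication of " + path);
        }
    }
}
```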
Thank you, everyone, for your replies.
I think I will go with option 2: use an AEM workflow model with a launcher to trigger the custom workflow.
I have another question. The custom workflow triggers the crawler to crawl the page, or the modifications made to it, after the page is published. Sometimes a page is published, but the changes take a while to show up on the publisher because of CDN caching. Is it possible that the crawler is triggered right after the page is published but cannot pick up the modifications because, due to caching, they are not yet visible?
Does anyone know whether this can happen, and whether there is a better way or time to trigger the custom workflow so that the crawler is sure to crawl the modifications?
Thanks in advance.
There are a few options available for your requirement.
Option A:
After the page is published, you can add a workflow step that invokes a servlet to clear the CDN cache. If you are using Akamai, you can integrate the Akamai Fast Purge API. Once the cache is cleared, the page crawl step can run in the same workflow. The constraint is that clearing the cache can take some time, because the purge has to propagate across many Akamai edge servers.
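Whether or not you purge, a workflow step could also check that the public copy is actually fresh before calling the crawler. The sketch below polls with a doubling delay and gives up after a bounded number of attempts. It assumes the public response exposes a timestamp you can compare against the publish time (`cq:lastReplicated`), such as a `Last-Modified` header; whether yours does, and what that timestamp means, depends on your Dispatcher and CDN configuration. The fetch and the sleep are injected so the logic can run without a network.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical "wait until the CDN serves the new version" step, run before the crawl. */
final class FreshnessWaiter {
    interface Sleeper { void sleep(Duration d) throws InterruptedException; }

    // Returns the freshness timestamp seen at the public URL (e.g. parsed Last-Modified), if any.
    private final Function<String, Optional<Instant>> publicLastModified;
    private final Sleeper sleeper;

    FreshnessWaiter(Function<String, Optional<Instant>> publicLastModified, Sleeper sleeper) {
        this.publicLastModified = publicLastModified;
        this.sleeper = sleeper;
    }

    /** Returns true once the public copy is at least as new as the publish, false if it never is. */
    boolean awaitFresh(String url, Instant publishedAt, int maxAttempts, Duration initialDelay)
            throws InterruptedException {
        // Last-Modified has one-second resolution, so compare at that precision.
        Instant target = publishedAt.truncatedTo(ChronoUnit.SECONDS);
        Duration delay = initialDelay;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<Instant> seen = publicLastModified.apply(url);
            if (seen.isPresent() && !seen.get().isBefore(target)) {
                return true; // the public copy reflects the publish
            }
            if (attempt < maxAttempts) {
                sleeper.sleep(delay);
                delay = delay.multipliedBy(2); // exponential backoff
            }
        }
        return false; // caller decides: retry later, or crawl anyway
    }
}
```

Waiting inside a workflow step ties up a workflow thread, so keep `maxAttempts` and the delays small, or move the wait into a job.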
Option B:
Instead of triggering the workflow on page publish, you can add a scheduler that crawls the pages at your preferred frequency.
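A sketch of Option B in plain Java: the publish hook only records paths, and a scheduled task sends them to the crawler in one batch per period. Besides reducing load during bulk publishes, this gives the CDN up to one period to refresh before the crawl. The `crawler` callback (one batch call to your crawler API) is an assumption; in AEM you would more likely use the Sling Scheduler than manage an executor yourself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical scheduled batcher: collect published paths, crawl them together periodically. */
final class ScheduledCrawlBatcher implements AutoCloseable {
    private final Set<String> pending = ConcurrentHashMap.newKeySet(); // de-duplicates repeat publishes
    private final Consumer<List<String>> crawler; // e.g. one batch call to the crawler API
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    ScheduledCrawlBatcher(Consumer<List<String>> crawler) { this.crawler = crawler; }

    /** Called from the publish hook; cheap and non-blocking. */
    void record(String path) { pending.add(path); }

    /** Starts the periodic flush; the period is the "preferred frequency". */
    void start(long period, TimeUnit unit) {
        scheduler.scheduleAtFixedRate(() -> {
            try {
                flush();
            } catch (RuntimeException e) {
                // Log and continue: an escaping exception would cancel all future runs.
                // The failed batch is dropped in this sketch.
            }
        }, period, period, unit);
    }

    /** Drains everything recorded so far and hands it to the crawler in one call. */
    List<String> flush() {
        List<String> batch = new ArrayList<>();
        for (String path : pending) {
            if (pending.remove(path)) { // only one flush can claim a given path
                batch.add(path);
            }
        }
        if (!batch.isEmpty()) {
            crawler.accept(batch);
        }
        return batch;
    }

    @Override
    public void close() { scheduler.shutdownNow(); }
}
```

A page recorded just before a flush still gets little settle time; if that matters, store a timestamp with each path and only flush entries older than a minimum age.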