Trigger custom workflow on page publish

Level 3

Hi,

I have a requirement to call an API that triggers a crawler whenever a page is published, so that content changes are crawled as soon as they go live.

I can write a custom workflow that triggers the crawler, but I am not sure how to start that workflow when an editor hits publish.

I have read about two ways of doing it, but it is not clear to me which one is better.

The first is to write a custom event handler that is triggered at page activation and executes the custom workflow.

The second is to create a workflow model for the custom workflow and trigger it by adding a launcher on the Modified event with the condition "jcr:content/cq:lastReplicationAction == Activate".

I am not sure whether both are doable, or which is the right way to get the desired result without affecting performance.

Any help on this would be greatly appreciated.

Thanks in advance!


8 Replies

Community Advisor

Hi @hptarora ,
Both of the options you mentioned are viable, but there are key differences in terms of flexibility, complexity, and performance.

For most use cases, Option 2 (Workflow Launcher) is the preferred option because it leverages AEM's optimized workflow engine for asynchronous processing, which is better suited for handling triggers like page activation with minimal performance impact.

Option 1 (Custom Event Handler) can be used when you need more control over the event trigger, but it may require more careful performance management, especially with high traffic or frequent page activations.
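
To illustrate the performance point for Option 1, here is a minimal sketch, assuming the handler only hands the page path to a background worker and returns right away. `AsyncCrawlTrigger` and the `crawlerCall` callback are hypothetical names, not AEM APIs; in a real AEM bundle a Sling job would likely take the place of the plain executor used here.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper (not an AEM API): the event handler calls onActivated(...)
// and returns immediately, while the crawler call runs on a single worker thread.
class AsyncCrawlTrigger implements AutoCloseable {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Consumer<String> crawlerCall; // assumed wrapper around the crawler API

    AsyncCrawlTrigger(Consumer<String> crawlerCall) {
        this.crawlerCall = crawlerCall;
    }

    // Called from the activation event handler; only queues the work.
    void onActivated(String pagePath) {
        worker.execute(() -> crawlerCall.accept(pagePath));
    }

    // Stops accepting new work and waits briefly for queued calls to finish.
    @Override
    public void close() {
        worker.shutdown();
        try {
            worker.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because the handler itself does almost no work, a burst of activations queues up on the worker instead of slowing down event delivery.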

Thanks,
Madhur

Level 7

Hi @hptarora 

If you are on AEMaaCS, you could leverage Cloud Events: https://developer.adobe.com/experience-cloud/experience-manager-apis/guides/events/. I believe one of its use cases is exactly what you need. Among others, there is a Publish event webhook specifically for Sites that you can use: https://developer.adobe.com/experience-cloud/experience-manager-apis/api/stable/sites/

If you are on AEM on-prem instead, you can implement something that observes page changes, such as publishing activity in your case, using either a listener or a handler: https://medium.com/@toimrank/aem-handler-and-listener-12b6c8b5a3d3

Community Advisor

Hello @hptarora,

I prefer using a workflow solution (where a launcher is triggered by events to execute logic via a custom model under a specific path on the publish instance) instead of an event listener, handler, or preprocessor for the following reasons:

  • The workflow solution is inherently scalable and works seamlessly on both AEM as a Cloud Service (AEMaaCS) and Adobe Managed Services (AMS), regardless of the number of instances from which it needs to trigger. I also assume that the crawler can manage or bypass any already running scheduled jobs.

  • Based on my experience, the AEM workflow engine is significantly more stable compared to event handlers or listeners.

Let me know if this approach aligns with your understanding.

Community Advisor

Hi @hptarora ,

To trigger an event when a page is published in AEM, you have three main approaches:

 

  1. Custom Event Handler:
    For immediate triggers upon page activation (publish). This approach is suitable for small-to-medium sites where performance impact is minimal. However, it can affect performance if not handled asynchronously.
  2. Workflow Launcher with Condition:
    For triggering actions asynchronously after page publish. This approach is more suitable for scalable workflows in AEM, without blocking the publishing process. Note that it introduces some delay, as it runs after the page is published.
  3. Cloud Events (AEM as a Cloud Service):
    For scalable, low-latency, cloud-native integrations. This approach is best for AEM as a Cloud Service, especially for large-scale environments. While it requires a more complex setup, it offers decoupled, asynchronous processing.

Thanks

Ritesh Mittal

Community Advisor

@hptarora:

Regarding "jcr:content/cq:lastReplicationAction == Activate":

My guess is that the workflow would trigger on any modification event as long as lastReplicationAction is Activate. So it would basically run on the first publish plus any modification made after publish.

That is more than what you need.
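
If the launcher does fire more often than needed, one way to narrow it down inside the workflow step is to compare the page's cq:lastReplicated timestamp with the last one already handled. A minimal sketch, assuming the step can read that timestamp from jcr:content; `ReplicationGuard` is a hypothetical helper, and its in-memory map would not survive a restart:

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not an AEM API): remembers the newest replication
// timestamp handled per page and rejects events that carry nothing newer.
class ReplicationGuard {
    private final Map<String, Instant> lastHandled = new HashMap<>();

    // Returns true only if lastReplicated is newer than what was seen before
    // for this path, i.e. the modification event was caused by a new publish.
    synchronized boolean shouldCrawl(String pagePath, Instant lastReplicated) {
        Instant previous = lastHandled.get(pagePath);
        if (previous != null && !lastReplicated.isAfter(previous)) {
            return false; // plain edit after publish: timestamp unchanged
        }
        lastHandled.put(pagePath, lastReplicated);
        return true;
    }
}
```

With this guard, an author edit that leaves cq:lastReplicated untouched is skipped, while each new activation goes through once.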


Event handler:

A sample is available here: https://experienceleaguecommunities.adobe.com/t5/adobe-experience-manager/aem-as-cs-handle-event-on-...

It should work for your requirements. However, events can slow down the system when they are processed in bulk.

Replication Preprocessor:

If you can call the API before replication, you can also use a Preprocessor. Samples are available here:

https://medium.com/tech-learnings/aem-as-a-cloud-managing-and-tracking-asset-metadata-changes-over-t...

https://github.com/Adobe-Consulting-Services/acs-aem-samples/blob/master/core/src/main/java/com/adob...

A Preprocessor also has the benefit that if the API call fails, replication does not go through. Details here: https://dileepakv.blogspot.com/2018/01/aem-replication-preprocessor.html


Aanchal Sikka

Level 3

Thank you everyone for your replies.

I think I will go with option 2 and use an AEM workflow model with a launcher to trigger the custom workflow.

I have another question. The custom workflow triggers the crawler to crawl the page, or the modifications made to it, after the page is published. Sometimes a page is published but the changes take a while to show up on the publisher because of CDN caching. Is it possible that the crawler is triggered after publish but cannot crawl the modifications because the cached content has not been refreshed yet?

Does anyone know whether this can happen, and whether there is a better way or time to trigger the custom workflow so that the crawler picks up the modifications?

Thanks in advance.

Community Advisor

@hptarora 

There are a few options available to address this.

Option A:

After page publish, you can add a workflow step that invokes a servlet to clear the CDN cache. If you are using Akamai, you can integrate the Akamai Fast Purge API. Once the cache is cleared, the page crawl step can follow in the workflow. One constraint is that clearing the cache might take some time because Akamai has many edge node servers.
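
The ordering in Option A could look roughly like the sketch below. It assumes three hypothetical callbacks (`purgeCache`, `changeVisible`, `triggerCrawler`) that wrap the real purge, page check, and crawler calls; none of them is an Akamai or AEM API. Because the purge may take a while to reach all edge nodes, the crawl is only triggered once a bounded number of checks confirms the change is visible.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical orchestration: purge first, wait until the change is visible,
// then crawl. Gives up after maxChecks so a workflow step cannot hang forever.
class PurgeThenCrawl {
    static boolean run(Runnable purgeCache,
                       BooleanSupplier changeVisible,
                       Runnable triggerCrawler,
                       int maxChecks,
                       Duration pause) {
        purgeCache.run();
        for (int i = 0; i < maxChecks; i++) {
            if (changeVisible.getAsBoolean()) {
                triggerCrawler.run();
                return true; // crawler was triggered against fresh content
            }
            try {
                Thread.sleep(pause.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false; // change never became visible; crawler not triggered
    }
}
```

A `false` result can be used to retry later or to fall back to a scheduled crawl.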

 

Option B:

Instead of triggering the workflow on page publish, you can add a scheduler with the preferred frequency to crawl the page.
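
One variation on Option B, sketched under assumptions: rather than crawling blindly, collect the paths published since the last run and hand them to the crawler at a fixed interval. `CrawlBatch` and the `crawler` callback are hypothetical names; in AEM the Sling scheduler would probably replace the plain JDK executor used here.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper: remembers published paths (deduplicated, in order)
// and passes them to the crawler in one batch per scheduled run.
class CrawlBatch {
    private final Set<String> pending = new LinkedHashSet<>();

    synchronized void recordPublish(String pagePath) {
        pending.add(pagePath);
    }

    private synchronized List<String> drain() {
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        return batch;
    }

    // Sends the pending paths to the crawler; does nothing if the batch is empty.
    void flush(Consumer<List<String>> crawler) {
        List<String> batch = drain();
        if (!batch.isEmpty()) {
            crawler.accept(batch);
        }
    }

    // Runs flush(...) every `minutes` minutes. Note: if the crawler callback
    // throws, scheduleAtFixedRate stops further runs, so it should not throw.
    static ScheduledExecutorService startEvery(long minutes, CrawlBatch batch,
                                               Consumer<List<String>> crawler) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> batch.flush(crawler), minutes, minutes, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

Choosing an interval longer than the CDN cache lifetime gives cached pages time to refresh before the crawler visits them.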