
Scheduled jobs not executed in AEMaaCS


Level 1

Hello community,

For our new project we need scheduled jobs to run tasks such as importers during the night. I have been using Sling jobs for many years, but the scheduled jobs in AEMaaCS are driving me crazy: sometimes they run, sometimes they don't. I believe this is due to the AEMaaCS architecture, which uses an author cluster: whether the job gets triggered seems to depend on which instance created it and which instance is currently active. My question: how can I schedule Sling jobs with a cron expression so that they are guaranteed to run?

What is the recommended approach in AEMaaCS?

Code snippets:

Job executor:

@Component(service = JobExecutor.class, property = {
        JobConsumer.PROPERTY_TOPICS + "=" + PimImportRunnerJob.PIM_IMPORT_JOB_TOPIC})
public class PimImportRunnerJob implements JobExecutor {
    @Override
    public JobExecutionResult process(final Job job, JobExecutionContext context) {
        ...
    }
}

The job gets scheduled in the activate method of my OSGi service:

@Component(service = PimImporterV2RuntimeService.class, immediate = true)
@Designate(ocd = PimImporterV2RuntimeServiceConfiguration.class)
public class PimImporterV2RuntimeServiceImpl implements PimImporterV2RuntimeService {
    ...
    @Activate
    @Modified
    public void activate(final PimImporterV2RuntimeServiceConfiguration config) {
        // Build a scheduled job for the import topic
        JobBuilder builder = jobManager.createJob(PimImportRunnerJob.PIM_IMPORT_JOB_TOPIC);
        JobBuilder.ScheduleBuilder scheduleBuilder = builder.schedule();
        final String expr = ...
        scheduleBuilder.cron(expr);
        // add() returns null if the schedule could not be registered
        ScheduledJobInfo scheduledJobInfo = scheduleBuilder.add();
        if (scheduledJobInfo == null) {
            LOG.error("Error adding job '{}' to scheduler", PimImportRunnerJob.PIM_IMPORT_JOB_TOPIC);
        } else {
            LOG.info("Scheduler successfully added job '{}'", PimImportRunnerJob.PIM_IMPORT_JOB_TOPIC);
        }
    }
}
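Because the activate method above is also annotated with @Modified, it can run more than once, and each run adds another schedule for the same topic. A minimal sketch of an idempotent "replace, then add" registration follows. The Registry class is a hypothetical in-memory stand-in written for this example, not the Sling API; in Sling the corresponding calls would presumably be JobManager.getScheduledJobs(...) and ScheduledJobInfo.unschedule(). The topic name and cron strings are illustrative assumptions as well.

```java
import java.util.ArrayList;
import java.util.List;

public class ScheduleOnceSketch {

    /** Hypothetical in-memory stand-in for a scheduled-job registry; each entry is {topic, cron}. */
    public static final class Registry {
        private final List<String[]> entries = new ArrayList<>();

        /** Returns all schedules registered for the given topic. */
        public List<String[]> find(String topic) {
            List<String[]> hits = new ArrayList<>();
            for (String[] entry : entries) {
                if (entry[0].equals(topic)) {
                    hits.add(entry);
                }
            }
            return hits;
        }

        public void remove(String[] entry) { entries.remove(entry); }

        public void add(String topic, String cron) { entries.add(new String[] {topic, cron}); }

        public int size() { return entries.size(); }
    }

    /** Removes any existing schedule for the topic before adding the new one. */
    public static void scheduleOnce(Registry registry, String topic, String cron) {
        for (String[] existing : registry.find(topic)) {
            registry.remove(existing);
        }
        registry.add(topic, cron);
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        // Simulates activate() followed by a configuration change (@Modified)
        scheduleOnce(registry, "pim/import", "0 0/15 * * * ?");
        scheduleOnce(registry, "pim/import", "0 0 2 * * ?");
        System.out.println(registry.size()); // prints 1
    }
}
```

With this pattern a configuration change replaces the old cron expression instead of leaving a stale schedule behind.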

 

When deployed to AEMaaCS, the log says: "Scheduler successfully added..."

For testing I set the cron expression to execute the job every 15 minutes. The logs showed that the job last ran 12 hours ago.

Locally, using the AEM SDK, everything works fine.

Any ideas?


Thanks and best regards,
Martin

5 Replies


Level 7

In a cluster environment, a common use-case involves the execution of scheduled jobs. However, if a node goes down while a job is running, it can lead to interruptions and potential inconsistencies in data or content. To handle such scenarios, the Jobs feature in the system provides the ability to retry failed executions.

Here's a simple explanation of how it works: when a job is created, its details are persisted under /var/eventing/jobs. If execution fails, the consumer reports the failure (JobResult.FAILED in the current JobConsumer API) so that the job is rescheduled. A successful execution reports JobResult.OK. If no maximum retry limit is set, the job is automatically rescheduled to run the next time the system is up and running.

This retry mechanism ensures that jobs are robustly executed, even in the event of node failures, minimizing the impact of disruptions and maintaining data consistency.
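The retry behavior described above can be sketched as a small simulation. This is plain Java and not the Sling implementation: the Result enum is a local stand-in for the outcomes a consumer can report, and treating a negative maxRetries as "no limit" is this sketch's own convention.

```java
import java.util.function.IntFunction;

public class RetrySketch {

    /** Local stand-in for the outcomes a job consumer can report. */
    public enum Result { OK, FAILED, CANCEL }

    /**
     * Runs the job until it reports OK or CANCEL, or until the retries are used up.
     * A negative maxRetries means "retry without limit" (convention of this sketch).
     * Returns the total number of attempts made.
     */
    public static int runWithRetries(IntFunction<Result> job, int maxRetries) {
        int attempts = 0;
        while (true) {
            Result result = job.apply(attempts);
            attempts++;
            if (result != Result.FAILED) {
                return attempts; // finished or cancelled: no reschedule
            }
            if (maxRetries >= 0 && attempts > maxRetries) {
                return attempts; // initial attempt plus maxRetries retries exhausted
            }
        }
    }

    public static void main(String[] args) {
        // Fails on the first two attempts (for example, a node went down), then succeeds
        int attempts = runWithRetries(n -> n < 2 ? Result.FAILED : Result.OK, 5);
        System.out.println(attempts); // prints 3
    }
}
```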

Refer here to see the sample code for Sling job.

 

Code Must Be Cluster-Aware

Code running in AEM as a Cloud Service must be aware of the fact that it is always running in a cluster. This means that there is always more than one instance running. The code must be resilient especially as an instance might be stopped at any point in time.

During the update of AEM as a Cloud Service, there are instances with old and new code running in parallel. Therefore, old code must not break with content created by new code and new code must be able to deal with old content.

If there is the need to identify the primary in the cluster, the Apache Sling Discovery API can be used to detect it.

Try using:

@Property(name = "scheduler.runOn", value = "LEADER")

Refer here for information on how to write cluster aware code. 
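To illustrate the leader detection mentioned above, here is a minimal sketch of a "run only on the leader" guard. It is plain Java with a hypothetical callback. In AEM the leader flag would presumably be fed from the Sling Discovery API (a TopologyEventListener reading isLeader() of the local InstanceDescription), which is not used here so that the example stays self-contained.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class LeaderGateSketch {

    private final AtomicBoolean leader = new AtomicBoolean(false);
    private final AtomicInteger runs = new AtomicInteger();

    /** Hypothetical stand-in for a topology callback: called whenever the cluster view changes. */
    public void onTopologyChanged(boolean localInstanceIsLeader) {
        leader.set(localInstanceIsLeader);
    }

    /** Runs the task only while this instance is the leader; returns whether it ran. */
    public boolean runIfLeader(Runnable task) {
        if (!leader.get()) {
            return false; // another instance in the cluster is responsible
        }
        task.run();
        runs.incrementAndGet();
        return true;
    }

    public int runs() {
        return runs.get();
    }

    public static void main(String[] args) {
        LeaderGateSketch gate = new LeaderGateSketch();
        gate.runIfLeader(() -> { });   // skipped: not the leader yet
        gate.onTopologyChanged(true);  // this instance becomes leader
        gate.runIfLeader(() -> { });   // executed
        System.out.println(gate.runs()); // prints 1
    }
}
```

Note that leadership can change at any time, for example during a deployment, so the guard is checked on every run rather than once at activation.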


Employee Advisor

@martinm3484449 

In addition to what has already been said:
You might have to change your code to implement JobConsumer instead of JobExecutor. Please check the sample code here:

https://sudeshaempodcast.wordpress.com/2021/07/04/aemaacs-slingjobs/

Also, the local SDK does not replicate the Cloud Service environment. It runs on a single server, which can execute the job almost synchronously.


Level 1

Thanks for your answers.

I am using many Sling jobs in other places and they work fine; only the scheduled ones cause problems. The behavior is the same whether I use JobConsumer or JobExecutor.

I always get pointed to:

https://blog.developer.adobe.com/handling-sling-schedulers-in-aem-as-a-cloud-service-cb59d5e59e9

which says that I should use the ScheduleBuilder, but that is exactly what I do.

I will try to identify the "primary" instance in the cluster and see if this helps.

One more thing I noticed: when I can see the scheduled job in /var/eventing/jobs, it also gets executed. When the author instance switches, I can't see my entry and the job is not executed. But it should be scheduled on every node, shouldn't it?