Sling job Consumer: Job Result Lifecycle

Hello,

I'd appreciate some details on the following questions.

1. If the returned job result is 'CANCEL', will the job remain persisted indefinitely in the AEM JCR under the /var/eventing/jobs/cancelled/<JOB_TOPIC>/<yyyy>.... path? Or is there an OOTB maintenance activity that periodically cleans these up?

2. The documentation describes CANCEL as "processing failed permanently and must not be retried." So are all jobs that return CANCEL saved in the JCR with slingevent:finishedState set to ERROR?

3. In the logs we can see 'org.apache.sling.event.impl.jobs.queues.JobQueueImpl.<main queue> Failed job Sling Job' messages with multiple retry attempts (10 by default in the main queue if no dedicated Sling queue is configured). If we want to avoid retries for a job, what is the recommended approach?

4. If we do NOT want any jobs to remain pending or stuck in the main queue, would the snippet below be a good approach in the application logic? Whether the job is removed or no longer exists, it would still return CANCEL, so point 1 applies here again.

private JobResult removeJob(Job job) {
    // removeJobById returns true if the job was removed or no longer exists
    if (jobManager.removeJobById(job.getId())) {
        return JobResult.CANCEL;
    }
    return JobResult.OK;
}

So if we simply return JobResult.OK in every case, regardless of the outcome of the custom application logic, will that ensure that no Sling jobs for that topic remain stuck?

Of course this could still happen if there is an issue with the AEM instance itself, but not in the normal (happy-path) scenarios.

Any help would be appreciated. Thanks!

Reply (Adobe Champion):

By default, the details of successful jobs aren't kept, as they don't generally provide any useful information, though you can enable this in the queue configuration if needed. Failed jobs are kept so that you can review the failure and resolve it.

As you've touched on, the difference between a CANCEL and a FAILED result is that FAILED will be retried if the job hasn't yet reached the queue's retry limit. In my view, the simpler way to stop jobs from being retried when they fail for some internal reason is to configure a dedicated queue for those jobs with retries set to 0. This also makes it easier to enable retries in future, if that ever makes sense, because you only need to reconfigure the queue rather than change any job processing logic to stop it cancelling failed jobs. You can also configure the default (main) queue, but setting its retry limit to 0 would affect every job that doesn't have its own queue, so that isn't a recommended approach.
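To make the FAILED versus CANCEL difference concrete, here is a toy Java simulation of the retry behaviour described above. It is not Sling code: the Result and FinalState enums, the run method, and the rule that a retry limit of N allows up to N re-executions after the first attempt are assumptions made for illustration, with state names loosely borrowed from Sling's Job.JobState.

```java
/**
 * Toy simulation (NOT Sling code) of how a job queue reacts to job results.
 * Assumptions: a retry limit of N allows up to N re-executions after the first
 * attempt; FAILED is retried while retries remain; CANCEL is never retried.
 * Only maxRetries >= 0 is modelled (no "retry endlessly" setting).
 */
public class RetrySimulation {

    /** Stand-in for the three outcomes of Sling's JobResult (illustration only). */
    enum Result { OK, FAILED, CANCEL }

    /** Final states, names loosely borrowed from Sling's Job.JobState. */
    enum FinalState { SUCCEEDED, GIVEN_UP, ERROR }

    /** Number of attempts made and the state the job ended in. */
    static final class Outcome {
        final int attempts;
        final FinalState state;

        Outcome(int attempts, FinalState state) {
            this.attempts = attempts;
            this.state = state;
        }
    }

    /**
     * Runs a job whose hypothetical consumer returns the same result on every attempt.
     *
     * @param result     what the consumer returns each time
     * @param maxRetries the queue's retry limit (0 = dedicated no-retry queue)
     */
    static Outcome run(Result result, int maxRetries) {
        int attempts = 0;
        while (true) {
            attempts++;
            switch (result) {
                case OK:
                    return new Outcome(attempts, FinalState.SUCCEEDED);
                case CANCEL:
                    // Permanent failure: never retried, whatever the queue allows.
                    return new Outcome(attempts, FinalState.ERROR);
                case FAILED:
                default:
                    // Give up once the first attempt plus maxRetries re-executions are used.
                    if (attempts > maxRetries) {
                        return new Outcome(attempts, FinalState.GIVEN_UP);
                    }
                    // Otherwise loop round and retry.
            }
        }
    }

    public static void main(String[] args) {
        Outcome mainQueue = run(Result.FAILED, 10);  // main queue default, per the question
        Outcome noRetry = run(Result.FAILED, 0);     // dedicated queue with retries = 0
        Outcome cancelled = run(Result.CANCEL, 10);  // CANCEL ignores the retry limit
        System.out.println("FAILED, 10 retries: " + mainQueue.attempts + " attempts, " + mainQueue.state);
        System.out.println("FAILED, 0 retries:  " + noRetry.attempts + " attempts, " + noRetry.state);
        System.out.println("CANCEL, 10 retries: " + cancelled.attempts + " attempts, " + cancelled.state);
    }
}
```

Under these assumptions, a failing job on a queue with 10 retries makes 11 attempts before ending in GIVEN_UP, the same job on a queue with retries set to 0 makes a single attempt, and CANCEL always stops after one attempt. The point of the design is that the no-retry behaviour lives in the queue setting rather than in the consumer's return value.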

It would be useful to understand whether you have some other process for capturing and resolving job failures, because the reason for marking a job as FAILED or CANCEL is so that you know it didn't succeed and can resolve the issue. The snippet in point 4 would essentially throw away all information about the job, meaning you lose any data about it within AEM itself.

It's also worth considering what constitutes a "failure" as far as the AEM job itself is concerned. You could argue that, whether or not some application logic completes, the job was successful because it triggered that logic, but only if you have some other external process to detect and resolve any issues. In that case, returning an OK status could be seen as valid, since the AEM job system would not need to keep the job for analysis or retrying.
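A minimal sketch of that approach, assuming you already have some external mechanism that tracks and resolves failures. It is plain Java, not the Sling API: FailureLog, process and the Result enum are hypothetical stand-ins, and in a real job consumer the return value would be Sling's JobResult.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Sketch (NOT the Sling API) of "record the failure elsewhere, then report OK".
 * Result, FailureLog and process(...) are hypothetical stand-ins; in a real job
 * consumer the return value would be Sling's JobResult.
 */
public class ExternalFailureHandling {

    /** Stand-in for Sling's JobResult values. */
    enum Result { OK, FAILED, CANCEL }

    /** Hypothetical external store that some other process monitors and resolves. */
    interface FailureLog {
        void record(String jobId, Exception cause);
    }

    /**
     * Treats the job as done once the application logic has been triggered.
     * Application-level failures go to the external log, not to the job queue.
     */
    static Result process(String jobId, Consumer<String> applicationLogic, FailureLog failures) {
        try {
            applicationLogic.accept(jobId);
        } catch (RuntimeException e) {
            failures.record(jobId, e);  // keep the evidence outside AEM's job history
        }
        return Result.OK;               // so, with default queue settings, nothing is retried or kept
    }

    public static void main(String[] args) {
        List<String> recorded = new ArrayList<>();
        FailureLog log = (id, cause) -> recorded.add(id + ": " + cause.getMessage());

        Result first = process("job-1", id -> { /* succeeds */ }, log);
        Result second = process("job-2", id -> {
            throw new IllegalStateException("downstream unavailable");
        }, log);

        System.out.println(first + " " + second);  // OK OK
        System.out.println(recorded);              // [job-2: downstream unavailable]
    }
}
```

Both calls return OK, so the queue has nothing to retry, and the only record of the second job's failure is the entry in the external log. That is why this pattern only makes sense when something actually monitors and acts on that log.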