I've recently gone through training and started doing some of my first OSGi workflows (and have some prior experience working with J2EE/Workbench processes).
I'm running into problems trying to use the out-of-box "Export PDF to specified type" process. I know how to view the running instance via lc/libs/cq/workflow/admin/console/content/instances.html, but I find the details there lacking for troubleshooting.
I know you can view details related to the workflow instance steps via CRX (i.e. var --> workflow --> instances --> expand my instance), and I can see errors related to the instance in our server logs, but I am not finding any clear information to help me fix the issue with this process.
Instead of erroring out/failing, it just hangs on the step that it cannot complete. I've let it run for hours before, and the step with errors in the log just shows "Active" in the AEM --> Workflows --> Instances history.
I was hoping to get more information around:
1.) Is there any place I can view more detailed logs about what's happening with individual workflow steps? If there is, is there any way to turn on verbose logging for them?
2.) How does error handling work here? In workbench, a number of the built-in processes have fault routes you can use to branch off steps that fail. I can't tell how to do this, or if it's possible, with this process. I don't see any components related to error handling either. Seems really weird we wouldn't be able to do this with out-of-box processes.
3.) Can you force a hanging step to time out?
If there's some documentation around these things that might help, please share it too. I've gone through some of the stuff around workflows. One thing seems to be indicating we'd need to make a custom event handler for errors?: How to Catch and Process Workflow Events
That's the closest thing I've found to an answer for #2.
Thanks for any help or ideas you can provide around this.
By default, all logging information is available in the error.log file in the /crx-repository/logs/ directory. Kindly refer to the following workflow documentation, which provides answers to your questions 1 and 2.
#3 Some workflow steps, such as the Participant, Dialog Participant, and Forms Participant steps, provide an option to handle timeouts. OOB timeout handling is not provided for every step.
Hope this helps.
Thanks, I'll be reaching out to our internal admins to get help with setting up a debugger on one of our less-used test instances.
What would be the expected behavior of a workflow if there's a step that's failing on all 10 attempts in the server logs?
i.e. there's a warn line (cut out some details):
19.12.2022 12:24:48.604 *WARN* [sling-threadpool-48b1ce6a-2fb3-4662-a287-92cd181da15f-(apache-sling-job-thread-pool)-13-Granite Workflow Queue(com/adobe/granite/workflow/job/var/workflow/models/Megan-Excel-Test-3)] org.apache.sling.event.impl.jobs.queues.JobQueueImpl.Granite Workflow Queue Failed job Sling Job
event.job.retries=10,jcr:primaryType=slingevent:Job,event.job.retrycount=10,:sling:jobs:asynchandler=org.apache.sling.event.impl.jobs.JobConsumerManager$JobConsumerWrapper$1@51ca08fd,com.adobe.granite.workflow.jobid=VolatileWorkItem_node1_var_workflow_instances_server1_2022-12-18_Megan-Excel-Test-3_19,com.adobe.granite.workflow.job=com.adobe.granite.workflow.job.WorkflowJob@4f96b146], will retry 0 more time(s), retryCount=10
After the last #10 failure, the workflow instance continues to remain active on the step that is failing until it is manually terminated? Is that expected behavior or a bug in workflow?
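Since the instance console keeps showing "Active", the WARN line above is currently the only reliable signal that retries ran out. As a rough sketch (not an Adobe-provided tool), the retry fields in that line could be pulled out with plain Java so a script can flag exhausted jobs. The class name `WorkflowRetryLogCheck` and its methods are hypothetical, and the patterns assume the log format matches the excerpt above exactly:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts retry details from a "Failed job" WARN line.
// Assumes the line format shown in the log excerpt above.
public class WorkflowRetryLogCheck {

    /** Retry details found in one log line. */
    public static final class RetryInfo {
        public final String jobId;
        public final int retriesLeft;
        public final int retryCount;

        RetryInfo(String jobId, int retriesLeft, int retryCount) {
            this.jobId = jobId;
            this.retriesLeft = retriesLeft;
            this.retryCount = retryCount;
        }

        /** True when the queue reports that no further retries will happen. */
        public boolean exhausted() {
            return retriesLeft == 0;
        }
    }

    // Matches the trailing "will retry N more time(s), retryCount=M" part.
    private static final Pattern RETRY =
            Pattern.compile("will retry (\\d+) more time\\(s\\), retryCount=(\\d+)");

    // Matches the workflow job id property, up to the next comma or bracket.
    private static final Pattern JOB_ID =
            Pattern.compile("com\\.adobe\\.granite\\.workflow\\.jobid=([^,\\]]+)");

    /** Returns retry details, or empty if the line has no retry summary. */
    public static Optional<RetryInfo> parse(String line) {
        Matcher retry = RETRY.matcher(line);
        if (!retry.find()) {
            return Optional.empty();
        }
        Matcher id = JOB_ID.matcher(line);
        String jobId = id.find() ? id.group(1) : "unknown";
        return Optional.of(new RetryInfo(
                jobId,
                Integer.parseInt(retry.group(1)),
                Integer.parseInt(retry.group(2))));
    }

    public static void main(String[] args) {
        // Shortened copy of the properties line from the excerpt above.
        String line = "event.job.retries=10,event.job.retrycount=10,"
                + "com.adobe.granite.workflow.jobid=VolatileWorkItem_node1_var_workflow_instances"
                + "_server1_2022-12-18_Megan-Excel-Test-3_19,"
                + "com.adobe.granite.workflow.job=com.adobe.granite.workflow.job.WorkflowJob@4f96b146],"
                + " will retry 0 more time(s), retryCount=10";
        parse(line).ifPresent(info -> System.out.println(
                "jobId=" + info.jobId + ", exhausted=" + info.exhausted()));
    }
}
```

This only reads the log text; it does not change the instance state, so a job flagged as exhausted would still need to be terminated by hand.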
That's the correct behavior. The workflow remains in the failed state for as long as the step keeps failing, and it needs to be terminated manually.
Just to be clear since I am not sure I was, it's expected behavior for a workflow process that has failed all of its retry attempts to hang and remain in "Active" state until someone notices, and manually goes in and terminates it?
Technically it never goes to a "failed state" from the Workflow Instances area (i.e.: lc/libs/cq/workflow/admin/console/content/instances.html )
Technically it remains "Active" until the workflow is terminated (and then has "Aborted" status).
Going into the individual instance, the "Open history" shows the particular step as "Active" even when it has been minutes or hours after the last retry attempt was exhausted, and it's not actually doing anything. The only reason I know it failed is because of checking the server logs + some detail in CRX also mentions it failed.
Is the only way to make a workflow actually show as failed to build a custom process that handles the failure? Or can you not even do that?