
Options for troubleshooting, getting detailed logs, or adding something similar to breakpoints for OSGi workflows

Level 1

Hello,

I've recently gone through training and started building my first OSGi workflows (I have some prior experience with J2EE/Workbench processes).

I'm running into problems using the out-of-box "Export PDF to specified type" process. I know I can view the running instance at lc/libs/cq/workflow/admin/console/content/instances.html, but the details there aren't enough for troubleshooting.

I know you can view details about the workflow instance steps in CRX (var --> workflow --> instances --> expand my instance), and I can see errors related to the instance in our server logs, but I'm not finding any clear information to help me fix the issue with this process.

Instead of failing, the workflow just hangs on the step it cannot complete. I've let it run into errors before, and the step that has errors in the log still shows "Active" in the AEM --> Workflows --> Instances history.

I was hoping to get more information about:

1.) Is there any place I can view more detailed logs about what's happening with individual workflow steps? If there is, is there any way to turn on verbose logging for them?

2.) How does error handling work here? In Workbench, many of the built-in processes have fault routes you can use to branch off from steps that fail. I can't tell whether that's possible with this process, or how to do it, and I don't see any components related to error handling either. It seems odd that out-of-box processes wouldn't support this.

3.) Can you force a hanging step to time out?
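On #3: outside the steps that have built-in timeout handling, one way to keep a step from hanging forever is to bound the slow call yourself inside a custom process step. The sketch below is plain Java (java.util.concurrent only, no AEM API); `slowExport` is a hypothetical stand-in for the blocking work, and whether the cancelled task actually stops depends on that work responding to thread interruption.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedStep {
    /** Runs task on a worker thread and gives up after timeoutMs milliseconds. */
    public static <T> T runWithTimeout(Callable<T> task, long timeoutMs) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<T> future = executor.submit(task);
            try {
                return future.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Interrupt the worker; the task must respond to interruption to really stop.
                future.cancel(true);
                throw e;
            } catch (ExecutionException e) {
                // Unwrap so the caller sees the task's own exception.
                Throwable cause = e.getCause();
                if (cause instanceof Exception) {
                    throw (Exception) cause;
                }
                throw e;
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for a slow export/conversion call.
        Callable<String> slowExport = () -> {
            Thread.sleep(5_000);
            return "exported";
        };
        try {
            runWithTimeout(slowExport, 200);
        } catch (TimeoutException e) {
            System.out.println("Step timed out after 200 ms");
        }
        System.out.println(runWithTimeout(() -> "fast result", 1_000));
    }
}
```

On timeout, the step would then either throw (handing control back to the workflow's normal retry and failure path) or record the failure so a later step can branch on it.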

If there's any documentation on these topics, please share it too. I've gone through some of the workflow material. One article seems to indicate we'd need to write a custom event handler for errors: How to Catch and Process Workflow Events
https://helpx.adobe.com/experience-manager/kb/CatchAndProcessWorkflowEvents.html

That's the closest thing I've found to an answer for #2.

Thanks for any help or ideas you can provide around this. 


Community Advisor

Hi @mcanderson2

By default, log output is written to the error.log file in the /crx-repository/logs/ directory. The following workflow documentation should answer your questions 1 and 2:

https://experienceleague.adobe.com/docs/experience-manager-65/forms/workflows/forms-workflow-logs.ht...
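To pull just the workflow-related problems out of error.log instead of reading it end to end, a small filter helps. This is a minimal sketch assuming the Sling log line layout visible later in this thread (`*WARN*`/`*ERROR*` level markers, thread name in brackets); the two marker strings it matches on are an assumption based on that excerpt, not an Adobe-documented filter. For more verbose output, the usual route is an Apache Sling Logging Logger Configuration in the OSGi Web Console that sets the `com.adobe.granite.workflow` logger to DEBUG; check the linked documentation for your version before relying on that.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WorkflowLogFilter {
    // Assumed markers, taken from the thread/logger names in the log excerpt in this thread.
    private static final List<String> MARKERS =
            List.of("Granite Workflow", "com.adobe.granite.workflow");

    /** True for WARN/ERROR lines that mention one of the workflow markers. */
    public static boolean isWorkflowProblem(String line) {
        boolean problem = line.contains("*WARN*") || line.contains("*ERROR*");
        return problem && MARKERS.stream().anyMatch(line::contains);
    }

    /** Keeps only the workflow-related WARN/ERROR lines. */
    public static List<String> filter(List<String> lines) {
        return lines.stream()
                .filter(WorkflowLogFilter::isWorkflowProblem)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Pass the log path as the first argument, e.g. .../crx-repository/logs/error.log
        Path log = Paths.get(args.length > 0 ? args[0] : "error.log");
        try (Stream<String> lines = Files.lines(log)) {
            lines.filter(WorkflowLogFilter::isWorkflowProblem).forEach(System.out::println);
        }
    }
}
```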

For #3: some workflow steps, such as the Participant, Dialog Participant, and Forms Participant steps, provide an option to handle timeouts. Out-of-box timeout handling is not provided for every step.

Hope this helps.


Level 1

Thanks, I'll reach out to our internal admins for help setting up a debugger on one of our less-used test instances.

What is the expected behavior of a workflow when one of its steps fails all 10 attempts, according to the server logs?

For example, here's a WARN line (some details cut out):

19.12.2022 12:24:48.604 *WARN* [sling-threadpool-48b1ce6a-2fb3-4662-a287-92cd181da15f-(apache-sling-job-thread-pool)-13-Granite Workflow Queue(com/adobe/granite/workflow/job/var/workflow/models/Megan-Excel-Test-3)] org.apache.sling.event.impl.jobs.queues.JobQueueImpl.Granite Workflow Queue Failed job Sling Job

....

event.job.retries=10,jcr:primaryType=slingevent:Job,event.job.retrycount=10,:sling:jobs:asynchandler=org.apache.sling.event.impl.jobs.JobConsumerManager$JobConsumerWrapper$1@51ca08fd,com.adobe.granite.workflow.jobid=VolatileWorkItem_node1_var_workflow_instances_server1_2022-12-18_Megan-Excel-Test-3_19,com.adobe.granite.workflow.job=com.adobe.granite.workflow.job.WorkflowJob@4f96b146], will retry 0 more time(s), retryCount=10

After the 10th and final failure, the workflow instance remains active on the failing step until it is manually terminated. Is that expected behavior, or a bug in the workflow engine?
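To catch this case from the log instead of by eye, the fields that matter in the WARN line above are event.job.retries, event.job.retrycount, com.adobe.granite.workflow.jobid, and the trailing "will retry N more time(s)". A minimal Java sketch that extracts them from one line; the regular expressions are written against this single excerpt and are an assumption about the format, not a documented contract.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RetryExhaustion {
    // Patterns written against the WARN excerpt in this thread; the format is not a documented contract.
    private static final Pattern RETRIES = Pattern.compile("event\\.job\\.retries=(\\d+)");
    private static final Pattern RETRY_COUNT = Pattern.compile("event\\.job\\.retrycount=(\\d+)");
    private static final Pattern JOB_ID = Pattern.compile("com\\.adobe\\.granite\\.workflow\\.jobid=([^,\\]]+)");
    private static final Pattern REMAINING = Pattern.compile("will retry (\\d+) more time\\(s\\)");

    public static final class Failure {
        public final String jobId;
        public final int retries;
        public final int retryCount;
        public final int remaining;

        Failure(String jobId, int retries, int retryCount, int remaining) {
            this.jobId = jobId;
            this.retries = retries;
            this.retryCount = retryCount;
            this.remaining = remaining;
        }

        /** No retries left: the job will not be attempted again. */
        public boolean exhausted() {
            return remaining == 0 || retryCount >= retries;
        }
    }

    /** Extracts retry info from one log line, or empty if any field is missing. */
    public static Optional<Failure> parse(String line) {
        Matcher r = RETRIES.matcher(line);
        Matcher c = RETRY_COUNT.matcher(line);
        Matcher j = JOB_ID.matcher(line);
        Matcher m = REMAINING.matcher(line);
        if (!(r.find() && c.find() && j.find() && m.find())) {
            return Optional.empty();
        }
        return Optional.of(new Failure(j.group(1),
                Integer.parseInt(r.group(1)),
                Integer.parseInt(c.group(1)),
                Integer.parseInt(m.group(1))));
    }

    public static void main(String[] args) {
        String line = "event.job.retries=10,jcr:primaryType=slingevent:Job,event.job.retrycount=10,"
                + "com.adobe.granite.workflow.jobid=VolatileWorkItem_node1_var_workflow_instances_server1_2022-12-18_Megan-Excel-Test-3_19,"
                + "com.adobe.granite.workflow.job=com.adobe.granite.workflow.job.WorkflowJob@4f96b146], will retry 0 more time(s), retryCount=10";
        parse(line).ifPresent(f -> System.out.println(f.jobId + " exhausted=" + f.exhausted()));
    }
}
```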


Community Advisor

That's the correct behavior: the workflow remains in that failed state for as long as the step keeps failing, and it needs to be terminated manually.
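Since nothing flags these instances automatically, a periodic check for instances that have sat "Active" on one step for too long can stand in for "someone notices". A minimal sketch in plain Java: the `Snapshot` type is hypothetical, how you populate it (Instances console, CRX under /var/workflow/instances, or the workflow API) is left out, and treating the console label "Active" as the status value is an assumption.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

public class StaleInstanceCheck {
    /** Status label as shown in the Instances console (assumption). */
    public static final String ACTIVE = "Active";

    /** Hypothetical snapshot of one workflow instance's current step. */
    public static final class Snapshot {
        public final String id;
        public final String status;
        public final Instant stepStarted;

        public Snapshot(String id, String status, Instant stepStarted) {
            this.id = id;
            this.status = status;
            this.stepStarted = stepStarted;
        }
    }

    /** Ids of instances still Active on a step that started more than maxAge before now. */
    public static List<String> stale(List<Snapshot> snapshots, Instant now, Duration maxAge) {
        return snapshots.stream()
                .filter(s -> ACTIVE.equals(s.status))
                .filter(s -> Duration.between(s.stepStarted, now).compareTo(maxAge) > 0)
                .map(s -> s.id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2022-12-19T14:00:00Z");
        List<Snapshot> snapshots = List.of(
                new Snapshot("wf-a", ACTIVE, Instant.parse("2022-12-19T12:24:48Z")),
                new Snapshot("wf-b", ACTIVE, Instant.parse("2022-12-19T13:55:00Z")),
                new Snapshot("wf-c", "Aborted", Instant.parse("2022-12-19T10:00:00Z")));
        System.out.println(stale(snapshots, now, Duration.ofMinutes(30))); // prints [wf-a]
    }
}
```

The threshold is a judgment call: it should comfortably exceed the time a healthy step plus all of its retries normally takes.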

Level 1

Just to be clear, since I'm not sure I was: is it expected behavior for a workflow process that has failed all of its retry attempts to hang in the "Active" state until someone notices and terminates it manually?

It never actually reaches a "failed" state in the Workflow Instances area (lc/libs/cq/workflow/admin/console/content/instances.html).

It remains "Active" until the workflow is terminated, after which it shows "Aborted" status.

In the individual instance, "Open history" shows the step as "Active" even minutes or hours after the last retry attempt was exhausted, when nothing is actually happening. The only reason I know it failed is that I checked the server logs, and some details in CRX also mention the failure.

Is the only way to make a workflow actually show as failed to write a custom process that handles the failure? Or is even that not possible?
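One common pattern (not verified against the Export PDF step specifically) is to wrap the risky work in your own process step, catch the failure instead of rethrowing it (rethrowing appears to be what drives the Sling job retries in the log above), record the outcome in the workflow's metadata, and let a later branch, such as an OR Split routing rule, act on that value. The sketch below models the idea in plain Java only: the `Step` interface, the `Map` standing in for workflow metadata, and the `exportStatus`/`exportError` keys are all hypothetical; in a real process step you would write to the workflow's metadata through the Granite workflow API instead.

```java
import java.util.HashMap;
import java.util.Map;

public class FailureRecordingStep {
    /** Hypothetical unit of work, e.g. the export you want to guard. */
    @FunctionalInterface
    public interface Step {
        void run(Map<String, Object> metadata) throws Exception;
    }

    public static final String STATUS_KEY = "exportStatus"; // hypothetical metadata key
    public static final String ERROR_KEY = "exportError";   // hypothetical metadata key

    /** Runs the step and records SUCCESS or FAILED in metadata instead of throwing. */
    public static void runRecordingFailure(Step step, Map<String, Object> metadata) {
        try {
            step.run(metadata);
            metadata.put(STATUS_KEY, "SUCCESS");
        } catch (Exception e) {
            // Swallowing the exception means no retries; the failure is visible to later steps instead.
            metadata.put(STATUS_KEY, "FAILED");
            metadata.put(ERROR_KEY, String.valueOf(e.getMessage()));
        }
    }

    /** What a downstream branch decision might check. */
    public static boolean shouldTakeFailureBranch(Map<String, Object> metadata) {
        return "FAILED".equals(metadata.get(STATUS_KEY));
    }

    public static void main(String[] args) {
        Map<String, Object> metadata = new HashMap<>();
        runRecordingFailure(m -> { throw new IllegalStateException("conversion failed"); }, metadata);
        System.out.println(metadata.get(STATUS_KEY) + " " + shouldTakeFailureBranch(metadata)); // prints FAILED true
    }
}
```

The trade-off is that catching the exception gives up the automatic retries, so this fits best where a retry is unlikely to help and a visible failure branch is worth more.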