Restarting the workflow accounts for a lot of our manual overhead.
For example, I had to restart my most recent manual run 2-3 times before it completed.
Can we set up a configurable retry system that handles both non-deterministic failures and LSF load failures? It could retry with escalating wait times, such as:
immediate restart, restart after 30 minutes, restart after 2 hours.
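
To make the proposal concrete, here is a minimal sketch of what such a retry wrapper could look like. It is only an illustration under assumptions: the names `RetryPolicy`, `run_with_retries`, and `run_command` are hypothetical and not part of the existing workflow, and it treats every failure the same way rather than distinguishing non-deterministic failures from LSF load failures.

```python
import subprocess
import time
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical config: seconds to wait before each restart."""
    # Immediate restart, then 30 minutes, then 2 hours (values from the request).
    delays_s: Sequence[float] = (0, 30 * 60, 2 * 60 * 60)


def run_with_retries(
    run_once: Callable[[], bool],
    policy: RetryPolicy = RetryPolicy(),
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call run_once until it returns True; return the number of attempts used.

    One initial attempt is made, plus one restart per configured delay.
    Raises RuntimeError if every attempt fails.
    """
    max_attempts = 1 + len(policy.delays_s)
    for attempt in range(max_attempts):
        if run_once():
            return attempt + 1
        if attempt < len(policy.delays_s):
            # Wait the configured time before the next restart.
            sleep(policy.delays_s[attempt])
    raise RuntimeError(f"workflow failed after {max_attempts} attempts")


def run_command(cmd: Sequence[str]) -> bool:
    """Assumed helper: run a command and report success via its exit code."""
    return subprocess.run(cmd).returncode == 0
```

With this shape, the wait times live in one configurable place, so a different schedule is just `RetryPolicy(delays_s=(0, 600))`. The workflow launch itself would be passed in as `run_once`, for example `lambda: run_command([...])` with whatever command currently starts the run.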