refactor: move local subworkflows behind runner #2247

Merged: yohamta0 merged 3 commits into main from refactor-inproc-subworkflow-runner on Jun 1, 2026

yohamta0 (Collaborator) commented Jun 1, 2026

Summary

  • move local CLI child workflow execution out of runtime/executor into internal/subflow
  • add a subworkflow router that prefers coordinator-backed execution and falls back to local CLI execution
  • tighten cancellation ownership and boundary types with router fallback coverage

Testing

  • go test ./internal/subflow ./internal/runtime/executor -count=1
  • go test -race ./internal/subflow ./internal/runtime/executor -count=1
  • go test ./internal/runtime/... -count=1
  • go test ./internal/engine ./internal/cmd ./internal/service/worker -count=1
  • go test ./internal/intg -run 'TestInlineSubDAG|TestExternalSubDAG|TestCallSubDAG|TestInlineParams_LocalSubDAGRuntimeCoercion|TestSubDAGParamsReferencedInChildEnv|TestWorkingDir' -count=1
  • go test ./internal/intg/distr -run 'TestSubDAG|TestParams_DistributedSubDAG|TestBaseConfig_SubDAGPropagation|TestCancellation_SubDAG|TestCustomStepTypes_SubDAGBaseConfigPropagation' -count=1
  • make test
  • make lint
  • git diff --check

Summary by cubic

Moved local sub-workflow execution behind a router and LocalCLI runner, removing process management from the executor. Added strict request and workspace validation with cleaner cancellation mapping for safer child runs.

  • Refactors

    • Added internal/subflow router that selects the first runner that accepts a request.
    • Implemented subflow.NewLocalCLI(); internal/runtime/executor no longer spawns/tracks OS processes.
    • Forwarded cancellation intent (graceful/force) and signals to runners; executor tracks only context cancels.
    • Moved action workspace materialization and trace-context injection into LocalCLI.
    • Updated factories to use subflow.NewRouter(subflow.New(...), subflow.NewLocalCLI()) across context, engine, worker, and tests.
  • Bug Fixes

    • Hardened LocalCLI with validation for DAG location and workspace DAG path (normalized and bounds-checked to prevent traversal).
    • Improved cancellation fallback: if no local process is found, request child cancel via DB; error when run DB is missing.

Written for commit e1a2b56. Summary will update on new commits.

Summary by CodeRabbit

  • New Features
    • Added local CLI execution as an alternative for running child workflows.
  • Improvements
    • Introduced structured cancellation semantics (graceful vs force with signal support).
    • Smarter routing to pick the appropriate sub-workflow execution mode per request.
    • Simplified active-run tracking for more reliable stop/kill behavior.
  • Tests
    • Updated and added tests covering routing, local-run fallback, execution errors, and cancellation scenarios.

coderabbitai Bot commented Jun 1, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 30b981cb-cec3-47c2-bf9a-d0f4252c9fcf

📥 Commits

Reviewing files that changed from the base of the PR and between 2388aba and e1a2b56.

📒 Files selected for processing (5)
  • internal/runtime/executor/dag_runner.go
  • internal/runtime/executor/subworkflow.go
  • internal/subflow/local_cli.go
  • internal/subflow/router_test.go
  • internal/subflow/runner.go
✅ Files skipped from review due to trivial changes (1)
  • internal/subflow/runner.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/runtime/executor/subworkflow.go
  • internal/runtime/executor/dag_runner.go

📝 Walkthrough

This PR introduces a router-based dispatch pattern for child workflow execution, consolidating distributed and local process handling through the SubWorkflowRunner interface. It adds structured cancellation semantics, implements a subprocess-backed local CLI runner, refactors the DAG executor to use runners, and updates wiring across the codebase.

Changes

Subworkflow Dispatch and Execution

| Layer | File(s) | Summary |
|---|---|---|
| Cancellation Contract and Intent Types | internal/runtime/executor/subworkflow.go | Adds SubWorkflowCancelMode (graceful/force), SubWorkflowCancelIntent (mode + signal), and extends SubWorkflowCancelRequest with an Intent field. |
| Router Pattern and Ownership Tracking | internal/subflow/router.go | Implements Router that selects the first matching runner, tracks RunID ownership, and delegates or fans out cancellation based on owner knowledge. |
| Router Tests and Test Doubles | internal/subflow/router_test.go | Tests router selection, fallback to local runner, ownership-aware cancellation, and unknown-owner broadcast; includes stubRunner and blockingRunner test doubles. |
| Local CLI Runner Implementation | internal/subflow/local_cli.go | Implements LocalCLI subprocess runner with workspace materialization, CLI command building, process tracking, env injection, and DB fallback for cancellation/status. |
| DAG Executor Refactoring to SubWorkflowRunner | internal/runtime/executor/dag_runner.go | Refactors SubDAGExecutor to use injected SubWorkflowRunner, replaces process/distributed tracking with activeRuns map, centralizes track/clear lifecycle, and maps termination intents to cancel requests. |
| DAG Runner Tests with Router Pattern | internal/runtime/executor/dag_runner_test.go | Updates tests to verify router injection, activeRuns tracking, and cancellation routed through runner or fallback to DB. |
| Wiring Updates and Documentation | internal/cmd/context.go, internal/engine/engine.go, internal/service/worker/remote_handler.go, internal/test/helper.go, internal/runtime/agent/agent.go, internal/subflow/runner.go | Updates factory methods to return subflow.NewRouter(dispatcher, LocalCLI), and adjusts doc-comments to use “child workflows”. |
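As a rough illustration of the cancellation contract and ownership-aware routing listed above, the sketch below pairs a mode-plus-signal intent with a router that delegates to a known owner and otherwise broadcasts. The type and field names (`CancelIntent.Signal`, `Canceler`, `claim`) are hypothetical; only the mode values "graceful" and "force" are taken from this PR.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type CancelMode string

const (
	CancelModeGraceful CancelMode = "graceful"
	CancelModeForce    CancelMode = "force"
)

// CancelIntent pairs a mode with a signal name (field names are assumptions).
type CancelIntent struct {
	Mode   CancelMode
	Signal string
}

type CancelRequest struct {
	RunID  string
	Intent CancelIntent
}

// Canceler is a hypothetical slice of the runner interface.
type Canceler interface {
	Cancel(req CancelRequest) error
}

// Router remembers which runner owns each run ID.
type Router struct {
	mu      sync.Mutex
	runners []Canceler
	owners  map[string]Canceler
}

func NewRouter(runners ...Canceler) *Router {
	return &Router{runners: runners, owners: map[string]Canceler{}}
}

// claim records ownership; in this sketch it is called at dispatch time.
func (r *Router) claim(runID string, owner Canceler) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.owners[runID] = owner
}

// Cancel delegates to the owner when known, otherwise fans out to all runners.
func (r *Router) Cancel(req CancelRequest) error {
	r.mu.Lock()
	owner, known := r.owners[req.RunID]
	r.mu.Unlock()
	if known {
		return owner.Cancel(req)
	}
	var errs []error
	for _, runner := range r.runners {
		if err := runner.Cancel(req); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...)
}

// recorder is a test double that remembers the requests it received.
type recorder struct{ seen []CancelRequest }

func (r *recorder) Cancel(req CancelRequest) error {
	r.seen = append(r.seen, req)
	return nil
}

func main() {
	distributed, local := &recorder{}, &recorder{}
	router := NewRouter(distributed, local)
	router.claim("run-1", local)

	_ = router.Cancel(CancelRequest{RunID: "run-1", Intent: CancelIntent{Mode: CancelModeForce, Signal: "SIGKILL"}})
	_ = router.Cancel(CancelRequest{RunID: "run-2", Intent: CancelIntent{Mode: CancelModeGraceful, Signal: "SIGTERM"}})
	fmt.Println(len(distributed.seen), len(local.seen))
}
```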

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 5.26% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'refactor: move local subworkflows behind runner' accurately and concisely describes the main change: relocating local CLI subworkflow execution into a router abstraction. |
| Description check | ✅ Passed | The PR description covers all required template sections: summary explains the key changes, changes are listed, testing is documented, and a checklist is referenced via the auto-generated summary. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai Bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/runtime/executor/dag_runner.go (1)

319-379: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Close killed before snapshotting activeRuns.

Stop snapshots activeRuns and only closes e.killed afterward. A concurrent Execute/Retry can register a new run after that snapshot, pass cancellationErr, and reach Run without ever being canceled by this stop request. Gate new dispatches first, then collect/cancel the active runs.

Suggested fix
 func (e *SubDAGExecutor) Stop(intent cmdutil.TerminationIntent) error {
+	e.cancelOnce.Do(func() {
+		close(e.killed)
+	})
+
 	type activeRun struct {
 		runID  string
 		cancel context.CancelFunc
 	}
 
 	e.mu.Lock()
 	activeRuns := make([]activeRun, 0, len(e.activeRuns))
 	for runID, cancel := range e.activeRuns {
 		activeRuns = append(activeRuns, activeRun{
 			runID:  runID,
 			cancel: cancel,
 		})
 	}
 	e.mu.Unlock()
@@
-	e.cancelOnce.Do(func() {
-		close(e.killed)
-	})
-
 	return errors.Join(errs...)
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/runtime/executor/dag_runner.go` around lines 319 - 379, Modify the
stop flow to gate new dispatches by closing e.killed (using e.cancelOnce.Do to
close the channel) before snapshotting e.activeRuns so no new runs can register
after the stop begins; specifically, call e.cancelOnce.Do(func(){
close(e.killed) }) at the start of the method, then lock e.mu and copy
e.activeRuns into the local activeRuns slice, unlock, and proceed to iterate and
cancel each run (calling subWorkflowRunner.Cancel, dagCtx.DB.RequestChildCancel,
and run.cancel as currently implemented), collecting errs and returning
errors.Join(errs...).
🧹 Nitpick comments (1)
internal/runtime/executor/subworkflow.go (1)

46-49: 💤 Low value

Add doc comments for SubWorkflowCancelModeGraceful / SubWorkflowCancelModeForce (consistency).

These constants are exported but lack doc comments; revive’s exported rule isn’t enabled in the current .golangci.yml configuration, so this likely won’t fail CI.

♻️ Proposed doc comments
 const (
+	// SubWorkflowCancelModeGraceful requests a graceful stop of the child workflow.
 	SubWorkflowCancelModeGraceful SubWorkflowCancelMode = "graceful"
+	// SubWorkflowCancelModeForce requests a forced stop of the child workflow.
 	SubWorkflowCancelModeForce    SubWorkflowCancelMode = "force"
 )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/runtime/executor/subworkflow.go` around lines 46 - 49, Add proper
doc comments for the exported constants SubWorkflowCancelModeGraceful and
SubWorkflowCancelModeForce: place a sentence starting with the constant name
that describes what the mode means/does (e.g., graceful cancels allowing
cleanup, force cancels immediately), matching the style of other exported docs
in the package and ensuring both constants have GoDoc comments for consistency.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/subflow/local_cli.go`:
- Around line 132-142: validateLocalRequest currently doesn't check that
req.DAG.Location is set, causing downstream malformed CLI target errors; update
validateLocalRequest (which takes executor.SubWorkflowRequest) to return a clear
error when req.DAG is non-nil but req.DAG.Location is empty (e.g., treat it like
a required field similar to RunID/RootDAGRun checks), adding a new error
constant or reusing an appropriate existing one and returning it from
validateLocalRequest so callers fail fast with a descriptive message.
- Around line 285-290: materializeLocalWorkspace dereferences
req.Workspace.Descriptor without checking for nil which can panic; before
calling workspacebundle.Extract and before building target with
filepath.FromSlash(req.Workspace.Descriptor.DAGPath) add a nil-check for
req.Workspace and req.Workspace.Descriptor and return a descriptive error
(including req.RunID) if nil (e.g., "missing workspace descriptor for run %q"),
then proceed to call workspacebundle.Extract and construct target only when
Descriptor is non-nil.
- Around line 289-290: The target path creation using filepath.Join(dest,
filepath.FromSlash(req.Workspace.Descriptor.DAGPath)) can be escaped by absolute
paths or `..` segments in req.Workspace.Descriptor.DAGPath; sanitize and
validate the DAGPath before joining: normalize with filepath.FromSlash and
filepath.Clean, reject or strip any leading path separators or absolute paths
and reject any path that is "." or starts with "..", then join with dest and
verify the result is within dest (use filepath.Rel to ensure the relative path
does not begin with ".."); if validation fails return an error instead of
returning a target outside dest.

---

Outside diff comments:
In `@internal/runtime/executor/dag_runner.go`:
- Around line 319-379: Modify the stop flow to gate new dispatches by closing
e.killed (using e.cancelOnce.Do to close the channel) before snapshotting
e.activeRuns so no new runs can register after the stop begins; specifically,
call e.cancelOnce.Do(func(){ close(e.killed) }) at the start of the method, then
lock e.mu and copy e.activeRuns into the local activeRuns slice, unlock, and
proceed to iterate and cancel each run (calling subWorkflowRunner.Cancel,
dagCtx.DB.RequestChildCancel, and run.cancel as currently implemented),
collecting errs and returning errors.Join(errs...).

---

Nitpick comments:
In `@internal/runtime/executor/subworkflow.go`:
- Around line 46-49: Add proper doc comments for the exported constants
SubWorkflowCancelModeGraceful and SubWorkflowCancelModeForce: place a sentence
starting with the constant name that describes what the mode means/does (e.g.,
graceful cancels allowing cleanup, force cancels immediately), matching the
style of other exported docs in the package and ensuring both constants have
GoDoc comments for consistency.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4841d24c-f3c8-4a5a-a3a0-c684265f9115

📥 Commits

Reviewing files that changed from the base of the PR and between a4344a2 and 2388aba.

📒 Files selected for processing (12)
  • internal/cmd/context.go
  • internal/engine/engine.go
  • internal/runtime/agent/agent.go
  • internal/runtime/executor/dag_runner.go
  • internal/runtime/executor/dag_runner_test.go
  • internal/runtime/executor/subworkflow.go
  • internal/service/worker/remote_handler.go
  • internal/subflow/local_cli.go
  • internal/subflow/router.go
  • internal/subflow/router_test.go
  • internal/subflow/runner.go
  • internal/test/helper.go

Comment thread internal/subflow/local_cli.go
Comment thread internal/subflow/local_cli.go Outdated
Comment on lines +285 to +290
	if err := workspacebundle.Extract(req.Workspace.Archive, dest, req.Workspace.Descriptor, workspacebundle.DefaultLimits()); err != nil {
		cleanup()
		return "", "", nil, fmt.Errorf("materialize action workspace for run %q: %w", req.RunID, err)
	}
	target := filepath.Join(dest, filepath.FromSlash(req.Workspace.Descriptor.DAGPath))
	return dest, target, cleanup, nil

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Guard against nil Workspace.Descriptor before dereference.

materializeLocalWorkspace dereferences req.Workspace.Descriptor without a nil check, which can panic on malformed requests.

Suggested fix
 func validateLocalRequest(req executor.SubWorkflowRequest) error {
 	if req.DAG == nil {
 		return errMissingChildDAG
 	}
 	if req.RunID == "" {
 		return errRunIDNotSet
 	}
 	if req.RootDAGRun.Zero() {
 		return errRootRunNotSet
 	}
+	if req.Workspace != nil && req.Workspace.Descriptor == nil {
+		return fmt.Errorf("workspace descriptor is required")
+	}
 	return nil
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-	if err := workspacebundle.Extract(req.Workspace.Archive, dest, req.Workspace.Descriptor, workspacebundle.DefaultLimits()); err != nil {
-		cleanup()
-		return "", "", nil, fmt.Errorf("materialize action workspace for run %q: %w", req.RunID, err)
-	}
-	target := filepath.Join(dest, filepath.FromSlash(req.Workspace.Descriptor.DAGPath))
-	return dest, target, cleanup, nil
+func validateLocalRequest(req executor.SubWorkflowRequest) error {
+	if req.DAG == nil {
+		return errMissingChildDAG
+	}
+	if req.RunID == "" {
+		return errRunIDNotSet
+	}
+	if req.RootDAGRun.Zero() {
+		return errRootRunNotSet
+	}
+	if req.Workspace != nil && req.Workspace.Descriptor == nil {
+		return fmt.Errorf("workspace descriptor is required")
+	}
+	return nil
+}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/subflow/local_cli.go` around lines 285 - 290,
materializeLocalWorkspace dereferences req.Workspace.Descriptor without checking
for nil which can panic; before calling workspacebundle.Extract and before
building target with filepath.FromSlash(req.Workspace.Descriptor.DAGPath) add a
nil-check for req.Workspace and req.Workspace.Descriptor and return a
descriptive error (including req.RunID) if nil (e.g., "missing workspace
descriptor for run %q"), then proceed to call workspacebundle.Extract and
construct target only when Descriptor is non-nil.

Comment thread internal/subflow/local_cli.go Outdated

cubic-dev-ai Bot left a comment

1 issue found across 12 files

Comment thread internal/subflow/local_cli.go Outdated
yohamta0 (Collaborator, Author) commented Jun 1, 2026

@coderabbitai review

coderabbitai Bot commented Jun 1, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

yohamta0 merged commit dd88f53 into main on Jun 1, 2026
11 checks passed
yohamta0 deleted the refactor-inproc-subworkflow-runner branch on June 1, 2026 15:35