feat(runtime): replan oversized tasks after repeated recovery failures #104

Description

@jafreck

Problem

Some Phase 4 tasks appear too large for a single code-migrator invocation to converge, even with failure recovery loops (failure-adjudicator -> code-migrator -> parity/build/test re-check). In these cases, retries can repeat without materially reducing the unresolved issue set.

Recent behavior shows task-001 repeatedly failing parity with non-minor issues despite multiple recovery attempts, eventually ending in terminal exhaustion.

Goal

Add an optional replanning stage that triggers after repeated recovery failures, so AAMF can split or re-scope an oversized task instead of repeatedly remediating the same broad unit.

Proposed behavior

When a task has gone through repeated recovery attempts (e.g., in the parity/build/test recovery loop) and unresolved non-minor issues persist:

  1. Detect likely non-convergence (same/similar unresolved issue set across attempts).
  2. Trigger replanning for the failing task scope (behind a config flag).
  3. Produce replacement sub-tasks (narrower line ranges / smaller file subsets).
  4. Inject sub-tasks into the Phase 4 queue with dependency-safe rewiring.
  5. Continue migration from the newly decomposed tasks.
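A minimal sketch of step 4, to make "dependency-safe rewiring" concrete. The `Task` shape and `replaceWithSubtasks` are hypothetical names, not existing AAMF APIs. It assumes each sub-task inherits the parent's upstream dependencies, that the sub-tasks are independent of one another, and that anything that depended on the parent must now wait for all sub-tasks.

```typescript
// Hypothetical queue entry; the real Phase 4 task type may carry more fields.
interface Task {
  id: string;
  dependsOn: string[];
}

// Returns a new queue in which `parentId` is replaced by its sub-tasks.
// The input queue is not mutated.
function replaceWithSubtasks(
  queue: Task[],
  parentId: string,
  subtaskIds: string[],
): Task[] {
  const parent = queue.filter((t) => t.id === parentId)[0];
  if (!parent) throw new Error("unknown task: " + parentId);
  if (subtaskIds.length === 0) {
    throw new Error("replanning produced no sub-tasks");
  }

  const result: Task[] = [];
  for (const task of queue) {
    if (task.id === parentId) {
      // Sub-tasks take the parent's slot and inherit its upstream dependencies.
      for (const id of subtaskIds) {
        result.push({ id: id, dependsOn: parent.dependsOn.slice() });
      }
    } else {
      // Downstream tasks that waited on the parent now wait on every sub-task.
      const rewired: string[] = [];
      for (const dep of task.dependsOn) {
        if (dep === parentId) {
          for (const id of subtaskIds) rewired.push(id);
        } else {
          rewired.push(dep);
        }
      }
      result.push({ id: task.id, dependsOn: rewired });
    }
  }
  return result;
}
```

Returning a new array rather than mutating in place keeps the pre-replan queue available for checkpointing, which may help with the resume requirement.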

Initial trigger criteria (proposal)

  • Recovery attempts >= configurable threshold (default 2 or 3)
  • AND unresolved non-minor issues remain
  • AND issue-set delta indicates low progress (high overlap with prior attempt)
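The three criteria above could be checked roughly as follows. This is a sketch under assumptions: the `Issue` shape, the stable `id` fingerprint, and the function names are hypothetical, and Jaccard similarity is just one possible way to measure "high overlap with prior attempt"; the proposal does not prescribe a metric.

```typescript
// Hypothetical shapes; this issue does not define them.
interface Issue {
  id: string; // assumed stable fingerprint for an issue across attempts
  severity: "minor" | "non-minor";
}

interface TriggerConfig {
  triggerAttempts: number; // options.replanning.triggerAttempts
  minIssueOverlapForTrigger: number; // assumed to be a ratio in [0, 1]
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|. Two empty sets count as identical.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  a.forEach((id) => {
    if (b.has(id)) shared += 1;
  });
  return shared / (a.size + b.size - shared);
}

function nonMinorIds(issues: Issue[]): Set<string> {
  const ids = new Set<string>();
  for (const issue of issues) {
    if (issue.severity === "non-minor") ids.add(issue.id);
  }
  return ids;
}

// attempts[k] holds the issues still unresolved after recovery attempt k + 1.
function shouldReplan(attempts: Issue[][], cfg: TriggerConfig): boolean {
  // Criterion 1: enough attempts (and at least two, so there is a prior set).
  if (attempts.length < cfg.triggerAttempts || attempts.length < 2) return false;

  // Criterion 2: unresolved non-minor issues remain.
  const current = nonMinorIds(attempts[attempts.length - 1]);
  if (current.size === 0) return false;

  // Criterion 3: the latest issue set overlaps heavily with the previous one.
  const previous = nonMinorIds(attempts[attempts.length - 2]);
  return jaccard(current, previous) >= cfg.minIssueOverlapForTrigger;
}
```

Comparing only the last two attempts keeps the check cheap; comparing against the first attempt instead would catch slow drift, at the cost of triggering more often.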

Config proposal

Add options such as:

  • options.replanning.enabled: boolean (default false)
  • options.replanning.triggerAttempts: number
  • options.replanning.maxSubtasks: number
  • options.replanning.minIssueOverlapForTrigger: number
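As a sketch, the options might be typed and defaulted like this. Only `enabled: false` comes from the proposal; the other default values are placeholders, `resolveReplanning` is a hypothetical helper, and treating `minIssueOverlapForTrigger` as a ratio in [0, 1] is an assumption.

```typescript
interface ReplanningOptions {
  enabled: boolean;
  triggerAttempts: number;
  maxSubtasks: number;
  minIssueOverlapForTrigger: number;
}

const DEFAULT_REPLANNING: ReplanningOptions = {
  enabled: false, // from the proposal: off by default
  triggerAttempts: 3, // placeholder; the proposal suggests 2 or 3
  maxSubtasks: 4, // placeholder, not specified in the proposal
  minIssueOverlapForTrigger: 0.8, // placeholder; assumed ratio in [0, 1]
};

function isPositiveInt(n: number): boolean {
  return n >= 1 && n % 1 === 0;
}

// Merges user-supplied options over the defaults and validates ranges.
function resolveReplanning(
  user: Partial<ReplanningOptions> = {},
): ReplanningOptions {
  const merged: ReplanningOptions = { ...DEFAULT_REPLANNING, ...user };
  if (!isPositiveInt(merged.triggerAttempts)) {
    throw new RangeError("triggerAttempts must be a positive integer");
  }
  if (!isPositiveInt(merged.maxSubtasks)) {
    throw new RangeError("maxSubtasks must be a positive integer");
  }
  const overlap = merged.minIssueOverlapForTrigger;
  if (!(overlap >= 0 && overlap <= 1)) {
    throw new RangeError("minIssueOverlapForTrigger must be within [0, 1]");
  }
  return merged;
}
```

Calling `resolveReplanning()` with no arguments yields a disabled configuration, which matches the requirement that default behavior stays unchanged.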

Implementation notes

  • Reuse existing task-decomposition contracts where possible (task-decomposer schema).
  • Record replanning events in checkpoint/progress for resume safety and observability.
  • Ensure deterministic IDs for generated sub-tasks (e.g., task-001a, task-001b or stable hashed suffixes).
  • Preserve existing behavior when replanning is disabled.
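A rough illustration of the deterministic-ID and checkpoint notes, using the letter-suffix scheme from the example above. `makeSubtaskIds`, `ReplanEvent`, and `recordReplan` are hypothetical names, and the event fields are assumptions about what a checkpoint entry might need.

```typescript
// Letter suffixes (task-001a, task-001b, ...) depend only on the parent ID
// and position, so a resumed run regenerates exactly the same IDs.
function makeSubtaskIds(parentId: string, count: number): string[] {
  if (count < 1 || count > 26 || count % 1 !== 0) {
    throw new RangeError("count must be an integer between 1 and 26");
  }
  const ids: string[] = [];
  for (let i = 0; i < count; i++) {
    ids.push(parentId + String.fromCharCode(97 + i)); // 97 is "a"
  }
  return ids;
}

// Hypothetical checkpoint entry describing one replanning decision.
interface ReplanEvent {
  kind: "replan";
  parentId: string;
  subtaskIds: string[];
  attempt: number; // recovery attempt at which replanning was triggered
}

// Returns the log with a replan event appended. If the parent was already
// replanned (for example, on resume), the log is returned unchanged.
function recordReplan(
  log: ReplanEvent[],
  parentId: string,
  count: number,
  attempt: number,
): ReplanEvent[] {
  for (const event of log) {
    if (event.parentId === parentId) return log;
  }
  const event: ReplanEvent = {
    kind: "replan",
    parentId: parentId,
    subtaskIds: makeSubtaskIds(parentId, count),
    attempt: attempt,
  };
  return log.concat([event]);
}
```

Making `recordReplan` a no-op for an already-replanned parent also serves as a simple guard against replanning the same original task twice. The 26-sub-task cap is an artifact of the single-letter suffix; a hashed suffix would lift it.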

Acceptance criteria

  • Replanning can be enabled/disabled via config; default behavior unchanged.
  • On repeated recovery non-convergence, a failing task is decomposed into smaller tasks.
  • New tasks are inserted and executed with correct dependency semantics.
  • Checkpoint resume supports runs that have already replanned.
  • Progress/logging clearly indicates replanning trigger and resulting subtasks.
  • Tests cover trigger detection, queue mutation, resume behavior, and disabled mode.

Nice-to-have follow-ups

  • Surface replanning metrics (trigger count, avg subtasks created, convergence delta).
  • Add guardrails to avoid repeated replan loops on the same original task.
