diff --git a/examples/interaction-pattern-catalog-smoke.py b/examples/interaction-pattern-catalog-smoke.py index 0b843e5b..e83d34b2 100644 --- a/examples/interaction-pattern-catalog-smoke.py +++ b/examples/interaction-pattern-catalog-smoke.py @@ -127,6 +127,7 @@ def main() -> int: "handoff_gate_state_projection_gap", "plan_todo_writeback_gap", "connector_runtime_boundary_gap", + "shell_pr_comment_command_substitution", "The correction stayed in chat/model belief", "Agent-scoped quota does not distinguish \"goal has runnable work\" from \"this agent has runnable work\"", "Deferred work was modeled as absence of todo instead of a gate-resume lifecycle condition", @@ -134,6 +135,8 @@ def main() -> int: "`todo_handoff_gate_v0`", "Agent used chat as memory after understanding the plan", "Connector safety lived in prose or posthoc packet fields", + "Markdown backticks were passed through a double-quoted `gh ... --body", + "Use a single-quoted body when safe, stdin or `--body-file`", "User-gate projection treated agent-scoped routing metadata as diagnostic text", "refresh state so `quota should-run` selects the corrected rule", ], diff --git a/skills/loopx-self-repair/references/repair-patterns.md b/skills/loopx-self-repair/references/repair-patterns.md index 9ff5e392..7c50aef4 100644 --- a/skills/loopx-self-repair/references/repair-patterns.md +++ b/skills/loopx-self-repair/references/repair-patterns.md @@ -28,6 +28,7 @@ teaches a reusable control-plane lesson. | `scheduler_liveness_backoff_gap` | `quota should-run` says `should_run=false`, `quiet_noop_allowed=true`, no spend, or `automation_action=keep_active_quiet`, but the host keeps polling at the bootstrap cadence for hours; Codex CLI TUI or Claude Code `/loop` also keeps repeating unchanged checks. | quota payload `automation_liveness`, `interaction_contract`, `heartbeat_recommendation`, host automation cadence, CLI/Claude loop logs or statusline, recent no-spend poll history. | LoopX separated execution permission from user notification, but did not expose a machine-readable next-wakeup/backoff/final-check/exit policy for host runtimes. | Add or repair `quota should-run.scheduler_hint`; host runtimes must apply Codex App cadence backoff and CLI/Claude unchanged-poll final quota/replan check before self-stop, without quota spend. Prompt/docs should treat fixed 3-minute cadence as bootstrap only. | | `scheduler_reset_application_gap` | Codex App heartbeat cadence backs off, but after user feedback, a new/reassigned todo, resolved gate, material transition, or `run_now` quota state, the automation remains at the slower RRULE. | current automation RRULE, quota payload `scheduler_hint.codex_app.recommended_rrule`, `scheduler_hint.reset_policy.reset_token`, `scheduler_hint.reset_policy.codex_app_initial_rrule`, recent user interaction/todo/history transition. | The quota payload exposed reset conditions, but the host or generated prompt did not give a compact reset token plus direct RRULE application path, so workers could lower cadence without knowing when to restore it. | Add `reset_policy.reset_token` and direct Codex App RRULE fields to `scheduler_hint`; make heartbeat prompts/skills tell agents to update automation RRULEs and restore to the profile initial RRULE when the token changes or active work is projected. Cover backoff-to-`run_now` with a smoke. | | `skill_cli_contract_drift` | A skill, installed release snapshot, generated prompt, smoke fixture, or agent handbook suggests a CLI form that the current parser rejects, such as `loopx status --goal-id`. | `loopx --help`, failing command stderr, `rg` over `skills/`, installed skill copies, release snapshot, generated prompts, and relevant smoke fixtures. | Process documentation drifted from the CLI contract, or a fixture preserved old command text without labeling it as legacy/non-executable. | Update the durable skill/source docs and installed skill copy, add or update a focused smoke that rejects the stale command in executable instructions, and keep legacy fixture strings only when the test explicitly asserts they are untrusted historical payload. | +| `shell_pr_comment_command_substitution` | A PR comment, issue body, or review body is posted with shell command substitution artifacts because Markdown backticks were passed through a double-quoted `gh ... --body "..."` argument. | shell command used for the PR/comment operation, rendered public comment body, recent agent transcript, and PR review/comment hygiene instructions. | Agent treated Markdown as inert text inside a shell string; the shell evaluated backticks before `gh` received the body. | Do not pass Markdown-rich bodies through double-quoted shell arguments. Use a single-quoted body when safe, stdin or `--body-file` for multiline text, or a GitHub connector/API call that bypasses shell interpolation; add a smoke or checklist item when the workflow recurs. | | `loop_surface_activation_gap` | A thread reports LoopX setup success because `register-agent` or `quota should-run --agent-id` works, but no Codex App heartbeat automation, Codex CLI `/goal`, or other host loop has been installed for that goal. | `upgrade-plan` / host-loop activation status, `$CODEX_HOME/automations/*/automation.toml`, generated `heartbeat-prompt`, target thread id, recent setup final answer, registry coordination fields. | The control-plane identity layer was treated as the host scheduler layer; setup success criteria stopped at registry/quota instead of proving `host_loop_activation.activated=true` or surfacing a concrete host-tool gate. | Project setup and agent registration must surface `host_loop_activation`; create/update the host loop from generated scoped `heartbeat-prompt`, or report the exact missing host capability. Do not claim "connected" as autonomous until the host loop is active. | | `tiny_turn_under_delivery` | Many heartbeats make only one small doc/prose edit despite quota permitting a coherent batch. | recent history durations, active priority stack, user feedback, todo completion rate. | Agent optimized for avoiding mistakes rather than delivering a bounded batch. | Run steering audit, choose a larger verifiable batch, and record why it remains within boundary. Consider cadence/budget policy updates if repeated. | | `todo_succession_gap` | A non-trivial feature todo is marked done after PR/smoke success, but rollout, product-path audit, docs, telemetry, benchmark proof, or operator-decision follow-up is only mentioned in prose or not recorded at all. | completed todo note/evidence, recent PR/run history, active `Agent Todo`, active `Next Action`, related design docs. | Agent treated an implementation slice as feature completion, or LoopX lacked a reminder to create the successor todo. | Keep the lifecycle simple: do not add many feature states. Complete the old todo and create the next concrete agent/user todo with `--next-agent-todo`/`todo add`, or record a compact no-follow-up rationale. |