awf-project · pocky · Jun 4, 2026 · Jun 3, 2026
diff --git a/.zpm/kb/pr_feature_f103_codex_provider_jsonl_output_parity/journal.wal b/.zpm/kb/pr_feature_f103_codex_provider_jsonl_output_parity/journal.wal
diff --git a/.zpm/kb/pr_feature_f103_codex_provider_jsonl_output_parity/knowledge.pl b/.zpm/kb/pr_feature_f103_codex_provider_jsonl_output_parity/knowledge.pl
@@ -0,0 +1,62 @@
+:- module(pr_feature_f103_codex_provider_jsonl_output_parity, []).
+% ─── PR Tracking Schema ──────────────────────────────────────────────────────
+% Memory segment: pr_<branch>
+% Lifecycle: created at implement start, gated before commit, archived on merge.
+%
+% Facts (asserted by scan scripts and LLM):
+%   pr_file(Path, ChangeType)        — file in PR scope (changed | added | test)
+%   todo(Id, File, Line, Desc)       — TODO/FIXME found in changed code
+%   stub(Id, File, Symbol)           — stub/placeholder implementation
+%   mock(Id, File, Symbol)           — mock that should be replaced with real impl
+%   not_impl(Id, File, Desc)         — "not yet implemented" marker
+%   resolved(Type, Id)               — marks a tracked issue as resolved
+%
+% Dynamic declarations (required by Trealla Prolog for runtime assertion).
+:- dynamic(pr_file/2).
+:- dynamic(todo/4).
+:- dynamic(stub/3).
+:- dynamic(mock/3).
+:- dynamic(not_impl/3).
+:- dynamic((resolved/2, test_file/2)).  % test_file/2 is queried by coverage_gap/1
+
+% ─── Unresolved queries ─────────────────────────────────────────────────────
+% Convenience predicates for querying unresolved issues by type.
+unresolved_todo(Id, File, Line, Desc) :-
+    todo(Id, File, Line, Desc), \+ resolved(todo, Id).
+unresolved_stub(Id, File, Symbol) :-
+    stub(Id, File, Symbol), \+ resolved(stub, Id).
+unresolved_mock(Id, File, Symbol) :-
+    mock(Id, File, Symbol), \+ resolved(mock, Id).
+unresolved_not_impl(Id, File, Desc) :-
+    not_impl(Id, File, Desc), \+ resolved(not_impl, Id).
+
+% A blocking issue is any tracked issue that has not been resolved.
+blocking_issue(Id, todo, File, Desc) :-
+    todo(Id, File, _, Desc), \+ resolved(todo, Id).
+blocking_issue(Id, stub, File, Symbol) :-
+    stub(Id, File, Symbol), \+ resolved(stub, Id).
+blocking_issue(Id, mock, File, Symbol) :-
+    mock(Id, File, Symbol), \+ resolved(mock, Id).
+blocking_issue(Id, not_impl, File, Desc) :-
+    not_impl(Id, File, Desc), \+ resolved(not_impl, Id).
+
+% PR is ready ONLY when zero blocking issues remain.
+pr_ready :- \+ blocking_issue(_, _, _, _).
+
+% Health summary — counts by category.
+pr_health(blocking, N) :-
+    findall(I, blocking_issue(I, _, _, _), L), length(L, N).
+pr_health(resolved, N) :-
+    findall(I, resolved(_, I), L), length(L, N).
+pr_health(files, N) :-
+    findall(F, pr_file(F, _), L), length(L, N).
+
+% Coverage gap: source file changed without corresponding test file.
+coverage_gap(File) :-
+    pr_file(File, changed),
+    \+ pr_file(File, test),
+    \+ test_file(File, _).
+
+% Collect all blocking issues as blocker(Id, Type, File, Desc) terms.
+all_blockers(Blockers) :-
+    findall(blocker(Id, Type, File, Desc), blocking_issue(Id, Type, File, Desc), Blockers).
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,6 +7,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Fixed
+
+- **F103**: Codex provider output parity. `state.Output` and `ConversationResult.Output` for the `codex` provider now contain clean aggregated assistant text instead of raw NDJSON, regardless of the `output_format` value (`json`, `stream-json`, `text`, or absent). This closes the gap left by B015 in 0.7.1, which covered only Claude, Gemini, and OpenCode. Aggregation is presence-aware: when the NDJSON stream contains at least one `assistant_message` event, the extracted text overwrites `Output` (an empty `assistant_message` yields `Output == ""`); when no such event is parsed (plain-text mocks, or streams containing only pre-result lifecycle events), the base output is preserved. `state.Response` and `ConversationResult.Response` are now populated via `tryParseJSONResponse` when the aggregated assistant text is a valid JSON object, matching Gemini/OpenCode semantics. The estimated token count is recomputed from the extracted text rather than from the raw NDJSON length. Conversation parity is achieved through a Codex-targeted override in `ExecuteConversation` plus an `extractTextContent` hook wired into `cliProviderHooks`; the base provider and the other CLI providers (Claude, Gemini, OpenCode, GitHub Copilot) behave exactly as before. NUL bytes, multi-event streams, empty assistant messages, and non-JSON plain text are all handled without panic or truncation. Workflow authors can now reference `{{.states.codex_step.Output}}` and `{{.states.codex_step.Response}}` with the same semantics as other providers, unblocking multi-step workflows that pipe Codex output downstream.
+
 ## [0.10.0] - 2026-06-01
 
 ### Changed

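The presence-aware aggregation rule in the F103 changelog entry above can be illustrated with a minimal Go sketch. It is not the project's implementation: the `event` struct, its `text` field, the function names, and the choice to concatenate multiple events without a separator are all assumptions made for illustration. Only the `assistant_message` event name and the overwrite-or-preserve rule come from the entry.

```go
package main

import (
	"encoding/json"
	"strings"
)

// event is a hypothetical shape for one NDJSON line; the real Codex
// event schema may use different field names.
type event struct {
	Type string `json:"type"`
	Text string `json:"text"`
}

// aggregateAssistantText concatenates the text of every assistant_message
// event in raw and reports whether at least one such event was present.
func aggregateAssistantText(raw string) (string, bool) {
	var sb strings.Builder
	found := false
	for _, line := range strings.Split(raw, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		var ev event
		if err := json.Unmarshal([]byte(line), &ev); err != nil {
			continue // not a JSON event line (e.g. plain-text mock output)
		}
		if ev.Type != "assistant_message" {
			continue // lifecycle or other non-assistant event
		}
		found = true
		sb.WriteString(ev.Text)
	}
	return sb.String(), found
}

// resolveOutput applies the presence-aware rule: extracted text overwrites
// the base output only when an assistant_message event was seen, so an
// empty message yields "" while event-free input is returned unchanged.
func resolveOutput(base string) string {
	if text, ok := aggregateAssistantText(base); ok {
		return text
	}
	return base
}
```

Under these assumptions, `resolveOutput` returns the base string untouched for plain-text input and for streams that contain only lifecycle events, which is the distinction the entry draws between "no events parsed" and "an empty assistant message".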
diff --git a/README.md b/README.md
@@ -10,7 +10,7 @@ A Go CLI tool for orchestrating AI agents (Claude, Gemini, Codex, GitHub Copilot
 - **State Machine Execution** - Define workflows as state machines with conditional transitions based on exit codes, command output, or custom expressions
 - **Inline Error Handling** - Specify error messages and exit codes directly on steps without creating separate terminal states
 - **Agent Steps** - Invoke AI agents via CLI tools (Claude, Codex, Gemini, GitHub Copilot) or direct HTTP (OpenAI, Ollama, vLLM, Groq) with prompt templates, response parsing, and accurate token tracking
-- **Output Formatting for Agent Steps** - Automatically strip markdown code fences and validate JSON output; human-readable streaming display controlled by `output_format` field (text vs raw NDJSON); unified display-event abstraction across all 6 providers with optional verbose mode showing tool-use markers (`[tool: Name(Arg)]`)
+- **Output Formatting for Agent Steps** - Automatically strip markdown code fences and validate JSON output; human-readable streaming display controlled by `output_format` field (text vs raw NDJSON); unified display-event abstraction across all 6 providers with optional verbose mode showing tool-use markers (`[tool: Name(Arg)]`); automatic `state.Response` population for structured JSON output (heuristic across all providers)
 - **Agent Skills** - Inject deterministic domain knowledge into agent steps via `skills:` declarations in workflow YAML; filesystem-based multi-directory discovery (project `.awf/skills/`, `.agents/skills/`, `.claude/skills/`, XDG global) with priority ordering; SKILL.md frontmatter stripping and agentskills.io-compliant `<skill_content>` structured wrapping with bundled resource enumeration; validated by `awf validate`
 - **Agent Roles** - Inject reusable personas into agent steps via `role:` field referencing AGENTS.md files; filesystem-based multi-directory discovery in a dedicated `roles/` namespace (project `.awf/roles/`, `.agents/roles/`, `$XDG_CONFIG_HOME/awf/roles/`, `~/.agents/roles/`) with priority ordering and `AWF_ROLES_PATH` exclusive override; AGENTS.md frontmatter stripping and system prompt injection with optional composition via inline `system_prompt` field; validated by `awf validate`
 - **External Prompt Files** - Load agent prompts from `.md` files with full template interpolation, helper functions, and local override support

diff --git a/docs/reference/interpolation.md b/docs/reference/interpolation.md
@@ -159,35 +159,44 @@ transitions:
 
 **Note**: `TokensUsed` replaced deprecated `states.step_name.Tokens` field. If migrating from earlier versions, update workflow YAML expressions from `{{.states.step_name.Tokens}}` to `{{.states.step_name.TokensUsed}}`.
 
-#### Response (Operation Outputs)
+#### Response (Structured Outputs)
 
-Operation steps (e.g., `github.get_issue`, `http.request`) return structured data accessible via `Response`:
+Both agent steps and operation steps return structured data accessible via `Response` when the output is valid JSON.
+
+**Agent steps** automatically populate `Response` when the agent's output is a valid JSON object (regardless of the `output_format` setting):
+
+```yaml
+{{.states.step_name.Response.field}}        # Parsed field from agent JSON output
+{{.states.step_name.Response}}              # Full parsed JSON object
+```
+
+**Operation steps** (e.g., `github.get_issue`, `http.request`) always return structured data via `Response`:
 
 ```yaml
 {{.states.step_name.Response.title}}       # Parsed field from operation result
 {{.states.step_name.Response.number}}      # Numeric field
 {{.states.step_name.Response.labels}}      # Array field
 ```
 
-Use `Output` for raw JSON, `Response.field` for parsed fields:
+Example with agent step returning JSON:
 
 ```yaml
 states:
-  initial: get_issue
-
-  get_issue:
-    type: operation
-    operation: github.get_issue
-    inputs:
-      number: "{{.inputs.issue_number}}"
-    on_success: show_title
+  initial: analyze
+
+  analyze:
+    type: agent
+    provider: claude
+    prompt: "Return JSON analysis with 'issues' and 'severity' fields"
+    on_success: process
     on_failure: error
 
-  show_title:
+  process:
     type: step
-    command: echo "Issue: {{.states.get_issue.Response.title}}"
+    command: |
+      echo "Severity: {{.states.analyze.Response.severity}}"
+      echo "Issues: {{.states.analyze.Response.issues}}"
     on_success: done
-    on_failure: error
 
   done:
     type: terminal
@@ -196,6 +205,15 @@ states:
     status: failure
 ```
 
+If agent returns:
+```json
+{"issues": ["buffer overflow", "memory leak"], "severity": "high"}
+```
+
+Then:
+- `{{.states.analyze.Response.severity}}` = `"high"`
+- `{{.states.analyze.Response.issues}}` = `["buffer overflow", "memory leak"]`
+
 **HTTP operation outputs** follow the same pattern:
 
 ```yaml
@@ -205,7 +223,12 @@ states:
 {{.states.step_name.Response.body_truncated}}    # true if body was truncated
 ```
 
-See [Workflow Syntax - Operation State](../user-guide/workflow-syntax.md#operation-state) for the full list of available operations and their output fields.
+**Difference from `JSON`:**
+- `Response` is populated automatically for all agent outputs if valid JSON is detected (heuristic)
+- `JSON` is only populated when `output_format: json` is explicitly set on the agent step
+- Use `Response.field` for automatic best-effort parsing; use `JSON.field` for explicit, validated JSON output
+
+See [Agent Steps - Output Formatting](../user-guide/agent-steps.md#output-formatting) for examples and [Workflow Syntax - Operation State](../user-guide/workflow-syntax.md#operation-state) for available operations.
 
 #### JSON (Explicit Output Formatting)
 
@@ -217,9 +240,13 @@ When an agent step uses `output_format: json`, the parsed JSON is accessible via
 ```
 
 **Key differences from `Response`:**
-- `JSON` is only populated when `output_format: json` is explicitly set on the agent step
-- `Response` is populated automatically for all agent outputs if valid JSON is detected (heuristic)
-- `JSON` represents explicitly formatted output; `Response` is automatic best-effort parsing
+- `JSON` is only populated when `output_format: json` is explicitly set on the agent step (and validation passes)
+- `Response` is populated automatically for all agent outputs if valid JSON is detected (heuristic, regardless of `output_format`)
+- Use `JSON.field` for strict, validated output; use `Response.field` for lenient, automatic parsing
+
+**When to use each:**
+- **`JSON.field`**: You explicitly requested `output_format: json` and want strict validation
+- **`Response.field`**: You want to access JSON from any agent output without requiring `output_format: json`
 
 Example with `output_format: json`:
 
@@ -238,8 +265,9 @@ states:
   process:
     type: step
     command: |
-      echo "Severity: {{.states.analyze.JSON.severity}}"
-      echo "Issues: {{.states.analyze.JSON.issues}}"
+      # Both work — JSON is strict, Response is lenient
+      echo "Severity (JSON): {{.states.analyze.JSON.severity}}"
+      echo "Severity (Response): {{.states.analyze.Response.severity}}"
     on_success: done
 
   done:
@@ -254,9 +282,14 @@ If agent returns:
 {"issues": ["buffer overflow", "memory leak"], "severity": "high"}
 ```
 
-Then:
+Then both work identically:
 - `{{.states.analyze.JSON.severity}}` = `"high"`
-- `{{.states.analyze.JSON.issues}}` = `["buffer overflow", "memory leak"]`
+- `{{.states.analyze.Response.severity}}` = `"high"`
+
+**The difference becomes apparent when `output_format: json` is omitted:**
+- Without `output_format: json`, JSON validation is skipped
+- `JSON` remains empty (never populated)
+- `Response` still populates if valid JSON is detected
 
 See [Agent Steps - Output Formatting](../user-guide/agent-steps.md#output-formatting) for detailed examples and best practices.
 

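The lenient `Response` heuristic documented in the interpolation reference above can be pictured with a short Go sketch. The function name `parseObjectResponse` is hypothetical and is not the project's `tryParseJSONResponse`; the sketch assumes only what the docs state, namely that `Response` is populated when the assistant text is a valid JSON object and is left empty otherwise.

```go
package main

import (
	"encoding/json"
	"strings"
)

// parseObjectResponse is a hypothetical stand-in for the Response heuristic:
// it succeeds only when text, after trimming whitespace, is a single valid
// JSON object. Arrays, scalars, and plain text are rejected.
func parseObjectResponse(text string) (map[string]any, bool) {
	trimmed := strings.TrimSpace(text)
	if !strings.HasPrefix(trimmed, "{") {
		return nil, false // cheap pre-check: not an object
	}
	var obj map[string]any
	if err := json.Unmarshal([]byte(trimmed), &obj); err != nil {
		return nil, false // malformed JSON or trailing garbage
	}
	return obj, true
}
```

With the example agent output from the docs, the returned map would expose `severity` and `issues` as fields, mirroring `{{.states.analyze.Response.severity}}`; for non-JSON output such as generated source code the function reports `false`, corresponding to an unpopulated `Response`.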
diff --git a/docs/user-guide/agent-steps.md b/docs/user-guide/agent-steps.md
@@ -1132,8 +1132,8 @@ Agent responses are automatically captured in the execution state:
 
 | Field | Type | Description |
 |-------|------|-------------|
-| `{{.states.step_name.Output}}` | string | Raw response text (or cleaned text if `output_format` is set) |
-| `{{.states.step_name.Response}}` | object | Parsed JSON response (automatic heuristic) |
+| `{{.states.step_name.Output}}` | string | Aggregated assistant text. For CLI providers emitting NDJSON (Claude, Codex, Gemini, OpenCode, GitHub Copilot), the raw stream is extracted to clean text regardless of `output_format`; for HTTP providers, the response body text. `output_format: json` additionally strips markdown code fences. |
+| `{{.states.step_name.Response}}` | object | Parsed JSON response (automatic heuristic — populated when the assistant text is a valid JSON object) |
 | `{{.states.step_name.JSON}}` | object | Parsed JSON from `output_format: json` (explicit, see [Output Formatting](#output-formatting)) |
 | `{{.states.step_name.TokensUsed}}` | int | Total tokens consumed by this step |
 | `{{.states.step_name.TokensInput}}` | int | Input tokens (prompt + context). `0` in single-turn mode. |
@@ -1165,10 +1165,11 @@ process_response:
 
 ## Output Formatting
 
-The `output_format` field serves two purposes:
+The `output_format` field serves three purposes:
 
-1. **Post-processing**: Strips markdown code fences and optionally validates JSON (F065)
-2. **Display filtering**: Controls how agent responses appear on terminal during streaming and buffered execution, with optional verbose tool-use markers (F082, F085)
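+1. **Text Extraction**: For CLI providers, clean assistant text is extracted from the streaming event log into `state.Output`; this happens for every `output_format` value, including when the field is omitted (F103, F082)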
+2. **Post-processing**: Strips markdown code fences and optionally validates JSON (F065)
+3. **Display filtering**: Controls how agent responses appear on terminal during streaming and buffered execution, with optional verbose tool-use markers (F082, F085)
 
 When an agent wraps its output in markdown code fences (common with many LLMs), use `output_format` to automatically strip the fences and optionally validate the content:
 
@@ -1210,6 +1211,7 @@ process_results:
 - Strips outermost markdown code fences (e.g., ````json ... ``` ``)
 - Validates stripped content as valid JSON
 - Stores parsed JSON in `{{.states.step_name.JSON}}`
+- Automatically populates `{{.states.step_name.Response}}` with parsed JSON (available across all providers)
 - If validation fails, step fails with a descriptive error
 - Works with both objects and arrays
 
@@ -1225,6 +1227,8 @@ The analysis shows the following:
 - `{{.states.analyze.Output}}` = `{"issues": ["buffer overflow", "memory leak"], "severity": "high"}`
 - `{{.states.analyze.JSON.issues}}` = `["buffer overflow", "memory leak"]`
 - `{{.states.analyze.JSON.severity}}` = `"high"`
+- `{{.states.analyze.Response.issues}}` = `["buffer overflow", "memory leak"]` (same as JSON)
+- `{{.states.analyze.Response.severity}}` = `"high"` (same as JSON)
 
 #### `text` Format
 
@@ -1248,6 +1252,7 @@ save_code:
 - Strips outermost markdown code fences (e.g., ````python ... ``` ``)
 - Returns clean text in `{{.states.step_name.Output}}`
 - Does not populate `{{.states.step_name.JSON}}`
+- Automatically populates `{{.states.step_name.Response}}` if the output happens to be valid JSON (heuristic)
 
 **Example agent output:**
 ```
@@ -1262,6 +1267,8 @@ def fibonacci(n):
 
 **After processing:**
 - `{{.states.generate_code.Output}}` = `def fibonacci(n):\n    if n <= 1:\n        return n\n    return fibonacci(n-1) + fibonacci(n-2)`
+- `{{.states.generate_code.JSON}}` = empty/not populated (no `output_format: json`)
+- `{{.states.generate_code.Response}}` = populated only if output happens to be valid JSON
 
 #### No Format (Default)
 
@@ -1279,10 +1286,15 @@ analyze:
 
 The `output_format` field controls how agent responses appear on the terminal (F082). Additionally, the `--verbose` flag displays tool-use activity markers (F085) — showing which tools the agent invoked — alongside agent output when running with `awf run --output streaming` or `--output buffered`:
 
-| `output_format` | Streaming Display | Buffered Display | Raw Storage |
+| `output_format` | Streaming Display | Buffered Display | `state.Output` Storage |
 |---|---|---|---|
-| `text` (or omitted) | Human-readable filtered text | Filtered text in summary | Raw NDJSON |
-| `json` | Raw NDJSON (unfiltered) | Raw NDJSON (unfiltered) | Raw NDJSON |
+| `text` (or omitted) | Human-readable filtered text | Filtered text in summary | Extracted assistant text |
+| `json` | Raw NDJSON (unfiltered) | Raw NDJSON (unfiltered) | Extracted assistant text (markdown code fences stripped) |
+
+**Output Parity Across All Providers (F103):** For CLI-based providers (Claude, Codex, Gemini, OpenCode), the system automatically extracts clean assistant text from NDJSON events and stores it in `state.Output`. Additionally:
+- `state.Response` is automatically populated when the output is valid JSON (heuristic, regardless of `output_format`)
+- With `output_format: json` or with `output_format` omitted, `state.Output` holds extracted assistant text for every provider listed above; `json` additionally strips markdown code fences
+- Codex now has feature parity with Claude, Gemini, and OpenCode for output handling
 
 #### Streaming Mode (`--output streaming`)
 
@@ -1347,10 +1359,10 @@ Silent mode suppresses all display regardless of `output_format`:
 ```bash
 awf run code-review --output silent
 # No output displayed (silent mode is absolute)
-# state.Output still contains raw NDJSON for template interpolation
+# state.Output still contains the aggregated assistant text for template interpolation
 ```
 
-**Note:** `state.Output` always contains the raw NDJSON regardless of display filtering. Filtering only affects terminal display, not data storage.
+**Note:** `state.Output` is populated independently of display filtering. For CLI providers emitting NDJSON (Claude, Codex, Gemini, OpenCode, GitHub Copilot), the assistant text is extracted from the raw stream and stored, so `{{.states.step_name.Output}}` resolves to clean text regardless of `output_format`. Filtering only affects terminal display, not data storage.
 
 #### Provider Event Cadence