Merged
166 changes: 166 additions & 0 deletions .zpm/kb/pr_feature_f103_codex_provider_jsonl_output_parity/journal.wal

@@ -0,0 +1,62 @@
:- module(pr_feature_f103_codex_provider_jsonl_output_parity, []).
% ─── PR Tracking Schema ──────────────────────────────────────────────────────
% Memory segment: pr_<branch>
% Lifecycle: created at implement start, gated before commit, archived on merge.
%
% Facts (asserted by scan scripts and LLM):
% pr_file(Path, ChangeType) — file in PR scope (changed | added | test)
% todo(Id, File, Line, Desc) — TODO/FIXME found in changed code
% stub(Id, File, Symbol) — stub/placeholder implementation
% mock(Id, File, Symbol) — mock that should be replaced with real impl
% not_impl(Id, File, Desc) — "not yet implemented" marker
% resolved(Type, Id) — marks a tracked issue as resolved
%
% Dynamic declarations (required by Trealla Prolog for runtime assertion).
:- dynamic(pr_file/2).
:- dynamic(todo/4).
:- dynamic(stub/3).
:- dynamic(mock/3).
:- dynamic(not_impl/3).
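:- dynamic(resolved/2).
% test_file/2 is queried by coverage_gap/1 but was not declared above;
% declaring it dynamic lets that query fail quietly when no facts exist.
:- dynamic(test_file/2).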

% ─── Unresolved queries ─────────────────────────────────────────────────────
% Convenience predicates for querying unresolved issues by type.
unresolved_todo(Id, File, Line, Desc) :-
    todo(Id, File, Line, Desc), \+ resolved(todo, Id).
unresolved_stub(Id, File, Symbol) :-
    stub(Id, File, Symbol), \+ resolved(stub, Id).
unresolved_mock(Id, File, Symbol) :-
    mock(Id, File, Symbol), \+ resolved(mock, Id).
unresolved_not_impl(Id, File, Desc) :-
    not_impl(Id, File, Desc), \+ resolved(not_impl, Id).

% A blocking issue is any tracked issue that has not been resolved.
blocking_issue(Id, todo, File, Desc) :-
    todo(Id, File, _, Desc), \+ resolved(todo, Id).
blocking_issue(Id, stub, File, Symbol) :-
    stub(Id, File, Symbol), \+ resolved(stub, Id).
blocking_issue(Id, mock, File, Symbol) :-
    mock(Id, File, Symbol), \+ resolved(mock, Id).
blocking_issue(Id, not_impl, File, Desc) :-
    not_impl(Id, File, Desc), \+ resolved(not_impl, Id).

% PR is ready ONLY when zero blocking issues remain.
pr_ready :- \+ blocking_issue(_, _, _, _).

% Health summary — counts by category.
pr_health(blocking, N) :-
    findall(I, blocking_issue(I, _, _, _), L), length(L, N).
pr_health(resolved, N) :-
    findall(I, resolved(_, I), L), length(L, N).
pr_health(files, N) :-
    findall(F, pr_file(F, _), L), length(L, N).

% Coverage gap: source file changed without corresponding test file.
coverage_gap(File) :-
    pr_file(File, changed),
    \+ pr_file(File, test),
    \+ test_file(File, _).

% List all blocking issues as Id-Type-File-Desc tuples.
all_blockers(Blockers) :-
    findall(blocker(Id, Type, File, Desc), blocking_issue(Id, Type, File, Desc), Blockers).
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Fixed

- **F103**: Codex provider output parity. `state.Output` and `ConversationResult.Output` for the `codex` provider now contain clean aggregated assistant text instead of raw NDJSON, regardless of the `output_format` value (`json`, `stream-json`, `text`, or absent). This closes the gap left by B015 in 0.7.1, which covered only Claude, Gemini, and OpenCode. Aggregation is presence-aware: when the NDJSON stream contains at least one `assistant_message` event, the extracted text overwrites `Output` (an empty `assistant_message` yields `Output == ""`); when no such events are parsed (plain-text mocks, or streams with only pre-result lifecycle events), the base output is preserved. `state.Response` and `ConversationResult.Response` are now populated via `tryParseJSONResponse` when the aggregated assistant text is a valid JSON object, matching Gemini/OpenCode semantics. The estimated token recount runs on the extracted text rather than on the raw NDJSON length. Conversation parity is achieved through a Codex-targeted override in `ExecuteConversation` plus an `extractTextContent` hook wired into `cliProviderHooks`; the base provider and the other CLI providers (Claude, Gemini, OpenCode, GitHub Copilot) remain bit-identical. NUL bytes, multi-event streams, empty assistant messages, and non-JSON plain text are all handled without panic or truncation. Workflow authors can now reference `{{.states.codex_step.Output}}` and `{{.states.codex_step.Response}}` with the same semantics as for other providers, which unblocks multi-step workflows that pipe Codex output downstream.
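
The presence-aware rule described in this entry can be illustrated with a minimal Go sketch. It is not the project's implementation: the `event` struct, its `type` and `text` field names, the helper names `aggregateAssistantText` and `resolveOutput`, and joining multiple events without a separator are all assumptions made for illustration, since the actual Codex event schema is not shown in this diff.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// event is a hypothetical shape for one NDJSON line; the real Codex
// event schema may use different field names.
type event struct {
	Type string `json:"type"`
	Text string `json:"text"`
}

// aggregateAssistantText scans raw NDJSON and joins the text of every
// assistant_message event. found reports whether at least one such event
// was parsed, so callers can tell "empty message" from "no events".
func aggregateAssistantText(raw string) (text string, found bool) {
	var parts []string
	sc := bufio.NewScanner(strings.NewReader(raw))
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // allow long lines
	for sc.Scan() {
		var ev event
		if err := json.Unmarshal(sc.Bytes(), &ev); err != nil {
			continue // non-JSON line, e.g. plain-text mock output
		}
		if ev.Type == "assistant_message" {
			parts = append(parts, ev.Text)
			found = true
		}
	}
	return strings.Join(parts, ""), found
}

// resolveOutput applies the presence-aware rule: overwrite only when an
// assistant_message event was seen, otherwise keep the base output.
func resolveOutput(base string) string {
	if text, found := aggregateAssistantText(base); found {
		return text
	}
	return base
}

func main() {
	stream := `{"type":"session_started"}` + "\n" +
		`{"type":"assistant_message","text":"hello"}` + "\n"
	fmt.Printf("%q\n", resolveOutput(stream))            // "hello"
	fmt.Printf("%q\n", resolveOutput("plain text mock")) // "plain text mock"
}
```

Returning a separate `found` flag, rather than testing the aggregated text for emptiness, is what lets an empty `assistant_message` produce an empty `Output` while a stream with no such events keeps the base output.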

## [0.10.0] - 2026-06-01

### Changed
2 changes: 1 addition & 1 deletion README.md
@@ -10,7 +10,7 @@ A Go CLI tool for orchestrating AI agents (Claude, Gemini, Codex, GitHub Copilot
- **State Machine Execution** - Define workflows as state machines with conditional transitions based on exit codes, command output, or custom expressions
- **Inline Error Handling** - Specify error messages and exit codes directly on steps without creating separate terminal states
- **Agent Steps** - Invoke AI agents via CLI tools (Claude, Codex, Gemini, GitHub Copilot) or direct HTTP (OpenAI, Ollama, vLLM, Groq) with prompt templates, response parsing, and accurate token tracking
- **Output Formatting for Agent Steps** - Automatically strip markdown code fences and validate JSON output; human-readable streaming display controlled by `output_format` field (text vs raw NDJSON); unified display-event abstraction across all 6 providers with optional verbose mode showing tool-use markers (`[tool: Name(Arg)]`)
- **Output Formatting for Agent Steps** - Automatically strip markdown code fences and validate JSON output; human-readable streaming display controlled by `output_format` field (text vs raw NDJSON); unified display-event abstraction across all 6 providers with optional verbose mode showing tool-use markers (`[tool: Name(Arg)]`); automatic `state.Response` population for structured JSON output (heuristic across all providers)
- **Agent Skills** - Inject deterministic domain knowledge into agent steps via `skills:` declarations in workflow YAML; filesystem-based multi-directory discovery (project `.awf/skills/`, `.agents/skills/`, `.claude/skills/`, XDG global) with priority ordering; SKILL.md frontmatter stripping and agentskills.io-compliant `<skill_content>` structured wrapping with bundled resource enumeration; validated by `awf validate`
- **Agent Roles** - Inject reusable personas into agent steps via `role:` field referencing AGENTS.md files; filesystem-based multi-directory discovery in a dedicated `roles/` namespace (project `.awf/roles/`, `.agents/roles/`, `$XDG_CONFIG_HOME/awf/roles/`, `~/.agents/roles/`) with priority ordering and `AWF_ROLES_PATH` exclusive override; AGENTS.md frontmatter stripping and system prompt injection with optional composition via inline `system_prompt` field; validated by `awf validate`
- **External Prompt Files** - Load agent prompts from `.md` files with full template interpolation, helper functions, and local override support
77 changes: 55 additions & 22 deletions docs/reference/interpolation.md
@@ -159,35 +159,44 @@ transitions:

**Note**: `TokensUsed` replaced deprecated `states.step_name.Tokens` field. If migrating from earlier versions, update workflow YAML expressions from `{{.states.step_name.Tokens}}` to `{{.states.step_name.TokensUsed}}`.

#### Response (Operation Outputs)
#### Response (Structured Outputs)

Operation steps (e.g., `github.get_issue`, `http.request`) return structured data accessible via `Response`:
Both agent steps and operation steps expose structured data via `Response`: operation steps always, agent steps only when their output is valid JSON.

**Agent steps** populate `Response` automatically when the agent's output is a valid JSON object (regardless of the `output_format` setting):

```yaml
{{.states.step_name.Response.field}} # Parsed field from agent JSON output
{{.states.step_name.Response}} # Full parsed JSON object
```

**Operation steps** (e.g., `github.get_issue`, `http.request`) always return structured data via `Response`:

```yaml
{{.states.step_name.Response.title}} # Parsed field from operation result
{{.states.step_name.Response.number}} # Numeric field
{{.states.step_name.Response.labels}} # Array field
```

Use `Output` for raw JSON, `Response.field` for parsed fields:
Example with agent step returning JSON:

```yaml
states:
initial: get_issue

get_issue:
type: operation
operation: github.get_issue
inputs:
number: "{{.inputs.issue_number}}"
on_success: show_title
initial: analyze

analyze:
type: agent
provider: claude
prompt: "Return JSON analysis with 'issues' and 'severity' fields"
on_success: process
on_failure: error

show_title:
process:
type: step
command: echo "Issue: {{.states.get_issue.Response.title}}"
command: |
echo "Severity: {{.states.analyze.Response.severity}}"
echo "Issues: {{.states.analyze.Response.issues}}"
on_success: done
on_failure: error

done:
type: terminal
@@ -196,6 +205,15 @@ states:
status: failure
```

If the agent returns:
```json
{"issues": ["buffer overflow", "memory leak"], "severity": "high"}
```

Then:
- `{{.states.analyze.Response.severity}}` = `"high"`
- `{{.states.analyze.Response.issues}}` = `["buffer overflow", "memory leak"]`

**HTTP operation outputs** follow the same pattern:

```yaml
@@ -205,7 +223,12 @@ states:
{{.states.step_name.Response.body_truncated}} # true if body was truncated
```

See [Workflow Syntax - Operation State](../user-guide/workflow-syntax.md#operation-state) for the full list of available operations and their output fields.
**Difference from `JSON`:**
- `Response` is populated automatically for all agent outputs if valid JSON is detected (heuristic)
- `JSON` is only populated when `output_format: json` is explicitly set on the agent step
- Use `Response.field` for automatic best-effort parsing; use `JSON.field` for explicit, validated JSON output

See [Agent Steps - Output Formatting](../user-guide/agent-steps.md#output-formatting) for examples and [Workflow Syntax - Operation State](../user-guide/workflow-syntax.md#operation-state) for available operations.

#### JSON (Explicit Output Formatting)

@@ -217,9 +240,13 @@ When an agent step uses `output_format: json`, the parsed JSON is accessible via
```

**Key differences from `Response`:**
- `JSON` is only populated when `output_format: json` is explicitly set on the agent step
- `Response` is populated automatically for all agent outputs if valid JSON is detected (heuristic)
- `JSON` represents explicitly formatted output; `Response` is automatic best-effort parsing
- `JSON` is only populated when `output_format: json` is explicitly set on the agent step (and validation passes)
- `Response` is populated automatically for all agent outputs if valid JSON is detected (heuristic, regardless of `output_format`)
- Use `JSON.field` for strict, validated output; use `Response.field` for lenient, automatic parsing

**When to use each:**
- **`JSON.field`**: You explicitly requested `output_format: json` and want strict validation
- **`Response.field`**: You want to access JSON from any agent output without requiring `output_format: json`

Example with `output_format: json`:

@@ -238,8 +265,9 @@ states:
process:
type: step
command: |
echo "Severity: {{.states.analyze.JSON.severity}}"
echo "Issues: {{.states.analyze.JSON.issues}}"
# Both work — JSON is strict, Response is lenient
echo "Severity (JSON): {{.states.analyze.JSON.severity}}"
echo "Severity (Response): {{.states.analyze.Response.severity}}"
on_success: done

done:
@@ -254,9 +282,14 @@ If agent returns:
{"issues": ["buffer overflow", "memory leak"], "severity": "high"}
```

Then:
Then both work identically:
- `{{.states.analyze.JSON.severity}}` = `"high"`
- `{{.states.analyze.JSON.issues}}` = `["buffer overflow", "memory leak"]`
- `{{.states.analyze.Response.severity}}` = `"high"`

**The difference becomes apparent when `output_format: json` is omitted:**
- Without `output_format: json`, JSON validation is skipped
- `JSON` remains empty (never populated)
- `Response` still populates if valid JSON is detected
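
The split between the two fields can be sketched in Go as follows. This is an illustration only: `stepState`, `parseJSONObject`, and `populate` are hypothetical names (the changelog mentions a `tryParseJSONResponse` helper, whose actual signature is not shown in this diff), and fence stripping is left out for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// stepState is a hypothetical, trimmed-down stand-in for the step state.
type stepState struct {
	Output   string
	Response map[string]any // heuristic: set whenever Output is a JSON object
	JSON     any            // explicit: set only for output_format: json
}

// parseJSONObject returns the decoded object when text is a valid JSON
// object, and nil otherwise (arrays, scalars, and plain text yield nil).
func parseJSONObject(text string) map[string]any {
	var obj map[string]any
	if err := json.Unmarshal([]byte(text), &obj); err != nil {
		return nil
	}
	return obj
}

// populate fills Response leniently and JSON strictly.
func populate(output, outputFormat string) (stepState, error) {
	st := stepState{Output: output}
	st.Response = parseJSONObject(output) // runs for every output_format
	if outputFormat == "json" {
		var v any
		if err := json.Unmarshal([]byte(output), &v); err != nil {
			return st, fmt.Errorf("output_format json: invalid JSON: %w", err)
		}
		st.JSON = v
	}
	return st, nil
}

func main() {
	out := `{"issues": ["buffer overflow", "memory leak"], "severity": "high"}`

	st, _ := populate(out, "") // output_format omitted
	fmt.Println(st.JSON == nil, st.Response["severity"])

	st, _ = populate(out, "json")
	fmt.Println(st.JSON != nil, st.Response["severity"])
}
```

In this sketch a validation failure is returned as an error only on the strict path, which matches the documented behavior that a step fails on invalid JSON only when `output_format: json` is set.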

See [Agent Steps - Output Formatting](../user-guide/agent-steps.md#output-formatting) for detailed examples and best practices.

32 changes: 22 additions & 10 deletions docs/user-guide/agent-steps.md
@@ -1132,8 +1132,8 @@ Agent responses are automatically captured in the execution state:

| Field | Type | Description |
|-------|------|-------------|
| `{{.states.step_name.Output}}` | string | Raw response text (or cleaned text if `output_format` is set) |
| `{{.states.step_name.Response}}` | object | Parsed JSON response (automatic heuristic) |
| `{{.states.step_name.Output}}` | string | Aggregated assistant text. For CLI providers emitting NDJSON (Claude, Codex, Gemini, OpenCode, GitHub Copilot), the raw stream is extracted to clean text regardless of `output_format`; for HTTP providers, the response body text. `output_format: json` additionally strips markdown code fences. |
| `{{.states.step_name.Response}}` | object | Parsed JSON response (automatic heuristic — populated when the assistant text is a valid JSON object) |
| `{{.states.step_name.JSON}}` | object | Parsed JSON from `output_format: json` (explicit, see [Output Formatting](#output-formatting)) |
| `{{.states.step_name.TokensUsed}}` | int | Total tokens consumed by this step |
| `{{.states.step_name.TokensInput}}` | int | Input tokens (prompt + context). `0` in single-turn mode. |
@@ -1165,10 +1165,11 @@ process_response:

## Output Formatting

The `output_format` field serves two purposes:
The `output_format` field serves three purposes:

1. **Post-processing**: Strips markdown code fences and optionally validates JSON (F065)
2. **Display filtering**: Controls how agent responses appear on terminal during streaming and buffered execution, with optional verbose tool-use markers (F082, F085)
1. **Text extraction**: For CLI providers, extracts clean assistant text from streaming event logs (F103, F082)
2. **Post-processing**: Strips markdown code fences and optionally validates JSON (F065)
3. **Display filtering**: Controls how agent responses appear on terminal during streaming and buffered execution, with optional verbose tool-use markers (F082, F085)

When an agent wraps its output in markdown code fences (common with many LLMs), use `output_format` to automatically strip the fences and optionally validate the content:

@@ -1210,6 +1211,7 @@ process_results:
- Strips outermost markdown code fences (e.g., ````json ... ``` ``)
- Validates stripped content as valid JSON
- Stores parsed JSON in `{{.states.step_name.JSON}}`
- Automatically populates `{{.states.step_name.Response}}` with parsed JSON (available across all providers)
- If validation fails, step fails with a descriptive error
- Works with both objects and arrays
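
The strip-then-validate sequence listed above might look roughly like this in Go. It is a sketch, not the F065 implementation: `stripOutermostFence` and `cleanJSON` are hypothetical names, and discarding any prose outside the fence is an assumption inferred from the documented example output.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// fence is the three-backtick marker, built with Repeat so this example
// can itself be embedded in a markdown code block.
var fence = strings.Repeat("`", 3)

// stripOutermostFence removes one enclosing markdown code fence, if any.
// Text before the opening fence and after the closing fence is discarded
// (an assumption of this sketch).
func stripOutermostFence(s string) string {
	open := strings.Index(s, fence)
	if open < 0 {
		return strings.TrimSpace(s)
	}
	rest := s[open+len(fence):]
	nl := strings.Index(rest, "\n")
	if nl < 0 {
		return strings.TrimSpace(s)
	}
	body := rest[nl+1:] // skip the info string, e.g. "json"
	end := strings.LastIndex(body, fence)
	if end < 0 {
		return strings.TrimSpace(s)
	}
	return strings.TrimSpace(body[:end])
}

// cleanJSON strips fences and then validates the result, accepting both
// JSON objects and JSON arrays.
func cleanJSON(raw string) (string, error) {
	clean := stripOutermostFence(raw)
	if !json.Valid([]byte(clean)) {
		return "", errors.New("output is not valid JSON after fence stripping")
	}
	return clean, nil
}

func main() {
	raw := "The analysis shows the following:\n" +
		fence + "json\n" +
		`{"issues": ["buffer overflow", "memory leak"], "severity": "high"}` + "\n" +
		fence + "\n"
	clean, err := cleanJSON(raw)
	if err != nil {
		fmt.Println("step fails:", err)
		return
	}
	fmt.Println(clean)
}
```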

@@ -1225,6 +1227,8 @@ The analysis shows the following:
- `{{.states.analyze.Output}}` = `{"issues": ["buffer overflow", "memory leak"], "severity": "high"}`
- `{{.states.analyze.JSON.issues}}` = `["buffer overflow", "memory leak"]`
- `{{.states.analyze.JSON.severity}}` = `"high"`
- `{{.states.analyze.Response.issues}}` = `["buffer overflow", "memory leak"]` (same as JSON)
- `{{.states.analyze.Response.severity}}` = `"high"` (same as JSON)

#### `text` Format

@@ -1248,6 +1252,7 @@ save_code:
- Strips outermost markdown code fences (e.g., ````python ... ``` ``)
- Returns clean text in `{{.states.step_name.Output}}`
- Does not populate `{{.states.step_name.JSON}}`
- Automatically populates `{{.states.step_name.Response}}` if the output happens to be valid JSON (heuristic)

**Example agent output:**
```
@@ -1262,6 +1267,8 @@ def fibonacci(n):

**After processing:**
- `{{.states.generate_code.Output}}` = `def fibonacci(n):\n if n <= 1:\n return n\n return fibonacci(n-1) + fibonacci(n-2)`
- `{{.states.generate_code.JSON}}` = empty/not populated (no `output_format: json`)
- `{{.states.generate_code.Response}}` = populated only if output happens to be valid JSON

#### No Format (Default)

@@ -1279,10 +1286,15 @@ analyze:

The `output_format` field controls how agent responses appear on the terminal (F082). Additionally, the `--verbose` flag displays tool-use activity markers (F085) — showing which tools the agent invoked — alongside agent output when running with `awf run --output streaming` or `--output buffered`:

| `output_format` | Streaming Display | Buffered Display | Raw Storage |
| `output_format` | Streaming Display | Buffered Display | `state.Output` Storage |
|---|---|---|---|
| `text` (or omitted) | Human-readable filtered text | Filtered text in summary | Raw NDJSON |
| `json` | Raw NDJSON (unfiltered) | Raw NDJSON (unfiltered) | Raw NDJSON |
| `text` (or omitted) | Human-readable filtered text | Filtered text in summary | Extracted assistant text |
| `json` | Raw NDJSON (unfiltered) | Raw NDJSON (unfiltered) | Extracted assistant text (markdown code fences stripped) |

**Output Parity Across All Providers (F103):** For CLI-based providers (Claude, Codex, Gemini, OpenCode), the system automatically extracts clean assistant text from NDJSON events and stores it in `state.Output`. Additionally:
- `state.Response` is automatically populated when the output is valid JSON (heuristic, regardless of `output_format`)
- Both `output_format: json` and an omitted `output_format` produce semantically equivalent output shapes across all providers
- Codex now has feature parity with Claude, Gemini, and OpenCode for output handling

#### Streaming Mode (`--output streaming`)

@@ -1347,10 +1359,10 @@ Silent mode suppresses all display regardless of `output_format`:
```bash
awf run code-review --output silent
# No output displayed (silent mode is absolute)
# state.Output still contains raw NDJSON for template interpolation
# state.Output still contains the aggregated assistant text for template interpolation
```

**Note:** `state.Output` always contains the raw NDJSON regardless of display filtering. Filtering only affects terminal display, not data storage.
**Note:** `state.Output` is populated independently of display filtering. For CLI providers emitting NDJSON (Claude, Codex, Gemini, OpenCode, GitHub Copilot), the assistant text is extracted from the raw stream and stored — so `{{.states.step.Output}}` resolves to clean text regardless of `output_format`. Filtering only affects terminal display, not data storage.

#### Provider Event Cadence
