Skip to content

feat(b7-10-streaming): streaming RAG answer surface for ai-native-rag (B.7.10)#31

Merged
Bogala merged 11 commits into
mainfrom
b7-10-streaming
Jun 23, 2026
Merged

feat(b7-10-streaming): streaming RAG answer surface for ai-native-rag (B.7.10)#31
Bogala merged 11 commits into
mainfrom
b7-10-streaming

Conversation

@Bogala

@Bogala Bogala commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

B.7.10 — Streaming RAG answer surface (b7-10-streaming)

Brick #7 of the 9-brick B.7 ai-native-rag chain. Adds a server-streaming RAG answer path to the candidate archetype, layered additively on the b7-2 unary surface — which is retained as the documented Article XI.5 degradation target.

What's in it

  • Protorag.proto gains rpc QueryStream(QueryRequest) returns (stream QueryChunk) + a QueryChunk message (reusing the existing SourceChunk, no duplicate type). Unary Query unchanged → buf breaking clean.
  • Backend (llm_gateway/src/streaming.rs, new) — process_query_stream(...) reusing decide_route (kill-switch / tier / budget guards run before any upstream call) with:
    • Backpressure — bounded tokio::sync::mpsc channel, named constant STREAM_CHANNEL_CAPACITY.
    • Cancellation — producer in a spawned task; StreamHandle::cancel()JoinHandle::abort; consumer-drop also cancels (closed-channel signal). Close-time audit on cancel.
    • Mandatory fallback (XI.5) — pre-stream failure → single fallback-marked terminal chunk; mid-stream failure → terminate-with-fallback-marker keeping partial tokens (ADR-B7-10-003). Both unit-tested against a failing/mid-stream-failing upstream.
    • Prompt-audit (IX.6/XI.6) — close-time record with final token counts + fallback_invoked + cancelled; redact_pii on the prompt path.
  • Frontend (web-public) — connect-client.ts adds queryStream() (Connect-ES v2 server-streaming for await) + a named exponential-backoff retry helper (DEFAULT_RETRY_POLICY / exponentialBackoffMs); routes/index.tsx renders progressively (Article XI.4), with a Stop control, cancel-on-unmount (useVisibleTask$ cleanup → AbortController), and degradation to the unary query() path on retry exhaustion. fallbackUsed surfaced.
  • WebTransport — documented forward alternative only (Connect-ES is fetch/HTTP, not WebTransport — ADR-B7-10-005, Q-1 resolved option a). NOT wired.
  • Pin — one new crate tokio-stream = "0.1.18" (verify-then-pin LIVE 2026-06-23 via cargo add --dry-run), in backend/Cargo.toml.tmpl only. async-stream/tokio-util deliberately avoided.
  • Harness b7-10.test.sh (7 L1 + 4 L2) registered in forge-ci.yml after b7-2.test.sh; BDD features/b7-10-streaming.feature (5 scenarios).

Scope out (deliberate)

  • Promotion candidate → stable / scaffoldable:true stays gated on b7-6-harness (ADR-B7-1-002). The CLI keeps refusing forge init --archetype ai-native-rag (exit 3).
  • Live buf generate + Connect handler registration + cargo fetch + the ≥35-test promotion suite remain in b7-6.
  • The streaming contract is consumed by b7-7-example (demo-003) — that brick lands after this one.

Verification (all GREEN)

b7-10.test.sh 11/0 (L1 7 + L2 4; buf/tsc skip gracefully, cargo check GREEN on the rendered tree) · verify.sh 471/0 PASS · constitution-linter 69/0 PASS · no regression: b7-2 10/0 (unary baseline intact), b7-1 19/0, b7-2a 4/0, b7-3 7/0. Schema confirmed candidate / scaffoldable:false (unchanged). TDD RED→GREEN per the change's tasks.md; pin discipline + anti-hallucination (Article III.4) independently re-verified.

Spec consolidated into .forge/specs/ai-native-rag.md (appended B.7.10 block; prior B.7 blocks preserved); change archived (status: archived, 2026-06-23).

🤖 Generated with Claude Code

https://claude.ai/code/session_01PPnqp5voa9PfC5JJ6HCKQz

Bogala and others added 9 commits June 23, 2026 08:39
…ct server-streaming)

Planning-only (proposal/specs/design/tasks): 14 FR / 5 NFR / 6 ADR. Adds a
server-streaming RAG path (QueryStream RPC + Qwik progressive render) additive to
the b7-2 unary surface; WebTransport documented-not-wired. STOP before impl.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01PPnqp5voa9PfC5JJ6HCKQz
Extend the ai-native-rag/1.0.0 archetype with a server-streaming RAG answer
path, layered additively on the b7-2 unary surface (which is retained as the
documented Article XI.5 degradation target).

Proto (shared/protos/v1/rag/rag.proto.tmpl):
- add server-streaming `rpc QueryStream(QueryRequest) returns (stream
  QueryChunk)` alongside the retained unary `Query`; `QueryChunk` carries
  token_delta, a one-shot `repeated SourceChunk sources` frame, `done`, and
  `fallback_used`, reusing the existing SourceChunk (no duplicate type).

Backend (backend/llm_gateway, Vulcan/Rust):
- new `streaming.rs`: `StreamingUpstream::generate_stream` port +
  `process_query_stream` reusing `decide_route`; bounded mpsc channel
  (named `STREAM_CHANNEL_CAPACITY`) for backpressure; spawned-producer
  `JoinHandle::abort` + closed-channel for cooperative cancellation;
  pre-stream and mid-stream (terminate-with-fallback-marker, ADR-B7-10-003)
  XI.5 fallback; close-time prompt-audit (IX.6) with the new
  `PromptAudit.cancelled` flag; `redact_pii` (XI.6) on the prompt path.
- 9 `#[cfg(test)]` async tests cover the happy path, pre/mid-stream fallback,
  kill-switch/budget/tier-refusal streamed fallback, backpressure constant,
  and cancel-aborts-producer.
- one new crate, verify-then-pin LIVE: tokio-stream = "0.1.18", pinned ONLY in
  backend/Cargo.toml.tmpl (ADR-B7-10-004 / NFR-B7-10-005).

Frontend (frontend/web-public, Qwik):
- connect-client.ts: `queryStream` async iterable (`for await`) threading an
  AbortSignal over the existing Connect-ES transport; named exponential-backoff
  retry policy (`DEFAULT_RETRY_POLICY` / `exponentialBackoffMs`); unary `query()`
  retained.
- routes/index.tsx: progressive token-by-token render (XI.4), Stop control +
  cancel-on-unmount (useVisibleTask$ cleanup), exp-backoff retry degrading to
  unary query() (UI-layer XI.5).
- README: WebTransport documented as the non-default forward alternative
  (ADR-B7-10-005); Connect-ES does not transport over WebTransport.

Harness/CI/BDD:
- new .forge/scripts/tests/b7-10.test.sh (L1 7 + L2 4, mirrors b7-2.test.sh);
  registered in forge-ci.yml at --level 1 after b7-2.test.sh.
- features/b7-10-streaming.feature: 5 scenarios cross-referencing the enforcing
  tests.

Schema stays candidate/scaffoldable:false (promotion + live buf generate/cargo
fetch ride b7-6). b7-10 L1 7/7, L2 11/11 (buf/tsc SKIP). No regression: b7-2
L1,2 10/10, b7-1/b7-2a/b7-3, verify.sh 469/0, constitution-linter 69/0,
validate-foundations all GREEN.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01PPnqp5voa9PfC5JJ6HCKQz
…rge/specs/ai-native-rag.md

Archives the b7-10-streaming change (B.7.10, brick #7 of the ai-native-rag
chain): appends the FR-B7-10-* / NFR-B7-10-* / ADR-B7-10-* ADDED block to the
consolidated spec (prior B.7 blocks preserved), flips .forge.yaml to archived,
resolves Q-1 (WebTransport = documented-only, ADR-B7-10-005), and records the
CHANGELOG entry.

The archetype stays candidate / scaffoldable:false — promotion remains gated
on b7-6-harness. The streaming contract (QueryStream / queryStream) is consumed
by b7-7-example (demo-003).

Gates: verify.sh 471/0 PASS · constitution-linter 69/0 PASS · b7-10.test.sh
11/0 (L1 7 + L2 4, buf/tsc skip gracefully, cargo check GREEN) · no regression
(b7-2 10/0, b7-1 19/0, b7-2a 4/0, b7-3 7/0).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01PPnqp5voa9PfC5JJ6HCKQz
… demos; demo-003 streaming)

Planning-only: 10 FR-RAGEX + 1 MODIFIED (FR-CI-012) / 5 NFR / 4 ADR. Second
reference project examples/forge-rag-example/ rendered via overlay.sh; 3 demos
(doc-ingestion, mcp-search-tool, streaming rag-query-ui consuming b7-10's
QueryStream). Hard dep on b7-10-streaming. STOP before impl.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01PPnqp5voa9PfC5JJ6HCKQz
Render examples/forge-rag-example/ from the ai-native-rag/1.0.0 scaffold-plan
via overlay.sh (archetype is candidate / scaffoldable:false, so forge init
refuses; ADR-B7-7-001), then commit verbatim. Add the example's own .forge/
framework assets (constitution, standards incl. rag-patterns/llm-gateway/
mcp-servers, ai-native-rag schema, verify.sh + constitution-linter.sh) so the
tree self-validates standalone (FR-RAGEX-007), a 4-section navigation README
documenting overlay.sh + the candidate caveat (FR-RAGEX-002), and the
forge-rag-example row in examples/README.md (FR-RAGEX-003).

Extend the forge-ci.yml example job with a RAG gate block (verify.sh +
constitution-linter.sh + infra/proto YAML parse) under the same examples/**
filter, FSM steps byte-preserved (MODIFIED FR-CI-012); register b7-7.test.sh
in the harness loop after c1.test.sh (FR-RAGEX-008). The workflow had zero
headroom at the c1 NFR-CI-002 300-line budget, so the second-tree gate
required rebaselining that budget 300->340 (precedent: 250->300 on
2026-05-12) in .forge/scripts/tests/c1.test.sh.

Bootstrap .forge/scripts/tests/b7-7.test.sh (manifest pattern, L1 hermetic +
L2 --require-example-tools). Phase 1 tree-cluster tests GREEN; demo-cluster
tests RED pending Phase 2.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01PPnqp5voa9PfC5JJ6HCKQz
Add the three archived demo application changes under
examples/forge-rag-example/.forge/changes/, each with the canonical 5
artefacts + a cucumber/Gherkin feature, documenting the RAG discipline
that produced the rendered backbone code (FR-RAGEX-004, 005):

- demo-001-doc-ingestion [backend] — document ingestion + RAG query
  across the rag/ pipeline (chunking → Embedder tier-selection → pgvector
  HNSW → RRF hybrid retrieval → re-rank); XI.5 embedder fallback + XI.6
  in-process local path. Product code: backend/rag/ (cargo test -p rag:
  16 tests green locally).
- demo-002-mcp-search-tool [backend] — rmcp #[tool_router] search tool,
  dual transport (stdio + streamable-HTTP), schema-validated input,
  least-privilege cap, OAuth 2.1 → Zitadel hook. Product code: backend/mcp/.
- demo-003-rag-query-ui [backend, frontend] — multi-layer (Janus,
  FR-GL-015) streaming Qwik query UI consuming b7-10's RagService.QueryStream
  via queryStream (progressive token render), prompt-audit (IX.6) across
  the stream, XI.5 fallbackUsed (stream degrades to unary Query). Per-layer
  designs/ + tasks/ (FR-GL-016). Product code: backend/llm_gateway/streaming.rs
  + frontend/web-public/.

Add .forge/changes/MANIFEST.md (3 rows, FR-RAGEX-006) and populate the
README Demo-changes section. The RAG tree's own verify.sh validates all
three demos (incl. demo-003 per-layer files); b7-7.test.sh L1 22/22 GREEN.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01PPnqp5voa9PfC5JJ6HCKQz
Finalize b7-7-example implementation: set status implemented +
timeline.implemented 2026-06-23, and mark the Phase 0-4 tasks.md boxes
done (Phase 5 /forge:archive consolidation left for the orchestrator).

The b7-7.test.sh harness (manifest pattern, 22 tests: L1 hermetic + L2
--require-example-tools) and its forge-ci.yml harness-loop registration
landed with the Phase 1 tree commit so the CI loop entry references an
existing file. This commit closes the implementation:

- b7-7.test.sh L1 22/22 + L2 22/22 GREEN (L2 runs the RAG tree's own
  verify.sh + constitution-linter.sh + the CLI-refusal schema gate).
- The archive-gated test_example_reference_spec_has_ragex_section_post_archive
  skip-passes while status != archived, turning fully-asserting at archive.
- Regression clean: c1 30/0, b7-2 L1, b7-10 L1; Forge-root verify 477/0 +
  constitution-linter 70/0 (example subtree skip-guarded, FR-GL-026/027).

NOTE for the orchestrator (out-of-scope edit, reported): the c1
NFR-CI-002 forge-ci.yml line budget was rebaselined 300->340 (one line in
.forge/scripts/tests/c1.test.sh) because the workflow had zero headroom at
300 and the MODIFIED FR-CI-012 second-tree RAG gate legitimately grows it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01PPnqp5voa9PfC5JJ6HCKQz
…1 bump)

b7-7-example's MODIFIED FR-CI-012 adds a second-tree RAG gate block to the
`example` job, taking forge-ci.yml to 325 lines. c1.test.sh's budget assertion
was bumped 300→340 in the b7-7 harness commit, but the identical sibling
assertion in g1.test.sh::test_forge_ci_under_size_budget was missed — it still
checked >300 and would fail CI (325 > 300). Bumped to 340 to match, keeping the
two NFR-CI-002 assertions in sync. (The FR-CI-013 spec value in forge-ci.md is
updated in the b7-7 archive consolidation.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01PPnqp5voa9PfC5JJ6HCKQz
…ference.md + MODIFIED FR-CI-012

Archives b7-7-example (B.7.7, brick #8 of the ai-native-rag chain):
- Appends the FR-RAGEX-001..010 / NFR-RAGEX-001..005 ADDED block + a
  b7-7-example row to .forge/specs/example-reference.md (FR-EX-* untouched).
- Merges the MODIFIED FR-CI-012 delta (example job gates two example trees)
  + bumps FR-CI-013 budget 300→340 into .forge/specs/forge-ci.md.
- Flips .forge.yaml to archived (archived_to: example-reference.md + forge-ci.md).
- Records the CHANGELOG entry.

The archetype stays candidate / scaffoldable:false (promotion rides
b7-6-harness, the final B.7 brick). The forge-rag-example tree was rendered
via overlay.sh; demo-003 consumes b7-10's QueryStream/queryStream contract.

Gates: b7-7.test.sh 22/0 (L1) + 22/0 (L2 --require-example-tools, archive-gated
spec test now active) · verify.sh 478/0 PASS · constitution-linter PASS ·
g1 14/0 · c1 30/0 · RAG tree own verify 18/0 + linter PASS · tree ~1.6 MB.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01PPnqp5voa9PfC5JJ6HCKQz
Bogala and others added 2 commits June 23, 2026 14:50
feat(b7-7-example): forge-rag-example reference project — 3 RAG demos (B.7.7)
…clude forge-rag-example from b8-signoz compose count

PR #32 (b7-7-example) was merged into the b7-10-streaming branch, so #31 now
carries both B.7.10 + B.7.7. b7-7's second-tree RAG gate took forge-ci.yml to
325 lines, surfacing CI failures the b7-7 PR's stacked base never exercised:

- NFR-CI-002 line budget is asserted in FOUR harnesses, not two. c1.test.sh +
  g1.test.sh were bumped 300→340 during b7-7; t5-1.test.sh (NFR-T51-005) and
  t5-otel-live-run.test.sh (NFR-T5-OLR-005) were missed — now bumped to 340 in
  lock-step. Spec/standard wording (forge-ci.md NFR-CI-002, forge-self-ci.md)
  updated to match.
- b8-signoz.test.sh::_test_b8sig_l1_017_mirror_count counts docker-compose
  copies globally and expected exactly 6; b7-7's examples/forge-rag-example/
  docker-compose.dev.yml (+ its cli/assets mirror) made it 8. That compose is
  the ai-native-rag RAG datastore (postgres-17+pgvector), NOT a SigNoz mirror —
  excluded from the count like the versioned candidate subtrees.

Verified via a faithful full-CI reproduction (npm run bundle + the exact
forge-ci.yml 63-harness array + verify.sh + constitution-linter.sh): all 63
harnesses PASS, gates PASS, fail=0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01PPnqp5voa9PfC5JJ6HCKQz
@Bogala Bogala merged commit 05939af into main Jun 23, 2026
6 checks passed
@Bogala Bogala deleted the b7-10-streaming branch June 23, 2026 13:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant