feat(b7-10-streaming): streaming RAG answer surface for ai-native-rag (B.7.10)#31
Merged
Conversation
…ct server-streaming) Planning-only (proposal/specs/design/tasks): 14 FR / 5 NFR / 6 ADR. Adds a server-streaming RAG path (QueryStream RPC + Qwik progressive render) additive to the b7-2 unary surface; WebTransport documented-not-wired. STOP before impl. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01PPnqp5voa9PfC5JJ6HCKQz
Extend the ai-native-rag/1.0.0 archetype with a server-streaming RAG answer path, layered additively on the b7-2 unary surface (which is retained as the documented Article XI.5 degradation target). Proto (shared/protos/v1/rag/rag.proto.tmpl): - add server-streaming `rpc QueryStream(QueryRequest) returns (stream QueryChunk)` alongside the retained unary `Query`; `QueryChunk` carries token_delta, a one-shot `repeated SourceChunk sources` frame, `done`, and `fallback_used`, reusing the existing SourceChunk (no duplicate type). Backend (backend/llm_gateway, Vulcan/Rust): - new `streaming.rs`: `StreamingUpstream::generate_stream` port + `process_query_stream` reusing `decide_route`; bounded mpsc channel (named `STREAM_CHANNEL_CAPACITY`) for backpressure; spawned-producer `JoinHandle::abort` + closed-channel for cooperative cancellation; pre-stream and mid-stream (terminate-with-fallback-marker, ADR-B7-10-003) XI.5 fallback; close-time prompt-audit (IX.6) with the new `PromptAudit.cancelled` flag; `redact_pii` (XI.6) on the prompt path. - 9 `#[cfg(test)]` async tests cover the happy path, pre/mid-stream fallback, kill-switch/budget/tier-refusal streamed fallback, backpressure constant, and cancel-aborts-producer. - one new crate, verify-then-pin LIVE: tokio-stream = "0.1.18", pinned ONLY in backend/Cargo.toml.tmpl (ADR-B7-10-004 / NFR-B7-10-005). Frontend (frontend/web-public, Qwik): - connect-client.ts: `queryStream` async iterable (`for await`) threading an AbortSignal over the existing Connect-ES transport; named exponential-backoff retry policy (`DEFAULT_RETRY_POLICY` / `exponentialBackoffMs`); unary `query()` retained. - routes/index.tsx: progressive token-by-token render (XI.4), Stop control + cancel-on-unmount (useVisibleTask$ cleanup), exp-backoff retry degrading to unary query() (UI-layer XI.5). - README: WebTransport documented as the non-default forward alternative (ADR-B7-10-005); Connect-ES does not transport over WebTransport. Harness/CI/BDD: - new .forge/scripts/tests/b7-10.test.sh (L1 7 + L2 4, mirrors b7-2.test.sh); registered in forge-ci.yml at --level 1 after b7-2.test.sh. - features/b7-10-streaming.feature: 5 scenarios cross-referencing the enforcing tests. Schema stays candidate/scaffoldable:false (promotion + live buf generate/cargo fetch ride b7-6). b7-10 L1 7/7, L2 11/11 (buf/tsc SKIP). No regression: b7-2 L1,2 10/10, b7-1/b7-2a/b7-3, verify.sh 469/0, constitution-linter 69/0, validate-foundations all GREEN. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01PPnqp5voa9PfC5JJ6HCKQz
…rge/specs/ai-native-rag.md Archives the b7-10-streaming change (B.7.10, brick #7 of the ai-native-rag chain): appends the FR-B7-10-* / NFR-B7-10-* / ADR-B7-10-* ADDED block to the consolidated spec (prior B.7 blocks preserved), flips .forge.yaml to archived, resolves Q-1 (WebTransport = documented-only, ADR-B7-10-005), and records the CHANGELOG entry. The archetype stays candidate / scaffoldable:false — promotion remains gated on b7-6-harness. The streaming contract (QueryStream / queryStream) is consumed by b7-7-example (demo-003). Gates: verify.sh 471/0 PASS · constitution-linter 69/0 PASS · b7-10.test.sh 11/0 (L1 7 + L2 4, buf/tsc skip gracefully, cargo check GREEN) · no regression (b7-2 10/0, b7-1 19/0, b7-2a 4/0, b7-3 7/0). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01PPnqp5voa9PfC5JJ6HCKQz
… demos; demo-003 streaming) Planning-only: 10 FR-RAGEX + 1 MODIFIED (FR-CI-012) / 5 NFR / 4 ADR. Second reference project examples/forge-rag-example/ rendered via overlay.sh; 3 demos (doc-ingestion, mcp-search-tool, streaming rag-query-ui consuming b7-10's QueryStream). Hard dep on b7-10-streaming. STOP before impl. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01PPnqp5voa9PfC5JJ6HCKQz
Render examples/forge-rag-example/ from the ai-native-rag/1.0.0 scaffold-plan via overlay.sh (archetype is candidate / scaffoldable:false, so forge init refuses; ADR-B7-7-001), then commit verbatim. Add the example's own .forge/ framework assets (constitution, standards incl. rag-patterns/llm-gateway/ mcp-servers, ai-native-rag schema, verify.sh + constitution-linter.sh) so the tree self-validates standalone (FR-RAGEX-007), a 4-section navigation README documenting overlay.sh + the candidate caveat (FR-RAGEX-002), and the forge-rag-example row in examples/README.md (FR-RAGEX-003). Extend the forge-ci.yml example job with a RAG gate block (verify.sh + constitution-linter.sh + infra/proto YAML parse) under the same examples/** filter, FSM steps byte-preserved (MODIFIED FR-CI-012); register b7-7.test.sh in the harness loop after c1.test.sh (FR-RAGEX-008). The workflow had zero headroom at the c1 NFR-CI-002 300-line budget, so the second-tree gate required rebaselining that budget 300->340 (precedent: 250->300 on 2026-05-12) in .forge/scripts/tests/c1.test.sh. Bootstrap .forge/scripts/tests/b7-7.test.sh (manifest pattern, L1 hermetic + L2 --require-example-tools). Phase 1 tree-cluster tests GREEN; demo-cluster tests RED pending Phase 2. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01PPnqp5voa9PfC5JJ6HCKQz
Add the three archived demo application changes under examples/forge-rag-example/.forge/changes/, each with the canonical 5 artefacts + a cucumber/Gherkin feature, documenting the RAG discipline that produced the rendered backbone code (FR-RAGEX-004, 005): - demo-001-doc-ingestion [backend] — document ingestion + RAG query across the rag/ pipeline (chunking → Embedder tier-selection → pgvector HNSW → RRF hybrid retrieval → re-rank); XI.5 embedder fallback + XI.6 in-process local path. Product code: backend/rag/ (cargo test -p rag: 16 tests green locally). - demo-002-mcp-search-tool [backend] — rmcp #[tool_router] search tool, dual transport (stdio + streamable-HTTP), schema-validated input, least-privilege cap, OAuth 2.1 → Zitadel hook. Product code: backend/mcp/. - demo-003-rag-query-ui [backend, frontend] — multi-layer (Janus, FR-GL-015) streaming Qwik query UI consuming b7-10's RagService.QueryStream via queryStream (progressive token render), prompt-audit (IX.6) across the stream, XI.5 fallbackUsed (stream degrades to unary Query). Per-layer designs/ + tasks/ (FR-GL-016). Product code: backend/llm_gateway/streaming.rs + frontend/web-public/. Add .forge/changes/MANIFEST.md (3 rows, FR-RAGEX-006) and populate the README Demo-changes section. The RAG tree's own verify.sh validates all three demos (incl. demo-003 per-layer files); b7-7.test.sh L1 22/22 GREEN. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01PPnqp5voa9PfC5JJ6HCKQz
Finalize b7-7-example implementation: set status implemented + timeline.implemented 2026-06-23, and mark the Phase 0-4 tasks.md boxes done (Phase 5 /forge:archive consolidation left for the orchestrator). The b7-7.test.sh harness (manifest pattern, 22 tests: L1 hermetic + L2 --require-example-tools) and its forge-ci.yml harness-loop registration landed with the Phase 1 tree commit so the CI loop entry references an existing file. This commit closes the implementation: - b7-7.test.sh L1 22/22 + L2 22/22 GREEN (L2 runs the RAG tree's own verify.sh + constitution-linter.sh + the CLI-refusal schema gate). - The archive-gated test_example_reference_spec_has_ragex_section_post_archive skip-passes while status != archived, turning fully-asserting at archive. - Regression clean: c1 30/0, b7-2 L1, b7-10 L1; Forge-root verify 477/0 + constitution-linter 70/0 (example subtree skip-guarded, FR-GL-026/027). NOTE for the orchestrator (out-of-scope edit, reported): the c1 NFR-CI-002 forge-ci.yml line budget was rebaselined 300->340 (one line in .forge/scripts/tests/c1.test.sh) because the workflow had zero headroom at 300 and the MODIFIED FR-CI-012 second-tree RAG gate legitimately grows it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01PPnqp5voa9PfC5JJ6HCKQz
…1 bump) b7-7-example's MODIFIED FR-CI-012 adds a second-tree RAG gate block to the `example` job, taking forge-ci.yml to 325 lines. c1.test.sh's budget assertion was bumped 300→340 in the b7-7 harness commit, but the identical sibling assertion in g1.test.sh::test_forge_ci_under_size_budget was missed — it still checked >300 and would fail CI (325 > 300). Bumped to 340 to match, keeping the two NFR-CI-002 assertions in sync. (The FR-CI-013 spec value in forge-ci.md is updated in the b7-7 archive consolidation.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01PPnqp5voa9PfC5JJ6HCKQz
…ference.md + MODIFIED FR-CI-012 Archives b7-7-example (B.7.7, brick #8 of the ai-native-rag chain): - Appends the FR-RAGEX-001..010 / NFR-RAGEX-001..005 ADDED block + a b7-7-example row to .forge/specs/example-reference.md (FR-EX-* untouched). - Merges the MODIFIED FR-CI-012 delta (example job gates two example trees) + bumps FR-CI-013 budget 300→340 into .forge/specs/forge-ci.md. - Flips .forge.yaml to archived (archived_to: example-reference.md + forge-ci.md). - Records the CHANGELOG entry. The archetype stays candidate / scaffoldable:false (promotion rides b7-6-harness, the final B.7 brick). The forge-rag-example tree was rendered via overlay.sh; demo-003 consumes b7-10's QueryStream/queryStream contract. Gates: b7-7.test.sh 22/0 (L1) + 22/0 (L2 --require-example-tools, archive-gated spec test now active) · verify.sh 478/0 PASS · constitution-linter PASS · g1 14/0 · c1 30/0 · RAG tree own verify 18/0 + linter PASS · tree ~1.6 MB. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01PPnqp5voa9PfC5JJ6HCKQz
feat(b7-7-example): forge-rag-example reference project — 3 RAG demos (B.7.7)
…clude forge-rag-example from b8-signoz compose count PR #32 (b7-7-example) was merged into the b7-10-streaming branch, so #31 now carries both B.7.10 + B.7.7. b7-7's second-tree RAG gate took forge-ci.yml to 325 lines, surfacing CI failures the b7-7 PR's stacked base never exercised: - NFR-CI-002 line budget is asserted in FOUR harnesses, not two. c1.test.sh + g1.test.sh were bumped 300→340 during b7-7; t5-1.test.sh (NFR-T51-005) and t5-otel-live-run.test.sh (NFR-T5-OLR-005) were missed — now bumped to 340 in lock-step. Spec/standard wording (forge-ci.md NFR-CI-002, forge-self-ci.md) updated to match. - b8-signoz.test.sh::_test_b8sig_l1_017_mirror_count counts docker-compose copies globally and expected exactly 6; b7-7's examples/forge-rag-example/ docker-compose.dev.yml (+ its cli/assets mirror) made it 8. That compose is the ai-native-rag RAG datastore (postgres-17+pgvector), NOT a SigNoz mirror — excluded from the count like the versioned candidate subtrees. Verified via a faithful full-CI reproduction (npm run bundle + the exact forge-ci.yml 63-harness array + verify.sh + constitution-linter.sh): all 63 harnesses PASS, gates PASS, fail=0. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01PPnqp5voa9PfC5JJ6HCKQz
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
B.7.10 — Streaming RAG answer surface (
b7-10-streaming)Brick #7 of the 9-brick B.7
ai-native-ragchain. Adds a server-streaming RAG answer path to the candidate archetype, layered additively on the b7-2 unary surface — which is retained as the documented Article XI.5 degradation target.What's in it
rag.protogainsrpc QueryStream(QueryRequest) returns (stream QueryChunk)+ aQueryChunkmessage (reusing the existingSourceChunk, no duplicate type). UnaryQueryunchanged →buf breakingclean.llm_gateway/src/streaming.rs, new) —process_query_stream(...)reusingdecide_route(kill-switch / tier / budget guards run before any upstream call) with:tokio::sync::mpscchannel, named constantSTREAM_CHANNEL_CAPACITY.StreamHandle::cancel()→JoinHandle::abort; consumer-drop also cancels (closed-channel signal). Close-time audit on cancel.fallback_invoked+cancelled;redact_piion the prompt path.web-public) —connect-client.tsaddsqueryStream()(Connect-ES v2 server-streamingfor await) + a named exponential-backoff retry helper (DEFAULT_RETRY_POLICY/exponentialBackoffMs);routes/index.tsxrenders progressively (Article XI.4), with a Stop control, cancel-on-unmount (useVisibleTask$cleanup →AbortController), and degradation to the unaryquery()path on retry exhaustion.fallbackUsedsurfaced.tokio-stream = "0.1.18"(verify-then-pin LIVE 2026-06-23 viacargo add --dry-run), inbackend/Cargo.toml.tmplonly.async-stream/tokio-utildeliberately avoided.b7-10.test.sh(7 L1 + 4 L2) registered inforge-ci.ymlafterb7-2.test.sh; BDDfeatures/b7-10-streaming.feature(5 scenarios).Scope out (deliberate)
scaffoldable:truestays gated onb7-6-harness(ADR-B7-1-002). The CLI keeps refusingforge init --archetype ai-native-rag(exit 3).buf generate+ Connect handler registration +cargo fetch+ the ≥35-test promotion suite remain inb7-6.b7-7-example(demo-003) — that brick lands after this one.Verification (all GREEN)
b7-10.test.sh11/0 (L1 7 + L2 4;buf/tscskip gracefully,cargo checkGREEN on the rendered tree) ·verify.sh471/0 PASS ·constitution-linter69/0 PASS · no regression:b7-210/0 (unary baseline intact),b7-119/0,b7-2a4/0,b7-37/0. Schema confirmed candidate / scaffoldable:false (unchanged). TDD RED→GREEN per the change'stasks.md; pin discipline + anti-hallucination (Article III.4) independently re-verified.Spec consolidated into
.forge/specs/ai-native-rag.md(appended B.7.10 block; prior B.7 blocks preserved); change archived (status: archived, 2026-06-23).🤖 Generated with Claude Code
https://claude.ai/code/session_01PPnqp5voa9PfC5JJ6HCKQz