Summary
The Smoke Tests CI job (defined in .github/workflows/ci.yml, runs npm run test:smoke) is currently failing on main, independent of any open PR. As of commit 3fd0613 (tip of main), the suite reports 34 failures and 804 passes out of 838 tests.
This was surfaced while reviewing #617, whose red Smoke Tests check turned out to be inherited from main. #617 itself introduces no new failures and fixes one (test 540). This issue tracks the backlog so the red check is not mistaken for a per-PR regression.
How to reproduce
cd vessel
npm ci
NODE_ENV=test OPENAI_API_KEY="sk-dummy-key-for-build" npm run test:smoke
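To turn a run's output into a list of failing IDs, a small parser along these lines may help. It is a sketch that assumes the runner prints TAP-style lines such as "not ok 222 - <name>"; the issue does not confirm the output format, and parseTapFailures and the passing sample line are hypothetical.

```typescript
// Assumption: the smoke suite emits TAP-style result lines ("ok N - name" /
// "not ok N - name"). Adjust the pattern if the real output differs.
interface FailedTest {
  id: number;
  name: string;
}

// Hypothetical helper: collects every "not ok" line from captured output.
function parseTapFailures(output: string): FailedTest[] {
  const pattern = /^not ok (\d+) - (.*)$/;
  const failures: FailedTest[] = [];
  output.split(/\r?\n/).forEach((line) => {
    const match = pattern.exec(line.trim());
    if (match) {
      failures.push({ id: Number(match[1]), name: match[2] });
    }
  });
  return failures;
}

// Two failing lines taken from the list in this issue, plus one made-up
// passing line for contrast.
const sample = [
  "ok 221 - hypothetical passing test",
  "not ok 222 - recovery string [buildRelationalMappingUnavailableMessage(notice)] survives deterministic reply hardening",
  "not ok 540 - composer no longer emits Uranus driver \"rupture\" texture line",
].join("\n");

const failures = parseTapFailures(sample);
```

Because each line is trimmed before matching, indented subtest results would be counted too, and TAP directives such as "# TODO" are not treated specially. Both are simplifications.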
Failing tests on main (34)
222 - recovery string [buildRelationalMappingUnavailableMessage(notice)] survives deterministic reply hardening
223 - recovery string [buildRelationalMappingUnavailableMessage(notice)] does not trigger needsProtocolRepair
369 - Static fallback: STATIC_DOCTRINE_BUNDLE contains all 8 required critical doctrine entries
540 - composer no longer emits Uranus driver "rupture" texture line # FIXED by #617
602 - planner page includes course-block planning copy
619 - Raven chat astrology integration pins the current Best Astrology v3 host + deterministic endpoints
620 - symbolic moments auto-promote into the operational field-report lane
621 - creator mirror mode routes through Sherlog artifacts instead of astrology telemetry
625 - Raven field-report law treats section structure as a guide, not a per-turn ritual
626 - field report doctrine requires named sky detail plus plain-language translation
629 - composer dispatch releases parked scope for counterpart-aware requests
633 - field-report formatting drift guards ban markdown emphasis and list structure by default
636 - protocol repair rejects follow-up turns that shift the committed house or chamber map
637 - field-report route rejects stale scaffold hedge phrasing and repeated compression-point wording
640 - observer solo mirror source keeps operator distinct from chart subject
648 - relational degraded LLM fallback reply uses operational recovery copy
654 - relationship mapping route defaults context instead of hard-gating and fails closed on missing counterpart binding
657 - streamed raven replies still pass through hardening and protocol repair before final chunk
658 - symbolic moment deterministic repair fallbacks receive live BlueprintContext
664 - flight recorder keeps manual wrap-up available before phase four
669 - symbolic-moments intent detection recognizes plain-language field/moment asks
670 - pre-session greeting keeps relationship mechanics latent when no counterpart is staged
672 - pre-session greeting surfaces relationship mechanics only when a counterpart is staged
677 - performRavenRecall returns empty array when DB/API key unavailable (fail-open)
681 - injectProtocols returns empty blocks when retrieval unavailable (no noisy injection)
695 - main chat wires one-turn semantic depth controls and request payload fields
698 - balance meter details are attached to the lower pill instead of the response card top
700 - read submission can hold basic lanes in snapshot mode while still stamping the target lane
702 - alignment corridor primes on request dispatch instead of generic loading start
704 - astrology explanation turns preserve transit logic and rotated house framing
705 - corridor readout stays center-lane and two-clock reuse remains field-report aware
824 - SHERLOG/LLM_VOICE_INVERSION: valid LLM reply → rendererUsed is llm, not deterministic_fallback
825 - SHERLOG/LLM_VOICE_INVERSION: valid LLM reply → degradedRead is not set on canonical path
826 - SHERLOG/LLM_VOICE_INVERSION: valid LLM reply → output text is the LLM text, not fallback prose
(Test 540 is struck from the live backlog once #617 merges, leaving 33.)
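To decide whether a red Smoke Tests check on a PR is inherited or a real regression, the PR's failing IDs can be compared against this baseline. The following is a minimal sketch; diffAgainstBaseline is a hypothetical helper, not something in the repository, and the PR failure set is reconstructed from the description of #617 above.

```typescript
// Failing test IDs on main at commit 3fd0613, copied from the list above.
const MAIN_BASELINE: number[] = [
  222, 223, 369, 540, 602, 619, 620, 621, 625, 626, 629, 633, 636, 637,
  640, 648, 654, 657, 658, 664, 669, 670, 672, 677, 681, 695, 698, 700,
  702, 704, 705, 824, 825, 826,
];

interface BaselineDiff {
  introduced: number[]; // fail on the branch but not on main
  fixed: number[]; // fail on main but not on the branch
}

// Hypothetical helper: set difference in both directions.
function diffAgainstBaseline(baseline: number[], branch: number[]): BaselineDiff {
  const baselineSet = new Set(baseline);
  const branchSet = new Set(branch);
  return {
    introduced: branch.filter((id) => !baselineSet.has(id)),
    fixed: baseline.filter((id) => !branchSet.has(id)),
  };
}

// The failure set described for #617: the baseline minus test 540.
const pr617 = MAIN_BASELINE.filter((id) => id !== 540);
const result = diffAgainstBaseline(MAIN_BASELINE, pr617);
// result.introduced is empty and result.fixed contains only 540.
```

Test numbers can shift when a branch adds or removes tests, so comparing by test name is likely more robust than comparing by ID for branches that change the suite.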
Likely clusters (for triage, not yet root-caused)
These group into a few themes worth investigating as units rather than one-by-one:
- Environment / fail-open (677, 681, 369): tests that assert behavior when DB / API keys / retrieval are unavailable. May be genuinely failing or may be CI-environment artifacts — confirm whether they pass locally with real config.
- Field-report doctrine & formatting (620, 625, 626, 629, 633, 636, 637, 658, 669): a cluster around the field-report lane and symbolic-moment promotion — suggests a shared doctrine/string drift.
- Relational / counterpart staging (222, 223, 640, 648, 654, 670, 672): recovery copy + counterpart gating.
- LLM voice inversion (824, 825, 826): the canonical-path renderer is falling back to deterministic prose when it shouldn't — likely a single root cause across all three.
- UI / dashboard surfaces (602, 695, 698, 700, 702, 704, 705): planner/balance-meter/corridor rendering assertions.
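- Astrology host pinning (619, 704): expected host/endpoint config drift. Test 704 is also listed under UI / dashboard surfaces, so it may belong to either cluster.
- Not yet clustered (621, 657, 664): these three failures do not appear in any group above and still need to be assigned.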
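A quick way to check that the six themed clusters cover the live backlog is to count how often each failing ID is assigned. This is a sketch only; auditClusters is a hypothetical helper, and the data is copied from the lists in this issue with test 540 left out because #617 fixes it.

```typescript
// Live backlog: the 34 failures on main minus test 540 (fixed by #617).
const LIVE_BACKLOG: number[] = [
  222, 223, 369, 602, 619, 620, 621, 625, 626, 629, 633, 636, 637, 640,
  648, 654, 657, 658, 664, 669, 670, 672, 677, 681, 695, 698, 700, 702,
  704, 705, 824, 825, 826,
];

interface Cluster {
  name: string;
  ids: number[];
}

const CLUSTERS: Cluster[] = [
  { name: "environment / fail-open", ids: [677, 681, 369] },
  { name: "field-report doctrine & formatting", ids: [620, 625, 626, 629, 633, 636, 637, 658, 669] },
  { name: "relational / counterpart staging", ids: [222, 223, 640, 648, 654, 670, 672] },
  { name: "LLM voice inversion", ids: [824, 825, 826] },
  { name: "UI / dashboard surfaces", ids: [602, 695, 698, 700, 702, 704, 705] },
  { name: "astrology host pinning", ids: [619, 704] },
];

// Hypothetical helper: reports backlog IDs with no cluster or more than one.
function auditClusters(backlog: number[], clusters: Cluster[]) {
  const counts = new Map<number, number>();
  clusters.forEach((cluster) => {
    cluster.ids.forEach((id) => {
      counts.set(id, (counts.get(id) ?? 0) + 1);
    });
  });
  return {
    unclustered: backlog.filter((id) => (counts.get(id) ?? 0) === 0),
    multiplyAssigned: backlog.filter((id) => (counts.get(id) ?? 0) > 1),
  };
}

const audit = auditClusters(LIVE_BACKLOG, CLUSTERS);
// audit.unclustered is [621, 657, 664]; audit.multiplyAssigned is [704].
```

Run against the groupings as written, the audit reports that 621, 657, and 664 have no cluster and that 704 sits in two, which is worth resolving before work is divided up.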
Suggested next step
Triage by cluster. The LLM-voice-inversion trio (824–826) and the field-report doctrine cluster are the highest-value starting points because they touch core reply-path behavior, not just copy. The environment-dependent tests (677, 681, 369) should be confirmed against a fully configured local run before being treated as real regressions.
https://claude.ai/code/session_014eVa8yUqpMAcGS6PVnrncA