TheWizardsCode · SorraTheOrc · Dec 14, 2025 · Dec 13, 2025 · Dec 13, 2025 · Dec 13, 2025
diff --git a/.pm/tracker.md b/.pm/tracker.md
@@ -1,6 +1,6 @@
 # Project Task Tracker
 
-**Last Updated:** 2025-12-08T07:42:16Z
+**Last Updated:** 2025-12-14T05:53:29Z
 
 ## Quick Status Dashboard
 
@@ -13,11 +13,15 @@
 | 12.2.1 | Management Depth UI | complete | High | 12.1.1 | UI Team | 2025-12-07 |
 | 12.2.2 | Agent Roster Panel | complete | High | 12.2.1 | UI Team | 2025-12-07 |
 | 12.2.3 | Player Interactivity & UI Wiring | ✅ complete | High | 12.2.2 | UI Team | 2025-12-08 |
+| 13.1.1 | Build Test Chat Interface | not-started | High | None | LLM Team | 2025-12-13 |
+| 13.1.2 | Add RAG pipeline to LLM service | not-started | High | 13.1.1 | LLM Team | 2025-12-14 |
 
 
 
 **Active Tasks:**
 
+- 🆕 **13.1.2** - Add RAG pipeline to LLM service (Issue #91) - **NOT STARTED** (created 2025-12-14)
+- 🆕 **13.1.1** - Build Test Chat Interface (Issue #89) - **NOT STARTED** (created 2025-12-13)
 - ✅ **11.5.1** - CI Integration for Balance Validation - **COMPLETED** (merged 2025-12-05)
 - ✅ **10.1.9** - Comprehensive Scripts Test Coverage - **COMPLETED** (merged 2025-12-05)
 - ✅ **12.2.3** - Player Interactivity & UI Wiring (Issue #84) - **COMPLETED** (merged 2025-12-08)
@@ -45,6 +49,8 @@
      5. Submit for code review and merge.
    - **Last Updated:** 2025-12-07
 
+5. **13.1.1** - Build Test Chat Interface (Issue #89) - Define CLI UX + httpx helper so LLM work can be validated manually.
+6. **13.1.2** - Add RAG pipeline to LLM service (Issue #91) - Ground `/parse_intent` + `/narrate` with retrieved context per Semantic Kernel playbook.
 
 ## Comprehensive Project Status Report
 
@@ -65,7 +71,7 @@
 
 - Total tests: 1,042 (up from 849; +193 new tests)
 - Coverage: 91.37% overall (up from 90.95%), critical modules at 94-98%, scripts at 88.6%
-- Open issues: 2 (Issue #70 - Designer Tooling, Issue #71 - Parameter Optimization; both low priority)
+- Open issues: 4 (Issue #70 - Designer Tooling, Issue #71 - Parameter Optimization, Issue #89 - LLM Chat Harness, Issue #91 - LLM RAG pipeline)
 - Recent commits: 20+ commits in past 24 hours, excellent delivery pace
 - Repository hygiene: Excellent - clean issue backlog, well-documented
 - **Phase 11 Progress:** 4 of 6 milestones complete (11.1 Batch Sweeps, 11.2 Result Aggregation, 11.3 Analysis & Reporting, 11.5 CI Integration)
@@ -77,6 +83,9 @@
 
 **Recent Progress (since last update):**
 
+- 🆕 **Task 13.1.2 (Add RAG pipeline to LLM service)** - Issue [#91](https://github.com/TheWizardsCode/GEngine/issues/91) opened 2025-12-14 outlining ingestion tooling, retrieval hooks, telemetry, and tests per Microsoft Semantic Kernel reference.
+- 🆕 **Task 13.1.1 (Build Test Chat Interface)** - Issue [#89](https://github.com/TheWizardsCode/GEngine/issues/89) opened 2025-12-13 with CLI workflow, HTTP helper, docs, and test plan for exercising the LLM service manually.
+
 - 🎉 **Task 12.1.1 (Terminal UI Core Implementation) COMPLETED** - Merged to main 2025-12-07
   - Documented all Terminal UI views and keyboard controls in `docs/gengine/how_to_play_echoes.md`
   - Updated agent and faction view logic for GameState compatibility
@@ -364,11 +373,13 @@ All tasks are either complete or unblocked and ready to start.
 | **Phase prioritization unclear** | Low | Resource allocation between Phase 11 completion vs. Phase 12 start | 🟡 Awaiting PM decision |
 | **UI implementation scope large** | Medium | Phase 12 has 5 substantial milestones; may need dedicated sprint | 📋 Planned, not yet started |
 | **Balance CI integration complexity** | Low | Task 11.5.1 requires careful baseline management and threshold tuning | 📋 Documented in task, ready to start |
+| **LLM service lacks manual chat harness** | Medium | Hard to validate provider regressions without developer tooling | 🛠️ Task 13.1.1 / Issue #89 planned |
+| **LLM intents lack RAG grounding** | Medium | Without retrieval context, LLM outputs may drift from canon | 🛠️ Task 13.1.2 / Issue #91 opened to implement RAG |
 
 ### 🔄 Monitoring
 
 - **Test Coverage:** Improved to 91.37% (up from 90.95%); scripts module at 88.6%
-- **Issue Backlog:** Clean (1 open issue, just created)
+- **Issue Backlog:** 4 open issues (#70, #71, #89, #91) with owners + next steps
 - **PR Queue:** Empty - excellent merge velocity
 - **Documentation Drift:** None detected - docs updated with each milestone
 
@@ -465,14 +476,61 @@ The project has closely followed the implementation plan with excellent tracking
 
 |    ID    | Task                                  | Status      | Priority | Dependencies | Responsible      | Updated    |
 | -------: | ------------------------------------- | ----------- | -------- | ------------ | ---------------- | ---------- |
-| 13.1.1   | Build Test Chat Interface (TinyLlama)  | in-progress | High     | None         | LLM Team         | 2025-12-09 |
+| 13.1.1   | Build Test Chat Interface (echoes_llm_service) | not-started | High     | None         | LLM Team         | 2025-12-13 |
+| 13.1.2   | Add RAG pipeline to LLM service                | not-started | High     | 13.1.1       | LLM Team         | 2025-12-14 |
+
+### 13.1.1 — Build Test Chat Interface (echoes_llm_service)
+
+- **Description:** Build a lightweight Python CLI chat harness that targets `echoes_llm_service` so engineers, PMs, and designers can manually exercise `/parse_intent` and `/narrate` against stub or real providers.
+- **Acceptance Criteria:**
+  - `uv run python scripts/echoes_llm_chat.py --service-url http://localhost:8001` connects to the stub provider and supports interactive multi-turn chats.
+  - CLI supports `--mode parse|narrate`, optional context injection (`--context-file`), slash commands (`/clear`, `/save <path>`, `/quit`), and `--history-limit`.
+  - Requests include prior turns when history is enabled; transcripts export to JSON; errors surface readable messages and set non-zero exit codes.
+  - README (or linked doc) describes setup, env vars (`ECHOES_LLM_*`), sample usage, and troubleshooting for local vs. remote endpoints.
+  - Automated tests cover HTTP payload formation, history management, export/reset commands, and error handling using mocked transports.
+- **Priority:** High
+- **Responsible:** LLM Team (owner TBD)
+- **Dependencies:** None
+- **Status:** not-started
+- **Linked Issue:** [#89](https://github.com/TheWizardsCode/GEngine/issues/89)
+- **Risks & Mitigations:**
+  - Risk: Provider regressions go undetected without a manual harness; mitigation: have the CLI default to the stub provider and capture telemetry.
+  - Risk: Transcript exports could leak secrets; mitigation: redact sensitive env vars and allow configurable history trimming.
+- **Testing Owner:** `test_agent`
+- **Next Steps:**
+  1. Finalize CLI spec/flags and align with UX expectations.
+  2. Implement reusable HTTP helper (`src/gengine/echoes/llm/chat_client.py`) with telemetry extraction.
+  3. Build the interactive script with prompt loop + slash commands + transcript export.
+  4. Partner with `test_agent` to add `tests/echoes/test_llm_chat_cli.py` using mocked transports.
+  5. Document usage and troubleshooting in README LLM section.
+- **Last Updated:** 2025-12-14
 
-### 13.1.1 — Build Test Chat Interface (TinyLlama)
+### 13.1.2 — Add RAG pipeline to LLM service
 
-- **Description:** Implement a simple Python chat interface using TinyLlama-1.1B-Chat-v1.0-ONNX running on a Copilot+ PC with Snapdragon NPU. Use ONNX Runtime with QNNExecutionProvider for hardware acceleration. The interface should support conversational input/output and run locally.
-  - Environment subsystem (pollution, diffusion, biodiversity, stability)
-  - Narrative director with story seeds, pacing, lifecycle management
-  - LLM integration (OpenAI, Anthropic providers) with intent parsing
+- **Description:** Adapt the Semantic Kernel + Foundry Local RAG approach (see [Microsoft TechCommunity article](https://techcommunity.microsoft.com/blog/educatordeveloperblog/building-enterprise-grade-local-rag-applications-with-semantic-kernel-and-foundr/4433945)) into our Python-based LLM service so `/parse_intent` and `/narrate` are grounded in curated Echoes documentation.
+- **Acceptance Criteria:**
+  - `scripts/build_llm_knowledge_base.py` ingests configured corpora, splits content into chunks, generates embeddings via the active provider, and writes a deterministic local index.
+  - Enabling `ECHOES_LLM_ENABLE_RAG=true` causes the LLM service to retrieve top-K snippets and append them (with citations) to provider prompts for both endpoints.
+  - Retrieval failures fall back gracefully while emitting actionable warnings/telemetry; Prometheus metrics expose `rag_hits`, `rag_latency`, and `rag_context_chars`.
+  - CLI tooling and docs explain how to rebuild the knowledge base, point at Foundry Local vs. cloud providers, and debug retrieval results.
+  - Automated tests cover chunking, embedding request formation, retrieval filtering, and endpoint wiring (owned by `test_agent`).
+- **Priority:** High
+- **Responsible:** LLM Team (owner TBD)
+- **Dependencies:** 13.1.1 (chat harness useful for validation)
+- **Status:** not-started
+- **Linked Issue:** [#91](https://github.com/TheWizardsCode/GEngine/issues/91)
+- **Risks & Mitigations:**
+  - Risk: Embedding provider differences complicate ingestion; mitigation: wrap calls via Semantic Kernel abstractions and document per-provider requirements.
+  - Risk: Index bloat/stale docs; mitigation: include hashing + `--clean` flag so rebuilds stay deterministic.
+  - Risk: Added latency; mitigation: configurable `rag_top_k`, caching, and visible metrics.
+- **Testing Owner:** `test_agent`
+- **Next Steps:**
+  1. Implement ingestion/embedding script with configurable corpora and providers.
+  2. Add retriever module + settings toggles inside LLM service, including graceful fallbacks.
+  3. Wire prompts to include retrieved snippets + citation metadata for OpenAI/Anthropic/Foundry providers.
+  4. Capture telemetry + debugging endpoints for retrieval context.
+  5. Add docs + troubleshooting plus unit/integration tests.
+- **Last Updated:** 2025-12-14
 
 ### Phase 7: Player Experience ✅ COMPLETE (100%)
 

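Reviewer note on the tracker hunk above: the Task 13.1.2 acceptance criteria call for retrieving top-K snippets, appending them with citations to provider prompts, and falling back gracefully when retrieval yields nothing. The sketch below illustrates that flow under stated assumptions. The index layout (`source`, `text`, `vector`), the cosine ranking, and the function names `retrieve_top_k` and `build_grounded_prompt` are all hypothetical; the tracker does not specify the retriever design or the prompt format.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query_vec: list[float], index: list[dict], top_k: int = 3) -> list[dict]:
    """Rank index entries by similarity to the query and keep the best top_k.

    Each entry is assumed to look like
    {"source": "docs/x.md", "text": "...", "vector": [...]} (hypothetical layout).
    """
    ranked = sorted(index, key=lambda entry: cosine(query_vec, entry["vector"]), reverse=True)
    return ranked[:top_k]


def build_grounded_prompt(user_input: str, snippets: list[dict]) -> str:
    """Prepend numbered, cited snippets; fall back to the bare input if there are none."""
    if not snippets:
        return user_input
    cited = [f"[{n}] ({s['source']}) {s['text']}" for n, s in enumerate(snippets, 1)]
    return "Context:\n" + "\n".join(cited) + "\n\nUser input: " + user_input
```

In a real service the query vector would come from the same embedding provider used at ingestion time, and the empty-snippet branch is where a warning and a telemetry counter would be emitted.
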
diff --git a/README.md b/README.md
@@ -703,6 +703,225 @@ The gateway:
 
 The integration uses HTTP retry logic (2 retries by default) and handles LLM service health checks on session creation. This enables conversational gameplay where players use natural language instead of memorizing CLI commands.
 
+## LLM Chat Harness
+
+The LLM chat harness (`scripts/echoes_llm_chat.py`) provides an interactive REPL for testing the LLM service endpoints (`/parse_intent` and `/narrate`) with multi-turn history support. This tool is useful for:
+
+- Testing prompt changes and observing model responses
+- Debugging latency and token usage
+- Running scripted demos against remote environments
+- Validating provider configurations (stub, OpenAI, Anthropic, Foundry)
+
+### Prerequisites
+
+1. **LLM Service Running**: Start the service locally or point to a remote instance:
+   ```bash
+   # Start local service with stub provider (default)
+   uv run echoes-llm-service
+
+   # Or with OpenAI provider
+   export ECHOES_LLM_PROVIDER=openai
+   export ECHOES_LLM_API_KEY=your-api-key
+   export ECHOES_LLM_MODEL=gpt-4
+   uv run echoes-llm-service
+   ```
+
+2. **Python Environment**: The chat client requires `httpx` (already in project dependencies).
+
+### Basic Usage
+
+**Auto-detect Service** (no arguments needed):
+```bash
+# Auto-detects service URL (tries Windows host if in WSL, then localhost)
+uv run python scripts/echoes_llm_chat.py
+
+# Output:
+Auto-detecting LLM service...
+✓ Detected service at http://localhost:8001
+```
+
+**Parse Mode** (default) - Convert natural language to intents:
+```bash
+uv run python scripts/echoes_llm_chat.py --service-url http://localhost:8001
+
+# Example session:
+You: inspect the industrial district
+📋 Intents:
+[
+  {
+    "type": "inspect",
+    "target": "district"
+  }
+]
+⏱  Latency: 45ms
+🎯 Confidence: 0.95
+```
+
+**Narrate Mode** - Generate narrative from game events:
+```bash
+uv run python scripts/echoes_llm_chat.py --service-url http://localhost:8001 --mode narrate
+
+# Example session:
+Events (JSON or text): [{"type": "pollution_increase", "district": "industrial", "amount": 5}]
+📖 Narrative:
+The industrial district's pollution levels rose sharply as factory output increased...
+⏱  Latency: 120ms
+📊 Tokens: 45 in / 32 out
+```
+
+### Command-Line Options
+
+- `--service-url URL`: Base URL of the LLM service (default: auto-detect)
+- `--mode MODE`: Chat mode - `parse` (intent JSON) or `narrate` (story text) (default: `parse`)
+- `--context-file FILE`: Load initial context from JSON file
+- `--history-limit N`: Maximum conversation history entries to keep (default: `10`)
+- `--export FILE`: (deprecated) Export transcript on exit; use `/save` command instead
+
+### Slash Commands
+
+- `/clear` - Clear conversation history
+- `/save <path>` - Save transcript to JSON file
+- `/quit` or `/exit` - Exit the chat interface
+
+### Multi-Turn History
+
+The client automatically maintains conversation history and sends it with each request in the `context.history` field:
+
+```json
+{
+  "user_input": "what happened next?",
+  "context": {
+    "history": [
+      {"role": "user", "content": "inspect district"},
+      {"role": "assistant", "content": "[{\"type\": \"inspect\", \"target\": \"district\"}]"}
+    ]
+  }
+}
+```
+
+History is capped by `--history-limit` (default `10`); older entries are dropped first to prevent unbounded token growth.
+
+### Context Files
+
+Use `--context-file` to provide initial game state context:
+
+```bash
+# Create context.json
+cat > context.json << 'EOF'
+{
+  "tick": 42,
+  "district": "industrial-tier",
+  "recent_events": ["pollution_spike", "faction_meeting"]
+}
+EOF
+
+uv run python scripts/echoes_llm_chat.py --context-file context.json
+```
+
+The context merges with conversation history in subsequent requests.
+
+### Transcript Export
+
+Save conversation logs for analysis or sharing:
+
+```bash
+# During session
+You: inspect district
+You: /save my_session.json
+✓ Transcript saved to my_session.json
+```
+
+Transcript format:
+```json
+{
+  "mode": "parse",
+  "service_url": "http://localhost:8001",
+  "history": [
+    {"role": "user", "content": "inspect district"},
+    {"role": "assistant", "content": "[{\"type\": \"inspect\"}]"}
+  ],
+  "context": {"tick": 42}
+}
+```
+
+**Note**: API keys are NOT included in exported transcripts for security.
+
+### Remote Endpoints
+
+Point the client at any running LLM service:
+
+```bash
+# Staging environment
+uv run python scripts/echoes_llm_chat.py --service-url https://echoes-llm-staging.example.com
+
+# Docker Compose stack
+uv run python scripts/echoes_llm_chat.py --service-url http://localhost:8001
+
+# Kubernetes port-forward
+kubectl port-forward svc/echoes-llm-service 8001:8001
+uv run python scripts/echoes_llm_chat.py --service-url http://localhost:8001
+```
+
+### Troubleshooting
+
+**Auto-detection Fails:**
+- The client tries to detect the service on:
+  1. Windows host IP (when running in WSL) - reads from `/etc/resolv.conf`
+  2. `http://localhost:8001`
+- If both fail, manually specify with `--service-url`
+- Check if service is running: `curl http://localhost:8001/healthz`
+
+**Connection Refused / Timeout:**
+- Verify service is running: `curl http://localhost:8001/healthz`
+- Check Docker: `docker ps | grep llm`
+- Check Kubernetes: `kubectl get pods -l app=echoes-llm-service`
+- If in WSL and accessing Windows host, ensure Windows Firewall allows port 8001
+
+**HTTP 500 / Provider Errors:**
+- Check service logs for authentication failures (OpenAI/Anthropic API keys)
+- Verify provider configuration: `curl http://localhost:8001/healthz` (shows provider + model)
+- Test with stub provider first (no API keys required)
+
+**TLS Certificate Errors (remote endpoints):**
+- Use `http://` for local/dev environments
+- Verify certificate chain for production endpoints
+- Check firewall rules and ingress configuration
+
+**Empty Responses:**
+- Stub provider returns deterministic responses for testing
+- Real providers may fail if quota exceeded or model unavailable
+- Check token limits and rate limiting in service logs
+
+### Provider Configuration
+
+The chat client connects to the service; provider configuration happens server-side via environment variables:
+
+```bash
+# Stub provider (default, no API key needed)
+export ECHOES_LLM_PROVIDER=stub
+uv run echoes-llm-service
+
+# OpenAI
+export ECHOES_LLM_PROVIDER=openai
+export ECHOES_LLM_API_KEY=sk-...
+export ECHOES_LLM_MODEL=gpt-4
+uv run echoes-llm-service
+
+# Anthropic
+export ECHOES_LLM_PROVIDER=anthropic
+export ECHOES_LLM_API_KEY=sk-ant-...
+export ECHOES_LLM_MODEL=claude-3-sonnet-20240229
+uv run echoes-llm-service
+
+# Foundry Local (self-hosted)
+export ECHOES_LLM_PROVIDER=foundry_local
+export ECHOES_LLM_BASE_URL=http://foundry:8000
+export ECHOES_LLM_MODEL=your-model-name
+uv run echoes-llm-service
+```
+
+See `src/gengine/echoes/llm/settings.py` for all available configuration options.
+
 ## Headless Regression Driver
 
 `scripts/run_headless_sim.py` advances long simulations without interactive

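Reviewer note on the README hunk above: the "Multi-Turn History" section describes prior turns being sent under `context.history` and capped by `--history-limit`. The sketch below shows one way that bookkeeping could work. It assumes the cap counts individual entries and that `--context-file` data shares the same `context` object; `ChatHistory` and its methods are hypothetical names, not the API of `scripts/echoes_llm_chat.py`.

```python
from __future__ import annotations

from collections import deque
from typing import Any


class ChatHistory:
    """Keeps the most recent turns and builds request payloads (illustrative only)."""

    def __init__(self, limit: int = 10, base_context: dict[str, Any] | None = None) -> None:
        # deque(maxlen=...) drops the oldest entry automatically once the cap is hit.
        self._turns: deque[dict[str, str]] = deque(maxlen=limit)
        self._base_context = dict(base_context or {})

    def add(self, role: str, content: str) -> None:
        self._turns.append({"role": role, "content": content})

    def clear(self) -> None:
        # Mirrors what a /clear slash command would need to do.
        self._turns.clear()

    def build_payload(self, user_input: str) -> dict[str, Any]:
        # Prior turns travel under context.history, next to any initial context.
        context = {**self._base_context, "history": list(self._turns)}
        return {"user_input": user_input, "context": context}
```

The resulting dictionary has the same shape as the JSON example in the README section and could be passed to `httpx.post(url, json=payload)`.
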
diff --git a/gamedev-agent-thoughts.txt b/gamedev-agent-thoughts.txt
@@ -977,15 +977,18 @@ Created 2 new modules and enhanced 2 existing files:
 
 ## [GitHub Copilot] — 2025-12-12 10:05
 - (n/a): Reviewed the README, Azure Foundry Local REST reference, and LLM service modules to outline the work required for wiring the LLM service to the Foundry Local API.
-
-## [GitHub Copilot] — 2025-12-13 16:45
-- d4ac3ee: logged sanitized LLM configuration at echoes-llm-service startup by adding a loggable_dict helper and emitting the structured settings payload during boot.
-
-## [GitHub Copilot] — 2025-12-13 16:50
-- d4ac3ee: ensured the echoes-llm-service configuration log prints when launched via uv by initializing logging if the process lacks handlers.
-
-## [GitHub Copilot] — 2025-12-13 17:05
-- d4ac3ee: switched the LLM configuration log to pretty-printed JSON for easier inspection during Foundry troubleshooting.
-
-## [GitHub Copilot] — 2025-12-13 17:12
-- d4ac3ee: upgraded Foundry Local provider error handling to explain hostname/connection issues and echo actionable troubleshooting hints to callers.
+## [GitHub Copilot] — 2025-12-13 14:30
+- Merged PR #90 (LLM Chat Harness) into local branch.
+- Resolved merge conflicts in `scripts/echoes_llm_chat.py` and `tests/echoes/test_llm_chat_cli.py`.
+- Fixed indentation error in `scripts/echoes_llm_chat.py`.
+- Fixed test failures in `tests/echoes/test_gateway_llm_client.py` by updating test expectations to use `127.0.0.1` instead of `localhost`, so that the WSL auto-discovery logic no longer interferes.
+- Verified all relevant tests pass: `tests/echoes/test_llm_chat_cli.py` and `tests/echoes/test_gateway_llm_client.py`.
+- Reviewed and confirmed completeness of PR documentation in README.md.
+
+
+## [GitHub Copilot] — 2025-12-13 14:45
+- d35376c: Added logging to `LLMClient` to print the auto-discovered Windows host IP when running in WSL, aiding in debugging connection issues. Verified with existing tests.
+## [tracker_agent] — 2025-12-14 05:53
+- Reviewed README plus latest repo state, then refreshed .pm/tracker.md header, dashboard rows, risk table, and monitoring stats to reflect new LLM tasks and issue counts.
+- Captured detailed briefs for Tasks 13.1.1 and 13.1.2 (chat harness + RAG) with acceptance criteria, dependencies, test ownership, and next steps; added corresponding risk entries.
+- Created GitHub Issue #91 for the RAG pipeline referencing the Microsoft Semantic Kernel article and Python adaptation plan.
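
Reviewer note on the auto-detection behaviour referenced in the README troubleshooting section and the `LLMClient` log entry above: the client is described as trying the Windows host IP read from `/etc/resolv.conf` when running in WSL, then `http://localhost:8001`. The sketch below illustrates that candidate ordering. It assumes the host IP is taken from the first `nameserver` line, which is an assumption about the implementation, and both function names are hypothetical.

```python
from __future__ import annotations

import re
from typing import Optional


def parse_wsl_host_ip(resolv_conf_text: str) -> Optional[str]:
    """Return the first nameserver address found in resolv.conf-style text, if any."""
    for line in resolv_conf_text.splitlines():
        match = re.match(r"^\s*nameserver\s+(\S+)", line)
        if match:
            return match.group(1)
    return None


def candidate_service_urls(resolv_conf_text: str, port: int = 8001) -> list[str]:
    """List URLs to probe in order: Windows host first (if found), then localhost."""
    urls: list[str] = []
    host_ip = parse_wsl_host_ip(resolv_conf_text)
    if host_ip:
        urls.append(f"http://{host_ip}:{port}")
    urls.append(f"http://localhost:{port}")
    return urls
```

A caller would probe each candidate's `/healthz` endpoint in turn and use the first one that responds. Taking the file contents as a string keeps the parsing testable without touching the real `/etc/resolv.conf`.
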