feat(engines): V1 ollama + openai adapters (Phase 1: types only) by nyem69 · Pull Request #13 · nyem69/jinn

nyem69 · 2026-05-17T16:23:17Z

Summary

First commit on a multi-phase branch adding two new engine adapters (ollama, openai) so cron jobs can run on cheaper / self-hosted models alongside Claude. Goal: cut the structural cache-write cost (currently ~38% of weekly cron spend on Claude SITREPs) by moving tool-light workloads off Claude entirely.

This PR is Phase 1 only — type/config surface, no runtime behavior change.

Subsequent phases will land as additional commits on this same branch:

Phase 2: pricing table + provider HTTP adapters (with fixture tests)
Phase 3: shared cwdJail + read / write / edit tools
Phase 4: runCommand (argv-only bash, allowlist, python3 restrictions)
Phase 5: webfetch (DNS-rebinding mitigation, IP blocklist, truncation)
Phase 6: provider-agnostic agentLoop + audit log
Phase 7: engine wrappers + server registration + unsupported-feature errors
Phase 8: docs + golden fixture for migration target

What this commit does

BuiltInEngineName ("claude" | "codex" | "gemini" | "ollama" | "openai") and EngineName = BuiltInEngineName | "mock" split — mock cannot leak into user-configurable defaults.
EngineConfigBase, OllamaConfig, OpenAIConfig, EngineToolsConfig, EnginesConfig types formalize the config shape new engines will read.
modelFor(config, engine) helper replaces 2 inline union-cast lookups (sessions/manager.ts:937, cron/runner.ts:67). Future engines no longer pay union-widening tax at every call site.

Explicit callouts

Phase 1 only — no HTTP engines wired yet. Adding ollama/openai to the engines Map in gateway/server.ts happens in Phase 7.
setup.ts intentionally unchanged. The interactive bootstrap detects local CLI binaries (claude/codex); HTTP engines belong in config.yaml, not the install flow.
mock excluded from configurable engine defaults. EnginesConfig.default: BuiltInEngineName enforces this at the type level.
Typecheck status: clean for touched files. One pre-existing TS2769 error in cli/migrate.ts:190 is unchanged from main.

Test plan

pnpm exec tsc --noEmit shows no new errors beyond pre-existing migrate.ts
Existing unit tests still pass (no behavior change so should be a no-op)
Manual smoke: gateway starts cleanly, existing claude/codex/gemini cron jobs continue to run unchanged

Out of scope (this commit; lands in later phases on same branch)

Any provider HTTP, tool execution, agent loop, audit log
Documentation — Phase 8 once full surface is settled

🤖 Generated with Claude Code

…penai Phase 1 of V1 engine adapters (ollama + openai). This commit only widens the type system; no runtime behavior changes. - BuiltInEngineName: union of user-configurable engines, now incl. ollama/openai. - EngineName: BuiltInEngineName | "mock" for the test surface. - EngineConfigBase + OllamaConfig + OpenAIConfig + EngineToolsConfig formalize the config shape new engines will read. - EnginesConfig.default narrowed to BuiltInEngineName so mock can never be set as default through config. - modelFor(config, engine) helper replaces 2 inline union-cast lookups in sessions/manager.ts and cron/runner.ts. Future engines no longer pay union-widening tax at every call site. Setup CLI keeps the narrower "claude" | "codex" union deliberately; HTTP engines aren't installed via local-binary detection and belong in config.yaml, not the interactive bootstrap flow. Pre-existing migrate.ts TS error (TS2769) is unrelated to this change (present on main). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Phase 2 of V1 engine adapters. Provider layer only — no agent loop, no tools, no engine wrappers yet. Each provider exports a `ProviderCall` function suitable for the agent loop to consume in Phase 6. New module: packages/jimmy/src/engines/providers/ types.ts — NormalizedToolCall + ProviderMessage + ProviderCallOpts/ Result + OpenAIAuth/OllamaAuth. The wire-format-agnostic contract every adapter implements. pricing.ts — OpenAI per-model rate table (gpt-4o family, gpt-4.1 family, o1/o3/o4-mini, gpt-5 family). openaiCostFor() returns undefined for unknown models so the pricing gap surfaces in cost_log as NULL rather than masquerading as $0 spend. ollamaCostFor() returns 0 (self-hosted). openai.ts — POST /v1/chat/completions. Parses tool_calls and converts JSON-string arguments to objects. Trusts response.model for billed-model identity; falls back to requested model only when response omits it. Throws on non-2xx HTTP, transport errors, malformed payloads. No retries. ollama.ts — POST /api/chat. Accepts arguments as either object or JSON string (model/client variance). Synthesizes call_<uuid> ids when missing. Optional bearer token via OllamaAuth.token. No retries. All requests use injected fetch (opts.fetchFn) so tests can mock without touching the network. Production callers leave it undefined to use global fetch. Tests: 38 added (pricing 7, openai 16, ollama 15). Coverage targets: - happy-path text-only response - tool_calls normalization across both wire formats (string vs object args) - finishReason override when tool_calls present even if wire says 'stop' - id synthesis when provider omits it - malformed JSON in arguments → specific error - arguments parsing to non-object (array, primitive) → specific error - missing function.name → specific error - HTTP non-2xx → throws with status + body excerpt, fetch called once (asserts no auto-retry policy) - transport error → throws with underlying message, fetch called once - response body not JSON → throws specific error - response.model lookup vs fallback to requested model - request shape (URL, headers, body) for both providers - assistant→tool round-trip serialization (openai stringifies args; ollama keeps them as object) - optional auth header presence/absence - model-ignores-tools fallback (Ollama plain text) Full test suite still passes: 440/440. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Phase 3 of V1 engine adapters. Three filesystem tools for HTTP-loop engines (ollama, openai), all sharing a lexical cwd-jail. No agent loop wiring yet — these are pure executors with a stable ToolResult contract. New files (packages/jimmy/src/engines/tools/): types.ts — ToolExecutionContext + ToolResult shape. Tools return {ok, content, audit:{...}} rather than throwing on user errors so the agent loop can feed errors back to the model as `tool` role messages. cwdJail.ts — resolveInJail(cwd, requested) + JailViolation error. Lexical-only: path.resolve + path.relative + reject if the relative path starts with ".." or is absolute. Documented limitation: symlink escape is NOT caught (would need fs.realpath which breaks write-to-new-path). read.ts — 1-indexed line offset/limit, default 2000 lines, 64k character truncation cap (overridable via toolOpts.read. maxChars). Truncation appends a `[truncated: X of Y characters]` marker so the model sees the boundary. write.ts — Overwrites; mkdir parent recursively; computes UTF-8 byte length (not char length) for the audit row. edit.ts — Mirrors Claude Code's Edit semantics: requires unique old_string match unless replace_all=true; rejects no-ops (old_string===new_string); refuses empty old_string. Single counter loop for occurrences. Tests: 41 added (cwdJail 16, fs tools 25). Coverage: - cwdJail: relative + absolute resolution, normalized "..", trailing slashes, NUL bytes, sibling-prefix attack (foo vs foo-OTHER), JailViolation carries requestedPath + cwd - read: offset/limit, default cap, override cap, truncation marker format, jail violation surfaces as ok:false (not throw), ENOENT, bad-args validation - write: new file, overwrite, recursive parent mkdir, jail rejection, empty content as valid truncate-to-empty, UTF-8 byte counting for multibyte content - edit: single replace, ambiguous-match refusal, replace_all=true, no-op rejection, not_found surfacing, empty old_string rejection, jail rejection, delete-by-empty-new-string Full suite: 481/481. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…caps Phase 3a — blocking security fix on PR #13 before Phase 4 lands command execution. Review of Phase 3 found four confirmed escape paths through the lexical-only jail; this commit closes all of them and adds a couple of related guards. cwdJail.ts (rewritten): - resolveInJail() is now async and exported as the ONLY resolver. - Lexical normalization (path.resolve + relative + reject ".." / absolute) is a private helper, not exported, so future callers can't pick the unsafe variant by accident. - Two-stage check: lexical first, then fs.realpath of the deepest existing ancestor reconstructed against the trailing not-yet-existing segments. Catches symlinks at the leaf and at any parent dir. - rejectSymlinkLeaf option: when true, refuse if the final segment exists and is itself a symlink, even if the target is inside the jail. write/edit enable this; read tolerates leaf-symlinks-to-inside. - JailViolation now carries `reason`: "lexical_escape", "realpath_escape", or "symlink_leaf". Tools surface this directly as audit.error. read.ts: - Awaits resolveInJail. - Stats before readFile; refuses files > 5 MB with error=too_large + file_bytes in the audit row. The existing 64 KB model-output truncation still runs on top. write.ts: - Awaits resolveInJail with rejectSymlinkLeaf: true. - Explicitly rejects path resolving to the cwd directory itself (error=is_cwd_dir) so the recursive mkdir cannot fire on dirname(cwd). edit.ts: - Awaits resolveInJail with rejectSymlinkLeaf: true. - 5 MB size cap with error=too_large, stat-before-read. Tests rewritten + 23 new regression tests: - cwdJail.test.ts now uses real tmp directories so realpath checks have something to chew on. Covers lexical_escape, realpath_escape via leaf-symlink, realpath_escape via parent-dir symlink, symlink_leaf refusal, allows leaf-symlink-to-inside when not rejecting, allows write-new-file path under rejectSymlinkLeaf. - fs-tools-jail.test.ts: the four documented escape probes from the review pass now flip from "leaks/clobbers" to "ok:false with correct reason code". Plus too_large for read and edit at the 6 MB boundary. Plus is_cwd_dir for write(path='.') and write(path=jail). Full package suite: 498/498. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…lowlist, timeout, truncation Phase 4 of V1 engine adapters. Adds the highest-stakes tool with the strictest gate. No shell interpretation, no implicit allowlist, no escape via python -c. tools/runCommand.ts: - Spawns via node:child_process spawn() with shell:false. argv is rejected upfront if any element contains shell metacharacters (;|&`$<>\\n\\r\\t*?~(){}\\[]!) or NUL bytes — belt-and-suspenders against future re-shelling. - Allowlist required: ctx.toolOpts.bash.allowlist must be a non-empty string array. Missing or [] → audit.error="disabled". argv[0] basename is matched (lowercased), so /bin/echo and echo route the same. - Hardcoded NEVER-LIST overrides the allowlist: sh, bash, zsh, fish, ksh, csh, tcsh, dash, ash, env, xargs, eval, exec, source. Returns audit.error="shell_blocked" even if the user adds one explicitly. - python3 (and python) special path: rejects -c, --command, -m, --module, -, -i, --interactive; requires a positional script arg; script must resolve under the cwd jail and exist as a regular file. - Per-call wall-clock timeout (default 60s). On hit: SIGTERM, wait killGraceMs (default 3s), SIGKILL. audit.timeout=true, audit.signal carries SIGTERM|SIGKILL. - stdout capped at 32 KB, stderr at 16 KB (defaults). Tracked independently: audit.truncated_stdout, .truncated_stderr, .original_stdout_bytes, .original_stderr_bytes. Top-level .truncated is the OR. - audit row also carries: command, args (each summarized to 200 chars), exit_code, signal, duration_ms, error code. tools/__tests__/runCommand.test.ts: 66 tests covering: - tool disabled (no toolOpts; empty allowlist) - allowlist hit/miss, absolute-path basename matching - all 14 NEVER-LIST entries (one test each via it.each), absolute path of /bin/bash, env basename - 21 metacharacters individually + NUL byte + shell-injection attempt - python3 -c (both isolated and metachar-coincident cases), -m, stdin, no-positional, escapes jail, missing script, valid script, non-banned flags before script - exit code propagation (0, 1, ENOENT) - stdout truncation while stderr fits; stderr truncation while stdout fits; no-truncation case - timeout fires SIGTERM; stubborn process that ignores SIGTERM escalates to SIGKILL - audit row shape (command, args, duration); args >200 chars truncated - bad arg shapes (missing command, non-array args, non-string elements) Full package suite: 564 tests passing, stable across 5 consecutive runs of the runCommand suite (the timing tests in particular). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…klist Phase 5 of V1 engine adapters. Builds the network-side of the V1 tool surface. Uses node:http / node:https request() (not bare fetch) so a custom net.LookupFunction validates the actual socket address — bare fetch in Node can't intercept the connect-time DNS lookup, which is the DNS-rebinding seam. ipBlocklist.ts (192L): - checkIPv4 + checkIPv6 + parseIpLiteral, pure functions. - IPv4 ranges: 0/8, 10/8, 100.64/10 (CGNAT), 127/8, 169.254/16, 172.16/12, 192.168/16, 224/4, 255.255.255.255/32. - IPv6: ::, ::1, fc00::/7, fe80::/10, ff00::/8, IPv4-mapped (::ffff:v4) validated through embedded v4 check. - Reason codes (blocked_loopback, blocked_private, blocked_cgnat, blocked_link_local, blocked_unique_local, blocked_multicast, blocked_unspecified, blocked_broadcast) flow through audit.error. - IPv6 parser handles "::" compression, zone-id suffix (%iface), and v4-embedded suffix in one pass. webfetch.ts (496L): - Scheme gate: http: / https: only. - Pre-resolve via dns.lookup({all:true}) — uses OS resolver + /etc/hosts (matches what the socket-time lookup sees; resolve4/6 miss /etc/hosts entries like localhost). - Custom net.LookupFunction re-validates the address at connect time, defense in depth against DNS rebinding. - Redirect loop: max 5 hops, same-scheme only, every redirect target re-validated from scratch. - Total wall-clock timeout (default 15s) covers the whole call including all redirects. - Body capped at 2 MB raw — the socket is destroyed on overflow, NOT buffered then truncated. Model-output truncation (default 64k chars) runs on top. - Content-Type whitelist (text/*, application/json/xml/yaml/atom/rss). - allowPrivate opt: default false. Allows private-network targets only when explicitly enabled per engine config. - All errors structured; never throws to caller. Tests (49 total): - ipBlocklist: 23 tests, every IPv4/IPv6 block class + edges + IPv4- mapped IPv6 + zone-id suffix + parseIpLiteral. - webfetch: 26 tests against a fixture HTTP server. Happy path: text, JSON, HTML. Scheme rejection: file://, ftp://, gopher://. Content-type: rejects octet-stream. IP literal blocks (allowPrivate=false): 127.0.0.1, 10.0.0.1, 192.168.1.1, 169.254.169.254 (AWS metadata!), 100.64.0.1, [::1], [fe80::1], [::ffff:127.0.0.1]. Hostname: http://localhost/ blocks via DNS pre-resolve. Redirects: single safe redirect, redirect-to-private (chain captured, dial fails fast), scheme-change refused, file:// refused, redirect loop hits limit at 6 hops. Byte cap: 3 MB response with 256 KB cap → audit.truncated=true, original_bytes >= 256 KB. Timeout: server holds connection → audit.error=timeout. Bad inputs: missing url, malformed URL, non-string url. Full package suite: 613/613, stable across 3 consecutive runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…rray defense Two small follow-ups from the Phase 5 review pass on webfetch. 1. Export buildLookup + add direct unit tests (7) covering the address-validator path the http.request lookup option invokes: - strict mode rejects localhost (resolves to loopback), 10.0.0.1 (literal private), ::1 (loopback), 169.254.169.254 (link-local / AWS metadata) - permissive mode (allowPrivate=true) accepts the same inputs - public hostname probe (one.one.one.one); tolerates offline test env by no-op on ENOTFOUND/EAI_AGAIN The integration test in webfetch.test.ts already proves the chain end-to-end via fixture server requests; these tests pin the validator so a regression surfaces immediately, independent of HTTP server availability. 2. Defensive Location-header unwrap. Node's IncomingHttpHeaders types `location` as `string | string[]` even though HTTP spec says it is single-valued. `String([a,b])` would have comma-joined the array into a broken URL. Unwrap to first element when an array sneaks through. Full package suite: 620/620. Review pass also empirically confirmed (no code changes needed): - IP parsing handles fe80 full uncompressed, fec0:: deliberately NOT blocked (deprecated site-local), all 4 boundaries of fc00/fdff/febf/fec0 and ff00/ffff, uncompressed loopback + v4-mapped forms, zone-id suffix (Node 22 net.isIPv6 accepts %iface), all 4 CGNAT edges - Timer cleanup: req.on("close", clearTimeout) covers end/error/ timeout/byte-cap paths; race between settle() and 'close' is safe because the timer callback is gated by `settled` flag - Byte cap aborts on the cap-crossing chunk: aborted=true, push partial chunk up to cap, req.destroy(), settle(); subsequent in-flight chunks dropped via `if(aborted) return` - Content-type whitelist runs BEFORE the data listener attaches, so binary bodies never accumulate any bytes - currentScheme is fixed to initial scheme (intentional V1 posture: no http→https upgrade across redirects) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Phase 6 of V1 engine adapters. The three pieces wire together the provider adapters (Phase 2), the filesystem tools (Phase 3+3a), the runCommand tool (Phase 4), and webfetch (Phase 5) into a working agent loop without any concrete engine wrapper yet (Phase 7). tools/schemas.ts: JSON-schema definitions for read/write/edit/bash/webfetch in OpenAI function-calling format (Ollama-compatible). Descriptions kept terse — every token here lands in every prompt. tools/index.ts: buildToolRegistry(toolsConfig) walks EngineToolsConfig.enabled and returns: - executors: Map<toolName, ToolExecutor> (Map for fast dispatch) - schemas: ProviderToolDef[] (in config order) - unknownRequested: string[] (forward-compat names) Missing / empty enabled → text-only mode (empty registry, empty schemas). Unknown names recorded, not thrown — engine wrapper logs a warning. engines/audit.ts: sanitizeArgsForAudit(args) walks the JSON tree: - Redacts keys matching api_key/authorization/token/secret/ password/bearer/cookie (case-insensitive regexes) - Truncates string values to 200 chars - Caps recursion depth at 5 buildAuditRow() produces an AuditRow whose ONLY keys are: toolName, argsSummary, durationMs, error, truncated, resultBytes, exitCode, httpStatus. Full tool output (stdout, stderr, body) is NEVER copied in — model already saw it in the conversation; logging it twice doubles storage and creates a leak surface for secrets the model observed. AuditLogger is an interface; the actual sink (sqlite write, log pipe, etc.) is injected by the engine wrapper in Phase 7. engines/agentLoop.ts: runAgentLoop(opts) is the provider-agnostic loop. Error taxonomy enforced by distinct AgentLoopResult kinds: - "ok" → terminal assistant message (no toolCalls) - "provider_error" → adapter threw (parse / transport / non-2xx) - "max_turns" → maxTurns exhausted without terminal message - "timeout" → wall-clock deadline crossed Tool execution errors stay model-visible: the executor's structured {ok:false, content, audit} result is fed back as a {role:"tool"} message and the loop continues. Unknown tool calls produce a synthetic unknown_tool result with the same shape. Tool executors that throw unexpectedly are caught and rewrapped as tool_exception results — they never abort the loop. Pre-call gates: wall-clock deadline is checked BEFORE every provider call AND BEFORE every individual tool call inside a multi-tool turn. Audit logger called once per tool invocation when configured; audit.record() failures are swallowed so audit-sink issues never break the loop itself. Tests added (40 across 3 files): audit.test.ts (16): - secret-key redaction (api_key, Authorization, apiKey camelcase, password/token/secret/cookie, nested fields, arrays) - 200-char string truncation - depth-5 cap - buildAuditRow output is exactly the documented whitelist - bash exit_code captured, webfetch http_status captured - "FULL FILE CONTENTS" sentinel does NOT appear in serialized AuditRow (content-leak negative test) tools/__tests__/registry.test.ts (8): - text-only mode (undefined and []) - filtering, order preservation, dedup - unknown tool names recorded in unknownRequested without throw - all 5 known tools resolvable - schema shape sanity __tests__/agentLoop.test.ts (15): - text-only mode passes empty tools[] to provider - single tool call → terminal text - multi-tool turn (2 calls in one assistant message, both executed) - max_turns exhaustion at the configured limit - timeout before provider call (deadline expires between turns) - timeout before tool call (deadline expires mid multi-tool turn) - unknown tool feeds back unknown_tool JSON and loop continues - tool exception feeds back tool_exception JSON and loop continues - provider parse/transport error aborts with kind=provider_error at turn 0 (initial) and mid-loop - audit records one row per tool call including unknown/exception - audit sink failure does NOT break the loop - token + billed-model accounting across turns Full package suite: 655/655, stable 3 consecutive runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… visibility) Three findings from the Phase 6 review pass, all addressed before Phase 7 wires the loop into live engines. 1. URL credential leak in audit.argsSummary (CRITICAL). Probe confirmed that URLs with ?api_key=, ?token=, ?password=, ?signature=, ?secret=, ?key= query parameters AND userinfo (https://user:pass@host) leaked into the audit row verbatim. Tool args are passed to sanitizeArgsForAudit which only redacted by key-NAME (e.g. an "api_key" object key); the URL string itself was stored untouched and only truncated at 200 chars — long tokens survived even truncation. Fix: redactUrl() runs on http(s) string values BEFORE truncation: - Strips username + password from userinfo - Walks searchParams, replaces values for keys matching SECRET_QUERY_PATTERNS (api[_-]?key, access[_-]?token, token, auth, authorization, secret, password, key, sig(nature)?) - Non-secret query params preserved as debug signal The key name itself is kept ("api_key=[redacted]") because its presence is a useful signal, just not its value. 2. AgentLoopResult missing durationMs. Wrappers can't populate EngineResult.durationMs without external timing, splitting responsibility for one obvious field. Added `durationMs: number` to AgentLoopBase so every result kind (ok, provider_error, max_turns, timeout) carries the loop's own wall-clock measurement. 3. Audit sink failures silently swallowed. safeAudit() previously had a bare `catch {}`. The comment claimed "the engine wrapper's logger will surface persistent audit-sink issues" but nothing in the loop actually forwarded the error. Fix: route audit failures through logger.warn with the tool name and underlying message. Still doesn't abort the loop. Tests added (14): audit.test.ts (+8): - api_key, token, access_token, password, secret, signature, sig, key query-string variants - userinfo stripping - non-credential URL passthrough - URLs nested in arrays (matches webfetch redirect_chain shape) - long token redacted BEFORE truncation (regression for tail-survival) - malformed URL doesn't crash, returns untouched agentLoop.test.ts (+6): - durationMs present on each of ok/provider_error/max_turns/ timeout result kinds; timeout's durationMs >= configured budget - audit sink throwing produces a logger.warn line carrying the tool name and underlying error message - repeated unknown-tool calls converge to max_turns cleanly with token accumulation across turns Review confirmed clean by inspection (no code change needed): - Turn counting: one provider call = one turn (loop iteration bound by maxTurns) - Empty tools[] handling: both providers omit the tools field entirely when toolSchemas.length === 0 (providers/openai.ts + providers/ollama.ts both check .length > 0) - Token accumulation: provider usage added on every loop iteration including tool-followup calls Full package suite: 669/669, stable across 3 runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…rver registration Wraps the provider adapters (Phase 2), filesystem tools (Phase 3+3a), runCommand (Phase 4), webfetch (Phase 5), and agent loop (Phase 6+6a) into two complete Engine implementations and wires them into the gateway boot only when configured. engines/ollama.ts (~170L): - Implements Engine interface. - Constructor validates config.url (throws if missing); reads process.env[config.authTokenEnvVar ?? "OLLAMA_TOKEN"] (optional). - Builds the tool registry from config.tools; warns via logger on unknown tool names rather than throwing. - Translates EngineToolsConfig.{bashAllowlist, bash, read, webfetch} into the runtime ToolExecutionContext.toolOpts shape the tool executors expect (e.g. bashAllowlist → toolOpts.bash.allowlist). - rejectUnsupported() is exported here and reused by OpenAIEngine; it returns a descriptive error string for resumeSessionId, mcpConfigPath, attachments, cliFlags BEFORE any provider HTTP call. - run() resolves model as opts.model || config.model || error; sessionId as opts.sessionId || randomUUID(); maps AgentLoopResult to EngineResult preserving turns, durationMs, cost (always 0 for Ollama via ollamaCostFor). engines/openai.ts (~155L): - Same shape. Construction reads process.env[apiKeyEnvVar ?? "OPENAI_API_KEY"]; throws if missing or empty. - Cost computed via openaiCostFor(billedModels[0], promptTokens, completionTokens). Returns undefined for unknown models so the cost_log row records NULL — plus a logger.warn surfaces the pricing gap for the weekly rollup. gateway/server.ts: - Engines Map widened from Map<string, ClaudeEngine | CodexEngine | GeminiEngine> to Map<string, Engine> so HTTP-loop engines coexist with CLI-spawning ones. Process-killing paths (claudeEngine.killAll() etc.) keep their original typed references — no behavior change. - Ollama / OpenAI registered ONLY when config.engines.{name} exists; construction errors (missing url / apiKey) log a warning via logger.warn and SKIP registration rather than crashing the gateway boot. Health/route surfaces will reflect what's actually available. - Imports the engine classes lazily via dynamic import() so a misconfigured opt-in engine can't break gateway startup. Tests added (28): Construction validation: - Ollama throws on empty url; succeeds with just url; reads OLLAMA_TOKEN by default; respects custom authTokenEnvVar. - OpenAI throws on missing/empty OPENAI_API_KEY; succeeds when set; respects custom apiKeyEnvVar. - Unknown tool names in config.tools.enabled → warn, not throw. Unsupported-feature rejection (with provider-call-count assertion that proves NO provider call happened): - resumeSessionId, mcpConfigPath, attachments, cliFlags each return EngineResult.error matching its specific reason. - Empty attachments / cliFlags arrays do NOT trip the gate. Model resolution: - opts.model takes precedence over config.model. - config.model is the fallback when opts.model is absent. - Neither set → EngineResult.error = "no model resolved". SessionId: - opts.sessionId preserved when present (both engines). - randomUUID() v4 generated when absent. Cost mapping: - Ollama always 0 regardless of token counts. - OpenAI computes correctly for known model. - OpenAI returns undefined (NOT 0) for unknown billed model. EngineResult shape: - Happy path carries result + cost + durationMs + numTurns; error undefined. - Provider error carries result="" + error="ollama: provider_error: <message>" + durationMs + numTurns + accumulated cost. Existing fallback behavior unchanged: sessions.fallbackEngine="codex" type is preserved; the Map widening only relaxed the entry-value type, not the cron/manager lookup pathway. Process-killing engine references (claudeEngine, codexEngine) keep their concrete typed references throughout shutdown. Full package suite: 697/697, stable 3 consecutive runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…tup on construction failure Three findings from the Phase 7 review pass. 1. /api/status no longer hardcodes the engines list. The endpoint previously emitted a static object with claude/codex/ gemini keys, hardcoded inline. After Phase 7 wired in opt-in HTTP engines (ollama, openai), the status payload would NEVER show them even when they were live. Now /api/status iterates the engines Map held on ApiContext and emits one entry per actually-registered engine. The behavior delta the operator sees: - engines.ollama appears iff config.engines.ollama is set AND construction succeeded - engines.openai appears iff config.engines.openai is set AND OPENAI_API_KEY (or configured env var) is non-empty - engines.gemini still only appears when the gemini block is present in config (same as before) ApiContext gains `engines: Map<string, Engine>` so the api layer has the truth-of-registration. Server.ts populates it during boot. Only server.ts constructs ApiContext (verified by grep); no other call sites to update. 2. Opt-in engine construction now FAILS gateway boot loudly. Previously a missing apiKey, malformed url, or any engine construction error was caught and logged with logger.warn, and the gateway continued without that engine registered. Per the review, declaring engines.<name> in config is a commitment — silent degradation lets misconfigurations go undiagnosed. The new policy: if config.engines.<name> is present and construction throws, server.ts rethrows with operator-facing guidance: Error: engines.ollama is declared but engine construction failed: ollama: config.url is required ... → Fix the config OR remove the engines.ollama block to opt out. To opt out cleanly: remove the engines.<name> block entirely. To keep it enabled: ensure the env var / url is valid before boot. 3. Route-failure path documented (no code change). When a cron declares engine="ollama" but the engine isn't in the Map (e.g. unconfigured), the existing sessions/manager.ts:230 path logs an error and delivers `Error: engine "ollama" not available.` via the cron's configured delivery connector. The operator sees the error in Telegram/WhatsApp / wherever the cron delivers. Cron run-log records the route() return value, which does not include the async runSession() error, so the run-log shows "success" — this is pre-existing behavior across all engines and out of scope for V1 to change. Full package suite: 697/697 (no regressions; api.ts change is a thin iteration, server.ts change preserves existing tests). Other review findings, confirmed clean by inspection: - Config loading env interpolation: engines read process.env directly in their constructor (no separate loader step needed — the config carries an apiKeyEnvVar name, not the secret value). - Tool config defaults: undefined tools.enabled → empty registry (text-only). bashAllowlist not auto-populated when bash isn't enabled; an explicitly empty bashAllowlist means "disabled" per the V1 deny-by-default posture. - sessions.fallbackEngine type still narrows to "codex" (existing pre-Phase-7 behavior). Out-of-scope for V1; will document as "existing fallback behavior, not provider fallback" in Phase 8. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…xture Final V1 phase: documentation + a structural-validation fixture for the first cron migration target. engines/README.md (~230 lines): - Engine roster + compatibility matrix (V1 supports / doesn't) - Explicit V1 limitations called out per cron's docs - Two doc points called out by the review: 1. STARTUP CONTRACT: declaring engines.ollama / engines.openai is a commitment. Bad config means the gateway will not boot. Remove the block to opt out cleanly. 2. CRON RUN-LOG ASYNC LIMITATION: route() returns synchronously BEFORE runSession()'s async outcome. Run-log "success" does NOT prove the session actually succeeded. Migration monitoring must use the SHADOW's delivered outputs + sessions table last_error + gateway logs, NOT the cron run-log alone. - Config example covering ollama + openai blocks, env-var sourcing, tool registry filtering, deny-by-default bash allowlist semantics, webfetch allowPrivate posture. - Cost reporting: ollama=0 always, openai=undefined-not-0 on unknown pricing. - Audit row shape: explicit "never contains tool output content". - 7-step migration recipe + rollback (one JSON edit). - Operational visibility: /api/status iterates registered engines; cron route failure path via existing manager.ts:230; sessions. fallbackEngine clarified as Claude rate-limit fallback (existing behavior), NOT provider fallback. - V1 known limitations list + file map. engines/__tests__/fixtures/report-url/: - input.txt: representative real-shape input — Pahang MB statement responding to flood-relief protests, political response surface, >1h old, multi-stakeholder content. Calibrated to clearly classify as as=issue + tenants=[pahang]. - expected-classification.schema.json: documents the required output structure (fields, types, enum values) WITHOUT pinning prose. Marked with fixture_intent comment so future maintainers understand what the fixture is for. engines/__tests__/report-url.fixture.test.ts (2 tests): - Happy path: agent loop with mock provider producing well-formed classification JSON. Asserts kind=ok, parses as JSON object, required fields present, `as` in enum, `confidence` in [0,1], `tenants` non-empty string array. Deliberately does NOT assert exact prose of `reason`. - Structural guard: bad model output (array instead of object) is detected by the parser and surfaces as null. This is the line of defense BEFORE downstream writes warroom_my.issue. Full package suite: 699/699. PR #13 readiness summary: - 10 commits across 8 phases + 4 review-pass follow-ups - ~3,560 LOC net production code - ~3,000 LOC tests - 699 tests, stable across multiple suite runs - Type-check clean for all touched files (pre-existing migrate.ts TS2769 unchanged from main) - Three review passes completed (Phase 3a, 6a, 7a) with all findings closed Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…crets V1 final follow-up. The config fields engines.openai.apiKeyEnvVar and engines.ollama.authTokenEnvVar take an env var NAME (e.g. "OPENAI_API_KEY"), not the secret value. Pasting the key directly was a documented foot-gun: the engine would do process.env["sk-..."] which is always undefined → engine constructs unauthenticated → first provider call fails with a confusing 401. Added assertEnvVarName() (exported from ollama.ts so both wrappers share one validator) that requires the POSIX env var name regex ^[A-Z_][A-Z0-9_]*$. Constructors throw at boot with operator-facing guidance pointing at the env var name vs value distinction. Error message never echoes the value — masks to length only — so a misconfigured config can't leak the secret into gateway logs. Tests added (8 across both wrappers): - rejects authTokenEnvVar that looks like a token value (e.g. "sk-abc-XYZ-123" — hyphens disqualify) - rejects lowercase env var names - rejects leading-digit names ("1KEY") - rejects empty strings with a separate diagnostic - error message never contains the raw value (sentinel string "sk-LEAK-DO-NOT-LOG-..." verified absent) - same coverage for OpenAI apiKeyEnvVar README.md release-notes section explains the convention, points at ~/.jinn/.env as the recommended secret source, and documents the masking guarantee on error messages. Full package suite: 707/707. PR #13 is now ready for review/merge per Phase 8 boundary. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two issues surfaced by GitHub Actions on PR #13 head commit f01a5a1. 1. cli/migrate.ts:190 TS2769. Pre-Phase-1 the engines.default union was narrowly typed to "claude" | "codex" | "gemini" — all three have `bin: string` required. The Phase 1 widening to BuiltInEngineName added ollama and openai which have no `bin` (they're HTTP-only), so `config.engines[defaultEngine].bin` widened to `string | undefined`, breaking execFileSync(file: string, ...). I claimed this was pre-existing in the Phase 1 commit message; that was wrong. The widening is what introduced it. CI catches what my local check missed (origin/main pre-PR-12 didn't have CI configured, so I didn't see it run). Fix: migrate is a CLI-spawn operation that only makes sense for claude/codex/gemini engines. If the operator's default is an HTTP engine, fall back to claude for the migration step. Coalesce `cliEngineBin: string = engineConfig.bin ?? config.engines.claude.bin` so TS narrows correctly. 2. fs-tools-jail.test.ts: "write(.) does not mkdir parent-of-cwd" was asserting `parentBefore.sort() === parentAfter.sort()` on `path.dirname(jail)`. On Linux CI that's `/tmp`, which contains transient system mounts (`.ICE-unix`, `.X11-unix`, etc.) — the listing flapped between `readdir` calls and the test failed inconsistently. Fix: the is_cwd_dir guard short-circuits BEFORE the recursive mkdir runs, so we don't need to inspect the parent's listing to prove the guard fired. The audit code assertion alone is sufficient. Also verify the jail dir itself stayed a directory (didn't get clobbered into a file). Full package suite: 707/707, including the corrected test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

nyem69 and others added 14 commits May 18, 2026 00:22

nyem69 marked this pull request as ready for review May 18, 2026 04:27

nyem69 merged commit 0fe584f into main May 18, 2026
3 checks passed

nyem69 deleted the feat/engines-ollama-openai branch May 18, 2026 04:27

nyem69 mentioned this pull request May 18, 2026

feat(engines): SqliteAuditLogger + boot wiring (V1 follow-up #19) #14

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(engines): V1 ollama + openai adapters (Phase 1: types only)#13

feat(engines): V1 ollama + openai adapters (Phase 1: types only)#13
nyem69 merged 14 commits into
mainfrom
feat/engines-ollama-openai

nyem69 commented May 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

nyem69 commented May 17, 2026

Summary

What this commit does

Explicit callouts

Test plan

Out of scope (this commit; lands in later phases on same branch)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant