feat(engines): V1 ollama + openai adapters (Phase 1: types only)#13
Merged
Conversation
…penai Phase 1 of V1 engine adapters (ollama + openai). This commit only widens the type system; no runtime behavior changes. - BuiltInEngineName: union of user-configurable engines, now incl. ollama/openai. - EngineName: BuiltInEngineName | "mock" for the test surface. - EngineConfigBase + OllamaConfig + OpenAIConfig + EngineToolsConfig formalize the config shape new engines will read. - EnginesConfig.default narrowed to BuiltInEngineName so mock can never be set as default through config. - modelFor(config, engine) helper replaces 2 inline union-cast lookups in sessions/manager.ts and cron/runner.ts. Future engines no longer pay union-widening tax at every call site. Setup CLI keeps the narrower "claude" | "codex" union deliberately; HTTP engines aren't installed via local-binary detection and belong in config.yaml, not the interactive bootstrap flow. Pre-existing migrate.ts TS error (TS2769) is unrelated to this change (present on main). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 of V1 engine adapters. Provider layer only — no agent loop, no
tools, no engine wrappers yet. Each provider exports a `ProviderCall`
function suitable for the agent loop to consume in Phase 6.
New module: packages/jimmy/src/engines/providers/
types.ts — NormalizedToolCall + ProviderMessage + ProviderCallOpts/
Result + OpenAIAuth/OllamaAuth. The wire-format-agnostic
contract every adapter implements.
pricing.ts — OpenAI per-model rate table (gpt-4o family, gpt-4.1
family, o1/o3/o4-mini, gpt-5 family). openaiCostFor()
returns undefined for unknown models so the pricing gap
surfaces in cost_log as NULL rather than masquerading
as $0 spend. ollamaCostFor() returns 0 (self-hosted).
openai.ts — POST /v1/chat/completions. Parses tool_calls and
converts JSON-string arguments to objects. Trusts
response.model for billed-model identity; falls back
to requested model only when response omits it.
Throws on non-2xx HTTP, transport errors, malformed
payloads. No retries.
ollama.ts — POST /api/chat. Accepts arguments as either object or
JSON string (model/client variance). Synthesizes
call_<uuid> ids when missing. Optional bearer token
via OllamaAuth.token. No retries.
All requests use injected fetch (opts.fetchFn) so tests can mock without
touching the network. Production callers leave it undefined to use
global fetch.
Tests: 38 added (pricing 7, openai 16, ollama 15). Coverage targets:
- happy-path text-only response
- tool_calls normalization across both wire formats (string vs object args)
- finishReason override when tool_calls present even if wire says 'stop'
- id synthesis when provider omits it
- malformed JSON in arguments → specific error
- arguments parsing to non-object (array, primitive) → specific error
- missing function.name → specific error
- HTTP non-2xx → throws with status + body excerpt, fetch called once
(asserts no auto-retry policy)
- transport error → throws with underlying message, fetch called once
- response body not JSON → throws specific error
- response.model lookup vs fallback to requested model
- request shape (URL, headers, body) for both providers
- assistant→tool round-trip serialization (openai stringifies args; ollama
keeps them as object)
- optional auth header presence/absence
- model-ignores-tools fallback (Ollama plain text)
Full test suite still passes: 440/440.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 of V1 engine adapters. Three filesystem tools for HTTP-loop
engines (ollama, openai), all sharing a lexical cwd-jail. No agent loop
wiring yet — these are pure executors with a stable ToolResult contract.
New files (packages/jimmy/src/engines/tools/):
types.ts — ToolExecutionContext + ToolResult shape. Tools return
{ok, content, audit:{...}} rather than throwing on user
errors so the agent loop can feed errors back to the
model as `tool` role messages.
cwdJail.ts — resolveInJail(cwd, requested) + JailViolation error.
Lexical-only: path.resolve + path.relative + reject if
the relative path starts with ".." or is absolute.
Documented limitation: symlink escape is NOT caught
(would need fs.realpath which breaks write-to-new-path).
read.ts — 1-indexed line offset/limit, default 2000 lines, 64k
character truncation cap (overridable via toolOpts.read.
maxChars). Truncation appends a `[truncated: X of Y
characters]` marker so the model sees the boundary.
write.ts — Overwrites; mkdir parent recursively; computes UTF-8
byte length (not char length) for the audit row.
edit.ts — Mirrors Claude Code's Edit semantics: requires unique
old_string match unless replace_all=true; rejects
no-ops (old_string===new_string); refuses empty
old_string. Single counter loop for occurrences.
Tests: 41 added (cwdJail 16, fs tools 25). Coverage:
- cwdJail: relative + absolute resolution, normalized "..", trailing
slashes, NUL bytes, sibling-prefix attack (foo vs foo-OTHER),
JailViolation carries requestedPath + cwd
- read: offset/limit, default cap, override cap, truncation marker
format, jail violation surfaces as ok:false (not throw), ENOENT,
bad-args validation
- write: new file, overwrite, recursive parent mkdir, jail rejection,
empty content as valid truncate-to-empty, UTF-8 byte counting for
multibyte content
- edit: single replace, ambiguous-match refusal, replace_all=true,
no-op rejection, not_found surfacing, empty old_string rejection,
jail rejection, delete-by-empty-new-string
Full suite: 481/481.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…caps Phase 3a — blocking security fix on PR #13 before Phase 4 lands command execution. Review of Phase 3 found four confirmed escape paths through the lexical-only jail; this commit closes all of them and adds a couple of related guards. cwdJail.ts (rewritten): - resolveInJail() is now async and exported as the ONLY resolver. - Lexical normalization (path.resolve + relative + reject ".." / absolute) is a private helper, not exported, so future callers can't pick the unsafe variant by accident. - Two-stage check: lexical first, then fs.realpath of the deepest existing ancestor reconstructed against the trailing not-yet-existing segments. Catches symlinks at the leaf and at any parent dir. - rejectSymlinkLeaf option: when true, refuse if the final segment exists and is itself a symlink, even if the target is inside the jail. write/edit enable this; read tolerates leaf-symlinks-to-inside. - JailViolation now carries `reason`: "lexical_escape", "realpath_escape", or "symlink_leaf". Tools surface this directly as audit.error. read.ts: - Awaits resolveInJail. - Stats before readFile; refuses files > 5 MB with error=too_large + file_bytes in the audit row. The existing 64 KB model-output truncation still runs on top. write.ts: - Awaits resolveInJail with rejectSymlinkLeaf: true. - Explicitly rejects path resolving to the cwd directory itself (error=is_cwd_dir) so the recursive mkdir cannot fire on dirname(cwd). edit.ts: - Awaits resolveInJail with rejectSymlinkLeaf: true. - 5 MB size cap with error=too_large, stat-before-read. Tests rewritten + 23 new regression tests: - cwdJail.test.ts now uses real tmp directories so realpath checks have something to chew on. Covers lexical_escape, realpath_escape via leaf-symlink, realpath_escape via parent-dir symlink, symlink_leaf refusal, allows leaf-symlink-to-inside when not rejecting, allows write-new-file path under rejectSymlinkLeaf. - fs-tools-jail.test.ts: the four documented escape probes from the review pass now flip from "leaks/clobbers" to "ok:false with correct reason code". Plus too_large for read and edit at the 6 MB boundary. Plus is_cwd_dir for write(path='.') and write(path=jail). Full package suite: 498/498. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lowlist, timeout, truncation
Phase 4 of V1 engine adapters. Adds the highest-stakes tool with the
strictest gate. No shell interpretation, no implicit allowlist, no
escape via python -c.
tools/runCommand.ts:
- Spawns via node:child_process spawn() with shell:false. argv is
rejected upfront if any element contains shell metacharacters
(;|&`$<>\\n\\r\\t*?~(){}\\[]!) or NUL bytes — belt-and-suspenders
against future re-shelling.
- Allowlist required: ctx.toolOpts.bash.allowlist must be a non-empty
string array. Missing or [] → audit.error="disabled". argv[0]
basename is matched (lowercased), so /bin/echo and echo route the
same.
- Hardcoded NEVER-LIST overrides the allowlist: sh, bash, zsh, fish,
ksh, csh, tcsh, dash, ash, env, xargs, eval, exec, source. Returns
audit.error="shell_blocked" even if the user adds one explicitly.
- python3 (and python) special path: rejects -c, --command, -m,
--module, -, -i, --interactive; requires a positional script arg;
script must resolve under the cwd jail and exist as a regular file.
- Per-call wall-clock timeout (default 60s). On hit: SIGTERM, wait
killGraceMs (default 3s), SIGKILL. audit.timeout=true, audit.signal
carries SIGTERM|SIGKILL.
- stdout capped at 32 KB, stderr at 16 KB (defaults). Tracked
independently: audit.truncated_stdout, .truncated_stderr,
.original_stdout_bytes, .original_stderr_bytes. Top-level
.truncated is the OR.
- audit row also carries: command, args (each summarized to 200
chars), exit_code, signal, duration_ms, error code.
tools/__tests__/runCommand.test.ts: 66 tests covering:
- tool disabled (no toolOpts; empty allowlist)
- allowlist hit/miss, absolute-path basename matching
- all 14 NEVER-LIST entries (one test each via it.each), absolute
path of /bin/bash, env basename
- 21 metacharacters individually + NUL byte + shell-injection attempt
- python3 -c (both isolated and metachar-coincident cases), -m, stdin,
no-positional, escapes jail, missing script, valid script,
non-banned flags before script
- exit code propagation (0, 1, ENOENT)
- stdout truncation while stderr fits; stderr truncation while
stdout fits; no-truncation case
- timeout fires SIGTERM; stubborn process that ignores SIGTERM
escalates to SIGKILL
- audit row shape (command, args, duration); args >200 chars truncated
- bad arg shapes (missing command, non-array args, non-string elements)
Full package suite: 564 tests passing, stable across 5 consecutive
runs of the runCommand suite (the timing tests in particular).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…klist
Phase 5 of V1 engine adapters. Builds the network-side of the V1 tool
surface. Uses node:http / node:https request() (not bare fetch) so a
custom net.LookupFunction validates the actual socket address — bare
fetch in Node can't intercept the connect-time DNS lookup, which is
the DNS-rebinding seam.
ipBlocklist.ts (192L):
- checkIPv4 + checkIPv6 + parseIpLiteral, pure functions.
- IPv4 ranges: 0/8, 10/8, 100.64/10 (CGNAT), 127/8, 169.254/16,
172.16/12, 192.168/16, 224/4, 255.255.255.255/32.
- IPv6: ::, ::1, fc00::/7, fe80::/10, ff00::/8, IPv4-mapped (::ffff:v4)
validated through embedded v4 check.
- Reason codes (blocked_loopback, blocked_private, blocked_cgnat,
blocked_link_local, blocked_unique_local, blocked_multicast,
blocked_unspecified, blocked_broadcast) flow through audit.error.
- IPv6 parser handles "::" compression, zone-id suffix (%iface),
and v4-embedded suffix in one pass.
webfetch.ts (496L):
- Scheme gate: http: / https: only.
- Pre-resolve via dns.lookup({all:true}) — uses OS resolver +
/etc/hosts (matches what the socket-time lookup sees; resolve4/6
miss /etc/hosts entries like localhost).
- Custom net.LookupFunction re-validates the address at connect
time, defense in depth against DNS rebinding.
- Redirect loop: max 5 hops, same-scheme only, every redirect target
re-validated from scratch.
- Total wall-clock timeout (default 15s) covers the whole call
including all redirects.
- Body capped at 2 MB raw — the socket is destroyed on overflow,
NOT buffered then truncated. Model-output truncation (default 64k
chars) runs on top.
- Content-Type whitelist (text/*, application/json/xml/yaml/atom/rss).
- allowPrivate opt: default false. Allows private-network targets
only when explicitly enabled per engine config.
- All errors structured; never throws to caller.
Tests (49 total):
- ipBlocklist: 23 tests, every IPv4/IPv6 block class + edges + IPv4-
mapped IPv6 + zone-id suffix + parseIpLiteral.
- webfetch: 26 tests against a fixture HTTP server.
Happy path: text, JSON, HTML.
Scheme rejection: file://, ftp://, gopher://.
Content-type: rejects octet-stream.
IP literal blocks (allowPrivate=false): 127.0.0.1, 10.0.0.1,
192.168.1.1, 169.254.169.254 (AWS metadata!), 100.64.0.1,
[::1], [fe80::1], [::ffff:127.0.0.1].
Hostname: http://localhost/ blocks via DNS pre-resolve.
Redirects: single safe redirect, redirect-to-private (chain
captured, dial fails fast), scheme-change refused, file://
refused, redirect loop hits limit at 6 hops.
Byte cap: 3 MB response with 256 KB cap → audit.truncated=true,
original_bytes >= 256 KB.
Timeout: server holds connection → audit.error=timeout.
Bad inputs: missing url, malformed URL, non-string url.
Full package suite: 613/613, stable across 3 consecutive runs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rray defense
Two small follow-ups from the Phase 5 review pass on webfetch.
1. Export buildLookup + add direct unit tests (7) covering the
address-validator path the http.request lookup option invokes:
- strict mode rejects localhost (resolves to loopback),
10.0.0.1 (literal private), ::1 (loopback), 169.254.169.254
(link-local / AWS metadata)
- permissive mode (allowPrivate=true) accepts the same inputs
- public hostname probe (one.one.one.one); tolerates offline test
env by no-op on ENOTFOUND/EAI_AGAIN
The integration test in webfetch.test.ts already proves the chain
end-to-end via fixture server requests; these tests pin the
validator so a regression surfaces immediately, independent of
HTTP server availability.
2. Defensive Location-header unwrap. Node's IncomingHttpHeaders types
`location` as `string | string[]` even though HTTP spec says it
is single-valued. `String([a,b])` would have comma-joined the
array into a broken URL. Unwrap to first element when an array
sneaks through.
Full package suite: 620/620.
Review pass also empirically confirmed (no code changes needed):
- IP parsing handles fe80 full uncompressed, fec0:: deliberately
NOT blocked (deprecated site-local), all 4 boundaries of
fc00/fdff/febf/fec0 and ff00/ffff, uncompressed loopback +
v4-mapped forms, zone-id suffix (Node 22 net.isIPv6 accepts
%iface), all 4 CGNAT edges
- Timer cleanup: req.on("close", clearTimeout) covers end/error/
timeout/byte-cap paths; race between settle() and 'close' is
safe because the timer callback is gated by `settled` flag
- Byte cap aborts on the cap-crossing chunk: aborted=true,
push partial chunk up to cap, req.destroy(), settle();
subsequent in-flight chunks dropped via `if(aborted) return`
- Content-type whitelist runs BEFORE the data listener attaches,
so binary bodies never accumulate any bytes
- currentScheme is fixed to initial scheme (intentional V1
posture: no http→https upgrade across redirects)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 6 of V1 engine adapters. The three pieces wire together the
provider adapters (Phase 2), the filesystem tools (Phase 3+3a), the
runCommand tool (Phase 4), and webfetch (Phase 5) into a working
agent loop without any concrete engine wrapper yet (Phase 7).
tools/schemas.ts:
JSON-schema definitions for read/write/edit/bash/webfetch in OpenAI
function-calling format (Ollama-compatible). Descriptions kept terse —
every token here lands in every prompt.
tools/index.ts:
buildToolRegistry(toolsConfig) walks EngineToolsConfig.enabled and
returns:
- executors: Map<toolName, ToolExecutor> (Map for fast dispatch)
- schemas: ProviderToolDef[] (in config order)
- unknownRequested: string[] (forward-compat names)
Missing / empty enabled → text-only mode (empty registry, empty schemas).
Unknown names recorded, not thrown — engine wrapper logs a warning.
engines/audit.ts:
sanitizeArgsForAudit(args) walks the JSON tree:
- Redacts keys matching api_key/authorization/token/secret/
password/bearer/cookie (case-insensitive regexes)
- Truncates string values to 200 chars
- Caps recursion depth at 5
buildAuditRow() produces an AuditRow whose ONLY keys are: toolName,
argsSummary, durationMs, error, truncated, resultBytes, exitCode,
httpStatus. Full tool output (stdout, stderr, body) is NEVER copied
in — model already saw it in the conversation; logging it twice
doubles storage and creates a leak surface for secrets the model
observed.
AuditLogger is an interface; the actual sink (sqlite write, log
pipe, etc.) is injected by the engine wrapper in Phase 7.
engines/agentLoop.ts:
runAgentLoop(opts) is the provider-agnostic loop. Error taxonomy
enforced by distinct AgentLoopResult kinds:
- "ok" → terminal assistant message (no toolCalls)
- "provider_error" → adapter threw (parse / transport / non-2xx)
- "max_turns" → maxTurns exhausted without terminal message
- "timeout" → wall-clock deadline crossed
Tool execution errors stay model-visible: the executor's structured
{ok:false, content, audit} result is fed back as a {role:"tool"}
message and the loop continues. Unknown tool calls produce a
synthetic unknown_tool result with the same shape. Tool executors
that throw unexpectedly are caught and rewrapped as tool_exception
results — they never abort the loop.
Pre-call gates: wall-clock deadline is checked BEFORE every provider
call AND BEFORE every individual tool call inside a multi-tool turn.
Audit logger called once per tool invocation when configured;
audit.record() failures are swallowed so audit-sink issues never
break the loop itself.
Tests added (40 across 3 files):
audit.test.ts (16):
- secret-key redaction (api_key, Authorization, apiKey camelcase,
password/token/secret/cookie, nested fields, arrays)
- 200-char string truncation
- depth-5 cap
- buildAuditRow output is exactly the documented whitelist
- bash exit_code captured, webfetch http_status captured
- "FULL FILE CONTENTS" sentinel does NOT appear in serialized
AuditRow (content-leak negative test)
tools/__tests__/registry.test.ts (8):
- text-only mode (undefined and [])
- filtering, order preservation, dedup
- unknown tool names recorded in unknownRequested without throw
- all 5 known tools resolvable
- schema shape sanity
__tests__/agentLoop.test.ts (15):
- text-only mode passes empty tools[] to provider
- single tool call → terminal text
- multi-tool turn (2 calls in one assistant message, both executed)
- max_turns exhaustion at the configured limit
- timeout before provider call (deadline expires between turns)
- timeout before tool call (deadline expires mid multi-tool turn)
- unknown tool feeds back unknown_tool JSON and loop continues
- tool exception feeds back tool_exception JSON and loop continues
- provider parse/transport error aborts with kind=provider_error
at turn 0 (initial) and mid-loop
- audit records one row per tool call including unknown/exception
- audit sink failure does NOT break the loop
- token + billed-model accounting across turns
Full package suite: 655/655, stable 3 consecutive runs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… visibility)
Three findings from the Phase 6 review pass, all addressed before
Phase 7 wires the loop into live engines.
1. URL credential leak in audit.argsSummary (CRITICAL).
Probe confirmed that URLs with ?api_key=, ?token=, ?password=,
?signature=, ?secret=, ?key= query parameters AND userinfo
(https://user:pass@host) leaked into the audit row verbatim. Tool
args are passed to sanitizeArgsForAudit which only redacted by
key-NAME (e.g. an "api_key" object key); the URL string itself was
stored untouched and only truncated at 200 chars — long tokens
survived even truncation.
Fix: redactUrl() runs on http(s) string values BEFORE truncation:
- Strips username + password from userinfo
- Walks searchParams, replaces values for keys matching
SECRET_QUERY_PATTERNS (api[_-]?key, access[_-]?token, token,
auth, authorization, secret, password, key, sig(nature)?)
- Non-secret query params preserved as debug signal
The key name itself is kept ("api_key=[redacted]") because its
presence is a useful signal, just not its value.
2. AgentLoopResult missing durationMs.
Wrappers can't populate EngineResult.durationMs without external
timing, splitting responsibility for one obvious field. Added
`durationMs: number` to AgentLoopBase so every result kind (ok,
provider_error, max_turns, timeout) carries the loop's own
wall-clock measurement.
3. Audit sink failures silently swallowed.
safeAudit() previously had a bare `catch {}`. The comment claimed
"the engine wrapper's logger will surface persistent audit-sink
issues" but nothing in the loop actually forwarded the error.
Fix: route audit failures through logger.warn with the tool name
and underlying message. Still doesn't abort the loop.
Tests added (14):
audit.test.ts (+8):
- api_key, token, access_token, password, secret, signature, sig,
key query-string variants
- userinfo stripping
- non-credential URL passthrough
- URLs nested in arrays (matches webfetch redirect_chain shape)
- long token redacted BEFORE truncation (regression for
tail-survival)
- malformed URL doesn't crash, returns untouched
agentLoop.test.ts (+6):
- durationMs present on each of ok/provider_error/max_turns/
timeout result kinds; timeout's durationMs >= configured budget
- audit sink throwing produces a logger.warn line carrying the
tool name and underlying error message
- repeated unknown-tool calls converge to max_turns cleanly with
token accumulation across turns
Review confirmed clean by inspection (no code change needed):
- Turn counting: one provider call = one turn (loop iteration
bound by maxTurns)
- Empty tools[] handling: both providers omit the tools field
entirely when toolSchemas.length === 0 (providers/openai.ts +
providers/ollama.ts both check .length > 0)
- Token accumulation: provider usage added on every loop iteration
including tool-followup calls
Full package suite: 669/669, stable across 3 runs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rver registration
Wraps the provider adapters (Phase 2), filesystem tools (Phase 3+3a),
runCommand (Phase 4), webfetch (Phase 5), and agent loop (Phase 6+6a)
into two complete Engine implementations and wires them into the
gateway boot only when configured.
engines/ollama.ts (~170L):
- Implements Engine interface.
- Constructor validates config.url (throws if missing); reads
process.env[config.authTokenEnvVar ?? "OLLAMA_TOKEN"] (optional).
- Builds the tool registry from config.tools; warns via logger on
unknown tool names rather than throwing.
- Translates EngineToolsConfig.{bashAllowlist, bash, read, webfetch}
into the runtime ToolExecutionContext.toolOpts shape the tool
executors expect (e.g. bashAllowlist → toolOpts.bash.allowlist).
- rejectUnsupported() is exported here and reused by OpenAIEngine;
it returns a descriptive error string for resumeSessionId,
mcpConfigPath, attachments, cliFlags BEFORE any provider HTTP call.
- run() resolves model as opts.model || config.model || error;
sessionId as opts.sessionId || randomUUID(); maps AgentLoopResult
to EngineResult preserving turns, durationMs, cost (always 0 for
Ollama via ollamaCostFor).
engines/openai.ts (~155L):
- Same shape. Construction reads process.env[apiKeyEnvVar ??
"OPENAI_API_KEY"]; throws if missing or empty.
- Cost computed via openaiCostFor(billedModels[0], promptTokens,
completionTokens). Returns undefined for unknown models so the
cost_log row records NULL — plus a logger.warn surfaces the
pricing gap for the weekly rollup.
gateway/server.ts:
- Engines Map widened from
Map<string, ClaudeEngine | CodexEngine | GeminiEngine>
to Map<string, Engine> so HTTP-loop engines coexist with
CLI-spawning ones. Process-killing paths (claudeEngine.killAll()
etc.) keep their original typed references — no behavior change.
- Ollama / OpenAI registered ONLY when config.engines.{name}
exists; construction errors (missing url / apiKey) log a warning
via logger.warn and SKIP registration rather than crashing the
gateway boot. Health/route surfaces will reflect what's actually
available.
- Imports the engine classes lazily via dynamic import() so a
misconfigured opt-in engine can't break gateway startup.
Tests added (28):
Construction validation:
- Ollama throws on empty url; succeeds with just url; reads
OLLAMA_TOKEN by default; respects custom authTokenEnvVar.
- OpenAI throws on missing/empty OPENAI_API_KEY; succeeds when
set; respects custom apiKeyEnvVar.
- Unknown tool names in config.tools.enabled → warn, not throw.
Unsupported-feature rejection (with provider-call-count assertion
that proves NO provider call happened):
- resumeSessionId, mcpConfigPath, attachments, cliFlags each
return EngineResult.error matching its specific reason.
- Empty attachments / cliFlags arrays do NOT trip the gate.
Model resolution:
- opts.model takes precedence over config.model.
- config.model is the fallback when opts.model is absent.
- Neither set → EngineResult.error = "no model resolved".
SessionId:
- opts.sessionId preserved when present (both engines).
- randomUUID() v4 generated when absent.
Cost mapping:
- Ollama always 0 regardless of token counts.
- OpenAI computes correctly for known model.
- OpenAI returns undefined (NOT 0) for unknown billed model.
EngineResult shape:
- Happy path carries result + cost + durationMs + numTurns; error
undefined.
- Provider error carries result="" + error="ollama: provider_error:
<message>" + durationMs + numTurns + accumulated cost.
Existing fallback behavior unchanged: sessions.fallbackEngine="codex"
type is preserved; the Map widening only relaxed the entry-value
type, not the cron/manager lookup pathway. Process-killing engine
references (claudeEngine, codexEngine) keep their concrete typed
references throughout shutdown.
Full package suite: 697/697, stable 3 consecutive runs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tup on construction failure
Three findings from the Phase 7 review pass.
1. /api/status no longer hardcodes the engines list.
The endpoint previously emitted a static object with claude/codex/
gemini keys, hardcoded inline. After Phase 7 wired in opt-in HTTP
engines (ollama, openai), the status payload would NEVER show them
even when they were live. Now /api/status iterates the engines Map
held on ApiContext and emits one entry per actually-registered
engine. The behavior delta the operator sees:
- engines.ollama appears iff config.engines.ollama is set AND
construction succeeded
- engines.openai appears iff config.engines.openai is set AND
OPENAI_API_KEY (or configured env var) is non-empty
- engines.gemini still only appears when the gemini block is
present in config (same as before)
ApiContext gains `engines: Map<string, Engine>` so the api layer
has the truth-of-registration. Server.ts populates it during
boot. Only server.ts constructs ApiContext (verified by grep);
no other call sites to update.
2. Opt-in engine construction now FAILS gateway boot loudly.
Previously a missing apiKey, malformed url, or any engine
construction error was caught and logged with logger.warn, and
the gateway continued without that engine registered. Per the
review, declaring engines.<name> in config is a commitment —
silent degradation lets misconfigurations go undiagnosed.
The new policy: if config.engines.<name> is present and
construction throws, server.ts rethrows with operator-facing
guidance:
Error: engines.ollama is declared but engine construction
failed: ollama: config.url is required ...
→ Fix the config OR remove the engines.ollama block to opt out.
To opt out cleanly: remove the engines.<name> block entirely. To
keep it enabled: ensure the env var / url is valid before boot.
3. Route-failure path documented (no code change).
When a cron declares engine="ollama" but the engine isn't in the
Map (e.g. unconfigured), the existing sessions/manager.ts:230
path logs an error and delivers `Error: engine "ollama" not
available.` via the cron's configured delivery connector. The
operator sees the error in Telegram/WhatsApp / wherever the cron
delivers. Cron run-log records the route() return value, which
does not include the async runSession() error, so the run-log
shows "success" — this is pre-existing behavior across all
engines and out of scope for V1 to change.
Full package suite: 697/697 (no regressions; api.ts change is
a thin iteration, server.ts change preserves existing tests).
Other review findings, confirmed clean by inspection:
- Config loading env interpolation: engines read process.env
directly in their constructor (no separate loader step needed —
the config carries an apiKeyEnvVar name, not the secret value).
- Tool config defaults: undefined tools.enabled → empty registry
(text-only). bashAllowlist not auto-populated when bash isn't
enabled; an explicitly empty bashAllowlist means "disabled" per
the V1 deny-by-default posture.
- sessions.fallbackEngine type still narrows to "codex" (existing
pre-Phase-7 behavior). Out-of-scope for V1; will document as
"existing fallback behavior, not provider fallback" in Phase 8.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…xture
Final V1 phase: documentation + a structural-validation fixture
for the first cron migration target.
engines/README.md (~230 lines):
- Engine roster + compatibility matrix (V1 supports / doesn't)
- Explicit V1 limitations called out per cron's docs
- Two doc points called out by the review:
1. STARTUP CONTRACT: declaring engines.ollama / engines.openai
is a commitment. Bad config means the gateway will not boot.
Remove the block to opt out cleanly.
2. CRON RUN-LOG ASYNC LIMITATION: route() returns synchronously
BEFORE runSession()'s async outcome. Run-log "success" does
NOT prove the session actually succeeded. Migration
monitoring must use the SHADOW's delivered outputs +
sessions table last_error + gateway logs, NOT the cron
run-log alone.
- Config example covering ollama + openai blocks, env-var sourcing,
tool registry filtering, deny-by-default bash allowlist semantics,
webfetch allowPrivate posture.
- Cost reporting: ollama=0 always, openai=undefined-not-0 on unknown
pricing.
- Audit row shape: explicit "never contains tool output content".
- 7-step migration recipe + rollback (one JSON edit).
- Operational visibility: /api/status iterates registered engines;
cron route failure path via existing manager.ts:230; sessions.
fallbackEngine clarified as Claude rate-limit fallback (existing
behavior), NOT provider fallback.
- V1 known limitations list + file map.
engines/__tests__/fixtures/report-url/:
- input.txt: representative real-shape input — Pahang MB statement
responding to flood-relief protests, political response surface,
>1h old, multi-stakeholder content. Calibrated to clearly classify
as as=issue + tenants=[pahang].
- expected-classification.schema.json: documents the required output
structure (fields, types, enum values) WITHOUT pinning prose.
Marked with fixture_intent comment so future maintainers
understand what the fixture is for.
engines/__tests__/report-url.fixture.test.ts (2 tests):
- Happy path: agent loop with mock provider producing well-formed
classification JSON. Asserts kind=ok, parses as JSON object,
required fields present, `as` in enum, `confidence` in [0,1],
`tenants` non-empty string array. Deliberately does NOT assert
exact prose of `reason`.
- Structural guard: bad model output (array instead of object)
is detected by the parser and surfaces as null. This is the
line of defense BEFORE downstream writes warroom_my.issue.
Full package suite: 699/699.
PR #13 readiness summary:
- 10 commits across 8 phases + 4 review-pass follow-ups
- ~3,560 LOC net production code
- ~3,000 LOC tests
- 699 tests, stable across multiple suite runs
- Type-check clean for all touched files (pre-existing migrate.ts
TS2769 unchanged from main)
- Three review passes completed (Phase 3a, 6a, 7a) with all
findings closed
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…crets
V1 final follow-up. The config fields engines.openai.apiKeyEnvVar and
engines.ollama.authTokenEnvVar take an env var NAME (e.g.
"OPENAI_API_KEY"), not the secret value. Pasting the key directly
was a documented foot-gun: the engine would do process.env["sk-..."]
which is always undefined → engine constructs unauthenticated → first
provider call fails with a confusing 401.
Added assertEnvVarName() (exported from ollama.ts so both wrappers
share one validator) that requires the POSIX env var name regex
^[A-Z_][A-Z0-9_]*$. Constructors throw at boot with operator-facing
guidance pointing at the env var name vs value distinction. Error
message never echoes the value — masks to length only — so a
misconfigured config can't leak the secret into gateway logs.
Tests added (8 across both wrappers):
- rejects authTokenEnvVar that looks like a token value
(e.g. "sk-abc-XYZ-123" — hyphens disqualify)
- rejects lowercase env var names
- rejects leading-digit names ("1KEY")
- rejects empty strings with a separate diagnostic
- error message never contains the raw value (sentinel string
"sk-LEAK-DO-NOT-LOG-..." verified absent)
- same coverage for OpenAI apiKeyEnvVar
README.md release-notes section explains the convention, points at
~/.jinn/.env as the recommended secret source, and documents the
masking guarantee on error messages.
Full package suite: 707/707.
PR #13 is now ready for review/merge per Phase 8 boundary.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two issues surfaced by GitHub Actions on PR #13 head commit f01a5a1. 1. cli/migrate.ts:190 TS2769. Pre-Phase-1 the engines.default union was narrowly typed to "claude" | "codex" | "gemini" — all three have `bin: string` required. The Phase 1 widening to BuiltInEngineName added ollama and openai which have no `bin` (they're HTTP-only), so `config.engines[defaultEngine].bin` widened to `string | undefined`, breaking execFileSync(file: string, ...). I claimed this was pre-existing in the Phase 1 commit message; that was wrong. The widening is what introduced it. CI catches what my local check missed (origin/main pre-PR-12 didn't have CI configured, so I didn't see it run). Fix: migrate is a CLI-spawn operation that only makes sense for claude/codex/gemini engines. If the operator's default is an HTTP engine, fall back to claude for the migration step. Coalesce `cliEngineBin: string = engineConfig.bin ?? config.engines.claude.bin` so TS narrows correctly. 2. fs-tools-jail.test.ts: "write(.) does not mkdir parent-of-cwd" was asserting `parentBefore.sort() === parentAfter.sort()` on `path.dirname(jail)`. On Linux CI that's `/tmp`, which contains transient system mounts (`.ICE-unix`, `.X11-unix`, etc.) — the listing flapped between `readdir` calls and the test failed inconsistently. Fix: the is_cwd_dir guard short-circuits BEFORE the recursive mkdir runs, so we don't need to inspect the parent's listing to prove the guard fired. The audit code assertion alone is sufficient. Also verify the jail dir itself stayed a directory (didn't get clobbered into a file). Full package suite: 707/707, including the corrected test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
First commit on a multi-phase branch adding two new engine adapters (
ollama,openai) so cron jobs can run on cheaper / self-hosted models alongside Claude. Goal: cut the structural cache-write cost (currently ~38% of weekly cron spend on Claude SITREPs) by moving tool-light workloads off Claude entirely.This PR is Phase 1 only — type/config surface, no runtime behavior change.
Subsequent phases will land as additional commits on this same branch:
cwdJail+read/write/edittoolsrunCommand(argv-only bash, allowlist, python3 restrictions)webfetch(DNS-rebinding mitigation, IP blocklist, truncation)agentLoop+ audit logWhat this commit does
BuiltInEngineName("claude" | "codex" | "gemini" | "ollama" | "openai") andEngineName = BuiltInEngineName | "mock"split —mockcannot leak into user-configurable defaults.EngineConfigBase,OllamaConfig,OpenAIConfig,EngineToolsConfig,EnginesConfigtypes formalize the config shape new engines will read.modelFor(config, engine)helper replaces 2 inline union-cast lookups (sessions/manager.ts:937,cron/runner.ts:67). Future engines no longer pay union-widening tax at every call site.Explicit callouts
gateway/server.tshappens in Phase 7.setup.tsintentionally unchanged. The interactive bootstrap detects local CLI binaries (claude/codex); HTTP engines belong inconfig.yaml, not the install flow.mockexcluded from configurable engine defaults.EnginesConfig.default: BuiltInEngineNameenforces this at the type level.cli/migrate.ts:190is unchanged from main.Test plan
pnpm exec tsc --noEmitshows no new errors beyond pre-existing migrate.tsOut of scope (this commit; lands in later phases on same branch)
🤖 Generated with Claude Code