Skip to content

feat(engines): V1 ollama + openai adapters (Phase 1: types only)#13

Merged
nyem69 merged 14 commits into
mainfrom
feat/engines-ollama-openai
May 18, 2026
Merged

feat(engines): V1 ollama + openai adapters (Phase 1: types only)#13
nyem69 merged 14 commits into
mainfrom
feat/engines-ollama-openai

Conversation

@nyem69

@nyem69 nyem69 commented May 17, 2026

Copy link
Copy Markdown
Owner

Summary

First commit on a multi-phase branch adding two new engine adapters (ollama, openai) so cron jobs can run on cheaper / self-hosted models alongside Claude. Goal: cut the structural cache-write cost (currently ~38% of weekly cron spend on Claude SITREPs) by moving tool-light workloads off Claude entirely.

This PR is Phase 1 only — type/config surface, no runtime behavior change.

Subsequent phases will land as additional commits on this same branch:

  • Phase 2: pricing table + provider HTTP adapters (with fixture tests)
  • Phase 3: shared cwdJail + read / write / edit tools
  • Phase 4: runCommand (argv-only bash, allowlist, python3 restrictions)
  • Phase 5: webfetch (DNS-rebinding mitigation, IP blocklist, truncation)
  • Phase 6: provider-agnostic agentLoop + audit log
  • Phase 7: engine wrappers + server registration + unsupported-feature errors
  • Phase 8: docs + golden fixture for migration target

What this commit does

  • BuiltInEngineName ("claude" | "codex" | "gemini" | "ollama" | "openai") and EngineName = BuiltInEngineName | "mock" split — mock cannot leak into user-configurable defaults.
  • EngineConfigBase, OllamaConfig, OpenAIConfig, EngineToolsConfig, EnginesConfig types formalize the config shape new engines will read.
  • modelFor(config, engine) helper replaces 2 inline union-cast lookups (sessions/manager.ts:937, cron/runner.ts:67). Future engines no longer pay union-widening tax at every call site.

Explicit callouts

  • Phase 1 only — no HTTP engines wired yet. Adding ollama/openai to the engines Map in gateway/server.ts happens in Phase 7.
  • setup.ts intentionally unchanged. The interactive bootstrap detects local CLI binaries (claude/codex); HTTP engines belong in config.yaml, not the install flow.
  • mock excluded from configurable engine defaults. EnginesConfig.default: BuiltInEngineName enforces this at the type level.
  • Typecheck status: clean for touched files. One pre-existing TS2769 error in cli/migrate.ts:190 is unchanged from main.

Test plan

  • pnpm exec tsc --noEmit shows no new errors beyond pre-existing migrate.ts
  • Existing unit tests still pass (no behavior change so should be a no-op)
  • Manual smoke: gateway starts cleanly, existing claude/codex/gemini cron jobs continue to run unchanged

Out of scope (this commit; lands in later phases on same branch)

  • Any provider HTTP, tool execution, agent loop, audit log
  • Documentation — Phase 8 once full surface is settled

🤖 Generated with Claude Code

nyem69 and others added 14 commits May 18, 2026 00:22
…penai

Phase 1 of V1 engine adapters (ollama + openai). This commit only widens
the type system; no runtime behavior changes.

- BuiltInEngineName: union of user-configurable engines, now incl. ollama/openai.
- EngineName: BuiltInEngineName | "mock" for the test surface.
- EngineConfigBase + OllamaConfig + OpenAIConfig + EngineToolsConfig
  formalize the config shape new engines will read.
- EnginesConfig.default narrowed to BuiltInEngineName so mock can never
  be set as default through config.
- modelFor(config, engine) helper replaces 2 inline union-cast lookups
  in sessions/manager.ts and cron/runner.ts. Future engines no longer
  pay union-widening tax at every call site.

Setup CLI keeps the narrower "claude" | "codex" union deliberately; HTTP
engines aren't installed via local-binary detection and belong in
config.yaml, not the interactive bootstrap flow.

Pre-existing migrate.ts TS error (TS2769) is unrelated to this change
(present on main).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 of V1 engine adapters. Provider layer only — no agent loop, no
tools, no engine wrappers yet. Each provider exports a `ProviderCall`
function suitable for the agent loop to consume in Phase 6.

New module: packages/jimmy/src/engines/providers/

  types.ts      — NormalizedToolCall + ProviderMessage + ProviderCallOpts/
                  Result + OpenAIAuth/OllamaAuth. The wire-format-agnostic
                  contract every adapter implements.

  pricing.ts    — OpenAI per-model rate table (gpt-4o family, gpt-4.1
                  family, o1/o3/o4-mini, gpt-5 family). openaiCostFor()
                  returns undefined for unknown models so the pricing gap
                  surfaces in cost_log as NULL rather than masquerading
                  as $0 spend. ollamaCostFor() returns 0 (self-hosted).

  openai.ts     — POST /v1/chat/completions. Parses tool_calls and
                  converts JSON-string arguments to objects. Trusts
                  response.model for billed-model identity; falls back
                  to requested model only when response omits it.
                  Throws on non-2xx HTTP, transport errors, malformed
                  payloads. No retries.

  ollama.ts     — POST /api/chat. Accepts arguments as either object or
                  JSON string (model/client variance). Synthesizes
                  call_<uuid> ids when missing. Optional bearer token
                  via OllamaAuth.token. No retries.

All requests use injected fetch (opts.fetchFn) so tests can mock without
touching the network. Production callers leave it undefined to use
global fetch.

Tests: 38 added (pricing 7, openai 16, ollama 15). Coverage targets:
  - happy-path text-only response
  - tool_calls normalization across both wire formats (string vs object args)
  - finishReason override when tool_calls present even if wire says 'stop'
  - id synthesis when provider omits it
  - malformed JSON in arguments → specific error
  - arguments parsing to non-object (array, primitive) → specific error
  - missing function.name → specific error
  - HTTP non-2xx → throws with status + body excerpt, fetch called once
    (asserts no auto-retry policy)
  - transport error → throws with underlying message, fetch called once
  - response body not JSON → throws specific error
  - response.model lookup vs fallback to requested model
  - request shape (URL, headers, body) for both providers
  - assistant→tool round-trip serialization (openai stringifies args; ollama
    keeps them as object)
  - optional auth header presence/absence
  - model-ignores-tools fallback (Ollama plain text)

Full test suite still passes: 440/440.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 of V1 engine adapters. Three filesystem tools for HTTP-loop
engines (ollama, openai), all sharing a lexical cwd-jail. No agent loop
wiring yet — these are pure executors with a stable ToolResult contract.

New files (packages/jimmy/src/engines/tools/):
  types.ts    — ToolExecutionContext + ToolResult shape. Tools return
                {ok, content, audit:{...}} rather than throwing on user
                errors so the agent loop can feed errors back to the
                model as `tool` role messages.

  cwdJail.ts  — resolveInJail(cwd, requested) + JailViolation error.
                Lexical-only: path.resolve + path.relative + reject if
                the relative path starts with ".." or is absolute.
                Documented limitation: symlink escape is NOT caught
                (would need fs.realpath which breaks write-to-new-path).

  read.ts     — 1-indexed line offset/limit, default 2000 lines, 64k
                character truncation cap (overridable via toolOpts.read.
                maxChars). Truncation appends a `[truncated: X of Y
                characters]` marker so the model sees the boundary.

  write.ts    — Overwrites; mkdir parent recursively; computes UTF-8
                byte length (not char length) for the audit row.

  edit.ts     — Mirrors Claude Code's Edit semantics: requires unique
                old_string match unless replace_all=true; rejects
                no-ops (old_string===new_string); refuses empty
                old_string. Single counter loop for occurrences.

Tests: 41 added (cwdJail 16, fs tools 25). Coverage:
  - cwdJail: relative + absolute resolution, normalized "..", trailing
    slashes, NUL bytes, sibling-prefix attack (foo vs foo-OTHER),
    JailViolation carries requestedPath + cwd
  - read: offset/limit, default cap, override cap, truncation marker
    format, jail violation surfaces as ok:false (not throw), ENOENT,
    bad-args validation
  - write: new file, overwrite, recursive parent mkdir, jail rejection,
    empty content as valid truncate-to-empty, UTF-8 byte counting for
    multibyte content
  - edit: single replace, ambiguous-match refusal, replace_all=true,
    no-op rejection, not_found surfacing, empty old_string rejection,
    jail rejection, delete-by-empty-new-string

Full suite: 481/481.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…caps

Phase 3a — blocking security fix on PR #13 before Phase 4 lands command
execution. Review of Phase 3 found four confirmed escape paths through
the lexical-only jail; this commit closes all of them and adds a couple
of related guards.

cwdJail.ts (rewritten):
  - resolveInJail() is now async and exported as the ONLY resolver.
  - Lexical normalization (path.resolve + relative + reject ".." /
    absolute) is a private helper, not exported, so future callers can't
    pick the unsafe variant by accident.
  - Two-stage check: lexical first, then fs.realpath of the deepest
    existing ancestor reconstructed against the trailing not-yet-existing
    segments. Catches symlinks at the leaf and at any parent dir.
  - rejectSymlinkLeaf option: when true, refuse if the final segment
    exists and is itself a symlink, even if the target is inside the
    jail. write/edit enable this; read tolerates leaf-symlinks-to-inside.
  - JailViolation now carries `reason`: "lexical_escape", "realpath_escape",
    or "symlink_leaf". Tools surface this directly as audit.error.

read.ts:
  - Awaits resolveInJail.
  - Stats before readFile; refuses files > 5 MB with error=too_large
    + file_bytes in the audit row. The existing 64 KB model-output
    truncation still runs on top.

write.ts:
  - Awaits resolveInJail with rejectSymlinkLeaf: true.
  - Explicitly rejects path resolving to the cwd directory itself
    (error=is_cwd_dir) so the recursive mkdir cannot fire on
    dirname(cwd).

edit.ts:
  - Awaits resolveInJail with rejectSymlinkLeaf: true.
  - 5 MB size cap with error=too_large, stat-before-read.

Tests rewritten + 23 new regression tests:
  - cwdJail.test.ts now uses real tmp directories so realpath checks
    have something to chew on. Covers lexical_escape, realpath_escape
    via leaf-symlink, realpath_escape via parent-dir symlink, symlink_leaf
    refusal, allows leaf-symlink-to-inside when not rejecting, allows
    write-new-file path under rejectSymlinkLeaf.
  - fs-tools-jail.test.ts: the four documented escape probes from the
    review pass now flip from "leaks/clobbers" to "ok:false with correct
    reason code". Plus too_large for read and edit at the 6 MB boundary.
    Plus is_cwd_dir for write(path='.') and write(path=jail).

Full package suite: 498/498.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lowlist, timeout, truncation

Phase 4 of V1 engine adapters. Adds the highest-stakes tool with the
strictest gate. No shell interpretation, no implicit allowlist, no
escape via python -c.

tools/runCommand.ts:
  - Spawns via node:child_process spawn() with shell:false. argv is
    rejected upfront if any element contains shell metacharacters
    (;|&`$<>\\n\\r\\t*?~(){}\\[]!) or NUL bytes — belt-and-suspenders
    against future re-shelling.
  - Allowlist required: ctx.toolOpts.bash.allowlist must be a non-empty
    string array. Missing or [] → audit.error="disabled". argv[0]
    basename is matched (lowercased), so /bin/echo and echo route the
    same.
  - Hardcoded NEVER-LIST overrides the allowlist: sh, bash, zsh, fish,
    ksh, csh, tcsh, dash, ash, env, xargs, eval, exec, source. Returns
    audit.error="shell_blocked" even if the user adds one explicitly.
  - python3 (and python) special path: rejects -c, --command, -m,
    --module, -, -i, --interactive; requires a positional script arg;
    script must resolve under the cwd jail and exist as a regular file.
  - Per-call wall-clock timeout (default 60s). On hit: SIGTERM, wait
    killGraceMs (default 3s), SIGKILL. audit.timeout=true, audit.signal
    carries SIGTERM|SIGKILL.
  - stdout capped at 32 KB, stderr at 16 KB (defaults). Tracked
    independently: audit.truncated_stdout, .truncated_stderr,
    .original_stdout_bytes, .original_stderr_bytes. Top-level
    .truncated is the OR.
  - audit row also carries: command, args (each summarized to 200
    chars), exit_code, signal, duration_ms, error code.

tools/__tests__/runCommand.test.ts: 66 tests covering:
  - tool disabled (no toolOpts; empty allowlist)
  - allowlist hit/miss, absolute-path basename matching
  - all 14 NEVER-LIST entries (one test each via it.each), absolute
    path of /bin/bash, env basename
  - 21 metacharacters individually + NUL byte + shell-injection attempt
  - python3 -c (both isolated and metachar-coincident cases), -m, stdin,
    no-positional, escapes jail, missing script, valid script,
    non-banned flags before script
  - exit code propagation (0, 1, ENOENT)
  - stdout truncation while stderr fits; stderr truncation while
    stdout fits; no-truncation case
  - timeout fires SIGTERM; stubborn process that ignores SIGTERM
    escalates to SIGKILL
  - audit row shape (command, args, duration); args >200 chars truncated
  - bad arg shapes (missing command, non-array args, non-string elements)

Full package suite: 564 tests passing, stable across 5 consecutive
runs of the runCommand suite (the timing tests in particular).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…klist

Phase 5 of V1 engine adapters. Builds the network-side of the V1 tool
surface. Uses node:http / node:https request() (not bare fetch) so a
custom net.LookupFunction validates the actual socket address — bare
fetch in Node can't intercept the connect-time DNS lookup, which is
the DNS-rebinding seam.

ipBlocklist.ts (192L):
  - checkIPv4 + checkIPv6 + parseIpLiteral, pure functions.
  - IPv4 ranges: 0/8, 10/8, 100.64/10 (CGNAT), 127/8, 169.254/16,
    172.16/12, 192.168/16, 224/4, 255.255.255.255/32.
  - IPv6: ::, ::1, fc00::/7, fe80::/10, ff00::/8, IPv4-mapped (::ffff:v4)
    validated through embedded v4 check.
  - Reason codes (blocked_loopback, blocked_private, blocked_cgnat,
    blocked_link_local, blocked_unique_local, blocked_multicast,
    blocked_unspecified, blocked_broadcast) flow through audit.error.
  - IPv6 parser handles "::" compression, zone-id suffix (%iface),
    and v4-embedded suffix in one pass.

webfetch.ts (496L):
  - Scheme gate: http: / https: only.
  - Pre-resolve via dns.lookup({all:true}) — uses OS resolver +
    /etc/hosts (matches what the socket-time lookup sees; resolve4/6
    miss /etc/hosts entries like localhost).
  - Custom net.LookupFunction re-validates the address at connect
    time, defense in depth against DNS rebinding.
  - Redirect loop: max 5 hops, same-scheme only, every redirect target
    re-validated from scratch.
  - Total wall-clock timeout (default 15s) covers the whole call
    including all redirects.
  - Body capped at 2 MB raw — the socket is destroyed on overflow,
    NOT buffered then truncated. Model-output truncation (default 64k
    chars) runs on top.
  - Content-Type whitelist (text/*, application/json/xml/yaml/atom/rss).
  - allowPrivate opt: default false. Allows private-network targets
    only when explicitly enabled per engine config.
  - All errors structured; never throws to caller.

Tests (49 total):
  - ipBlocklist: 23 tests, every IPv4/IPv6 block class + edges + IPv4-
    mapped IPv6 + zone-id suffix + parseIpLiteral.
  - webfetch: 26 tests against a fixture HTTP server.
    Happy path: text, JSON, HTML.
    Scheme rejection: file://, ftp://, gopher://.
    Content-type: rejects octet-stream.
    IP literal blocks (allowPrivate=false): 127.0.0.1, 10.0.0.1,
      192.168.1.1, 169.254.169.254 (AWS metadata!), 100.64.0.1,
      [::1], [fe80::1], [::ffff:127.0.0.1].
    Hostname: http://localhost/ blocks via DNS pre-resolve.
    Redirects: single safe redirect, redirect-to-private (chain
      captured, dial fails fast), scheme-change refused, file://
      refused, redirect loop hits limit at 6 hops.
    Byte cap: 3 MB response with 256 KB cap → audit.truncated=true,
      original_bytes >= 256 KB.
    Timeout: server holds connection → audit.error=timeout.
    Bad inputs: missing url, malformed URL, non-string url.

Full package suite: 613/613, stable across 3 consecutive runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rray defense

Two small follow-ups from the Phase 5 review pass on webfetch.

1. Export buildLookup + add direct unit tests (7) covering the
   address-validator path the http.request lookup option invokes:
     - strict mode rejects localhost (resolves to loopback),
       10.0.0.1 (literal private), ::1 (loopback), 169.254.169.254
       (link-local / AWS metadata)
     - permissive mode (allowPrivate=true) accepts the same inputs
     - public hostname probe (one.one.one.one); tolerates offline test
       env by no-op on ENOTFOUND/EAI_AGAIN
   The integration test in webfetch.test.ts already proves the chain
   end-to-end via fixture server requests; these tests pin the
   validator so a regression surfaces immediately, independent of
   HTTP server availability.

2. Defensive Location-header unwrap. Node's IncomingHttpHeaders types
   `location` as `string | string[]` even though HTTP spec says it
   is single-valued. `String([a,b])` would have comma-joined the
   array into a broken URL. Unwrap to first element when an array
   sneaks through.

Full package suite: 620/620.

Review pass also empirically confirmed (no code changes needed):
   - IP parsing handles fe80 full uncompressed, fec0:: deliberately
     NOT blocked (deprecated site-local), all 4 boundaries of
     fc00/fdff/febf/fec0 and ff00/ffff, uncompressed loopback +
     v4-mapped forms, zone-id suffix (Node 22 net.isIPv6 accepts
     %iface), all 4 CGNAT edges
   - Timer cleanup: req.on("close", clearTimeout) covers end/error/
     timeout/byte-cap paths; race between settle() and 'close' is
     safe because the timer callback is gated by `settled` flag
   - Byte cap aborts on the cap-crossing chunk: aborted=true,
     push partial chunk up to cap, req.destroy(), settle();
     subsequent in-flight chunks dropped via `if(aborted) return`
   - Content-type whitelist runs BEFORE the data listener attaches,
     so binary bodies never accumulate any bytes
   - currentScheme is fixed to initial scheme (intentional V1
     posture: no http→https upgrade across redirects)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 6 of V1 engine adapters. The three pieces wire together the
provider adapters (Phase 2), the filesystem tools (Phase 3+3a), the
runCommand tool (Phase 4), and webfetch (Phase 5) into a working
agent loop without any concrete engine wrapper yet (Phase 7).

tools/schemas.ts:
  JSON-schema definitions for read/write/edit/bash/webfetch in OpenAI
  function-calling format (Ollama-compatible). Descriptions kept terse —
  every token here lands in every prompt.

tools/index.ts:
  buildToolRegistry(toolsConfig) walks EngineToolsConfig.enabled and
  returns:
    - executors: Map<toolName, ToolExecutor>  (Map for fast dispatch)
    - schemas: ProviderToolDef[]               (in config order)
    - unknownRequested: string[]               (forward-compat names)
  Missing / empty enabled → text-only mode (empty registry, empty schemas).
  Unknown names recorded, not thrown — engine wrapper logs a warning.

engines/audit.ts:
  sanitizeArgsForAudit(args) walks the JSON tree:
    - Redacts keys matching api_key/authorization/token/secret/
      password/bearer/cookie (case-insensitive regexes)
    - Truncates string values to 200 chars
    - Caps recursion depth at 5
  buildAuditRow() produces an AuditRow whose ONLY keys are: toolName,
  argsSummary, durationMs, error, truncated, resultBytes, exitCode,
  httpStatus. Full tool output (stdout, stderr, body) is NEVER copied
  in — model already saw it in the conversation; logging it twice
  doubles storage and creates a leak surface for secrets the model
  observed.
  AuditLogger is an interface; the actual sink (sqlite write, log
  pipe, etc.) is injected by the engine wrapper in Phase 7.

engines/agentLoop.ts:
  runAgentLoop(opts) is the provider-agnostic loop. Error taxonomy
  enforced by distinct AgentLoopResult kinds:
    - "ok"             → terminal assistant message (no toolCalls)
    - "provider_error" → adapter threw (parse / transport / non-2xx)
    - "max_turns"      → maxTurns exhausted without terminal message
    - "timeout"        → wall-clock deadline crossed
  Tool execution errors stay model-visible: the executor's structured
  {ok:false, content, audit} result is fed back as a {role:"tool"}
  message and the loop continues. Unknown tool calls produce a
  synthetic unknown_tool result with the same shape. Tool executors
  that throw unexpectedly are caught and rewrapped as tool_exception
  results — they never abort the loop.
  Pre-call gates: wall-clock deadline is checked BEFORE every provider
  call AND BEFORE every individual tool call inside a multi-tool turn.
  Audit logger called once per tool invocation when configured;
  audit.record() failures are swallowed so audit-sink issues never
  break the loop itself.

Tests added (40 across 3 files):
  audit.test.ts (16):
    - secret-key redaction (api_key, Authorization, apiKey camelcase,
      password/token/secret/cookie, nested fields, arrays)
    - 200-char string truncation
    - depth-5 cap
    - buildAuditRow output is exactly the documented whitelist
    - bash exit_code captured, webfetch http_status captured
    - "FULL FILE CONTENTS" sentinel does NOT appear in serialized
      AuditRow (content-leak negative test)

  tools/__tests__/registry.test.ts (8):
    - text-only mode (undefined and [])
    - filtering, order preservation, dedup
    - unknown tool names recorded in unknownRequested without throw
    - all 5 known tools resolvable
    - schema shape sanity

  __tests__/agentLoop.test.ts (15):
    - text-only mode passes empty tools[] to provider
    - single tool call → terminal text
    - multi-tool turn (2 calls in one assistant message, both executed)
    - max_turns exhaustion at the configured limit
    - timeout before provider call (deadline expires between turns)
    - timeout before tool call (deadline expires mid multi-tool turn)
    - unknown tool feeds back unknown_tool JSON and loop continues
    - tool exception feeds back tool_exception JSON and loop continues
    - provider parse/transport error aborts with kind=provider_error
      at turn 0 (initial) and mid-loop
    - audit records one row per tool call including unknown/exception
    - audit sink failure does NOT break the loop
    - token + billed-model accounting across turns

Full package suite: 655/655, stable 3 consecutive runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… visibility)

Three findings from the Phase 6 review pass, all addressed before
Phase 7 wires the loop into live engines.

1. URL credential leak in audit.argsSummary (CRITICAL).
   Probe confirmed that URLs with ?api_key=, ?token=, ?password=,
   ?signature=, ?secret=, ?key= query parameters AND userinfo
   (https://user:pass@host) leaked into the audit row verbatim. Tool
   args are passed to sanitizeArgsForAudit which only redacted by
   key-NAME (e.g. an "api_key" object key); the URL string itself was
   stored untouched and only truncated at 200 chars — long tokens
   survived even truncation.
   Fix: redactUrl() runs on http(s) string values BEFORE truncation:
     - Strips username + password from userinfo
     - Walks searchParams, replaces values for keys matching
       SECRET_QUERY_PATTERNS (api[_-]?key, access[_-]?token, token,
       auth, authorization, secret, password, key, sig(nature)?)
     - Non-secret query params preserved as debug signal
   The key name itself is kept ("api_key=[redacted]") because its
   presence is a useful signal, just not its value.

2. AgentLoopResult missing durationMs.
   Wrappers can't populate EngineResult.durationMs without external
   timing, splitting responsibility for one obvious field. Added
   `durationMs: number` to AgentLoopBase so every result kind (ok,
   provider_error, max_turns, timeout) carries the loop's own
   wall-clock measurement.

3. Audit sink failures silently swallowed.
   safeAudit() previously had a bare `catch {}`. The comment claimed
   "the engine wrapper's logger will surface persistent audit-sink
   issues" but nothing in the loop actually forwarded the error.
   Fix: route audit failures through logger.warn with the tool name
   and underlying message. Still doesn't abort the loop.

Tests added (14):
  audit.test.ts (+8):
    - api_key, token, access_token, password, secret, signature, sig,
      key query-string variants
    - userinfo stripping
    - non-credential URL passthrough
    - URLs nested in arrays (matches webfetch redirect_chain shape)
    - long token redacted BEFORE truncation (regression for
      tail-survival)
    - malformed URL doesn't crash, returns untouched
  agentLoop.test.ts (+6):
    - durationMs present on each of ok/provider_error/max_turns/
      timeout result kinds; timeout's durationMs >= configured budget
    - audit sink throwing produces a logger.warn line carrying the
      tool name and underlying error message
    - repeated unknown-tool calls converge to max_turns cleanly with
      token accumulation across turns

Review confirmed clean by inspection (no code change needed):
  - Turn counting: one provider call = one turn (loop iteration
    bound by maxTurns)
  - Empty tools[] handling: both providers omit the tools field
    entirely when toolSchemas.length === 0 (providers/openai.ts +
    providers/ollama.ts both check .length > 0)
  - Token accumulation: provider usage added on every loop iteration
    including tool-followup calls

Full package suite: 669/669, stable across 3 runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rver registration

Wraps the provider adapters (Phase 2), filesystem tools (Phase 3+3a),
runCommand (Phase 4), webfetch (Phase 5), and agent loop (Phase 6+6a)
into two complete Engine implementations and wires them into the
gateway boot only when configured.

engines/ollama.ts (~170L):
  - Implements Engine interface.
  - Constructor validates config.url (throws if missing); reads
    process.env[config.authTokenEnvVar ?? "OLLAMA_TOKEN"] (optional).
  - Builds the tool registry from config.tools; warns via logger on
    unknown tool names rather than throwing.
  - Translates EngineToolsConfig.{bashAllowlist, bash, read, webfetch}
    into the runtime ToolExecutionContext.toolOpts shape the tool
    executors expect (e.g. bashAllowlist → toolOpts.bash.allowlist).
  - rejectUnsupported() is exported here and reused by OpenAIEngine;
    it returns a descriptive error string for resumeSessionId,
    mcpConfigPath, attachments, cliFlags BEFORE any provider HTTP call.
  - run() resolves model as opts.model || config.model || error;
    sessionId as opts.sessionId || randomUUID(); maps AgentLoopResult
    to EngineResult preserving turns, durationMs, cost (always 0 for
    Ollama via ollamaCostFor).

engines/openai.ts (~155L):
  - Same shape. Construction reads process.env[apiKeyEnvVar ??
    "OPENAI_API_KEY"]; throws if missing or empty.
  - Cost computed via openaiCostFor(billedModels[0], promptTokens,
    completionTokens). Returns undefined for unknown models so the
    cost_log row records NULL — plus a logger.warn surfaces the
    pricing gap for the weekly rollup.

gateway/server.ts:
  - Engines Map widened from
      Map<string, ClaudeEngine | CodexEngine | GeminiEngine>
    to Map<string, Engine> so HTTP-loop engines coexist with
    CLI-spawning ones. Process-killing paths (claudeEngine.killAll()
    etc.) keep their original typed references — no behavior change.
  - Ollama / OpenAI registered ONLY when config.engines.{name}
    exists; construction errors (missing url / apiKey) log a warning
    via logger.warn and SKIP registration rather than crashing the
    gateway boot. Health/route surfaces will reflect what's actually
    available.
  - Imports the engine classes lazily via dynamic import() so a
    misconfigured opt-in engine can't break gateway startup.

Tests added (28):
  Construction validation:
    - Ollama throws on empty url; succeeds with just url; reads
      OLLAMA_TOKEN by default; respects custom authTokenEnvVar.
    - OpenAI throws on missing/empty OPENAI_API_KEY; succeeds when
      set; respects custom apiKeyEnvVar.
    - Unknown tool names in config.tools.enabled → warn, not throw.
  Unsupported-feature rejection (with provider-call-count assertion
  that proves NO provider call happened):
    - resumeSessionId, mcpConfigPath, attachments, cliFlags each
      return EngineResult.error matching its specific reason.
    - Empty attachments / cliFlags arrays do NOT trip the gate.
  Model resolution:
    - opts.model takes precedence over config.model.
    - config.model is the fallback when opts.model is absent.
    - Neither set → EngineResult.error = "no model resolved".
  SessionId:
    - opts.sessionId preserved when present (both engines).
    - randomUUID() v4 generated when absent.
  Cost mapping:
    - Ollama always 0 regardless of token counts.
    - OpenAI computes correctly for known model.
    - OpenAI returns undefined (NOT 0) for unknown billed model.
  EngineResult shape:
    - Happy path carries result + cost + durationMs + numTurns; error
      undefined.
    - Provider error carries result="" + error="ollama: provider_error:
      <message>" + durationMs + numTurns + accumulated cost.

Existing fallback behavior unchanged: sessions.fallbackEngine="codex"
type is preserved; the Map widening only relaxed the entry-value
type, not the cron/manager lookup pathway. Process-killing engine
references (claudeEngine, codexEngine) keep their concrete typed
references throughout shutdown.

Full package suite: 697/697, stable 3 consecutive runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tup on construction failure

Three findings from the Phase 7 review pass.

1. /api/status no longer hardcodes the engines list.
   The endpoint previously emitted a static object with claude/codex/
   gemini keys, hardcoded inline. After Phase 7 wired in opt-in HTTP
   engines (ollama, openai), the status payload would NEVER show them
   even when they were live. Now /api/status iterates the engines Map
   held on ApiContext and emits one entry per actually-registered
   engine. The behavior delta the operator sees:
     - engines.ollama appears iff config.engines.ollama is set AND
       construction succeeded
     - engines.openai appears iff config.engines.openai is set AND
       OPENAI_API_KEY (or configured env var) is non-empty
     - engines.gemini still only appears when the gemini block is
       present in config (same as before)

   ApiContext gains `engines: Map<string, Engine>` so the api layer
   has the truth-of-registration. Server.ts populates it during
   boot. Only server.ts constructs ApiContext (verified by grep);
   no other call sites to update.

2. Opt-in engine construction now FAILS gateway boot loudly.
   Previously a missing apiKey, malformed url, or any engine
   construction error was caught and logged with logger.warn, and
   the gateway continued without that engine registered. Per the
   review, declaring engines.<name> in config is a commitment —
   silent degradation lets misconfigurations go undiagnosed.

   The new policy: if config.engines.<name> is present and
   construction throws, server.ts rethrows with operator-facing
   guidance:
     Error: engines.ollama is declared but engine construction
     failed: ollama: config.url is required ...
     → Fix the config OR remove the engines.ollama block to opt out.

   To opt out cleanly: remove the engines.<name> block entirely. To
   keep it enabled: ensure the env var / url is valid before boot.

3. Route-failure path documented (no code change).
   When a cron declares engine="ollama" but the engine isn't in the
   Map (e.g. unconfigured), the existing sessions/manager.ts:230
   path logs an error and delivers `Error: engine "ollama" not
   available.` via the cron's configured delivery connector. The
   operator sees the error in Telegram/WhatsApp / wherever the cron
   delivers. Cron run-log records the route() return value, which
   does not include the async runSession() error, so the run-log
   shows "success" — this is pre-existing behavior across all
   engines and out of scope for V1 to change.

Full package suite: 697/697 (no regressions; api.ts change is
a thin iteration, server.ts change preserves existing tests).

Other review findings, confirmed clean by inspection:
  - Config loading env interpolation: engines read process.env
    directly in their constructor (no separate loader step needed —
    the config carries an apiKeyEnvVar name, not the secret value).
  - Tool config defaults: undefined tools.enabled → empty registry
    (text-only). bashAllowlist not auto-populated when bash isn't
    enabled; an explicitly empty bashAllowlist means "disabled" per
    the V1 deny-by-default posture.
  - sessions.fallbackEngine type still narrows to "codex" (existing
    pre-Phase-7 behavior). Out-of-scope for V1; will document as
    "existing fallback behavior, not provider fallback" in Phase 8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…xture

Final V1 phase: documentation + a structural-validation fixture
for the first cron migration target.

engines/README.md (~230 lines):
  - Engine roster + compatibility matrix (V1 supports / doesn't)
  - Explicit V1 limitations called out per cron's docs
  - Two doc points called out by the review:
      1. STARTUP CONTRACT: declaring engines.ollama / engines.openai
         is a commitment. Bad config means the gateway will not boot.
         Remove the block to opt out cleanly.
      2. CRON RUN-LOG ASYNC LIMITATION: route() returns synchronously
         BEFORE runSession()'s async outcome. Run-log "success" does
         NOT prove the session actually succeeded. Migration
         monitoring must use the SHADOW's delivered outputs +
         sessions table last_error + gateway logs, NOT the cron
         run-log alone.
  - Config example covering ollama + openai blocks, env-var sourcing,
    tool registry filtering, deny-by-default bash allowlist semantics,
    webfetch allowPrivate posture.
  - Cost reporting: ollama=0 always, openai=undefined-not-0 on unknown
    pricing.
  - Audit row shape: explicit "never contains tool output content".
  - 7-step migration recipe + rollback (one JSON edit).
  - Operational visibility: /api/status iterates registered engines;
    cron route failure path via existing manager.ts:230; sessions.
    fallbackEngine clarified as Claude rate-limit fallback (existing
    behavior), NOT provider fallback.
  - V1 known limitations list + file map.

engines/__tests__/fixtures/report-url/:
  - input.txt: representative real-shape input — Pahang MB statement
    responding to flood-relief protests, political response surface,
    >1h old, multi-stakeholder content. Calibrated to clearly classify
    as as=issue + tenants=[pahang].
  - expected-classification.schema.json: documents the required output
    structure (fields, types, enum values) WITHOUT pinning prose.
    Marked with fixture_intent comment so future maintainers
    understand what the fixture is for.

engines/__tests__/report-url.fixture.test.ts (2 tests):
  - Happy path: agent loop with mock provider producing well-formed
    classification JSON. Asserts kind=ok, parses as JSON object,
    required fields present, `as` in enum, `confidence` in [0,1],
    `tenants` non-empty string array. Deliberately does NOT assert
    exact prose of `reason`.
  - Structural guard: bad model output (array instead of object)
    is detected by the parser and surfaces as null. This is the
    line of defense BEFORE downstream writes warroom_my.issue.

Full package suite: 699/699.

PR #13 readiness summary:
  - 10 commits across 8 phases + 4 review-pass follow-ups
  - ~3,560 LOC net production code
  - ~3,000 LOC tests
  - 699 tests, stable across multiple suite runs
  - Type-check clean for all touched files (pre-existing migrate.ts
    TS2769 unchanged from main)
  - Three review passes completed (Phase 3a, 6a, 7a) with all
    findings closed

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…crets

V1 final follow-up. The config fields engines.openai.apiKeyEnvVar and
engines.ollama.authTokenEnvVar take an env var NAME (e.g.
"OPENAI_API_KEY"), not the secret value. Pasting the key directly
was a documented foot-gun: the engine would do process.env["sk-..."]
which is always undefined → engine constructs unauthenticated → first
provider call fails with a confusing 401.

Added assertEnvVarName() (exported from ollama.ts so both wrappers
share one validator) that requires the POSIX env var name regex
^[A-Z_][A-Z0-9_]*$. Constructors throw at boot with operator-facing
guidance pointing at the env var name vs value distinction. Error
message never echoes the value — masks to length only — so a
misconfigured config can't leak the secret into gateway logs.

Tests added (8 across both wrappers):
  - rejects authTokenEnvVar that looks like a token value
    (e.g. "sk-abc-XYZ-123" — hyphens disqualify)
  - rejects lowercase env var names
  - rejects leading-digit names ("1KEY")
  - rejects empty strings with a separate diagnostic
  - error message never contains the raw value (sentinel string
    "sk-LEAK-DO-NOT-LOG-..." verified absent)
  - same coverage for OpenAI apiKeyEnvVar

README.md release-notes section explains the convention, points at
~/.jinn/.env as the recommended secret source, and documents the
masking guarantee on error messages.

Full package suite: 707/707.

PR #13 is now ready for review/merge per Phase 8 boundary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two issues surfaced by GitHub Actions on PR #13 head commit f01a5a1.

1. cli/migrate.ts:190 TS2769.
   Pre-Phase-1 the engines.default union was narrowly typed to
   "claude" | "codex" | "gemini" — all three have `bin: string`
   required. The Phase 1 widening to BuiltInEngineName added ollama
   and openai which have no `bin` (they're HTTP-only), so
   `config.engines[defaultEngine].bin` widened to `string | undefined`,
   breaking execFileSync(file: string, ...).

   I claimed this was pre-existing in the Phase 1 commit message;
   that was wrong. The widening is what introduced it. CI catches what
   my local check missed (origin/main pre-PR-12 didn't have CI
   configured, so I didn't see it run).

   Fix: migrate is a CLI-spawn operation that only makes sense for
   claude/codex/gemini engines. If the operator's default is an
   HTTP engine, fall back to claude for the migration step. Coalesce
   `cliEngineBin: string = engineConfig.bin ?? config.engines.claude.bin`
   so TS narrows correctly.

2. fs-tools-jail.test.ts: "write(.) does not mkdir parent-of-cwd" was
   asserting `parentBefore.sort() === parentAfter.sort()` on
   `path.dirname(jail)`. On Linux CI that's `/tmp`, which contains
   transient system mounts (`.ICE-unix`, `.X11-unix`, etc.) — the
   listing flapped between `readdir` calls and the test failed
   inconsistently.

   Fix: the is_cwd_dir guard short-circuits BEFORE the recursive
   mkdir runs, so we don't need to inspect the parent's listing
   to prove the guard fired. The audit code assertion alone is
   sufficient. Also verify the jail dir itself stayed a directory
   (didn't get clobbered into a file).

Full package suite: 707/707, including the corrected test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@nyem69 nyem69 marked this pull request as ready for review May 18, 2026 04:27
@nyem69 nyem69 merged commit 0fe584f into main May 18, 2026
3 checks passed
@nyem69 nyem69 deleted the feat/engines-ollama-openai branch May 18, 2026 04:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant