Release 11.26.0 #547

Merged
kaihaase merged 1 commit into main from develop
May 31, 2026
Conversation

@kaihaase
Member

AI Assistant Module: a fully opt-in, multi-tenant-capable, OWASP-aligned assistance system with DB-backed LLM connections, role-filtered tool registry, governed safety mechanisms, and MCP server integration — introduced on branch feature/ai-module.

Core Features:

  • DB-backed LLM connections with AES-256-GCM encrypted API keys (admin CRUD, hasApiKey output, production-secret guard)
  • Provider abstraction (OpenAiCompatibleProvider, ClaudeCliProvider) with auto-detection of supportsNativeTools / supportsJsonResponse / contextWindow
  • Prioritized connection-resolution chain (global default → tenant default → user default → client selection → tenant/admin enforced)
  • Role-filtered tool registry with mutating / destructive flags and pre-flight authorize() hooks
  • Plan mode with all-or-nothing pre-flight authorization of every planned step
  • Confirmation policy (mutating.default/enforced, destructive always-confirm, persistent tool grants "remember my choice")
  • Scoped tool policies (deny / ask / allow against tool arguments via regex)
  • Lifecycle hooks (PreToolUse / PostToolUse / SessionStart / Stop)
  • Token budgets per user AND per tenant (day / month / none reset windows, cumulative usage report, HTTP 429 with i18n translation)
  • Multi-turn conversations with $push-based message persistence and capped retention
  • SSE streaming endpoint (POST /ai/stream) plus REST and GraphQL single-shot
  • Multi-modal attachments (image URLs / dataUrls in prompts)
  • Admin-defined named agent modes with allowedTools filter and prompt addendum
  • Self-optimizing prompts (admin-editable tenant-scoped slots with override/reset, fragment-based builder, soft-delete via enabled: false)
  • Governed learning loop (prompt hints from tool failures, admin-approval-gated or autoApply)
  • User-facing prompt templates (scope: user|tenant, owner-only mutations)
  • Runtime placeholder registry ({{userId}}, {{roles}}, {{tools}}, …, project-specific placeholders dynamically registrable)
  • LLM-driven context compaction on context-window overflow with hard-trim fallback
  • Deferred tool schemas plus built-in search_tools meta-tool for large tool catalogs
  • Built-in ask_user_question tool for interactive clarification
  • Audit logging into aiInteractions (admin-readable, prerequisite for budgets)
  • MCP server (/ai/mcp Streamable HTTP) with lazy-loaded @modelcontextprotocol/sdk and 503 fallback when the SDK is missing
  • OAuth 2.1 for MCP clients (HMAC-SHA256 access tokens with timingSafeEqual, PKCE S256-only, refresh-token rotation bound to client ID, dynamic client registration with persisted client_secret)
  • SSRF allowlist for connection base URLs (ai.allowedBaseUrlHosts)
  • Per-user rate limiting (max / windowSeconds)
  • Three-layer security model (@Restricted / @Roles / securityCheck) with global stripping of apiKeyEncrypted and all token fields
  • Multi-tenancy isolation on slots, prompts, hints, budgets, and conversations
  • Full override hooks for every collaborator via ICoreModuleOverrides.ai

* feat(ai): add AI assistant module foundation (connections, tools, orchestrator)

Adds an extensible AI-assistant layer to the core:
- DB-backed LLM connections (CoreAiConnection) with AES-256-GCM-encrypted API
  keys (admin CRUD; key never returned, only hasApiKey)
- Provider abstraction (ILlmProvider) + fetch-based OpenAiCompatibleProvider
  (mittwald/OpenAI-compatible) + LlmProviderFactory
- Global AiToolRegistry with role-filtered, self-registering tools (IAiTool/AiTool)
- CoreAiService orchestrator with emulated tool calling (mittwald has no native
  tool calling), rate-limit + audit hooks
- GraphQL resolver + REST controller (aiPrompt + connection CRUD)
- CoreAiModule.forRoot (autoRegister + overrides), `ai` config in IServerOptions,
  CoreModule wiring, exports, FRAMEWORK-API regen
- Example User tools + AiToolsModule in src/server
- Docs (README, INTEGRATION-CHECKLIST, configurable-features) + AI-MODULE-PLAN.md
- Tests: unit 7/7, e2e 6/6 (full suite 85/85)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
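
The encrypted-key storage described in this commit can be sketched with Node's built-in crypto module. This is a minimal illustration, not the module's actual code: the `deriveKey` helper, the scrypt salt string, and the `iv.tag.ciphertext` storage layout are assumptions made for the example. Only AES-256-GCM and the `hasApiKey` / `apiKeyEncrypted` names come from the commit message.

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto";

// Hypothetical helper: derive a 32-byte AES key from the configured secret.
// The salt string is illustrative only.
function deriveKey(secret: string): Buffer {
  return scryptSync(secret, "ai-connection-keys", 32);
}

// Encrypts to "iv.tag.ciphertext" (base64 parts). The layout is an assumption.
function encryptApiKey(plain: string, secret: string): string {
  const iv = randomBytes(12); // 96-bit nonce, the recommended size for GCM
  const cipher = createCipheriv("aes-256-gcm", deriveKey(secret), iv);
  const data = Buffer.concat([cipher.update(plain, "utf8"), cipher.final()]);
  return [iv, cipher.getAuthTag(), data].map((b) => b.toString("base64")).join(".");
}

// Decrypts and authenticates; throws if the secret is wrong or the data was altered.
function decryptApiKey(stored: string, secret: string): string {
  const parts = stored.split(".").map((p) => Buffer.from(p, "base64"));
  if (parts.length !== 3) throw new Error("Malformed ciphertext");
  const [iv, tag, data] = parts as [Buffer, Buffer, Buffer];
  const decipher = createDecipheriv("aes-256-gcm", deriveKey(secret), iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(data), decipher.final()]).toString("utf8");
}

// Output mapping: expose only whether a key exists, never the key itself.
function toConnectionOutput(conn: { name: string; apiKeyEncrypted?: string }) {
  return { name: conn.name, hasApiKey: Boolean(conn.apiKeyEncrypted) };
}
```

Because GCM is authenticated, decrypting with a rotated or wrong secret fails loudly instead of returning garbage.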

* feat(ai): persist prompt runs as audit records (Phase 3)

- CoreAiInteraction model (aiInteractions, admin-only) + CoreAiInteractionService
  with system-internal record()
- CoreAiService.audit() persists when ai.audit is enabled (optional injection,
  never breaks a prompt response)
- Admin read endpoints (findAiInteractions/getAiInteraction) on resolver + controller
- ai.audit config flag; model/service wired into CoreAiModule (+ override option)
- Test: prompt persists an audit record (e2e 7/7)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(ai): multi-turn conversations with persisted message history (Phase 4)

- CoreAiConversation model (aiConversations) + CoreAiMessage subdoc; owner-scoped
  via securityCheck (creator/admin only)
- CoreAiConversationService with appendMessage() ($push, never round-trips the
  subdocument array through update())
- CoreAiService loads prior turns into the LLM context when conversationId is
  given and appends the user+assistant turns after the run
- Owner-scoped CRUD endpoints (create/find/get/delete) on resolver + controller
- Model/service wired into CoreAiModule (+ override option), exports
- Test: 2-turn conversation keeps context and persists 4 messages (e2e 8/8)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(ai): stream prompt answers via SSE (Phase 5)

- CoreAiService.promptStream(): emits action events, then the answer as token
  chunks, then a final event (reuses prompt() so history/audit/persistence apply)
- AiStreamEvent type; POST /ai/stream SSE endpoint (raw @Res, role-guarded)
- Test: streaming yields action/token/final, tokens concatenate to the answer
  (unit 8/8, e2e 8/8)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
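
The event order described above (action events, then token chunks, then one final event) can be sketched as follows. The event shape, the `toStreamEvents` and `toSseFrame` names, and the fixed chunk size are assumptions for illustration. The generator is synchronous for brevity, whereas a real stream would likely be asynchronous.

```typescript
// Assumed event shape; only the three event kinds are taken from the commit message.
type StreamEventSketch =
  | { type: "action"; tool: string }
  | { type: "token"; text: string }
  | { type: "final"; answer: string };

// Emits action events first, then the answer in chunks, then the final event.
function* toStreamEvents(
  actions: string[],
  answer: string,
  chunkSize = 8,
): Generator<StreamEventSketch> {
  for (const tool of actions) yield { type: "action", tool };
  for (let i = 0; i < answer.length; i += chunkSize) {
    yield { type: "token", text: answer.slice(i, i + chunkSize) };
  }
  yield { type: "final", answer };
}

// One SSE frame per event: a "data:" line terminated by a blank line.
function toSseFrame(event: StreamEventSketch): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}
```

A client can rebuild the answer by concatenating the `token` events and use the `final` event as a consistency check.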

* feat(ai): require confirmation for destructive tool actions (Phase 6)

- IAiTool.destructive flag; CoreAiPromptInput.confirm
- CoreAiResponse.requiresConfirmation + pendingActions
- Orchestrator halts on a destructive tool call until the prompt is re-sent with
  confirm: true (no execution, no conversation persistence while pending)
- Example destructive tool delete_user (admin) added to the reference tools
- Test: destructive tool blocked until confirmed, then executes (unit 9/9)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
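
A minimal sketch of the confirmation gate, assuming a simple in-memory tool map. The `destructive`, `confirm`, `requiresConfirmation`, and `pendingActions` names follow the commit message. The `runToolCalls` function and the `Sketch*` types are hypothetical.

```typescript
interface SketchToolCall {
  name: string;
  args: Record<string, unknown>;
}
interface SketchTool {
  name: string;
  destructive?: boolean;
  execute(args: Record<string, unknown>): unknown;
}
interface SketchResponse {
  requiresConfirmation: boolean;
  pendingActions: SketchToolCall[];
  results: unknown[];
}

function runToolCalls(
  tools: Map<string, SketchTool>,
  calls: SketchToolCall[],
  confirm = false,
): SketchResponse {
  const pending = calls.filter((call) => tools.get(call.name)?.destructive === true);
  if (pending.length > 0 && !confirm) {
    // Halt before executing anything; the client must re-send with confirm: true.
    return { requiresConfirmation: true, pendingActions: pending, results: [] };
  }
  const results = calls.map((call) => {
    const tool = tools.get(call.name);
    if (!tool) throw new Error(`Unknown tool: ${call.name}`);
    return tool.execute(call.args);
  });
  return { requiresConfirmation: false, pendingActions: [], results };
}
```

The first call with a destructive tool returns the pending actions without side effects; only the repeated call with `confirm` set to `true` executes.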

* feat(ai): expose tool registry as an MCP server (Phase 7)

- CoreAiMcpService: role-filtered mcpListTools()/mcpCallTool() (testable logic)
  and lazy createServer() using the low-level MCP SDK Server with JSON-schema
  tool definitions (no zod conversion)
- CoreAiMcpController: Streamable HTTP at /ai/mcp (POST/GET/DELETE), per-session
  McpServer bound to the authenticated user (Bearer via @CurrentUser), session
  map with eviction, MCP-style 401 + WWW-Authenticate
- ai.mcp config flag; controller registered only when enabled; @modelcontextprotocol/sdk
  added (fixed 1.29.0, lazy-loaded)
- Tests: MCP role gating + permitted/forbidden calls (unit 10/10); MCP 401 over
  HTTP + app boot with MCP (e2e 9/9)
- Docs (README, configurable-features) + AI-MODULE-PLAN updated

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(ai): keep JSON scalar available for bare GraphQL apps; isolate prompt e2e

- Package types using the JSON scalar (AI models) leak into the global GraphQL
  type registry, so any schema build needs the JSON scalar provided. Real apps
  provide it via ServerModule; the framework-internal error-code-scenarios bare
  apps now provide it too (one JSON provider per app — no consumer impact).
- Make the AI prompt e2e deterministic under parallel runs by exercising a
  registered test tool instead of find_users (shared User collection).

Full e2e suite green: 61 files, 1770 tests.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(ai): plan mode + confirmation policy + client metadata + prompt enrichment (Phases 9-12)

Phase 9 — Plan mode (Goals 1/2/5): input.mode='plan' produces a full plan, then
pre-flight authorizes ALL steps (registry role filter + optional IAiTool.authorize()
dry-run) BEFORE executing anything. If any step is not permitted, NOTHING runs and a
translated (de/en) error with deniedActions is returned. Otherwise steps run in order.

Phase 10 — Confirmation policy for mutating actions: IAiTool.mutating; ai.confirmation.mutating
{ default, enforced }; client override via input.requireConfirmation (ignored when enforced);
destructive tools always require confirmation. Applies to both auto and plan modes.

Phase 11 — Client metadata: input.metadata (URL, navigation, console logs) injected as a
clearly-delimited, size-capped, UNTRUSTED context message (prompt-injection hardening).

Phase 12 — Prompt enrichment: system prompt now includes the user's roles + available tools
and optional system documentation (ai.documentation / overridable getDocumentation()).

Refactor: prompt() dispatches to runAuto/runPlan via shared prepareRun; extracted helpers
(authorizeCall, confirmationRequiredFor, translate, appendClientContext, …).
CoreAiResponse gains plan/denied/deniedActions. Tests: unit 19/19, ai e2e 9/9.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
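
The all-or-nothing pre-flight of Phase 9 can be sketched like this. The sketch assumes that role filtering is a simple intersection of tool roles and user roles and that `authorize()` is a synchronous dry-run. `runPlanSketch` and the `Plan*` types are hypothetical names, while `deniedActions` follows the commit message.

```typescript
interface PlanStep {
  tool: string;
  args: Record<string, unknown>;
}
interface PlanTool {
  name: string;
  roles: string[];
  authorize?(args: Record<string, unknown>): boolean; // optional dry-run check
  execute(args: Record<string, unknown>): unknown;
}
interface PlanResult {
  denied: boolean;
  deniedActions: string[];
  results: unknown[];
}

function runPlanSketch(tools: PlanTool[], steps: PlanStep[], userRoles: string[]): PlanResult {
  const byName = new Map<string, PlanTool>(tools.map((t): [string, PlanTool] => [t.name, t]));

  // Pre-flight: check EVERY step before executing any of them.
  const deniedActions = steps
    .filter((step) => {
      const tool = byName.get(step.tool);
      if (!tool) return true; // unknown tools count as denied
      const roleOk = tool.roles.some((role) => userRoles.includes(role));
      const dryRunOk = tool.authorize ? tool.authorize(step.args) : true;
      return !(roleOk && dryRunOk);
    })
    .map((step) => step.tool);

  if (deniedActions.length > 0) return { denied: true, deniedActions, results: [] };

  // All steps are permitted: run them in order.
  const results = steps.map((step) => (byName.get(step.tool) as PlanTool).execute(step.args));
  return { denied: false, deniedActions: [], results };
}
```

Separating the authorization pass from the execution pass means a plan that would fail at step three never performs steps one and two.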

* feat(ai): per-user daily cost/budget enforcement (Phase 8)

- ai.budget { maxPromptsPerDay, maxTokensPerDay } enforced in prepareRun before any
  LLM call; exceeding it aborts with HTTP 429 + translated message
- CoreAiInteractionService.usageSince() aggregates today's prompts/tokens per user
- ai.defaultMode lets admins default to plan mode; ai.confirmation/documentation typed
- Tests: budget block + under-budget pass (unit 21/21)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(ai): MCP full-handshake test + robust MCP user resolution (Phase 7b.1)

- CoreAiMcpController resolves the user via req.user and, as a fallback, by
  verifying the Bearer token directly (BetterAuthTokenService) — so MCP works even
  though the S_EVERYONE guard does not populate the user
- Full MCP protocol test over the SDK in-memory transport: initialize handshake →
  tools/list (role-filtered) → tools/call execution (unit 22/22)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(ai): MCP OAuth 2.1 security core + provider + mount helper (Phase 7b.2)

- CoreAiMcpOAuthService: HMAC-signed access tokens (constant-time verify), PKCE S256
  verification, MongoDB-backed stores (clients/codes-TTL/refresh), loadUser, and
  buildOAuthProvider() implementing the SDK OAuthServerProvider (clients store, PKCE
  challenge lookup, code/refresh exchange, access-token verification)
- mountAiMcpOAuth(app) helper to mount mcpAuthRouter in main.ts (lazy SDK)
- MCP controller now also accepts OAuth access tokens when ai.mcp.oauth is enabled
- ai.mcp gains { oauth, oauthSecret }; interactive consent is overridable
  (authorizeConsent) and documented for consumer main.ts integration
- Tests: token roundtrip/tamper/expiry + PKCE S256 (unit 26/26)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
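
The two primitives named above, HMAC-signed tokens with constant-time verification and PKCE S256, can be sketched with Node's crypto module. The `payload.signature` token layout, the claim names, and the function names are assumptions for illustration, not the service's actual format. The S256 transform itself follows RFC 7636.

```typescript
import { createHash, createHmac, timingSafeEqual } from "node:crypto";

interface TokenClaims {
  sub: string;
  exp: number; // expiry as Unix seconds
}

function signPayload(payloadB64: string, secret: string): Buffer {
  return createHmac("sha256", secret).update(payloadB64).digest();
}

// Assumed layout: base64url(JSON claims) + "." + base64url(HMAC-SHA256).
function issueToken(claims: TokenClaims, secret: string): string {
  const payload = Buffer.from(JSON.stringify(claims)).toString("base64url");
  return `${payload}.${signPayload(payload, secret).toString("base64url")}`;
}

function verifyToken(
  token: string,
  secret: string,
  nowSec = Math.floor(Date.now() / 1000),
): TokenClaims | null {
  const [payload, sig] = token.split(".");
  if (!payload || !sig) return null;
  const expected = signPayload(payload, secret);
  const given = Buffer.from(sig, "base64url");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8")) as TokenClaims;
  return claims.exp > nowSec ? claims : null;
}

// PKCE S256: challenge = BASE64URL(SHA256(code_verifier)).
function verifyPkceS256(verifier: string, challenge: string): boolean {
  return createHash("sha256").update(verifier).digest("base64url") === challenge;
}
```

The signature is checked before the payload is parsed, so a forged payload is rejected without ever being interpreted.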

* docs(ai): finalize AI module docs; remove temporary implementation plan

- README: plan mode, confirmation policy, client metadata, prompt enrichment,
  budget, and MCP OAuth 2.1 sections
- INTEGRATION-CHECKLIST: advanced config + main.ts OAuth mounting step
- configurable-features: full ai.* config reference
- Remove AI-MODULE-PLAN.md (all backend phases done; full suite 1786 tests green)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(ai): per-user/tenant token budgets with defaults + usage reporting (Phase 13)

- CoreAiBudgetLimit model (aiBudgetLimits, admin CRUD) for per-user/per-tenant
  limit overrides; CoreAiBudgetService resolves override → ai.budget default →
  unlimited (0/missing = unlimited). period day/month/none with resetAt.
- ai.budget restructured: { period, user:{maxTokens,maxPrompts}, tenant:{...} }.
- Enforcement (user OR tenant) before the run → HTTP 429 + translated message;
  usage read via a read-only native count over aiInteractions (tenant filter).
- tenantId captured on aiInteractions (tenant plugin) for per-tenant accounting.
- Every response carries a compact `budget` summary (promptTokens, usedTokens,
  remainingTokens, resetAt); full breakdown via aiUsage query / GET /ai/usage.
- Admin budget-limit CRUD + aiUsage endpoints (resolver + controller); module
  wiring, exports, docs. Replaces the flat Phase-8 budget.
- Tests: unit (resolve/assert/summary/usage-info) + e2e (limit CRUD, response
  budget, 429 enforcement, aiUsage). Full suite 1792 green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
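
The limit resolution (override, then the ai.budget default, then unlimited) and the user-OR-tenant enforcement can be sketched as below. The sketch assumes that an explicit override of 0 means unlimited rather than falling through to the default; the commit message does not settle that detail. All function and type names are hypothetical.

```typescript
interface BudgetLimit {
  maxTokens?: number;
  maxPrompts?: number;
}
interface BudgetUsage {
  tokens: number;
  prompts: number;
}

// Override wins over the configured default; 0 or missing means unlimited.
function effectiveLimit(override?: number, fallback?: number): number {
  const value = override ?? fallback ?? 0;
  return value > 0 ? value : Infinity;
}

function isExceeded(usage: BudgetUsage, override: BudgetLimit, defaults: BudgetLimit): boolean {
  return (
    usage.tokens >= effectiveLimit(override.maxTokens, defaults.maxTokens) ||
    usage.prompts >= effectiveLimit(override.maxPrompts, defaults.maxPrompts)
  );
}

interface BudgetSubject {
  usage: BudgetUsage;
  override: BudgetLimit;
  defaults: BudgetLimit;
}

// The request is blocked with 429 if the user OR the tenant is over budget.
function budgetStatus(user: BudgetSubject, tenant: BudgetSubject): 200 | 429 {
  const blocked =
    isExceeded(user.usage, user.override, user.defaults) ||
    isExceeded(tenant.usage, tenant.override, tenant.defaults);
  return blocked ? 429 : 200;
}
```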

* refactor(ai): make the module fully provider-agnostic (capabilities, no vendor names)

- ILlmProvider declares LlmCapabilities { nativeTools, jsonResponse, systemPrompt }
  (replaces supportsNativeTools); orchestrator compensates across all gradations.
- Capabilities configured per connection (supportsNativeTools, supportsJsonResponse);
  OpenAiCompatibleProvider derives them and sends native tools / response_format only
  when supported.
- Removed all concrete vendor/runtime names from code; docs neutralized to describe
  the OpenAI-compatible API shape (protocol, not a vendor) + capability gradations.
- config.env example seed genericized (AI_BASE_URL/AI_API_KEY, only seeded when set).
- Tests + docs updated. Unit 31/31, ai e2e 10/10.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* AI module: DB-backed connection resolution chain (provider-agnostic)

Adds a prioritized, fully overridable connection-resolution chain so projects
with multiple LLM connections can flexibly pick one per request. No connections
→ AI handling is disabled (translated "unavailable" response); exactly one → it
is the implicit default; multiple → resolved via 8 ascending-priority layers.

Resolution order (later overrides earlier):
  1 global default (isDefault)        — soft (must be available to tenant)
  2 tenant default (preference)       — soft
  3 user default (preference)         — soft
  4 client selection (input)          — soft
  5 tenant-enforced (preference)      — hard (mandatory, wins regardless)
  6 admin-enforced global (flag)      — hard
  7 admin-enforced per tenant (flag)  — hard
  8 code override (serviceOptions)    — hard (deliberate, trusted top layer)

Each layer is an overridable protected method; resolutionLayers() can be
reordered/replaced. Availability is restrictable per tenant via Connection
tenantIds (empty = all tenants).
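
A minimal sketch of the ascending-priority resolution, under stated assumptions: layers are plain objects with a `pick()` callback, soft picks are ignored when the connection is not available to the tenant, hard picks only need to exist, and nothing is returned when no layer selects a connection. The real chain uses overridable protected methods; every name below is hypothetical.

```typescript
interface AiConnectionSketch {
  id: string;
  tenantIds: string[]; // empty = available to all tenants
}
interface ResolutionLayer {
  name: string;
  hard: boolean; // hard layers win regardless of tenant availability
  pick(): string | undefined; // connection id selected by this layer, if any
}

function isAvailable(conn: AiConnectionSketch, tenantId?: string): boolean {
  return (
    conn.tenantIds.length === 0 || (tenantId !== undefined && conn.tenantIds.includes(tenantId))
  );
}

function resolveConnection(
  connections: AiConnectionSketch[],
  layers: ResolutionLayer[], // ordered from lowest to highest priority
  tenantId?: string,
): AiConnectionSketch | undefined {
  if (connections.length === 0) return undefined; // no connections: AI disabled
  if (connections.length === 1) {
    // Exactly one connection is the implicit default (availability check is an assumption).
    return isAvailable(connections[0], tenantId) ? connections[0] : undefined;
  }
  let chosen: AiConnectionSketch | undefined;
  for (const layer of layers) {
    const id = layer.pick();
    if (!id) continue;
    const conn = connections.find((c) => c.id === id);
    if (!conn) continue; // stale id pointing to a deleted connection
    if (layer.hard || isAvailable(conn, tenantId)) chosen = conn; // later overrides earlier
  }
  return chosen;
}
```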

New:
- CoreAiConnectionResolverService (the chain; overridable per project)
- CoreAiConnectionPreference model/input/service (tenant/user defaults +
  tenant-enforced, unique per (scope, refId))
- CoreAiAvailableConnection model (non-sensitive list with selected/locked flags)
- Connection fields: tenantIds, enforced, enforcedTenantIds
- Endpoints: aiAvailableConnections, aiSetUserConnection (S_USER self-service),
  admin preference CRUD (GraphQL + REST under /ai/connections/*)

Orchestrator now resolves the connection via the chain (falls back to the plain
connection service when no resolver is wired), returns a denied response when no
connection is usable, and honors the serviceOptions._aiConnectionId code override.

Module wiring, ICoreModuleOverrides.ai (connectionResolver, preferenceService),
barrel + top-level exports, README/INTEGRATION-CHECKLIST/configurable-features
docs, and FRAMEWORK-API.md updated.

Tests: 14 unit (resolution chain incl. each layer, availability filtering,
one=default, none=disabled, subclass override) + 5 e2e (per-tenant restriction,
self-service validation, tenant-enforced lock, disabled response, endpoint auth).
Full suite green (1811 tests).

Removes the temporary AI-MODULE-PLAN.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* AI MCP: silence false-positive prefer-add-event-listener lint warning

The MCP SDK transport exposes `onclose` as a callback property, not a DOM
EventTarget, so addEventListener does not apply. Add a targeted inline disable.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* AI connection resolution: robustness, validation, dedupe & cleanup

Four follow-up optimizations to the connection-resolution chain:

1. Robustness — the chain now drops a selection that points to a deleted/disabled
   connection (orphaned enforced preference or stale code override) with a warn log
   and degrades to the fallback, instead of returning a dead id that made
   connectionService.resolve() throw a 404 mid-prompt.

2. Admin validation — CoreAiConnectionResolverService.setPreference() verifies the
   connection exists and is usable before persisting; the admin GraphQL/REST
   preference endpoints route through it (fail-fast instead of a dangling preference).

3. Performance — tenantDefault (layer 2) and tenantEnforced (layer 5) now share a
   single tenant-preference DB read per resolution (memoized on the ctx via WeakMap).

4. Cleanup — deleting a connection removes preferences pointing to it
   (PreferenceService.deleteByConnectionId + ConnectionService.delete override with an
   @Optional preference service; best-effort, never fails the delete). No DI cycle:
   resolver → {connection, preference}; connection → preference; preference → none.

Tests: +4 unit (P1 stale enforced + stale code override degrade, P2 setPreference
validation, P3 single tenant query) and +2 e2e (P4 preference cleanup on delete,
admin setPreference validation). Full suite green (1817 tests). README updated.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* AI module: address review findings (perf, security, docs, tests)

Implements all optimizations from the multi-agent review of feature/ai-module.

Performance:
- getUsage() now sums prompts/tokens server-side via a $group aggregation instead
  of loading every aiInteractions doc per prompt
- Added compound indexes { userId, createdAt } and { tenantId, createdAt } on
  aiInteractions for the budget period query
- buildSummary() skips the usage aggregation for unlimited users (no finite limit)
- Conversation history loads via a lean, projected, $slice-capped read
  (loadRecentMessages) instead of a hydrated get() running the full process()
  pipeline over the whole messages array
- appendMessage() caps the messages array with $slice (-500)

Security:
- Adopted the ErrorCode registry for all thrown AI exceptions; added the
  LTNS_0600–LTNS_0608 AI error codes (de/en translated)
- Added enum validation (IsIn) for mode / scope / period and @MaxLength on prompt
- CoreAiConnectionPreference model tightened from S_USER to ADMIN
- CheckSecurityInterceptor now MERGES (union) project secretFields with the
  framework defaults so a custom list can't drop password/apiKeyEncrypted/etc.
- Optional SSRF allowlist ai.allowedBaseUrlHosts (opt-in; local providers like
  Ollama still work by default)
- ai-crypto getKey() and provider mapNativeToolCalls() are now protected (overridable)

Docs:
- Removed all residual vendor names from shipped code (provider-agnostic)
- Reconciled the AI override fields across ICoreModuleOverrides.ai (now 10),
  configurable-features.md, and the README override table
- .env.example: added AI env vars incl. NSC__AI__ENCRYPTION_SECRET
- FRAMEWORK-API generator now emits IAi / IAiRateLimit / IAiDefaultConnection
- REQUEST-LIFECYCLE.md documents the AI/MCP/SSE/OAuth entry points
- New migration guide migration-guides/11.25.x-to-11.26.0.md
- Fixed the MCP controller JSDoc (resolveUser, not @CurrentUser) + provider chat() JSDoc

Tests:
- Unit: OpenAiCompatibleProvider (capabilities, mapNativeToolCalls, transport/HTTP
  error mapping, allowlist enforcement) + OAuth buildOAuthProvider wiring
- E2e: HTTP secret-exposure (apiKeyEncrypted never returned), admin 403 for a
  non-admin, authenticated self-service 200, SSE framing, shipped get_user tool
  ownership (S_SELF) and admin-only delete_user visibility

Full `pnpm run check` green (audit, format, lint, tests, build, server start).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
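
The two MongoDB operations behind the performance items above can be sketched as plain update and pipeline documents. The builder names and the `totalTokens` field are assumptions. `$push` with `$each` and `$slice`, and `$group` with `$sum`, are standard MongoDB operators.

```typescript
interface StoredMessage {
  role: string;
  content: string;
}

// Appends one message and keeps only the newest `cap` entries.
// $slice is only valid inside $push when combined with $each.
function buildAppendUpdate(message: StoredMessage, cap = 500) {
  return { $push: { messages: { $each: [message], $slice: -cap } } };
}

// Sums prompts and tokens server-side for one user since the period start.
// "totalTokens" is a hypothetical field name on the interaction documents.
function buildUsagePipeline(userId: string, since: Date): Record<string, unknown>[] {
  return [
    { $match: { userId, createdAt: { $gte: since } } },
    { $group: { _id: null, prompts: { $sum: 1 }, tokens: { $sum: "$totalTokens" } } },
  ];
}
```

The `$match` stage filters on `userId` and `createdAt`, which is the access pattern a compound `{ userId, createdAt }` index serves.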

* AI module: capability auto-detection + review remediation

Auto-detection of JSON / native-tool support (provider-agnostic). Connection flags
supportsJsonResponse / supportsNativeTools are now OPTIONAL: undefined = auto-detect,
explicit true/false = authoritative (never probed). Removed their mongoose default so
"not set" stays distinguishable.

Detection (A + B, sharing one probe path):
- A (eager): on create with an undefined flag, the endpoint is probed once and the
  result persisted. Best-effort — never fails the create.
- B (lazy): if a flag is still undefined at prompt time, the orchestrator probes once
  and persists; until then the safe emulated baseline applies.
- On demand: admin detectAiConnectionCapabilities / POST /ai/connections/:id/detect-capabilities.
Probe (OpenAiCompatibleProvider.detectCapabilities, optional on ILlmProvider):
response_format: json_object (2xx → JSON) and a trivial tool with tool_choice:'required'
(2xx WITH tool_calls → native tools; 4xx / silent-ignore → unsupported).
config.env seed only sets the flags when AI_SUPPORTS_* env vars are provided.

Review remediation catalog (all 6):
1. GraphQL resolver tests (admin findAiConnections happy-path + non-admin Access-denied)
2. get_user ownership denial now asserts a typed 401/403 error (was rejects.toBeDefined)
3. Dropped the redundant single-field index on aiInteractions.userId (compound covers it)
4. Removed the duplicate IsNotEmpty() on prompt (auto-applied for required fields)
5. assertWithinBudget(language) documented as deprecated/unused (kept for compat)
6. Added a mountAiMcpOAuth mount-wiring unit test

Tests: +7 unit (detectCapabilities x3, mount, GraphQL coverage) and +6 e2e (eager/
explicit/endpoint/403 detection, GraphQL admin+denial). README, IAiDefaultConnection
JSDoc, and the migration guide document auto-detection. Full `pnpm run check` green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
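
How the probe results map to capability flags can be sketched as pure functions over an already-received response. The `ProbeResult` type and the function names are hypothetical. The `choices[0].message.tool_calls` path follows the OpenAI-compatible chat-completions response shape.

```typescript
interface ProbeResult {
  status: number;
  body?: { choices?: Array<{ message?: { tool_calls?: unknown[] } }> };
}

const is2xx = (status: number): boolean => status >= 200 && status < 300;

// JSON probe: a 2xx answer to response_format: json_object counts as supported.
function detectsJsonResponse(probe: ProbeResult): boolean {
  return is2xx(probe.status);
}

// Tool probe: 2xx alone is not enough, because a backend may silently ignore tools.
function detectsNativeTools(probe: ProbeResult): boolean {
  const calls = probe.body?.choices?.[0]?.message?.tool_calls;
  return is2xx(probe.status) && Array.isArray(calls) && calls.length > 0;
}

// undefined = auto-detect; an explicit true/false is authoritative and never probed.
function resolveFlag(explicit: boolean | undefined, probe: () => boolean): boolean {
  return explicit ?? probe();
}
```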

* chore(review): persist lt-dev review-agent learnings

Durable knowledge captured by the lt-dev review agents during the AI-module
reviews (backend, docs, performance, security, test reviewers): e.g. AI secret
stripping, FRAMEWORK-API generator allowlist, and the doc surfaces a configurable
feature must touch. Consistent with the already-tracked agent-memory structure.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(ai): harden emulated tool-calling prompt against fabricated execution

In emulated tool calling (providers without native function calling), weak models
sometimes reply with a natural-language "action done" message without ever emitting
the tool_calls request. The tool then never runs and the confirmation gate is never
reached, so the user sees a false success report. This is never a real side effect:
execution can only happen via executeToolCall() + the gate.

Add an explicit instruction to the emulated tool protocol: never claim to have
executed/performed/deleted/updated/created anything without first emitting a
tool_calls request and receiving its TOOL_RESULTS. Document the residual limitation
(and the intact security guarantee) in the module README, recommending native
tool-calling backends for action-heavy workflows.

Observed during real fullstack E2E testing against a live OpenAI-compatible
endpoint without native tool support. Full e2e suite green (1839 tests).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* style(ai): oxfmt the emulated-mode limitation note in README

Format-only follow-up to 786aee3 — normalize markdown emphasis (oxfmt). No content change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(ai): add opt-in ClaudeCliProvider (Claude Code CLI as an LLM backend)

Proves the module connects to all three backend kinds — external (hosted
OpenAI-compatible), local (e.g. Ollama via the same provider) and the local
Claude Code CLI — via the existing ILlmProvider extension point.

ClaudeCliProvider shells out to `claude -p --output-format json` and parses the
single result + usage. Security model:
- `--tools ""` disables ALL of Claude Code's own tools — it is a pure text
  generator, so it cannot read files, run shell commands, or reach the network on
  its own. Tool calling is emulated (the orchestrator executes tools itself via
  CrudService with the caller's permissions).
- `spawn` with an argument array (never a shell) — prompt content can't be
  interpreted as a command. Conversation is piped via stdin.
- Runs in tmpdir (no CLAUDE.md/settings auto-discovery), `--system-prompt`
  replaces Claude Code's default agent prompt, `--no-session-persistence`, and a
  timeout that kills the child on overrun. Optional `ai.claudeCli` config
  (bin/extraArgs/maxBudgetUsd); `apiKey` is forwarded as ANTHROPIC_API_KEY.

Opt-in (not auto-registered): register via
`factory.registerBuilder('claude-cli', (c) => new ClaudeCliProvider(c))`.
Exported from the AI module barrel + top-level index. README documents external /
local / CLI connection recipes. +7 unit tests (capabilities, tool-free argv,
transcript flattening, result/usage parse, error + transport mapping, factory
wiring); live-verified against the real CLI. Full e2e suite green (1846 tests).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
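
The process-spawning approach can be sketched with Node's `child_process.spawn`. The flags in `buildCliArgs` are the ones listed in the commit message above; how the real provider orders or extends them is not shown here. `buildCliArgs` and `runCli` are hypothetical names, and the 60-second default timeout is an assumption.

```typescript
import { spawn } from "node:child_process";
import { tmpdir } from "node:os";

// Builds the argument ARRAY; no shell is involved, so prompt text is never parsed as a command.
function buildCliArgs(systemPrompt: string): string[] {
  return [
    "-p",
    "--output-format",
    "json",
    "--tools",
    "", // empty tool list: the CLI acts as a pure text generator
    "--system-prompt",
    systemPrompt,
    "--no-session-persistence",
  ];
}

// Runs a binary in the temp directory, pipes the conversation via stdin,
// and resolves with stdout. The child is killed when the timeout elapses.
function runCli(bin: string, args: string[], stdin: string, timeoutMs = 60_000): Promise<string> {
  return new Promise((resolve, reject) => {
    const child = spawn(bin, args, { cwd: tmpdir(), timeout: timeoutMs });
    let out = "";
    child.stdout.on("data", (chunk: Buffer) => {
      out += chunk.toString("utf8");
    });
    child.stderr.resume(); // drain stderr so the child cannot block on a full pipe
    child.on("error", reject);
    child.on("close", (code) =>
      code === 0 ? resolve(out) : reject(new Error(`CLI exited with code ${code}`)),
    );
    child.stdin.end(stdin);
  });
}
```

A call such as `runCli("claude", buildCliArgs("You are a helpful assistant."), transcript)` would then return the raw JSON result for parsing.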

* fix(ai): never leak a bare tool-protocol wrapper as the final answer

In emulated tool calling a (weaker / JSON-mode) model sometimes replies with a bare
protocol wrapper like `{"tool_calls":[]}` instead of a user answer. The orchestrator
treated that as the final text and surfaced the raw JSON to the user.

Now, when the response parses to a protocol-shaped object with no `final`:
- nudge the model once ("reply with your final answer as {\"final\":\"…\"}") and
  continue the loop, and
- if it still returns only a `tool_calls`/`final` wrapper, drop it so the generic
  "could not produce a final answer" fallback applies instead of leaking JSON.

Found during real fullstack E2E with a local Ollama model (qwen2.5) in JSON mode.
+2 unit tests (nudge recovers a final answer; bare wrapper never surfaces). Full
e2e suite green (1848 tests).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(ai): fail loud in prod on missing secrets + boot self-check for stale keys

Hardening to match the existing email/cookie production guards. Previously a missing
AI encryption secret only logged a warning and silently used a public, insecure
development default — you could ship insecure key storage to production.

- AiCryptoService.assertProductionSafe() (run in onModuleInit): throws at boot in
  production/staging when AI is active but no secret is resolvable
  (ai.encryptionSecret / NSC__AI__ENCRYPTION_SECRET / SECRETS_ENCRYPTION_KEY).
  Non-prod keeps the warn-only dev default. getKey() now shares resolveSecret().
- CoreAiMcpOAuthService.assertProductionSafe() (onModuleInit): same guard for the
  OAuth signing secret, but only when ai.mcp.oauth is enabled.
- CoreAiConnectionService: the defaultConnection seed now WARNS when configured
  incompletely (missing baseUrl/model/name) instead of skipping silently; plus a
  best-effort boot self-check that logs which connections' stored API keys can no
  longer be decrypted (e.g. after a secret rotation) — actionable at boot instead of
  failing deep in a request. Never throws.

Behavior change (warn → throw in prod/staging): documented in the AI
INTEGRATION-CHECKLIST and the 11.26.0 migration guide. +2 unit tests (both guards
across prod/staging/local × secret/no-secret × oauth on/off). Full e2e green (1850).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(ai): extract emulated tool calls even when the model appends trailing text

Emulated tool-call extraction used a naive first-`{` … last-`}` slice. A model that
keeps writing after its JSON — e.g. the Claude CLI emits a valid
`{"tool_calls":[…]}` and then a self-hallucinated `TOOL_RESULTS:` block in the same
turn — produced an unparseable slice, so the tool call was missed entirely: the tool
never ran and the raw JSON leaked as the final answer. This broke function calling on
any backend that doesn't stop cleanly after the JSON.

- extractJsonObject() now prefers the first *brace-balanced* object (string-aware),
  falling back to the old slice. New protected firstBalancedJson() helper.
- Emulated tool protocol now instructs the model to STOP after tool_calls and not
  write the results itself.

Verified end-to-end through the orchestrator with the real Claude CLI: server_time is
now actually executed (2 iterations, correct final answer) where it previously failed.
+1 unit test reproducing the trailing-text case. Full e2e green (1851).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
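
A string-aware, brace-balanced scan of the kind described above can be sketched as follows. The function names mirror the ones in the commit message, but the bodies are an independent illustration, not the module's code.

```typescript
// Returns the first brace-balanced {...} substring, ignoring braces inside JSON strings.
function firstBalancedJson(text: string): string | undefined {
  const start = text.indexOf("{");
  if (start < 0) return undefined;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}" && --depth === 0) return text.slice(start, i + 1);
  }
  return undefined; // never balanced
}

// Prefers the balanced object and falls back to the naive first-{ to last-} slice.
function extractJsonObject(text: string): unknown {
  const first = text.indexOf("{");
  const last = text.lastIndexOf("}");
  const naive = first >= 0 && last > first ? text.slice(first, last + 1) : undefined;
  for (const candidate of [firstBalancedJson(text), naive]) {
    if (!candidate) continue;
    try {
      return JSON.parse(candidate);
    } catch {
      // try the next candidate
    }
  }
  return undefined;
}
```

With trailing text after the JSON, the naive slice spans two objects and fails to parse, while the balanced scan stops at the end of the first one.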

* fix(ai): feed back only normalized tool_calls as the assistant turn

After detecting emulated tool calls, the orchestrator pushed the model's raw
completion text as the assistant turn. When a model appends a self-hallucinated
`TOOL_RESULTS:` block after its tool_calls (observed with the Claude CLI), that fake
block was fed back alongside the REAL results, confusing the model — its final answer
became a meta-complaint ("these TOOL_RESULTS were not requested by me…") instead of a
clean summary, even though the tools executed correctly.

Now record a normalized assistant turn containing only `{"tool_calls":[…]}` (name +
arguments), never the raw text. Verified end-to-end via Chrome MCP across all three
backends (external/mittwald, local/Ollama, Claude CLI): multi-tool calls + result
processing now produce clean answers, and the confirm-before-execute gate works on
each. +1 unit test (assistant turn never carries hallucinated trailing text). Full
e2e green (1852).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(ai): self-optimizing prompts — DB-editable templates, learning loop, context-window handling

Makes prompt construction transparent, adaptive and self-improving so non-technical
users only enter domain prompts while the system supplies everything the LLM needs —
across all backends (external/local/CLI) — and always within the security model.

- DB-editable prompt store (CoreAiPromptTemplate, admin CRUD): the system prompt is
  assembled from keyed fragments (base, permissions, anti_hallucination,
  output_contract, tool protocol, error_guidance, learned_hints, plan_protocol). Ships
  rich built-in defaults; DB rows override per key (locale/capability scoped). No prompt
  text is hard-coded-and-unreachable.
- Rich auto-enrichment in CoreAiPromptBuilderService (now template-driven + async):
  roles, exact allowed tools, tool catalog with parameter schemas, anti-hallucination
  contract ("never invent/guess; use a tool or say you don't know"), structured output
  contract, error guidance — placeholders rendered at build time.
- Structured tool errors fed back to the model ({ code, message, hint }) so it can
  self-correct instead of pretending success.
- Governed self-improvement loop (CoreAiPromptHint + CoreAiPromptHintService):
  orchestrator records failure signals (tool_error/exception/not_available); recurring
  patterns become learned hints. Default governed (admin approves); ai.promptLearning
  { enabled, autoApply, minOccurrences } can auto-apply. Hints only ADD guidance — never
  relax permissions.
- Per user/session context-window handling: contextWindow per connection (+ ai.contextWindow
  fallback 8192); fitMessagesToContext trims the oldest session-history turns (keeping the
  system prompt + current input), truncates as a last resort, and caps oversized tool
  results (ai.maxToolResultChars). Applied in auto + plan modes.
- New override hooks (promptTemplateService, promptHintService), barrel + config exports,
  IAi config additions. +9 unit tests. Full e2e green (1858).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(ai): admin CRUD for prompt templates/hints + auto-detected context window

Phase 5/6 of the self-optimizing prompt system:

- Admin endpoints (GraphQL + REST, all @Roles(ADMIN)) to CRUD prompt-template
  fragments and the learned prompt hints — so admins can edit every prompt text
  and review/approve/reject the governed learning loop.
- Context-window size is now auto-detected per LLM: OpenAiCompatibleProvider
  probes a local Ollama /api/show then falls back to a known-model table;
  ClaudeCliProvider returns the model-appropriate size (200k, 1M for 1m variants).
  detectAndPersistCapabilities() persists the detected window alongside the
  capability flags; the orchestrator's lazy-detect guard also fires when the
  window is still unknown.
- Tests: +unit (detectContextWindow for known models + claude alias) and +e2e
  (template override applies to the prompt; learning loop end-to-end with a
  throwing tool → suggested hint → admin approve → hint in prompt; admin-only
  prompt-template/hint queries + non-admin denial).

Full suite green (61 files, 1863 e2e; 80 ai unit).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(ai): register the claude-cli provider in the example AiToolsModule

Demonstrates the opt-in LlmProviderFactory.registerBuilder() pattern in the
reference implementation so a connection with providerType 'claude-cli' works
out of the box in the framework's own server. No effect on existing tests
(no test creates a claude-cli connection); full ai e2e green (32).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(ai): document self-optimizing prompts + context-window handling

- README: new "Self-optimizing prompts" (editable templates + governed
  learning loop) and "Context window" sections; config example + override
  table extended (promptTemplateService, promptHintService).
- configurable-features: AI row + override list updated with the prompt
  template/hint stores, promptLearning, contextWindow, maxToolResultChars.
- INTEGRATION-CHECKLIST: advanced-config example + admin-UI / context-window notes.
- migration guide 11.26.0: new-features entry + dedicated subsections.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* style(ai): oxfmt the README and the context-window model table

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(ai): scoped prompt-template fragments (tool:/role:/mode:)

Adds an optional `scope` field to CoreAiPromptTemplate. Fragments with a
non-empty `scope` are only included when the active run scopes contain it.
The prompt builder computes the active scopes from the run: `tool:<name>`
for each tool in scope, `role:<name>` for each user role, and `mode:<name>`
when a named mode is active (foundation for the upcoming AiMode work).

This lets admins author per-tool / per-role / per-mode guidance fragments
that the LLM only sees when relevant — keeping the prompt focused for
end-user requests instead of inflating it with every possible instruction.
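
As a rough illustration of the scope filter described above, the following sketch computes the active scopes for a run and keeps only unscoped or matching fragments. The `PromptFragment` and `RunContext` shapes are hypothetical stand-ins, not the real `CoreAiPromptTemplate` model.

```typescript
// Hypothetical shapes for illustration only.
interface PromptFragment {
  key: string;
  content: string;
  scope?: string; // e.g. 'tool:dbQuery', 'role:admin', 'mode:support'
}

interface RunContext {
  toolNames: string[];
  roles: string[];
  mode?: string;
}

// Build the set of active scopes: tool:<name>, role:<name>, mode:<name>.
function computeActiveScopes(run: RunContext): Set<string> {
  const scopes = new Set<string>();
  for (const tool of run.toolNames) scopes.add(`tool:${tool}`);
  for (const role of run.roles) scopes.add(`role:${role}`);
  if (run.mode) scopes.add(`mode:${run.mode}`);
  return scopes;
}

// Fragments without a scope are always included; scoped ones only when active.
function selectFragments(fragments: PromptFragment[], run: RunContext): PromptFragment[] {
  const active = computeActiveScopes(run);
  return fragments.filter((f) => !f.scope || active.has(f.scope));
}
```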

+2 unit tests (template-service scope hop + in-builder fallback).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(ai): AskUserQuestion built-in tool — let the LLM clarify before acting

For non-technical end users a focused clarifying question often beats a wrong
action. The model can now call the built-in `ask_user_question` tool to pause
the run and ask the user (with optional multiple-choice options) instead of
guessing.

The orchestrator detects the tool's sentinel return shape, breaks the loop,
and surfaces `CoreAiResponse.pendingQuestion = { question, options? }`. The
client renders it and sends the user's answer as the next prompt — the
conversation continues naturally, no special response field needed.
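
A minimal sketch of the short-circuit, assuming a hypothetical `__askUser` marker as the sentinel (the source does not show the real return shape) and a simplified loop over already-computed tool results:

```typescript
interface PendingQuestion {
  question: string;
  options?: string[];
}

// Hypothetical sentinel shape; the marker name is an assumption.
interface AskUserSentinel {
  __askUser: true;
  question: string;
  options?: string[];
}

function isAskUserSentinel(value: unknown): value is AskUserSentinel {
  if (typeof value !== 'object' || value === null) return false;
  const candidate = value as { __askUser?: unknown; question?: unknown };
  return candidate.__askUser === true && typeof candidate.question === 'string';
}

// Walk tool results in order; stop at the first sentinel and surface it.
function runLoop(toolResults: unknown[]): { executed: number; pendingQuestion?: PendingQuestion } {
  let executed = 0;
  for (const result of toolResults) {
    executed++;
    if (isAskUserSentinel(result)) {
      return { executed, pendingQuestion: { question: result.question, options: result.options } };
    }
  }
  return { executed };
}
```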

The tool is auto-registered with `roles: [S_USER]`, non-mutating; permission
restrictions are still enforced backend-side regardless of how the prompt is
phrased. +1 unit test verifying short-circuit + payload.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(ai): persistent permission decisions (remember-my-choice for mutating tools)

End users should not have to re-confirm the same mutating action repeatedly
when they have already granted consent. Adds `aiToolGrants` (admin CRUD) and
a new `rememberDecision` field on `CoreAiPromptInput`:

- When the user confirms (`confirm: true`) AND sets `rememberDecision` to
  'conversation', 'user' or 'tenant', the orchestrator persists a grant for
  that scope after a successful, non-destructive mutating execution.
- On future iterations, the confirmation gate consults the grant store: if an
  active, non-expired grant covers the tool in any scope (user / tenant /
  conversation), the gate is skipped for that call. Destructive tools NEVER
  use grants — they always confirm.
- Grants only skip the confirmation gate. The permission model itself
  (`@Restricted`, `@Roles`, `authorize()`) is enforced backend-side
  regardless.
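
The gate logic in the list above might look roughly like this. The grant and tool shapes are hypothetical, and the sketch assumes every mutating tool needs confirmation by default; it does not model the `mutating.default/enforced` policy settings.

```typescript
type GrantScope = 'conversation' | 'user' | 'tenant';

// Hypothetical shapes; the real aiToolGrants documents may look different.
interface ToolGrant {
  toolName: string;
  scope: GrantScope;
  scopeId: string;
  active: boolean;
  expiresAt?: Date;
}

interface ToolMeta {
  name: string;
  mutating: boolean;
  destructive: boolean;
}

interface CallContext {
  userId: string;
  tenantId?: string;
  conversationId?: string;
}

function needsConfirmation(tool: ToolMeta, ctx: CallContext, grants: ToolGrant[], now: Date = new Date()): boolean {
  if (tool.destructive) return true; // destructive tools never use grants
  if (!tool.mutating) return false; // read-only tools are not gated here
  const ids: Record<GrantScope, string | undefined> = {
    conversation: ctx.conversationId,
    user: ctx.userId,
    tenant: ctx.tenantId,
  };
  const covered = grants.some(
    (g) =>
      g.active &&
      g.toolName === tool.name &&
      ids[g.scope] === g.scopeId &&
      (!g.expiresAt || g.expiresAt.getTime() > now.getTime()),
  );
  return !covered;
}
```

Role checks and `authorize()` would still run independently of this gate.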

Override the store via `CoreModule.forRoot(env, { ai: { toolGrantService } })`.
+2 unit tests verifying skip-on-grant + persist-on-remember.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(ai): lifecycle hooks (PreToolUse / PostToolUse / SessionStart / Stop)

Adds a generic hook registry so projects can register compliance, audit and
sanitization callbacks at well-defined points of the agent loop without forking
the orchestrator. Hooks are NestJS providers extending AiHookBase that
self-register in the global AiHookRegistry on module init.

- `preToolUse(call, tool, event)`: can return `{block: true}` to block the call
  (surfaced to the LLM as a structured BLOCKED_BY_HOOK error) or replace `args`
  (chained sanitization across hooks).
- `postToolUse(call, tool, {result, success}, event)`: pure notification — for
  webhooks, metrics, audit shipping.
- `sessionStart(event)`: called once before the first LLM call.
- `stop(response, event)`: called once at the end of every run.

Errors thrown inside hooks are swallowed + warn-logged — a misbehaving hook
must never crash a prompt. Hooks can only ADD restrictions; they cannot relax
the existing permission model (@Restricted/@Roles/authorize() still apply).
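
The `preToolUse` chain could be sketched as below. This is a simplified, synchronous stand-in: the real hooks also receive `tool` and `event` and are NestJS providers, and the `reason` field and the `warn` callback are assumptions made for illustration.

```typescript
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// A hook may block the call, replace its args, or return nothing.
type PreToolUseResult = { block: true; reason?: string } | { args: Record<string, unknown> } | undefined;
type PreToolUseHook = (call: ToolCall) => PreToolUseResult;

type HookOutcome =
  | { blocked: true; error: { code: 'BLOCKED_BY_HOOK'; message: string } }
  | { blocked: false; call: ToolCall };

function runPreToolUseHooks(
  call: ToolCall,
  hooks: PreToolUseHook[],
  warn: (message: string) => void = () => {},
): HookOutcome {
  let current = call;
  for (const hook of hooks) {
    try {
      const result = hook(current);
      if (result && 'block' in result) {
        return { blocked: true, error: { code: 'BLOCKED_BY_HOOK', message: result.reason ?? 'Blocked by hook' } };
      }
      if (result && 'args' in result) {
        // Chained sanitization: the next hook sees the replaced args.
        current = { ...current, args: result.args };
      }
    } catch (err) {
      // A misbehaving hook is logged and skipped; it must not crash the prompt.
      warn(`preToolUse hook failed: ${String(err)}`);
    }
  }
  return { blocked: false, call: current };
}
```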

+2 unit tests: blocking hook stops execution + observe hook receives notify.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(ai): scoped tool-policies (deny/ask/allow against tool arguments)

Adds a second permission layer underneath the role-based tool gating: admins
can author fine-grained allow / ask / deny rules against the arguments of a
tool call without needing to write code. Example: a generic dbQuery tool can
be opened up to a support role with:
  - allow `sql` matching `^SELECT\\b`
  - deny  `sql` matching `(?i)\\b(DROP|TRUNCATE|DELETE FROM)\\b`
  - ask   `sql` matching `(?i)\\bUPDATE\\b`

Evaluation across all matching policies follows the precedence
  deny > ask > allow
so a deny anywhere always wins; an ask routes the call through the existing
confirmation gate even if the tool itself isn't `mutating`. A pure allow lets
the call proceed without re-gating. No matching policy → fall through to the
existing behaviour.
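
The precedence rule can be sketched as follows. The `ToolPolicy` shape is hypothetical. JavaScript's `RegExp` does not accept a leading `(?i)` group, so this sketch carries case-insensitivity in a separate `flags` field; how the module itself interprets the `(?i)` prefix in the example above is not shown in the source.

```typescript
type PolicyEffect = 'deny' | 'ask' | 'allow';

// Hypothetical policy shape for illustration.
interface ToolPolicy {
  toolName: string;
  argument: string; // name of the tool argument to test
  pattern: string; // regular expression source
  flags?: string; // e.g. 'i' for case-insensitive matching
  effect: PolicyEffect;
}

// Returns the winning effect, or undefined when no policy matches
// (the caller then falls through to the existing behaviour).
function evaluatePolicies(
  toolName: string,
  args: Record<string, unknown>,
  policies: ToolPolicy[],
): PolicyEffect | undefined {
  const matched = policies
    .filter((p) => p.toolName === toolName)
    .filter((p) => new RegExp(p.pattern, p.flags).test(String(args[p.argument] ?? '')))
    .map((p) => p.effect);
  if (matched.includes('deny')) return 'deny';
  if (matched.includes('ask')) return 'ask';
  if (matched.includes('allow')) return 'allow';
  return undefined;
}
```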

Policies stack across scopes: `tool` (any user), `role`, `tenant`, `user`.
Like all other layers, policies can only TIGHTEN — they never relax the
underlying permission model (@Restricted / @Roles / authorize() still apply).

+2 unit tests verifying deny aborts + ask forces the confirmation gate.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(ai): use ConfigService.setConfig({reInit:true}) for the postToolUse hook test to avoid config merge leakage from prior tests

* style(ai): oxfmt new files (hooks, grant service, builder, orchestrator)

* feat(ai): deferred tool-schemas + search_tools meta-tool (#13)

With many tools, the JSON-Schema catalog can dominate the system prompt and
cost a large chunk of every request. `ai.deferToolSchemas: true` switches the
prompt to a NAMES + descriptions only listing; the LLM uses the new built-in
`search_tools` meta-tool to fetch a specific tool's parameter schema on demand.
Result: massively smaller default context footprint for projects with rich
tool registries, with no loss of capability for the model.

`search_tools` is role-gated and returns only the tools the current user can
already see (defense in depth on top of the orchestrator's role filter).
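
A rough sketch of the two catalog modes and the on-demand lookup, under the assumption that roles are plain strings compared by membership (the framework's special role tokens such as `S_USER` are not modelled) and that the hint wording is made up:

```typescript
interface ToolDefinition {
  name: string;
  description: string;
  roles: string[];
  parameters: Record<string, unknown>; // JSON Schema of the tool arguments
}

// Only tools that share at least one role with the user are visible.
const visibleTo = (tools: ToolDefinition[], userRoles: string[]): ToolDefinition[] =>
  tools.filter((t) => t.roles.some((r) => userRoles.includes(r)));

function renderToolCatalog(tools: ToolDefinition[], userRoles: string[], deferToolSchemas: boolean): string {
  const lines = visibleTo(tools, userRoles).map((t) =>
    deferToolSchemas
      ? `- ${t.name}: ${t.description}`
      : `- ${t.name}: ${t.description}\n  schema: ${JSON.stringify(t.parameters)}`,
  );
  if (deferToolSchemas) {
    lines.push('Call search_tools with a tool name to fetch its parameter schema.');
  }
  return lines.join('\n');
}

// Stand-in for the search_tools meta-tool: never reveals tools the user cannot see.
function searchTools(tools: ToolDefinition[], userRoles: string[], name: string): Record<string, unknown> | undefined {
  return visibleTo(tools, userRoles).find((t) => t.name === name)?.parameters;
}
```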

TDD: +2 unit tests (defer-on emits names + hint, schemas hidden; defer-off
keeps the full catalog as before).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(ai): LLM-driven context compaction (#7)

When a session would overflow the model's context window, the orchestrator now
summarizes the oldest non-system / non-last turns into a single short system
message via the same connection's provider — instead of dropping them. The
result: long sessions stay coherent (cross-turn intent preserved) instead of
losing context to hard truncation. Falls back to the hard trim path on any
error. Configurable: `ai.compaction: false` to disable.

TDD: +1 unit test with a fake provider returning a known summary verifies
the oldest turns get replaced by the summary and the system + current prompt
are preserved.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(ai): named agent-modes (#8) — admin-defined preset assistants

A `CoreAiMode` row bundles a curated tool whitelist, optional model override,
optional role gating and an optional prompt addendum under a name like
`support`, `audit`, `billing`. End-user prompts activate one with
`agentMode: 'support'` — the orchestrator then narrows the available tool set
to the whitelist (built-in ask_user_question + search_tools always stay
available — they are essential for end-user UX). Modes only ever TIGHTEN;
they cannot relax the underlying permission model.
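
The narrowing step is essentially a set intersection. A minimal sketch, with a hypothetical `AgentMode` shape that carries only the whitelist:

```typescript
// Built-in tools that stay available in every mode, per the description above.
const ALWAYS_AVAILABLE = ['ask_user_question', 'search_tools'];

// Hypothetical subset of a CoreAiMode row.
interface AgentMode {
  name: string;
  allowedTools: string[];
}

// `available` is the tool set the user may already use. A mode can only
// remove entries from it; whitelisting an unavailable tool adds nothing.
function narrowTools(available: string[], mode?: AgentMode): string[] {
  if (!mode) return available;
  const allowed = new Set<string>([...mode.allowedTools, ...ALWAYS_AVAILABLE]);
  return available.filter((name) => allowed.has(name));
}
```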

Combines with the existing prompt-template `mode:<name>` scope filter (#2):
admins can author per-mode prompt fragments that fire only when that mode is
active.

TDD: +1 unit test — a model in `support` mode tries to call a non-whitelisted
tool and gets the standard TOOL_NOT_AVAILABLE response.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(ai): multi-modal attachments (#17)

End users can now attach images / files to a prompt via `input.attachments`
(array of `{mimeType, dataUrl|url, name?}`). The orchestrator forwards them on
the user message; OpenAiCompatibleProvider translates them to the OpenAI
content-parts shape (`[{type:'text'},{type:'image_url'}]`) — sent to vision-
capable backends (mittwald-Ministral, OpenAI, Anthropic via OpenAI-compat
gateways, Ollama vision models). Providers without vision support naturally
ignore them.
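
A sketch of the mapping to the OpenAI content-parts shape. The attachment fields follow the description above; dropping non-image attachments and returning a plain string when there are no images are assumptions of this sketch, not documented behaviour.

```typescript
interface LlmAttachment {
  mimeType: string;
  dataUrl?: string;
  url?: string;
  name?: string;
}

type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } };

function toOpenAiContent(text: string, attachments: LlmAttachment[] = []): string | ContentPart[] {
  const parts: ContentPart[] = [{ type: 'text', text }];
  for (const attachment of attachments) {
    const source = attachment.dataUrl ?? attachment.url;
    // Assumption: only image attachments with a usable source are forwarded.
    if (source && attachment.mimeType.startsWith('image/')) {
      parts.push({ type: 'image_url', image_url: { url: source } });
    }
  }
  // Without images, keep the plain string form of the message content.
  return parts.length === 1 ? text : parts;
}
```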

Adds LlmAttachment interface + `LlmMessage.attachments` so other providers
can adopt the mapping in one place. Powers product flows like "send a
screenshot + describe the bug".

TDD: +1 unit test verifies attachments survive from CoreAiPromptInput to the
provider on the user message.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(ai): MCP-Client — import tools from external MCP servers (#15)

`CoreAiMcpClientService.registerExternalClient({ name, client })` takes an
MCP-like client (the SDK `Client`, or a duck-typed one), calls `listTools()`,
and registers each as a wrapper tool in the global AiToolRegistry under the
namespaced name `<clientName>_<toolName>`. The wrapper dispatches `execute()`
back to the MCP server's `callTool` and normalises the response into our
{success, data, message} shape.

Imported tools are role-gated like any other tool and ship with the
conservative defaults `mutating: true` + `roles: [S_USER]` — so they always
require confirmation unless an admin scoped-policy (or grant) relaxes that.

Projects bring their own MCP client (stdio spawn / StreamableHTTP / SSE /
OAuth) — this service is the glue that turns it into a first-class set of
backend tools. Override the service for custom connection logic.

TDD: +1 unit test — a fake MCP client returns one tool, we register it, and
verify the wrapper dispatches `callTool` correctly and surfaces the result.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(ai): use JSON scalar for attachments field (GraphQL schema build)

* style(ai): oxfmt the MCP-client service

* feat(ai): expose effective token limit + context-window utilization on responses

Two new payload fields drive the chat UI's usage indicators:

1. `CoreAiBudgetSummary.maxTokens` + `scope` ('user' | 'tenant' | 'llm'):
   resolved as user-limit → tenant-limit → LLM context-window. Lets the client
   render a usage progress bar against the effective ceiling for THIS user.
   Null = no limit at all (frontend hides the bar).

2. `CoreAiResponse.contextWindow = { used, total }`: current session token
   utilization vs. the connection's auto-detected context window. Powers a
   "closing-circle" indicator (Claude Code-style) that appears once the model
   is approaching its context limit.

Both work even without a user (LLM-window fallback) and even without the
budget service (no budget = LLM fallback only).
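
The resolution order for the effective ceiling, plus a client-side utilization helper, might look like this. The input shape and the `contextUtilization` helper are hypothetical, and returning `scope: null` alongside `maxTokens: null` is an assumption.

```typescript
type LimitScope = 'user' | 'tenant' | 'llm';

interface EffectiveLimit {
  maxTokens: number | null;
  scope: LimitScope | null;
}

// Hypothetical inputs: the limits that may or may not be configured.
interface LimitInputs {
  userLimit?: number;
  tenantLimit?: number;
  contextWindow?: number;
}

// user-limit, then tenant-limit, then the LLM context window.
function resolveEffectiveLimit(inputs: LimitInputs): EffectiveLimit {
  if (inputs.userLimit !== undefined) return { maxTokens: inputs.userLimit, scope: 'user' };
  if (inputs.tenantLimit !== undefined) return { maxTokens: inputs.tenantLimit, scope: 'tenant' };
  if (inputs.contextWindow !== undefined) return { maxTokens: inputs.contextWindow, scope: 'llm' };
  return { maxTokens: null, scope: null }; // no limit at all: the UI hides the bar
}

// Fraction of the context window in use, clamped to [0, 1].
function contextUtilization(contextWindow: { used: number; total: number }): number {
  return contextWindow.total > 0 ? Math.min(1, contextWindow.used / contextWindow.total) : 0;
}
```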

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(ai): user-facing prompt snippets ("Vorlagen", German for "templates") with own/tenant/global visibility

A user-facing companion to the admin-only CoreAiPromptTemplate: any signed-in
user can author short, named prompt snippets that they can insert into the
chat input with one click. Different from the system-prompt building blocks,
these are USER prompts (e.g. "Write a short reply to …").

Visibility scopes (enforced server-side; the picker only sees what it's
allowed to see):
  - 'user'   — only the owner sees it (default).
  - 'tenant' — all members of the owner's tenant see it (tenant context required).
  - 'global' — every signed-in user sees it (creation requires the ADMIN role).

Owner-only mutations: only the creator can update/delete; admins still pass
via the standard admin pipeline. The full security model:
  - listVisible(): server-side $or query (own + global + tenant), filtered to
    enabled snippets, ordered by `order` asc then `name`.
  - create(): runs the standard pipeline first (validation + per-input
    whitelist), then writes the system-owned ownerId/scope/tenantId via a
    direct update — the input DTO deliberately doesn't expose those fields,
    so `prepareInput` would strip them if we passed them through super.create.
  - update(): assertOwner first, strip ownerId/tenantId from input,
    validate scope changes (incl. admin gate for 'global'), then persist
    tenantId directly when scope changed.
  - securityCheck(): per-row filter that returns undefined for snippets the
    caller is not allowed to see (own + tenant-member + global pass).
  - assertOwner(): ADMIN bypass; otherwise compares row.ownerId to user.id.
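
The server-side `$or` filter behind `listVisible()` could be built along these lines, for the three scopes as they exist in this commit. This is a sketch with a hypothetical `Viewer` shape; it only builds plain filter and sort objects and does not talk to a database.

```typescript
// Hypothetical viewer shape: the signed-in user and, optionally, their tenant.
interface Viewer {
  id: string;
  tenantId?: string;
}

interface VisibleQuery {
  filter: { enabled: true; $or: Array<Record<string, string>> };
  sort: { order: 1; name: 1 };
}

function buildVisibleQuery(viewer: Viewer): VisibleQuery {
  // Own snippets and global snippets are always candidates.
  const or: Array<Record<string, string>> = [{ ownerId: viewer.id }, { scope: 'global' }];
  // Tenant snippets only when the viewer actually has a tenant context.
  if (viewer.tenantId) {
    or.push({ scope: 'tenant', tenantId: viewer.tenantId });
  }
  return { filter: { enabled: true, $or: or }, sort: { order: 1, name: 1 } };
}
```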

Module wiring: new options field promptSnippetService, MongooseModule.forFeature
schema entry, AI_PROMPT_SNIPPET_CLASS + service token, provider registration,
barrel exports. The aiPromptSnippets collection has a unique compound index
{ ownerId: 1, name: 1 } so a user can't shadow their own snippet by accident.

Endpoints (all S_USER):
  REST     GET    /ai/snippets
           POST   /ai/snippets
           PUT    /ai/snippets/:id
           DELETE /ai/snippets/:id
  GraphQL  findAiPromptSnippets
           createAiPromptSnippet
           updateAiPromptSnippet
           deleteAiPromptSnippet

Tests: +5 e2e covering scope filtering (own/tenant/global), admin-only global
creation, tenant-context requirement, owner-only mutations, and the REST +
GraphQL surface. Full suite green (1883 tests).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(ai): rename PromptTemplate→Slot + Snippet→Prompt; drop 'global' prompt scope

Phase 1 of the AI-naming overhaul. Renames the two prompt-side stores to terms
that match how they're actually used in the UI and conversations:

  * CoreAiPromptTemplate → CoreAiSlot
    The admin-edited building blocks of the SYSTEM prompt. Each row fills a
    keyed slot (`base`, `permissions`, `anti_hallucination`, `tool_catalog`,
    `output_contract`, `tool_protocol_emulated`, `error_guidance`, …) and
    overrides the framework default for that key. The model already used
    "slot" in its description text — the file/class/collection names now
    match. Adds an auto-set `tenantId` field so Phase 2 can introduce
    tenant-scoped overrides without another schema bump.

  * CoreAiPromptSnippet → CoreAiPrompt
    User-facing re-usable prompts ("Vorlagen") — short, named user-prompt
    presets that can be inserted into the chat input with one click. No
    longer to be confused with the system-prompt building blocks above.

The existing CoreAiPromptInput (= the payload of the aiPrompt mutation, i.e.
the user's question to the LLM) keeps its name; the CRUD inputs of the new
model are CoreAiPromptCreateInput / CoreAiPromptUpdateInput.

Endpoints and GraphQL operations follow:

  * /ai/prompt-templates → /ai/slots
    findAi/createAi/updateAi/deleteAiPromptTemplate(s)
       → findAi/createAi/updateAi/deleteAiSlot(s)
  * /ai/snippets → /ai/prompts
    findAi/createAi/updateAi/deleteAiPromptSnippet(s)
       → findAi/createAi/updateAi/deleteAiPrompt(s)

The user-facing prompt scope drops `'global'`. The backend used to allow
admins to create system-wide prompts; admins now have a proper tenant-scoped
system-prompt extension mechanism via Slots instead. Remaining valid scopes
are `'user'` (private, default) and `'tenant'` (public within the tenant).
The backend rejects `scope: 'global'` for everyone, including admins.

Collections (aiPromptTemplates, aiPromptSnippets) are renamed to (aiSlots,
aiPrompts). The AI module hasn't been used in production yet, so no DB
migration is shipped — fresh installs start clean.

Module override field renamed from `promptTemplateService`/`promptSnippetService`
to `slotService`/`promptService`. Module exports updated.

37/37 ai e2e tests green. Full `pnpm run check` green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(ai): tenant-scoped slots with override/reset + runtime placeholder registry

Phase 2 + 3 of the AI naming overhaul.

Phase 2 — Slot tenant-scoping + system-default overrides
========================================================

`CoreAiSlot` now carries a `tenantId` that's set system-side from the calling
admin's tenant (RequestContext + serviceOptions). Multi-tenant deployments
get per-tenant overrides for free; single-tenant deployments stay
system-wide (tenantId stays undefined).

`CoreAiSlotService`:
- create / update / delete now require ADMIN and verify the row belongs to
  the caller's tenant; tenantId is set system-side and stripped from input.
- `resolveFragments(...)` filters DB rows by the request's tenantId and now
  also honors disabled (soft-deleted) system-default overrides — a row with
  `enabled: false` matching a system key HIDES the default for that tenant.
- New `listEffective(...)` returns the framework defaults overlaid by the
  tenant's rows, each with `isSystem` / `isOverride` flags so the admin UI
  can render the right action (edit / reset / disable / delete, shown in the
  UI as Bearbeiten / Zurücksetzen / Deaktivieren / Löschen).
- New `resetSystemSlot(id, ...)` — deletes a tenant override row → the
  framework default applies again. Refuses to operate on custom slots
  (use `delete` for those).
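
The overlay performed by `listEffective(...)` might be sketched like this. The row shapes are hypothetical, and the exact meaning of the flags (here: `isSystem` for keys that have a framework default, `isOverride` when a tenant row replaces that default) is an assumption.

```typescript
interface SystemSlot {
  key: string;
  content: string;
}

// Hypothetical subset of a tenant's CoreAiSlot rows.
interface SlotRow {
  id: string;
  key: string;
  content: string;
  enabled: boolean;
}

interface EffectiveSlot {
  key: string;
  content: string;
  enabled: boolean;
  isSystem: boolean;
  isOverride: boolean;
  id?: string;
}

function listEffective(defaults: SystemSlot[], tenantRows: SlotRow[]): EffectiveSlot[] {
  const rowsByKey = new Map<string, SlotRow>();
  for (const row of tenantRows) rowsByKey.set(row.key, row);
  const systemKeys = new Set(defaults.map((d) => d.key));

  // System defaults, replaced by the tenant's row where one exists.
  const system = defaults.map((d): EffectiveSlot => {
    const row = rowsByKey.get(d.key);
    return row
      ? { key: d.key, content: row.content, enabled: row.enabled, isSystem: true, isOverride: true, id: row.id }
      : { key: d.key, content: d.content, enabled: true, isSystem: true, isOverride: false };
  });
  // Custom slots: tenant rows whose key has no framework default.
  const customs = tenantRows
    .filter((row) => !systemKeys.has(row.key))
    .map((row): EffectiveSlot => ({ ...row, isSystem: false, isOverride: false }));
  return [...system, ...customs];
}
```

In this model, resetting a system slot amounts to deleting the override row, after which the default entry reappears with `isOverride: false`.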

The default-fragments definition was extracted to a module function
`getSystemDefaultSlots()` so both the prompt builder and the slot service
can reach it without a circular dependency. The builder's
`defaultFragments()` is now a thin wrapper that combines the system
defaults with the configured `ai.systemPrompt`.

New endpoints:
- `GET /ai/slots/effective` — system defaults + overrides + customs (admin).
- `POST /ai/slots/:id/reset` — reset a tenant override (admin).

Phase 3 — Runtime placeholder registry
======================================

New `CoreAiPlaceholderRegistry` service. The 6 system placeholders
(`roles`, `tools`, `toolCatalog`, `documentation`, `learnedHints`,
`userId`) are registered at boot; projects can add their own via
`register({ name, description, resolve })` from any provider.

Why a registry instead of hard-coded names in the frontend:

  * The frontend loads the list via the new endpoint, so admin / user
    editors see EVERY currently-supported placeholder (including project-
    specific ones) without a frontend change.
  * Resolvers stay in TypeScript — no eval, no DB-stored function bodies,
    no admin-defined runtime code paths. Secure by construction.

The prompt builder's `renderContext()` now delegates to the registry when
available (falls back to the hard-coded record for legacy paths).

New endpoint: `GET /ai/placeholders` (S_USER — placeholders aren't secrets,
the resolver implementations stay backend-side).

Tests: 43/43 ai e2e green (was 37/37). Added:
- listEffective returns 10 system defaults on a clean tenant
- admin override + reset round-trip
- non-admin → ForbiddenException on slot reads/writes
- placeholder registry lists the 6 system placeholders
- non-admin user can hit `/ai/placeholders` (200)
- project-registered placeholder is honored end-to-end

Module override fields added: `placeholderRegistry`. Index barrel exports
the new interface + service.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(ai): idempotent slot create + user-prompt placeholder resolution + docs

Four follow-up optimizations on the AI naming overhaul + tenant-scoped slots
+ placeholder registry:

1. **Idempotent slot create** — `CoreAiSlotService.create()` now upserts on
   `(tenantId, key)`: a second "Override anlegen" ("create override") or
   "Deaktivieren" ("disable") action on the same system slot UPDATES the
   existing row instead of inserting a
   duplicate. Live-verified via Chrome MCP — two overrides of `base`
   produce a single DB row with the latest content; the row's `_id` stays
   stable across edits.

2. **User-prompt placeholder resolution** — `CoreAiService.prompt()` now
   runs a new `resolvePromptPlaceholders(input, serviceOptions)` pass
   BEFORE prepareRun, replacing `{{placeholder}}` tokens in the user's
   prompt text using the runtime placeholder registry. A stored prompt
   template like "Explain to the user with ID {{userId}} …" now gets the
   real id substituted at run time. Unknown tokens are left as-is so
   plain text with curly braces survives. `promptStream()` inherits the
   resolution because it delegates to `prompt()`.

3. **assertSameTenant performance** — projects on `tenantId` only when
   verifying tenant ownership (cheaper than loading the full document).

4. **Documentation** — README, INTEGRATION-CHECKLIST and the 11.25.x →
   11.26.0 migration guide now use the new Slot / Prompt / placeholder-
   registry terminology. The migration guide gains an "AI naming overhaul"
   section documenting the rename for projects that experimented with
   pre-release AI builds (cleanup steps: copy data from `aiPromptTemplates`
   → `aiSlots`, `aiPromptSnippets` → `aiPrompts`; routes / module
   override names changed; user-prompt scope `'global'` removed).
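
Item 2 above (user-prompt placeholder resolution) can be sketched as a single replace pass. The registry is modelled here as a plain `Map` of synchronous resolvers, which is an assumption; the real `CoreAiPlaceholderRegistry` resolvers may be asynchronous and receive context.

```typescript
// Hypothetical resolver type: returns the value for one placeholder.
type PlaceholderResolver = () => string;

function resolvePromptPlaceholders(text: string, registry: Map<string, PlaceholderResolver>): string {
  // Match {{name}} tokens; anything the registry does not know stays untouched.
  return text.replace(/\{\{(\w+)\}\}/g, (token: string, name: string) => {
    const resolve = registry.get(name);
    return resolve ? resolve() : token;
  });
}
```

Leaving unknown tokens in place means plain text that happens to contain curly braces passes through unchanged.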

Tests: +3 e2e (46/46 total ai e2e green) — covers (a) idempotent override,
(b) user-prompt placeholder substitution with real values, (c) unknown
tokens preserved.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(ai): make the tool-registration step impossible to miss

A fresh AI module on a new project answers "I don't have a tool to do that"
for every domain question — that's correct behaviour (LLM is untrusted, the
registry starts empty), but new integrators (humans and AI agents) keep
reporting it as a bug. The docs now spell this out wherever someone might
land first:

- **CLAUDE.md** gains an "AI Module: Tools Are Opt-In" section right under
  Core Principles so AI agents working with the framework see it before
  starting any AI integration.
- **README.md** (AI module) gains a prominent callout above the existing
  "## Tools" section explaining the no-auto-discovery contract, the
  three-step path (write tool subclass / group in AiToolsModule / import
  in ServerModule), the four reference tools, and the security contract
  (CrudService + serviceOptions; never direct Model.find()).
- **INTEGRATION-CHECKLIST.md** §3 + §4 now explicitly flag both as
  REQUIRED for the assistant to do anything domain-specific, and walk
  through the copy-from-framework step with the exact import rewrites
  ('../../../../core/...' → '@lenne.tech/nest-server', user.service path
  adjustment) plus the boot-still-works fallback.

No behavioural change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(ai): cumulative budget summary + provider-quota fallback on connection

The token bar showed only the last request's tokens against the LLM context
window when no user/tenant budget was configured. That's neither a real
"used" value (it's per-request, not per-period) nor a real limit (the
context window is per-call, not per-day). Two changes:

1. **Cumulative usage even at scope='llm'.** `buildSummary` now always
   aggregates the user's running per-period token total (via
   `getUsage({ userId }, period)`) — same query that backs the hard-budget
   path. The fallback hierarchy is unchanged:

      user override > tenant override > config defaults > provider quota > context window

   …but `summary.usedTokens` is now filled at EVERY level, not just for
   hard limits. The UI's token bar can therefore show a real "used in the
   current period" ("verbraucht im aktuellen Zeitraum") counter against
   whatever soft fallback applies.

2. **Provider-quota fallback on the connection.** Two new admin fields
   on `CoreAiConnection`:

      defaultUserMaxTokens?: number   // e.g. 50000 (= soft cap per user)
      defaultUserMaxPeriod?: string   // 'day' (default) | 'month' | 'none'

   When set, the connection's soft user quota is used as `maxTokens`
   (still surfaced as scope='llm' so the UI knows it isn't a 429-able
   hard limit). This is the technical landing spot for the user's
   "determined by the LLM provider" ("vom LLM-Anbieter ermittelt")
   requirement: the admin pins the provider's per-user quota once per
   connection and the UI immediately reflects it.

The resolved-connection interface gains the two fields so the orchestrator
can pass them through to `buildSummary`; the connection input DTO accepts
them for create/update.
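
The soft-fallback branch of the summary (no user, tenant or config budget applies) could be sketched as below. The record shape, the UTC period boundaries, and the use of the `'day'` period for the context-window fallback are assumptions; only the two connection fields and `scope: 'llm'` come from the description above.

```typescript
type Period = 'day' | 'month' | 'none';

// Hypothetical usage record; the real usage store may aggregate differently.
interface UsageRecord {
  userId: string;
  tokens: number;
  at: Date;
}

interface ConnectionQuota {
  defaultUserMaxTokens?: number;
  defaultUserMaxPeriod?: Period;
  contextWindow?: number;
}

interface SoftSummary {
  usedTokens: number;
  maxTokens: number | null;
  scope: 'llm';
  period: Period;
}

// Assumption: periods start at UTC midnight or on the first of the month (UTC).
function periodStart(period: Period, now: Date): Date | null {
  if (period === 'none') return null;
  const day = period === 'day' ? now.getUTCDate() : 1;
  return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), day));
}

function sumUsage(records: UsageRecord[], userId: string, period: Period, now: Date): number {
  const start = periodStart(period, now);
  return records
    .filter((r) => r.userId === userId && (start === null || r.at.getTime() >= start.getTime()))
    .reduce((sum, r) => sum + r.tokens, 0);
}

function buildSoftSummary(records: UsageRecord[], userId: string, connection: ConnectionQuota, now: Date): SoftSummary {
  const period = connection.defaultUserMaxPeriod ?? 'day';
  return {
    usedTokens: sumUsage(records, userId, period, now), // cumulative, not last-request
    maxTokens: connection.defaultUserMaxTokens ?? connection.contextWindow ?? null,
    scope: 'llm', // soft limit: not enforced with HTTP 429
    period,
  };
}
```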

UI (in nuxt-base-starter): the token bar now reads `usedTokens` for every
scope (no more "last request" special case), the scope label for `'llm'`
becomes "Anbieter-Quota (weich)" ("provider quota (soft)"), and the tooltip
says "Kumulativ (weiches Limit)" ("cumulative (soft limit)") so users
understand the bar fills over the period.

Tests: +2 e2e (48/48 ai e2e total green) — covers cumulative aggregation
under the context-window fallback and the new provider-quota fallback.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(release): 11.26.0 — AI assistant module + tenant-scoped slots + placeholder registry

Version bump to ship the AI-module work that has been collecting on
feature/ai-module. The complete catalog of changes lives in
`migration-guides/11.25.x-to-11.26.0.md`, which the same commit extends
with the missing follow-up sections (user-prompt placeholder resolution,
cumulative budget summary at every scope, the new `defaultUserMaxTokens` /
`defaultUserMaxPeriod` provider quota on `CoreAiConnection`, and an
explicit "Tools are opt-in" callout for new integrators).

No breaking changes for projects that don't opt in to the AI module
(absent `ai` config block → module stays disabled). For projects that
experimented with the pre-release AI build, the "AI naming overhaul"
section of the migration guide lists every rename (`PromptTemplate → Slot`,
`PromptSnippet → Prompt`, the `'global'` user-prompt scope removal, etc.).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(docs): demote spectaql {{placeholder}} interpolation errors to warnings

Spectaql renders GraphQL descriptions through Handlebars and turned the AI
module's `{{placeholders}}` notation into a hard `Unsupported interpolation`
error. That error broke the `docs:bootstrap` script chain — spectaql crashed
before writing its tmp index.html, which then surfaced as a cascade of
follow-up ENOENT errors. The `{{placeholders}}` notation in slot / prompt
content descriptions is intentional: it documents the runtime placeholder
registry consumed by `CoreAiPromptBuilderService`, not a Handlebars binding.

Sets `spectaql.errorOnInterpolationReferenceNotFound: false` in `spectaql.yml`
(must sit under the `spectaql:` block, not at root — verified against
spectaql 3.0.9 source / examples). Unresolved interpolations now log as
`WARNING: Unsupported interpolation encountered: "{{placeholders}}"` and the
docs build runs to completion.

Verified: `pnpm run docs:bootstrap` exits 0; `public/index.html` is
regenerated; the ENOENT cascade in `server.log` is gone.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore(boot): silence dotenv startup banner

`dotenv` v17.4.2 emits a "◇ injected env (N) from .env // tip: …" line on
every `config()` call. The tip is randomly chosen from a hard-coded TIPS
array that includes promotional links to other Motdotla products
(www.dotenvx.com, www.vestauth.com). Since the framework already has
structured logging via NestJS, the banner is pure noise.

`{ quiet: true }` is dotenv's surgical kill switch — it disables the two
status `_log()` calls inside dotenv (the "injected env" line and the
encrypted-`.env.vault` loader line, which this codebase doesn't trigger
anyway). It does NOT silence `_warn()` (e.g. missing `.env.vault` for a
configured `DOTENV_KEY`) or `_debug()` (only active with `debug: true`),
so genuine misconfiguration warnings still surface.

Applied at both `dotenv.config()` call sites — `src/config.env.ts:19`
(top-level config bootstrap) and `src/core/common/helpers/config.helper.ts`
(env-aware helper used by consumer projects).

Verified by running `pnpm start` after the change — server boot log now
starts directly with `Configured for: local` and the regular NestJS
InstanceLoader output; both `injected env` lines and the `vestauth`
promo are gone.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: silence spectaql warnings, clear lint hints, fix vitest oxc warning

- Reword AI Slot descriptions to drop `{{placeholders}}` notation (the
  registry endpoint at `/ai/placeholders` already documents the active
  token set) — silences three spectaql "Unsupported interpolation"
  warnings without losing information.
- Resolve oxlint warnings: switch `!= null` / `== null` to strict
  comparisons in the placeholder/prompt resolver paths, rename
  `registry` → `placeholderRegistry` in two test cases to avoid
  shadowing the outer `AiToolRegistry` variable, and add a
  `no-useless-constructor` disable + explanatory comment on the two
  built-in tool subclasses (their constructors are NOT useless — they
  re-export the protected `AiTool` constructor as public so NestJS DI
  can instantiate them).
- Add `oxc: false` to `vitest.config.ts` and `vitest-e2e.config.ts`:
  Vite 8 switched the default TS/JS transformer from esbuild to Oxc;
  `unplugin-swc` disables esbuild internally, but without the new flag
  Oxc would still run in parallel and emit a deprecation warning on
  every test run.
- Migration guide 11.25.x→11.26.0: new "Operational notes (no action
  required)" section covering the dotenv banner suppression, the
  vitest/Vite 8 `oxc: false` tip, and the Slot description rewrite.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(ai): OAuth client binding, conversation list projection, createHttpUser cookie→JWT fallback + extended overrides

Round of fixes from a multi-agent code review of the AI module. All 9 HIGH-severity
findings are addressed; one pre-existing flaky test in `tests/ai.e2e-spec.ts` was fixed
in the process.

Security (MCP OAuth 2.1):
- Bind refresh-token rotation to client_id (OAuth 2.1 §4.13.2 / §7.4) — prevents
  a stolen refresh token from being rotated by a different client and used to
  impersonate the original user. `rotateRefreshToken(token, clientId)` now
  atomically `findOneAndDelete({ token, clientId })`; `exchangeRefreshToken`
  rejects requests where `client.client_id` is missing or does not match.
- Return `client_secret`/`client_secret_expires_at`/`token_endpoint_auth_method`
  from `getClient()` so the MCP SDK's `clientAuth` middleware can actually
  verify the secret. Previously the secret was persisted but dropped on read,
  silently downgrading confidential clients to public clients (the SDK's
  secret-verify branch is gated by `if (client.client_secret)`).
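
The client binding described above can be sketched with an in-memory stand-in for the token collection. This is an illustration under stated assumptions, not the module's store: `InMemoryRefreshTokenStore`, `issue` and the record shape are hypothetical, and the `Map` lookup plus delete only imitates the atomic `findOneAndDelete({ token, clientId })` named in the commit.

```typescript
// Hypothetical record shape; the real schema is not shown in this PR text.
interface RefreshTokenRecord {
  token: string;
  clientId: string;
  userId: string;
}

class InMemoryRefreshTokenStore {
  private readonly records = new Map<string, RefreshTokenRecord>();
  private counter = 0;

  issue(clientId: string, userId: string): string {
    const token = `rt-${++this.counter}`; // placeholder value, not a secure token
    this.records.set(token, { token, clientId, userId });
    return token;
  }

  // Consume the old token only when it belongs to the presenting client.
  rotateRefreshToken(token: string, clientId: string | undefined): string | null {
    if (!clientId) return null; // missing client_id is rejected
    const record = this.records.get(token);
    if (!record || record.clientId !== clientId) return null; // no match, nothing deleted
    this.records.delete(token); // single use: the old token is gone after rotation
    return this.issue(record.clientId, record.userId);
  }
}
```

In this sketch a token presented by the wrong client is left untouched, which mirrors a filter that simply does not match; the legitimate client can still rotate it once.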

Performance:
- Add `index: true` to `CoreAiConversation.createdBy` — previously a collection
  scan for `findByOwner`.
- Add `select: '-messages'` to `findAiConversations` / `findConversations` —
  conversation list previously over-fetched up to 500 messages per item.

Public API typing:
- Extend `ICoreModuleOverrides.ai` with `mcpClientService`, `modeService`,
  `placeholderRegistry`, `promptHintService`, `promptService`, `slotService`,
  `toolGrantService`, `toolPolicyService` (already accepted at runtime via
  `CoreAiModule.forRoot(...)`; TypeScript was blocking consumers from passing
  them).
- Add typed `IAi.claudeCli?: { bin?, extraArgs?, maxBudgetUsd? }` — was
  already consumed by `ClaudeCliProvider` but untyped.
- Sync the `.claude/rules/configurable-features.md` AI row to the post-rename
  surface (`aiSlots` / `/ai/slots` / `slotService`); previous text contradicted
  the README, INTEGRATION-CHECKLIST and migration guide.

Tests:
- 21 new unit tests covering the OAuth 2.1 store flow (single-use auth code,
  client_secret roundtrip, refresh-token rotation bound to client_id incl. the
  stolen-token-rejected case, `exchangeAuthorizationCode` and
  `exchangeRefreshToken` end-to-end through `buildOAuthProvider`,
  `challengeForAuthorizationCode`) and the MCP HTTP session lifecycle
  (`handlePost`/`handleGet`/`handleDelete` 401 with WWW-Authenticate, 404 for
  unknown sessions, `resolveUser` precedence chain, `evictIfNeeded` cap).
- Fix flaky 401 in `createHttpUser` (pre-existing): BetterAuth's
  `api.signInEmail()` does not always echo a session token into the response
  body — in isolated test runs the body is `{ requiresTwoFactor, success, user }`
  without `token` or `session`. Session is still established via the
  `iam.session_token` Set-Cookie. `createHttpUser` now forwards that cookie to
  `betterAuthService.getToken({ headers: { cookie } })` (the JWT plugin path)
  to deterministically obtain a Bearer JWT — eliminates flaky 401s across the
  HTTP-layer tests in this file.
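
The cookie-to-JWT fallback in `createHttpUser` can be sketched roughly as follows. Only the cookie name `iam.session_token` and the `getToken({ headers: { cookie } })` call shape come from the commit text; `SignInResult`, `extractCookie`, `resolveBearer`, the return type of `getToken`, and the choice to prefer a body token when one is present are assumptions made for this illustration.

```typescript
// Hypothetical view of a sign-in response as seen by a test helper.
interface SignInResult {
  bodyToken?: string; // token echoed in the response body, if any
  setCookie: string[]; // raw Set-Cookie header values
}

// Return the "name=value" pair for the named cookie, without attributes.
function extractCookie(setCookie: string[], name: string): string | undefined {
  for (const header of setCookie) {
    const [first = ""] = header.split(";");
    const pair = first.trim();
    const eq = pair.indexOf("=");
    if (eq > 0 && pair.slice(0, eq) === name) return pair;
  }
  return undefined;
}

// Assumed signature; modelled on the call quoted in the commit message.
type GetToken = (args: { headers: { cookie: string } }) => Promise<string | undefined>;

async function resolveBearer(result: SignInResult, getToken: GetToken): Promise<string> {
  if (result.bodyToken) return result.bodyToken;
  const cookie = extractCookie(result.setCookie, "iam.session_token");
  if (!cookie) throw new Error("no session token in body or cookies");
  const jwt = await getToken({ headers: { cookie } });
  if (!jwt) throw new Error("JWT lookup via session cookie failed");
  return jwt;
}
```

The point of the design is that the helper no longer depends on whether the sign-in body happens to carry a token.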

Docs / housekeeping:
- Add `load-tests/ai-smoke.k6.js` as the AI request-pipeline smoke baseline.
- Persist reviewer agent memories for future review continuity.

Verification: 3 consecutive `pnpm run check` runs green (1915 tests pass,
build OK, server starts cleanly).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(migration): document AI module fixes from the post-release review

Add migration notes for the changes that landed in the previous commit:

- Conversation list endpoint behaviour change (`select: '-messages'`) —
  clients that read `messages` from the list endpoint must switch to the
  per-id detail call. Surfaced both in "Operational notes" and as a
  ⚠️ Behaviour change row in Compatibility Notes.
- MCP OAuth security fixes — refresh-token rotation is now bound to
  `client_id` (impersonation fix) and `getClient()` returns
  `client_secret` so confidential clients actually verify it. Subclassers
  must mirror the new `rotateRefreshToken(token, clientId)` signature
  and not strip `client_secret` from the returned shape.
- Typed `ICoreModuleOverrides.ai` (now exposes all 18 collaborators) and
  `IAi.claudeCli` — `as any` casts can be dropped.
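
For clients affected by the list-endpoint change, the migration amounts to reading `messages` from the per-id detail call instead of from list items. The sketch below uses a synchronous in-memory stand-in so it stays self-contained; real calls would be asynchronous HTTP or GraphQL requests, and `ConversationApi`, `list`, `getById` and every field other than `messages` are hypothetical names.

```typescript
// Hypothetical client-side shapes; only `messages` is named in the commit text.
interface ConversationSummary {
  id: string;
  title?: string;
}

interface ConversationDetail extends ConversationSummary {
  messages: { role: string; content: string }[];
}

// Synchronous stand-in for the list and per-id detail endpoints.
interface ConversationApi {
  list(): ConversationSummary[]; // list items no longer carry messages
  getById(id: string): ConversationDetail | undefined;
}

// Load messages through the detail call instead of the list response.
function loadMessages(api: ConversationApi, id: string): ConversationDetail["messages"] {
  const detail = api.getById(id);
  return detail ? detail.messages : [];
}

// Example: count messages per conversation after the behaviour change.
function messageCounts(api: ConversationApi): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const summary of api.list()) {
    counts[summary.id] = loadMessages(api, summary.id).length;
  }
  return counts;
}
```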

Overview Bugfixes row + Compatibility Notes updated to point readers at
the new "Operational notes" entries.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(ai): 5 bugs found during live mittwald-LLM browser test

Live test against a fresh fullstack workspace (mittwald
Ministral-3-14B-Instruct-2512) surfaced five bugs of varying severity.
This commit fixes all of them and adds the missing operational docs.

BUG-1 — HIGH: BSONError 500 from `loadRecentMessages` on bad conversationId
- A client that…
kaihaase merged commit f5fe09f into main on May 31, 2026
1 check passed