Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 15 additions & 1 deletion docs/architecture.md
Original file line number Diff line number Diff line change
Expand Up @@ -972,8 +972,22 @@ Master Key (env var EDDI_VAULT_MASTER_KEY)
- **Envelope encryption**: Rotating the master key re-wraps KEK→DEK without touching individual secrets
- **Export scrubbing**: Agent export/sync automatically strips secrets from ZIP files

### Authentication Model
### Cryptographic Agent Identity

EDDI agents can sign their inter-agent messages using Ed25519 digital signatures. This protects multi-agent group conversations against identity spoofing, message tampering, and provides non-repudiation for audit trails.

**Key lifecycle:**

1. **Key generation**: `POST /agentstore/{id}/signing/keys` → `AgentSigningService.generateKeyPair()` creates an Ed25519 keypair. Public key stored in `AgentConfiguration.identity.publicKey`, private key encrypted in the Secrets Vault
2. **Key rotation**: `AgentPublicKey` records support versioned keys with `validFromMs`/`validUntilMs` windows. Old and new keys overlap during rotation. Private keys use versioned vault paths (`agent-signing-key:{agentId}:v{version}`)
3. **Signing**: When `security.signInterAgentMessages=true`, the `GroupConversationService` creates a `SignedEnvelope` for each agent response. The envelope contains the message payload, a UUID nonce, and an epoch timestamp. The canonical JSON form (RFC 8785 via `JacksonCanonicalizer`) is signed with Ed25519
4. **Self-verification**: Immediately after signing, the service verifies its own signature against the agent's public key. If self-verification fails, the signature is discarded (fail-safe to unsigned)
5. **Replay protection**: The `NonceCacheService` registers each nonce with freshness (5min default) and clock-skew (30s default) checks. Duplicate nonces are rejected
Comment thread
ginccc marked this conversation as resolved.
6. **Peer verification**: When `security.requirePeerVerification=true` on a receiving agent, the service reconstructs envelopes from stored `TranscriptEntry` fields and verifies each speaker's signature against their public key before sending context

**What is NOT covered:** MCP invocation signing is not yet implemented — the `signMcpInvocations` config field has been removed until the feature is built.

### Authentication Model
| Environment | OIDC Enabled | Behavior |
|-------------|-------------|----------|
| **Dev mode** | No | Allowed — info log on startup |
Expand Down
145 changes: 145 additions & 0 deletions docs/changelog.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,151 @@

---

## 🛠️ PR Feedback Remediation — Production Hardening (2026-05-17)
Comment thread
ginccc marked this conversation as resolved.

**Repo:** EDDI (`feature/feature-gap-remediation`)
**What changed:** Addressed ~25 findings from CodeQL, code quality bot, Copilot, and CodeRabbit reviews. All actionable items resolved.

### Security Fixes
- **NonceCacheService TOCTOU:** Replaced non-atomic `get()`+`put()` with `putIfAbsent()` for replay detection. The get-then-put pattern allowed two concurrent requests with the same nonce to both pass the replay check.
- **NonceCacheService null guard:** Added null/blank nonce early rejection.
- **Log injection (centralized):** Replaced per-file `sanitizeForLog()` methods in `GroupConversationService`, `MongoTenantQuotaStore`, `PostgresTenantQuotaStore` with centralized `LogSanitizer.sanitize()`. Added Unicode line separator (U+2028/U+2029) handling per CodeQL feedback. Also wrapped `e.getMessage()` in log calls.
- **Fail-closed cost accounting:** `PostgresTenantQuotaStore.tryAddCost()` now returns `DENIED` on SQL failure instead of `OK` — prevents budget bypass when database is unreachable.
- **Key version validation:** `AgentSigningService.generateKeyPairVersioned()` and `rotateKey()` now reject `version <= 0`.
- **JacksonCanonicalizer strict duplicate detection:** Enabled `StreamReadFeature.STRICT_DUPLICATE_DETECTION` to prevent collision attacks where different JSON payloads produce identical canonical output. Removed inaccurate RFC 8785 claim from javadoc.
- **AgentSigningService versioned key cleanup:** `deleteKeyPair()` now deletes both legacy unversioned and all versioned vault secrets. `generateKeyPairVersioned()` now evicts version-specific cache entries.

### Performance Fixes
- **Incremental peer verification:** `verifyPriorEntriesIfRequired()` now tracks last-verified transcript index per conversation (O(N) amortized instead of O(N²) per-turn re-verification). Public keys cached per speaker to avoid redundant `agentStore` lookups.
- **signEnvelope private key caching:** Now uses `privateKeyCache.computeIfAbsent()` with versioned cache key, avoiding vault round-trips on every call.

### Architecture Fixes
- **DiscoverToolsTool CDI exclusion:** Added `@Vetoed` to prevent Quarkus CDI from auto-discovering the class as a bean (it is manually constructed by AgentOrchestrator).
- **LAZY tool activation:** Fixed gap where discovered tools couldn't actually be called. `collectEnabledTools()` now returns ALL tools (registering executors), while `executeWithTools()` initially presents only `discover_tools` spec. After the LLM calls `discover_tools`, matching built-in specs are activated via `activateDiscoveredTools()`.
- **PostgresTenantQuotaStore transactional delete:** `deleteQuota()` now wraps both `tenant_quotas` and `tenant_usage` deletes in a single transaction with rollback on failure.
- **PostgresTenantQuotaStore schema auto-creation:** Added `CREATE TABLE IF NOT EXISTS` with `ensureSchema()` pattern (matching `PostgresGlobalVariableStore`, `PostgresSecretPersistence`, etc.).
- **MongoTenantQuotaStore unique index:** Added unique ascending index on `tenantId` for both `tenant_quotas` and `tenant_usage` collections to prevent duplicate rows from upsert races.
- **DiscoverToolsTool JSON serialization:** Replaced manual `StringBuilder` JSON assembly with Jackson `ObjectMapper` for proper escaping of special characters in tool descriptions.
- **JacksonCanonicalizer overload rename:** `canonicalize(Object)` → `canonicalizeObject(Object)` to eliminate static dispatch ambiguity.
- **GroupConversationService FQN cleanup:** Replaced 5 fully-qualified class references (`ai.labs.eddi.configs.agents.crypto.*`) with proper imports.
- **AgentOrchestrator log fix:** Compute external tool count explicitly instead of `activeSpecs.size() - 1` to avoid misleading `-1` in logs.

### Changelog accuracy
- Fixed Item 1 and Item 2 descriptions below (see corrections inline).

**Files:** `NonceCacheService.java`, `GroupConversationService.java`, `MongoTenantQuotaStore.java`, `PostgresTenantQuotaStore.java`, `AgentSigningService.java`, `AgentOrchestrator.java`, `DiscoverToolsTool.java`, `JacksonCanonicalizer.java`, `SignedEnvelope.java`, `LogSanitizer.java`, `changelog.md`

---


## 🛡️ Crypto Security Review — Fail-Safe Remediations (2026-05-15)

**Repo:** EDDI (`feature/feature-gap-remediation`)
**What changed:** Security-focused code review identified 7 findings (2 high, 3 medium, 2 low). All remediated. Key principle: signing failures are **fail-safe** — discard the broken signature and fall back to unsigned, rather than storing broken data.

### S1+S2 (HIGH): Signing failures now fail-safe to unsigned
- Self-verify failure (`verifyEnvelope` returns false) → discard signature, fall back to unsigned entry
- Nonce validation failure → discard signature, fall back to unsigned entry
- Previously: logged warning/error but continued with broken signature stored permanently

### S3+S4 (MEDIUM): Null guards for crypto infrastructure
- Signing block: `agentStore`, `agentSigningService`, `nonceCacheService` all guarded for null
- `agentConfig.getIdentity()` guarded before `getKeyValidAt()` call

### S7 (LOW): NonceCacheService unused `ttlMs` variable
- Removed computed `ttlMs` that was never passed to cache factory
- Added documentation comment explaining the cache TTL configuration requirement

### Tests: 15 new tests (84 total affected)
- `TranscriptEntry`: full 13-param constructor, `hasEnvelopeData()` (4 edge cases), signature-only constructor

### Docs updated
- `docs/architecture.md`: added Cryptographic Agent Identity section
- `planning/manager-ui-handoff.md`: removed `signMcpInvocations`, `forkingEnabled`, `maxForksPerConversation`, updated Security section to show active signing flags

---

## 🔐 Cryptographic Agent Identity — End-to-End Hardening (2026-05-15)

**Repo:** EDDI (`feature/feature-gap-remediation`)
**What changed:** Evolved the partial SignedEnvelope infrastructure into a fully-wired, production-standard cryptographic identity system. Removed dead config fields, added peer verification, and made all security features functional.

### Config Cleanup — Remove Dead Fields
- **Removed:** `signMcpInvocations` from `SecurityConfig` (no MCP signing implementation exists)
- **Removed:** `forkingEnabled` + `maxForksPerConversation` from `SessionManagement` (no forking service exists)
- **Rationale:** "Configs without functionality" creates false confidence. Features are added alongside their implementation, not before.
- **Files:** `AgentConfiguration.java`, `RestAgentStore.java` (removed `validateSessionFlags()`), tests updated

### TranscriptEntry — Full Envelope Storage
- **Added:** `signatureNonce`, `signatureTimestampMs`, `signatureKeyVersion` fields to `TranscriptEntry` record
- **Added:** `hasEnvelopeData()` convenience method for verification checks
- **Backward-compatible:** Two compact constructors for unsigned and signature-only entries
- **Files:** `GroupConversation.java`

### GroupConversationService — End-to-End Crypto Wiring
- **Injected:** `NonceCacheService` for replay protection
- **Signing block:** Now creates full `SignedEnvelope` with nonce, immediately self-verifies, registers nonce, and stores all envelope fields in `TranscriptEntry`
- **Added:** `verifyPriorEntriesIfRequired()` — when receiving agent has `requirePeerVerification=true`, reconstructs envelopes from stored fields and verifies each speaker's signature against their public key
- **Defense-in-depth:** Signing self-verifies at creation time; peer verification at consumption time catches key rotation issues or data corruption
- **Files:** `GroupConversationService.java`

### LlmConfiguration — Configurable maxToolsInContext
- **Added:** `maxToolsInContext` field (default: 20) to `LlmConfiguration.Task` for LAZY tool loading
- **Previously:** Hardcoded `int maxToolsInContext = 20` in `AgentOrchestrator`
- **Files:** `LlmConfiguration.java`, `AgentOrchestrator.java`

### MongoTenantQuotaStore — TOCTOU Documentation
- **Added:** Comment documenting the minor TOCTOU race at window boundaries in multi-instance deployments
- **Files:** `MongoTenantQuotaStore.java`

### Test Fixes
- Updated `SessionManagementTest`, `AgentConfigurationTest`, `RestAgentStoreTest` — removed references to deleted fields
- Updated `GroupConversationServiceTest` — added `NonceCacheService` constructor parameter
- All 69 affected tests pass (0 failures, 0 errors)

---

## 🔧 Feature Gap Remediation — 6 Items Resolved (2026-05-15)

**Repo:** EDDI (`feature/feature-gap-remediation`)
**What changed:** Systematic audit found 8 gaps between documented features and actual implementation. Fixed 6 items (2 required no changes).

### Item 1: Session Forking — Config Removed
- **Problem:** `forkingEnabled=true` accepted silently but no `ConversationForkService` exists
- **Original fix:** Added `validateSessionFlags()` in `RestAgentStore` to reject the flag with a clear error
- **Final state:** Both `forkingEnabled` and `maxForksPerConversation` config fields were fully removed (config-without-functionality anti-pattern). `validateSessionFlags()` was also removed since there are no session flags left to validate.
- **Files:** `AgentConfiguration.java`, `RestAgentStore.java`

### Item 2: Signing Flags — Config Removed
- **Problem:** `signMcpInvocations` flag accepted silently but no MCP signing implementation exists
- **Original fix:** Split `validateSecurityFlags()` to reject `signMcpInvocations` while allowing `signInterAgentMessages` and `requirePeerVerification`
- **Final state:** `signMcpInvocations` field was fully removed from `SecurityConfig`. The validation method was also removed since both remaining flags (`signInterAgentMessages`, `requirePeerVerification`) now have runtime implementations.
- **Files:** `AgentConfiguration.java`, `RestAgentStore.java`

### Item 3: DiscoverToolsTool — Recovered + Wired
- **Problem:** Token-saving lazy tool loading deleted as dead code (commit `05edf602`)
- **Fix:** Recovered `DiscoverToolsTool.java` + test, added `ToolLoadingStrategy` enum (EAGER/LAZY) to `LlmConfiguration.Task`, wired LAZY branch into `AgentOrchestrator.collectEnabledTools()` — when LAZY, only `discover_tools` meta-tool is sent initially, LLM discovers available tools, specs injected mid-loop
- **Files:** `DiscoverToolsTool.java` (recovered), `LlmConfiguration.java`, `AgentOrchestrator.java`

### Item 4: Cryptographic Infrastructure — Recovered + Wired
- **Problem:** `SignedEnvelope`, `JacksonCanonicalizer`, `NonceCacheService` deleted as dead code (commit `4a717fa5`)
- **Fix:** Recovered all 3 files + tests, re-added `signEnvelope()`/`verifyEnvelope()`/`rotateKey()`/`generateKeyPairVersioned()` to `AgentSigningService`, upgraded `GroupConversationService` signing from simple string signing to full `SignedEnvelope` with nonce-based replay protection
- **Files:** `SignedEnvelope.java`, `JacksonCanonicalizer.java`, `NonceCacheService.java` (all recovered), `AgentSigningService.java`, `GroupConversationService.java`

### Item 5: Tenant Quota DB Persistence — Dual-Backend Stores
- **Problem:** `ITenantQuotaStore` only had `InMemoryTenantQuotaStore` — restarts reset all quota counters, no cross-instance synchronization
- **Fix:** Created `MongoTenantQuotaStore` (uses `findAndModify` for atomicity) and `PostgresTenantQuotaStore` (uses `UPDATE...WHERE...RETURNING`), wired into `DataStoreProducers` following existing dual-backend pattern
- **Files:** `MongoTenantQuotaStore.java` (new), `PostgresTenantQuotaStore.java` (new), `DataStoreProducers.java`

### Item 6: NATS Documentation
- NATS code works correctly for what it does (durable ordered processing with retry/dead-letter)
- No code changes needed — documentation accuracy to be addressed separately

### Items 7-8: No Changes Needed
- HIPAA docs accurately describe documentation, not code enforcement
- OpenTelemetry opt-in is standard industry practice


## How to Read This Document

Each entry follows this format:
Expand Down
39 changes: 15 additions & 24 deletions planning/manager-ui-handoff.md
Original file line number Diff line number Diff line change
Expand Up @@ -179,20 +179,18 @@ These fields live on the **agent** object itself.
{
"security": {
"signInterAgentMessages": false,
"signMcpInvocations": false,
"requirePeerVerification": false
}
}
```

| Field | Type | Default | UI Widget |
|-------|------|---------|-----------|
| `signInterAgentMessages` | `boolean` | `false` | Toggle (disabled) |
| `signMcpInvocations` | `boolean` | `false` | Toggle (disabled) |
| `requirePeerVerification` | `boolean` | `false` | Toggle (disabled) |
| `signInterAgentMessages` | `boolean` | `false` | Toggle |
| `requirePeerVerification` | `boolean` | `false` | Toggle |

> [!WARNING]
> These flags are **validated but not yet wired**. The backend rejects `true` with HTTP 400 until the full signing pipeline is activated. The Manager should render them as **disabled toggles** with a tooltip: *"Available in a future release"*.
> [!IMPORTANT]
> Both flags are **fully wired** and operational as of v6.0.2. Enabling either requires a valid Ed25519 keypair on the agent's identity block (the backend validates on save). The Manager should show a validation error if the toggle is enabled but no keypair exists.

---

Expand Down Expand Up @@ -236,9 +234,7 @@ These fields live on the **agent** object itself.
"enabled": false,
"triggerOn": ["before_tool"]
},
"forkingEnabled": false,
"maxCheckpointsPerConversation": 10,
"maxForksPerConversation": 5
"maxCheckpointsPerConversation": 10
}
}
```
Expand All @@ -247,12 +243,10 @@ These fields live on the **agent** object itself.
|-------|------|---------|--------|-----------|
| `autoSnapshot.enabled` | `boolean` | `false` | — | Toggle switch |
| `autoSnapshot.triggerOn` | `string[]` | `[]` | `before_tool`, `before_action` | Multi-select checkboxes |
| `forkingEnabled` | `boolean` | `false` | — | Toggle (disabled — future feature) |
| `maxCheckpointsPerConversation` | `int` | `10` | 1–100 | Number input |
| `maxForksPerConversation` | `int` | `5` | 1–50 | Number input (disabled — future feature) |

**UX Notes:**
- `forkingEnabled` and `maxForksPerConversation` should be rendered as **disabled** with tooltip *"Available in a future release"*
- Session forking (`forkingEnabled`, `maxForksPerConversation`) has been **removed** from the config — it will be re-added when the forking service is implemented
- When `autoSnapshot.enabled` is `false`, collapse sub-fields

---
Expand Down Expand Up @@ -369,11 +363,11 @@ Two new condition types are available in the behavior rule editor.
│ Identity: did:eddi:agent-1 [Generate Keypair] │
│ Public Key: MCowBQYDK2Vw... (read-only) │
│ │
│ ┌ Signing Flags (not yet wired) ───────────────────
│ │ Sign inter-agent messages (disabled) ││
│ │ ○ Sign MCP invocations (disabled)
│ │ ○ Require peer verification (disabled)
│ └──────────────────────────────────────────────────┘
│ ┌ Signing Flags ─────────────────────────────────┐
│ │ Sign inter-agent messages [ON/OFF]
│ │ ✓ Require peer verification [ON/OFF]
│ │ ⚠ Requires keypair to enable
│ └────────────────────────────────────────────────
│ │
│ ── Memory Tab ──────────────────────────────────────│
│ Strict Write Discipline: [OFF] │
Expand All @@ -383,7 +377,7 @@ Two new condition types are available in the behavior rule editor.
│ Auto Snapshot: [OFF] │
│ Trigger On: ☑ before_tool ☐ before_action │
│ Max Checkpoints: [10] │
Forking: (disabled — future release)
└─────────────────────────────────────────────────────┘
```

Expand Down Expand Up @@ -442,9 +436,8 @@ interface AgentConfiguration {
};

security?: {
signInterAgentMessages: boolean; // default: false, NOT YET WIRED
signMcpInvocations: boolean; // default: false, NOT YET WIRED
requirePeerVerification: boolean; // default: false, NOT YET WIRED
signInterAgentMessages: boolean; // default: false, requires keypair
requirePeerVerification: boolean; // default: false, requires keypair
};

memoryPolicy?: {
Expand All @@ -459,9 +452,7 @@ interface AgentConfiguration {
enabled: boolean; // default: false
triggerOn: string[]; // 'before_tool', 'before_action'
};
forkingEnabled: boolean; // default: false, NOT YET WIRED
maxCheckpointsPerConversation: number; // default: 10
maxForksPerConversation: number; // default: 5, NOT YET WIRED
};
}
```
Expand Down Expand Up @@ -508,4 +499,4 @@ interface CapabilityMatchConfigs {
| 🟢 P2 | Session Management | Small | Toggle + checkboxes, but forking is deferred |
| 🟢 P2 | Behavior Conditions | Medium | Extends existing condition editor with 2 new types |
| ⚪ P3 | Cryptographic Identity | Small | Mostly read-only display + one button |
| ⚪ P3 | Security Flags | Trivial | 3 disabled toggles with "coming soon" tooltip |
| ⚪ P3 | Security Flags | Small | 2 toggles (active), require keypair validation |
Loading
Loading