Feature/feature gap remediation#494
Conversation
…s + validation guards Items resolved: 1. Session forking: validateSessionFlags() rejects forkingEnabled=true 2. Signing split: signMcpInvocations blocked, signInterAgentMessages+requirePeerVerification allowed 3. DiscoverToolsTool: recovered from 05edf60, wired LAZY strategy into AgentOrchestrator 4. Crypto infra: recovered SignedEnvelope/JacksonCanonicalizer/NonceCacheService from 4a717fa 5. Quota stores: MongoTenantQuotaStore + PostgresTenantQuotaStore with atomic operations 6. AgentSigningService: restored envelope signing, versioned keys, key rotation GroupConversationService upgraded from simple string signing to full SignedEnvelope with nonce-based replay protection.
…config cleanup - Remove dead config fields: signMcpInvocations, forkingEnabled, maxForksPerConversation - Expand TranscriptEntry with signatureNonce, signatureTimestampMs, signatureKeyVersion - Inject NonceCacheService into GroupConversationService for replay protection - Add verifyPriorEntriesIfRequired() for requirePeerVerification=true agents - Make maxToolsInContext configurable in LlmConfiguration.Task (was hardcoded 20) - Document TOCTOU race in MongoTenantQuotaStore - Fix all 12 affected test files, 69 tests pass
…uards, tests, docs Security fixes: - S1: Self-verify failure now discards signature (was log-and-continue) - S2: Nonce validation failure now discards signature (was warn-and-continue) - S3: Signing block guarded for null agentStore/agentSigningService/nonceCacheService - S4: Identity null check before getKeyValidAt() prevents NPE - S7: NonceCacheService — document cache TTL requirement, remove unused ttlMs Tests (15 new, 84 total across affected classes): - TranscriptEntry: full constructor, hasEnvelopeData (4 variants), signature-only ctor - All 84 tests pass, 0 failures Docs: - architecture.md: added Cryptographic Agent Identity section - manager-ui-handoff.md: removed signMcpInvocations, forkingEnabled, maxForksPerConversation Updated Security section to show active signing flags with keypair prerequisite
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
📝 WalkthroughWalkthroughImplements inter-agent Ed25519 envelope signing with canonical JSON, versioned keys and rotation, nonce replay protection, self- and peer-verification, updated security flags/UI docs, lazy tool discovery, and dual Mongo/Postgres tenant quota stores. ChangesCryptographic Agent Identity and Infrastructure Hardening
Sequence DiagramsequenceDiagram
participant Sender as GroupConversationService (signing)
participant Signer as AgentSigningService
participant Cache as NonceCacheService
participant Store as TranscriptStore
participant Verifier as GroupConversationService (verifier)
Sender->>Signer: signEnvelope(canonicalForm, keyVersion)
Signer-->>Sender: SignedEnvelope(signature, nonce, timestamp, keyVersion)
Sender->>Cache: validate(nonce, timestamp)
Cache-->>Sender: NonceValidation result
Sender->>Store: persist TranscriptEntry with envelope fields
Verifier->>Store: read new entries since cursor
Verifier->>Signer: verifyEnvelope(reconstructedEnvelope, publicKey)
Signer-->>Verifier: verificationResult (true/false)
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly Related PRs
Suggested Reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This PR claims to close a number of feature gaps and add a production-grade Ed25519 cryptographic identity stack for agents. It introduces full signed envelopes with nonce-based replay protection, fail-safe signing, peer verification of prior transcript entries, a lazy discover_tools meta-tool, dual-backend (Mongo + Postgres) tenant quota stores, and removes dead config fields (signMcpInvocations, forkingEnabled, maxForksPerConversation). It also expands test coverage and updates architecture/manager-UI documentation accordingly.
Changes:
- Adds
SignedEnvelope,JacksonCanonicalizer,NonceCacheServiceand extendsAgentSigningServicewith versioned-key signing,signEnvelope/verifyEnvelope/rotateKey; wires full envelopes + peer verification intoGroupConversationServiceand extendsTranscriptEntrywith envelope fields. - Adds
DiscoverToolsTooland aToolLoadingStrategy(EAGER/LAZY) plusmaxToolsInContexttoLlmConfiguration, wired throughAgentOrchestrator. - Adds Mongo + Postgres
ITenantQuotaStoreimplementations, removes dead config fields, and updates docs/tests.
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| src/main/java/ai/labs/eddi/configs/agents/AgentSigningService.java | Adds versioned key generation, envelope signing/verification, rotation. |
| src/main/java/ai/labs/eddi/configs/agents/crypto/SignedEnvelope.java | New immutable signed-envelope record with canonical-form helper. |
| src/main/java/ai/labs/eddi/configs/agents/crypto/JacksonCanonicalizer.java | New RFC 8785-style canonicalizer used for signing input. |
| src/main/java/ai/labs/eddi/configs/agents/crypto/NonceCacheService.java | Nonce-based replay protection with freshness/skew/replay checks. |
| src/main/java/ai/labs/eddi/configs/agents/model/AgentConfiguration.java | Removes signMcpInvocations, forkingEnabled, maxForksPerConversation. |
| src/main/java/ai/labs/eddi/configs/agents/rest/RestAgentStore.java | Updates validation messages; drops signMcpInvocations from check. |
| src/main/java/ai/labs/eddi/configs/groups/model/GroupConversation.java | Extends TranscriptEntry with envelope fields + helper constructors. |
| src/main/java/ai/labs/eddi/engine/internal/GroupConversationService.java | End-to-end envelope signing, self-verify, replay registration, peer verification. |
| src/main/java/ai/labs/eddi/datastore/DataStoreProducers.java | Produces selected ITenantQuotaStore based on backend type. |
| src/main/java/ai/labs/eddi/engine/tenancy/MongoTenantQuotaStore.java | New Mongo-backed quota store with atomic counters. |
| src/main/java/ai/labs/eddi/engine/tenancy/PostgresTenantQuotaStore.java | New Postgres-backed quota store using ON CONFLICT / RETURNING. |
| src/main/java/ai/labs/eddi/modules/llm/model/LlmConfiguration.java | Adds ToolLoadingStrategy enum and maxToolsInContext config. |
| src/main/java/ai/labs/eddi/modules/llm/impl/AgentOrchestrator.java | LAZY tool-loading branch using DiscoverToolsTool. |
| src/main/java/ai/labs/eddi/modules/llm/tools/impl/DiscoverToolsTool.java | New meta-tool for lazy/dynamic tool discovery. |
| docs/architecture.md, docs/changelog.md, planning/manager-ui-handoff.md | Updates docs to reflect new crypto identity model and removed fields. |
| src/test/java/.../*Test.java | Adds/updates tests for envelope, canonicalizer, nonce cache, discover-tools, agent config, session management, group conversation service. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Actionable comments posted: 16
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/main/java/ai/labs/eddi/configs/agents/rest/RestAgentStore.java (1)
317-327:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftDon’t let
duplicateAgent()clone a security-enabled identity without a private key.This gate only checks for public key material on
identity, but the duplicate flow copies config only; it does not provision the vault-backed private key. A cloned agent can therefore pass validation here and still fail every latersignInterAgentMessagescall. Either clear copied crypto state during duplication or generate a fresh keypair before allowing the clone.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/main/java/ai/labs/eddi/configs/agents/rest/RestAgentStore.java` around lines 317 - 327, The duplicateAgent() flow must not copy vault-backed private key state into the new agent; when cloning, detect the source ConfigIdentity via config.getIdentity() and either clear its crypto fields (remove publicKey, clear getKeys()/rotated keys and any vault references) on the cloned config or generate and provision a fresh keypair into the duplicate before enabling signInterAgentMessages or requirePeerVerification; update duplicateAgent() to sanitize the cloned identity (or call the key-generation provisioning path) so the subsequent validation around signInterAgentMessages and requirePeerVerification cannot pass with only public-key material copied.
🧹 Nitpick comments (1)
src/main/java/ai/labs/eddi/modules/llm/impl/AgentOrchestrator.java (1)
406-412: ⚡ Quick winAdd conversation context to the new tool-loading logs.
The new INFO logs are missing conversation context. Include at least conversation/agent identifiers (sanitized) in these messages.
💡 Suggested logging update
- LOGGER.infof("LAZY tool loading: presenting discover_tools meta-tool (%d tools available)", allSpecs.size()); + LOGGER.infof("LAZY tool loading: conversation=%s agent=%s presenting discover_tools meta-tool (%d tools available)", + sanitize(memory.getConversationId()), sanitize(memory.getAgentId()), allSpecs.size()); @@ - LOGGER.info("Enabled " + allTools.size() + " built-in tools for agent"); + LOGGER.infof("Enabled %d built-in tools for agent: conversation=%s agent=%s", + allTools.size(), sanitize(memory.getConversationId()), sanitize(memory.getAgentId()));As per coding guidelines
src/main/java/**/*.java: Use JBoss Logger for logging, not System.out. Include conversation context in log messages. Use appropriate levels: DEBUG (verbose), INFO (important events), ERROR (failures).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/main/java/ai/labs/eddi/modules/llm/impl/AgentOrchestrator.java` around lines 406 - 412, The two logging statements in AgentOrchestrator (the LOGGER.infof(...) for LAZY tool loading referencing allSpecs and the LOGGER.info(...) returning allTools) must include sanitized conversation and agent identifiers; update both to use the JBoss LOGGER with formatted messages (e.g., LOGGER.infof) that inject a sanitizedConversationId and sanitizedAgentId (or the available conversationId/agentId getters) so logs read like "conversation=%s agent=%s ..."; ensure any helper used to sanitize IDs is applied before logging and keep log levels (INFO for important events, DEBUG for verbose) consistent with guidelines.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/architecture.md`:
- Line 985: Update the replay-protection sentence to use proper time-unit
formatting: change "5min" to "5 min" (or "5-minute") in the description that
references NonceCacheService and its freshness/clock-skew checks so it reads
e.g. "freshness (5 min default)" or "freshness (5-minute default)" while leaving
the rest of the `NonceCacheService` replay protection text unchanged.
In `@docs/changelog.md`:
- Around line 79-83: The changelog incorrectly states that
validateSessionFlags() was added in RestAgentStore while another entry says it
was removed; update Item 1 to reflect the final code state by changing the Fix
line to accurately describe the removal (or non-introduction) of
validateSessionFlags() in RestAgentStore — e.g., replace "Added
validateSessionFlags() in RestAgentStore" with "Removed validateSessionFlags()
from RestAgentStore" or "No longer uses validateSessionFlags() in
RestAgentStore" so the entry is consistent with the rest of the changelog.
In `@src/main/java/ai/labs/eddi/configs/agents/AgentSigningService.java`:
- Around line 241-265: Validate the provided version/newVersion before
generating or rotating keys: in generateKeyPairVersioned (and the analogous
rotation method), ensure version > 0 and that the target vault key name
(vaultKeyNameVersioned(agentId, version)) does not already exist in the secret
store; if invalid or already present, throw AgentSigningException. Implement
this by constructing the SecretReference(ref) early, checking the secretProvider
for an existing secret (or a suitable "exists" call / attempt to read) and
failing fast if found, and only proceed to store the new private key and evict
privateKeyCache.remove(cacheKey(tenantId, agentId)) after a successful store.
Ensure error messages include context (agentId, tenantId, version).
In `@src/main/java/ai/labs/eddi/configs/agents/crypto/NonceCacheService.java`:
- Around line 81-83: In NonceCacheService.validate(String nonce, long
timestampMs) add a fast-fail check that rejects null or blank nonce values
before touching the cache: if nonce is null or only whitespace, immediately
return the appropriate failure NonceValidation (e.g., a replay/invalid result)
instead of proceeding to cache lookup; update the method to perform this
validation at the very start (before calling the cache or using timestampMs) and
document the quick-return path so callers get consistent behavior for invalid
nonces.
- Around line 99-109: The replay check in NonceCacheService is vulnerable to
TOCTOU because it calls nonceCache.get(nonce) then nonceCache.put(nonce,
Boolean.TRUE) non-atomically; replace this pattern with an atomic check-and-set
(e.g., use nonceCache.putIfAbsent/computeIfAbsent or the cache's atomic insert
API) to ensure only the first caller records the nonce and subsequent concurrent
callers are treated as replay; update the logic around the nonceCache, nonce and
replayRejections handling so you atomically insert the nonce and, if the insert
indicates a prior value existed, increment replayRejections and return
NonceValidation.REPLAY, otherwise return NonceValidation.VALID.
In `@src/main/java/ai/labs/eddi/configs/agents/crypto/SignedEnvelope.java`:
- Around line 64-73: The SignedEnvelope factory methods currently store the
incoming payload Map by reference, allowing external mutation to alter the
signed content; update each static factory that accepts a Map<String,Object>
(e.g., SignedEnvelope.forSigning and the other payload-taking factories) to
defensively copy the payload and store an immutable copy (e.g., new
LinkedHashMap<>(payload) wrapped with Collections.unmodifiableMap or similar)
before calling the SignedEnvelope constructor so the internal payload cannot be
mutated after creation and signatures remain stable.
In `@src/main/java/ai/labs/eddi/engine/internal/GroupConversationService.java`:
- Around line 634-640: The verification currently always resolves the public key
by timestamp via
agentConfig.getIdentity().getKeyValidAt(signedEnvelope.timestampMs()), which
ignores signedEnvelope.signatureKeyVersion and can pick the wrong key during
rotation; update the logic in GroupConversationService to first check
signedEnvelope.signatureKeyVersion and, if present, call
agentConfig.getIdentity().getKeyForVersion(signatureKeyVersion) to obtain the
exact publicKey, falling back to getKeyValidAt(timestampMs) only when
signatureKeyVersion is null (apply the same change for the other occurrence
around the 1130-1143 block), then pass that resolved publicKey into
agentSigningService.verifyEnvelope(signedEnvelope, publicKey).
- Around line 1075-1165: verifyPriorEntriesIfRequired currently only logs
unsigned/invalid entries but still allows the full GroupConversation gc to be
used; change it to enforce requirePeerVerification by either (A)
removing/quarantining unverified entries from gc.getTranscript() before
returning or (B) throwing a specific exception (e.g., SecurityException or a new
PeerVerificationException) when unsigned>0 or failed>0 so the caller aborts the
turn/forwarding; update the method verifyPriorEntriesIfRequired in
GroupConversationService to implement one of these behaviors and ensure callers
that forward groupTranscript handle the thrown exception or the modified gc
accordingly (use the local variables verified/failed/unsigned and the gc
parameter and keep the existing logging).
In `@src/main/java/ai/labs/eddi/engine/tenancy/MongoTenantQuotaStore.java`:
- Around line 116-141: In MongoTenantQuotaStore, the findOneAndUpdate calls that
use .upsert(true) cause inserts instead of returning null, breaking the
stale-window fallback; change those findOneAndUpdate invocations on the usage
collection (the ones assigning to result and the other checks that expect null)
to not upsert (remove or set upsert(false)) so they will return null when no
current slot exists, preserving the subsequent fallback reset logic that updates
'existing' and returns QuotaCheckResult.OK; keep upsert only where an explicit
insert-on-miss is intended (if any) and update the three affected call sites
that currently call new
FindOneAndUpdateOptions().upsert(true).returnDocument(...) accordingly.
In `@src/main/java/ai/labs/eddi/engine/tenancy/PostgresTenantQuotaStore.java`:
- Around line 156-179: In PostgresTenantQuotaStore the INSERT ... ON CONFLICT
fallback for the API-call counter re-opens the quota because it doesn't mirror
the minute-start conditional used in the explicit UPDATE; change the ON CONFLICT
DO UPDATE clause in the quoted SQL (the block that prepares the statement and
uses minuteStart and costMonth) to apply the same CASE/minute_start logic for
api_calls_this_minute and minute_start as is used for
conversations_today/day_start so that on conflict you reset
api_calls_this_minute to 1 when the stored minute_start is older than the
provided minuteStart, otherwise leave/increment it consistently; update the
prepared-statement parameter ordering to pass minuteStart where the new CASE
expressions expect it so the RETURNING value correctly reflects the
capped/incremented API-call count.
- Around line 277-280: The catch block for SQLException currently logs the error
and then returns QuotaCheckResult.OK, which lets callers continue spending on
storage failure; update the exception handler in the method (the one using
LOGGER.errorf and returning QuotaCheckResult.OK) to log the full exception and
return a non-OK result (e.g., QuotaCheckResult.ERROR or a DB-specific enum value
from QuotaCheckResult) or rethrow a checked/runtime exception so callers do not
"fail open" — ensure you reference LOGGER, the caught SQLException, and
QuotaCheckResult in your change so the method stops returning
QuotaCheckResult.OK on write failure.
In `@src/main/java/ai/labs/eddi/modules/llm/impl/AgentOrchestrator.java`:
- Around line 393-408: The LAZY branch in AgentOrchestrator currently returns
only a DiscoverToolsTool (built via ToolSpecifications.toolSpecificationsFrom)
which prevents built-in tool executors from being available later in
executeWithTools; modify the LAZY handling so it still adds the
DiscoverToolsTool to tools but also preserves/injects the actual built-in tool
instances (or their executors) into the returned tools list or into the context
used by executeWithTools (ensure the same collection used by the EDDI runtime
contains both the DiscoverToolsTool and the original built-in tool objects),
keeping the existing use of DiscoverToolsTool and maxToolsInContext and ensuring
executeWithTools can locate and invoke those built-in tools by name.
In `@src/main/java/ai/labs/eddi/modules/llm/model/LlmConfiguration.java`:
- Around line 136-157: Replace the mutable tool-loading fields in
LlmConfiguration with an immutable record: create a record
ToolLoadingConfig(ToolLoadingStrategy toolLoadingStrategy, int
maxToolsInContext) with defaults (EAGER, 20) and add a final field
toolLoadingConfig of that record type to LlmConfiguration; remove the mutable
fields toolLoadingStrategy and maxToolsInContext and any setters, update any
getters/usages to read from toolLoadingConfig.toolLoadingStrategy() and
toolLoadingConfig.maxToolsInContext(), and apply the same replacement for the
second occurrence of these fields in the file so all new tool-loading config is
stored immutably.
In `@src/main/java/ai/labs/eddi/modules/llm/tools/impl/DiscoverToolsTool.java`:
- Around line 71-83: The discoverTools method in DiscoverToolsTool builds JSON
by string concatenation (see StringBuilder sb, matches loop, ToolSpecification
spec and the earlier early-return) which fails to escape fields like name and
other JSON-sensitive characters; replace the manual assembly with a proper JSON
serialization: create a simple DTO (e.g., ToolsResponse with fields tools (list
of ToolDTO{name, description}), message, totalAvailable) or build a Map/List
structure and call your JSON serializer (Jackson's ObjectMapper or your
project's equivalent) to serialize the response instead of using sb.append;
ensure ToolSpecification.name() and description() are assigned to the DTO and
let the serializer handle escaping and formatting.
In
`@src/test/java/ai/labs/eddi/engine/internal/GroupConversationServiceTest.java`:
- Around line 68-69: The tests currently instantiate GroupConversationService
with null AgentSigningService, IAgentStore, and NonceCacheService which disables
signing/verification paths; update GroupConversationServiceTest to add two
focused setups: (1) a "signed happy path" where you create mocks for
AgentSigningService, IAgentStore, and NonceCacheService (e.g., using Mockito),
inject them into the GroupConversationService constructor call, stub
AgentSigningService.sign(...) to return a valid signature and
IAgentStore.lookup/resolve to return the expected agent identity and public key,
and stub NonceCacheService to accept nonces so requirePeerVerification logic
runs; and (2) a "verification failure" setup where
AgentSigningService.verify(...) (or the equivalent verify method used by
GroupConversationService) is stubbed to fail or IAgentStore returns a
missing/incorrect key and NonceCacheService flags replay, then assert the
service rejects the message. Ensure you reference the GroupConversationService
constructor call in the test and the methods AgentSigningService.sign/verify,
IAgentStore.lookup (or resolve), and NonceCacheService.validate/record when
wiring the mocks.
---
Outside diff comments:
In `@src/main/java/ai/labs/eddi/configs/agents/rest/RestAgentStore.java`:
- Around line 317-327: The duplicateAgent() flow must not copy vault-backed
private key state into the new agent; when cloning, detect the source
ConfigIdentity via config.getIdentity() and either clear its crypto fields
(remove publicKey, clear getKeys()/rotated keys and any vault references) on the
cloned config or generate and provision a fresh keypair into the duplicate
before enabling signInterAgentMessages or requirePeerVerification; update
duplicateAgent() to sanitize the cloned identity (or call the key-generation
provisioning path) so the subsequent validation around signInterAgentMessages
and requirePeerVerification cannot pass with only public-key material copied.
---
Nitpick comments:
In `@src/main/java/ai/labs/eddi/modules/llm/impl/AgentOrchestrator.java`:
- Around line 406-412: The two logging statements in AgentOrchestrator (the
LOGGER.infof(...) for LAZY tool loading referencing allSpecs and the
LOGGER.info(...) returning allTools) must include sanitized conversation and
agent identifiers; update both to use the JBoss LOGGER with formatted messages
(e.g., LOGGER.infof) that inject a sanitizedConversationId and sanitizedAgentId
(or the available conversationId/agentId getters) so logs read like
"conversation=%s agent=%s ..."; ensure any helper used to sanitize IDs is
applied before logging and keep log levels (INFO for important events, DEBUG for
verbose) consistent with guidelines.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: cb07f773-80a1-4901-90f3-afe42790d329
📒 Files selected for processing (26)
docs/architecture.mddocs/changelog.mdplanning/manager-ui-handoff.mdsrc/main/java/ai/labs/eddi/configs/agents/AgentSigningService.javasrc/main/java/ai/labs/eddi/configs/agents/crypto/JacksonCanonicalizer.javasrc/main/java/ai/labs/eddi/configs/agents/crypto/NonceCacheService.javasrc/main/java/ai/labs/eddi/configs/agents/crypto/SignedEnvelope.javasrc/main/java/ai/labs/eddi/configs/agents/model/AgentConfiguration.javasrc/main/java/ai/labs/eddi/configs/agents/rest/RestAgentStore.javasrc/main/java/ai/labs/eddi/configs/groups/model/GroupConversation.javasrc/main/java/ai/labs/eddi/datastore/DataStoreProducers.javasrc/main/java/ai/labs/eddi/engine/internal/GroupConversationService.javasrc/main/java/ai/labs/eddi/engine/tenancy/MongoTenantQuotaStore.javasrc/main/java/ai/labs/eddi/engine/tenancy/PostgresTenantQuotaStore.javasrc/main/java/ai/labs/eddi/modules/llm/impl/AgentOrchestrator.javasrc/main/java/ai/labs/eddi/modules/llm/model/LlmConfiguration.javasrc/main/java/ai/labs/eddi/modules/llm/tools/impl/DiscoverToolsTool.javasrc/test/java/ai/labs/eddi/configs/agents/crypto/JacksonCanonicalizerTest.javasrc/test/java/ai/labs/eddi/configs/agents/crypto/NonceCacheServiceTest.javasrc/test/java/ai/labs/eddi/configs/agents/crypto/SignedEnvelopeTest.javasrc/test/java/ai/labs/eddi/configs/agents/model/AgentConfigurationTest.javasrc/test/java/ai/labs/eddi/configs/agents/model/SessionManagementTest.javasrc/test/java/ai/labs/eddi/configs/agents/rest/RestAgentStoreTest.javasrc/test/java/ai/labs/eddi/configs/groups/model/GroupConversationTest.javasrc/test/java/ai/labs/eddi/engine/internal/GroupConversationServiceTest.javasrc/test/java/ai/labs/eddi/modules/llm/tools/impl/DiscoverToolsToolTest.java
💤 Files with no reviewable changes (1)
- src/test/java/ai/labs/eddi/configs/agents/model/AgentConfigurationTest.java
- NonceCacheService: atomic putIfAbsent replay detection + null guard - AgentSigningService: key version validation + signEnvelope cache - PostgresTenantQuotaStore: fail-closed cost accounting + ensureSchema - GroupConversationService: incremental peer verification (O(N) amortized) + public key cache + sanitizeForLog + FQN -> proper imports - AgentOrchestrator: fix LAZY tool activation (register all executors, present discover_tools initially, activate after discovery) - DiscoverToolsTool: Jackson ObjectMapper for JSON serialization - JacksonCanonicalizer: rename canonicalize(Object) -> canonicalizeObject - MongoTenantQuotaStore: sanitizeForLog for tenantId - Changelog: fix inaccurate Item 1/2 descriptions, add PR entry - Tests: stub putIfAbsent in NonceCacheServiceTest, update JSON format assertions in DiscoverToolsToolTest
There was a problem hiding this comment.
Actionable comments posted: 7
♻️ Duplicate comments (1)
src/main/java/ai/labs/eddi/engine/tenancy/PostgresTenantQuotaStore.java (1)
214-236:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winThe fallback UPSERT still re-opens exhausted quotas.
When the row is already in the current day/minute and the counter equals
limit, the firstUPDATE ... < ?returns no row, then the fallbackINSERT ... ON CONFLICTreturns the unchanged existing count. Because the code accepts<= limit, every request after the cap is reached is admitted without incrementing either counter.Suggested fix
- ON CONFLICT (tenant_id) DO UPDATE SET - conversations_today = CASE WHEN tenant_usage.day_start < ? THEN 1 ELSE tenant_usage.conversations_today END, - day_start = CASE WHEN tenant_usage.day_start < ? THEN ? ELSE tenant_usage.day_start END + ON CONFLICT (tenant_id) DO UPDATE SET + conversations_today = 1, + day_start = EXCLUDED.day_start + WHERE tenant_usage.day_start < EXCLUDED.day_start RETURNING conversations_today """)) { @@ - if (rs.next() && rs.getInt(1) <= limit) { + if (rs.next()) { return QuotaCheckResult.OK; }Apply the same pattern to the API-call UPSERT using
minute_start.Also applies to: 272-294
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/main/java/ai/labs/eddi/engine/tenancy/PostgresTenantQuotaStore.java` around lines 214 - 236, The UPSERT in PostgresTenantQuotaStore currently leaves api_calls_this_minute unchanged when minute_start is current and the counter equals limit, allowing further requests; update the INSERT ... ON CONFLICT block that writes/returns api_calls_this_minute to mirror the conversations_today pattern: make api_calls_this_minute use a CASE that resets to 1 when tenant_usage.minute_start < ? else increments (tenant_usage.api_calls_this_minute + 1), and update minute_start with a matching CASE that sets to ? when stale else preserves tenant_usage.minute_start; adjust PreparedStatement parameter order to supply the minute threshold and new minuteStart value in the same positions used in the CASE expressions so the returned api_calls_this_minute correctly reflects the incremented value for limit comparison in the method of PostgresTenantQuotaStore.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/changelog.md`:
- Line 7: Update the changelog heading "## 🛠️ PR Feedback Remediation —
Production Hardening (2026-05-17)" to use a non-future date or omit the date
entirely; replace "(2026-05-17)" with the actual merge/release date or remove
the parenthetical date so the heading becomes "## 🛠️ PR Feedback Remediation —
Production Hardening". Ensure the edited line in docs/changelog.md reflects this
change.
In `@src/main/java/ai/labs/eddi/configs/agents/AgentSigningService.java`:
- Around line 216-218: The deletion logic currently only removes the legacy
unversioned secret while new code persists versioned keys via
vaultKeyNameVersioned(agentId, version); update the agent deletion path (the
method that removes agent keys and any cache entries) to also remove all
matching versioned secrets for that agent (e.g., call the Vault delete for
VAULT_KEY_PREFIX + agentId + ":v" + N for all persisted versions or use a
prefix/list-delete approach), ensure you remove any cached entries keyed by the
versioned name, and apply the same change where keys are cleaned up in the other
deletion locations referenced (the blocks around where versioned secrets are
created and where deletion currently occurs).
- Around line 258-259: Evict the versioned cache entry as well as the
unversioned one: when invalidating cached private keys in AgentSigningService,
call privateKeyCache.remove with the versioned cache key built from
cacheKey(tenantId, agentId) plus the key version suffix (e.g., append ";v=" +
keyVersion) in addition to removing cacheKey(tenantId, agentId); update the
invalidation site(s) near the existing privateKeyCache.remove(cacheKey(tenantId,
agentId)) and the code paths that populate the cache in the private key loading
logic (where entries are stored under cacheKey(...)+ ";v=" + keyVersion) so
regenerated version N is evicted and the new private key is used immediately.
In `@src/main/java/ai/labs/eddi/configs/agents/crypto/JacksonCanonicalizer.java`:
- Around line 34-35: The ObjectMapper MAPPER in JacksonCanonicalizer is not
configured to reject duplicate JSON object keys, so readTree(...) will silently
accept duplicates; update the MAPPER initialization to enable
JsonParser.Feature.STRICT_DUPLICATE_DETECTION (e.g., via
MAPPER.getFactory().enable(JsonParser.Feature.STRICT_DUPLICATE_DETECTION) or
ObjectMapper.configure/enable) so any duplicate keys cause a parsing error and
canonicalization fails closed; ensure the change is applied where MAPPER is
declared and used by the readTree(...) calls.
In `@src/main/java/ai/labs/eddi/engine/tenancy/PostgresTenantQuotaStore.java`:
- Around line 349-363: getUsage currently returns the stored counters via
toSnapshot(tenantId, rs) even when the stored window timestamps are stale; this
causes callers to see expired minute/day/month values until the next write.
Update getUsage (and the similar block around toSnapshot at lines 412-422) to
normalize expired windows the same way getMonthlyCost(...) does: after
constructing the UsageSnapshot from toSnapshot, check
day_start/minute_start/cost_month against current time and zero the
corresponding counters and reset the start timestamps when they are stale, then
return the normalized UsageSnapshot; reference the getUsage method,
toSnapshot(String, ResultSet), and getMonthlyCost(...) for the normalization
logic and ensure UsageSnapshot.empty(tenantId) remains the fallback on SQL
exceptions.
- Around line 170-178: The deletion of tenant_quotas and tenant_usage in
PostgresTenantQuotaStore currently runs as two separate statements so the first
can commit before the second runs; change the logic that opens Connection conn
to run both PreparedStatement executions within a single database transaction by
setting conn.setAutoCommit(false) before executing both DELETEs, calling
conn.commit() after both succeed, and performing conn.rollback() in the
exception path (and restoring auto-commit in finally) so both deletes succeed or
neither does; reference the existing Connection conn, PreparedStatement usage
for the "DELETE FROM tenant_quotas WHERE tenant_id = ?" and "DELETE FROM
tenant_usage WHERE tenant_id = ?" statements and the tenantId parameter.
---
Duplicate comments:
In `@src/main/java/ai/labs/eddi/engine/tenancy/PostgresTenantQuotaStore.java`:
- Around line 214-236: The UPSERT in PostgresTenantQuotaStore currently leaves
api_calls_this_minute unchanged when minute_start is current and the counter
equals limit, allowing further requests; update the INSERT ... ON CONFLICT block
that writes/returns api_calls_this_minute to mirror the conversations_today
pattern: make api_calls_this_minute use a CASE that resets to 1 when
tenant_usage.minute_start < ? else increments
(tenant_usage.api_calls_this_minute + 1), and update minute_start with a
matching CASE that sets to ? when stale else preserves
tenant_usage.minute_start; adjust PreparedStatement parameter order to supply
the minute threshold and new minuteStart value in the same positions used in the
CASE expressions so the returned api_calls_this_minute correctly reflects the
incremented value for limit comparison in the method of
PostgresTenantQuotaStore.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 79a47200-9688-4d1f-8367-74cd8efef413
📒 Files selected for processing (12)
docs/changelog.mdsrc/main/java/ai/labs/eddi/configs/agents/AgentSigningService.javasrc/main/java/ai/labs/eddi/configs/agents/crypto/JacksonCanonicalizer.javasrc/main/java/ai/labs/eddi/configs/agents/crypto/NonceCacheService.javasrc/main/java/ai/labs/eddi/configs/agents/crypto/SignedEnvelope.javasrc/main/java/ai/labs/eddi/engine/internal/GroupConversationService.javasrc/main/java/ai/labs/eddi/engine/tenancy/MongoTenantQuotaStore.javasrc/main/java/ai/labs/eddi/engine/tenancy/PostgresTenantQuotaStore.javasrc/main/java/ai/labs/eddi/modules/llm/impl/AgentOrchestrator.javasrc/main/java/ai/labs/eddi/modules/llm/tools/impl/DiscoverToolsTool.javasrc/test/java/ai/labs/eddi/configs/agents/crypto/NonceCacheServiceTest.javasrc/test/java/ai/labs/eddi/modules/llm/tools/impl/DiscoverToolsToolTest.java
🚧 Files skipped from review as they are similar to previous changes (7)
- src/test/java/ai/labs/eddi/modules/llm/tools/impl/DiscoverToolsToolTest.java
- src/test/java/ai/labs/eddi/configs/agents/crypto/NonceCacheServiceTest.java
- src/main/java/ai/labs/eddi/configs/agents/crypto/NonceCacheService.java
- src/main/java/ai/labs/eddi/engine/tenancy/MongoTenantQuotaStore.java
- src/main/java/ai/labs/eddi/engine/internal/GroupConversationService.java
- src/main/java/ai/labs/eddi/modules/llm/tools/impl/DiscoverToolsTool.java
- src/main/java/ai/labs/eddi/configs/agents/crypto/SignedEnvelope.java
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 26 out of 26 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
src/main/java/ai/labs/eddi/engine/tenancy/MongoTenantQuotaStore.java:208
- In
tryAddCost, when the existing document has a stalecostMonth(different from the current month), the firstfindOneAndUpdatefilter (tenantId == X AND costMonth == monthKey) does not match. Becauseupsert=true, Mongo will insert a new document for the sametenantIdrather than updating the existing stale-month row. The fallback "reset" branch below then runs on the already-newly-created duplicate document. Without a unique index ontenantId(see related comment), this both duplicates rows and double-charges the cost. The filter should match ontenantIdonly, with the month transition handled inside the update (e.g., using$cond/aggregation pipeline update or by reading-then-writing).
Document result = usage.findOneAndUpdate(
Filters.and(
Filters.eq("tenantId", tenantId),
Filters.eq("costMonth", monthKey)),
Updates.combine(
Updates.inc("monthlyCostUsd", cost),
Updates.setOnInsert("tenantId", tenantId),
Updates.setOnInsert("costMonth", monthKey)),
new FindOneAndUpdateOptions().upsert(true).returnDocument(ReturnDocument.AFTER));
if (result == null) {
// Stale month — reset
usage.findOneAndUpdate(
Filters.eq("tenantId", tenantId),
Updates.combine(
Updates.set("monthlyCostUsd", cost),
Updates.set("costMonth", monthKey)),
new FindOneAndUpdateOptions().upsert(true));
return QuotaCheckResult.OK;
}
…ed sanitization - DiscoverToolsTool: add @Vetoed to prevent Quarkus CDI auto-discovery (fixes build failure: unsatisfied List<ToolSpecification> + int deps) - JacksonCanonicalizer: StreamReadFeature.STRICT_DUPLICATE_DETECTION (prevents collision attacks, fixes Jackson 2.21 compat) - LogSanitizer: centralize sanitizeForLog into LogSanitizer.sanitize(), add Unicode line separator (U+2028/U+2029) handling - GroupConversationService, MongoTenantQuotaStore, PostgresTenantQuotaStore: replace per-file sanitizeForLog with LogSanitizer.sanitize() - PostgresTenantQuotaStore: wrap deleteQuota in transaction, sanitize e.getMessage() in all log calls - AgentSigningService: version-specific cache eviction, versioned key deletion in deleteKeyPair, MAX_KEY_VERSION_SCAN=100 - MongoTenantQuotaStore: unique index on tenantId for both collections - AgentOrchestrator: fix -1 external count log message - Tests: duplicate key detection test, Unicode separator tests - Changelog: update PR entry with CI follow-up fixes
Timezone abbreviations (EST/EDT/JST/UTC) vary by JVM and CLDR data version — Java 25 may produce 'ET' instead of 'EST'/'EDT'. Replace fragile abbreviation checks with regex that validates the formatted datetime structure (yyyy-MM-dd HH:mm:ss <tz>) regardless of how the timezone suffix renders.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/main/java/ai/labs/eddi/configs/agents/AgentSigningService.java`:
- Around line 212-221: The current loop over MAX_KEY_VERSION_SCAN breaks on any
Exception which can prematurely stop deletion; change the control flow so only a
specific "not found" exception causes break while other exceptions are handled
without breaking the loop: invoke SecretReference/vaultKeyNameVersioned and call
secretProvider.delete as before, but replace the single catch(Exception) with a
catch for the secret-store's NotFound/ResourceNotFound exception (or detect
not-found via exception type/message) and break there, and add a separate
catch(Exception e) that logs the transient error (including e) and continues
scanning; also ensure privateKeyCache.remove(cacheKey(tenantId, agentId) + ";v="
+ v) runs only after a successful delete.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9de481e4-25e1-4a94-881b-536b142213bd
📒 Files selected for processing (12)
docs/changelog.mdsrc/main/java/ai/labs/eddi/configs/agents/AgentSigningService.javasrc/main/java/ai/labs/eddi/configs/agents/crypto/JacksonCanonicalizer.javasrc/main/java/ai/labs/eddi/engine/internal/GroupConversationService.javasrc/main/java/ai/labs/eddi/engine/tenancy/MongoTenantQuotaStore.javasrc/main/java/ai/labs/eddi/engine/tenancy/PostgresTenantQuotaStore.javasrc/main/java/ai/labs/eddi/modules/llm/impl/AgentOrchestrator.javasrc/main/java/ai/labs/eddi/modules/llm/tools/impl/DiscoverToolsTool.javasrc/main/java/ai/labs/eddi/utils/LogSanitizer.javasrc/test/java/ai/labs/eddi/configs/agents/crypto/JacksonCanonicalizerTest.javasrc/test/java/ai/labs/eddi/modules/llm/tools/impl/DateTimeToolTest.javasrc/test/java/ai/labs/eddi/utils/LogSanitizerTest.java
✅ Files skipped from review due to trivial changes (1)
- docs/changelog.md
🚧 Files skipped from review as they are similar to previous changes (7)
- src/test/java/ai/labs/eddi/configs/agents/crypto/JacksonCanonicalizerTest.java
- src/main/java/ai/labs/eddi/engine/internal/GroupConversationService.java
- src/main/java/ai/labs/eddi/modules/llm/tools/impl/DiscoverToolsTool.java
- src/main/java/ai/labs/eddi/configs/agents/crypto/JacksonCanonicalizer.java
- src/main/java/ai/labs/eddi/engine/tenancy/MongoTenantQuotaStore.java
- src/main/java/ai/labs/eddi/modules/llm/impl/AgentOrchestrator.java
- src/main/java/ai/labs/eddi/engine/tenancy/PostgresTenantQuotaStore.java
| for (int v = 1; v <= MAX_KEY_VERSION_SCAN; v++) { | ||
| try { | ||
| SecretReference vRef = new SecretReference(tenantId, vaultKeyNameVersioned(agentId, v)); | ||
| secretProvider.delete(vRef); | ||
| privateKeyCache.remove(cacheKey(tenantId, agentId) + ";v=" + v); | ||
| } catch (Exception ignored) { | ||
| // Version doesn't exist — stop scanning | ||
| break; | ||
| } | ||
| } |
There was a problem hiding this comment.
Versioned key deletion stops scanning on any exception, not just "not found".
The loop breaks on the first exception, which could leave orphaned keys if:
- Versions are non-contiguous (e.g., v1 was deleted earlier, but v2-v5 exist)
- A transient error occurs (network blip, timeout)
Consider continuing the scan instead of breaking, or at least only breaking on specific "not found" exceptions.
🔧 Proposed fix to continue scanning
// Delete versioned keys (scan reasonable range)
for (int v = 1; v <= MAX_KEY_VERSION_SCAN; v++) {
try {
SecretReference vRef = new SecretReference(tenantId, vaultKeyNameVersioned(agentId, v));
secretProvider.delete(vRef);
privateKeyCache.remove(cacheKey(tenantId, agentId) + ";v=" + v);
- } catch (Exception ignored) {
- // Version doesn't exist — stop scanning
- break;
+ } catch (ISecretProvider.SecretNotFoundException ignored) {
+ // Version doesn't exist — continue scanning in case of gaps
+ } catch (Exception e) {
+ LOGGER.debugf("Failed to delete versioned key v%d for agent '%s': %s", v, agentId, e.getMessage());
}
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/main/java/ai/labs/eddi/configs/agents/AgentSigningService.java` around
lines 212 - 221, The current loop over MAX_KEY_VERSION_SCAN breaks on any
Exception which can prematurely stop deletion; change the control flow so only a
specific "not found" exception causes break while other exceptions are handled
without breaking the loop: invoke SecretReference/vaultKeyNameVersioned and call
secretProvider.delete as before, but replace the single catch(Exception) with a
catch for the secret-store's NotFound/ResourceNotFound exception (or detect
not-found via exception type/message) and break there, and add a separate
catch(Exception e) that logs the transient error (including e) and continues
scanning; also ensure privateKeyCache.remove(cacheKey(tenantId, agentId) + ";v="
+ v) runs only after a successful delete.
This pull request delivers a comprehensive security and feature hardening update, closing key feature gaps and implementing robust cryptographic identity and signing infrastructure. It resolves multiple previously unimplemented or partially implemented features, especially around agent message signing, peer verification, and session management, while also cleaning up dead configuration fields and improving documentation and test coverage.
The most important changes are:
Cryptographic Identity & Signing Infrastructure:
AgentSigningService.java,SignedEnvelope.java,NonceCacheService.java,GroupConversationService.java,docs/architecture.md) [1] [2] [3] [4]Configuration & UI/UX Cleanup:
signMcpInvocations,forkingEnabled,maxForksPerConversation) and updated validation: only active, implemented flags can be enabled. UI/UX and docs now reflect that message signing and peer verification are fully operational and require a valid keypair. Session forking fields are removed until implemented. (planning/manager-ui-handoff.md,RestAgentStore.java,AgentConfiguration.java) [1] [2] [3] [4] [5] [6] [7] [8]Feature Gap Remediation:
DiscoverToolsTool), cryptographic envelope infrastructure, and dual-backend tenant quota persistence. Validation logic now prevents enabling features that aren’t implemented. (DiscoverToolsTool.java,LlmConfiguration.java,AgentOrchestrator.java,MongoTenantQuotaStore.java,PostgresTenantQuotaStore.java,DataStoreProducers.java)Documentation & Test Improvements:
docs/architecture.md,planning/manager-ui-handoff.md, various test files)Defensive Coding & Null Guards:
AgentSigningService.java,GroupConversationService.java)These changes collectively bring the system’s security and feature set in line with its documentation, ensuring all advertised features are implemented, robust, and clearly communicated to users and developers.
Summary by CodeRabbit
New Features
Removals
Improvements
Documentation