agent: harden runtime reliability #56
…n; enhance episode generation and thread closure handling
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e4091d16f5
Pull request overview
Hardens the agent runtime’s correctness under concurrency by tightening per-thread state isolation, ensuring approval re-entry and cancellation are serialized/terminal-safe, and scoping delegated tool results to the originating WebSocket connection. It also aligns prompts/tool binding behavior with strict tool schemas and preserves user-scoped DB routing in background consolidation/episode generation paths.
Changes:
- Isolate companion conversation history per thread and refresh caches explicitly by `thread_id`.
- Serialize approval resume under the per-thread lock; preserve cancellation as terminal (including task-cancel cleanup).
- Scope delegated tool results by connection, preserve temperature through OpenAI-compatible tool binding, and remove schema-invalid “thinking argument” prompt wording.
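As a rough illustration of the second bullet, serializing approval resume under a per-thread lock could look like the sketch below. `ThreadLocks`, `resume_after_approval`, and `demo` are hypothetical names chosen for this example, not identifiers from the PR; the sketch only assumes that each `thread_id` maps to exactly one `asyncio.Lock`.

```python
import asyncio
from collections import defaultdict


class ThreadLocks:
    """Hypothetical registry: one asyncio.Lock per thread_id."""

    def __init__(self) -> None:
        self._locks: defaultdict[int, asyncio.Lock] = defaultdict(asyncio.Lock)

    def for_thread(self, thread_id: int) -> asyncio.Lock:
        # defaultdict creates the lock on first access and reuses it afterwards.
        return self._locks[thread_id]


async def resume_after_approval(
    locks: ThreadLocks, thread_id: int, label: str, log: list[str]
) -> None:
    # Hold the thread's lock for the whole resume so a second resume
    # (or a new turn) on the same thread cannot interleave with it.
    async with locks.for_thread(thread_id):
        log.append(f"{label}:start")
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        log.append(f"{label}:end")


async def demo() -> list[str]:
    locks = ThreadLocks()
    log: list[str] = []
    # Two resumes race on thread 1; the lock forces them to run one after the other.
    await asyncio.gather(
        resume_after_approval(locks, 1, "a", log),
        resume_after_approval(locks, 1, "b", log),
    )
    return log
```

Resumes on different threads take different locks in this sketch, so they can still run concurrently.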
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| scratchboard/v2-memory-recall-reliability/todo.md | Records Phase 4 runtime reliability work + validation gates. |
| docs/superpowers/plans/2026-05-18-agent-runtime-reliability.md | Adds the implementation plan covering AR-001..AR-006. |
| apps/server/tests/test_p5_transcript_archive.py | Adds coverage ensuring consolidation uses user-scoped soul DB factories. |
| apps/server/tests/test_inner_thought.py | Updates assertions to match prompt wording (no required thinking arg). |
| apps/server/tests/test_client_action_tools.py | Adds regression ensuring duplicate tool_call_ids are scoped by connection. |
| apps/server/tests/test_approval_reentry.py | Adds regression proving approval resume takes the per-thread lock. |
| apps/server/tests/test_agent_system_prompt.py | Asserts prompt no longer references a “thinking argument”. |
| apps/server/tests/test_agent_service.py | Adds regression tests for thread-scoped history + cancellation terminal behavior. |
| apps/server/tests/test_agent_openai_compatible_client.py | Adds regression ensuring bind_tools() preserves temperature. |
| apps/server/tests/test_agent_episodes.py | Adds regression ensuring episode generation uses user-scoped DB factory. |
| apps/server/src/anima_server/services/agent/templates/system_rules.md.j2 | Removes schema-invalid “thinking argument” requirement wording. |
| apps/server/src/anima_server/services/agent/templates/system_prompt.md.j2 | Rewords cognitive loop to avoid requiring schema-absent args. |
| apps/server/src/anima_server/services/agent/service.py | Adds thread lock around approval resume; thread-scoped cache refresh; cancellation cleanup on task cancel. |
| apps/server/src/anima_server/services/agent/persistence.py | Makes finalize_run() refuse to overwrite cancelled/failed runs. |
| apps/server/src/anima_server/services/agent/openai_compatible_client.py | Preserves temperature when binding tools. |
| apps/server/src/anima_server/services/agent/episodes.py | Switches to get_user_session_factory(user_id) for soul DB access. |
| apps/server/src/anima_server/services/agent/eager_consolidation.py | Uses user-scoped soul DB factories per thread in consolidation/sweeps. |
| apps/server/src/anima_server/services/agent/companion.py | Stores conversation windows per thread and returns defensive history copies. |
| apps/server/src/anima_server/services/agent/client_actions.py | Keys pending delegated tool calls by connection identity + call id; validates tool_name on resolve. |
| apps/server/src/anima_server/api/routes/ws.py | Resolves delegated tool results via the authenticated connection context. |
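The `persistence.py` row describes `finalize_run()` refusing to overwrite cancelled or failed runs. A minimal sketch of that guard follows, assuming a simple in-memory `Run` record and string statuses; the real function's signature, storage layer, and status names are not shown in this PR summary, so everything below apart from the name `finalize_run` is hypothetical.

```python
from dataclasses import dataclass

# Assumed status names; a run in one of these states must not be overwritten.
TERMINAL_STATUSES = frozenset({"cancelled", "failed"})


@dataclass
class Run:
    """Hypothetical stand-in for a persisted agent run."""

    id: int
    status: str = "running"


def finalize_run(run: Run, status: str = "completed") -> bool:
    """Mark the run finished unless it already reached a terminal state.

    Returns True if the status was written, False if the write was refused.
    """
    if run.status in TERMINAL_STATUSES:
        # A late finalize (e.g. after a user cancel) must not resurrect the run.
        return False
    run.status = status
    return True
```

Returning a boolean rather than raising lets a caller that lost the race with a cancel skip its follow-up work quietly; that is a design choice of this sketch, not something the PR states.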
Pull request overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
apps/server/src/anima_server/services/agent/client_actions.py:135
`ClientActionRegistry` now scopes pending delegated tool calls by `(id(conn), tool_call_id)`, but `tool_call_id` can be deterministic (e.g., `AgentRuntime` generates `synthetic-{tool}-{step_index}-{i}`), so two concurrent runs on the same connection can still collide and overwrite each other’s pending entry. This can route a `tool_result` to the wrong awaiting turn. Consider namespacing the wire `tool_call_id` with `run_id`/`thread_id` (or generating a server-side UUID and mapping back) so it’s unique per connection even when models reuse IDs.
    if conn is None:
        raise DelegationTimeout(
            f"No connected client has registered action tool {tool_name!r}"
        )
    loop = asyncio.get_running_loop()
    future: asyncio.Future[DelegatedToolResult] = loop.create_future()
    pending_key = (id(conn), tool_call_id)
    self._pending[pending_key] = _PendingActionCall(
        connection=conn,
        future=future,
        tool_name=tool_name,
    )
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Agent-Logs-Url: https://github.com/G9000/animaOS/sessions/34586ef7-68b9-4ffa-bd00-70db910b4620
Co-authored-by: G9000 <11317652+G9000@users.noreply.github.com>
Just as a heads up, I was blocked by some firewall rules while working on your feedback.
Summary
Test Plan