
Agent server audit fixes (+ bundled diary/desktop WIP)#58

Open
G9000 wants to merge 12 commits into main from agent-server-improvements

G9000 (Owner) commented Jun 17, 2026

Summary

The primary work here is a three-phase audit-driven overhaul of the agent server (apps/server/src/anima_server/services/agent/), all tested. It also bundles pre-existing in-progress feature work (daily diary, desktop today/mood/journal, api-client) that was already uncommitted in the tree — per request, everything except personal interview notes is included.

Agent server changes (the audit work)

Reliability

  • Turn-setup / result-persist failures now mark the run failed and evict the orphaned user message instead of leaving a zombie running run that replays as unanswered history.
  • Context-overflow retry no longer re-executes already-run tools (only retries before any tool side effect).
  • End-to-end run cancellation: run row committed early so the cancel endpoint sees in-flight runs (also releases the thread-row lock held across the LLM call); run_started event carries the run id; WebSocket cancel handler implemented; cancel events race-safe.
  • Stream inactivity timeout so a stalled LLM stream can't pin the thread lock for ~10 min; raw exception text masked from clients; previously-silent except blocks now log.
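A stream inactivity timeout is usually a per-chunk deadline rather than a deadline on the whole response, so a long but steadily streaming reply is never cut off. The sketch below assumes an async-iterator LLM stream. `with_inactivity_timeout` and `idle_seconds` are hypothetical names, and the PR does not state the timeout value. It also closes the upstream generator in `finally`, which is one way to avoid a generator leak like the one the self-review mentions.

```python
import asyncio
from collections.abc import AsyncIterator


async def with_inactivity_timeout(
    stream: AsyncIterator[str], idle_seconds: float
) -> AsyncIterator[str]:
    """Yield chunks from `stream`; raise TimeoutError if no chunk arrives
    within `idle_seconds` of the previous one (hypothetical helper)."""
    iterator = stream.__aiter__()
    try:
        while True:
            try:
                # The deadline restarts for every chunk, so only a stall trips it.
                chunk = await asyncio.wait_for(iterator.__anext__(), timeout=idle_seconds)
            except StopAsyncIteration:
                return
            yield chunk
    finally:
        # Finalize the upstream generator on timeout, error, or when this
        # wrapper is closed, instead of leaving it suspended.
        aclose = getattr(iterator, "aclose", None)
        if aclose is not None:
            await aclose()
```

With an idle timeout of, for example, 60 seconds, a stalled provider call would release the thread lock after about a minute of silence, regardless of how long a healthy response takes overall.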

Response quality

  • Prompt budget derived from the model's context window (was a fixed 24k chars).
  • Boundary-aware truncation, so blocks are never cut mid-fact.
  • Recency+heat blend in automatic retrieval.
  • Importance floor in heat scoring, so important memories don't decay into invisibility.
  • Cross-block dedup, plus embedding-free dedup for paraphrased free-form claims.
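Deriving the budget from the context window and truncating on entry boundaries could look roughly like this. The 4-characters-per-token heuristic, the output reserve, the memory share, and both function names are assumptions for illustration, not values taken from the PR.

```python
def derive_prompt_budget(
    context_window_tokens: int,
    reserved_output_tokens: int = 4096,  # assumed headroom for the reply
    chars_per_token: float = 4.0,        # rough heuristic, not a tokenizer
    memory_share: float = 0.25,          # assumed fraction given to memory blocks
) -> int:
    """Character budget for memory blocks, scaled to the model's window."""
    usable = max(context_window_tokens - reserved_output_tokens, 0)
    return int(usable * memory_share * chars_per_token)


def truncate_at_boundary(entries: list[str], budget_chars: int, sep: str = "\n") -> str:
    """Join whole, already-ranked entries until the next one would exceed
    the budget, so no entry is ever cut in the middle."""
    kept: list[str] = []
    used = 0
    for entry in entries:
        cost = len(entry) + (len(sep) if kept else 0)
        if used + cost > budget_chars:
            break  # stop rather than skip ahead, preserving rank order
        kept.append(entry)
        used += cost
    return sep.join(kept)
```

Stopping at the first entry that does not fit keeps the block in rank order; skipping ahead to smaller entries would pack more text but could surface lower-ranked facts ahead of higher-ranked ones. The PR does not say which policy the server uses.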

Performance

  • Soul Writer LLM work deferred off the pre-turn path to cut time to first token (TTFT).
  • Knowledge-graph block reuses the turn's existing query embedding (drops a blocking thread.join).
  • Post-turn compaction moved to a background task.
  • Static identity blocks cached via the companion version counter; volatile/query-ranked blocks still rebuild every turn.
  • Non-blocking mod-tools fetch with a negative cache.
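Caching static identity blocks against a version counter means one integer comparison decides whether every cached block is still valid. A minimal sketch, assuming the companion version is an integer bumped on each identity edit; `StaticBlockCache` and its `get` method are hypothetical names.

```python
from typing import Callable, Optional


class StaticBlockCache:
    """Cache rarely-changing prompt blocks, keyed on a companion version counter."""

    def __init__(self) -> None:
        self._version: Optional[int] = None
        self._blocks: dict[str, str] = {}

    def get(self, version: int, name: str, build: Callable[[], str]) -> str:
        if version != self._version:
            # Any bump of the counter invalidates every cached block at once.
            self._blocks.clear()
            self._version = version
        if name not in self._blocks:
            self._blocks[name] = build()
        return self._blocks[name]
```

Volatile or query-ranked blocks would bypass this cache entirely and be built fresh each turn.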

Refactor / cleanup

  • New llm_json.py (call_llm_for_json/call_llm_for_text) with 6 call sites migrated; deleted dead predict_calibrate.py, streaming_utils.py, and consolidate_pending_ops().
  • Findings, fixes, and a self-review pass documented in docs/audits/2026-06-11-agent-server-audit.md.
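The PR does not show the signature of `call_llm_for_json`, so the sketch below only illustrates what such a shared helper typically centralizes: stripping Markdown code fences, recovering a JSON object from surrounding prose, and falling back to a default on failure. `parse_llm_json` and `call_for_json` are hypothetical names, and the model call is injected as a plain callable.

```python
import json
import re
from typing import Any, Callable

# Leading fence (optionally tagged "json") and trailing fence.
_FENCE_RE = re.compile(r"^`{3}(?:json)?\s*|\s*`{3}$", re.IGNORECASE)


def parse_llm_json(raw: str) -> Any:
    """Parse JSON from an LLM reply, tolerating code fences and chatter."""
    text = _FENCE_RE.sub("", raw.strip())
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span if the model added prose.
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end <= start:
            raise
        return json.loads(text[start : end + 1])


def call_for_json(complete: Callable[[str], str], prompt: str, default: Any = None) -> Any:
    """Call the model, parse its reply, and return `default` if parsing fails."""
    try:
        return parse_llm_json(complete(prompt))
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return default
```

Keeping the parsing and fallback in one place is what lets several call sites share identical failure behavior.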

An adversarial self-review of the diff caught and fixed three regressions before this PR (Stage-3 zombie run, stream-timeout generator leak, claim-dedup nondeterminism).

Tests: full server suite green — 1452 passed, 1 skipped.

Bundled pre-existing WIP (not part of the audit, included on request)

  • Daily diary feature: server routes/schemas/services + alembic migration + design docs.
  • Desktop: today/mood panel, journal, appearance settings, nav.
  • api-client updates.

Note: my audit edits were layered on top of pre-existing uncommitted edits in the same agent files, so per-file diffs may include unrelated pre-existing changes. Excluded: INTERVIEW_*.md, interview_link.md.

🤖 Generated with Claude Code

G9000 and others added 10 commits May 30, 2026 17:15
Agent server (audit-driven, primary work of this change):
- Reliability: fix zombie "running" runs on turn-setup/persist failure
  (evict orphaned user message, mark run failed); guard context-overflow
  retry against double tool execution; end-to-end run cancellation
  (early run commit, run_started event, WS cancel handler, race-safe
  cancel events); stream inactivity timeout; mask raw exception text to
  clients; log previously-silent excepts.
- Response quality: derive prompt budget from the model context window;
  boundary-aware block truncation; recency+heat blend in automatic
  retrieval; importance floor in heat scoring; cross-block dedup;
  embedding-free claim dedup for paraphrased facts.
- Performance: defer Soul Writer LLM work off the pre-turn path; reuse
  the turn's query embedding in the knowledge-graph block (drop blocking
  thread.join); background post-turn compaction; cache static identity
  blocks via the companion version counter, rebuild volatile/query
  blocks per turn; non-blocking mod-tools fetch with negative cache.
- Refactor/cleanup: add call_llm_for_json/llm_json helper and migrate
  6 modules; delete dead predict_calibrate.py, streaming_utils.py, and
  consolidate_pending_ops(). Audit + status notes in docs/audits.
- Tests for all of the above; full server suite green (1452 passed).

Also bundled (pre-existing in-progress work, not part of the audit):
- Daily diary feature (server routes/schemas/services + migration + docs)
- Desktop today/mood panel, journal, appearance settings
- api-client updates

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
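The context-overflow guard from the commit above reduces to one check: retry only if no tool has run yet in this turn. A minimal sketch; `ContextOverflowError`, `TurnState`, and `run_step_with_overflow_retry` are hypothetical stand-ins, not the server's actual types.

```python
from dataclasses import dataclass
from typing import Callable


class ContextOverflowError(Exception):
    """Stand-in for the provider error raised when the prompt is too long."""


@dataclass
class TurnState:
    tools_executed: int = 0  # assumed to be incremented by the tool runner


def run_step_with_overflow_retry(
    state: TurnState,
    call_llm: Callable[[], str],
    shrink_context: Callable[[], None],
) -> str:
    try:
        return call_llm()
    except ContextOverflowError:
        if state.tools_executed > 0:
            # A tool already ran and may have had side effects (DB writes,
            # outbound messages); replaying the turn could repeat them.
            raise
        shrink_context()
        return call_llm()
```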

chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0e9a3795a3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread apps/server/src/anima_server/schemas/chat.py Outdated
Comment thread apps/server/src/anima_server/services/agent/tools.py
G9000 and others added 2 commits June 19, 2026 00:41
… context

- tools.py/service.py: when anima-mod tools load via the background fetch
  (cold cache on the event loop at startup), the already-built runner was
  cached without them and never picked them up. Fire a callback on first
  successful background load that invalidates the runner so the next turn
  rebuilds with mod tools. (Codex P2)
- schemas/chat.py: TodayContext rejected any date != server's date.today(),
  so a client a calendar day ahead/behind a differently-zoned server got a
  422 and chat broke. Accept server day +/- 1; still reject clearly stale
  dates and non-ISO input. (Codex P2)
- Tests for both.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
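The relaxed TodayContext rule can be expressed as a plain function that a schema validator would call. This is a sketch, not the schema in schemas/chat.py: `validate_client_today` and `slack_days` are hypothetical, and the server date is passed in so the rule can be tested without mocking `date.today()`.

```python
from datetime import date


def validate_client_today(value: str, server_today: date, slack_days: int = 1) -> date:
    """Accept an ISO date within +/- slack_days of the server's current day."""
    try:
        parsed = date.fromisoformat(value)
    except (TypeError, ValueError) as exc:
        raise ValueError(f"date must be ISO YYYY-MM-DD, got {value!r}") from exc
    # A one-day window covers clients in a different time zone from the server.
    if abs((parsed - server_today).days) > slack_days:
        raise ValueError(f"date {parsed} is too far from server date {server_today}")
    return parsed
```

On Python 3.11+, `date.fromisoformat` also accepts other ISO 8601 forms such as `20260619`, so a strict `YYYY-MM-DD`-only check would need an explicit pattern match.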
- Added `useBackground` hook to manage background configurations, including saving and resolving background URLs.
- Introduced `useTheme` hook to handle theme settings, allowing users to toggle between dark, light, and system themes.
- Created `BackgroundConfig` and `Theme` types for better type safety.
- Updated `AppearanceSettings` component to integrate background and theme management, replacing the previous banner functionality.
- Removed legacy banner handling code and associated preferences.
- Enhanced the theme management logic to respond to system theme changes.
- Added new icons for UI enhancements: `ChevronDownIcon` and `ChevronUpIcon`.