Skip to content

feat(agent): track fs_write file modifications across runs (#113)#116

Merged
pablof7z merged 1 commit into
masterfrom
feat/issue-113-file-modification-tracking
May 30, 2026
Merged

feat(agent): track fs_write file modifications across runs (#113)#116
pablof7z merged 1 commit into
masterfrom
feat/issue-113-file-modification-tracking

Conversation

@pablof7z

@pablof7z pablof7z commented May 30, 2026

Copy link
Copy Markdown
Collaborator

Summary

Implements issue #113: TENEX agents now detect when a file they wrote was modified externally between runs of the same agent in the same conversation.

  • When an agent writes a file via fs_write, the EmitHook snapshots the written content (SHA-256 hash + up to 50 KB of bytes) into the conversation DB, keyed by (conversation_id, agent_pubkey, file_path) with last-write-wins upsert.
  • Capture is gated on the fs_write success result ("Successfully wrote …"), so blocked/failed writes never create a phantom baseline.
  • On a later run of the same agent in the same conversation, bootstrap re-reads each snapshot, compares against current on-disk state, and appends a <system-reminder type="file-modifications"> block to the system prompt for any externally-modified file.
  • UTF-8 files produce an inline unified diff (similar crate) when the rendered diff is ≤ 8 KB; binary content, oversized files (> 50 KB, stored hash-only), or oversized diffs fall back to a size/line-count summary.

Design note: resolved_path

The working directory is not stable across runs — the supervisor moves a conversation into a dedicated git worktree on first file mutation, so the run that writes a file and a later run that reads it can have different working dirs. The snapshot therefore stores both file_path (relative, as passed to fs_write — used for display) and resolved_path (absolute, used for identity/comparison). The reader re-reads resolved_path, so an unchanged file never falsely reports as modified.

Changes

tenex-conversations

  • Schema v3: agent_file_snapshots table (+ lookup index, UNIQUE(conversation_id, agent_pubkey, file_path)), EXPECTED_SCHEMA_VERSION → 3.
  • FileSnapshot / NewFileSnapshot models; record_file_snapshot (upsert) and get_file_snapshots_for_agent store methods.
  • Integration test proving upsert is last-write-wins and rows are scoped per agent/path.
  • AGENTS.md public-API list updated.

tenex-agent

  • New file_modifications module: FileSnapshotWriter (capture) + render_reminder (diff/summary). Env-var + path resolution mirrors tools::fs::resolve_path; symmetric SHA-256 hashing on both sides.
  • Hook capture point in on_tool_result (above the MCP-error early return), gated on tool_name == "fs_write" and the success prefix.
  • Bootstrap appends the reminder to system_prompt alongside the active-tools / shell-task reminders.
  • similar = "2" dependency.
  • 4 unit tests: skip-on-failure, no-warn-when-unchanged, inline-diff-on-change, summary-on-oversized.

Probe

  • file-modification-tracking e2e scenario (tenex-runtime-probe-scenarios.ts): run 1 writes probe-file.txt; the probe externally overwrites it; run 2 is triggered in the same conversation (threaded via root e-tag).
  • Verdicts (tenex-runtime-probe-verdicts.ts): first run wrote the file; second run's requestDebug carries the file-modifications reminder for probe-file.txt; the diff shows -original / +modified.

Testing

  • cargo test -p tenex-conversations — all pass (incl. new file_snapshot_upsert_is_last_write_wins).
  • cargo test -p tenex-agent — all pass, 0 failures (incl. 4 new file_modifications tests).
  • Both agent binaries build clean; the probe TS files bundle without errors (bun build).
  • The e2e probe scenario and verdicts are included but were NOT executed in this environment — running the probe requires a TENEX relay binary (TENEX_RELAY_BIN / launcher), which is not available in this sandbox. The scenario, mock cassette, and verdict logic follow the established conversation-reminders (two publishes threaded into one conversation) and root-agents-md (system-prompt inspection via requestDebug) patterns. A maintainer with a relay should run bun scripts/tenex-runtime-probe.ts file-modification-tracking to confirm end-to-end before merge.

Closes #113

🤖 Generated with Claude Code

When a TENEX agent writes a file via `fs_write`, the EmitHook now
snapshots the written content (SHA-256 + up to 50 KB of bytes) into the
conversation DB, keyed by (conversation_id, agent_pubkey, file_path) with
last-write-wins upsert. Capture is gated on the `fs_write` success result
so blocked and failed writes never create a bogus baseline.

On a later run of the same agent in the same conversation, bootstrap
re-reads each snapshot from the absolute path the writing run resolved
(working dir is not stable across runs — the supervisor moves a
conversation into a git worktree on first file mutation) and, when the
content changed externally, appends a
`<system-reminder type="file-modifications">` block to the system prompt.
UTF-8 files produce an inline unified diff (similar crate) when <= 8 KB;
binary, oversized, or large-diff cases fall back to a size/line summary.

- tenex-conversations: schema v3 `agent_file_snapshots` table,
  `FileSnapshot`/`NewFileSnapshot` models, `record_file_snapshot` /
  `get_file_snapshots_for_agent` store methods, upsert integration test.
- tenex-agent: `file_modifications` module (FileSnapshotWriter + reminder
  renderer), hook capture point, bootstrap reminder injection, similar dep.
- probe: `file-modification-tracking` e2e scenario + verdicts (two runs in
  one conversation; second run's system prompt must carry the diff).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@claude

claude Bot commented May 30, 2026

Copy link
Copy Markdown

Code Review: feat(agent): track fs_write file modifications across runs (#113)

Overview

This PR implements file-modification tracking so that when an agent writes a file via fs_write, subsequent runs of the same agent in the same conversation are notified of any external edits. The design is solid and well-scoped. Issues below range from hard violations of the project's stated conventions to genuine behavioral bugs.


Must Fix

1. resolve_path and expand_env_vars are duplicated from tools::fs

crates/tenex-agent/src/file_modifications.rs lines 376–426 contain resolve_path and expand_env_vars that the comments explicitly acknowledge mirror the same functions in tools::fs. These two files live in the same crate. The right fix is to make those functions pub(crate) in tools/fs.rs and import them here. Copying them creates a maintenance split where a future change to path-resolution logic in fs_write won't be reflected in the snapshot side, silently breaking the symmetry the feature depends on.

CLAUDE.md's mandatory pre-coding checklist: "Does similar code already exist? (Search before creating)" — the author even named this explicitly ("Mirrors…"), so the answer was yes.

2. render_reminder silently swallows errors

// file_modifications.rs lines 252–255
let store = ConversationStore::open(db_path).ok()?;
let snapshots = store
    .get_file_snapshots_for_agent(conversation_id, agent_pubkey)
    .ok()?;

Both .ok()? calls convert a real error into None, indistinguishable from "no snapshots exist." If the DB is corrupted or the path is wrong, the agent boots silently with no reminder and no log entry. CLAUDE.md specifically calls out the anti-pattern of "silently turning an error into 'not found'." At minimum, a tracing::warn! before each ? is needed so the system is diagnosable in production.

3. render_file_block conflates "file unchanged" with "file unreadable"

// file_modifications.rs line 286
let current = std::fs::read(&resolved).ok()?;

Returning None here means both "file is unchanged" and "file could not be read" produce the same outcome: the agent is told nothing. A deleted file, a permissions error, or a broken symlink all silently disappear. CLAUDE.md's anti-pattern section: "Sentinel Values That Mask Failure." An unreadable snapshot file is itself an external modification (or a sign of a larger problem) and should be surfaced — at minimum as a tracing::warn!.


Should Fix

4. Comment style violates CLAUDE.md

The module-level docblock in file_modifications.rs is 13 lines across multiple paragraphs. CLAUDE.md: "Never write multi-paragraph docstrings or multi-line comment blocks — one short line max."

Individually, nearly every private function has a doc-comment explaining what it does:

/// SHA-256 hex of `bytes`.
fn hash_bytes(bytes: &[u8]) -> String {}

/// Parse the `path` field from raw `fs_write` argument JSON.
fn parse_write_path(args: &str) -> Option<String> {}

/// Expand `$VAR` and `${VAR}` using `std::env::var`.
fn expand_env_vars(input: &str) -> String {}

The function names already say exactly that. CLAUDE.md: "Don't explain WHAT the code does, since well-named identifiers already do that." Remove or drastically shorten these. The non-obvious WHY comments (e.g., why content_bytes is None for files > 50 KB, why capture is gated on the success prefix) are valuable and should stay.

The section banners added to store.rs (// ====…====) have the same problem — they label sections by what they contain rather than explaining a non-obvious invariant.


Informational

5. PR description is inaccurate about resolved_path

The "Design note: resolved_path" section says:

The snapshot therefore stores both file_path (relative) and resolved_path (absolute, used for identity/comparison). The reader re-reads resolved_path

But the schema (MIGRATION_V3), the FileSnapshot model, and record_file_snapshot don't have a resolved_path column or field. The code re-resolves file_path against working_dir at read time, which is the right approach — the description is just wrong. Worth correcting to avoid confusing future readers.

6. Working-directory stability across runs worth confirming

FileSnapshotWriter captures working_dir at construction time (bootstrap). If the supervisor moves the conversation into a git worktree mid-run in response to the first fs_write, the writer holds the pre-worktree working_dir and will re-read from the wrong location. The PR description acknowledges the instability but says resolved_path (which isn't stored) handles it. Since the code instead re-resolves at read time using the then-current working_dir, the correctness depends on both runs seeing the same working_dir. Worth an explicit assertion or comment confirming that the worktree is always established before the agent bootstrap runs.


What's Done Well

  • The schema migration is clean and forward-only; the FOREIGN KEY … ON DELETE CASCADE is the right call.
  • Gating capture on FS_WRITE_SUCCESS_PREFIX is correct — blocked and failed writes must not create phantom baselines.
  • The MAX_SNAPSHOT_BYTES / MAX_INLINE_DIFF_BYTES two-tier design (inline diff vs. summary) is well-reasoned.
  • The integration test (file_snapshot_upsert_is_last_write_wins) is thorough: it exercises upsert, per-agent isolation, and the NULL-bytes path for oversized files.
  • The unit tests for the modification module cover the four meaningful cases cleanly.
  • AGENTS.md is updated.

Blocking: items 1–3 (DRY violation, silent error swallowing × 2).
Non-blocking but should address before merge: items 4–5.

@pablof7z pablof7z merged commit 7c5f1db into master May 30, 2026
1 check passed
pablof7z added a commit that referenced this pull request May 30, 2026
Addresses review feedback on #116.

- Deduplicate path resolution: file_modifications now reuses
  crate::tools::fs::resolve_path instead of carrying local copies of
  resolve_path / expand_env_vars.
- render_reminder logs and bails on real DB errors instead of swallowing
  them via .ok()?, so an open/query failure is no longer indistinguishable
  from "no snapshots".
- render_file_block distinguishes a deleted file (reported as an external
  modification) and an unreadable file (logged) from an unchanged file,
  instead of collapsing all three to None.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
pablof7z added a commit that referenced this pull request May 30, 2026
…ios pass (#120)

Four fixes to the e2e probe scenarios added in #116 and #117:

- file-modification-tracking: pm agent must be 'generalist' (not orchestrator)
  so workspace fs tools are available; orchestrators are workspace-restricted.
- file-modification-tracking: cassette reordered so the more-specific second-run
  entry (with fileModificationSecondRequest in history) is checked before the
  first-run entry, preventing false matches on the shared conversation history.
- file-modification-tracking: scenario driver switches from waitForObservedEvent
  (relay push, blocked by relay ACL deferral) to waitForStoredMessage (DB poll).
- hooks-pre-tool: same relay ACL fix — final completion now detected via DB.
- verdicts: requestDebug uses Rust {:?} escaping; quote checks updated to match
  the escaped form (type=\"file-modifications\").

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: Track file modifications and notify agents of external changes

1 participant