fix(retention): narrow agent-owned dir match to path-boundary prefix#221
fix(retention): narrow agent-owned dir match to path-boundary prefix#221antfleet-ops wants to merge 1 commit into
Conversation
The previous `startswith(encoded_root)` allowed sibling cwds whose encoded form continues with a non-separator character (e.g. `/data/workspaces2` encoding to `-data-workspaces2`) to be classified as agent-owned and silently pruned. The module's docstring promises to never delete operator sessions. Tighten the check to require either an exact match against the encoded root or a path-boundary prefix (`encoded_root + '-'`) — the boundary character comes from the original `/` separator and is the only signal the encoding preserves that the next component is a descendant rather than a sibling. Adds regression tests for the previously-permissive cases (`-data-workspaces2`, `-data-workspacesBackup`) and a docstring note explaining the residual ambiguity for sibling roots that themselves contain `-` (e.g. `/data/workspaces-old/project`). A complete fix for that case would require a spawn-time marker file inside the agent's project directory — left as follow-up. Found via AntFleet two-model bench review (Claude Opus 4.7 + GPT-5): AntFleet/bench-roboco#2 (comment)
|
I have read the CLA Document and I hereby sign the CLA You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot. |
|
Thanks for opening your first pull request on RoboCo! Quick checklist before review (most of these are enforced by CI, but worth a glance):
See CONTRIBUTING.md for the full workflow and the Code of Conduct for the community standards we follow. Welcome aboard — a maintainer will review shortly. |
rennf93
left a comment
There was a problem hiding this comment.
Change Request for external PR #221
Summary
The PR narrows is_agent_owned_dir to prevent false-positive matches against sibling workspace roots (e.g. /data/workspaces2 encoded as -data-workspaces2) when the canonical root is /data/workspaces (encoded -data-workspaces). The prior startswith was too permissive. The fix uses an exact match or a path-boundary prefix (encoded + "-").
Findings (per criterion / concern)
-
Correctness — logic change is sound and directly addresses the stated bug
- File:
roboco/runtime/transcript_retention.py - Location:
is_agent_owned_dir, replacement of the final return (hunk @@ -26,15 +26,24 @@) - Before:
return bool(encoded_root) and dir_name.startswith(encoded_root) - After:
if not encoded_root: return False; return dir_name == encoded_root or dir_name.startswith(encoded_root + "-") - Expected: only the exact encoded root or descendants (next component separated by
-) are agent-owned. - Actual: correctly rejects
-data-workspaces2,-data-workspaces2-project,-data-workspacesBackup-project. - Evidence in diff: new tests assert these cases.
- File:
-
Test coverage — new tests are targeted and exercise the regression
- File:
tests/unit/runtime/test_transcript_retention.py - Added:
test_is_agent_owned_dir_rejects_non_separator_sibling_prefixes()(asserts exact root matches, sibling non-separator continuations do not) - Added:
test_select_ignores_non_separator_sibling_prefix_dirs()(end-to-end viaselect_prunable_transcripts) - These cover the false-positive class the commit message claims to fix.
- File:
-
Contract / documentation gap — acknowledged ambiguity lacks an explicit test or docstring entry
- File:
tests/unit/runtime/test_transcript_retention.py - Location: the NOTE comment inside the new
test_is_agent_owned_dir_rejects_non_separator_sibling_prefixes - The comment correctly states: "-data-workspaces-old-project is fundamentally ambiguous — it could encode either /data/workspaces/old-project (descendant) OR /data/workspaces-old/project (sibling)."
- Expected: the limitation should be (a) exercised by at least one assertion (e.g.,
assert is_agent_owned_dir("-data-workspaces-old-project", WORKSPACES) in {True, False}plus comment) OR (b) documented in the function docstring so callers understand the heuristic is lossy for sibling roots containing-. - Actual: only an inline NOTE; no test asserts behavior and the module-level docstring for
is_agent_owned_diris not updated to mention the ambiguity.
- File:
-
Edge cases around empty / falsy workspaces_root
- File:
roboco/runtime/transcript_retention.py - The added
if not encoded_root: return Falseis behavior-preserving relative to the priorbool(encoded_root) and ...expression (both yield False when root is empty or "/"). - The short-circuit for
dir_name == "-app"remains first and is unaffected. No defect, but worth a one-line test if not already present for("", "/data")or similar.
- File:
-
Security / data-safety impact
- This is a safety fix: reduces risk that operator-owned transcripts under sibling-named workspace roots are misclassified as agent-owned and pruned by retention.
- No new injection, secret, or supply-chain surface (pure deterministic string matching on derived directory names).
- No dependency changes.
-
Scope and style
- Diff touches only the predicate and its unit tests. No unrelated refactors.
- Commit title
fix(retention): ...follows conventional format. - No AGPL license or secret issues visible in the diff.
Recommendation
REQUEST_CHANGES. The core fix is correct and the added tests are good, but the acknowledged encoding ambiguity should be explicitly tested (or at minimum asserted with a comment) and preferably noted in the function's docstring so the contract and limitations are clear to future callers and maintainers.
…227) is_agent_owned_dir used a bare startswith on the encoded workspaces root, so a sibling root that shares the prefix (e.g. -data-workspaces2 vs -data-workspaces) was misclassified as agent-owned and its operator transcripts pruned. Require an exact match or a '-' path-boundary prefix. Supersedes #221 (the same fix from antfleet-ops, who flagged the over-match); re-done here as our own change so it can land without the contributor CLA. Co-authored-by: Renn F <rennf93@users.noreply.github.com>
|
Superseded by #227 (merged to master): same path-boundary fix to is_agent_owned_dir, re-done as our own change so it could land without the contributor CLA. Thanks for flagging the over-match. |
Summary
is_agent_owned_dirusesdir_name.startswith(encoded_root)to decide whether a~/.claude/projects/<encoded-cwd>/subdir belongs to an agent and is safe to prune. That raw prefix check matches sibling cwds whose encoded name happens to continue with a non-separator character — e.g. withworkspaces_root = "/data/workspaces", encoded-data-workspacesmatches-data-workspaces2,-data-workspacesBackup, etc. The module's docstring promises to never delete operator sessions, so those false positives result in silent data loss.What changed
Tighten the match to require either an exact match against the encoded root or a path-boundary prefix (
encoded_root + "-"). The boundary character comes from the original/separator and is the only signal the encoding preserves that the next component is a true descendant rather than a sibling.Residual ambiguity (not fixed by this PR)
The Claude Code encoding
path.replace("/", "-")is lossy:/data/workspaces/old-projectand/data/workspaces-old/projectboth encode to-data-workspaces-old-project. From the encoded dir name alone these are indistinguishable, so any sibling root whose own name contains-(e.g./data/workspaces-old) can still produce a name that matches the boundary check. The PR's docstring acknowledges this; the cleanest complete fix would be a marker file written by the agent at spawn time, scoped to a follow-up PR.Tests
test_is_agent_owned_dir_rejects_non_separator_sibling_prefixes(-data-workspaces2,-data-workspacesBackup)test_select_ignores_non_separator_sibling_prefix_dirs(end-to-end throughselect_prunable_transcripts)How it was found
AntFleet's two-model security review (Claude Opus 4.7 + GPT-5) on a mirror of this repo. Both models independently flagged the same defect.