feat: add read_interactions + bootstrap_body_state for body-side observation (#20) #21
…rvation (#20)
Two new MCP tools and one body-prompt rule give the model a read path into its own operational storage:
- read_interactions(): chronological filter over interactions/<day>.jsonl via MemoryBackend. Filters: chat_id, sender, action, direction, since, limit. Default window today + yesterday; up to 7 days when since reaches back further.
- bootstrap_body_state(): single-packet index of today's counts, top chats, all open promises, and watched-chat cursor freshness. Mirrors persona-sati's bootstrap_session shape. Index only; full content stays in read_interactions.
- prompts/anatomy/identity-and-tools.md: pre-outbound-send observe rule scoped to send_teams_message / send_email / send_card / share_file.
Read path is purely additive; interaction_log.py write path and the on-disk JSONL schema are unchanged. daily_summary regression check green (38 tests).
Closes #20.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Adds body-side “observation” tooling so the model can read entrabot’s own operational storage (interaction log + a compact bootstrap index) during a turn, reducing unnecessary Graph calls and improving continuity.
Changes:
- Introduces a read_interactions() implementation that filters interactions/<day>.jsonl chronologically with structured filters and a bounded multi-day scan window.
- Introduces bootstrap_body_state() to return a single index packet (counts, top chats, open promises, cursor freshness, watched chat count) suitable for session-start context.
- Exposes both tools via FastMCP and documents the outbound pre-send “observe” discipline in the body prompt.
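As an illustration of the structured filtering described above, here is a minimal sketch, not the PR's actual code. The function name filter_interactions, the helper names, and the entry field names (ts, sender, action, direction, recipient, metadata.chat_id) are assumptions made for the example; it takes already-loaded JSONL lines rather than going through MemoryBackend.

```python
import json
from datetime import datetime, timezone


def _parse_ts(value):
    """Parse an ISO-8601 timestamp; accept a trailing 'Z', treat naive values as UTC."""
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def _counterparty(entry):
    """Assumed chat_id semantics: recipient for outbound, metadata.chat_id for inbound."""
    if entry.get("direction") == "outbound":
        return entry.get("recipient")
    return (entry.get("metadata") or {}).get("chat_id")


def filter_interactions(lines, *, chat_id=None, sender=None, action=None,
                        direction=None, since=None, limit=10):
    """Hypothetical stand-in for read_interactions(): filter lines, newest first."""
    matches = []
    for line in lines:
        try:
            entry = json.loads(line)
            ts = _parse_ts(entry["ts"])
        except (ValueError, KeyError, TypeError, AttributeError):
            continue  # corrupt or incomplete line: skip it rather than fail the read
        if since is not None and ts < since:
            continue
        if chat_id is not None and _counterparty(entry) != chat_id:
            continue
        if sender is not None and str(entry.get("sender", "")).lower() != sender.lower():
            continue
        if action is not None and entry.get("action") != action:
            continue
        if direction is not None and entry.get("direction") != direction:
            continue
        matches.append((ts, entry))
    # Most recent first, then apply the limit.
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in matches[:limit]]
```

All filters compose with AND semantics, and since is expected to be a timezone-aware datetime so the comparison against parsed timestamps is well defined.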
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/tools/test_read_interactions.py | Adds unit tests for interaction-log filtering behavior and edge cases. |
| tests/tools/test_body_bootstrap.py | Adds unit tests for bootstrap index packet contents and invariants (index-only). |
| tests/test_mcp_server_body_tools.py | Verifies both tools are registered in FastMCP and return JSON strings via MCP. |
| src/entrabot/tools/read_interactions.py | Implements bounded, structured, chronological reads over the interaction JSONL log. |
| src/entrabot/tools/body_bootstrap.py | Implements session-start body bootstrap packet (index-only operational snapshot). |
| src/entrabot/mcp_server.py | Registers new MCP tools read_interactions and bootstrap_body_state. |
| prompts/anatomy/identity-and-tools.md | Documents the new tools and the pre-send body-side observation discipline. |
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=UTC)
    return dt
Verified — bug reproduces. Constructed a regression: since of cutoff_utc.astimezone(timezone(timedelta(hours=12))) (where cutoff_in_offset.date() > cutoff_utc.date()) with an entry 3 days back at 22:00 UTC. Old code skipped the entry's UTC day file and returned []. Fixed in 8729251 by normalizing cutoff and now to UTC inside _days_to_scan before extracting .date(). Regression test in tests/tools/test_read_interactions.py::TestSinceFilter::test_since_with_non_utc_offset_scans_correct_utc_day.
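A minimal sketch of the normalization described in this reply, under stated assumptions: the function name days_to_scan, the constant MAX_SCAN_DAYS, and the reading of the 7-day cap as "at most 7 day files" are hypothetical and are not taken from the PR's _days_to_scan.

```python
from datetime import date, datetime, timedelta, timezone

MAX_SCAN_DAYS = 7  # hypothetical constant; assumes the cap means at most 7 day files


def days_to_scan(since: datetime, now: datetime) -> list[date]:
    """Return the UTC calendar days whose files must be read, newest first.

    Both arguments are expected to be timezone-aware.
    """
    # Normalize to UTC before taking .date(): an offset-local date can differ
    # from the UTC date, which would drop the earliest required day file.
    cutoff_utc = since.astimezone(timezone.utc)
    now_utc = now.astimezone(timezone.utc)
    span = (now_utc.date() - cutoff_utc.date()).days + 1
    span = max(1, min(span, MAX_SCAN_DAYS))
    return [now_utc.date() - timedelta(days=i) for i in range(span)]
```

With a since of 22:00 UTC expressed at +12:00, since.date() is one day later than the UTC date; taking the date only after normalization keeps the earlier UTC day in the scan list.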
    last = datetime.fromisoformat(top["last_activity"].replace("Z", "+00:00"))
    earlier = datetime.fromisoformat(
        (result["top_chats_today"][0]["last_activity"]).replace("Z", "+00:00")
    )
    assert last == earlier  # sanity
Verified — last == earlier was parsing top["last_activity"] twice and comparing it to itself. Replaced with: capture earlier_ts / latest_ts before logging, then assert last_activity == latest_ts AND last_activity > earlier_ts. Now actually validates the recency-selection branch in _top_chats. Fixed in 8729251.
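The shape of the corrected assertion can be sketched as follows. This is an illustration only: top_chat_last_activity is a hypothetical helper standing in for the recency selection in _top_chats, and the entry fields (chat_id, ts) are assumed.

```python
from datetime import datetime, timedelta, timezone


def top_chat_last_activity(entries, chat_id):
    """Hypothetical helper: newest timestamp logged for chat_id."""
    stamps = [e["ts"] for e in entries if e["chat_id"] == chat_id]
    return max(stamps)


def test_last_activity_is_the_newer_entry():
    # Capture both timestamps before logging, so the assertion compares
    # against independent values instead of re-parsing the result itself.
    earlier_ts = datetime(2025, 1, 2, 9, 0, tzinfo=timezone.utc)
    latest_ts = earlier_ts + timedelta(hours=3)
    entries = [
        {"chat_id": "c1", "ts": latest_ts},
        {"chat_id": "c1", "ts": earlier_ts},  # logged out of order on purpose
        {"chat_id": "c2", "ts": latest_ts + timedelta(hours=1)},
    ]
    last_activity = top_chat_last_activity(entries, "c1")
    assert last_activity == latest_ts
    assert last_activity > earlier_ts
```

Unlike the original last == earlier check, this fails if the selection returns the older entry or a timestamp from another chat.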
- _days_to_scan now normalizes cutoff + now to UTC before extracting calendar dates. A since with a non-UTC offset whose offset-local date differed from its UTC date would shift cutoff.date() and skip the earliest required UTC day file (silently losing matching entries). Regression test constructs a +12:00 since that exposes the bug: fails on old code, passes on new.
- Replace tautological last_activity assertion in test_includes_last_activity_and_last_sender with a real recency check: parse the actual newer ts and assert last_activity equals it AND is strictly greater than the older entry.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closes #20.
Summary
Three additions giving the model a read path into entrabot's operational storage so it can observe its own history during a turn: the body-side analogue of persona-sati's observe discipline.
- read_interactions() MCP tool: new module src/entrabot/tools/read_interactions.py. Chronological filter over interactions/<day>.jsonl via MemoryBackend (works on both LocalBackend and BlobBackend). Filters: chat_id, sender (case-insensitive), action, direction, since (default = now − 24 h), limit (default 10). Default window scans today + yesterday; since can reach back up to 7 days (hard cap, logged when hit). No embeddings, no scoring, no caching; JSONL re-reads are cheap (sub-10 ms in the common case).
- bootstrap_body_state() MCP tool: new module src/entrabot/tools/body_bootstrap.py. Single packet with today_counts (total / inbound / outbound / by_action / by_channel), top_chats_today (up to 5, ties broken by recency), open_promises (ALL open, not top-N, because commitments are durable), cursor_freshness (watched_chat_count / cursors_present / cursors_stale / oldest+newest cursor_ts), watched_chat_count, generated_at. Index only: no message summaries leak into the payload; full content stays in read_interactions.
- prompts/anatomy/identity-and-tools.md: pre-send observe scoped to outbound publishing (send_teams_message, send_email, send_card, share_file). Reads, lists, and audit entries do not need it. Same cheap-not-precious posture as persona-sati's observe.
Test plan
- .venv/bin/pytest -v --tb=short → 1339 passed (was 1281; +58 from new tests), 1 skipped (pre-existing), 8 warnings (all pre-existing _background_poll in chat_cursors tests, unrelated)
- .venv/bin/ruff check . → All checks passed
- .venv/bin/pytest tests/tools/test_daily_summary*.py -v → all passed (explicit regression check; read_day() write path untouched)
New tests:
- tests/tools/test_read_interactions.py: 26 tests. Filter logic (chat_id including outbound-recipient + inbound-metadata-chat_id paths, sender case-insensitivity, action, direction); since-cutoff including day-boundary crossover; 7-day cap; limit honored; sort order most-recent-first; missing day file handled; corrupt line skipped; all filters compose.
- tests/tools/test_body_bootstrap.py: 23 tests. Empty-state sensible zeros; today_counts correctness; top_chats sorting + tie-break by recency + 5-cap + excludes entries without chat_id; all-open-promises (not top-N); cursor freshness picks up stale vs fresh + oldest/newest timestamps; watched_chat_count from persisted file; INDEX-only invariant (full summaries never leak into bootstrap payload).
- tests/test_mcp_server_body_tools.py: 7 tests. Both tools registered with FastMCP, callable via mcp._tool_manager._tools, return JSON strings (matches read_email / list_promises convention), validation errors come back as {"error": "..."}.
Design choices made within latitude
- since reaching further back than 7 days is silently capped to a 7-day file scan; a warning is logged at the call site. Going deeper requires a follow-up change to raise the cap intentionally: the cost of unbounded JSONL scans is real, the cost of a follow-up is small.
- …read_day()'s existing raw is None → [] behavior). No error, no warning.
- chat_id filter semantics: matches recipient for outbound and metadata.chat_id for inbound, mirroring daily_summary._counterparty(). Consistent with how the existing log writers populate the schema.
- bootstrap_body_state() open_promises is ALL, not top-N: promises are durable commitments. Capping them would hide work the model owes humans.
- …chat_cursors.is_stale() (24 h cap). Defining a second threshold here would risk drift.
Out of scope (per issue)
- …read_teams_messages etc. should be in the interaction log): separate decision.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>