
fix(briefing): remind, don't assert progress on machine-notification threads (#265) #388

Merged: 99Yash merged 2 commits into main from fix/briefing-notification-reminder-265 on Jul 3, 2026


@99Yash (Owner) commented Jul 3, 2026

What

Closes #265.

When a task assigned through a collaboration tool (ClickUp / Slack / Linear / GitHub) arrives as a notification email, the actual work happens in that tool or the IDE — never in a reply to the bot. The briefing composer was applying human-thread reply-latency reasoning to these machine threads and asserting claims the inputs never supported: "still no reply", "no progress on X", "you haven't started Y". Email silence on a notification thread is zero evidence of task progress.

Change

Prompt principle (apps/server/.../briefing/prompt.ts): surface a notification-driven item as a neutral open reminder ("you've got an open task: X"), never a progress/status assertion. Reserve reply-latency framing ("still no reply", "waiting on you") for genuine person-to-person threads, where silence does mean the user owes that human a reply.
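The fix itself is prompt text only; nothing in the composer branches in code. As a hedged illustration of the decision the principle asks the model to make, the sketch below picks a framing from the kind of thread an item came from. `ThreadKind`, `OpenItem`, and `frameItem` are hypothetical names invented for this example, not part of the codebase.

```typescript
// Hypothetical types for illustration only: the real composer is steered by prompt text.
type ThreadKind = "machine-notification" | "person-to-person";

interface OpenItem {
  kind: ThreadKind;
  title: string; // task title or thread subject
  counterpart?: string; // the human owed a reply (person-to-person threads only)
  daysSilent: number; // days since the last message on the email thread
}

// Choose how to surface an open item in the briefing.
function frameItem(item: OpenItem): string {
  if (item.kind === "machine-notification") {
    // Work happens in ClickUp/Slack/Linear/GitHub or the IDE, so email silence
    // says nothing about progress: emit a neutral open reminder, no status claim.
    return `You've got an open task: ${item.title}`;
  }
  // Person-to-person: silence does mean the user owes that human a reply,
  // so reply-latency framing is legitimate here.
  const who = item.counterpart ?? "the sender";
  return `Still no reply to ${who} on "${item.title}" (${item.daysSilent} days)`;
}

// Usage: the machine-notification item ignores daysSilent entirely.
console.log(frameItem({ kind: "machine-notification", title: "Fix login bug", daysSilent: 4 }));
```

The asymmetry is the whole point: `daysSilent` only feeds the wording on human threads, because only there does silence carry information.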

Eval: a new evalite lane in apps/server, plus briefing-notification-reminder.eval.ts:

  • Block A: 3 machine-notification cases (ClickUp / Slack / GitHub), each previously surfaced so the composer is tempted to "close the loop." A deterministic scorer asserts the composed text carries no fabricated-progress phrasing.
  • Block B: a person-to-person control that asserts the fix does not over-correct and gag a legitimate reply-owed item (Fabian is still surfaced).
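Deterministic checks of the kind the two blocks describe can be plain string tests. The sketch below is an assumption about their shape only: the phrase list, `noFabricatedProgress`, and `stillSurfaces` are hypothetical, and wiring them into evalite's scorer API is not shown.

```typescript
// Hypothetical phrase list: status claims that email silence cannot support
// on a machine-notification thread. Case-insensitive; accepts ' or ’ apostrophes.
const FABRICATED_PROGRESS: RegExp[] = [
  /\bstill no reply\b/i,
  /\bno progress\b/i,
  /\bha(?:ve|s)n['’]?t started\b/i, // "haven't started" / "hasn't started"
  /\bwaiting on you\b/i,
];

// Block A: 1 if the composed briefing carries no fabricated-progress phrasing, else 0.
function noFabricatedProgress(briefing: string): number {
  return FABRICATED_PROGRESS.some((re) => re.test(briefing)) ? 0 : 1;
}

// Block B: 1 if the person-to-person control item (e.g. a sender's name) is still surfaced.
function stillSurfaces(briefing: string, mustMention: string): number {
  return briefing.toLowerCase().includes(mustMention.toLowerCase()) ? 1 : 0;
}
```

Because the Block A check scans the whole composed text, it only makes sense on fixtures containing nothing but machine-notification threads; on a mixed briefing a legitimate "still no reply" on a human thread would fail it. A phrase list is cheap and reproducible, which suits a regression pin, but it only catches the wordings it lists.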

Runs the real getBossModel through the real prompt + tool loop. 100% green with the fix.

Honest caveat

A teeth-check (stash the fix, re-run the eval) showed that Sonnet 4.6 passes even without the fix: it naturally writes "still open" rather than "no progress." The prod incident (see #265 / #256) was on gemini-2.5-pro (the boss fallback path). So this eval is a regression pin, with the same justification as the existing sender-suppression-grounding eval, rather than a behavior change on today's primary model.

Verification

  • pnpm --filter server check-types
  • pnpm exec oxlint (changed files) ✅
  • pnpm check:web-boundaries
  • pnpm --filter server eval → 4/4 at 100% ✅

Parent: #218 · Related: #256

🤖 Generated with Claude Code

99Yash and others added 2 commits July 3, 2026 10:48
…threads (#265)

When a task assigned via a collaboration tool (ClickUp/Slack/Linear/GitHub)
arrives as a notification email, the work happens in that tool or the IDE, not
in a reply — so email-thread silence carries zero information about task
progress. The composer was applying human-thread reply-latency reasoning to bot
threads and fabricating 'still no reply / no progress / hasn't started' claims
the inputs never supported.

Add a briefing-prompt principle: surface a notification-driven item as a neutral
open reminder, never a progress/status assertion; reserve reply-latency framing
('still no reply', 'waiting on you') for genuine person-to-person threads where
silence does mean the user owes a reply.

Add an evalite lane to apps/server + briefing-notification-reminder.eval.ts:
three machine-notification cases (deterministic no-fabricated-progress scorer)
plus a person-to-person control (asserts the fix doesn't over-correct and gag a
legitimate reply-owed item). Runs the real boss model through the real prompt +
tool loop; green with the fix. Functions as a regression pin — the prod incident
was on gemini-2.5-pro, and current Sonnet 4.6 passes without the fix too.

Parent: #218.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@99Yash merged commit 4dd7c9c into main on Jul 3, 2026
4 checks passed
@99Yash deleted the fix/briefing-notification-reminder-265 branch July 3, 2026 05:50


Development

Successfully merging this pull request may close these issues.

Briefing asserts 'no progress / still no reply' on tasks assigned via notification threads (ClickUp/Slack) — should remind, not assert
