fix(briefing): remind, don't assert progress on machine-notification threads (#265) #388
Merged
…threads (#265)

When a task assigned via a collaboration tool (ClickUp/Slack/Linear/GitHub) arrives as a notification email, the work happens in that tool or the IDE, not in a reply, so email-thread silence carries zero information about task progress. The composer was applying human-thread reply-latency reasoning to bot threads and fabricating 'still no reply / no progress / hasn't started' claims the inputs never supported.

Add a briefing-prompt principle: surface a notification-driven item as a neutral open reminder, never a progress/status assertion; reserve reply-latency framing ('still no reply', 'waiting on you') for genuine person-to-person threads where silence does mean the user owes a reply.

Add an evalite lane to apps/server + briefing-notification-reminder.eval.ts: three machine-notification cases (deterministic no-fabricated-progress scorer) plus a person-to-person control (asserts the fix doesn't over-correct and gag a legitimate reply-owed item). Runs the real boss model through the real prompt + tool loop; green with the fix. Functions as a regression pin: the prod incident was on gemini-2.5-pro, and current Sonnet 4.6 passes without the fix too.

Parent: #218.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
Closes #265.
When a task assigned through a collaboration tool (ClickUp / Slack / Linear / GitHub) arrives as a notification email, the actual work happens in that tool or the IDE — never in a reply to the bot. The briefing composer was applying human-thread reply-latency reasoning to these machine threads and asserting claims the inputs never supported: "still no reply", "no progress on X", "you haven't started Y". Email silence on a notification thread is zero evidence of task progress.
Change
Prompt principle (apps/server/.../briefing/prompt.ts): surface a notification-driven item as a neutral open reminder ("you've got an open task: X"), never a progress/status assertion. Reserve reply-latency framing ("still no reply", "waiting on you") for genuine person-to-person threads, where silence does mean the user owes that human a reply.
Eval: new evalite lane in apps/server + briefing-notification-reminder.eval.ts, with three machine-notification cases (deterministic no-fabricated-progress scorer) plus a person-to-person control that asserts the fix doesn't over-correct and gag a legitimate reply-owed item. Runs the real getBossModel through the real prompt + tool loop. 100% green with the fix.
Honest caveat
A teeth-check (stash the fix, re-run) showed Sonnet 4.6 passes even without the fix: it naturally writes "still open" rather than "no progress." The prod incident (see #265 / #256) was on gemini-2.5-pro (the boss fallback path). So this eval is a regression pin, with the same justification as the existing sender-suppression-grounding eval, not a present behavior change on the primary model.
Verification
pnpm --filter server check-types ✅
pnpm exec oxlint (changed files) ✅
pnpm check:web-boundaries ✅
pnpm --filter server eval → 4/4 at 100% ✅
Parent: #218 · Related: #256
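For reviewers who want a feel for what a deterministic no-fabricated-progress check might involve, here is a minimal sketch. It is not the code in briefing-notification-reminder.eval.ts: every name below (PROGRESS_ASSERTION_PATTERNS, scoreNoFabricatedProgress, scoreReplyOwedSurfaced) is hypothetical, the phrase list is seeded only from the phrases quoted in this PR, and the evalite and getBossModel wiring is left out entirely. It assumes the scorer receives the composed briefing text as a plain string.

```typescript
// Hypothetical sketch; none of these names are taken from the real eval file.

// Phrases that assert progress or reply latency. All four are quoted in the
// PR text; a real scorer may well use a different or longer list.
const PROGRESS_ASSERTION_PATTERNS: RegExp[] = [
  /\bstill no reply\b/i,
  /\bno progress\b/i,
  /\bhaven['’]?t started\b/i,
  /\bwaiting on you\b/i,
];

// Machine-notification cases: score 1 when the briefing makes no
// progress/status assertion, 0 when any forbidden phrase appears.
function scoreNoFabricatedProgress(briefing: string): number {
  const fabricated = PROGRESS_ASSERTION_PATTERNS.some((p) => p.test(briefing));
  return fabricated ? 0 : 1;
}

// Person-to-person control (assumed shape): score 1 when the reply-owed item
// is still surfaced, approximated here by a case-insensitive mention check.
function scoreReplyOwedSurfaced(briefing: string, expectedMention: string): number {
  return briefing.toLowerCase().includes(expectedMention.toLowerCase()) ? 1 : 0;
}
```

Under these assumptions, a neutral reminder such as "You've got an open task: X" scores 1 on a machine-notification case, while "still no reply on X" scores 0. The control scorer deliberately does not penalize reply-latency phrasing, since on a genuine person-to-person thread that framing is legitimate.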
🤖 Generated with Claude Code