fix(email-triage): don't surface newsletters as approval cards when triage 529s #363

Open
nolanmak wants to merge 1 commit into main from fix/email-triage-retry-leak

@nolanmak nolanmak commented Jun 25, 2026

Owner

Symptom

Newsletter / marketing emails (Substack, Patagonia, etc.) were surfacing as Discord approval "draft" cards even though the is_human_sender filter (#217) is supposed to skip them. The filter itself works — it correctly skipped 8 GitHub notification emails the same day.

Root cause

The leak only triggers when the first triage call hits a transient Claude 529 Overloaded (which surfaces as non-JSON text, so parse_decision fails). On that path:

  1. process_email logged a status='error' action row and returned before the DecisionKind::Reply arm — the only place is_human_sender / is_event_blast run.
  2. That one row caused two failures:
    • It tripped has_open_action (matches status IN ('pending','error')), permanently blocking the poll loop from re-triaging the email.
    • It was scooped by list_retryable_replies (WHERE status='error', no decision-kind filter) into retry_once -> dispatch_reply, which force-drafts an empty body and posts an approval card with no automated-sender gate.

The retry queue was only ever meant for drafting-stage failures (triage already decided reply, a real draft exists, create_draft/post_approval failed). A triage-stage failure has run no gates and has no draft, so treating it as a retryable reply is wrong for both junk mail and human senders: junk bypasses the sender gates, and a human gets an approval card with an empty draft.
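
The two failures can be reproduced in a small in-memory model. This is a sketch under assumptions, not the crate's code: ActionRow, Status, Decision, and triage_failure_row are hypothetical names, and the two functions only mirror the predicates quoted above (status IN ('pending','error') and WHERE status='error').

```rust
// Hypothetical, simplified model of the action table. The real schema and
// store API are not shown in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Pending,
    Error,
    Done,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Reply,
    Skip,
}

#[derive(Debug, Clone)]
pub struct ActionRow {
    pub message_id: String,
    pub status: Status,
    /// `None` models a triage-stage failure: no decision was ever parsed.
    pub decision: Option<Decision>,
    pub draft_body: String,
}

/// Mirrors the described predicate: status IN ('pending', 'error').
pub fn has_open_action(rows: &[ActionRow], message_id: &str) -> bool {
    rows.iter().any(|r| {
        r.message_id == message_id && matches!(r.status, Status::Pending | Status::Error)
    })
}

/// Mirrors the described query: WHERE status = 'error', with no
/// decision-kind filter, so triage-stage failures are returned too.
pub fn list_retryable_replies(rows: &[ActionRow]) -> Vec<&ActionRow> {
    rows.iter().filter(|r| r.status == Status::Error).collect()
}

/// The row the old parse-failure branch would have written (assumed shape).
pub fn triage_failure_row(message_id: &str) -> ActionRow {
    ActionRow {
        message_id: message_id.to_string(),
        status: Status::Error,
        decision: None,
        draft_body: String::new(),
    }
}
```

In this model a single triage_failure_row both satisfies has_open_action (so the poll loop would not re-triage) and appears in list_retryable_replies with no decision and an empty draft, which matches the length-0 signature described below.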

Evidence from the live DB: every leaked pending card had a draftBody length of 0 (no real draft ever generated), while every legitimate human reply had a non-empty draft. Logs showed triage parse failed: 529 Overloaded followed ~one retry-interval later by approval card posted.

Fix

Drop the log_action(Error) call in the triage parse-failure branch. With no action row written, the email stays unread/unprocessed and the next poll re-runs full triage + all gates — identical to how a network Err from reasoner.call (the ? one line above) already behaves. Fixes both the junk leak and the empty-draft brokenness in one edit.
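
A minimal sketch of the control flow before and after the change, assuming a toy Store and a stand-in parser. The names Store, Outcome, parse_decision_stub, and the log_error_on_parse_failure toggle are illustrative only; the real process_email signature is not shown in this PR.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// Triage output could not be parsed.
    TriageFailed,
    /// The sender gate skipped an automated sender.
    Skipped,
    /// A human sender reached the approval stage.
    AwaitingApproval,
}

/// Hypothetical store: just the message ids with an error row or marked complete.
#[derive(Debug, Default)]
pub struct Store {
    pub error_rows: HashSet<String>,
    pub completed: HashSet<String>,
}

/// Stand-in for parse_decision (assumption): anything that does not look
/// like a JSON object fails, e.g. the text of a 529 response.
pub fn parse_decision_stub(raw: &str) -> Option<()> {
    if raw.trim_start().starts_with('{') {
        Some(())
    } else {
        None
    }
}

/// `log_error_on_parse_failure = true` models the old behavior,
/// `false` models the behavior after this PR.
pub fn process_email(
    store: &mut Store,
    message_id: &str,
    sender_is_human: bool,
    triage_output: &str,
    log_error_on_parse_failure: bool,
) -> Outcome {
    if parse_decision_stub(triage_output).is_none() {
        if log_error_on_parse_failure {
            // Old path: this row is what blocked re-triage and fed the retry queue.
            store.error_rows.insert(message_id.to_string());
        }
        // New path: return with nothing persisted, so the next poll re-triages.
        return Outcome::TriageFailed;
    }
    // The sender gate only runs once a decision has been parsed.
    if !sender_is_human {
        store.completed.insert(message_id.to_string());
        return Outcome::Skipped;
    }
    Outcome::AwaitingApproval
}
```

With the toggle off, a first call with "529 Overloaded" leaves the store empty, and a second call with parseable output routes the automated sender to the skip gate.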

Testing

  • New triage_parse_failure_does_not_create_retryable_action reproduces the exact production scenario (a substack.com sender whose first triage 529s) through the real process_email + retry_once paths and asserts: no action row, nothing retryable, no approval card, and a re-triage skip.
  • Full augmentagent-channel-email suite: 86 passed / 0 failed. Both existing retry tests stay green (the fix only touches the triage-stage branch, not the drafting-stage error paths). Release build clean.

Deliberately out of scope (follow-ups)

  • One-time cleanup of the ~11 already-leaked pending dlen=0 rows in the live data.db (operational, not a code change).
  • Optional defense-in-depth gate inside dispatch_reply itself (belt-and-suspenders; not needed for correctness once triage-stage failures stop entering the retry queue).
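
For reference, the optional gate could look roughly like the sketch below. Everything here is hypothetical: is_human_sender_stub is a toy heuristic and not the #217 filter, the Dispatch enum and this dispatch_reply signature are invented for illustration, and the empty-draft check is an extra assumption motivated by the length-0 drafts seen on the leaked cards.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Dispatch {
    PostedCard,
    SkippedAutomated,
    SkippedEmptyDraft,
}

/// Toy stand-in for the sender filter (assumption, not the real heuristic).
pub fn is_human_sender_stub(from: &str) -> bool {
    let lower = from.to_ascii_lowercase();
    !(lower.contains("noreply")
        || lower.contains("no-reply")
        || lower.contains("@substack.com"))
}

/// Hypothetical gated dispatch: re-check the sender and the draft before
/// posting, even if upstream triage is expected to have done so already.
pub fn dispatch_reply(from: &str, draft_body: &str) -> Dispatch {
    if !is_human_sender_stub(from) {
        return Dispatch::SkippedAutomated;
    }
    if draft_body.trim().is_empty() {
        return Dispatch::SkippedEmptyDraft;
    }
    Dispatch::PostedCard
}
```

The sender check runs first so that an automated sender is never reported as a drafting problem.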

Summary by CodeRabbit

  • Bug Fixes

    • Improved handling of triage parse errors so failed emails are no longer marked with an open action.
    • Emails that hit a temporary triage failure will now be reprocessed on the next poll, avoiding incorrect retry behavior.
    • Automated-sender checks now apply correctly after a retry, preventing unnecessary drafting or notifications for skipped messages.
  • Tests

    • Added coverage for triage failure and retry behavior to prevent regressions.

When the triage model returned a transient 529 (surfacing as non-JSON
text), process_email logged a status='error' action row and returned
early — before the DecisionKind::Reply arm where the #217 is_human_sender
gate lives. That row then (1) tripped has_open_action, permanently
blocking the poll loop from re-triaging the email, and (2) was picked up
by list_retryable_replies -> retry_once -> dispatch_reply, which
force-drafted an empty body and posted an approval card with NO
automated-sender gate. Result: newsletters/marketing (substack.com,
patagonia, etc.) whose first triage hit a 529 leaked through as approval
cards — every leaked card had an empty (length-0) draft body, the
signature of the retry path.

The retry queue is for drafting-stage failures (triage already decided
reply, a real draft exists, create_draft/post_approval failed), not
triage-stage failures. A triage-stage failure has run no gates and has no
draft, so it must be re-triaged from scratch. Drop the log_action(Error)
call in the parse-failure branch: with no action row the email stays
unread/unprocessed and the next poll re-runs full triage + all gates —
identical to how a network Err from reasoner.call already behaves.

Adds triage_parse_failure_does_not_create_retryable_action, reproducing
the production scenario (a substack.com sender whose triage 529s) through
the real process_email + retry_once paths: asserts no action row, no
retryable reply, no card, and a re-triage skip.

coderabbitai Bot commented Jun 25, 2026

Walkthrough

GmailChannel::process_email no longer records an error action row when triage parsing fails. A regression test now covers a non-JSON first poll, retry behavior, re-triage on the next poll, automated-sender skipping, and completion without Discord posting.

Changes

Triage parse failure retry handling

| Layer / File(s) | Summary |
| --- | --- |
| Process triage parse failures (crates/augmentagent-channel-email/src/channel.rs) | process_email returns triage parse failures without persisting an error action row. |
| Regression test for retry flow (crates/augmentagent-channel-email/src/channel.rs) | The new test covers a non-JSON first poll, retry_once, second-poll re-triage, automated-sender skip, and completion without Discord posting. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

  • nolanmak/MyAgentAssistant#223: Shares the GmailChannel::process_email triage/retry path and the automated-sender skip flow exercised by this PR’s regression test.

Poem

I hopped through triage, soft and bright,
And left no error row in sight.
A second poll, a careful glance,
Then skip! the newsletter had its chance.
🐰✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly describes the main fix: preventing newsletter approval cards after a triage 529 error. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
crates/augmentagent-channel-email/src/channel.rs (1)

2477-2503: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Assert the full no-state/no-Discord contract.

This test’s comments say “NO action row” and “no Discord posting,” but it only checks no open action and no approval-card posts. Add checks for zero action rows after pass 1 and zero flag notices so the regression can’t slip through via a non-open action or flag notice.

Suggested test hardening
         assert!(
             !store.has_open_action("m-529").unwrap(),
             "a triage-stage failure must NOT create an action row"
         );
+        let action_rows = store
+            .with_conn(|c| {
+                c.query_row(
+                    "SELECT COUNT(*) FROM actions WHERE messageId = 'm-529'",
+                    [],
+                    |r| r.get::<_, i64>(0),
+                )
+            })
+            .unwrap();
+        assert_eq!(action_rows, 0, "triage parse failure must not create any action row");
         assert!(!store.is_email_complete("m-529").unwrap());
         assert_eq!(broker.posts.lock().unwrap().len(), 0);
+        assert_eq!(broker.flag_posts.lock().unwrap().len(), 0);
 
         // --- Retry tick: with no errored row, nothing is retryable, so the leak
         // path (retry_once -> dispatch_reply, ungated) never runs.
         let retried = ch.retry_once().await.unwrap();
@@
         let out2 = ch.poll_once().await.unwrap();
         assert_eq!(out2.skipped, 1, "re-triage routes the automated sender to the skip gate");
         assert_eq!(out2.awaiting_approval, 0);
         assert_eq!(broker.posts.lock().unwrap().len(), 0);
+        assert_eq!(broker.flag_posts.lock().unwrap().len(), 0);
         assert!(store.is_email_complete("m-529").unwrap());
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/augmentagent-channel-email/src/channel.rs` around lines 2477 - 2503,
This test only asserts that there is no open action and no approval-card post,
but it does not fully enforce the no-state/no-Discord contract. Strengthen the
assertions around the existing retry and poll flow in channel.rs by checking
that the store has zero action rows after pass 1 and that no flag notices were
created or posted, using the same store/broker handles already exercised by
ch.retry_once() and ch.poll_once().

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c474a128-11f3-4f36-99c6-74dd2bb34626

📥 Commits

Reviewing files that changed from the base of the PR and between 5779157 and 00253e7.

📒 Files selected for processing (1)
  • crates/augmentagent-channel-email/src/channel.rs
