scribbe

Personal research outreach agent. Drafts personalized emails (and LinkedIn DMs) to researchers, doctors, and AI engineers overnight, ready for your morning review in Notion.

Plan & design rationale: /Users/theomalaper/.claude/plans/i-like-your-idea-jolly-porcupine.md

What it does

While you sleep, scribbe takes researcher targets you've added to a Notion DB and runs each through a 6-stage pipeline:

Research — pulls recent work from arXiv, PubMed, GitHub, lab page
Reading-prep — produces a study-grade brief on the researcher (you read this to engage substantively in any reply)
Draft (×2 variants) — Opus produces two distinct draft angles
Critique — combined critic flags authenticity / factual / question-quality issues
Revise — applies critic feedback, preserving voice
Cross-check — emits a claim-by-claim verification map so you can confirm nothing is hallucinated before sending

In the morning you review in Notion: prune to one variant, edit, approve. The next night, scribbe creates Gmail Drafts from approved entries (you click Send manually). Sent emails feed back into the tone corpus (gated by an edit-distance threshold), so the system learns your voice over time.

Auto follow-ups: 14 days after sending, scribbe checks Gmail for a reply. If none, it drafts a low-pressure follow-up into the same thread.
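A minimal sketch of the follow-up eligibility check. It assumes the decision is a pure function of the Targets fields described in the full setup (sent_at, reply_status, followup_status, skip_followup, followup_days_override). The function name, and the rule that any followup_status other than "none" blocks a new follow-up, are assumptions rather than the agent's actual logic. The Gmail reply lookup is omitted; reply_status stands in for its result.

```python
from datetime import date, timedelta
from typing import Optional

# Global cadence from the README; the real value is followup_days_threshold in agent-config.yml.
DEFAULT_FOLLOWUP_DAYS = 14


def followup_due(
    sent_at: date,
    today: date,
    reply_status: str,
    followup_status: str = "none",
    skip_followup: bool = False,
    followup_days_override: Optional[int] = None,
    global_days: int = DEFAULT_FOLLOWUP_DAYS,
) -> bool:
    """Return True if a follow-up draft should be queued for this target (hypothetical helper)."""
    if skip_followup or reply_status == "replied":
        return False
    if followup_status != "none":
        # Already needed / approved / drafted / sent / not-needed: don't trigger again.
        return False
    # A per-target override wins over the global cadence.
    days = followup_days_override if followup_days_override is not None else global_days
    return today - sent_at >= timedelta(days=days)
```

For example, an email sent on 1 January with no reply becomes due on 15 January under the default 14-day cadence, or on 8 January if followup_days_override is 7.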

Lean activation path (~45 min to first draft) ← start here

The full design has 5 Notion DBs and several integrations. You don't need them all to prove the core loop works. Activate the minimum first; layer more in once you trust it.

Minimum viable setup

Two Notion DBs only: Targets and ToneExamples. Skip Suggestions, FeedbackLog, RunLog for now.
- Targets minimal fields: name, channel, recipient_email, primary_link, why_interested (optional), primary_status, current_draft, research_brief, sent_version. The follow-up / quality_score / inferred-flag / suggestion-origin fields can be added later.
- ToneExamples: full schema as designed. This one is the high-leverage DB for voice match — don't skip.
Notion integration: create it, connect to both DBs, paste token into .env, paste both DB IDs into agent-config.yml. Leave feedback_log_db_id, run_log_db_id, suggestions_db_id as "REPLACE_ME" — the orchestrator skips those stages gracefully.
Personal config: edit profile.md and signature.txt (~10 min replacing TODOs).
Tone seed: paste 5-7 of your best past outreach emails into ToneExamples DB with channel tags.
Skip Gmail OAuth entirely if you have the Gmail MCP loaded (check /mcp in Claude Code — if you see Gmail tools, you're set; the orchestrator uses them automatically). Skip GitHub PAT for now too; the GitHub fetcher falls back to 60/hr unauthenticated rate limit (fine for a few targets/day).
Smoke test with one queued target.

That's it. ~45 min. You'll have a working "queue researcher → get reading-prep + draft by morning" loop.

What you skip in lean activation (what each adds when you turn it on later)

FeedbackLog DB — automatic edit-diff learning. Without it, the agent uses ToneExamples alone; you don't get the corpus self-extension or "X% of your edits this week were tone fixes" signal. Add it once you have ~10 sent emails whose edits it can learn from.
RunLog DB — observability dashboard in Notion (calendar, charts). Without it, agent writes the same data to runs.fallback.log locally. Add when you want to see trends.
Suggestions DB + active_topics — nightly candidate discovery from arXiv/PubMed/lab pages. Without it, you queue all targets manually. Add when you want a pipeline of new candidates without active searching.
Follow-up fields on Targets DB — auto-followup after N days. Without these fields and Gmail readonly access, you handle followups manually. Add when you start sending real emails and want auto-followup pressure.
Gmail OAuth — auto-create drafts directly in your Gmail. Without it, copy-paste from Notion manually. Add when copy-paste friction starts annoying you.

The orchestrator detects which DBs/integrations are configured and skips the corresponding stages cleanly — you don't need to "uninstall" anything to run lean.
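One way to picture that detection: treat any optional DB ID that is missing or still "REPLACE_ME" as "off" and skip the stage that needs it. The three config keys come from the lean setup above; the stage labels and the function are hypothetical, and the real orchestrator may decide differently.

```python
PLACEHOLDER = "REPLACE_ME"

# Hypothetical mapping from optional notion.* config keys to the stages they unlock.
OPTIONAL_STAGES = {
    "feedback_log_db_id": "feedback-learning",
    "run_log_db_id": "runlog-write",  # when off, run data goes to runs.fallback.log instead
    "suggestions_db_id": "suggest",
}


def enabled_stages(notion_cfg: dict) -> list:
    """Return the optional stages whose Notion DB ID is actually configured."""
    stages = []
    for key, stage in OPTIONAL_STAGES.items():
        value = notion_cfg.get(key)
        if value and value != PLACEHOLDER:
            stages.append(stage)
    return stages
```

With only Targets and ToneExamples configured, this returns an empty list, so the run does research, drafting, critique, revision and cross-check only.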

Full v1 setup checklist (when you're ready for everything)

1. Notion workspace

Create 5 databases in a single Notion workspace (e.g. a page called "scribbe").

Targets — primary working surface. Properties:

name (title) — required
channel (select: email / linkedin / other) — required
recipient_email (email) — required for email channel (fill it in when queuing, or leave it empty and let the agent suggest candidates)
recipient_linkedin (URL) — required for linkedin channel
primary_link (URL) — paper / lab page / profile; required if neither recipient field is filled
role (text) — optional; agent enriches if empty
field (select: AI / clinical / AI-medicine / other) — optional; agent enriches if empty
why_interested (text) — optional; agent generates a curiosity-frame opening from profile.md if empty
custom_questions (text) — optional
recipient_email_candidates (rich text) — agent populates when recipient_email is empty
primary_status (select) — values: queued, researching, draft-ready, approved, drafted-in-gmail, sent, needs-review, skip
followup_status (select): none, needed, approved, drafted-in-gmail, sent, not-needed
reply_status (select): unknown, replied, no-reply
followup_days_override (number) — optional, overrides global follow-up cadence for this target
skip_followup (checkbox) — optional, disables auto-followup for this target
research_brief (rich text) — agent-written; your study material (papers + concepts + sources)
current_draft (rich text) — agent-written; 2 variants delimited; you prune to 1 before approving
verification_map (rich text) — claim-grounding map per variant
quality_score (number) — best variant's % high-confidence claims
sent_version (rich text) — paste your final sent body here after sending
followup_draft (rich text), followup_sent_version (rich text)
sent_at, reply_received_at, followup_sent_at (date)
thread_id (text) — Gmail thread id, set on draft creation
duplicate_of (relation to Targets), last_attempt_at (date), attempt_count (number)
role_inferred (checkbox) — true if agent enriched the role
field_inferred (checkbox) — true if agent enriched the field
suggestion_origin (relation to Suggestions) — set if this Target was promoted from a Suggestions entry
created_at, target_send_date (date)
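To make quality_score concrete, here is a sketch that assumes the verification map has already been parsed into one list of per-claim confidence labels per variant ("high", "low", "NOT IN BRIEF"). The README doesn't specify the map's text format, so the input shape and the 0-100 scale are assumptions.

```python
def quality_score(verification_map: dict) -> float:
    """Best variant's percentage of claims rated high-confidence (hypothetical parser output).

    verification_map maps a variant label to its per-claim confidence labels,
    e.g. {"v1": ["high", "low"], "v2": ["high", "high"]}.
    """
    best = 0.0
    for labels in verification_map.values():
        if not labels:
            continue  # a variant with no extracted claims contributes nothing
        pct = 100.0 * sum(1 for c in labels if c == "high") / len(labels)
        best = max(best, pct)
    return best
```

So a draft pair where Variant 1 grounds half its claims and Variant 2 grounds three of four would score 75.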

ToneExamples — subject (title), body (rich text), recipient_context (text), channel (select), type (select: cold-outreach / follow-up / thanks / other)

FeedbackLog — target (relation to Targets), channel (select), original_passage (text), final_passage (text), inferred_reason (select: tone / factual / question / length / structural / personal-context), created_at (date)

RunLog — run_at (date), targets_picked (number), targets_succeeded (number), targets_failed (number), sonnet_calls (number), opus_calls (number), wall_clock_seconds (number), errors (rich text), notes (rich text)

Suggestions — populated nightly by the auto-suggest stage AND on-demand by the /scribbe-search "<topic>" slash command:

name (title), topic (text — the topic that surfaced this candidate), match_score (number)
match_rationale (text) — one-sentence why-this-person
connection_to_targets (text) — names of existing Targets this person co-authors with
role_inferred, field_inferred (text)
primary_link (URL), channel_inferred (select)
matching_paper_titles (text — list of papers driving the match, empty for directory-only candidates)
sources (text — list of sources that produced the match: arxiv, pubmed, directory:<name>)
status (select): candidate, promoted, skip
created_at, promoted_at (date)

Sources for nightly suggestions (configured in agent-config.yml under topic_search):

arXiv recent papers (free API, clean)
PubMed recent papers (free API, clean)
Institution directory pages — top med-AI / AI labs publish open faculty directories with research summaries; the agent scrapes these via WebFetch and extracts faculty whose research mentions active topics. Default list includes Stanford HAI, MIT CSAIL, BAIR, Mt. Sinai AIM, Stanford AIMI; edit the institution_directories config list to taste.

LinkedIn is not supported as a discovery source (no clean people-search API; scraping is fragile and ToS-risky). It IS supported as an outbound channel — provide a profile URL on a Target with channel: linkedin and the agent drafts a DM you copy-paste.

2. Notion integration

Create a Notion integration: https://www.notion.so/my-integrations
Connect the integration to your scribbe page (so it has access to all 5 DBs)
Copy the secret token; save as NOTION_TOKEN in .env
Find each DB's ID from its URL (https://www.notion.so/<workspace>/<dbid>?v=...)
Paste DB IDs into agent-config.yml under notion.*_db_id (you'll need to add suggestions_db_id — see config notes)
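If you'd rather not pick the ID out by eye, a small helper can do it: the database ID is the 32-hex-character string at the end of the URL path, and the ?v=... query string is the view ID, not the database ID. This is a convenience sketch, not part of scribbe.

```python
import re
from urllib.parse import urlparse

# 32 hex characters at the very end of the (dash-stripped) last path segment.
_ID_RE = re.compile(r"([0-9a-fA-F]{32})$")


def notion_db_id(url: str) -> str:
    """Extract the 32-character database ID from a Notion database URL."""
    # Only look at the path; the ?v=... query holds the *view* ID.
    last_segment = urlparse(url).path.rstrip("/").split("/")[-1]
    # Pages titled "My-DB-<id>" put the ID after the title, so drop dashes and match at the end.
    match = _ID_RE.search(last_segment.replace("-", ""))
    if not match:
        raise ValueError(f"no 32-character database ID found in {url!r}")
    return match.group(1).lower()
```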

3. Notion views (recommended)

Targets: kanban grouped by primary_status so you can see queued / draft-ready / approved at a glance
Suggestions: kanban grouped by status (candidate / promoted / skip); secondary table view filtered by topic so you can see candidates per search
RunLog: table view (chronological), plus a calendar view (which days ran), plus a chart view (opus_calls + sonnet_calls over time, targets_failed over time)

4. Gmail OAuth

You need both gmail.compose (to create drafts) and gmail.readonly (to detect replies for follow-ups).

Create a Google Cloud project, enable Gmail API
Create OAuth credentials (Desktop app type)
Run an OAuth flow once locally to get a refresh token (any Python google-auth-oauthlib script works)
Save GMAIL_CLIENT_ID, GMAIL_CLIENT_SECRET, GMAIL_REFRESH_TOKEN to .env

(Optional simplification: if OAuth is taking >1 hour to set up, the agent can use the already-loaded Gmail MCP from your Claude Code environment to create drafts, in which case you don't need the Gmail OAuth credentials in .env.)
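A minimal sketch of the one-time refresh-token step, using google-auth-oauthlib's InstalledAppFlow for a Desktop-app client. It reads GMAIL_CLIENT_ID and GMAIL_CLIENT_SECRET from the environment (the same names as in .env); the helper names are hypothetical. Call get_refresh_token() once, complete the browser consent, and paste the result into .env as GMAIL_REFRESH_TOKEN.

```python
import os

# The two scopes scribbe needs: compose (create drafts) + readonly (detect replies).
SCOPES = [
    "https://www.googleapis.com/auth/gmail.compose",
    "https://www.googleapis.com/auth/gmail.readonly",
]


def client_config(client_id: str, client_secret: str) -> dict:
    """Build an 'installed' (Desktop app) client config without a credentials JSON file."""
    return {
        "installed": {
            "client_id": client_id,
            "client_secret": client_secret,
            "auth_uri": "https://accounts.google.com/o/oauth2/auth",
            "token_uri": "https://oauth2.googleapis.com/token",
        }
    }


def get_refresh_token() -> str:
    """Run the local consent flow once and return the refresh token."""
    # Imported lazily so this file loads even where the library isn't installed yet.
    from google_auth_oauthlib.flow import InstalledAppFlow

    flow = InstalledAppFlow.from_client_config(
        client_config(os.environ["GMAIL_CLIENT_ID"], os.environ["GMAIL_CLIENT_SECRET"]),
        SCOPES,
    )
    # Opens a browser; prompt="consent" asks Google to issue a refresh token even on re-auth.
    creds = flow.run_local_server(port=0, prompt="consent")
    return creds.refresh_token
```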

5. GitHub PAT

Public repo read access only. https://github.com/settings/tokens → save as GITHUB_TOKEN in .env (raises rate limit from 60/hr to 5000/hr).
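The token-or-fallback behaviour can be as simple as adding an Authorization header only when GITHUB_TOKEN is set. This sketch shows the idea; fetchers/github.py may implement it differently, and the function name is hypothetical.

```python
import os
from typing import Optional


def github_headers(token: Optional[str] = None) -> dict:
    """Headers for GitHub REST calls; authenticated only if a token is available."""
    token = token if token is not None else os.environ.get("GITHUB_TOKEN")
    headers = {
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    if token:
        # With a PAT the REST limit is 5,000 requests/hour; without it, 60/hour per IP.
        headers["Authorization"] = f"Bearer {token}"
    return headers
```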

6. Fill in personal config

Edit profile.md — replace [TODO: ...] placeholders with your real info (school, projects, voice notes, links). The drafter reads this every run.
Edit signature.txt — your fixed email signature. Appended verbatim to email drafts.
Paste 5-10 of your best past outreach emails into the Notion ToneExamples DB. Mix email + LinkedIn entries with channel tags. Quality > quantity here — these are the tone seeds.

7. Smoke test

Before scheduling, run the agent manually with one queued target.

Add ONE entry to Targets DB (e.g. someone you've already cold-emailed before so you know what a good draft looks like). Set primary_status = "queued", channel = "email".

Invoke the agent in Claude Code:

Use the scribbe subagent. Run the nightly pipeline once.

Check Notion: the target should advance to draft-ready, with research_brief, current_draft (2 variants), verification_map, and quality_score populated.
Read the brief — does it teach you something useful about the researcher?
Read the drafts — do they sound like you?
Read the verification map — are all substantive claims grounded?

If any step fails, check runs.fallback.log and the RunLog DB for error details.

8. Schedule

Once smoke test passes:

Use the schedule skill to create a scheduled task that runs the scribbe agent at 02:00 daily.

The scheduled task fires nightly; you review drafts in Notion in the morning.

Using it day-to-day

Evening (~5 min): Add new targets to Targets DB. Minimum required: name, channel, AND one of {primary_link, recipient_email, recipient_linkedin}. Everything else is optional — the agent enriches role/field from research findings (tagged *_inferred), and falls back to a curiosity-frame opening if why_interested is empty. Set primary_status: queued.

Nightly candidate discovery: Set 1-3 topics you want fresh candidates for in agent-config.yml under topic_search.active_topics (e.g. ["biomedical AI", "clinical NLP"]). Each nightly run searches recent arXiv + PubMed papers in those topics, scores authors by relevance + connection to existing Targets, writes new candidates to the Suggestions DB (capped per topic + globally to avoid flooding). You browse Suggestions in the morning alongside your draft-ready Targets, promote interesting ones to Targets manually (copy fields into Targets, set primary_status: queued, optionally link suggestion_origin for audit).

Ad-hoc topic search: For one-off searches outside your active topics, run /scribbe-search "<topic>" in Claude Code. Same scoring and output, runs immediately rather than waiting for the next nightly fire.

The "connection to existing Targets" signal — researchers who co-author with people already in your Targets DB get a connection_boost and the connected names listed in connection_to_targets — is the differentiator. Listing senior researchers in a field is easy; finding the ones one degree from people you already talk to is harder, and that's what scribbe surfaces.
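A sketch of how relevance, the connection boost and the caps could combine. It assumes the boost is simply added to a 0-1 relevance score and that caps are applied in score order; the weight, cap values and class shape are illustrative placeholders, not scribbe's actual config.

```python
from dataclasses import dataclass, field
from typing import List, Set

CONNECTION_BOOST = 0.2  # hypothetical weight; the real one would live in agent-config.yml
PER_TOPIC_CAP = 5       # hypothetical caps
GLOBAL_CAP = 10


@dataclass
class Candidate:
    name: str
    topic: str
    relevance: float  # 0..1 topic match from papers / directory text (assumed scale)
    coauthors: Set[str] = field(default_factory=set)
    match_score: float = 0.0
    connection_to_targets: List[str] = field(default_factory=list)


def rank_candidates(cands: List[Candidate], target_names: Set[str],
                    per_topic_cap: int = PER_TOPIC_CAP,
                    global_cap: int = GLOBAL_CAP) -> List[Candidate]:
    """Score candidates, then keep the best ones within per-topic and global caps."""
    for c in cands:
        # Co-authors already in your Targets DB become connection_to_targets.
        c.connection_to_targets = sorted(c.coauthors & target_names)
        c.match_score = c.relevance + (CONNECTION_BOOST if c.connection_to_targets else 0.0)
    kept: List[Candidate] = []
    per_topic: dict = {}
    for c in sorted(cands, key=lambda x: x.match_score, reverse=True):
        if len(kept) >= global_cap:
            break
        if per_topic.get(c.topic, 0) >= per_topic_cap:
            continue
        per_topic[c.topic] = per_topic.get(c.topic, 0) + 1
        kept.append(c)
    return kept
```

Under these assumptions, a slightly less relevant author who co-writes with one of your Targets outranks a slightly more relevant stranger, which is the behaviour the paragraph above describes.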

Morning (~10-15 min): Open the Scribbe page in Notion → "✏️ Drafts to review" section → click a card.

The card opens as a full document with formatted sections:

📚 Research brief — your study material
✏️ Email — Variant 1 with subject candidates and editable body
✏️ Email — Variant 2 with subject candidates and editable body
✓ Verification map — every claim and where it grounds in the brief
Workflow — quick checklist at the bottom

What you do:

Read the research brief
Pick the variant you prefer; delete the other variant's section entirely from the page body
Edit the chosen variant's body inline (this is where the magic happens — your edits become the email)
Pick one subject from the candidates; delete the other two bullet points
Skim the verification map — anything marked "low" or "NOT IN BRIEF" needs your attention
Set primary_status to approved (in the right-side property panel)
Use < > arrows at the top of the card to move to the next draft

Don't edit current_draft (the property in the right panel). That's the immutable baseline the agent uses for diff/learning. Your edits live in the page body. The agent reads the page body when creating Gmail Drafts, and diffs the baseline vs. your final sent_version for FeedbackLog.

Throughout the day: Open Gmail Drafts (the 02:00 run after you approve an entry creates its draft). One last skim, click Send (or Schedule send for tomorrow morning, Gmail's native button). Back in Notion: set primary_status = "sent", paste your final sent body into sent_version, and fill in sent_at.
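For reference, a Gmail draft is created by sending a base64url-encoded RFC 2822 message to the Gmail API's users.drafts.create method, optionally with a threadId so a follow-up lands in the original thread. This sketch only builds that request body; the function name is hypothetical, and it assumes signature.txt is appended after a blank line.

```python
import base64
from email.message import EmailMessage
from typing import Optional


def build_draft_request(to: str, subject: str, body: str, signature: str,
                        thread_id: Optional[str] = None) -> dict:
    """Build the JSON body for Gmail's users.drafts.create call."""
    msg = EmailMessage()
    msg["To"] = to
    msg["Subject"] = subject
    # signature.txt is appended verbatim to email drafts.
    msg.set_content(body.rstrip() + "\n\n" + signature)
    raw = base64.urlsafe_b64encode(msg.as_bytes()).decode("ascii")
    message = {"raw": raw}
    if thread_id:
        # Follow-ups reuse the Target's stored thread_id.
        message["threadId"] = thread_id
    return {"message": message}
```

Gmail's threading rules also expect matching Subject and In-Reply-To/References headers for a reply to join a thread; those are left out here for brevity.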

How feedback learning works (fully automatic): You never write FeedbackLog rows yourself. After you mark a Target sent with sent_version filled in, the next nightly run automatically:

Diffs current_draft (the frozen baseline — what the agent originally wrote) vs sent_version (what you actually sent — including all your page-body edits)
Classifies each meaningful change by reason (tone / factual / question / length / structural / personal-context) and writes one FeedbackLog row per change
If you edited > 20% of characters, ALSO auto-promotes your sent_version into ToneExamples DB so your voice corpus grows
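The README doesn't say which diff algorithm scribbe uses. As a stand-in, Python's difflib can produce both pieces: word-level (original_passage, final_passage) pairs for FeedbackLog rows, and an approximate "fraction of characters edited" (1 minus the character-level similarity ratio, which is not a true edit distance) for the promotion gate. The LLM classification of each change is not shown; all names here are hypothetical.

```python
import difflib
from typing import List, Tuple

PROMOTE_THRESHOLD = 0.20  # "> 20% of characters" from the README


def changed_passages(draft: str, sent: str) -> List[Tuple[str, str]]:
    """(original_passage, final_passage) pairs for every non-equal word-level diff chunk."""
    a, b = draft.split(), sent.split()
    sm = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in sm.get_opcodes()
        if tag != "equal"
    ]


def edited_fraction(draft: str, sent: str) -> float:
    """Approximate share of the text you changed: 1 - character-level similarity."""
    return 1.0 - difflib.SequenceMatcher(None, draft, sent, autojunk=False).ratio()


def should_promote_to_tone_examples(draft: str, sent: str) -> bool:
    return edited_fraction(draft, sent) > PROMOTE_THRESHOLD
```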

The drafter on subsequent runs reads top-K most relevant FeedbackLog entries (channel-matched, excluding personal-context reason) as in-context corrections — so your edit patterns shape future drafts without you ever curating examples by hand.
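"Most relevant" isn't defined here, and embedding retrieval is deferred to v1.5, so this sketch swaps in plain word overlap (Jaccard) between each row's original_passage and the current drafting context. The channel match and the personal-context exclusion come from the paragraph above; the scoring and names are assumptions.

```python
from typing import Dict, List


def _words(text: str) -> set:
    return set(text.lower().split())


def top_k_feedback(entries: List[Dict], channel: str, context: str, k: int = 5) -> List[Dict]:
    """Pick up to k FeedbackLog rows to show the drafter as in-context corrections."""
    ctx = _words(context)
    eligible = [
        e for e in entries
        if e["channel"] == channel and e["inferred_reason"] != "personal-context"
    ]

    def overlap(e: Dict) -> float:
        # Jaccard similarity as a stand-in relevance score.
        w = _words(e["original_passage"])
        union = w | ctx
        return len(w & ctx) / len(union) if union else 0.0

    return sorted(eligible, key=overlap, reverse=True)[:k]
```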

Why two surfaces (property + page body)? The property is the agent's frozen baseline, used for measuring how much you changed. The page body is your editing surface — formatted nicely so it reads like a document, not a database form. The agent reads the page body when creating Gmail Drafts (that's how your edits get sent). It diffs the property vs your sent_version (that's how it learns your voice).

Saying no to suggestions you don't want:

Per-suggestion: in the Suggestions DB, set status: skip on any candidate you don't want. The agent dedupes against skipped Suggestions and won't re-propose them for the same topic.
Permanently across all topics: create a row in Targets DB with the person's name + primary_status: skip (no other fields needed). The suggest stage filters ALL Targets entries regardless of status, so they'll never appear as a candidate again.
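The two rules above amount to a simple filter: any Targets row blocks a name for every topic, while a skipped Suggestion blocks it only for its own topic. A sketch, assuming names are matched case- and whitespace-insensitively (the README doesn't say how names are compared):

```python
from typing import Dict, Iterable, List, Tuple


def _norm(name: str) -> str:
    return " ".join(name.lower().split())


def filter_new_candidates(candidates: Iterable[Tuple[str, str]],
                          targets: List[Dict],
                          suggestions: List[Dict]) -> List[Tuple[str, str]]:
    """Drop (name, topic) candidates that must never be re-proposed."""
    # Any Target row, whatever its primary_status (including "skip"), blocks the name for all topics.
    blocked = {_norm(t["name"]) for t in targets}
    # A skipped Suggestion blocks the name only for the topic it was skipped under.
    skipped = {(_norm(s["name"]), s["topic"]) for s in suggestions if s["status"] == "skip"}
    return [
        (name, topic) for name, topic in candidates
        if _norm(name) not in blocked and (_norm(name), topic) not in skipped
    ]
```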

Once a week or so: Check the RunLog calendar/chart view to see how the agent's been doing.

Tweaking

Quality not great? Edit the relevant prompt in agent-prompts/<stage>.md. Subagents read the prompt fresh on each invocation.
Want a different tone for a specific field? Add recipient_context clues to your ToneExamples entries; the drafter retrieves channel-matched examples.
Want more/fewer draft variants? Edit agent-config.yml → variant_count (1-3).
Want a different follow-up cadence? Edit agent-config.yml → followup_days_threshold. For per-target overrides, set followup_days_override on the Target itself.
Quota too high on Max 5x? Drop variant_count from 2 to 1. Halves the draft/critique/revise/cross-check work per target.

Project structure

scribbe/
├─ profile.md                    ← you (hand-edited)
├─ signature.txt                 ← your fixed email signature
├─ agent-config.yml              ← thresholds, models, Notion DB IDs
├─ agent-prompts/                ← stage system prompts (subagents read these)
│  ├─ research.md
│  ├─ drafter.md
│  ├─ drafter-career-template.md
│  ├─ critic.md
│  ├─ revise.md
│  ├─ cross-check.md
│  ├─ followup-drafter.md
│  ├─ feedback-classify.md
│  └─ recipient-email-extract.md
├─ fetchers/                     ← Python source fetchers (Bash-invoked by orchestrator)
│  ├─ arxiv.py
│  ├─ pubmed.py
│  └─ github.py
├─ runs.fallback.log             ← only written when Notion RunLog write fails
├─ .env                          ← gitignored; credentials
├─ .env.example                  ← template
├─ .gitignore
└─ README.md                     ← this file

~/.claude/agents/
├─ scribbe.md                    ← orchestrator (sonnet, scheduled nightly)
├─ scribbe-research.md           ← opus (rich brief: papers + concepts + sources)
├─ scribbe-drafter.md            ← opus
├─ scribbe-critic.md             ← opus
├─ scribbe-revise.md             ← opus
├─ scribbe-cross-check.md        ← sonnet
├─ scribbe-followup.md           ← opus
├─ scribbe-feedback.md           ← sonnet
├─ scribbe-email-extract.md      ← sonnet
└─ scribbe-suggest.md            ← opus (on-demand /scribbe-search)

~/.claude/commands/
└─ scribbe-search.md             ← /scribbe-search "<topic>" slash command

Don't forget to also add a notion.suggestions_db_id line to agent-config.yml after you create the Suggestions DB.

Quota expectations on Claude Max

Per nightly run with default config (2 variants, 5 targets + occasional follow-up):

~35-45 Opus calls + ~10 Sonnet calls
~5-7 minutes wall-clock

Per week (7 nights):

~250-300 Opus calls + ~70 Sonnet calls

Comfortably inside Max 5x per-block limits. Max 20x users have plenty of headroom.
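A rough way to reproduce these numbers, assuming each target costs one Opus research/brief call, one Opus draft, critique and revise call per variant, and one Sonnet cross-check per variant, plus one Opus call per follow-up. Orchestrator, feedback, email-extract and suggest calls are not counted, so treat the result as a floor rather than a forecast.

```python
def estimate_calls(targets: int, variants: int = 2, followups: int = 0) -> tuple:
    """Rough (opus, sonnet) call count for one nightly run, under the stated assumptions."""
    # Per target: 1 research call + (draft + critique + revise) per variant, all Opus.
    opus = targets * (1 + 3 * variants) + followups  # one Opus call per follow-up draft
    # Per target: one Sonnet cross-check per variant.
    sonnet = targets * variants
    return opus, sonnet
```

With 5 targets and 2 variants this gives 35 Opus and 10 Sonnet calls, the low end of the estimate above; dropping to 1 variant gives 20 and 5.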

Deferred / out of scope (v1)

See the design doc for the full list. Highlights:

LinkedIn auto-scraping as a research source (LinkedIn is supported as a channel — manual paste — but not as an information source)
Auto-send (always ends in your manual click)
Reply detection on LinkedIn channel (no reliable thread API)
Embedding-based retrieval over FeedbackLog/ToneExamples (planned for v1.5)
Stale-draft auto-refresh, voice-drift audit, QuestionBank (v1.5+)

Troubleshooting

Drafts sound generic → check ToneExamples is populated and channel-matched. Add more entries. Edit profile.md voice notes.
Drafts hallucinate facts → check verification_map in Notion; tighten the factual critic in agent-prompts/critic.md if needed.
Agent crashes mid-pipeline → the next run retries it (a target only advances out of queued after a successful publish). After 3 failed attempts, the target moves to needs-review.
Notion write errors → check runs.fallback.log. Common cause: the integration token doesn't have access to one of the 5 DBs.
Email candidate suggestions are bad → the lab page may not expose the email pattern. Just fill in recipient_email manually.
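The retry rule in the first item can be sketched as a small state transition. It assumes attempt_count is incremented only on failure and ignores the intermediate researching status; both are assumptions, as is the function name.

```python
MAX_ATTEMPTS = 3  # "After 3 attempts, target moves to needs-review"


def status_after_run(primary_status: str, attempt_count: int, succeeded: bool) -> tuple:
    """Return (new_primary_status, new_attempt_count) for a target picked up by a run."""
    if primary_status != "queued":
        return primary_status, attempt_count  # only queued targets are picked up
    if succeeded:
        return "draft-ready", attempt_count
    attempts = attempt_count + 1
    if attempts >= MAX_ATTEMPTS:
        return "needs-review", attempts
    return "queued", attempts  # stays queued, so the next run retries it
```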
