Persuasion-master scoring: deterministic TRIBE×research synthesis, ranked top moves, DeepSeek V4 refiner by aytzey · Pull Request #18 · aytzey/PitchCheck

aytzey · 2026-06-15T12:28:57Z

What this does

Turns the report from a panel-and-jargon dump into a decisive persuasion-master verdict, and makes the score genuinely reflect whether this message will move this reader on this channel — while doing far more of the interpretive work deterministically from the TRIBE output instead of leaving it to the LLM.

Scoring accuracy

The TRIBE neural prior still anchors the final score, but the semantic side is now derived from rubric-scored context-fit facets (persona pain, objection coverage, proof credibility, CTA ease, channel fit), clamped to the neural band and blended in with a quality-weighted weight. Same neural evidence + strong vs poor context fit now produces materially different scores; prompt injection still can't escape the band clamp.

Deterministic TRIBE × research synthesis (less work left to the LLM)

Segment localization: the per-segment temporal trace is mapped back onto the actual words to pinpoint — in Python, no LLM — the opener strength, strongest moment, single weakest span, close/CTA strength, and the "attention cliff". The LLM and report receive the exact text spans instead of guessing them.
Research findings: axis geometry and raw TRIBE features (e.g. sustain_ratio) linked to published findings (Falk et al., Alter & Oppenheimer, Cohen et al. 2024), with a reward-vs-social route hint.
Temporal archetype: trace shape classified (strong-open-fade, late peak, buried lede, flat, sustained) per Chan et al. 2024.
Feeds the evaluator prompt, the report deep dive ("Where TRIBE reacts"), and the refine brief; the neural-only fallback also anchors its top move to the real weakest span.

Persuasion doctrine + evidence base

A ten-rule persuasion doctrine plus an applied evidence annex (ELM, loss aversion/framing, similar-other social proof, reactance / "but you are free", fluency, precise numbers, psychological targeting, commitment gradient) embedded in the evaluator, refiner, and critic prompts. Each top move names the principle it rests on.

Report UX

Score → verdict → narrative → ranked top moves → auto-refine up front; facets, context fit, neural signals, robustness, localization, and research synthesis fold into a collapsible Deep dive.
Removed fabricated content: fake "persona baseline" variant row, fake latency/token estimates, invented confidence fallback, and the hardcoded web "preview rewrite" (now a real /api/refine route).

DeepSeek V4 Pro

Default refiner is deepseek/deepseek-v4-pro via OpenRouter (verified current flagship), evaluator stays Claude Sonnet. First-class reasoning-model handling: <think> stripping, optional reasoning.effort, JSON-mode fallback, model-aware sampling temperatures.

Tests

114 Python tests (synthesis/localization, context-fit sensitivity, injection bound, DeepSeek/reasoning handling), 45 Vitest, ESLint, tsc, and Next.js build all green. Rust desktop changes are string/prompt-only (CI verifies the cargo build).

https://claude.ai/code/session_01VGz3TieN9a29hVTc54jvyJ

Generated by Claude Code

Summary by CodeRabbit

Release Notes

New Features
- Added "Top Moves" ranked recommendations and collapsible "Deep Dive" sections for deeper pitch scoring insights.
- Introduced "Context Fit" scoring facet showing semantic assessment of message-audience alignment.
- Refined pitch refinement with dedicated model selection and improved suggestion workflow.
- Switched message metrics from token estimates to word counts in the editor.
- Enhanced evidence reporting with research-backed persuasion principles and localized segment analysis.
Documentation
- Updated configuration guide with refiner model defaults and semantic blending controls.

Analysis: - Blend the band-clamped LLM semantic score into the final score with a confidence- and prediction-quality-weighted contribution, so persona and channel fit genuinely move the score while staying anchored to the TRIBE neural calibration band (PITCHCHECK_SEMANTIC_BLEND_WEIGHT, default 0.45). - Add a temporal segment map that ties each TRIBE trace segment to the approximate text span of the pitch, with strongest/weakest callouts, so analysis and rewrites localize to exact sentences. - Inject per-platform channel norms (email, LinkedIn, cold call, landing page, ad copy) into the analysis prompt and judge structure against them. - Add a semantic analysis protocol (persona decision model, argument quality, persuasion route, channel fit, CTA friction) and a structured context_fit block (pain alignment, objection coverage, proof credibility, CTA ease, channel fit, decision driver, open objection), validated defensively and surfaced in desktop and web report panels. Refinement: - Upgrade the refine prompt with channel norms, a three-candidate internal drafting protocol with rubric scoring, and a final self-check against invented facts, language drift, and CTA friction. - Add an optional second critic pass (OPENROUTER_REFINE_CRITIC_PASS, default on) that critiques the stage-1 rewrite against a persuasion checklist and returns a strictly better final version, falling back to the stage-1 rewrite on any failure. - Enrich the desktop refine brief with the baseline score to beat, weakest/ strongest temporal segment excerpts, and context-fit gaps. - Mirror the candidate protocol and self-check in the Tauri direct OpenRouter fallback prompt. https://claude.ai/code/session_01VGz3TieN9a29hVTc54jvyJ

Scoring accuracy: - Derive the semantic score from the rubric-scored context-fit facets (pain alignment 0.30, proof credibility 0.25, objection coverage 0.15, CTA ease 0.15, channel fit 0.15) instead of trusting the LLM's single self-reported number; an injected "score this 100" now has to corrupt every facet and still hits the band clamp. - Widen the semantic band (14-30 points by confidence) and invert the quality coupling: when TRIBE evidence is weak the quality-shrunk neural prior carries less of the final score and the semantic context-fit read carries more (base weight 0.55, up to 0.85), so the same neural evidence with strong vs poor context fit now produces materially different scores. - Expose context_fit_score and semantic_blend_weight in robustness and in the desktop calibration panel; tests prove context sensitivity and the injection bound. Prompt slimming: - Drop the research-source list and methodology strings from the LLM prompt diagnostics; decimate temporal traces beyond 48 segments (the segment map already localizes weak/strong spans). Remove fake and filler content: - Web refine is now real: new /api/refine route proxies the TRIBE /refine endpoint (same auth and limits as /api/score); the hardcoded sample-pitch "preview rewrite" is deleted. - Delete the fabricated "Persona baseline" variant-rank row, the fake latency/token estimates (now real word counts and mesh info), and the invented confidence fallback (confidence shows only when measured). - Drop the "Semantic Context" methodology panel and raw guardrail dump from the web report; replace the stale "text heuristics off" robustness rows with honest context-fit evidence and blend-weight rows. https://claude.ai/code/session_01VGz3TieN9a29hVTc54jvyJ

- Default refiner model is now deepseek/deepseek-v4-pro via OpenRouter (verified current flagship id; evaluator stays Claude Sonnet); desktop, Tauri fallback chains, service env, and settings UI all updated, with a model-suggestion datalist (DeepSeek V4 Pro/Flash, Claude Sonnet/Opus). - Robust handling for reasoning models everywhere: <think>/<reasoning> blocks are stripped before JSON parsing and in plain-text fallbacks (Python service and Rust direct-OpenRouter path), and system prompts forbid visible chain-of-thought. - Optional OPENROUTER_REASONING_EFFORT (high/xhigh for DeepSeek V4) is forwarded as reasoning.effort and dropped automatically when a provider rejects it; JSON-mode fallback retained. - Model-aware sampling: DeepSeek rewrites at temperature 0.7 (its scale runs flat at low temps), critic at 0.25; other models keep 0.35/0.2. - Tests: think-block parsing on both analysis and refine paths, reasoning effort forwarding, DeepSeek temperature, Rust strip/clean unit test. https://claude.ai/code/session_01VGz3TieN9a29hVTc54jvyJ

- Embed a ten-rule persuasion doctrine (reader-first openers, specificity over adjectives, earn-the-ask, objection pre-emption, honest proof hierarchy, one-message-one-idea, status-safe CTAs, lead with strength) into the evaluator system prompt, the refine prompt, the critic prompt, and the Tauri fallback prompt, so every judgment and rewrite is held to the same expert bar. - Evaluator now returns ranked "top_moves" (1-3 highest-leverage changes, each with paste-ready guidance and a plain-language reason); validated defensively, generated from the weakest evidence in the neural-only fallback (Turkish + English), exposed through schema/API/types, and fed to the refiner at the top of the repair brief. - All user-facing strings (verdict, narrative, strengths, risks, moves) are now required to be plain decisive language with the neuroscience jargon kept in the structured diagnostics. - Declutter the desktop report: score + verdict + narrative + top moves + auto-refine up front; writing facets, context fit, neural signals, robustness, and variant rank fold into a collapsible "Deep dive" section. Web report gains the same Top Moves panel. https://claude.ai/code/session_01VGz3TieN9a29hVTc54jvyJ

- Add an evidence annex to the evaluator, refiner, and critic prompts with applied findings from the literature: self-relevance and neural message effectiveness (Falk et al. 2010/2016; Scholz, Chan & Falk 2025), route matching (Petty & Cacioppo ELM), loss aversion and framing with regulatory fit (Tversky & Kahneman; Higgins), similar-other social proof (Goldstein, Cialdini & Griskevicius 2008), reactance and the but-you-are-free effect (Carpenter 2013 meta-analysis), processing fluency (Alter & Oppenheimer), precise-number credibility (Janiszewski & Uy 2008), psychological targeting (Matz et al. 2017), and the commitment gradient (Freedman & Fraser 1966). - The rewrite process now explicitly selects the persuasion route (argument-led vs cue-led) and frame (gain vs avoided-loss) for the persona before drafting; the critic checks route, frame, and reactance triggers; the analysis protocol gains a framing-fit step. - Each top move now carries the research principle it rests on ("principle" field end-to-end: prompt schema, validation, neural fallback in Turkish and English, Pydantic/TS types, both UIs). - Add the behavioral-science citations to the report's research-source metadata; tighten the critic JSON template; drop two unused locals. https://claude.ai/code/session_01VGz3TieN9a29hVTc54jvyJ

- New tribe_service/research_synthesis.py links each pitch's TRIBE geometry to published findings without any LLM: weak/strong self-value maps to the Falk et al. / Scholz-Chan-Falk behavior-change lever, low processing fluency to Alter & Oppenheimer, weak encoding/attention to Chan et al. 2024, and reward-vs-social axis dominance to Cohen et al. 2024 with a route hint (reward_led / social_led / balanced). - The temporal trace is classified into citation-anchored archetypes — flat, strong-open-fade, late peak, buried lede, sustained — each with a concrete rewrite lever. - The synthesis feeds three places: a "Neural × Research Synthesis" section in the evaluator prompt (with the instruction that top moves should normally execute its strongest levers), the report robustness payload rendered as a deep-dive panel (FIX/KEEP/USE), and the desktop refine brief as "Research lever" lines. - Add Alter & Oppenheimer 2009 to the research-source metadata; new unit tests cover gap ordering, route dominance, every archetype, and garbage-input safety (108 Python tests total). https://claude.ai/code/session_01VGz3TieN9a29hVTc54jvyJ

Previously the LLM had to find the weak spans itself from a raw segment list. Now a deterministic layer extracts that from the TRIBE output and hands it over pre-digested. - localize_pitch_segments() maps the per-segment temporal trace back onto the actual words to pinpoint the opener, the strongest moment, the single weakest span, the close/CTA strength (as percentiles of the pitch's own distribution), and the "attention cliff" — the adjacent segment pair with the largest engagement drop — each tied to real text. - Ground synthesis findings in raw TRIBE features: low sustain_ratio adds a citation-anchored "engagement holds for only N% of the pitch" gap. - build_tribe_synthesis() is the shared entry point feeding the evaluator prompt (a directive "Deterministic Segment Localization — trust these spans" block, instructing the LLM to localize to them rather than re-derive), the report deep dive (a "Where TRIBE reacts" panel showing PEAK/WEAK/DROP spans on the user's own text), and the refine brief (the rewrite targets the located weakest span and attention cliff first). - The neural-only fallback now anchors its first top move to the real weakest span too (Turkish + English), so TRIBE is used fully even without an LLM. - 18 new synthesis/localization unit tests; 114 Python tests total. https://claude.ai/code/session_01VGz3TieN9a29hVTc54jvyJ

coderabbitai · 2026-06-15T12:29:11Z

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Walkthrough

This PR adds a deterministic TRIBE research synthesis layer that converts neural axis scores and fMRI trace data into citation-anchored evidence objects. It introduces blended neural+semantic calibration via configurable SEMANTIC_BLEND_WEIGHT, a ranked top_moves output, context-fit facet scoring, and a two-pass critic refine pipeline. A new Next.js /api/refine route, a refinePitch tribe-client function, and updated Rust/Tauri refiner config wire the refine flow end-to-end. The frontend adds Top Moves, Context Fit, and collapsible Deep Dive result panels, plus switches message length metrics to word counts.

Changes

Research Synthesis, Scoring Blend & Refine Pipeline

Layer / File(s)	Summary
Shared data contracts and Pydantic schemas `shared/types.ts`, `tribe_service/schemas.py`	Adds TypeScript interfaces `ResearchSynthesisItem`, `SegmentLocation`, `SegmentLocalization`, `ResearchSynthesis`, `TopMove`, `ContextFitFacet`, `ContextFitReport`; extends `RobustnessReport` and `PitchScoreReport` with optional `top_moves` and `context_fit`. Mirrors in Pydantic: new `TopMove` model, extended `PitchScoreReport`, and `PitchRefineResponse.critic_notes`.
Deterministic research synthesis module `tribe_service/research_synthesis.py`, `tribe_service/persuasion_features.py`, `tribe_service/tests/test_research_synthesis.py`	New Python module with `localize_pitch_segments` (fMRI trace → message excerpts with peak/weakest/cliff), `_classify_temporal_archetype` (trace shape → named archetype), `synthesize_research_findings` (citation-anchored items + route hint), and `build_tribe_synthesis` entry point. Expands `RESEARCH_SOURCES` with six additional persuasion citations. Full test suite covers axis gaps/strengths, temporal archetypes, raw feature grounding, and segment localization.
LLM scoring: prompt expansion, context-fit blend, top_moves `tribe_service/llm_layer.py`	Adds context-fit facet weights, platform norms, segment-map localization directives in prompts, trace decimation, `build_tribe_synthesis` embedding in user prompt, reasoning model think-block stripping, OpenRouter retry on 400/422, `_normalise_context_fit`/`_semantic_score_from_context_fit`/`_normalise_top_moves` utilities, and reworked `_calibrate_result` with band-clamped semantic blend. Updates `_generate_neural_report` with `fmri_summary` for localized top_moves.
Score report assembly `tribe_service/app.py`, `tribe_service/tests/test_app.py`	Builds `top_moves` list from `llm_result` (filtering, priority clamping, field coercion, max 3) and sets `top_moves`/`context_fit` on `PitchScoreReport`. Test asserts non-empty `top_moves` with `title`/`do` fields on the `/score` endpoint.
Two-pass critic refine pipeline `tribe_service/llm_layer.py`, `app/api/refine/route.ts`, `lib/tribe-client.ts`, `src-tauri/src/lib.rs`	Python: `REFINE_SYSTEM_PROMPT`/`REFINE_CRITIC_SYSTEM_PROMPT`, `_post_refine_chat` with reasoning graceful degradation, `_run_refine_critic_pass` (stage-2 with fallback), updated `refine_pitch_message`, and doctrine/platform-norms prompt additions. Rust/Tauri: `DEFAULT_OPENROUTER_REFINER_MODEL` constant wired through config/merge/`refine_pitch`, `strip_think_blocks` helper, extended refine prompt. Next.js: new `/api/refine` route with auth, input validation, and `refinePitch` call. `lib/tribe-client.ts`: adds `TRIBE_REFINE_TIMEOUT_MS` and exported `refinePitch`.
LLM layer and synthesis test coverage `tribe_service/tests/test_llm_layer.py`	Adds `top_moves` to `VALID_LLM_RESPONSE`; asserts semantic blend guardrails, `top_moves`/`research_synthesis.route_hint`; adds `TestRefinePitchMessage` critic-pass/DeepSeek/think-block/reasoning-effort tests; `TestPromptIncludesContextEvidence` for segment-map prompt; `TestContextFitNormalisation` for clamping/blend behavior; `test_think_block_wrapped_response_is_parsed`.
Frontend result UI: Top Moves, Context Fit, Deep Dive `components/DesktopWorkbench.tsx`, `components/ScoreDisplay.tsx`, `app/globals.css`, `tests/components/DesktopWorkbench.test.tsx`	`DesktopWorkbench`: adds `DEFAULT_OPENROUTER_REFINER_MODEL`, threads refiner model through refine flow, adds Top Moves/Deep Dive/narrative result sections, new `extractTemporalBrief`/`extractContextFitBrief`/`extractResearchBrief` helpers, `RobustnessPanel` context-fit/semantic-blend rows, `BrainPanel` shows `cortical_mesh`, word-count metrics. `ScoreDisplay`: Top Moves panel and Context Fit panel; removes `guardrails_applied`/`persuasion_evidence`. CSS: narrative, suggestion small, and deepdive collapsible styles. Tests updated for new default model and Top Moves UI assertions.
Configuration defaults and documentation `.env.example`, `README.md`	Updates `OPENROUTER_REFINER_MODEL` default to `deepseek/deepseek-v4-pro`; adds `OPENROUTER_REFINE_CRITIC_PASS`, `OPENROUTER_REASONING_EFFORT`, `PITCHCHECK_SEMANTIC_BLEND_WEIGHT`. README expands outputs description and rewrites the neural → LLM interpretation section to describe the two-model flow, blend logic, and new env vars.

Sequence Diagram(s)

sequenceDiagram
  participant Frontend as Next.js / Desktop
  participant RefineRoute as /api/refine route
  participant TribeClient as lib/tribe-client refinePitch
  participant TribeService as tribe_service refine_pitch_message
  participant ResearchSynth as research_synthesis.build_tribe_synthesis
  participant OpenRouter as OpenRouter API

  Frontend->>RefineRoute: POST /api/refine (message, persona, platform, model)
  RefineRoute->>RefineRoute: Auth check + input validation
  RefineRoute->>TribeClient: refinePitch(request)
  TribeClient->>TribeService: POST /refine
  TribeService->>ResearchSynth: build_tribe_synthesis(message, neuro_axes, fmri)
  ResearchSynth-->>TribeService: localization + research items
  TribeService->>OpenRouter: _post_refine_chat stage-1 (REFINE_SYSTEM_PROMPT + doctrine)
  OpenRouter-->>TribeService: refined_message JSON (strips think blocks)
  alt OPENROUTER_REFINE_CRITIC_PASS=true
    TribeService->>OpenRouter: _run_refine_critic_pass stage-2
    OpenRouter-->>TribeService: improved rewrite or error → fallback stage-1
  end
  TribeService-->>TribeClient: PitchRefineResponse (refined_message, critic_notes)
  TribeClient-->>RefineRoute: {ok, data, status}
  RefineRoute-->>Frontend: 200 refined_message or error

sequenceDiagram
  participant Frontend as Next.js / Desktop
  participant TribeService as tribe_service interpret_persuasion
  participant ResearchSynth as research_synthesis
  participant OpenRouter as OpenRouter API (evaluator)

  Frontend->>TribeService: POST /score (message, persona, platform, fmri_summary)
  TribeService->>ResearchSynth: build_tribe_synthesis(message, neuro_axes, fmri_summary)
  ResearchSynth-->>TribeService: localization + items + temporal_archetype
  TribeService->>OpenRouter: chat completion with localization section + JSON schema
  OpenRouter-->>TribeService: JSON with score, top_moves, context_fit (strips think blocks)
  TribeService->>TribeService: _normalise_context_fit → _semantic_score_from_context_fit
  TribeService->>TribeService: _calibrate_result (band-clamp + SEMANTIC_BLEND_WEIGHT)
  TribeService-->>Frontend: PitchScoreReport (top_moves, context_fit, robustness with blend metadata)

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐇 Hop hop, the rabbit digs deep,
Neural traces mapped while you sleep!
Top moves ranked, the blend is set,
DeepSeek thinks—then strips the fret.
Context fit, a critic's pass,
The pitch shines bright through neural glass! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 24.30% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main changes: deterministic TRIBE research synthesis, ranked top moves, and DeepSeek V4 refiner integration.
Description check	✅ Passed	The description is comprehensive, covering all major changes (scoring accuracy, deterministic synthesis, persuasion doctrine, report UX, DeepSeek integration, and testing), but does not strictly follow the provided template structure.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch claude/content-persuasiveness-refinement-19aezz

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

claude added 7 commits June 10, 2026 07:53

aytzey merged commit ddf5d3d into main Jun 15, 2026
3 of 4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Persuasion-master scoring: deterministic TRIBE×research synthesis, ranked top moves, DeepSeek V4 refiner#18

Persuasion-master scoring: deterministic TRIBE×research synthesis, ranked top moves, DeepSeek V4 refiner#18
aytzey merged 7 commits into
mainfrom
claude/content-persuasiveness-refinement-19aezz

aytzey commented Jun 15, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 15, 2026 •

edited

Loading

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

aytzey commented Jun 15, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What this does

Scoring accuracy

Deterministic TRIBE × research synthesis (less work left to the LLM)

Persuasion doctrine + evidence base

Report UX

DeepSeek V4 Pro

Tests

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Jun 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

aytzey commented Jun 15, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 15, 2026 •

edited

Loading