Persuasion-master scoring: deterministic TRIBE×research synthesis, ranked top moves, DeepSeek V4 refiner #18

Merged
aytzey merged 7 commits into main from claude/content-persuasiveness-refinement-19aezz
Jun 15, 2026
Conversation

@aytzey commented Jun 15, 2026

Owner

What this does

Turns the report from a panel-and-jargon dump into a decisive persuasion-master verdict, and makes the score genuinely reflect whether this message will move this reader on this channel — while doing far more of the interpretive work deterministically from the TRIBE output instead of leaving it to the LLM.

Scoring accuracy

  • The TRIBE neural prior still anchors the final score, but the semantic side is now derived from rubric-scored context-fit facets (persona pain, objection coverage, proof credibility, CTA ease, channel fit), clamped to the neural band and blended in with a weight that depends on prediction quality. The same neural evidence with strong vs poor context fit now produces materially different scores; prompt injection still can't escape the band clamp.

Deterministic TRIBE × research synthesis (less work left to the LLM)

  • Segment localization: the per-segment temporal trace is mapped back onto the actual words to pinpoint — in Python, no LLM — the opener strength, strongest moment, single weakest span, close/CTA strength, and the "attention cliff". The LLM and report receive the exact text spans instead of guessing them.
  • Research findings: axis geometry and raw TRIBE features (e.g. sustain_ratio) are linked to published work (Falk et al., Alter & Oppenheimer, Cohen et al. 2024), with a reward-vs-social route hint.
  • Temporal archetype: trace shape classified (strong-open-fade, late peak, buried lede, flat, sustained) per Chan et al. 2024.
  • Feeds the evaluator prompt, the report deep dive ("Where TRIBE reacts"), and the refine brief; the neural-only fallback also anchors its top move to the real weakest span.

Persuasion doctrine + evidence base

  • A ten-rule persuasion doctrine plus an applied evidence annex (ELM, loss aversion/framing, similar-other social proof, reactance / "but you are free", fluency, precise numbers, psychological targeting, commitment gradient) embedded in the evaluator, refiner, and critic prompts. Each top move names the principle it rests on.

Report UX

  • Score → verdict → narrative → ranked top moves → auto-refine up front; facets, context fit, neural signals, robustness, localization, and research synthesis fold into a collapsible Deep dive.
  • Removed fabricated content: fake "persona baseline" variant row, fake latency/token estimates, invented confidence fallback, and the hardcoded web "preview rewrite" (now a real /api/refine route).

DeepSeek V4 Pro

  • Default refiner is deepseek/deepseek-v4-pro via OpenRouter (verified current flagship), evaluator stays Claude Sonnet. First-class reasoning-model handling: <think> stripping, optional reasoning.effort, JSON-mode fallback, model-aware sampling temperatures.
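A minimal sketch of how the optional `reasoning.effort` fallback and the model-aware temperatures could fit together. The temperature values are the ones listed in the DeepSeek commit message further down, and the 400/422 retry follows the review walkthrough. Everything else is an assumption for illustration: the `post` callable, the function names, and detecting DeepSeek by a substring of the model id are hypothetical and not taken from the repository.

```python
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, Dict]


def sampling_temperature(model: str, stage: str) -> float:
    """Return the sampling temperature for a stage ("rewrite" or "critic")."""
    # Assumption: a DeepSeek model is recognised by its id containing "deepseek".
    is_deepseek = "deepseek" in model.lower()
    if stage == "critic":
        return 0.25 if is_deepseek else 0.2
    return 0.7 if is_deepseek else 0.35


def post_with_reasoning_fallback(
    post: Callable[[Dict], Response],
    payload: Dict,
    effort: Optional[str] = None,
) -> Response:
    """Send the payload with reasoning.effort, retrying without it on rejection."""
    if effort:
        status, body = post({**payload, "reasoning": {"effort": effort}})
        # A 400 or 422 is treated as "provider rejected the extra field".
        if status not in (400, 422):
            return status, body
    # Either no effort was requested or the provider refused it: send plain.
    return post(payload)
```

Passing the HTTP call in as `post` keeps the retry logic testable without a network connection; a real client would wrap its own request function.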

Tests

  • 114 Python tests (synthesis/localization, context-fit sensitivity, injection bound, DeepSeek/reasoning handling), 45 Vitest, ESLint, tsc, and Next.js build all green. Rust desktop changes are string/prompt-only (CI verifies the cargo build).

https://claude.ai/code/session_01VGz3TieN9a29hVTc54jvyJ


Generated by Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Added "Top Moves" ranked recommendations and collapsible "Deep Dive" sections for deeper pitch scoring insights.
    • Introduced "Context Fit" scoring facet showing semantic assessment of message-audience alignment.
    • Improved pitch refinement with dedicated model selection and a better suggestion workflow.
    • Switched message metrics from token estimates to word counts in the editor.
    • Enhanced evidence reporting with research-backed persuasion principles and localized segment analysis.
  • Documentation

    • Updated configuration guide with refiner model defaults and semantic blending controls.

claude added 7 commits June 10, 2026 07:53
Analysis:
- Blend the band-clamped LLM semantic score into the final score with a
  confidence- and prediction-quality-weighted contribution, so persona and
  channel fit genuinely move the score while staying anchored to the TRIBE
  neural calibration band (PITCHCHECK_SEMANTIC_BLEND_WEIGHT, default 0.45).
- Add a temporal segment map that ties each TRIBE trace segment to the
  approximate text span of the pitch, with strongest/weakest callouts, so
  analysis and rewrites localize to exact sentences.
- Inject per-platform channel norms (email, LinkedIn, cold call, landing
  page, ad copy) into the analysis prompt and judge structure against them.
- Add a semantic analysis protocol (persona decision model, argument
  quality, persuasion route, channel fit, CTA friction) and a structured
  context_fit block (pain alignment, objection coverage, proof credibility,
  CTA ease, channel fit, decision driver, open objection), validated
  defensively and surfaced in desktop and web report panels.

Refinement:
- Upgrade the refine prompt with channel norms, a three-candidate internal
  drafting protocol with rubric scoring, and a final self-check against
  invented facts, language drift, and CTA friction.
- Add an optional second critic pass (OPENROUTER_REFINE_CRITIC_PASS,
  default on) that critiques the stage-1 rewrite against a persuasion
  checklist and returns a strictly better final version, falling back to
  the stage-1 rewrite on any failure.
- Enrich the desktop refine brief with the baseline score to beat, weakest/
  strongest temporal segment excerpts, and context-fit gaps.
- Mirror the candidate protocol and self-check in the Tauri direct
  OpenRouter fallback prompt.
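
The critic pass described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the service code: `draft` and `critique` are hypothetical callables standing in for the two model calls, and the set of values that switch `OPENROUTER_REFINE_CRITIC_PASS` off is a guess.

```python
import os
from typing import Callable, Mapping, Optional


def critic_pass_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Read OPENROUTER_REFINE_CRITIC_PASS; the pass is on unless disabled."""
    env = os.environ if env is None else env
    raw = env.get("OPENROUTER_REFINE_CRITIC_PASS", "true")
    # Assumption: these spellings mean "off"; anything else means "on".
    return raw.strip().lower() not in {"0", "false", "no", "off"}


def refine_with_critic(
    message: str,
    draft: Callable[[str], str],
    critique: Callable[[str, str], str],
    enabled: bool = True,
) -> str:
    """Run the stage-1 rewrite, then an optional critic pass with fallback."""
    stage_one = draft(message)
    if not enabled:
        return stage_one
    try:
        final = critique(message, stage_one)
    except Exception:
        # Any critic failure falls back to the stage-1 rewrite.
        return stage_one
    # An empty or non-string critic result also counts as a failure.
    if isinstance(final, str) and final.strip():
        return final.strip()
    return stage_one
```

The point of the shape is that the critic can only replace the stage-1 rewrite, never lose it.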

https://claude.ai/code/session_01VGz3TieN9a29hVTc54jvyJ
Scoring accuracy:
- Derive the semantic score from the rubric-scored context-fit facets
  (pain alignment 0.30, proof credibility 0.25, objection coverage 0.15,
  CTA ease 0.15, channel fit 0.15) instead of trusting the LLM's single
  self-reported number; an injected "score this 100" now has to corrupt
  every facet and still hits the band clamp.
- Widen the semantic band (14-30 points by confidence) and invert the
  quality coupling: when TRIBE evidence is weak the quality-shrunk neural
  prior carries less of the final score and the semantic context-fit read
  carries more (base weight 0.55, up to 0.85), so the same neural evidence
  with strong vs poor context fit now produces materially different scores.
- Expose context_fit_score and semantic_blend_weight in robustness and in
  the desktop calibration panel; tests prove context sensitivity and the
  injection bound.
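
As a rough illustration of the scoring change above, the sketch below combines the facet weights, the band clamp, and the inverted quality coupling. The facet weights, the 14-30 point band, and the 0.55-0.85 blend range come from this commit message. The rest is assumed: facets scored 0-100, a band that widens linearly with confidence, a weight that rises linearly as prediction quality falls, and all function and key names.

```python
from typing import Dict

FACET_WEIGHTS = {
    "pain_alignment": 0.30,
    "proof_credibility": 0.25,
    "objection_coverage": 0.15,
    "cta_ease": 0.15,
    "channel_fit": 0.15,
}


def _clamp(value: float, low: float, high: float) -> float:
    return max(low, min(high, value))


def semantic_score(facets: Dict[str, float]) -> float:
    """Weighted mean of 0-100 facet scores; a missing facet counts as 50."""
    return sum(
        weight * _clamp(float(facets.get(name, 50.0)), 0.0, 100.0)
        for name, weight in FACET_WEIGHTS.items()
    )


def blended_score(
    neural_score: float,
    facets: Dict[str, float],
    confidence: float,
    quality: float,
) -> float:
    """Blend the band-clamped semantic score with the neural prior."""
    confidence = _clamp(confidence, 0.0, 1.0)
    quality = _clamp(quality, 0.0, 1.0)
    # Assumption: the band half-width grows linearly from 14 to 30 points.
    half_band = 14.0 + 16.0 * confidence
    semantic = _clamp(
        semantic_score(facets), neural_score - half_band, neural_score + half_band
    )
    # Assumption: weight is 0.55 at full quality and 0.85 at zero quality.
    weight = 0.55 + 0.30 * (1.0 - quality)
    return _clamp((1.0 - weight) * neural_score + weight * semantic, 0.0, 100.0)
```

Under these assumptions, a response that reports 100 on every facet still cannot move the final score more than `half_band` points away from the neural score, which is the injection bound the tests are described as checking.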

Prompt slimming:
- Drop the research-source list and methodology strings from the LLM
  prompt diagnostics; decimate temporal traces beyond 48 segments (the
  segment map already localizes weak/strong spans).
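
One way to decimate a long trace is evenly spaced sampling that keeps the first and last segments. The 48-segment limit is from the commit message; the sampling method and the function name are assumptions, since the message does not say how the trace is thinned.

```python
from typing import List, Sequence


def decimate_trace(trace: Sequence[float], max_points: int = 48) -> List[float]:
    """Return at most max_points values, evenly spaced over the trace."""
    n = len(trace)
    if n <= max_points:
        return list(trace)
    if max_points < 2:
        return list(trace[:max_points])
    # Spread the picks so index 0 and index n - 1 are always included.
    step = (n - 1) / (max_points - 1)
    return [trace[round(i * step)] for i in range(max_points)]
```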

Remove fake and filler content:
- Web refine is now real: new /api/refine route proxies the TRIBE /refine
  endpoint (same auth and limits as /api/score); the hardcoded
  sample-pitch "preview rewrite" is deleted.
- Delete the fabricated "Persona baseline" variant-rank row, the fake
  latency/token estimates (now real word counts and mesh info), and the
  invented confidence fallback (confidence shows only when measured).
- Drop the "Semantic Context" methodology panel and raw guardrail dump
  from the web report; replace the stale "text heuristics off" robustness
  rows with honest context-fit evidence and blend-weight rows.

https://claude.ai/code/session_01VGz3TieN9a29hVTc54jvyJ
- Default refiner model is now deepseek/deepseek-v4-pro via OpenRouter
  (verified current flagship id; evaluator stays Claude Sonnet); desktop,
  Tauri fallback chains, service env, and settings UI all updated, with a
  model-suggestion datalist (DeepSeek V4 Pro/Flash, Claude Sonnet/Opus).
- Robust handling for reasoning models everywhere: <think>/<reasoning>
  blocks are stripped before JSON parsing and in plain-text fallbacks
  (Python service and Rust direct-OpenRouter path), and system prompts
  forbid visible chain-of-thought.
- Optional OPENROUTER_REASONING_EFFORT (high/xhigh for DeepSeek V4) is
  forwarded as reasoning.effort and dropped automatically when a provider
  rejects it; JSON-mode fallback retained.
- Model-aware sampling: DeepSeek rewrites at temperature 0.7 (its scale
  runs flat at low temps), critic at 0.25; other models keep 0.35/0.2.
- Tests: think-block parsing on both analysis and refine paths, reasoning
  effort forwarding, DeepSeek temperature, Rust strip/clean unit test.
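
The think-block handling above can be illustrated with a small sketch. The function names are hypothetical, and the brace-slicing fallback is an assumption about what a plain-text fallback might do; only the stripping of `<think>` and `<reasoning>` blocks before JSON parsing comes from the commit message.

```python
import json
import re
from typing import Any, Optional

# Matches <think>...</think> and <reasoning>...</reasoning>, across newlines.
_THINK_RE = re.compile(r"<(think|reasoning)>.*?</\1>", re.DOTALL | re.IGNORECASE)


def strip_think_blocks(text: str) -> str:
    """Remove visible chain-of-thought blocks from a model response."""
    return _THINK_RE.sub("", text).strip()


def parse_model_json(text: str) -> Optional[Any]:
    """Parse JSON from a response that may wrap it in reasoning or prose."""
    cleaned = strip_think_blocks(text)
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        pass
    # Fallback: try the outermost {...} slice of the cleaned text.
    start, end = cleaned.find("{"), cleaned.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(cleaned[start : end + 1])
    except json.JSONDecodeError:
        return None
```

Stripping before parsing matters because a reasoning block can itself contain braces or JSON fragments that would otherwise be picked up by the fallback. This sketch does not handle an unclosed `<think>` tag.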

https://claude.ai/code/session_01VGz3TieN9a29hVTc54jvyJ
- Embed a ten-rule persuasion doctrine (reader-first openers, specificity
  over adjectives, earn-the-ask, objection pre-emption, honest proof
  hierarchy, one-message-one-idea, status-safe CTAs, lead with strength)
  into the evaluator system prompt, the refine prompt, the critic prompt,
  and the Tauri fallback prompt, so every judgment and rewrite is held to
  the same expert bar.
- Evaluator now returns ranked "top_moves" (1-3 highest-leverage changes,
  each with paste-ready guidance and a plain-language reason); validated
  defensively, generated from the weakest evidence in the neural-only
  fallback (Turkish + English), exposed through schema/API/types, and fed
  to the refiner at the top of the repair brief.
- All user-facing strings (verdict, narrative, strengths, risks, moves)
  are now required to be plain decisive language with the neuroscience
  jargon kept in the structured diagnostics.
- Declutter the desktop report: score + verdict + narrative + top moves +
  auto-refine up front; writing facets, context fit, neural signals,
  robustness, and variant rank fold into a collapsible "Deep dive"
  section. Web report gains the same Top Moves panel.
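
Defensive validation of `top_moves` might look like the sketch below. The `title` and `do` fields and the cap of three moves come from this PR; the `why` and `priority` key names, the default priority, and the function name are assumptions made for the example.

```python
from typing import Any, Dict, List


def normalise_top_moves(raw: Any, max_moves: int = 3) -> List[Dict[str, Any]]:
    """Keep well-formed moves only, clamp priorities, and cap the list."""
    if not isinstance(raw, list):
        return []
    moves: List[Dict[str, Any]] = []
    for item in raw:
        if not isinstance(item, dict):
            continue
        title = str(item.get("title") or "").strip()
        do = str(item.get("do") or "").strip()
        if not title or not do:
            continue  # a move without a title or an action is dropped
        try:
            priority = int(item.get("priority", len(moves) + 1))
        except (TypeError, ValueError, OverflowError):
            priority = len(moves) + 1
        moves.append(
            {
                "priority": max(1, min(max_moves, priority)),
                "title": title,
                "do": do,
                "why": str(item.get("why") or "").strip(),
            }
        )
    # sorted() is stable, so equal priorities keep their original order.
    return sorted(moves, key=lambda m: m["priority"])[:max_moves]
```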

https://claude.ai/code/session_01VGz3TieN9a29hVTc54jvyJ
- Add an evidence annex to the evaluator, refiner, and critic prompts with
  applied findings from the literature: self-relevance and neural message
  effectiveness (Falk et al. 2010/2016; Scholz, Chan & Falk 2025), route
  matching (Petty & Cacioppo ELM), loss aversion and framing with
  regulatory fit (Tversky & Kahneman; Higgins), similar-other social proof
  (Goldstein, Cialdini & Griskevicius 2008), reactance and the
  but-you-are-free effect (Carpenter 2013 meta-analysis), processing
  fluency (Alter & Oppenheimer), precise-number credibility (Janiszewski &
  Uy 2008), psychological targeting (Matz et al. 2017), and the commitment
  gradient (Freedman & Fraser 1966).
- The rewrite process now explicitly selects the persuasion route
  (argument-led vs cue-led) and frame (gain vs avoided-loss) for the
  persona before drafting; the critic checks route, frame, and reactance
  triggers; the analysis protocol gains a framing-fit step.
- Each top move now carries the research principle it rests on
  ("principle" field end-to-end: prompt schema, validation, neural
  fallback in Turkish and English, Pydantic/TS types, both UIs).
- Add the behavioral-science citations to the report's research-source
  metadata; tighten the critic JSON template; drop two unused locals.

https://claude.ai/code/session_01VGz3TieN9a29hVTc54jvyJ
- New tribe_service/research_synthesis.py links each pitch's TRIBE
  geometry to published findings without any LLM: weak/strong self-value
  maps to the Falk et al. / Scholz-Chan-Falk behavior-change lever, low
  processing fluency to Alter & Oppenheimer, weak encoding/attention to
  Chan et al. 2024, and reward-vs-social axis dominance to Cohen et al.
  2024 with a route hint (reward_led / social_led / balanced).
- The temporal trace is classified into citation-anchored archetypes —
  flat, strong-open-fade, late peak, buried lede, sustained — each with a
  concrete rewrite lever.
- The synthesis feeds three places: a "Neural × Research Synthesis"
  section in the evaluator prompt (with the instruction that top moves
  should normally execute its strongest levers), the report robustness
  payload rendered as a deep-dive panel (FIX/KEEP/USE), and the desktop
  refine brief as "Research lever" lines.
- Add Alter & Oppenheimer 2009 to the research-source metadata; new unit
  tests cover gap ordering, route dominance, every archetype, and
  garbage-input safety (108 Python tests total).

https://claude.ai/code/session_01VGz3TieN9a29hVTc54jvyJ
Previously the LLM had to find the weak spans itself from a raw segment
list. Now a deterministic layer extracts that from the TRIBE output and
hands it over pre-digested.

- localize_pitch_segments() maps the per-segment temporal trace back onto
  the actual words to pinpoint the opener, the strongest moment, the
  single weakest span, the close/CTA strength (as percentiles of the
  pitch's own distribution), and the "attention cliff" — the adjacent
  segment pair with the largest engagement drop — each tied to real text.
- Ground synthesis findings in raw TRIBE features: low sustain_ratio adds
  a citation-anchored "engagement holds for only N% of the pitch" gap.
- build_tribe_synthesis() is the shared entry point feeding the evaluator
  prompt (a directive "Deterministic Segment Localization — trust these
  spans" block, instructing the LLM to localize to them rather than
  re-derive), the report deep dive (a "Where TRIBE reacts" panel showing
  PEAK/WEAK/DROP spans on the user's own text), and the refine brief (the
  rewrite targets the located weakest span and attention cliff first).
- The neural-only fallback now anchors its first top move to the real
  weakest span too (Turkish + English), so TRIBE is used fully even
  without an LLM.
- 18 new synthesis/localization unit tests; 114 Python tests total.
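
To make the localization idea concrete, here is a simplified sketch. It is not the code of `localize_pitch_segments()`: it assumes segments cover roughly equal word ranges, and the function name and result keys are invented for the example. The percentile is taken within the pitch's own trace, and the attention cliff is the adjacent pair with the largest drop, as described above.

```python
from typing import Dict, List, Optional


def localize_segments(message: str, trace: List[float]) -> Optional[Dict]:
    """Map a per-segment trace onto word spans and pick the key moments."""
    words = message.split()
    n = len(trace)
    if n == 0 or not words:
        return None

    def span_text(i: int) -> str:
        # Assumption: each segment covers an equal share of the words.
        start = i * len(words) // n
        end = max(start + 1, (i + 1) * len(words) // n)
        return " ".join(words[start:end])

    def percentile(i: int) -> float:
        # Share of segments at or below this one, within this pitch only.
        return round(100.0 * sum(1 for v in trace if v <= trace[i]) / n, 1)

    def describe(i: int) -> Dict:
        return {"index": i, "text": span_text(i), "percentile": percentile(i)}

    result = {
        "opener": describe(0),
        "strongest": describe(max(range(n), key=lambda i: trace[i])),
        "weakest": describe(min(range(n), key=lambda i: trace[i])),
        "close": describe(n - 1),
        "cliff": None,
    }
    if n >= 2:
        # Attention cliff: adjacent pair with the largest engagement drop.
        i = max(range(n - 1), key=lambda k: trace[k] - trace[k + 1])
        drop = trace[i] - trace[i + 1]
        if drop > 0:
            result["cliff"] = {"from": describe(i), "to": describe(i + 1), "drop": drop}
    return result
```

Because every field carries the matching text span, a downstream prompt or report panel can quote the user's own words instead of segment indices.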

https://claude.ai/code/session_01VGz3TieN9a29hVTc54jvyJ
@coderabbitai (Bot) commented Jun 15, 2026


Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

This PR adds a deterministic TRIBE research synthesis layer that converts neural axis scores and fMRI trace data into citation-anchored evidence objects. It introduces blended neural+semantic calibration via configurable SEMANTIC_BLEND_WEIGHT, a ranked top_moves output, context-fit facet scoring, and a two-pass critic refine pipeline. A new Next.js /api/refine route, a refinePitch tribe-client function, and updated Rust/Tauri refiner config wire the refine flow end-to-end. The frontend adds Top Moves, Context Fit, and collapsible Deep Dive result panels, plus switches message length metrics to word counts.

Changes

Research Synthesis, Scoring Blend & Refine Pipeline

| Layer | File(s) | Summary |
|---|---|---|
| Shared data contracts and Pydantic schemas | shared/types.ts, tribe_service/schemas.py | Adds TypeScript interfaces ResearchSynthesisItem, SegmentLocation, SegmentLocalization, ResearchSynthesis, TopMove, ContextFitFacet, ContextFitReport; extends RobustnessReport and PitchScoreReport with optional top_moves and context_fit. Mirrors in Pydantic: new TopMove model, extended PitchScoreReport, and PitchRefineResponse.critic_notes. |
| Deterministic research synthesis module | tribe_service/research_synthesis.py, tribe_service/persuasion_features.py, tribe_service/tests/test_research_synthesis.py | New Python module with localize_pitch_segments (fMRI trace → message excerpts with peak/weakest/cliff), _classify_temporal_archetype (trace shape → named archetype), synthesize_research_findings (citation-anchored items + route hint), and build_tribe_synthesis entry point. Expands RESEARCH_SOURCES with six additional persuasion citations. Full test suite covers axis gaps/strengths, temporal archetypes, raw feature grounding, and segment localization. |
| LLM scoring: prompt expansion, context-fit blend, top_moves | tribe_service/llm_layer.py | Adds context-fit facet weights, platform norms, segment-map localization directives in prompts, trace decimation, build_tribe_synthesis embedding in user prompt, reasoning model think-block stripping, OpenRouter retry on 400/422, _normalise_context_fit/_semantic_score_from_context_fit/_normalise_top_moves utilities, and reworked _calibrate_result with band-clamped semantic blend. Updates _generate_neural_report with fmri_summary for localized top_moves. |
| Score report assembly | tribe_service/app.py, tribe_service/tests/test_app.py | Builds top_moves list from llm_result (filtering, priority clamping, field coercion, max 3) and sets top_moves/context_fit on PitchScoreReport. Test asserts non-empty top_moves with title/do fields on the /score endpoint. |
| Two-pass critic refine pipeline | tribe_service/llm_layer.py, app/api/refine/route.ts, lib/tribe-client.ts, src-tauri/src/lib.rs | Python: REFINE_SYSTEM_PROMPT/REFINE_CRITIC_SYSTEM_PROMPT, _post_refine_chat with reasoning graceful degradation, _run_refine_critic_pass (stage-2 with fallback), updated refine_pitch_message, and doctrine/platform-norms prompt additions. Rust/Tauri: DEFAULT_OPENROUTER_REFINER_MODEL constant wired through config/merge/refine_pitch, strip_think_blocks helper, extended refine prompt. Next.js: new /api/refine route with auth, input validation, and refinePitch call. lib/tribe-client.ts: adds TRIBE_REFINE_TIMEOUT_MS and exported refinePitch. |
| LLM layer and synthesis test coverage | tribe_service/tests/test_llm_layer.py | Adds top_moves to VALID_LLM_RESPONSE; asserts semantic blend guardrails, top_moves/research_synthesis.route_hint; adds TestRefinePitchMessage critic-pass/DeepSeek/think-block/reasoning-effort tests; TestPromptIncludesContextEvidence for segment-map prompt; TestContextFitNormalisation for clamping/blend behavior; test_think_block_wrapped_response_is_parsed. |
| Frontend result UI: Top Moves, Context Fit, Deep Dive | components/DesktopWorkbench.tsx, components/ScoreDisplay.tsx, app/globals.css, tests/components/DesktopWorkbench.test.tsx | DesktopWorkbench: adds DEFAULT_OPENROUTER_REFINER_MODEL, threads refiner model through refine flow, adds Top Moves/Deep Dive/narrative result sections, new extractTemporalBrief/extractContextFitBrief/extractResearchBrief helpers, RobustnessPanel context-fit/semantic-blend rows, BrainPanel shows cortical_mesh, word-count metrics. ScoreDisplay: Top Moves panel and Context Fit panel; removes guardrails_applied/persuasion_evidence. CSS: narrative, suggestion small, and deepdive collapsible styles. Tests updated for new default model and Top Moves UI assertions. |
| Configuration defaults and documentation | .env.example, README.md | Updates OPENROUTER_REFINER_MODEL default to deepseek/deepseek-v4-pro; adds OPENROUTER_REFINE_CRITIC_PASS, OPENROUTER_REASONING_EFFORT, PITCHCHECK_SEMANTIC_BLEND_WEIGHT. README expands outputs description and rewrites the neural → LLM interpretation section to describe the two-model flow, blend logic, and new env vars. |

Sequence Diagram(s)

sequenceDiagram
  participant Frontend as Next.js / Desktop
  participant RefineRoute as /api/refine route
  participant TribeClient as lib/tribe-client refinePitch
  participant TribeService as tribe_service refine_pitch_message
  participant ResearchSynth as research_synthesis.build_tribe_synthesis
  participant OpenRouter as OpenRouter API

  Frontend->>RefineRoute: POST /api/refine (message, persona, platform, model)
  RefineRoute->>RefineRoute: Auth check + input validation
  RefineRoute->>TribeClient: refinePitch(request)
  TribeClient->>TribeService: POST /refine
  TribeService->>ResearchSynth: build_tribe_synthesis(message, neuro_axes, fmri)
  ResearchSynth-->>TribeService: localization + research items
  TribeService->>OpenRouter: _post_refine_chat stage-1 (REFINE_SYSTEM_PROMPT + doctrine)
  OpenRouter-->>TribeService: refined_message JSON (strips think blocks)
  alt OPENROUTER_REFINE_CRITIC_PASS=true
    TribeService->>OpenRouter: _run_refine_critic_pass stage-2
    OpenRouter-->>TribeService: improved rewrite or error → fallback stage-1
  end
  TribeService-->>TribeClient: PitchRefineResponse (refined_message, critic_notes)
  TribeClient-->>RefineRoute: {ok, data, status}
  RefineRoute-->>Frontend: 200 refined_message or error
sequenceDiagram
  participant Frontend as Next.js / Desktop
  participant TribeService as tribe_service interpret_persuasion
  participant ResearchSynth as research_synthesis
  participant OpenRouter as OpenRouter API (evaluator)

  Frontend->>TribeService: POST /score (message, persona, platform, fmri_summary)
  TribeService->>ResearchSynth: build_tribe_synthesis(message, neuro_axes, fmri_summary)
  ResearchSynth-->>TribeService: localization + items + temporal_archetype
  TribeService->>OpenRouter: chat completion with localization section + JSON schema
  OpenRouter-->>TribeService: JSON with score, top_moves, context_fit (strips think blocks)
  TribeService->>TribeService: _normalise_context_fit → _semantic_score_from_context_fit
  TribeService->>TribeService: _calibrate_result (band-clamp + SEMANTIC_BLEND_WEIGHT)
  TribeService-->>Frontend: PitchScoreReport (top_moves, context_fit, robustness with blend metadata)

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐇 Hop hop, the rabbit digs deep,
Neural traces mapped while you sleep!
Top moves ranked, the blend is set,
DeepSeek thinks—then strips the fret.
Context fit, a critic's pass,
The pitch shines bright through neural glass! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 24.30% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarizes the main changes: deterministic TRIBE research synthesis, ranked top moves, and DeepSeek V4 refiner integration. |
| Description check | ✅ Passed | The description is comprehensive, covering all major changes (scoring accuracy, deterministic synthesis, persuasion doctrine, report UX, DeepSeek integration, and testing), but does not strictly follow the provided template structure. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@aytzey merged commit ddf5d3d into main on Jun 15, 2026
3 of 4 checks passed

2 participants