chore: resolve issue #113 — sanitize crawled web text in intake prompts by matthewod11-stack · Pull Request #119 · matthewod11-stack/PeoplePartner

matthewod11-stack · 2026-06-12T08:22:36Z

Auto-generated by portfolio-orchestrator nightly run on 2026-06-12 (live mode). Resolves #113.

What

The scoring path already defends against prompt injection (recruiting/scoring/sanitize.rs — defangs the <evidence>/<profile> sandbox delimiters, strips control/zero-width chars, applied at signal_extract.rs:56,85). The intake research path did not: analyze_company and analyze_profile embedded raw Exa-crawled web text directly into the LLM prompt. A candidate can plant injection text on their own crawled page → steer the company analysis, profile analysis, search strategy, and findSimilar seeds. Same attacker-controlled-content class, sanitized in one path and raw in the other (audit finding 2.2).

How

Extracted two pure prompt-builder helpers — build_company_user_prompt(url, text) and build_profile_user_prompt(serialized, fetched) — that run the existing sanitize_untrusted_text over the crawled text before it enters the prompt. No change to the sanitizer itself (bail-if: sanitizer needs intake-specific behavior changes did not trigger).
Ported the scoring injection test (defangs_angle_brackets_to_fullwidth) to the intake path: 3 new tests assert forged delimiters are defanged, zero-width/control chars stripped, and that trusted inputs (the seed URL, the user's own structured profile JSON) are preserved verbatim.

Verification

TDD red→green: with sanitization removed, both injection tests fail (raw delimiter must not survive); with it, all pass.
cargo test --manifest-path src-tauri/Cargo.toml → 790 passed / 0 failed (3 new).
cargo clippy --lib → 0 new warnings vs origin/main (intake/prod.rs clippy-clean).
Scope: 1 file (recruiting/intake/prod.rs), within max-files-changed: 3; nothing touched outside src-tauri/src/recruiting/.

Reviewer notes

This is a RECRUITING_ENABLED-flip gate per the app backlog — bounded today (recruiting flag-dead in prod, user's own key, structured output), so low blast radius, but it removes the raw-embed asymmetry before the flip.
Defense applies to crawled web text only; the seed URL and the user's structured profile input are intentionally left unsanitized (trusted seeds).

🤖 Generated with Claude Code

The scoring path already defends against prompt injection via scoring::sanitize (defangs the <evidence>/<profile> sandbox delimiters, strips control/zero-width chars), but the intake research path embedded raw Exa-crawled web text directly into the analyze_company / analyze_profile prompts. A candidate could plant injection text on their own crawled page to steer analysis, search strategy, and findSimilar seeds. Extract pure prompt-builder helpers (build_company_user_prompt / build_profile_user_prompt) that run the existing sanitize_untrusted_text over the crawled text before embedding, and port the scoring injection test to the intake path. Trusted inputs (seed URL, the user's own structured profile JSON) are left verbatim. Verification: cargo test 790 passed / 0 failed (3 new); recruiting clippy clean (0 new warnings vs origin/main). Gates the RECRUITING_ENABLED flip per backlog. Resolves #113. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Adds prompt-injection sanitization to the recruiting intake research path so Exa-crawled web text is treated the same way as scoring evidence text, closing the gap described in issue #113.

Changes:

Introduces build_company_user_prompt / build_profile_user_prompt helpers that apply sanitize_untrusted_text to untrusted crawled text before embedding it into LLM prompts.
Updates analyze_company / analyze_profile to use the new prompt-builder helpers instead of embedding raw crawled text.
Adds targeted unit tests to ensure forged delimiters are defanged, zero-width/control characters are stripped, and trusted inputs remain verbatim.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+fn build_profile_user_prompt(serialized: &str, fetched: &str) -> String {
+    if fetched.is_empty() {
+        format!("Profile input:\n{serialized}")
+    } else {
+        format!(
+            "Profile input:\n{serialized}\n\nFetched content:\n{}",
+            sanitize_untrusted_text(fetched)
+        )
+    }
+}


Copilot AI review requested due to automatic review settings June 12, 2026 08:22

Copilot started reviewing on behalf of matthewod11-stack June 12, 2026 08:23 View session

matthewod11-stack mentioned this pull request Jun 12, 2026

Intake analyze prompts embed raw Exa-crawled web text without sanitize_untrusted_text (scoring path sanitizes; intake doesn't) #113

Open

4 tasks

Copilot AI reviewed Jun 12, 2026

View reviewed changes

matthewod11-stack mentioned this pull request Jun 15, 2026

fix(security): sanitize crawled web text in intake analyze prompts (#113) #121

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: resolve issue #113 — sanitize crawled web text in intake prompts#119

chore: resolve issue #113 — sanitize crawled web text in intake prompts#119
matthewod11-stack wants to merge 1 commit into
mainfrom
chore/orchestrator-issue-113-2026-06-12

matthewod11-stack commented Jun 12, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

matthewod11-stack commented Jun 12, 2026

What

How

Verification

Reviewer notes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants