fix(security): sanitize crawled web text in intake analyze prompts (#113) by matthewod11-stack · Pull Request #121 · matthewod11-stack/PeoplePartner

matthewod11-stack · 2026-06-15T08:35:25Z

Auto-generated by portfolio-orchestrator nightly run on 2026-06-15 (live mode).

What (security — Audit finding 2.2)

The scoring signal-extraction path defends against prompt injection via sanitize_untrusted_text, but the intake research path embedded raw Exa-crawled web text directly into LLM prompts. A candidate could plant injection text on their own crawled page and use it to steer the analysis, the search strategy, and the findSimilar seeds. Both paths handle the same class of attacker-controlled content, yet it was sanitized in one path and passed through raw in the other.
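To make the threat concrete, here is a minimal Rust sketch of what raw embedding allows. It assumes the crawled text is wrapped in an <evidence>...</evidence> sandbox (the delimiter name comes from this PR's sanitizer description; the exact prompt layout and the helper names build_prompt_raw and closing_delimiters are hypothetical). A page that contains its own </evidence> closes the sandbox early, so whatever follows it reads to the model like trusted prompt text.

```rust
/// Hypothetical prompt builder that embeds crawled text verbatim, i.e. the
/// pre-fix intake behavior. The <evidence> wrapper is an assumed sandbox
/// format, modeled on the delimiters named in this PR.
fn build_prompt_raw(url: &str, crawled: &str) -> String {
    format!("<evidence>\nCompany page ({url}):\n\n{crawled}\n</evidence>")
}

/// Counts closing sandbox delimiters in a prompt. A well-formed prompt has
/// exactly one; more means untrusted text escaped the sandbox early.
fn closing_delimiters(prompt: &str) -> usize {
    prompt.matches("</evidence>").count()
}
```

With a hostile page, closing_delimiters reports two closers instead of one, which is exactly the forgery the sanitizer is meant to defang.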

Fix

Apply the existing sanitize_untrusted_text to crawled text before embedding it in both intake prompt builders:

analyze_company → sanitize content.text
analyze_profile → sanitize the fetched crawl text (the user's own structured input is left as-is, since it isn't untrusted web text)

The sanitizer is reused unchanged: it defangs < and > into full-width lookalikes so untrusted text can't forge the <evidence>/<profile> sandbox delimiters, strips control and zero-width characters, and truncates. Because sanitizer behavior doesn't change, the bail-if escalation rule is not triggered.
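sanitize.rs itself isn't part of this diff, so the following is only a sketch of the three behaviors described above, not the project's implementation. The name sanitize_untrusted_text_sketch, the cap MAX_UNTRUSTED_CHARS and its value, the exact zero-width set, and the choice to keep newline and tab are all assumptions.

```rust
/// Assumed length cap in chars; the real limit in sanitize.rs is not shown in this PR.
const MAX_UNTRUSTED_CHARS: usize = 8_000;

/// Zero-width code points an attacker could use to hide or split tokens (assumed set).
fn is_zero_width(c: char) -> bool {
    matches!(c, '\u{200B}' | '\u{200C}' | '\u{200D}' | '\u{2060}' | '\u{FEFF}')
}

/// Hypothetical stand-in for sanitize_untrusted_text: strip, defang, truncate.
fn sanitize_untrusted_text_sketch(input: &str) -> String {
    input
        .chars()
        // Drop control chars (C0, DEL, C1) but keep newline and tab so layout survives.
        .filter(|&c| !(c.is_control() && c != '\n' && c != '\t'))
        // Drop zero-width characters.
        .filter(|&c| !is_zero_width(c))
        // Defang angle brackets into full-width lookalikes (U+FF1C, U+FF1E),
        // so no literal "</evidence>" or "</profile>" can appear in the output.
        .map(|c| match c {
            '<' => '\u{FF1C}',
            '>' => '\u{FF1E}',
            other => other,
        })
        // Truncate by char count so multi-byte text is never split mid-character.
        .take(MAX_UNTRUSTED_CHARS)
        .collect()
}
```

Mapping the brackets to full-width lookalikes, rather than deleting them, keeps the page text readable for the model while guaranteeing no literal delimiter survives.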

Test

Added analyze_company_sanitizes_crawled_text_before_llm. It drives the real analyze_company path through a capturing provider and asserts that a forged </evidence> is defanged to full-width characters and that C0 control characters are stripped before the prompt reaches the LLM seam. This ports the scoring path's injection coverage to intake.
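The capturing-provider pattern the test relies on can be sketched as follows. The LlmProvider trait, Message struct, CapturingProvider, and analyze_company_sketch are hypothetical stand-ins (the real intake types aren't shown in this PR); the format string mirrors the patched analyze_company hunk, and the inline sanitizer is a cut-down assumption, not sanitize.rs.

```rust
use std::cell::RefCell;

/// Hypothetical chat message shape; the real intake types are not shown in the PR.
#[derive(Clone, Debug)]
struct Message {
    role: &'static str,
    content: String,
}

/// Hypothetical LLM seam: everything sent to the model passes through here.
trait LlmProvider {
    fn complete(&self, messages: &[Message]) -> String;
}

/// Test double that records every message instead of calling a real model.
#[derive(Default)]
struct CapturingProvider {
    captured: RefCell<Vec<Message>>,
}

impl LlmProvider for CapturingProvider {
    fn complete(&self, messages: &[Message]) -> String {
        self.captured.borrow_mut().extend_from_slice(messages);
        "{}".to_string() // canned reply; the test only inspects what was sent
    }
}

/// Cut-down sanitizer stand-in: drop control chars except newline/tab, defang brackets.
fn sanitize(text: &str) -> String {
    text.chars()
        .filter(|&c| !(c.is_control() && c != '\n' && c != '\t'))
        .map(|c| match c {
            '<' => '\u{FF1C}',
            '>' => '\u{FF1E}',
            c => c,
        })
        .collect()
}

/// Simplified analyze_company: builds the prompt like the patched format! call.
fn analyze_company_sketch(provider: &dyn LlmProvider, url: &str, text: &str) -> String {
    let msg = Message {
        role: "user",
        content: format!("Company page ({}):\n\n{}", url, sanitize(text)),
    };
    provider.complete(&[msg])
}
```

Because the assertions run on what the provider actually received, the test checks the seam rather than the builder's internals, so a later refactor that bypasses the sanitizer would still fail it.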

Verification

cargo test recruiting::intake → 74 passed (incl. the new test) ✓
Full lib suite → 789 passed, 0 failed ✓
Clippy: no new warnings in the changed file ✓
1 file changed (issue cap: max-files-changed: 3) ✓
do-not-touch respected: change is entirely within src-tauri/src/recruiting/intake/; sanitize.rs reused, not modified ✓

Resolves #113.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

matthewod11-stack · 2026-06-15T08:37:40Z

Closing as a duplicate of #119 (opened 2026-06-12, still open), which already resolves #113 via an extracted-prompt-builder + TDD approach. This PR (#121) was opened by the 2026-06-15 orchestrator run, which verified the issue was open and the code still unsanitized on main but did not detect the existing open PR #119 (its fix is unmerged, so neither the issue state nor main reflected it). Deferring to the older #119; deleting this branch. Apologies for the noise.

Copilot

Pull request overview

This PR hardens the recruiting intake “research → analyze” LLM prompt construction by sanitizing Exa-crawled web text before it is embedded into prompts, aligning the intake path with the existing scoring prompt-injection defenses.

Changes:

Apply sanitize_untrusted_text to crawled company page text before it enters the analyze_company prompt.
Apply sanitize_untrusted_text to fetched profile crawl text before it enters the analyze_profile prompt.
Add an intake regression test that captures provider-bound messages and asserts sanitization occurred before reaching the LLM seam.


                content: format!(
                    "Company page ({}):\n\n{}",
-                    content.url, content.text
+                    content.url,
+                    sanitize_untrusted_text(&content.text)
                ),


+            // `fetched` is crawled web text (attacker-controlled); sanitize before
+            // embedding. `serialized` is the user's own structured input. (#113)
+            format!(
+                "Profile input:\n{serialized}\n\nFetched content:\n{}",
+                sanitize_untrusted_text(&fetched)
+            )
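The analyze_profile hunk draws a trust boundary: only the crawled fetched text is sanitized, while serialized (the user's own structured input) is embedded verbatim. A minimal sketch of that split, assuming the same format string as the diff; build_profile_prompt and sanitize_sketch are hypothetical names, and the sanitizer is a simplified stand-in for sanitize_untrusted_text.

```rust
/// Simplified stand-in sanitizer (assumed behavior): drop control chars
/// except newline/tab and defang angle brackets to full-width lookalikes.
fn sanitize_sketch(text: &str) -> String {
    text.chars()
        .filter(|&c| !(c.is_control() && c != '\n' && c != '\t'))
        .map(|c| match c {
            '<' => '\u{FF1C}',
            '>' => '\u{FF1E}',
            c => c,
        })
        .collect()
}

/// Hypothetical mirror of the patched analyze_profile prompt: `serialized` is
/// trusted user input and passes through as-is; `fetched` is crawled web text
/// and is sanitized before embedding.
fn build_profile_prompt(serialized: &str, fetched: &str) -> String {
    format!(
        "Profile input:\n{serialized}\n\nFetched content:\n{}",
        sanitize_sketch(fetched)
    )
}
```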


Copilot AI review requested due to automatic review settings June 15, 2026 08:35

matthewod11-stack mentioned this pull request Jun 15, 2026

Intake analyze prompts embed raw Exa-crawled web text without sanitize_untrusted_text (scoring path sanitizes; intake doesn't) #113 (Open, 4 tasks)

Copilot started reviewing on behalf of matthewod11-stack June 15, 2026 08:35

matthewod11-stack closed this Jun 15, 2026

matthewod11-stack deleted the chore/orchestrator-issue-113-2026-06-15 branch June 15, 2026 08:37

Copilot AI reviewed Jun 15, 2026

matthewod11-stack wants to merge 1 commit into main from chore/orchestrator-issue-113-2026-06-15
