Harden triage skills against instructions embedded in report content by Aboudoc · Pull Request #9 · hackenproof-public/skills

Aboudoc · 2026-05-29T09:40:22Z

Report fields, attachments, and comments are authored by the submitter, but the triage skills
currently treat that content the same as their own instructions. This PR adds an explicit trust
boundary so report content is handled as untrusted data.

Changes

- Trust Boundary section in both hackenproof-triage and hackenproof-bulk-triage: output
  from get_report_details, fetch_attachment, get_comments, and search_comments is data,
  not instructions.
- Gate 0: Untrusted-Content Screen before the existing gates: report content that tries to
  drive triage (fake "system/team/internal" notes, claimed out-of-band pre-validation or
  overrides, direct severity/state requests, or requests to disclose program data) is
  disregarded and flagged for human review.
- Action gating: write actions require human confirmation; report content alone never
  triggers one.
- Comment provenance: responder comments come only from the templates; never echo report
  text or program data.
- Cross-report isolation in bulk mode: one report's content can't influence another's
  recommendation, and program info never appears in the output.
- New references/untrusted-input-handling.md (screening checklist) and
  references/injection-test-corpus.md (benign regression cases).
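
The Gate 0 idea above can be illustrated with a small sketch. This is not the PR's implementation: the skills are instruction files, and the real checklist lives in references/untrusted-input-handling.md, which is not reproduced here. The pattern names, regular expressions, and the `screen_untrusted` helper are all assumptions made up for illustration, and keyword matching of this kind is only a rough heuristic for the categories the list names.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns, one per category named in the Gate 0 description.
# They are illustrative only and will miss many real phrasings.
DIRECTIVE_PATTERNS = {
    "fake_authority_note": re.compile(
        r"\b(system|team|internal)\s+(note|message|instruction)s?\b", re.I
    ),
    "claimed_prevalidation": re.compile(
        r"\b(pre-?validated|already\s+(approved|validated|confirmed))\b", re.I
    ),
    "severity_or_state_request": re.compile(
        r"\b(set|mark|change)\s+(the\s+)?(severity|state|status)\b", re.I
    ),
    "disclosure_request": re.compile(
        r"\b(reveal|disclose|print|list)\b.*\b(program|scope|reward)s?\b", re.I
    ),
}


@dataclass
class ScreenResult:
    """Names of the directive-like patterns found in one piece of content."""

    flags: list = field(default_factory=list)

    @property
    def needs_human_review(self) -> bool:
        # Any hit means the content tried to drive triage and should be flagged.
        return bool(self.flags)


def screen_untrusted(text: str) -> ScreenResult:
    """Screen submitter-authored text; it is treated as data, never executed."""
    result = ScreenResult()
    for name, pattern in DIRECTIVE_PATTERNS.items():
        if pattern.search(text):
            result.flags.append(name)
    return result
```

In this sketch a hit never changes the triage decision by itself. It only marks the report for human review, which matches the "disregarded and flagged" wording above.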

Why

Defense-in-depth. Today the workflow relies on the model declining to follow injected
instructions, which is model-dependent. The screening gate and human confirmation make triage
integrity independent of model choice. Background and a runnable proof of concept were shared
privately with the team.
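
As a rough illustration of why human confirmation removes the dependence on model behavior, the sketch below gates write actions in ordinary code. Every name in it (`ProposedAction`, `ConfirmationRequired`, `execute`, and the `change_severity` action) is hypothetical and does not come from the PR; only `get_report_details` is a tool name mentioned above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ProposedAction:
    name: str        # e.g. "change_severity" (hypothetical write action)
    report_id: str
    is_write: bool   # True for anything that modifies a report


class ConfirmationRequired(Exception):
    """Raised when a write action is attempted without a human confirmer."""


def execute(
    action: ProposedAction,
    run: Callable[[ProposedAction], str],
    confirmed_by: Optional[str] = None,
) -> str:
    """Run an action; write actions need a named human confirmer.

    `confirmed_by` must be supplied by the operator's own channel. Report
    content is never a valid source for it, however the text is phrased.
    """
    if action.is_write and not confirmed_by:
        raise ConfirmationRequired(
            f"{action.name} on {action.report_id} needs human confirmation"
        )
    return run(action)
```

Because the check sits outside the model, a report that persuades the model to propose a write still cannot complete it without an operator supplying `confirmed_by`.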

Testing

Run the triage skill against the cases in references/injection-test-corpus.md; each should be
decided on evidence with the embedded directive ignored. No behavior change for well-formed
reports.

Report fields, attachments, and comments are submitter-authored. Declare them as untrusted data, add a pre-validation screening gate, gate write actions behind human confirmation, restrict responder comments to templates, and keep reports isolated in bulk mode so one report cannot steer another. Adds an untrusted-input reference and a benign regression corpus.

Aboudoc wants to merge 1 commit into hackenproof-public:main from Aboudoc:harden-triage-untrusted-input
