Skip to content

Harden internal LLM prompt boundaries#454

Merged
mylukin merged 1 commit into
mainfrom
codex/internal-llm-xml-boundaries
Jul 4, 2026
Merged

Harden internal LLM prompt boundaries#454
mylukin merged 1 commit into
mainfrom
codex/internal-llm-xml-boundaries

Conversation

@mylukin

@mylukin mylukin commented Jul 4, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Add shared XML prompt-boundary helpers for untrusted text and JSON payloads.
  • Wrap Helm-owned internal LLM inputs in explicit XML sections, including classifier eval user text and memory LLM messages/observations.
  • Add regression coverage for XML breakout attempts and preserve the existing memory system-reminder filtering behavior.
  • Record the implementation decision in implementation-notes.md.

Validation

  • corepack pnpm exec vitest run apps/gateway/src/routes/classify.test.ts apps/gateway/src/memory-llm.test.ts
  • corepack pnpm typecheck
  • corepack pnpm lint
  • corepack pnpm build
  • git diff --check

Note

The inspected Feishu reply-gate prompt is an external caller request to /v1/chat/completions, so Helm cannot automatically infer trusted and untrusted sections inside that business prompt. This PR hardens Helm-owned internal LLM prompts; the caller that constructs the Feishu gate prompt should also XML-wrap its policy, runtime context, history, latest message, and output contract sections.

Co-Authored-By: Codex <noreply@openai.com>
@mylukin mylukin force-pushed the codex/internal-llm-xml-boundaries branch from 3354c82 to bb925ab Compare July 4, 2026 16:22
@mylukin mylukin merged commit fc38c14 into main Jul 4, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant