fix(update-check): replace prompt injection language with neutral notification#549
Open
jessekemp1 wants to merge 1 commit into
Open
Conversation
…ification
The update check hook was using imperative override phrases
('URGENT', 'you MUST', 'before doing ANYTHING else') to force the LLM
to display a specific message block. This pattern is indistinguishable
from a prompt injection attack — a security-aware assistant will (and
should) flag or refuse it.
Fix: emit the update notice to stderr so it appears directly in the
user's terminal without entering LLM context. Keep a minimal, neutral
stdout note so the AI can mention it naturally if relevant.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
The update check hook (
check_update.sh) outputs text like this when a new version is available:This pattern has two problems:
1. It's indistinguishable from a prompt injection attack.
Security-aware LLMs (including Claude) are trained to flag imperative override phrases like
URGENT,you MUST, andbefore doing ANYTHING elseas signs of injection. A user whose assistant flags a legitimate tool's hook as a potential attack loses trust in both the tool and the assistant. In my case the assistant correctly refused to display the block and warned me — which prompted this investigation.2. It enters LLM context unnecessarily.
Claude Code hooks pipe stdout into the model's system context. Routing a UI notification through the LLM adds latency, consumes tokens, and injects tool-controlled text into a trust boundary that belongs to the user — even when the intent is benign.
Fix
Move the user-visible notification to stderr (surfaces directly in the terminal, never enters LLM context). Keep a short, neutral stdout note so the AI can mention the update if asked, without being instructed to override its behavior.
Before:
After:
Why this matters beyond aesthetics
Claude Code's hook system is powerful and relatively new. How tools use it sets a precedent. If it becomes normal for hooks to use imperative override language to force LLM output, users lose the ability to trust that their assistant's responses reflect their own instructions. The version check itself works well — this is a one-line design fix.