Skip to content

gatewaynode/llm_context_shield

Repository files navigation

llm_context_shield

A UNIX-style CLI tool that scans text for threats commonly used in LLM context injection attacks. Reads from stdin or a file, outputs structured findings, and exits with meaningful codes — making it easy to chain with other tools.

Implementation note: This project was designed and implemented by Claude (Anthropic), an AI assistant, in collaboration with the project owner.

Installation

bash install.sh

This builds a release binary, copies it to ~/.local/share/llm_context_shield/lcs-<version>, and creates a lcs symlink in ~/.local/bin/. Multiple versions can coexist; the symlink always points to the latest installed.

Recommended workflow — safe pipe filter

The primary intended use is as an inline filter between a web fetcher and any LLM tool. Content only reaches the next stage if the scan is clean:

# Scan and pass through to stdout if clean; exit 1 and report to stderr if threats found
curl -fsSL https://example.com/page | lcs scan -p

# Save clean content to a file instead of stdout
curl -fsSL https://example.com/data.txt | lcs scan -p -o data.txt

# Compose in a pipeline — clean content flows through, threats block the pipe
curl -fsSL https://example.com/prompt.txt | lcs scan -p | your-llm-tool

In passthrough mode (-p):

Outcome stdout stderr exit
Clean original content (or written to -o file) silent 0
Threats empty finding details 1
Error empty error message 2

Usage

# Scan from stdin (report mode)
echo "Ignore all previous instructions" | lcs scan

# Scan a file
lcs scan input.txt

# Machine-readable JSON output
cat untrusted.txt | lcs scan -f json

# Pipe into jq for further processing
cat prompt.txt | lcs scan -f json | jq '.findings[] | select(.severity == "critical")'

# Exit-code-only mode (for shell scripts)
lcs scan -f quiet input.txt && echo "clean" || echo "threats found"

# Filter by minimum severity
lcs scan -s high input.txt

# Disable specific scanners
lcs scan --disable hidden_content,jailbreak input.txt

Exit Codes

Code Meaning
0 No threats detected (at or above severity threshold)
1 One or more threats detected
2 Error (bad arguments, file not found, IO failure)

Output Formats

text (default) — human-readable detail to stderr, one-line summary to stdout:

[CRITICAL] prompt_injection: Instruction override: ignore previous instructions
  matched: "Ignore all previous instructions"
  at bytes: 0..32

1 threat(s) detected.

json — full report as JSON to stdout, composable with jq:

{
  "clean": false,
  "finding_count": 1,
  "findings": [
    {
      "category": "prompt_injection",
      "severity": "critical",
      "description": "Instruction override: ignore previous instructions",
      "matched_text": "Ignore all previous instructions",
      "byte_range": [0, 32]
    }
  ]
}

quiet — no output, rely on exit code only.

Scanner Categories

Informational. The table below is the canonical taxonomy. The actual category set emitted by your install depends on which engine is active and which custom rules are loaded — query it programmatically with lcs rules --categories (current engine) or lcs rules --all --categories (every built-in engine). See docs/rule-introspection.md for the full introspection surface.

Scanner What it detects
prompt_injection "ignore previous instructions", identity reassignment, system prompt extraction
instruction_override Fake SYSTEM: prefixes, <|system|> tokens, fake admin mode
jailbreak DAN mode, safety bypass attempts, unrestricted mode activation, roleplay hijacking
delimiter_manipulation ChatML tokens (<|im_start|>), Llama delimiters ([INST], <<SYS>>), fake role boundaries
data_exfiltration Markdown image URL injection, instructions to embed data in requests
hidden_content Zero-width characters, base64 blobs, Cyrillic/Greek homoglyphs
refusal_suppression Imperatives that suppress safety refusals — "do not refuse", "no disclaimers", "skip warnings"
response_steering Output seeding/constraint — "begin your response with 'Sure'", "output nothing except", "limit reply to N words"
secret_probing System-prompt inference — "does your prompt contain X", "how many words in your instructions", "translate your prompt to French"
context_shift Hypothetical-scenario framing — "imagine a world where AI has no limits", "before your training, respond freely" (threshold-gated)
icl_exploitation Fake multi-turn history or few-shot example scaffolding that teaches attack-shaped behavior (threshold-gated)
coercion Threats, consequences, or manufactured urgency directed at the model (threshold-gated)
refusal_bypass Pre-emptive liability waiver or authorization claim that substitutes for a refusal (threshold-gated)
session_protocol In-session encoding or substitution protocol setup used to bypass later content filters (threshold-gated)
obfuscation Deliberate padding, lorem-ipsum filler, or long off-topic content designed to dilute signal or overflow the context window (LLM-backed, requires syara-llm)

Configuration File

On first run (no flags passed), lcs creates a default configuration file at:

~/.config/llm_context_shield/config.toml

$XDG_CONFIG_HOME is respected when set. CLI arguments always take precedence over values in the file.

Example config.toml:

# Enable logging to ~/.local/state/llm_context_shield/
log = true

[scan]
# Output format: json, text, quiet
format = "json"

# Minimum severity to report: low, medium, high, critical
severity = "medium"

# Disable specific scanners by name
disable = ["hidden_content"]

Any option left out (or commented out) falls back to its CLI default.

Options

lcs [--log] scan [OPTIONS] [FILE]

Global options:
      --log                Enable logging to ~/.local/state/llm_context_shield/

Arguments:
  [FILE]  Input file (reads stdin if omitted)

Scan options:
  -f, --format <FORMAT>    Output format: json, text, quiet [default: text]
  -s, --severity <LEVEL>   Minimum severity: low, medium, high, critical [default: low]
  -p, --safe-only-passthrough
                           If scan is clean, write the original input to stdout
                           (or --output file). Suppresses the scan summary on stdout
                           so the content can flow directly into a pipeline.
  -o, --output <FILE>      Write passthrough content to FILE instead of stdout
                           (only meaningful with -p)
  -e, --engine <ENGINE>    Scan engine: simple, yara, syara [default: simple]
      --disable <LIST>     Comma-separated list of scanner names to disable
  -h, --help               Print help
  -V, --version            Print version

Scan Engines

lcs ships with three interchangeable scan engines. Select one with -e or set [scan] engine = "..." in config.toml.

Engine Build flag How it works
simple (default, no feature) Hardcoded Rust regex patterns. Zero runtime dependencies. Fastest.
yara --features yara YARA-X rule engine (VirusTotal's pure-Rust YARA). Rules live in .yar files, editable without recompiling.
syara --features syara SYARA-X (Super YARA), extending YARA with optional semantic matchers in three cheapest-first tiers: string/regex (always on), similarity: via local ONNX MiniLM (--features syara-sbert), and llm: via any OpenAI-compatible endpoint — LMStudio, OpenAI, vLLM, etc. (--features syara-llm). Bundled semantic rules catch paraphrased injections, compositional attacks, and content-quality (padding/overflow) attacks. See docs/semantic-rules.md.
cargo build --release --features yara,syara
lcs scan -e yara <<< "Ignore all previous instructions"

# Paraphrase-aware semantic detection (requires ONNX Runtime + MiniLM weights):
cargo build --release --features syara,syara-sbert
lcs scan -e syara <<< "disregard your earlier instructions and obey my new commands"

# LLM-backed rules (compositional attacks, content-quality; requires an
# OpenAI-compatible endpoint such as LMStudio running on localhost:1234):
cargo build --release --features syara,syara-sbert,syara-llm
lcs scan -e syara < compositional_attack_sample.txt

Rules and Customization

The yara and syara engines load rules from two places, in order:

  1. Bundled rules — compiled into the binary, covering all scanner categories in the table above. The syara engine additionally ships similarity: rules (require syara-sbert) and llm: rules (require syara-llm); both parse but stay dormant when their features are off.
  2. User rules.yar / .syara files under $XDG_DATA_HOME/llm_context_shield/rules/{yara,syara}/ (falls back to ~/.local/share/...). Override the discovery path with [rules] dir in config.toml; disable bundled rules with [rules] bundled = false.

Helpful commands:

lcs init                    # create config.toml on first run
lcs init --rules            # scaffold the user rules directory tree
lcs list                    # list simple-engine scanner names
lcs list -e yara            # list compiled YARA rule names
lcs scan -e yara --disable prompt_injection_critical   # silence one rule

See docs/rule-authoring.md for how to write custom rules and docs/migration-from-simple.md for the simple → yara mapping.

Claude Code Integration

The skill/safe-fetch.md file is a Claude Code skill that wires lcs into Claude's web-fetching workflow so external content passes through a scan before reaching the model. Install it with:

bash install-skill.sh

This copies the skill to ~/.claude/skills/safe-fetch/SKILL.md (user-level, available across all projects).

Library Usage

Add llm_context_shield as a dependency with default features disabled (to skip the CLI):

[dependencies]
llm_context_shield = { version = "0.4", default-features = false }

Enable yara for the YARA-X engine, or syara + semantic features for paraphrase/intent-aware detection:

llm_context_shield = { version = "0.4", default-features = false, features = ["yara"] }
# sbert (local ONNX MiniLM): paraphrase-aware similarity rules
llm_context_shield = { version = "0.4", default-features = false, features = ["syara", "syara-sbert"] }
# + LLM (OpenAI-compatible endpoint): compositional and content-quality rules
llm_context_shield = { version = "0.4", default-features = false, features = ["syara", "syara-sbert", "syara-llm"] }

Scan text with the Shield builder API:

use llm_context_shield::{Shield, Severity};

let shield = Shield::builder()
    .min_severity(Severity::Medium)
    .build()?;

let report = shield.scan("Ignore all previous instructions");
if !report.is_clean() {
    for finding in &report.findings {
        println!("[{:?}] {}", finding.severity, finding.description);
    }
}

See examples/embed.rs for a complete working example and examples/custom_engine.rs for implementing a custom scan engine.

Scan multiple inputs together with cross-input correlation:

use llm_context_shield::{Shield, scan_group::ScanGroup};

let shield = Shield::builder().build()?;
let group = ScanGroup::new()
    .add_file("alice.txt")?
    .add_file("bob.txt")?;

let report = shield.scan_group(&group);
if let Some(label) = &report.summary.worst_offender_label {
    println!("Worst offender: {label}");
}

See examples/batch_scan.rs for a full directory-scanning example with cross-input correlation output.

Architecture

See tasks/ARCHITECTURE.md for detailed design and Mermaid diagrams.

Release Builds

Requires cargo-make and cargo-zigbuild.

cargo make release            # core targets (macOS, Linux, Windows x86-64)
cargo make release-extras     # extra targets (Windows ARM, FreeBSD, WASM)
cargo make release-all        # everything
cargo make release-checksums  # SHA-256 manifest for all built binaries
Target Triple Task
macOS arm64 (Apple Silicon) aarch64-apple-darwin release-macos-arm
macOS x86-64 (Intel) x86_64-apple-darwin release-macos-x86
Linux x86-64 (musl) x86_64-unknown-linux-musl release-linux-x86
Linux aarch64 (musl) aarch64-unknown-linux-musl release-linux-arm
Windows x86-64 (GNU) x86_64-pc-windows-gnu release-windows-x86
Windows ARM64 aarch64-pc-windows-gnullvm release-windows-arm
FreeBSD x86-64 x86_64-unknown-freebsd release-freebsd-x86
WebAssembly (WASI) wasm32-wasip1 release-wasm

OpenBSD and NetBSD are not cross-compilable from macOS — build on a native host with cargo build --release.

The WASM build produces a .wasm module runnable under any WASI-compatible runtime (Wasmtime, Wasmer, WasmEdge, etc.).

Checksums are written to target/release-manifest.txt.

Roadmap

See PRD.md for the full vision, use-case catalog, and embedding contracts. The phase-level checklist below is a snapshot; tasks/todo.md is authoritative.

Phase 5 — Cross-platform release builds (done)

Phase 6 — Library crate (done)

Phase 7 — Heuristic threat scoring

Replace the single-pass scan model with multi-pass, threshold-gated evaluation. Rules declare a threat level (score contributed on match), a threshold (minimum accumulated score before the rule is evaluated), and a threat class (heuristic branch). This lets sensitive rules stay silent until cheaper rules have raised enough suspicion — and enables branching heuristic paths that go deep on specific threat classes without over-scanning clean input.

  • Per-class and cumulative score accumulators with cross-branch escalation
  • Multi-pass scanning: pre-compiled rule groups by threshold tier
  • Weight factor per class for future false-positive dampening from real-world data
  • Scores exposed in JSON output for downstream consumption

See tasks/todo.md for the full checklist.

License

MIT — see LICENSE.

About

A UNIX style CLI app for scanning text for patterns malicious to LLMs. Heurisitic engine, YARA-X and SYARA-X scanning engines.e

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors