Zero-copy, sub-millisecond guardrails for LLM input/output. Pure-Rust scanners by default; opt-in ML tier when you need it.
| Crate | What it gives you | When to use |
|---|---|---|
llm-guard |
Rules-tier scanners: substring, regex, structural. Pure Rust, no ML, no network. Sub-millisecond per scan. | Always — this is the base layer. Covers the textbook attacks (prompt-injection, role-override, secret leakage, PII, IDN homograph, markdown smuggling). |
llm-guard-ml |
ONNX-runtime-backed scanner. Drop-in implementation of the same Scanner trait. |
Add when you need to catch paraphrased / novel attacks the rules tier can't. Caller supplies the model file (no auto-download, no bundling). |
- Base — the
llm-guardcrate, default features. Microsecond per-scan, no dependencies beyondaho-corasick/regex/base64/unicode-security. This is what almost everyone actually needs. - Fuzzy —
llm-guardwith--features fuzzy. Adds theFuzzyMatchscanner: trigram-containment paraphrase detection against a curated corpus. Still microsecond range. - ML — the
llm-guard-mlcrate. ONNX classifier (~3–10 ms p99 on CPU, much less on GPU). Caller supplies the model.
[dependencies]
llm-guard = "0.2"use llm_guard::{
Pipeline, PipelineMode, BanSubstrings, InvisibleText, RoleOverride, TokenLimit,
patterns::COMMON_INJECTION_PATTERNS,
};
let guard = Pipeline::new(PipelineMode::FirstHit)
.with(TokenLimit::new(8_000))
.with(InvisibleText::new())
.with(RoleOverride::new())
.with(BanSubstrings::new("injection", COMMON_INJECTION_PATTERNS));
let result = guard.scan(user_input);
if result.should_refuse() {
// refuse the request
}[dependencies]
llm-guard = "0.2"
llm-guard-ml = "0.1"Download a model once during deployment (see crates/llm-guard-ml/README.md for the curl recipe and recommended classifiers):
mkdir -p /var/lib/llm-guard-ml/protectai-deberta-v3
curl -L -o /var/lib/llm-guard-ml/protectai-deberta-v3/model.onnx \
https://huggingface.co/ProtectAI/deberta-v3-base-prompt-injection-v2/resolve/main/onnx/model_quantized.onnx
curl -L -o /var/lib/llm-guard-ml/protectai-deberta-v3/tokenizer.json \
https://huggingface.co/ProtectAI/deberta-v3-base-prompt-injection-v2/resolve/main/tokenizer.jsonuse llm_guard::{Pipeline, PipelineMode, RoleOverride};
use llm_guard_ml::OnnxScanner;
let ml = OnnxScanner::from_file(
"/var/lib/llm-guard-ml/protectai-deberta-v3/model.onnx",
"/var/lib/llm-guard-ml/protectai-deberta-v3/tokenizer.json",
)?;
let pipeline = Pipeline::new(PipelineMode::All)
.with(RoleOverride::new()) // cheap rules-tier first
.with(ml); // ML as the backstop
# Ok::<(), llm_guard_ml::FromFileError>(())# Build everything in the workspace.
cargo build --workspace
# Test the base crate (default features).
cargo test -p llm-guard
# Test the base crate with the fuzzy paraphrase matcher.
cargo test -p llm-guard --features fuzzy
# Test the ML crate (unit tests only; smoke test needs a real model).
cargo test -p llm-guard-ml
# Run the strict zero-copy / bounded-allocation contract test.
# Release mode only - debug builds add capacity tracking that
# disappears under opt-level >= 1.
cargo test --release -p llm-guard --features fuzzy \
--test zero_alloc -- --test-threads=1
# Run the ML smoke test against a real model.
LLM_GUARD_ML_MODEL=/path/to/model.onnx \
LLM_GUARD_ML_TOKENIZER=/path/to/tokenizer.json \
cargo test --release -p llm-guard-ml --test smoke -- --nocapture
# Clippy across the whole workspace.
cargo clippy --workspace --all-targets -- -D warnings
cargo clippy --workspace --all-targets --features llm-guard/fuzzy -- -D warnings| Property | NeMo Guardrails | Guardrails AI | ZenGuard | AI-Infra-Guard | this workspace |
|---|---|---|---|---|---|
| Language | Python | Python | cloud | Go | Rust |
| Default latency | 10s–100s ms | 10s–100s ms | RTT | n/a | µs (rules), ms (ML) |
| ML required by default | yes | yes | yes | yes | no, opt-in via separate crate |
| Network at scan time | sometimes | sometimes | always | no | never |
| Zero-copy borrowed match spans | no | no | no | no | yes (rules tier) |
| IDN-homograph defence | no | no | no | no | built in |
| Layered deobfuscation pre-pass | no | no | no | no | built in |
| Confidence + severity per hit | partial | partial | no | no | every match |
See crates/llm-guard/README.md for the full per-scanner table, FP discipline notes, and measured speed matrix.
.
├── crates/
│ ├── llm-guard/ # base crate (no ML deps)
│ │ ├── src/
│ │ ├── examples/
│ │ ├── tests/
│ │ └── README.md
│ └── llm-guard-ml/ # ONNX scanner
│ ├── src/
│ ├── examples/
│ ├── tests/
│ └── README.md
├── Cargo.toml # workspace root
└── README.md # this file
MIT OR Apache-2.0