mcptest is an open-source CLI and MCP server for testing MCP servers, from your terminal, your CI, or your coding agent.
Generic API testers do not speak MCP, so they miss what matters: whether your tools, resources, and prompts return what they should, and whether the catalog or a schema has drifted. mcptest talks MCP end to end. You write checks as YAML and get a deterministic pass or fail, plus a structured failure that names the assertion that broke, the payload the server sent, and a one-line repro.
What you can test:
- Tools, resources, prompts: assert on real responses; catch catalog and input-schema drift.
- Agent behavior: score an agent loop with rubric, LLM-judge, or jury evals.
- Spec compliance and conformance: against a pinned MCP protocol version.
- Security: red-team probes, tool-poisoning and shadowing checks.
- Offline: stand up a mock server, with no real backend or network.
- In CI: one diffable YAML suite with exit-code gates.
Try it in three commands:
curl -fsSL https://download.mcptest.sh/install.sh | sh # or: brew install soapbucket/tap/mcptest
mcptest init # writes the mcptest.yml suite plus a starter suite under tests/
mcptest run # deterministic verdicts, structured failures
The starter suite targets a built-in mock server (mcptest mock) fed
from the scaffolded tests/example-server.yml catalog, so the first run
passes offline with no real server and no network; swap the command:
for your own server. If an MCP client on your machine already knows your
server, mcptest init --from-discovered <name> scaffolds against it
instead.
This is the part nothing else does. Two commands give Claude Code, Cursor, or any MCP-capable agent the full testing loop:
mcptest mcp-server --install --enable-writes # the front door (verbs)
mcptest skill --install # the packaged skill
Your agent asks "test this MCP server," scaffolds a validated starter
suite from the server's real catalog (scaffold_suite), sharpens the
generic checks against observed responses (propose_assertions), runs
the suite, and reads back a failure that already carries the assertion,
the actual value, and a one-line repro. The agent supplies the
intelligence; mcptest supplies the deterministic verdict, and the YAML
it leaves behind is the diffable audit trail a human reviews. See
the agent interface for the full verb
reference, the model-facing --reporter agent output, and the packaged
skill and subagent.
The same suite is a diffable YAML file you run on every commit. It catches
protocol regressions, latency spikes, contract drift, live-model regressions,
and unsafe tool definitions before they reach production, including the agent
loop a real model drives
(prompt -> model -> tool selection -> MCP call -> result -> answer).
$ mcptest run --config tests/smoke.yml
mcptest 1.0.1+<sha> run <run-id>
[PASS] search returns at least one result for a known query (41 ms)
[PASS] tool catalog is well-formed (3 ms)
...
Summary: 12 passed, 0 failed, 0 skipped in 3104 ms
The default pretty reporter prints one line per test plus a summary, and shows
the failure detail inline under any failing test. For a compact one-line count
(ran 12 tests: 12 passed, ...), pass --reporter minimal; for output shaped
for a model reading the result in a loop, pass --reporter agent.
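For a script-driven gate, the verdict can be read from the process exit status. The sketch below assumes (the text above does not spell it out) that mcptest exits 0 when every test passes and non-zero otherwise. `gate` is a hypothetical helper, not part of mcptest.

```python
import subprocess
from typing import Sequence


def gate(cmd: Sequence[str]) -> bool:
    """Run a command and report whether it exited with status 0.

    Assumption: mcptest signals a failing suite with a non-zero exit code.
    A missing binary is treated as a failed gate rather than a crash.
    """
    try:
        completed = subprocess.run(list(cmd), check=False)
    except FileNotFoundError:
        return False
    return completed.returncode == 0
```

A pipeline step could call `gate(["mcptest", "run", "--config", "tests/smoke.yml"])` and stop when it returns `False`; the command reuses only the flags shown above.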
Install in one line. Pick the path that matches your machine.
Homebrew (macOS, Linux):
brew install soapbucket/tap/mcptest
curl installer (macOS, Linux, including Apple Silicon and arm64):
curl -fsSL https://download.mcptest.sh/install.sh | sh
The installer detects your platform, downloads the signed release
tarball from download.mcptest.sh, verifies its sha256 against the
sums file, and drops mcptest into ~/.local/bin (or /usr/local/bin
when run with sudo). Inspect the script before piping with
curl -fsSL https://download.mcptest.sh/install.sh | less.
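The checksum step the installer performs can be reproduced by hand. This is a minimal Python sketch, not the installer's own code. It assumes the sums file uses the common `sha256sum` layout of `<hex digest>  <file name>` per line, and `sha256_of` and `matches_sums` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large tarballs do not load into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sums(tarball: Path, sums_text: str) -> bool:
    """True when sums_text has a line for this file with a matching digest.

    Assumption: one '<hex>  <name>' entry per line, as sha256sum prints.
    """
    actual = sha256_of(tarball)
    for line in sums_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].lstrip("*") == tarball.name:
            return parts[0].lower() == actual
    return False
```

A missing entry counts as a mismatch, so an incomplete sums file fails closed.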
Docker:
docker run --rm -v "$PWD":/work -w /work soapbucket/mcptest:latest run
To verify a signed release (cosign + SLSA provenance), see docs/release-verification.md.
Then read the getting started guide to go from install to your first passing test in under five minutes.
Do not trust us, verify it. mcptest is Apache 2.0, a single static binary with no telemetry and no auto-update, and it bakes a CycloneDX Software Bill of Materials straight into the binary so you can read the full dependency list from the copy you already have:
mcptest sbom # the embedded CycloneDX SBOM
mcptest sbom --verify # re-hash the embedded BOM to catch tampering
Every release is also Sigstore-signed and carries SLSA L3 build provenance. The full walkthrough, including how to verify a published release, lives at mcptest.sh/trust (see also docs/sbom.md and docs/release-verification.md).
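To read the dependency list programmatically, something like the following Python sketch may help. It assumes `mcptest sbom` prints the BOM as CycloneDX JSON (the text above names the format but not the encoding), and `list_components` is a hypothetical helper name.

```python
import json


def list_components(bom_text: str) -> list[tuple[str, str]]:
    """Return sorted (name, version) pairs from a CycloneDX JSON document.

    Assumption: the input is CycloneDX JSON with a top-level
    'components' array, which is how the CycloneDX JSON format lists
    dependencies.
    """
    bom = json.loads(bom_text)
    if bom.get("bomFormat") != "CycloneDX":
        raise ValueError("not a CycloneDX JSON document")
    return sorted(
        (component.get("name", ""), component.get("version", ""))
        for component in bom.get("components", [])
    )
```

Feeding it the captured output of `mcptest sbom` would yield one pair per dependency, ready to diff between releases.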
mcptest sits on three properties. Each one buys back time you would otherwise spend on bespoke testing scripts.
Tests live in YAML that a JSON Schema validates at load time. Author once, run anywhere mcptest runs. No per-repo test harness to maintain.
# mcptest.yml
# yaml-language-server: $schema=https://mcptest.sh/schema/v1.json
servers:
  local:
    command: ["node", "./dist/server.js"]
tools:
  - name: "search returns at least one result for a known query"
    server: local
    tool: search
    args: { query: "anthropic" }
    expect:
      assertions:
        - target: "result.content"
          matcher: { schema: { type: array, minItems: 1 } }
      max_duration_ms: 2000
Headless execution. Polished reporters. Exit codes that mean something. The same behavior on every CI platform.
# .github/workflows/mcptest.yml
- name: Install mcptest
  run: curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=v1.0.0 sh
- name: Run mcptest
  run: mcptest run --reporter junit --output mcptest.junit.xml
Run mcptest run against a freshly installed server and see passing
tests in about 30 seconds, no config required. Underneath sits the depth
production teams need: cassettes, OAuth 2.1 + PKCE, content-addressed
caching, multiple reporters, and a compliance corpus with an
expected-failures baseline.
Six capabilities you reach for after the first test passes.
Write one YAML test, point it at one model or a list of them, and run
it. The driver lists tools on every MCP server you name, sends the
prompt to the model with that merged catalog attached, dispatches the
tool calls the model makes, and records the whole conversation. Your
assertions resolve against the trace, so the same suite can check
tool_calls[0].name == get_weather and conversation.tokens.total <= 8000.
Providers covered today: Anthropic, OpenAI (including the o-series),
Google Gemini, Mistral, plus any OpenAI-API-compatible endpoint
(Azure, OpenRouter, vLLM, llama.cpp, LiteLLM, Together, Groq,
Anyscale, Fireworks, Bedrock-fronted Anthropic) through a named
providers: block.
agents:
  - name: weather query routes to get_weather
    models:
      - claude-sonnet-4-5
      - gpt-5
      - gemini-2.5-pro
    servers: [weather]
    prompt: What is the weather in Sacramento?
    expect:
      - target: tool_calls[0].name
        matcher: { exact: get_weather }
      - target: tool_calls[0].args.city
        matcher: { regex: "(?i)sacramento" }
      - target: conversation.tokens.total
        matcher: { regex: "^[0-9]+$" }
With the provider keys in your environment, record the run, then re-render the saved envelope for the per-model breakdown:
$ mcptest run --record --reporter json --output run.json
ran 3 tests: 2 passed, 1 failed, 0 skipped (2355ms)
$ mcptest report run.json --format pretty
mcptest 1.0.1+<sha> run <run-id>
[PASS] weather query routes to get_weather [claude-sonnet-4-5] (842 ms)
[FAIL] weather query routes to get_weather [gpt-5] (901 ms)
tool_calls[0].name: expected get_weather, got search
[PASS] weather query routes to get_weather [gemini-2.5-pro] (612 ms)
Summary: 2 passed, 1 failed, 0 skipped in 2355 ms
Each (test, model) pair gets its own cassette so a plain mcptest run replays them deterministically in CI without spending a cent.
When a new model lands you add the identifier to models:, re-record,
and the report tells you instantly which assertion broke for which
model.
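The assertion targets in the agent test above (`tool_calls[0].name`, `conversation.tokens.total`) read as paths into the recorded trace. The Python sketch below shows how such a path could be resolved and checked. It is an illustration, not mcptest's resolver: the trace shape, the `resolve` and `check` names, and the choice of unanchored regex matching are all assumptions.

```python
import re
from typing import Any

# One path step is either an identifier (key) or a bracketed index.
_STEP = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def resolve(trace: Any, target: str) -> Any:
    """Walk a dotted path with [n] indexes, e.g. 'tool_calls[0].args.city'."""
    value = trace
    for key, index in _STEP.findall(target):
        value = value[key] if key else value[int(index)]
    return value


def check(trace: Any, target: str, matcher: dict) -> bool:
    """Apply an 'exact' or 'regex' matcher to the value at target.

    Assumption: 'regex' succeeds on a match anywhere in the stringified value.
    """
    actual = resolve(trace, target)
    if "exact" in matcher:
        return actual == matcher["exact"]
    if "regex" in matcher:
        return re.search(matcher["regex"], str(actual)) is not None
    raise ValueError(f"unsupported matcher: {matcher}")
```

Against a hypothetical trace whose first tool call is `search`, `check(trace, "tool_calls[0].name", {"exact": "get_weather"})` returns `False`, which corresponds to the gpt-5 failure line in the report above.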
Sweep a whole suite across models from the command line without touching the YAML, and get a side-by-side comparison grid (test case x model, pass/fail per cell, drill-down into the failing assertion) as a self-contained HTML or Markdown file:
mcptest run --models claude-sonnet-4-5,gpt-5,gemini-2.5-pro
# defaults to the `matrix` reporter: a test-by-model comparison grid
Worked examples: agent-weather.yml,
agent-matrix.yml,
agent-custom-providers.yml,
agent-llm-judge.yml,
agent-issues-and-notifications.yml.
Background lives in docs/models.md.
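The comparison grid is a pivot of per-(test, model) verdicts. The Python sketch below builds a Markdown version from plain triples to show the shape. It is not the `matrix` reporter's implementation, and `matrix_markdown` is a hypothetical name.

```python
from typing import Iterable


def matrix_markdown(results: Iterable[tuple[str, str, bool]]) -> str:
    """Render (test, model, passed) triples as a test-by-model Markdown grid."""
    rows = list(results)
    # dict.fromkeys keeps first-seen order while removing duplicates.
    tests = list(dict.fromkeys(test for test, _, _ in rows))
    models = list(dict.fromkeys(model for _, model, _ in rows))
    verdicts = {(test, model): passed for test, model, passed in rows}

    lines = [
        "| test | " + " | ".join(models) + " |",
        "|" + " --- |" * (len(models) + 1),
    ]
    for test in tests:
        cells = []
        for model in models:
            passed = verdicts.get((test, model))
            cells.append("n/a" if passed is None else "PASS" if passed else "FAIL")
        lines.append(f"| {test} | " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

Fed the three results from the report above, it yields one row for the weather test with PASS, FAIL, and PASS cells in model order.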
Record real protocol exchanges to a cassette, then replay them offline
in CI. Snapshots and agent cassettes share the same normalization pass,
so a recording from today and one from next week diff cleanly. Replay
is the default; the --record flag captures fresh recordings.
mcptest run --record # captures cassettes alongside the YAML
mcptest run # replays them, no API key needed in CI
mcptest run writes one reporter at a time, chosen with --reporter:
pretty, minimal, json, junit, md, html, sarif, gitlab,
ndjson, tap, matrix, matrix-md, or quiet. The default pretty lists
each test plus a summary; minimal is the compact one-line count. Capture the
JSON run envelope once, then re-render it into any of these formats with
mcptest report --format: pretty CLI, JSON, JUnit, Markdown, HTML, SARIF,
GitLab Code Quality JSON, NDJSON, TAP, and the comparison-matrix grid. No
second run, no second API call.
# Run once, capture the JSON envelope.
mcptest run --reporter json --output run.json
# Re-render that envelope into whatever a given consumer wants.
mcptest report run.json --format sarif --output run.sarif
mcptest report run.json --format html --output run.html
mcptest report run.json --format matrix --output matrix.html
If you already use an MCP server inside an editor, mcptest doctor
lists the servers it found in your local MCP client configs, and
mcptest init --from-discovered <name> scaffolds a starter suite
against the one you pick, copying its command and arguments so you do
not retype paths. (A plain mcptest init scaffolds the fixed
filesystem-server template instead.)
mcptest doctor # lists discovered servers by name
mcptest init --from-discovered github # scaffold against one of them
Servers behind OAuth 2.1 with PKCE are first-class. Tokens are discovered from your environment, refreshed automatically when they expire mid-run, and redacted from every reporter.
servers:
  saas:
    url: "https://api.example.com/mcp"
    auth:
      oauth:
        client_id_env: "MCPTEST_SAAS_CLIENT_ID"
        authorization_url: "https://auth.example.com/oauth/authorize"
        token_url: "https://auth.example.com/oauth/token"
Adopt the compliance corpus on a server with known gaps without blocking
the PR queue. The baseline file records which checks are currently
expected to fail; mcptest compliance run --baseline fails only when a
green check turns red (a new regression), or when a check in the baseline
starts passing (a stale entry to remove).
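The two failure conditions described above reduce to a pair of set comparisons. This Python sketch illustrates the rule as stated, not the tool's code. The check IDs in the usage note and the `baseline_gate` name are hypothetical, and the baseline is modeled as a plain set of check IDs rather than the real file format.

```python
def baseline_gate(
    results: dict[str, bool], expected_failures: set[str]
) -> dict[str, list[str]]:
    """Classify compliance results against an expected-failures baseline.

    regressions: checks that fail now and are not listed in the baseline.
    stale: checks listed in the baseline that now pass.
    The gate is green only when both lists are empty.
    """
    regressions = sorted(
        check
        for check, passed in results.items()
        if not passed and check not in expected_failures
    )
    stale = sorted(
        check
        for check, passed in results.items()
        if passed and check in expected_failures
    )
    return {"regressions": regressions, "stale": stale}
```

With results `{"SCHEMA-002": False, "TOOL-003": False, "EDGE-004": True}` and a baseline of `{"SCHEMA-002", "EDGE-004"}`, `TOOL-003` is reported as a regression and `EDGE-004` as stale, while `SCHEMA-002` stays quiet because its failure is expected.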
# Gate CI on a baseline so known failures stay green.
mcptest compliance run \
  --results-from artifacts/compliance.json \
  --baseline compliance-baseline.yml
# Regenerate the baseline after a deliberate cleanup.
mcptest compliance run \
  --results-from artifacts/compliance.json \
  --baseline compliance-baseline.yml \
  --update-baseline --yes
The six capabilities above are the headline; here is the rest of the surface, in one place.
- Test types: tool, resource, and prompt calls; snapshots; latency budgets; the compliance corpus (PROTO, SCHEMA, SEQ, TOOL, RES, EDGE); schema-drift diffs (mcptest diff); generated output-schema conformance tests (mcptest generate); deterministic security scans of tool definitions (mcptest security); agent end-to-end tests; and model-comparison sweeps (--models) rendered as a test-by-model grid.
- Assertions: deterministic matchers (exact, contains, regex, schema, subset, is-json, is-xml, is-sql, levenshtein, starts-with, the contains-* family, not / oneOf / anyOf / allOf); a cel predicate for custom logic; embedding similar; llm-judge / llm-jury with calibration; and named model-graded checks (factuality, answer-relevance, context-faithfulness).
- Security: deterministic red-team catalog over the tool surface, a reviewer-grade vulnerability report (security --format html|md), and an OWASP LLM Top 10 coverage view.
- Transports: stdio, streamable HTTP, legacy SSE.
- Auth: OAuth 2.1 + PKCE, bearer tokens, custom headers, automatic secret redaction.
- SDKs: drive mcptest from your own test runner. Python (pytest), TypeScript (vitest, jest, mocha, node:test), Go, Rust (proc-macro), .NET (xUnit), and JVM (JUnit 5).
- CI: GitHub Actions, GitLab CI, and CircleCI integration paths.
- Distribution: Apache-2.0, signed releases, prebuilt binaries for macOS, Linux, and Windows.
Roadmap items and feature requests live on GitHub Issues.
A spread of runnable suites across the surface, not just the agent loop.
Each runs with mcptest run --config <file> (or mcptest <subcommand>); the
full catalog with per-example notes is in
examples/README.md.
Testing a real server? The companion mcptest-examples repo has complete, standalone end-to-end suites for ten popular MCP servers (filesystem, fetch, git, SQLite, memory, sequential-thinking, the everything reference server, plus GitHub, Notion, and Brave Search with auth). Each example carries its own README, a recorded run, and a CI workflow that installs the release binary with the curl one-liner.
- Tool / protocol testing: server-stdio.yml (stdio target), server-url.yml (Streamable HTTP), named-errors-stdio.yml (assert named MCP error codes).
- Compliance: compliance-baseline.yml (expected-failures baseline), official-conformance.yml, spec-version-pinning.yml.
- Security: security/ (tool-description injection, tool shadowing, rug-pull, data-exfiltration, the authz family).
- Coverage: coverage/ (every matcher and top-level block exercised against the built-in mcptest mock server, key-free).
- Oracle-free robustness: robustness-walkthrough/ ties metamorphic relations, input fuzzing, negative-path conformance, and the three-valued verdict onto one tool. Narrated end to end in docs/oracle-free-robustness.md. Per-feature examples: metamorphic/, fuzz/, negative-path.yml, tool-schema-lint/, coverage/tool-edges.yml, inconclusive.yml.
- Compositions (tool DAGs): composition-full.yml (transform, when, for_each, assembly, budget, cases) and pipe-search-then-update.yml.
- Eval matchers: rubric-eval.yml (weighted rubric scoring, offline) and sota-matchers.yml (the cel predicate and named model-graded matchers).
- Transforms and hooks: transform.yml, hooks-context.yml.
- Drift and discovery: diff-tools-baseline.json for mcptest diff, and discovery/ for mcptest discover.
- Agents (LLM in the loop): agent-weather.yml, agent-matrix.yml.
- SDK integration: python-sdk/ (pytest), typescript-sdk/ (vitest).
Code lives under crates/ and language SDKs under sdks/. The
project layout reference is the canonical
map of which crate owns what, the dependency graph, and the SDK
matrix. See AGENTS.md for the contributor rulebook.
The JSON Schema for the YAML config lives at schemas/v1.json and is
published on every release to https://mcptest.sh/schema/v1.json. The
agent cassette schema sits next to it at
schemas/agent-cassette-v1.json.
cargo build --release
./target/release/mcptest --help
# Run the full check gate the way CI will (fmt + clippy + doc + build + test):
./scripts/check.sh
# Build the documentation site locally (config lives in docs-site/):
cd docs-site && mdbook serve # if mdbook is installed; otherwise read raw Markdown under docs/
Full build + architecture detail is in docs/project-layout.md.
Full documentation lives under docs/. Highlights:
- Getting started: install to first passing test.
- Concepts: how mcptest thinks.
- YAML reference: every field, every matcher.
- CLI reference: every subcommand, every flag.
- Troubleshooting: the failure modes you hit first.
Apache-2.0. See LICENSE and NOTICE.
Copyright 2026 Soap Bucket LLC and the mcptest contributors. Soap Bucket LLC at soapbucket.com.
- Documentation: docs/index.md
- Examples: examples/
- Releases: github.com/soapbucket/mcptest/releases
- Issue tracker: github.com/soapbucket/mcptest/issues
- X (Twitter): @soapbucket
