mcptest

Website · Documentation · Examples · Example servers

mcptest is an open-source CLI and MCP server for testing MCP servers, from your terminal, your CI, or your coding agent.

Generic API testers do not speak MCP, so they miss what matters: whether your tools, resources, and prompts return what they should, and whether the catalog or a schema has drifted. mcptest talks MCP end to end. You write checks as YAML and get a deterministic pass or fail, plus a structured failure that names the assertion that broke, the payload the server sent, and a one-line repro.

What you can test:

  • Tools, resources, prompts: assert on real responses; catch catalog and input-schema drift.
  • Agent behavior: score an agent loop with rubric, LLM-judge, or jury evals.
  • Spec compliance and conformance: against a pinned MCP protocol version.
  • Security: red-team probes, tool-poisoning and shadowing checks.
  • Offline: stand up a mock server, with no real backend or network.
  • In CI: one diffable YAML suite with exit-code gates.

[Screenshot: a terminal session showing a scaffolded YAML test case, then mcptest run passing it offline against the built-in mock server, then mcptest conformance run with every probed MCP spec requirement passing.]

Try it in three commands:

curl -fsSL https://download.mcptest.sh/install.sh | sh   # or: brew install soapbucket/tap/mcptest
mcptest init   # writes the mcptest.yml suite plus a starter suite under tests/
mcptest run    # deterministic verdicts, structured failures

The starter suite targets a built-in mock server (mcptest mock) fed from the scaffolded tests/example-server.yml catalog, so the first run passes offline with no real server and no network. To test your own server, replace the command: value with the command that launches it. If an MCP client on your machine already knows your server, mcptest init --from-discovered <name> scaffolds against it instead.

Use mcptest from your coding agent

This is the part nothing else does. Two commands give Claude Code, Cursor, or any MCP-capable agent the full testing loop:

mcptest mcp-server --install --enable-writes   # the front door (verbs)
mcptest skill --install                        # the packaged skill

Your agent asks "test this MCP server," scaffolds a validated starter suite from the server's real catalog (scaffold_suite), sharpens the generic checks against observed responses (propose_assertions), runs the suite, and reads back a failure that already carries the assertion, the actual value, and a one-line repro. The agent supplies the intelligence; mcptest supplies the deterministic verdict, and the YAML it leaves behind is the diffable audit trail a human reviews. See the agent interface for the full verb reference, the model-facing --reporter agent output, and the packaged skill and subagent.

Or run it yourself, in CI

The same suite is a diffable YAML file you run on every commit. It catches protocol regressions, latency spikes, contract drift, live-model regressions, and unsafe tool definitions before they reach production, including the agent loop a real model drives (prompt -> model -> tool selection -> MCP call -> result -> answer).

$ mcptest run --config tests/smoke.yml
mcptest 1.0.1+<sha> run <run-id>

  [PASS] search returns at least one result for a known query  (41 ms)
  [PASS] tool catalog is well-formed  (3 ms)
  ...
Summary: 12 passed, 0 failed, 0 skipped in 3104 ms

The default pretty reporter prints one line per test plus a summary, and shows the failure detail inline under any failing test. For a compact one-line count (ran 12 tests: 12 passed, ...), pass --reporter minimal; for output shaped for a model reading the result in a loop, pass --reporter agent.

Install in one line. Pick the path that matches your machine.

Homebrew (macOS, Linux):

brew install soapbucket/tap/mcptest

curl installer (macOS, Linux, including Apple Silicon and arm64):

curl -fsSL https://download.mcptest.sh/install.sh | sh

The installer detects your platform, downloads the signed release tarball from download.mcptest.sh, verifies its sha256 against the sums file, and drops mcptest into ~/.local/bin (or /usr/local/bin when run with sudo). Inspect the script before piping with curl -fsSL https://download.mcptest.sh/install.sh | less.
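The checksum step can also be reproduced by hand. The following is a minimal Python sketch of that kind of check, not the installer's actual code: it assumes the sums file uses the common `<hex digest>  <filename>` layout, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large tarballs do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_digest(sums_text: str, filename: str) -> Optional[str]:
    """Find the digest listed for `filename` (assumed '<hex>  <name>' lines)."""
    for line in sums_text.splitlines():
        parts = line.split()
        # sha256sum marks binary-mode entries with a leading '*' on the name.
        if len(parts) == 2 and parts[1].lstrip("*") == filename:
            return parts[0].lower()
    return None


def verify(path: Path, sums_text: str) -> bool:
    """True only if the file is listed in the sums text and its hash matches."""
    expected = expected_digest(sums_text, path.name)
    return expected is not None and sha256_of(path) == expected
```

A file that is missing from the sums text is treated as a failure rather than skipped, so an unlisted artifact never passes by default.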

Docker:

docker run --rm -v "$PWD":/work -w /work soapbucket/mcptest:latest run

To verify a signed release (cosign + SLSA provenance), see docs/release-verification.md.

Then read the getting started guide to go from install to your first passing test in under five minutes.

Can I trust mcptest?

Do not trust us; verify it. mcptest is Apache 2.0, a single static binary with no telemetry and no auto-update, and it bakes a CycloneDX Software Bill of Materials straight into the binary so you can read the full dependency list from the copy you already have:

mcptest sbom            # the embedded CycloneDX SBOM
mcptest sbom --verify   # re-hash the embedded BOM to catch tampering

Every release is also Sigstore-signed and carries SLSA L3 build provenance. The full walkthrough, including how to verify a published release, lives at mcptest.sh/trust (see also docs/sbom.md and docs/release-verification.md).

Why mcptest

mcptest sits on three properties. Each one buys back time you would otherwise spend on bespoke testing scripts.

Spec-driven

Tests live in YAML that a JSON Schema validates at load time. Author once, run anywhere mcptest runs. No per-repo test harness to maintain.

# mcptest.yml
# yaml-language-server: $schema=https://mcptest.sh/schema/v1.json

servers:
  local:
    command: ["node", "./dist/server.js"]

tools:
  - name: "search returns at least one result for a known query"
    server: local
    tool: search
    args: { query: "anthropic" }
    expect:
      assertions:
        - target: "result.content"
          matcher: { schema: { type: array, minItems: 1 } }
      max_duration_ms: 2000

CI-first

Headless execution. Polished reporters. Exit codes that mean something. The same behavior on every CI platform.

# .github/workflows/mcptest.yml
- name: Install mcptest
  run: curl -fsSL https://download.mcptest.sh/install.sh | MCPTEST_VERSION=v1.0.0 sh
- name: Run mcptest
  run: mcptest run --reporter junit --output mcptest.junit.xml

Sane defaults, advanced when needed

Run mcptest run against a freshly installed server and see passing tests in about 30 seconds, no config required. Underneath sits the depth production teams need: cassettes, OAuth 2.1 + PKCE, content-addressed caching, multiple reporters, and a compliance corpus with an expected-failures baseline.

Features

Six capabilities you reach for after the first test passes.

1. Agent end-to-end testing

Write one YAML test, point it at one model or a list of them, and run it. The driver lists tools on every MCP server you name, sends the prompt to the model with that merged catalog attached, dispatches the tool calls the model makes, and records the whole conversation. Your assertions resolve against the trace, so the same suite can check tool_calls[0].name == get_weather and conversation.tokens.total <= 8000.

Providers covered today: Anthropic, OpenAI (including the o-series), Google Gemini, Mistral, plus any OpenAI-API-compatible endpoint (Azure, OpenRouter, vLLM, llama.cpp, LiteLLM, Together, Groq, Anyscale, Fireworks, Bedrock-fronted Anthropic) through a named providers: block.

agents:
  - name: weather query routes to get_weather
    models:
      - claude-sonnet-4-5
      - gpt-5
      - gemini-2.5-pro
    servers: [weather]
    prompt: What is the weather in Sacramento?
    expect:
      - target: tool_calls[0].name
        matcher: { exact: get_weather }
      - target: tool_calls[0].args.city
        matcher: { regex: "(?i)sacramento" }
      - target: conversation.tokens.total
        matcher: { regex: "^[0-9]+$" }
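To make "assertions resolve against the trace" concrete, here is a small Python sketch of how a target such as tool_calls[0].name could be looked up in a recorded trace and checked with an exact or regex matcher. It illustrates the idea only: the trace shape, the resolve and matches helpers, and the error handling are assumptions, not mcptest's implementation.

```python
import re
from typing import Any, Dict

# One path step: either a field name or a [index] subscript.
_TOKEN = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def resolve(trace: Any, target: str) -> Any:
    """Walk a dotted path with optional [index] steps through dicts and lists."""
    current = trace
    pos = 0
    while pos < len(target):
        if target[pos] == ".":
            pos += 1
            continue
        match = _TOKEN.match(target, pos)
        if match is None:
            raise ValueError(f"cannot parse target at offset {pos}: {target!r}")
        key, index = match.group(1), match.group(2)
        current = current[key] if key is not None else current[int(index)]
        pos = match.end()
    return current


def matches(value: Any, matcher: Dict[str, Any]) -> bool:
    """Support only the two matchers used in the YAML above: exact and regex."""
    if "exact" in matcher:
        return value == matcher["exact"]
    if "regex" in matcher:
        return re.search(matcher["regex"], str(value)) is not None
    raise ValueError(f"unsupported matcher: {matcher!r}")


# Hypothetical trace, shaped only to fit the targets in the example suite.
trace = {
    "tool_calls": [{"name": "get_weather", "args": {"city": "Sacramento"}}],
    "conversation": {"tokens": {"total": 1234}},
}
print(matches(resolve(trace, "tool_calls[0].name"), {"exact": "get_weather"}))
```

Because the lookup and the matcher are both pure functions of the recorded trace, the same trace always yields the same verdict.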

With the provider keys in your environment, record the run, then re-render the saved envelope for the per-model breakdown:

$ mcptest run --record --reporter json --output run.json
ran 3 tests: 2 passed, 1 failed, 0 skipped (2355ms)

$ mcptest report run.json --format pretty
mcptest 1.0.1+<sha> run <run-id>

  [PASS] weather query routes to get_weather [claude-sonnet-4-5]  (842 ms)
  [FAIL] weather query routes to get_weather [gpt-5]  (901 ms)
      tool_calls[0].name: expected get_weather, got search
  [PASS] weather query routes to get_weather [gemini-2.5-pro]  (612 ms)

Summary: 2 passed, 1 failed, 0 skipped in 2355 ms

Each (test, model) pair gets its own cassette so a plain mcptest run replays them deterministically in CI without spending a cent. When a new model lands you add the identifier to models:, re-record, and the report tells you instantly which assertion broke for which model.

Sweep a whole suite across models from the command line without touching the YAML, and get a side-by-side comparison grid (test case x model, pass/fail per cell, drill-down into the failing assertion) as a self-contained HTML or Markdown file:

mcptest run --models claude-sonnet-4-5,gpt-5,gemini-2.5-pro
# defaults to the `matrix` reporter: a test-by-model comparison grid

Worked examples: agent-weather.yml, agent-matrix.yml, agent-custom-providers.yml, agent-llm-judge.yml, agent-issues-and-notifications.yml. Background lives in docs/models.md.

2. Cassette record and replay

Record real protocol exchanges to a cassette, then replay them offline in CI. Snapshots and agent cassettes share the same normalization pass, so a recording from today and one from next week diff cleanly. Replay is the default; the --record flag captures fresh recordings.

mcptest run --record   # captures cassettes alongside the YAML
mcptest run            # replays them, no API key needed in CI
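The record-then-replay idea, with a normalization pass that keeps recordings diffable, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than mcptest's cassette format: the volatile field names, the per-(test, model) key, and the Cassette class are all hypothetical.

```python
import json
from typing import Any, Dict

# Assumed examples of fields that change between otherwise identical runs.
VOLATILE_KEYS = {"timestamp", "request_id", "duration_ms"}


def normalize(value: Any) -> Any:
    """Replace volatile fields with a placeholder, recursing through the payload."""
    if isinstance(value, dict):
        return {
            key: "<normalized>" if key in VOLATILE_KEYS else normalize(item)
            for key, item in sorted(value.items())
        }
    if isinstance(value, list):
        return [normalize(item) for item in value]
    return value


class Cassette:
    """In-memory store keyed by (test, model); a real cassette would live on disk."""

    def __init__(self) -> None:
        self._entries: Dict[str, Any] = {}

    @staticmethod
    def _key(test: str, model: str) -> str:
        return f"{test}::{model}"

    def record(self, test: str, model: str, exchange: Dict[str, Any]) -> None:
        self._entries[self._key(test, model)] = normalize(exchange)

    def replay(self, test: str, model: str) -> Any:
        # Raises KeyError when nothing was recorded for this pair.
        return self._entries[self._key(test, model)]

    def dumps(self) -> str:
        # Sorted keys and fixed indentation give a stable, diff-friendly file.
        return json.dumps(self._entries, indent=2, sort_keys=True)
```

Two recordings that differ only in a timestamp serialize to the same text, which is what lets a re-recorded cassette produce a clean diff.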

3. Multiple reporters

mcptest run emits one report format at a time, chosen with --reporter: pretty, minimal, json, junit, md, html, sarif, gitlab, ndjson, tap, matrix, matrix-md, or quiet. The default pretty lists each test plus a summary; minimal is the compact one-line count. Capture the JSON run envelope once, then re-render it with mcptest report --format into any of these: pretty CLI, JSON, JUnit, Markdown, HTML, SARIF, GitLab Code Quality JSON, NDJSON, TAP, and the comparison-matrix grid. No second run, no second API call.

# Run once, capture the JSON envelope.
mcptest run --reporter json --output run.json

# Re-render that envelope into whatever a given consumer wants.
mcptest report run.json --format sarif --output run.sarif
mcptest report run.json --format html --output run.html
mcptest report run.json --format matrix --output matrix.html

4. Auto-discovery from Claude Desktop, Cursor, and Claude Code

If you already use an MCP server inside an editor, mcptest doctor lists the servers it found in your local MCP client configs, and mcptest init --from-discovered <name> scaffolds a starter suite against the one you pick, copying its command and arguments so you do not retype paths. (A plain mcptest init scaffolds the offline mock-server starter suite instead.)

mcptest doctor                          # lists discovered servers by name
mcptest init --from-discovered github   # scaffold against one of them

5. OAuth 2.1 + PKCE for HTTP servers

Servers behind OAuth 2.1 with PKCE are first-class. Tokens are discovered from your environment, refreshed automatically when they expire mid-run, and redacted from every reporter.

servers:
  saas:
    url: "https://api.example.com/mcp"
    auth:
      oauth:
        client_id_env: "MCPTEST_SAAS_CLIENT_ID"
        authorization_url: "https://auth.example.com/oauth/authorize"
        token_url: "https://auth.example.com/oauth/token"

6. Expected-failures baseline (compliance corpus)

Adopt the compliance corpus on a server with known gaps without blocking the PR queue. The baseline file records which checks are currently expected to fail; mcptest compliance run --baseline fails only when a green check turns red (a new regression), or when a check in the baseline starts passing (a stale entry to remove).

# Gate CI on a baseline so known failures stay green.
mcptest compliance run \
  --results-from artifacts/compliance.json \
  --baseline compliance-baseline.yml

# Regenerate the baseline after a deliberate cleanup.
mcptest compliance run \
  --results-from artifacts/compliance.json \
  --baseline compliance-baseline.yml \
  --update-baseline --yes
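The gating rule described above (fail on a new regression or on a stale baseline entry, stay green otherwise) can be written out as a short Python sketch. It is an illustration of the rule, not mcptest's code: the gate function, the GateResult type, and the check IDs are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class GateResult:
    regressions: List[str]  # failing checks that are not in the baseline
    stale: List[str]        # baseline entries that now pass

    @property
    def ok(self) -> bool:
        return not self.regressions and not self.stale


def gate(results: Dict[str, bool], baseline: Set[str]) -> GateResult:
    """`results` maps check ID to pass/fail; `baseline` holds expected failures."""
    regressions = sorted(
        check for check, passed in results.items()
        if not passed and check not in baseline
    )
    # A baseline entry that did not run at all is left alone, not called stale.
    stale = sorted(check for check in baseline if results.get(check) is True)
    return GateResult(regressions, stale)


# Hypothetical check IDs in the style of the corpus group names.
results = {"PROTO-001": True, "SCHEMA-004": False, "TOOL-002": False}
print(gate(results, baseline={"SCHEMA-004"}))
```

Treating a newly passing baseline entry as a failure is what keeps the baseline honest: it can only shrink through a deliberate update.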

At a glance

The six capabilities above are the headline; here is the rest of the surface, in one place.

  • Test types: tool, resource, and prompt calls; snapshots; latency budgets; the compliance corpus (PROTO, SCHEMA, SEQ, TOOL, RES, EDGE); schema-drift diffs (mcptest diff); generated output-schema conformance tests (mcptest generate); deterministic security scans of tool definitions (mcptest security); agent end-to-end tests; and model-comparison sweeps (--models) rendered as a test-by-model grid.
  • Assertions: deterministic matchers (exact, contains, regex, schema, subset, is-json, is-xml, is-sql, levenshtein, starts-with, the contains-* family, not/oneOf/anyOf/allOf); a cel predicate for custom logic; embedding similar; llm-judge / llm-jury with calibration; and named model-graded checks (factuality, answer-relevance, context-faithfulness).
  • Security: deterministic red-team catalog over the tool surface, a reviewer-grade vulnerability report (security --format html|md), and an OWASP LLM Top 10 coverage view.
  • Transports: stdio, streamable HTTP, legacy SSE.
  • Auth: OAuth 2.1 + PKCE, bearer tokens, custom headers, automatic secret redaction.
  • SDKs: drive mcptest from your own test runner. Python (pytest), TypeScript (vitest, jest, mocha, node:test), Go, Rust (proc-macro), .NET (xUnit), and JVM (JUnit 5).
  • CI: GitHub Actions, GitLab CI, and CircleCI integration paths.
  • Distribution: Apache-2.0, signed releases, prebuilt binaries for macOS, Linux, and Windows.

Roadmap items and feature requests live on GitHub Issues.

Examples

A spread of runnable suites across the surface, not just the agent loop. Each runs with mcptest run --config <file> (or mcptest <subcommand>); the full catalog with per-example notes is in examples/README.md.

Testing a real server? The companion mcptest-examples repo has complete, standalone end-to-end suites for ten popular MCP servers (filesystem, fetch, git, SQLite, memory, sequential-thinking, the everything reference server, plus GitHub, Notion, and Brave Search with auth). Each example carries its own README, a recorded run, and a CI workflow that installs the release binary with the curl one-liner.

Project layout

Code lives under crates/ and language SDKs under sdks/. The project layout reference is the canonical map of which crate owns what, the dependency graph, and the SDK matrix. See AGENTS.md for the contributor rulebook.

The JSON Schema for the YAML config lives at schemas/v1.json and is published on every release to https://mcptest.sh/schema/v1.json. The agent cassette schema sits next to it at schemas/agent-cassette-v1.json.

Build from source

cargo build --release
./target/release/mcptest --help

# Run the full check gate the way CI will (fmt + clippy + doc + build + test):
./scripts/check.sh

# Build the documentation site locally (config lives in docs-site/):
cd docs-site && mdbook serve   # if mdbook is installed; otherwise read raw Markdown under docs/

Full build + architecture detail is in docs/project-layout.md.

Documentation

Full documentation lives under docs/.

License

Apache-2.0. See LICENSE and NOTICE.

Copyright 2026 Soap Bucket LLC and the mcptest contributors. Soap Bucket LLC at soapbucket.com.
