Spec-driven HTTP API test-case generator — turn OpenAPI into runnable Hurl, k6, or Postman cases
CaseForge reads your OpenAPI specification and generates structured, traceable test cases covering happy paths, edge cases, boundary values, and OWASP security scenarios. It outputs ready-to-run test files in multiple formats and can execute them against your API.
It works as a pure algorithmic generator out of the box (pairwise, boundary-value, combinatorial). Optionally, you can plug in an LLM (Anthropic / OpenAI / Gemini / any OpenAI-compatible API) to enrich edge-case discovery and mine response-body constraints.
- Multiple LLM providers (optional) — Anthropic, OpenAI, Gemini, or any OpenAI-compatible API (DeepSeek, Qwen, Moonshot, Azure). Disable entirely for pure-algorithm mode.
- Multiple output formats — Hurl, k6, Postman Collection v2.1, Markdown, CSV
- OWASP security testing — injection, auth bypass, and data exposure test cases
- Spec linting — validates OpenAPI specs with configurable severity thresholds and JSON output
- Spec diff — classifies breaking vs non-breaking changes; auto-generates cases for breaking ops
- Risk-based testing — detects which API operations are at risk from recent git changes via static analysis
- Test case scoring — multi-dimensional quality scoring with named coverage scenario tracking (breadth, boundary, security, execution, status coverage)
- Natural language input — `ask` generates cases from a plain-text description
- Platform export — exports to Allure, Xray, or TestRail
- Webhook push — fires `on_generate` / `on_run_complete` events to configured endpoints
- Watch mode — regenerates cases whenever the spec file changes
- Checkpoint resume — resumes interrupted `gen` runs from where they left off
- Dynamic API exploration — probes a live API to discover implicit validation rules (DEA)
- Duplicate detection — finds and removes structurally similar test cases
- CI scaffolding — generates GitHub Actions, GitLab CI, Jenkins, or shell workflow configs
- MCP server — exposes CaseForge as an MCP tool for AI agent pipelines
- Onboarding wizard — interactive `onboard` command walks through full setup in minutes
- Auth bootstrap — `--auth-bootstrap` prepends an auth setup step to all secured-endpoint cases so every technique works out of the box against authenticated APIs
- Response body oracles — `--with-oracles` uses two-step LLM prompting (Observation-Confirmation) to mine response body constraints and inject them as assertions
- Coverage gap filling — `score --fill-gaps` detects operations missing 2xx or 4xx coverage and auto-generates cases to close the gaps
- Failure classification — failed `run` cases are automatically tagged `server_error` / `missing_validation` / `auth_failure` / `security_regression`
- Conformance checking — `conformance` command mines oracle constraints for all operations and reports spec-vs-implementation mismatches against a live API
- Pure algorithm mode — works without an LLM key using pairwise, boundary-value, and combinatorial analysis

```bash
# Homebrew
brew tap testmind-hq/tap
brew install caseforge
```

```bash
# Go toolchain
go install github.com/testmind-hq/caseforge@latest
```

```bash
# From source
git clone https://github.com/testmind-hq/caseforge.git
cd caseforge
go build -o caseforge .
```

```bash
# Interactive setup (recommended for first use)
caseforge onboard

# Check your environment
caseforge doctor

# Generate test cases from an OpenAPI spec
caseforge gen --spec openapi.yaml --format hurl

# Run the generated tests
caseforge run --cases ./cases --target http://localhost:8080

# Lint the spec
caseforge lint --spec openapi.yaml
```

| Command | Description |
|---|---|
| `gen` | Generate test cases from an OpenAPI spec |
| `run` | Execute generated test cases (hurl or k6) |
| `ask` | Generate test cases from a natural language description |
| `lint` | Lint an OpenAPI spec for quality issues |
| `diff` | Compare two OpenAPI specs and classify breaking changes |
| `score` | Score the quality of generated test cases |
| `conformance` | Check spec-vs-implementation conformance using LLM-mined response body constraints |

| Command | Description |
|---|---|
| `mutate` | Run HTTP boundary mutations via a reverse proxy to find weak test assertions |
| `rbt` | Risk-based testing: assess which operations are at risk from recent git changes |
| `rbt index` | Auto-generate `caseforge-map.yaml` by analysing source code |
| `explore` | Dynamically probe a live API and infer implicit validation rules |
| `stats` | Show test case statistics for a cases directory |
| `dedupe` | Detect and optionally remove duplicate test cases |

| Command | Description |
|---|---|
| `sandbox` | Start a local HTTP mock server that generates realistic responses from an OpenAPI spec |
| `chain` | Generate multi-step chain cases via BFS over the dependency graph |
| `watch` | Watch a spec file and regenerate cases on change |
| `suite create` | Create a `suite.json` orchestration file |
| `suite validate` | Validate a `suite.json` against its `index.json` |
| `export` | Export `index.json` to Allure, Xray, or TestRail format |
| `ci init` | Generate a CI workflow config (GitHub Actions, GitLab CI, Jenkins, shell) |

| Command | Description |
|---|---|
| `onboard` | Interactive setup wizard — writes `~/.caseforge.yaml` (global) |
| `init` | Write a `.caseforge.yaml` in the current directory (project-level override) |
| `config show` | Print the effective configuration |
| `doctor` | Check environment dependencies |
| `mcp` | Start CaseForge as an MCP server (stdio transport) |
| `pairwise` | Compute pairwise combinations for given parameters |
| `fake` | Generate fake data for a JSON schema |
| `completion` | Generate shell completion scripts |

`caseforge gen` flags:

```
--spec string           OpenAPI spec file or URL (required)
--output string         Output directory (default: ./cases)
--format string         hurl | markdown | csv | postman | k6 (default: hurl)
--no-ai                 Disable LLM; use pure algorithm mode
--technique string      Only run named techniques, comma-separated
                        e.g. equivalence_partitioning,boundary_value
--priority string       Filter output by minimum priority: P0|P1|P2|P3
--operations string     Comma-separated operationIds to process (default: all)
--concurrency int       Operations processed in parallel (default: 1)
--resume                Resume an interrupted run; skip completed operations
--tuple-level int       N-way coverage level for pairwise (2=pairwise, 3=3-way, default 2)
--seed int              Seed for deterministic generation (0 = random)
--max-cases-per-op int  Cap cases per operation by priority (0 = unlimited)
--include-path string   Regex to include operations by path (e.g. '^/users')
--exclude-path string   Regex to exclude operations by path (e.g. '^/admin')
--include-tag string    Comma-separated OpenAPI tags to include (e.g. 'users,auth')
--exclude-tag string    Comma-separated OpenAPI tags to exclude (e.g. 'deprecated')
--auth-bootstrap        Wrap all secured-endpoint cases with an auth setup step
--with-oracles          Mine response body constraints via LLM and inject as assertions (requires LLM)
--with-sandbox          Start a local sandbox server, run generated cases against it, exit non-zero on failure
--force                 Regenerate even when spec hash matches existing output
--annotation-batch N    Number of operations to annotate per LLM call (0 = one call per op; recommended: 8–20)
```

Smart regeneration: `gen` hashes the spec file on each run. If the hash matches the previously generated output, it exits early with a `✓ Spec unchanged` message. Use `--force` to bypass this.
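
The skip check above can be pictured as a compare-then-record step. This is a minimal sketch, assuming SHA-256 over the raw spec bytes and a hypothetical `.spec-hash` marker file in the output directory; CaseForge's actual hash algorithm and storage location are not documented in this README. Recording happens only after a successful run, so a failed generation never marks the spec as done.

```python
import hashlib
from pathlib import Path

# Hypothetical marker file; CaseForge's real storage location is not documented here.
HASH_FILE = ".spec-hash"


def spec_digest(spec_path: Path) -> str:
    """Hex SHA-256 digest of the spec file's raw bytes."""
    return hashlib.sha256(spec_path.read_bytes()).hexdigest()


def should_regenerate(spec_path: Path, out_dir: Path, force: bool = False) -> bool:
    """True unless a previous run recorded the same digest (mirrors the early exit)."""
    if force:
        return True
    marker = out_dir / HASH_FILE
    return not (marker.exists() and marker.read_text().strip() == spec_digest(spec_path))


def record_generation(spec_path: Path, out_dir: Path) -> None:
    """Store the digest after a successful run so the next run can skip."""
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / HASH_FILE).write_text(spec_digest(spec_path))
```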

Batch annotation: By default each operation is annotated in a separate LLM call, so on large specs (20+ operations) annotation can take several minutes. `--annotation-batch 10` groups 10 operations per call, cutting the number of annotation calls roughly tenfold (for example, 40 operations need 4 calls instead of 40).

Rate-limit backoff: `429` responses from LLM providers trigger automatic backoff with escalating delays (5 s → 15 s → 30 s → 60 s) before retrying.
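
A sketch of that retry loop, assuming the delay schedule listed above and treating HTTP `429` as the only retryable status. The `send` and `sleep` parameters and the `RateLimited` exception are hypothetical names, injected so the loop can be exercised without real network calls or waiting.

```python
from typing import Callable

BACKOFF_SECONDS = (5, 15, 30, 60)  # schedule quoted in the README


class RateLimited(Exception):
    """Raised when every retry in the schedule still returned 429."""


def call_with_backoff(send: Callable[[], int], sleep: Callable[[float], None]) -> int:
    """Call send() until it returns a non-429 status or the schedule is exhausted."""
    status = send()
    for delay in BACKOFF_SECONDS:
        if status != 429:
            return status
        sleep(delay)  # wait before the next attempt
        status = send()
    if status == 429:
        raise RateLimited("still rate-limited after all retries")
    return status
```

With this schedule a request is attempted at most five times (one initial call plus four retries).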

Hurl output headers: Every generated `.hurl` file includes `# case_id=` and `# case_name=` comment headers for traceability.
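
Because the headers are plain comments, downstream tooling can map a `.hurl` file back to its case without parsing Hurl syntax. A small reader, assuming each header sits on its own line in exactly the `# key=value` form shown above (the case ID used in the usage check is made up):

```python
def read_case_headers(hurl_text: str) -> dict[str, str]:
    """Collect the first `# case_id=` and `# case_name=` comment headers in a .hurl file."""
    headers: dict[str, str] = {}
    for line in hurl_text.splitlines():
        stripped = line.strip()
        for key in ("case_id", "case_name"):
            prefix = f"# {key}="
            if stripped.startswith(prefix) and key not in headers:
                headers[key] = stripped[len(prefix):].strip()
    return headers
```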

`caseforge run` flags:

```
--cases string   Directory containing generated test files (required)
--format string  hurl | k6 (default: hurl)
--target string  API base URL, e.g. http://localhost:8080
--var key=value  Variables injected into test files (repeatable)
--output string  Directory to write run-report.json
```

`caseforge lint` flags:

```
--spec string        OpenAPI spec file or URL (required)
--min-score int      Fail if spec score is below threshold (0 = disabled)
--format string      terminal | json (default: terminal)
--output string      Write lint-report.json to this directory
--skip-rules string  Comma-separated rule IDs to skip, e.g. L001,L003
```

`caseforge diff` flags:

```
--old string        Old spec file (required)
--new string        New spec file (required)
--cases string      Cases directory; reads index.json to infer affected cases
--format string     text | json (default: text)
--gen-cases string  Generate test cases for breaking operations into this directory
```

`caseforge score` flags:

```
--cases string   Directory containing index.json (default: ./cases)
--format string  terminal | json (default: terminal)
--fill-gaps      Auto-generate cases for operations missing 2xx or 4xx coverage
--spec string    OpenAPI spec path (required for --fill-gaps)
--min-score int  Exit non-zero if overall score is below threshold (0 = disabled)
--save-history   Append score to .caseforge-conformance.json for trend tracking
```
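
A sketch of the gap check behind `--fill-gaps`, assuming each case in `index.json` can be reduced to an operation identifier plus an expected status code. The field names `operation` and `expected_status` are hypothetical, not the real index schema.

```python
from collections import defaultdict


def find_coverage_gaps(cases: list[dict], operations: list[str]) -> dict[str, list[str]]:
    """Map each operation to the status classes ("2xx", "4xx") it has no case for."""
    seen: dict[str, set[str]] = defaultdict(set)
    for case in cases:
        status = int(case["expected_status"])
        seen[case["operation"]].add(f"{status // 100}xx")  # 201 -> "2xx", 404 -> "4xx"
    gaps = {}
    for op in operations:
        missing = [cls for cls in ("2xx", "4xx") if cls not in seen[op]]
        if missing:
            gaps[op] = missing
    return gaps
```

Each entry in the result is a candidate for auto-generation; operations with both classes covered are left alone.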

`caseforge conformance` flags:

```
--spec string    OpenAPI spec file (required)
--target string  API base URL to test against (required)
--output string  Directory to write conformance-report.json (optional)
```

`caseforge rbt` flags:

```
--spec string        OpenAPI spec file (required)
--cases string       Directory containing test case JSON files (default: ./cases)
--src string         Source code root directory (default: ./)
--base string        Base git ref for diff (default: HEAD~1)
--head string        Head git ref for diff (default: HEAD)
--generate           Generate test cases for high-risk uncovered operations
--no-ai              Algorithm-only mode for both route inference and generation
--gen-format string  Format for generated cases: hurl|postman|k6|markdown|csv
--output string      Directory to write rbt-report.json (default: ./reports)
--format string      terminal | json (default: terminal)
--fail-on string     Exit non-zero if risk >= level: none|low|medium|high (default: high)
--map string         Path to caseforge-map.yaml
--dry-run            Skip git diff; report all operations as risk=none
```
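
`--fail-on` implies an ordered risk scale. A sketch of the exit decision, assuming the levels rank `none < low < medium < high` and using exit code `1` for failure (the README does not say which non-zero code `rbt` returns):

```python
RISK_ORDER = {"none": 0, "low": 1, "medium": 2, "high": 3}


def rbt_exit_code(operation_risks: list[str], fail_on: str = "high") -> int:
    """Return 1 if any operation's risk is at or above fail_on, else 0."""
    threshold = RISK_ORDER[fail_on]
    worst = max((RISK_ORDER[r] for r in operation_risks), default=0)
    return 1 if worst >= threshold else 0
```

Read literally, `--fail-on none` fails every run, since every operation's risk is at least `none`.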

`caseforge rbt index` flags:

```
--spec string      OpenAPI spec file (required)
--src string       Source code root to analyse (default: ./)
--out string       Output map file (default: caseforge-map.yaml)
--strategy string  llm | embed | hybrid (default: llm)
--overwrite        Overwrite existing map file
--depth int        Call graph traversal depth (0 = dynamic)
--algo string      Go call graph algorithm: rta | vta (default: rta)
```

```
--output string  Output directory (default: ./cases)
--format string  hurl | markdown | csv | postman | k6 (default: hurl)
```

`caseforge suite create` flags:

```
--id string      Suite ID (required)
--title string   Suite title (required)
--kind string    sequential | chain (default: sequential)
--cases string   Comma-separated case IDs to include
--output string  Output file path (default: suite.json)
```

`caseforge suite validate` flags:

```
--suite string  Path to suite.json (required)
--cases string  Cases directory containing index.json
```

`caseforge chain` flags:

```
--spec string          OpenAPI spec file or URL (required)
--depth int            Maximum chain depth 1..4 (default: 2)
--output string        Output directory (default: ./chains)
--format string        hurl | markdown | csv | postman | k6 (default: hurl)
--data-pool string     JSON data pool file written by explore --export-pool
--seed-postman string  Postman Collection v2.1 JSON to seed the data pool
```

Chain cases follow OpenAPI Links to wire producer `$response.body` fields into consumer path/query parameters, and auto-append a `DELETE` teardown step for depth-2 chains where the consumer is not a `DELETE` operation.
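
A sketch of how a depth-2 chain could be assembled under a simplified model. It assumes the Link maps a consumer path parameter to a runtime expression such as `$response.body#/id` (a JSON Pointer into the producer's response), handles JSON objects only, and treats the resolved consumer path as the resource to delete. `Step`, `resolve_body_pointer`, and `build_chain` are hypothetical names, not CaseForge internals.

```python
from dataclasses import dataclass


@dataclass
class Step:
    method: str
    path: str


def resolve_body_pointer(body: dict, expression: str):
    """Resolve a `$response.body#/a/b` runtime expression against a JSON object body."""
    prefix = "$response.body#"
    if not expression.startswith(prefix):
        raise ValueError(f"unsupported expression: {expression}")
    value = body
    for token in expression[len(prefix):].split("/")[1:]:
        # JSON Pointer escapes (RFC 6901): ~1 -> "/", ~0 -> "~"
        value = value[token.replace("~1", "/").replace("~0", "~")]
    return value


def build_chain(producer: Step, consumer: Step, link_params: dict[str, str],
                producer_body: dict) -> list[Step]:
    """Wire link parameters into the consumer path; add a DELETE teardown if needed."""
    path = consumer.path
    for name, expression in link_params.items():
        path = path.replace("{" + name + "}", str(resolve_body_pointer(producer_body, expression)))
    steps = [producer, Step(consumer.method, path)]
    if consumer.method.upper() != "DELETE":
        # Assumption: the resolved consumer path also identifies the created resource.
        steps.append(Step("DELETE", path))
    return steps
```

In a real run the producer body is only known after the first request executes; the sketch takes it as an argument to keep the wiring logic visible.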

`caseforge explore` flags:

```
--spec string           OpenAPI spec file
--target string         Target API base URL (required without --dry-run)
--max-probes int        Maximum HTTP probes per run (default: 50)
--output string         Directory to write dea-report.json (default: ./reports)
--dry-run               Seed hypotheses only; do not execute probes
--export-pool string    Write observed 2xx response field values to a JSON data pool file
--prioritize-uncovered  Two-pass scheduling: breadth-scan all ops in pass 1, then
                        focus remaining budget on operations that did not return 2xx
--max-failures int      Stop after discovering this many rules (0 = unlimited)
--include-path string   Regex to include operations by path (e.g. '^/users')
--exclude-path string   Regex to exclude operations by path (e.g. '^/admin')
--include-tag string    Comma-separated OpenAPI tags to include (e.g. 'users,auth')
--exclude-tag string    Comma-separated OpenAPI tags to exclude (e.g. 'deprecated')
```

The data pool written by `--export-pool` can be loaded into `caseforge chain` via `--data-pool` to seed realistic field values into generated chain cases.
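
The two-pass budget split of `--prioritize-uncovered` can be sketched as follows, assuming one probe per operation in the breadth pass and round-robin use of the leftover `--max-probes` budget on operations that have not yet returned 2xx. `probe` is a hypothetical callable returning an HTTP status code.

```python
from typing import Callable


def two_pass_schedule(operations: list[str], probe: Callable[[str], int],
                      max_probes: int) -> list[str]:
    """Return the operations probed, in order, within a max_probes budget."""
    log: list[str] = []
    uncovered: list[str] = []
    # Pass 1: breadth scan, one probe per operation while budget remains.
    for op in operations:
        if len(log) >= max_probes:
            return log
        log.append(op)
        if not 200 <= probe(op) < 300:
            uncovered.append(op)
    # Pass 2: spend what is left on operations that did not return 2xx.
    while uncovered and len(log) < max_probes:
        for op in list(uncovered):
            if len(log) >= max_probes:
                break
            log.append(op)
            if 200 <= probe(op) < 300:
                uncovered.remove(op)
    return log
```

The breadth pass guarantees every operation gets at least one probe before any operation gets a second, which is the point of scheduling in two passes.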

`caseforge export` flags:

```
--cases string   Directory containing index.json (required)
--format string  allure | xray | testrail (required)
--output string  Output directory (default: ./export)
```

`caseforge ci init` flags:

```
--platform string  github-actions | gitlab-ci | jenkins | shell (default: github-actions)
--spec string      OpenAPI spec path used in the generated workflow (default: openapi.yaml)
--output string    Output file path (default: platform-specific)
--force            Overwrite existing file without prompting
```

`caseforge dedupe` flags:

```
--cases string     Directory of test case JSON files (default: ./cases)
--threshold float  Jaccard similarity threshold 0.0–1.0 (default: 0.9)
--merge            Auto-delete lower-scoring duplicates
--dry-run          Report what would be deleted without deleting
--format string    terminal | json (default: terminal)
```
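
A sketch of the duplicate test, assuming each case is reduced to a set of structural tokens (here method, path, and top-level body field names; CaseForge's actual feature set is not documented in this README). Two cases count as duplicates when their Jaccard similarity |A ∩ B| / |A ∪ B| reaches `--threshold`, and, as with `--merge`, the higher-scoring case is kept. The `id`, `score`, `method`, `path`, and `body` keys are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def case_tokens(case: dict) -> set[str]:
    """Hypothetical structural fingerprint: method, path, and top-level body keys."""
    tokens = {f"method:{case['method']}", f"path:{case['path']}"}
    tokens |= {f"field:{k}" for k in case.get("body", {})}
    return tokens


def find_duplicates(cases: list[dict], threshold: float = 0.9) -> list[str]:
    """Return IDs of lower-scoring cases that duplicate a higher-scoring kept case."""
    ranked = sorted(cases, key=lambda c: c["score"], reverse=True)
    kept: list[dict] = []
    dropped: list[str] = []
    for case in ranked:
        if any(jaccard(case_tokens(case), case_tokens(k)) >= threshold for k in kept):
            dropped.append(case["id"])
        else:
            kept.append(case)
    return dropped
```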
Run HTTP boundary mutations through a reverse proxy between hurl and your API. For each operator × test case combination, the proxy alters the response before hurl evaluates assertions. Cases where hurl still passes are survivors — mutations your assertions failed to catch.
Requires hurl on PATH and test cases previously generated with caseforge gen.

`caseforge mutate` flags:

```
--cases string     Directory containing index.json and .hurl files (required)
--target string    API base URL, e.g. http://localhost:8080 (required)
--output string    Directory to write mutation-report.json (optional)
--operator string  Comma-separated operator names to run (default: all 12)
--concurrency int  Cases processed concurrently per operator (default: 4)
--spec string      OpenAPI spec file (passed to LLM for context with --feedback)
--feedback         Run LLM analysis on survivors and suggest stronger assertions
--auto-fix         Patch index.json with suggested assertions (requires --feedback)
--yes              Skip confirmation prompt for --auto-fix
```

12 operators: `field_drop`, `field_type_swap`, `array_to_null`, `null_to_array`, `status_swap_2xx`, `error_inflation`, `pagination_off_by_one`, `empty_result_injection`, `content_type_swap`, `header_drop`, `date_format_swap`, `numeric_precision_loss`

Exit codes: `0` means no survivors; `6` means one or more mutations survived.

Each run is persisted to `.caseforge/mutation/runs/<timestamp>.json`.
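
To make "survivor" concrete, here is a sketch of a single operator. It assumes `field_drop` removes one top-level key from a JSON response body; the real operator's field-selection strategy is not described here, and `survives` is a hypothetical helper. An assertion set that only checks the status code lets this mutant survive, while one that also checks the dropped field kills it.

```python
import json


def field_drop(body: bytes, field: str) -> bytes:
    """Return the JSON body with one top-level field removed (hypothetical operator)."""
    doc = json.loads(body)
    doc.pop(field, None)
    return json.dumps(doc).encode()


def survives(assertions, status: int, body: bytes) -> bool:
    """A mutant survives when every assertion still passes on the mutated response."""
    doc = json.loads(body)
    return all(check(status, doc) for check in assertions)
```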

```bash
# Run all 12 operators, write JSON report
caseforge mutate --cases ./cases --target http://localhost:8080 --output ./reports

# Run a specific operator only
caseforge mutate --cases ./cases --target http://localhost:8080 --operator field_drop

# Phase 2: LLM feedback + auto-fix (requires provider in .caseforge.yaml)
caseforge mutate --cases ./cases --target http://localhost:8080 \
  --feedback --auto-fix --yes
```

`caseforge sandbox` starts a local HTTP mock server that serves realistic responses generated from an OpenAPI spec. Stop it with Ctrl-C (SIGINT/SIGTERM triggers a graceful shutdown).

- Use `sandbox` for interactive development and debugging: explore with curl or Postman, inspect logs, iterate on your spec.
- Use `gen --with-sandbox` for CI one-shot validation: it generates cases and runs them against the sandbox automatically.

`caseforge sandbox` flags:

```
--spec string       OpenAPI spec file (required)
--port int          Listen port; 0 = random (default: 0)
--host string       Listen address (default: 127.0.0.1)
--log-level string  info | warn | error | silent (default: info)
--log-file string   Append JSON structured logs to file (optional)
--format string     Response generation strategy: auto | schema | faker (default: auto)
```

`--format auto` tries strategies in order: first it uses spec examples (`x-examples` / `components/examples`), then derives zero-values from the JSON Schema, then falls back to faker-generated values.
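
The `auto` order reads naturally as a fallback chain. A sketch under simplified assumptions: examples are looked up only under a schema-level `example` key (a stand-in for the `x-examples` / `components/examples` lookup), zero-values cover only basic JSON Schema types, and `faker` is a hypothetical callable standing in for the faker step.

```python
from typing import Any, Callable, Optional

ZERO_VALUES = {"string": "", "integer": 0, "number": 0.0, "boolean": False}


def zero_value(schema: dict) -> Optional[Any]:
    """Derive a zero-value from a JSON Schema; None when the type is unknown."""
    kind = schema.get("type")
    if kind == "object":
        return {name: zero_value(sub) for name, sub in schema.get("properties", {}).items()}
    if kind == "array":
        return []
    return ZERO_VALUES.get(kind)


def auto_response(schema: dict, faker: Callable[[dict], Any]) -> Any:
    """Try spec examples, then schema zero-values, then the faker fallback."""
    if "example" in schema:
        return schema["example"]
    derived = zero_value(schema)
    if derived is not None:  # "" / 0 / False are valid zero-values, so test for None
        return derived
    return faker(schema)
```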

On startup it prints: `caseforge sandbox listening on http://127.0.0.1:<port>`

Stateful CRUD: `POST /resource` generates a response body, stores it, and returns the resource ID in both the body and the `X-Sandbox-ID` response header. A subsequent `GET /resource/{id}` returns the same stored object (`200`) or `404` if absent; `DELETE /resource/{id}` removes it and returns `204`; `GET /resource` (no ID) returns all stored objects as a JSON array.
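
A minimal in-memory model of those CRUD semantics, returning `(status, body, headers)` tuples instead of real HTTP responses. Several details are assumptions not stated above: sequential string IDs, `201` on create, and `404` for deleting a missing resource. `SandboxStore` is a hypothetical name.

```python
import itertools
from typing import Optional


class SandboxStore:
    """Hypothetical model of the sandbox's stateful CRUD behaviour."""

    def __init__(self) -> None:
        self._items: dict[str, dict] = {}
        self._ids = itertools.count(1)

    def post(self, generated_body: dict):
        resource_id = str(next(self._ids))
        stored = {**generated_body, "id": resource_id}
        self._items[resource_id] = stored
        return 201, stored, {"X-Sandbox-ID": resource_id}

    def get(self, resource_id: Optional[str] = None):
        if resource_id is None:  # collection GET returns every stored object
            return 200, list(self._items.values()), {}
        if resource_id not in self._items:
            return 404, None, {}
        return 200, self._items[resource_id], {}

    def delete(self, resource_id: str):
        if self._items.pop(resource_id, None) is None:
            return 404, None, {}
        return 204, None, {}
```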

CI one-shot tip: The sandbox always returns success responses, so test cases that assert 4xx status codes will fail. Pair `--with-sandbox` with `--technique equivalence_partitioning` and a spec that defines only success responses to generate only happy-path cases:

```bash
caseforge gen --spec api.yaml --no-ai \
  --technique equivalence_partitioning \
  --format hurl --output ./cases --with-sandbox
```

`caseforge watch` flags:

```
--spec string    OpenAPI spec file to watch (required, local file only)
--output string  Output directory (default: ./cases)
--format string  hurl | k6 | postman | markdown | csv
--no-ai          Disable LLM
```

```
--cases string   Directory containing index.json (default: ./cases)
--format string  terminal | json (default: terminal)
```

CaseForge uses a two-level config lookup:

- Global: `~/.caseforge.yaml` (created by `caseforge onboard`, applies to all projects)
- Project: `./.caseforge.yaml` in the current directory (created by `caseforge init`, takes priority)

Use `--config <path>` to point to an explicit file instead.
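
A sketch of that lookup order, assuming an explicit `--config` file replaces the lookup entirely and that the project file overrides the global one key by key within each top-level section. Whether CaseForge merges the two files or lets the project file replace the global one wholesale is not stated here. Parsed configs are passed in as dicts so the sketch needs no YAML library.

```python
from typing import Optional


def effective_config(global_cfg: Optional[dict], project_cfg: Optional[dict],
                     explicit_cfg: Optional[dict] = None) -> dict:
    """Resolve settings: --config wins outright, else project overrides global."""
    if explicit_cfg is not None:
        return dict(explicit_cfg)
    merged: dict = {}
    for layer in (global_cfg or {}, project_cfg or {}):  # later layers take priority
        for section, values in layer.items():
            if isinstance(values, dict) and isinstance(merged.get(section), dict):
                merged[section] = {**merged[section], **values}
            else:
                merged[section] = values
    return merged
```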

Example config:

```yaml
ai:
  provider: anthropic          # anthropic | openai | openai-compat | gemini | noop
  model: claude-sonnet-4-6     # model name for the chosen provider
  # api_key: ...               # or set via env var (see below)
  # base_url: ...              # openai-compat only (e.g. https://api.deepseek.com/v1)

output:
  default_format: hurl         # hurl | markdown | csv | postman | k6
  dir: ./cases

lint:
  fail_on: error               # error | warning | info

# Webhook notifications (optional)
webhooks:
  - url: https://hooks.example.com/caseforge
    events: [on_generate, on_run_complete]
    secret: your-hmac-secret   # signs requests with X-CaseForge-Signature-256
    timeout_seconds: 10
    max_retries: 3
```

| Provider | `ai.provider` | Env var |
|---|---|---|
| Anthropic (default) | `anthropic` | `ANTHROPIC_API_KEY` |
| OpenAI | `openai` | `OPENAI_API_KEY` |
| DeepSeek / Qwen / Azure | `openai-compat` | `OPENAI_API_KEY` + `ai.base_url` |
| Google Gemini | `gemini` | `GEMINI_API_KEY` or `GOOGLE_API_KEY` |
| No AI | `noop` | — |

| Event | Fires when |
|---|---|
| `on_generate` | Each operation completes generation (includes method, path, case count) |
| `on_run_complete` | The full `gen` run finishes (includes total cases, output directory) |

Requests are signed with HMAC-SHA256 when `secret` is set. Receivers should verify the signature carried in this header:

```
X-CaseForge-Signature-256: sha256=<hex>
```
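
A receiver-side sketch, assuming the HMAC-SHA256 is computed over the raw request body with `secret` as the key and hex-encoded, as in the widely used GitHub-style scheme; the README does not spell out exactly which bytes are signed. The comparison uses `hmac.compare_digest` to avoid timing leaks.

```python
import hashlib
import hmac


def sign(secret: str, body: bytes) -> str:
    """Header value a sender would attach: sha256=<hex digest of body>."""
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def verify(secret: str, body: bytes, header_value: str) -> bool:
    """Constant-time comparison of the received header against the expected one."""
    return hmac.compare_digest(sign(secret, body), header_value)
```

Verify against the raw bytes as received, before any JSON parsing or re-serialization, since reformatting the body changes the digest.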
| Technique | Flag value |
|---|---|
| Equivalence Partitioning | equivalence_partitioning |
| Boundary Value Analysis | boundary_value |
| Decision Table | decision_table |
| State Transition | state_transition |
| Pairwise (IPOG) | pairwise |
| Idempotency | idempotency |
| OWASP API Top 10 (spec-based) | owasp_api_top10 |
| OWASP API Top 10 (LLM-annotated) | owasp_api_top10_spec |
| Classification Tree (MBT) | classification_tree |
| Orthogonal Array | orthogonal_array |
| Example Extraction | example_extraction |
| Positive Examples | positive_examples |
| Isolated Negative | isolated_negative |
| Required Field Omission | required_omission |
| Field Boundary | field_boundary |
| Schema Violation | schema_violation |
| Constraint Mutation | constraint_mutation |
| Variable Irrelevance | variable_irrelevance |
| Mutation | mutation |
| Type Coercion | type_coercion |
| Unicode Fuzzing | unicode_fuzzing |
| Mass Assignment | mass_assignment |
| IDOR | idor |
| Semantic Annotation (nullable/readOnly/writeOnly) | semantic_annotation |
| Auth Chain | auth_chain |
| CRUD Chain | chain_crud |
| Chain Sequence (Jaccard similarity) | chain_sequence |
| Business Rule Violation | business_rule_violation |
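
For intuition about what `pairwise` coverage (`--tuple-level 2`) guarantees: every value of every parameter appears alongside every value of every other parameter in at least one case. The sketch below uses a simple greedy search over the full Cartesian product. It is not the IPOG algorithm named in the table, and it is only practical for small parameter sets.

```python
from itertools import combinations, product


def required_pairs(params: dict[str, list]) -> set:
    """All ((name_a, value_a), (name_b, value_b)) pairs a pairwise suite must cover."""
    pairs = set()
    for (a, va), (b, vb) in combinations(params.items(), 2):
        pairs |= {((a, x), (b, y)) for x in va for y in vb}
    return pairs


def pairs_of(row: dict) -> set:
    """Pairs covered by one concrete combination (keys in parameter order)."""
    return {((a, row[a]), (b, row[b])) for a, b in combinations(row, 2)}


def greedy_pairwise(params: dict[str, list]) -> list[dict]:
    """Repeatedly pick the full combination that covers the most still-uncovered pairs."""
    names = list(params)
    remaining = required_pairs(params)
    candidates = [dict(zip(names, values)) for values in product(*params.values())]
    suite = []
    while remaining:
        best = max(candidates, key=lambda row: len(pairs_of(row) & remaining))
        suite.append(best)
        remaining -= pairs_of(best)
    return suite
```

For three two-valued parameters the exhaustive product has 8 rows, while the greedy suite needs fewer because each row covers up to three pairs at once.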
- Go 1.26+ (build from source)
- hurl — required for `caseforge run --format hurl`
- k6 — required for `caseforge run --format k6`
Run caseforge doctor to verify your environment.
See CONTRIBUTING.md.
caseforge's design was informed by the following projects and academic research in the API testing space:
Open-source projects (concept-level references; no source code derived):
- Schemathesis — property-based testing patterns
- CATS — fuzzing techniques
- EvoMaster — coverage metric definition
- Tcases — isolated negative testing and N-way coverage principles
- RESTler — dependency-graph and N-step chain testing
- Portman — semantic annotation and field-boundary patterns
- Microcks — HAR-based traffic import and conformance gating
Academic research (concept-level references):
- RBCTest — Observation-Confirmation prompting pattern for response-body oracle mining
- AutoRestTest — failure classification and coverage-gap filling
- RESTifAI — LLM-driven business-rule violation and chain-sequence inference
Standards:
- OWASP API Security Top 10 — security category structure
None of these projects' source code is embedded in caseforge. See NOTICE for full attribution and the explicit "no source derived" declaration.
Apache License 2.0 — see LICENSE.
