ToolGate is an MCP gateway that enforces policy on every tool call an AI agent makes — logging decisions, requiring human approval for sensitive operations, and surfacing clean errors when upstream services fail.
Prerequisites:

- Docker + Docker Compose
- Go 1.22+
- ANTHROPIC_API_KEY set in your environment (or in .env)
The demo UI lets you run three fault-injection scenarios against a live stack and watch the audit trail update in real time.
The compose stack mounts a pre-built binary instead of compiling inside Docker:
make build-compose-bins
source .env # loads ANTHROPIC_API_KEY and optional overrides
docker compose up -d --wait

Services started:
| Service | Host port | Purpose |
|---|---|---|
| gateway | 18080 | ToolGate MCP gateway |
| localstripe | 18420 | Fake Stripe API |
| localstripe-mcp | 18421 | MCP server wrapping localstripe |
| eval-trigger | 18086 | Python agent that the eval runner drives |
| mock-lark | 18090 | Fake Lark (auto-approves for local dev) |
| postgres | 15432 | Audit log store |
POSTGRES_DSN="postgres://gateway:gateway@127.0.0.1:15432/gateway?sslmode=disable" \
AGENT_URL="http://127.0.0.1:18086" \
go run ./cmd/eval-runner --serve evalsuite/resilience.yaml

Open http://localhost:8099 in your browser.
Each scenario requires a specific stack state. The Stack Health panel in the UI shows the current state of each service — use Refresh Health before running.
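The required states listed for the three scenarios can also be checked programmatically before a run. The following is a minimal sketch, not part of ToolGate: the `REQUIRED_STATE` table is transcribed from the scenario descriptions in this guide, the scenario keys reuse the UI button names, and the `observed` dictionary is a hypothetical input you would fill from the Stack Health panel or your own health checks.

```python
# Required states transcribed from the scenario descriptions in this guide.
# "any" means the scenario does not depend on that service's state.
REQUIRED_STATE = {
    "MCP Crash": {"gateway": "up", "mcp": "down", "lark": "any", "postgres": "up"},
    "Retry Storm": {"gateway": "up", "mcp": "down"},
    "Approval Timeout": {"gateway": "up", "mcp": "up", "lark": "down", "postgres": "up"},
}


def precondition_mismatches(scenario: str, observed: dict) -> list:
    """Return human-readable mismatches between observed and required state."""
    problems = []
    for service, wanted in REQUIRED_STATE[scenario].items():
        actual = observed.get(service, "unknown")
        if wanted != "any" and actual != wanted:
            problems.append(f"{service}: expected {wanted}, found {actual}")
    return problems
```

An empty list means the stack is in the state the scenario expects; anything else names the service to start or stop first.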
Scenario 1: MCP Crash

What it tests: Gateway surfaces a clean upstream_error when the upstream MCP server is unavailable.
Required state: Gateway up, MCP down, Lark any, Postgres up.
# Warm the gateway capability cache while MCP is healthy
SESSION=$(curl -s -D - -X POST http://localhost:18080/mcp \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":0,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"warmup","version":"1.0"}}}' \
| grep -i "^Mcp-Session-Id:" | awk '{print $2}' | tr -d '\r\n')
curl -s -X POST http://localhost:18080/mcp \
-H "Content-Type: application/json" \
-H "Mcp-Session-Id: $SESSION" \
-d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' > /dev/null
# Inject the fault
docker compose stop localstripe-mcp

Click MCP Crash → Run Scenario.
Expected result: list_recent_charges → allow → upstream_error — the gateway served the tool list from its capability cache and recorded the upstream failure.
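For scripted setups, the warm-up step in the setup above (initialize, then tools/list on the returned session) can also be issued from Python. This is a minimal sketch using only the standard library. It mirrors the payloads and headers of the curl commands; the function names are illustrative and are not a ToolGate client API.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:18080/mcp"  # gateway host port from the services table

INITIALIZE = {
    "jsonrpc": "2.0",
    "id": 0,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": "warmup", "version": "1.0"},
    },
}
TOOLS_LIST = {"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}}


def build_request(payload, session_id=None, url=GATEWAY_URL):
    """Build the POST request that the matching curl command would send."""
    headers = {"Content-Type": "application/json"}
    if session_id:
        headers["Mcp-Session-Id"] = session_id
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def warm_gateway(url=GATEWAY_URL, timeout=10):
    """Send initialize, then tools/list on the returned session. Returns the session ID."""
    with urllib.request.urlopen(build_request(INITIALIZE, url=url), timeout=timeout) as resp:
        session_id = resp.headers.get("Mcp-Session-Id")
    with urllib.request.urlopen(build_request(TOOLS_LIST, session_id, url), timeout=timeout) as resp:
        resp.read()  # body is discarded, like `> /dev/null` in the curl version
    return session_id
```

Call `warm_gateway()` while localstripe-mcp is still healthy, then stop the MCP server as described in the setup.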
Scenario 2: Retry Storm

What it tests: Budget limiter stops an agent from hammering a downed service.
Required state: Gateway up, MCP down (carry over from Scenario 1).
No additional setup needed. Click Retry Storm → Run Scenario.
Expected result: Five allow decisions followed by budgetExceeded.
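The expected decision sequence can be illustrated with a simple counter-based budget. This is a sketch under stated assumptions, not the gateway's actual limiter: the limit of 5 is inferred from the expected result above, and keying the budget by session ID is a guess (the real limiter may count per tool, per agent, or per time window).

```python
from dataclasses import dataclass, field


@dataclass
class CallBudget:
    """Hypothetical per-session call budget; not ToolGate's implementation."""

    limit: int = 5  # assumed from "five allow decisions" in the expected result
    used: dict = field(default_factory=dict)

    def decide(self, session_id: str) -> str:
        count = self.used.get(session_id, 0)
        if count >= self.limit:
            return "budgetExceeded"  # budget spent: refuse without calling upstream
        self.used[session_id] = count + 1
        return "allow"


def simulate_retry_storm(attempts: int = 6) -> list:
    """Return the decisions an agent would see when retrying `attempts` times."""
    budget = CallBudget()
    return [budget.decide("demo-session") for _ in range(attempts)]
```

With the default of six attempts, `simulate_retry_storm()` yields five `allow` decisions followed by one `budgetExceeded`, matching the shape of the audit trail described above.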
Scenario 3: Approval Timeout

What it tests: An approvalRequired decision expires gracefully when Lark is unreachable.
Required state: Gateway up, MCP up, Lark down, Postgres up.
# Restore MCP
docker compose start localstripe-mcp
# Wait for it to become healthy, then seed demo charges for alice@example.com
until docker inspect toolgate-localstripe-mcp-1 \
--format '{{.State.Health.Status}}' 2>/dev/null | grep -q healthy; do sleep 2; done
docker exec toolgate-eval-trigger-1 python3 -c "
import asyncio, sys
sys.path.insert(0, '/app')
from demo_webapp.stripe_client import StripeClient
from demo_webapp.seed import seed_demo_customer
async def main():
    client = StripeClient('http://localstripe:8420', 'sk_test_12345')
    cust = await client.find_customer_by_email('alice@example.com')
    if cust is None:
        cust = await client.create_customer('alice@example.com', 'Alice')
    await seed_demo_customer(client, cust['id'])
    await client.aclose()
asyncio.run(main())
"
# Re-warm gateway after MCP restart
SESSION=$(curl -s -D - -X POST http://localhost:18080/mcp \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":0,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"warmup","version":"1.0"}}}' \
| grep -i "^Mcp-Session-Id:" | awk '{print $2}' | tr -d '\r\n')
curl -s -X POST http://localhost:18080/mcp \
-H "Content-Type: application/json" \
-H "Mcp-Session-Id: $SESSION" \
-d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}' > /dev/null
# Stop Lark
docker compose stop mock-lark

Click Approval Timeout → Run Scenario. The case waits ~15 s for the approval TTL to expire.
Expected result: list_recent_charges → allow, create_refund → approvalRequired → expired.
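The approvalRequired → expired transition can be modelled as a hold with a time-to-live. The sketch below is illustrative only: the class and field names are hypothetical, the 15-second TTL is taken from the wait mentioned above, and the clock is passed in explicitly so the behaviour is easy to test. How the gateway actually stores and expires holds is not described in this guide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApprovalHold:
    """Hypothetical approval hold with a TTL; not ToolGate's implementation."""

    tool: str
    created_at: float
    ttl_seconds: float = 15.0  # matches the ~15 s wait in this scenario
    response: Optional[str] = None  # "approved" or "denied" once a human answers

    def status(self, now: float) -> str:
        if self.response is not None:
            return self.response
        if now - self.created_at >= self.ttl_seconds:
            return "expired"  # nobody answered in time (e.g. Lark unreachable)
        return "approvalRequired"

    def respond(self, answer: str, now: float) -> None:
        # Ignore answers that arrive after the hold has already expired.
        if self.status(now) == "approvalRequired":
            self.response = answer
```

With Lark stopped, no answer ever arrives, so a hold created for create_refund reports `approvalRequired` until the TTL elapses and `expired` afterwards.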
To run all three scenarios headlessly in one shot:
make demo-resilience

This script manages the full Docker lifecycle, runs each scenario in sequence, and tears down the stack on exit.
By default the stack uses mock-lark (port 18090), which auto-approves every request after 50 ms. To wire up a real Lark workspace so a human receives an interactive card and clicks Approve/Deny:
- A Lark developer account and an app created at open.larksuite.com
- ngrok (or any tunnel) to expose your local gateway to Lark's servers
- Go to Lark Open Platform → Create App → Custom App.
- Under Credentials & Basic Info, note your App ID and App Secret.
- Under Features → Bot, enable the Bot feature.
- Under Messaging API → Events, subscribe to im.message.receive_v1 so the bot can join groups.
- Under Permissions, grant: im:message, im:message:send_as_bot.
Add the bot to a group chat (or use your personal chat), then note the Chat ID (oc_…) from the group info or API.
- Start an ngrok tunnel pointing at the gateway's action endpoint:
ngrok http 18080
- Copy the HTTPS forwarding URL (e.g. https://abc123.ngrok-free.app).
- In your Lark app settings, go to Features → Bot → Card Request URL and set it to:
https://abc123.ngrok-free.app/lark/actions
- Save and publish the app version.
Create a .env file in the project root (it is gitignored):
ANTHROPIC_API_KEY=sk-ant-…
LARK_APP_ID=cli_xxxxxxxxxxxx
LARK_APP_SECRET=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
LARK_CHAT_ID=oc_xxxxxxxxxxxxxxxxxxxxxxxxxxxx
LARK_VERIFICATION_TOKEN=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Unset LARK_API_BASE_URL (or leave it absent) so the gateway sends cards to the real Lark API instead of mock-lark.
source .env
docker compose up -d --wait

The gateway reads the four LARK_* variables from the environment. When create_refund is triggered, a Lark card arrives in the configured chat. Click Approve or Deny to resolve the approval hold.
The gateway caches the last successful initialize and tools/list responses from the upstream MCP server. When the upstream is unavailable, it serves tool metadata from this cache so agents can still discover tools — requests then fail with upstream_error at the call site rather than at tool-list time.
Important: the cache is populated the first time a successful tools/list reaches the gateway. Always warm it (see Scenario 1 setup above) before stopping the MCP server.
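The caching behaviour described above can be sketched in a few lines. This is a simplified model, not the gateway's code: `FakeUpstream`, `CachingGateway`, and `UpstreamUnavailable` are hypothetical names, only the tools/list response is cached here, and raising on a cold cache is an assumption (this guide only says the cache must be warmed first).

```python
class UpstreamUnavailable(Exception):
    """Raised by the (hypothetical) upstream client when the MCP server is down."""


class FakeUpstream:
    """Stand-in for the upstream MCP server, used only to exercise the sketch."""

    def __init__(self):
        self.healthy = True

    def list_tools(self):
        if not self.healthy:
            raise UpstreamUnavailable()
        return [{"name": "list_recent_charges"}, {"name": "create_refund"}]

    def call_tool(self, name, arguments):
        if not self.healthy:
            raise UpstreamUnavailable()
        return {"tool": name, "ok": True}


class CachingGateway:
    def __init__(self, upstream):
        self.upstream = upstream
        self.cached_tools = None  # filled by the first successful tools/list

    def list_tools(self):
        try:
            self.cached_tools = self.upstream.list_tools()
        except UpstreamUnavailable:
            if self.cached_tools is None:
                raise  # cold cache: nothing to serve (assumed behaviour)
        return self.cached_tools

    def call_tool(self, name, arguments):
        try:
            return {"result": self.upstream.call_tool(name, arguments)}
        except UpstreamUnavailable:
            # Discovery still works from cache; the failure surfaces at call time.
            return {"error": "upstream_error"}
```

In this model, a gateway that served tools/list once while the upstream was healthy keeps answering tool discovery after the upstream stops, while each tool call returns `upstream_error`. A gateway that was never warmed has nothing to serve.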
docker compose down -v # stops all services and removes volumes