A read-only cert-manager troubleshooting MCP server that diagnoses why your Kubernetes TLS certificates are not Ready.
When a cert-manager Certificate fails, you waste time running kubectl describe on four different resources. CertSheriff does it for you: it walks the cert-manager resource chain, collects events, and returns a structured JSON diagnosis that any AI agent can explain in plain English.
You: "Why isn't my-cert ready?"
CertSheriff: diagnose_certificate("default", "my-cert")
→ { "summary": { "most_likely_stage": "Issuer",
"primary_reason": "IssuerNotFound", ... } }
Agent: "The Certificate references issuer 'missing-issuer' which doesn't exist.
Create the Issuer or fix the issuerRef."
┌──────────┐     ┌──────────────┐     ┌─────────────────────┐     ┌────────────────┐
│   You    │────▶│    kagent    │────▶│   CertSheriff MCP   │────▶│   Kubernetes   │
│  (chat)  │◀────│    Agent     │◀────│       Server        │◀────│   API Server   │
└──────────┘     └──────────────┘     └─────────────────────┘     └────────────────┘
                                                                          │
                                                                          │ reads (never writes)
                                                                          ▼
                                                                  cert-manager CRDs
                                                                  ├── Certificate
                                                                  ├── CertificateRequest
                                                                  ├── Order (ACME)
                                                                  ├── Challenge (ACME)
                                                                  ├── Issuer
                                                                  └── ClusterIssuer
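In cert-manager, the objects in this chain are typically tied together by Kubernetes ownerReferences. The sketch below walks the chain over plain dicts (no cluster needed), assuming the usual ownership links: a CertificateRequest owned by its Certificate, an Order by its CertificateRequest, and a Challenge by its Order. `walk_chain` and `_owned_by` are hypothetical names for illustration, not CertSheriff's actual code.

```python
from typing import Any

Resource = dict[str, Any]


def _owned_by(child: Resource, parent: Resource) -> bool:
    """True if one of the child's ownerReferences points at the parent (same kind and uid)."""
    uid = parent["metadata"].get("uid")
    return any(
        ref.get("kind") == parent["kind"] and ref.get("uid") == uid
        for ref in child["metadata"].get("ownerReferences", [])
    )


def walk_chain(certificate: Resource, requests: list[Resource],
               orders: list[Resource], challenges: list[Resource]) -> dict[str, list[Resource]]:
    """Hypothetical helper: link Certificate -> CertificateRequest -> Order -> Challenge."""
    crs = [cr for cr in requests if _owned_by(cr, certificate)]
    ords = [o for o in orders if any(_owned_by(o, cr) for cr in crs)]
    chs = [ch for ch in challenges if any(_owned_by(ch, o) for o in ords)]
    return {"certificateRequests": crs, "orders": ords, "challenges": chs}
```

Matching on uid rather than name avoids linking a child to a different object that happens to reuse the same name.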
| Tool | Input | What it does |
|---|---|---|
| diagnose_certificate | namespace, certificate_name | Fetches the full resource chain (Certificate → CertificateRequest → Order → Challenge), the referenced Issuer/ClusterIssuer, and events. Returns a JSON diagnosis with a computed summary block. |
| list_unready_certificates | namespace | Lists all Certificates where the Ready condition is missing or not True, with reason, message, and issuerRef. |
| list_expiring_certificates | namespace, within_days | Lists Certificates expiring within N days, sorted by urgency (soonest first). Returns notAfter, days remaining, Ready status, dnsNames, and secretName. |
| inspect_certificate_secret | namespace, secret_name | Decodes the X.509 certificate chain from a TLS Secret. Returns subject, SANs, issuer, expiry, key algorithm, chain length. Never exposes tls.key. |
| check_issuer_health | namespace (optional) | Checks Ready status of all Issuers and ClusterIssuers. Returns issuer type, Ready condition, and errors. Unhealthy issuers listed first. |
- Python 3.11+
- A Kubernetes cluster with cert-manager installed (install guide)
- uv package manager (curl -LsSf https://astral.sh/uv/install.sh | sh)
cd certsheriff-mcp
uv sync
uv run python -m certsheriff_mcp.server
Then open MCP Inspector and connect to the stdio server.
cd certsheriff-mcp
MCP_TRANSPORT_MODE=http uv run python -m certsheriff_mcp.server
The server listens on http://0.0.0.0:8000/mcp.
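In HTTP mode, an MCP client invokes a tool by sending a JSON-RPC 2.0 tools/call request to the /mcp endpoint. The sketch below only builds that request body with the standard library; it assumes the MCP tools/call message shape and leaves out the initialize handshake and the session and Accept headers that a real Streamable HTTP client handles. tools_call is a hypothetical helper name.

```python
import json
from itertools import count

# MCP requires request ids to be unique within a session; a counter is enough here.
_request_ids = count(1)


def tools_call(name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request for the MCP tools/call method."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Argument names follow the Input column of the tools table.
body = tools_call("diagnose_certificate",
                  {"namespace": "default", "certificate_name": "demo-tls"})
print(body)
```

A real client POSTs a body like this to http://localhost:8000/mcp only after the initialize exchange; MCP Inspector, kagent, and the MCP SDK clients do that for you.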
cd certsheriff-mcp
uv sync --dev
uv run pytest -v
kubectl apply -f examples/02-demo-failing-cert.yaml
This creates a Certificate demo-tls that references a non-existent Issuer missing-issuer.
Call the diagnose_certificate tool (via MCP Inspector, kagent, or any MCP client):
diagnose_certificate(namespace="default", certificate_name="demo-tls")
You'll see:
{
"summary": {
"ready": false,
"primary_reason": "IssuerNotFound",
"primary_message": "Referenced \"Issuer\" not found: issuer.cert-manager.io \"missing-issuer\" not found",
"most_likely_stage": "Issuer",
"next_commands": [
"kubectl describe certificate demo-tls -n default",
"kubectl describe issuer missing-issuer -n default"
]
}
}
kubectl apply -f examples/03-demo-fix-issuer.yaml
Wait ~30 seconds for cert-manager to reconcile, then:
diagnose_certificate(namespace="default", certificate_name="demo-tls")
Now summary.ready should be true.
See examples/example-prompts.md for more ready-to-use prompts for each tool.
- Docker
- A Kubernetes cluster (kind, minikube, or remote)
- kubectl configured
- cert-manager installed (install guide)
- kagent installed (install guide)
- kmcp controller installed (kmcp install)
cd certsheriff-mcp
docker build -t certsheriff-mcp:latest .
For kind:
kind load docker-image certsheriff-mcp:latest --name <your-kind-cluster-name>
For minikube:
minikube image load certsheriff-mcp:latest
kubectl apply -f deploy/
This creates:
- ClusterRole + ClusterRoleBinding: read-only access to cert-manager resources and events
- MCPServer CR in the kagent namespace: the kmcp controller automatically creates the Deployment and Service
- Agent CR in the kagent namespace: the CertSheriff agent with a cert-manager troubleshooting system prompt
Verify the deployment:
kubectl get mcpserver -n kagent
kubectl get pods -n kagent | grep certsheriff
kubectl get agents -n kagent | grep certsheriff
certsheriff/
├── certsheriff-mcp/ # MCP server code
│ ├── src/certsheriff_mcp/
│ │ ├── server.py # FastMCP entry point + five tools
│ │ ├── k8s.py # Kubernetes client helpers
│ │ ├── certmanager.py # cert-manager resource fetching + diagnosis
│ │ └── events.py # Event fetch + regex-based linking
│ ├── tests/
│ │ └── test_events_parse.py # Unit tests for regex extraction
│ ├── pyproject.toml # Dependencies (fastmcp, kubernetes)
│ ├── Dockerfile # Multi-stage build for K8s
│ └── kmcp.yaml # Tool configuration
├── deploy/ # Kubernetes manifests
│ ├── certsheriff-rbac.yaml # ClusterRole + ClusterRoleBinding (read-only)
│ ├── certsheriff-mcp-server.yaml # MCPServer CR (kmcp controller manages the Deployment)
│ └── certsheriff-agent.yaml # Agent CR
├── examples/ # Demo manifests + guides
│ ├── 01-install-cert-manager.md
│ ├── 02-demo-failing-cert.yaml
│ ├── 03-demo-fix-issuer.yaml
│ ├── example-prompts.md
│ └── agentgateway-config.yaml
├── steps/ # Build tutorial (step-by-step)
└── README.md # This file
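events.py is described as linking events to resources with regular expressions. A minimal sketch of that idea, assuming Certificate events carry messages of the form Created new CertificateRequest resource "<name>" (verify against the events your cert-manager version emits); CR_CREATED and linked_request_names are hypothetical names, not the project's code.

```python
import re

# Assumed event message format; confirm with kubectl describe certificate <name>.
CR_CREATED = re.compile(r'Created new CertificateRequest resource "(?P<name>[a-z0-9.-]+)"')


def linked_request_names(events: list[dict]) -> list[str]:
    """CertificateRequest names mentioned in event messages, in input order, deduplicated."""
    names: list[str] = []
    for ev in events:
        match = CR_CREATED.search(ev.get("message", ""))
        if match and match.group("name") not in names:
            names.append(match.group("name"))
    return names
```

A unit test in the style of tests/test_events_parse.py would feed sample messages in and assert on the extracted names.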
- READ-ONLY: CertSheriff never creates, updates, or deletes any Kubernetes resource.
- Timeouts: All API calls have a 10-second timeout.
- Event capping: Events are limited to the most recent 30 per object.
- Graceful errors: Missing CRDs, RBAC failures, and 404s return clear JSON error messages.
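The event cap and the graceful-error rule can both be expressed as small wrappers. This is a sketch with hypothetical names (cap_events, safe_call), assuming UTC RFC 3339 lastTimestamp strings in one format (which then sort chronologically as text) and exceptions that carry an HTTP status attribute, as the kubernetes Python client's ApiException does. The 10-second timeout is not shown; with that client it would typically be passed per call as _request_timeout=10.

```python
import json
from typing import Any, Callable

MAX_EVENTS = 30  # "the most recent 30 per object"


def cap_events(events: list[dict[str, Any]], limit: int = MAX_EVENTS) -> list[dict[str, Any]]:
    """Keep the `limit` newest events, newest first."""
    # Assumes UTC RFC 3339 timestamps in a single format, which sort correctly as strings.
    def timestamp(ev: dict[str, Any]) -> str:
        return ev.get("lastTimestamp") or ev.get("eventTime") or ""
    return sorted(events, key=timestamp, reverse=True)[:limit]


# Hypothetical mapping from HTTP status to a human-readable hint.
_HINTS = {
    403: "forbidden: check the ClusterRole/ClusterRoleBinding",
    404: "not found: the resource or its CRD does not exist",
}


def safe_call(fn: Callable[[], Any]) -> str:
    """Run a read-only lookup and return JSON instead of raising on API errors."""
    try:
        return json.dumps({"ok": True, "data": fn()})
    except Exception as exc:  # kubernetes ApiException exposes .status
        status = getattr(exc, "status", None)
        return json.dumps({
            "ok": False,
            "status": status,
            "error": _HINTS.get(status, "unexpected error"),
            "detail": str(exc),
        })
```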