Production-ready, multi-tenant runtime for enterprise AI agents. One binary serves a single team on-prem (
docker compose up) or thousands of tenants in SaaS — under MIT license, no AGPL trap, no patent clauses, no "open core" gatekeeping. The whole runtime is in this repository.
🚀 "I just registered → I have a workspace, an agent, and a chat in 30 seconds."
Registration auto-provisions a personal workspace with a default agent. No setup wizard, no admin handshake, no "contact sales" email loop. Slug clashes resolve automatically; per-identity quotas and rate limits keep abuse out without slowing real users down. Email verification is opt-in, OAuth slots in next to it, and the platform-admin can flip registration to invite-only with a single switch.
🧠 "My agent learns my preferences across sessions and grows skill packs by itself."
A 12-dimension user profile builds over time from the runs you actually do. A separate evolver agent watches what worked and what failed, drafts skill updates with cited evidence, and submits them to an approval queue — never silent changes. A nightly curator archives unused skills, a verifier replays past runs against every candidate before activation, and pinned skills are exempt from automated archival forever.
🛡️ "When a model goes down, my agent doesn't."
Provider failover keeps the prompt-cache prefix stable across providers, so swapping a backend model doesn't burn tomorrow's cache. A sub-agent zombie reaper, run heartbeats, checkpoint recovery, an MCP keepalive watchdog, and an inflight-run reaper survive backend restarts and flaky tool servers. Risky operations are default-deny with approval queues; nothing privileged moves without an audit row you can grep.
| Concern | Other tools | SenHarness |
|---|---|---|
| Multi-tenancy | Single-tenant first; tenant logic bolted on later | Every domain row carries workspace_id. Tenant isolation is a hard constraint, not an aspiration |
| Skill management | Author writes once, hopes it stays useful | 9-state lifecycle + immutable version snapshots + nightly curator + auto-verifier replay before activation |
| Self-evolution | Manual updates only | Evolver agent proposes patches with cited evidence; admin approves; nothing changes silently |
| Channel coverage | One IM platform | 10+ channels: Slack · Discord · Lark/Feishu · WeChat · Telegram · Microsoft Teams · DingTalk · WeCom · QQ · generic webhook |
| Reliability | "It usually works" | Sub-agent zombie reaper · provider failover (cache-prefix-safe) · checkpoint recovery · MCP keepalive · cache-aware memory writes |
| Audit & compliance | Hidden in logs | Every state transition emits a stable audit action key. Full lineage from skill version → run → message. GDPR cascade soft-delete on identity / workspace removal |
| Observability | Print statements | Runtime console · background-job dashboard · lineage replay · cross-session insights · per-event notification preferences |
| Approvals | "Press y/n in chat" | Stable resource-typed approvals with per-resource TTL, default actions on expiry, and a notification chain for both requestor and approver |
| Secrets | .env everywhere |
Envelope-encrypted vault with pluggable keyring (env · file · passphrase · AWS KMS · GCP KMS · Azure Key Vault · HashiCorp Vault) |
| Plugins | "Drop-in any Python file" | ed25519-signed bundles, capability scopes, platform-admin approval queue, default-OFF master switch |
| License | AGPL or "open core with patent traps" | MIT. Take it, ship it, fork it, no clauses |
git clone https://github.com/senweaver/SenHarness.git
cd SenHarness
cp .env.example .env # fills sensible defaults; set at least one LLM key
docker compose up -d # full stack: postgres · redis · backend · frontend · worker
open http://localhost:3000 # register → instant workspace + chatThree commands. Three minutes. You're chatting with an agent on your laptop.
Already have an OpenAI- or Anthropic-compatible client? Point its base_url at http://localhost:3000/api/v1/openai and call /v1/chat/completions, /v1/messages (Anthropic format), or /v1/responses (OpenAI Responses format). Same workspace credentials, same audit trail, streaming + tool use + vision + file attachments all carry through. Drop-in for Claude Code, OpenAI Codex CLI, or anything else that speaks the protocol.
Need help on the host? make logs tails everything, make sh-backend drops you into the backend container, make migrate runs Alembic, make seed rebuilds the default workspace, and make create-admin mints a platform admin. Run make test for the full pytest + vitest matrix; make lint for ruff + eslint; make typecheck for ty + tsc.
Going to production? docker compose -f docker-compose.prod.yml up -d swaps in Traefik with TLS termination, hardened networking, and a worker process pool. Set ENVIRONMENT=production in .env and the backend will refuse to boot with insecure defaults — no JWT secret, dev-mode sandbox kind, plaintext keyring, or unset DB password will all halt startup with a clear error pointing at the offending field.
Agents learn skills from past runs. The evolver agent reviews what worked and what failed, then proposes new skill drafts with cited evidence — admin approves, the library grows itself.
- Cross-workspace federation is opt-in, sanitized (PII / emails / URLs / workspace slugs scrubbed), and gated by a 30-day human approval window.
- Subscriber workspaces pull updates as
PROPOSEDcandidates that still go through the local verifier before activation. - Pinned skills are exempt from automated archival forever, even when the curator votes them stale.
One agent definition. 10+ delivery channels: Slack, Discord, Lark/Feishu, WeChat, Telegram, Microsoft Teams, DingTalk, WeCom, QQ, and generic webhooks.
- Cross-platform session continuity — a conversation started in WeChat continues on Web with the same memory, skills, and audit chain.
- Default-on security — per-channel HMAC, sender allowlists, replay windows, and rate buckets are on out of the box.
- Custom channels plug in through a registry; an 11th channel is one adapter file plus a vault entry.
Switch a session to kind=squad and a coordinator agent dynamically mounts squad members as sub-agents. One parent, N children, isolated retry budgets, one shared spine for telemetry.
- Bounded fan-out —
delegate_batchparallelises sub-agents with per-branch concurrency caps. - Reliability gates — heartbeats, a zombie reaper, and a hallucination-review approval queue before risky tool calls land.
- Project boards — workspace- and squad-level kanban track what every sub-agent is actually shipping.
Point an existing client's base_url at http://localhost:3000/api/v1/openai and you get /v1/chat/completions, /v1/messages (Anthropic), and /v1/responses (OpenAI Responses) on day one. WebSocket streaming, tool use, vision, and file attachments all carry through.
- One audit trail — same workspace credentials, same audit chain whether the request comes from the UI or
curl. - Two-Model-ID — clients see a stable served name while you swap the upstream model without breaking the prompt cache.
- Plug-and-play for Claude Code, OpenAI Codex CLI, or anything else that speaks the protocol.
Cron flows in no_agent_script or no_agent_http mode let you say "every morning at 09:00, ping our SLA dashboard, escalate to the agent only on failure." 99% of those checks burn zero LLM tokens.
- Vault-backed credentials —
${vault://workspace/<key>}interpolates into HTTP headers and bodies at run time. - SSRF pinning resolves DNS once and rejects private IPs by default.
- Production guardrails — script-mode flows refuse to run with
sandbox.kind=localin production; SSH backend with command allowlist is the supported path.
Three transports — stdio, SSE, Streamable HTTP — with OAuth client-credentials built in. Image, audio, and file results pass through as first-class parts.
- Per-server keepalive plus concurrency caps mean a misbehaving tool server can't take down the whole worker pool.
- Vault-sealed OAuth tokens rotate automatically on expiry; the workspace never sees the bearer in plaintext.
- Zero-glue setup — paste the MCP endpoint URL and an optional client-id; the first tool call begins the audit trail.
Built-in KB connectors ingest URLs, files, and S3 buckets with document-level ACLs and SSE-streamed sync progress. Knowledge is workspace-scoped just like agents and skills.
- Pluggable connectors — write a new source via
register_connector(same registry pattern as channels). - Tight ACLs — every document carries a workspace + owner row; cross-workspace reads require an explicit platform-admin path.
- Live progress — sync jobs stream status over SSE so the UI never lies about whether a crawl is done.
Every skill change, every job retry, every memory write, every notification — all flow through stable audit keys you can grep. Default-deny on dangerous operations.
- Approval queues for risky changes with per-resource TTLs that escalate to admin before expiry.
- GDPR cascade soft-delete on user or workspace removal, retention watermarks, and an opt-in physical-purge ARQ task.
- Platform-admin settings — schema-driven forms with
.env-override badges and dangerous-change confirmations.
A runtime console lists every inflight run across the workspace, exposes provider routing and heartbeats, and force-recycles stuck runs without restarting the backend. Lineage replay turns compressed summary messages back into the original turns that produced them.
- Job observability — ARQ task lifecycle dashboard with manual retry and failure clustering.
- Skill knowledge graph —
derived/supersedes/fork/hub-pulledges visualised; click through to any node. - Trace replay — every artifact links back to the run, the message, and the skill version that produced it.
/insights --days 30 asks "based on my last month, where do I keep getting stuck?" The auxiliary LLM clusters error kinds, tools that misfired, and skills that helped, then renders a summary with links back to the supporting sessions.
- Privacy by default — even a workspace admin only sees clusters from their own runs; no cross-identity peek path.
- Always-on fallback — a heuristic clusterer kicks in when the auxiliary LLM is unavailable, so the command never returns empty.
- Evidence trail — every insight links back to the artifacts that triggered it.
| Surface | Built-in adapters |
|---|---|
| Model providers | OpenAI · Anthropic · Google · xAI · OpenRouter · Azure OpenAI · HuggingFace · DeepSeek · DashScope · Bailian · Moonshot · Kimi Code · Zhipu · SiliconFlow · MiniMax · Ollama · vLLM · custom |
| Agent backends | native (in-process) · openclaw (remote worker) · protocol_kernel (provider passthrough) — all behind the same AgentBackend protocol |
| IM channels | Slack · Discord · Lark/Feishu · WeChat · Telegram · Microsoft Teams · DingTalk · WeCom · QQ · generic webhook |
| MCP transports | stdio · SSE · Streamable HTTP (with OAuth client-credentials) |
| Sandbox kinds | local (dev only) · docker · state · ssh (opt-in; vault-backed keys + known-hosts pinning + command allowlist) |
| Keyring backends | env · file · passphrase · AWS KMS · GCP KMS · Azure Key Vault · HashiCorp Vault |
| Compatibility surfaces | OpenAI Chat Completions · Anthropic Messages · OpenAI Responses · WebSocket streaming · IM webhook ingress |
| Knowledge base connectors | url · file · s3 · custom (via register_connector) |
| Schedulers & job runners | APScheduler cron (Redis leader election) · ARQ worker queues · cron slot map shipped in docs |
| Notification transports | in-app inbox + WebSocket push · email (SMTP / log transport) · quiet hours + per-event preferences |
| Audit sinks | PostgreSQL audit_events (default) · pluggable forwarder via plugin (capability-scoped, write-only) |
| Approval resources | tool_call · skill_pack {create / patch / edit / delete / archive / write_file / remove_file} · flow_create · subagent_hallucination_review · hub_promotion |
| Evaluation / aux-LLM tasks | goal alignment · run-quality judge · evolver proposal · cross-session insights · sub-agent hallucination gate · reflection hook |
| Plugin extension points | ed25519-signed bundles · 6 lifecycle hooks (on_session_start/end, pre/post_llm_call, pre/post_tool_call) · register_model_provider / register_channel_kind / register_hook · platform-admin approval queue |
Adding an entry to any row is one adapter file plus tests. Built-in kinds can never be overwritten by a plugin — the registry refuses on register-time and writes an audit row, so a hostile drop-in cannot silently substitute the slack channel.
SenHarness ships as one Python 3.12 + FastAPI backend, a Next.js 15 + React 19 frontend, PostgreSQL (with pgvector) for state, Redis for queues / locks / rate limits, and an ARQ worker. Same image, same docker-compose.yml, single binary for dev and prod.
The runtime is six conceptual layers — end of story:
- Context — skills, memory, tools, and a locked goal. Cache-aware writes defer memory edits to "next session" so today's prompt cache stays warm.
- Tools — built-in tools, MCP servers, and signed plugins, with ACL / budget / approval gate around every call.
- Execution — run loop, sub-agent batching, provider routing, checkpoint recovery, heartbeats. Inflight runs survive backend restarts via a recovery sweeper.
- Memory — per-turn artifacts, session summaries, workspace memory, a 12-dimension user profile, and immutable lineage that never breaks trace replay.
- Evaluation — quality judge scores, alignment to the locked goal, auto-verifier replays. Aux-LLM calls sit behind a circuit breaker.
- Constraints & Recovery — approvals, shields, sandbox policy, provider failover, keyring-backed vault, channel security. Default-deny on dangerous ops, audit-on-write on everything else.
Around that core: skills are versioned markdown bundles with a 9-state lifecycle; channels and providers are registry-pluggable; plugins ship signed (ed25519) and gated by a platform-admin approval queue with capability scopes; admin settings expose every knob behind schema-driven forms with .env-override badges. Agent runs flow through the unified AgentBackend protocol so the inference library is swappable; the MCP transport is the official Python SDK.
A user types into a chat; the backend resolves workspace, picks the agent, builds context, routes through the configured AgentBackend, and streams the response over WebSocket while capturing a session artifact for asynchronous quality scoring. Tool calls checkpoint the run, sub-agents get isolated heartbeats and retry budgets, and a 503 from one provider rolls over to the next without breaking the cache prefix. Overnight, the curator sweeps stale skills, the evolver clusters recent failures into skill-update proposals for the approval queue, and the platform-admin dashboard shows queue depth, retry rates, and lineage at a glance — day-one behaviour, not future work.
Is it production-ready?
Yes. Audit-on-write, default-deny dangerous ops, sub-agent zombie reaping, inflight-run recovery, GDPR cascade soft-delete, a vault-backed keyring with seven providers, ed25519-signed plugins, and a hardened production compose file — we dogfood it.
Can I run it on a single VM?
Yes. docker compose up -d brings the full stack on a 4 GB / 2 vCPU box. Postgres + Redis ride alongside the backend; persistent volumes are bind-mounted so down + up keeps your data.
Does it work offline / air-gapped?
Yes. Once images are pulled, backend and frontend run fully offline. Bring your own model provider (Ollama or vLLM counts), disable email fan-out in platform settings, and keep federation / plugin-signing root keys on a thumb drive.
Can I bring my own model?
Yes. The provider catalog is pluggable with 17+ bundled adapters. The two-model-id pattern means agents see a stable served name while you swap the upstream model without breaking the prompt cache.
⭐ Star us if SenHarness saves your team time — it's the cheapest thank-you we'll ever ask for.
🐛 Issues / feature requests — GitHub issues with the bug or enhancement label.
🛠️ Pull requests — open a PR. Conventional Commits + pre-commit hooks shipped in repo.
MIT — see LICENSE.





