Bonsai is a platform for building and operating customer-facing AI agents that must sound like your brand, follow your rules, and stay operable in production.
As a system, Bonsai has two core parts:
- Bonsai Backend: the runtime, APIs, orchestration layer, provider integrations, conversation state, guardrails, auditability, and real-time voice/text session engine
- Bonsai Console: the visual admin panel for designing experiences, monitoring conversations and users, and administering projects, providers, and access
Together they give teams one operating system for conversational AI: design flows and agents, manage knowledge and providers, enforce guardrails and permissions, run live sessions, and monitor what happened in production.
Most AI agent stacks optimize for raw model access. Bonsai optimizes for the full governed system around customer-facing AI: structured stages, explicit actions, approved knowledge, project-wide guardrails, real-time voice and text, and the operational controls brands and enterprises need after launch.
This repository contains the backend half of that system
The companion UI for the full Bonsai platform lives in Bonsai Console.
Stage Edit |
Guardrails |
Playground |
Audit |
The fastest way to get Bonsai running locally is through the compose stack
git clone https://github.com/utter-one/bonsai.git
cd bonsai/compose
cp env.example .env
# Set JWT_SECRET to a random value of at least 32 characters
# Change POSTGRES_PASSWORD before using this outside local development
docker compose up -dOnce the stack is up:
- Console admin panel:
http://localhost - Backend API:
http://localhost:3000 - Swagger UI:
http://localhost:3000/api-docs - VitePress docs:
http://localhost:8080
For the full setup details, configuration options, see compose/README.md.
Brands do not need a chatbot that "usually behaves". They need an agent that protects tone of voice, respects boundaries, and stays aligned with approved messaging. Bonsai helps with that through:
- structured multi-stage conversations instead of one giant prompt
- reusable agents for personality and voice
- approved FAQ-style knowledge injection
- project-wide guardrails that fire on every stage
- content moderation before the main AI pipeline
- prescripted or generated responses depending on the risk level
Enterprises need more than generation quality. They need governance, traceability, security, and operational visibility. Bonsai includes:
- JWT-based operator auth plus project-scoped API keys
- RBAC permissions across operators, content, infrastructure, audit, and analytics
- immutable audit logs with before/after snapshots
- issue tracking linked to exact conversations, sessions, stages, and events
- latency analytics and per-conversation timing timelines
- environment records and migration workflows for moving configuration between instances
- optimistic locking on mutable entities to prevent silent overwrites
Bonsai is headless. You can bring your own frontend, channel, or client application while keeping one backend model for configuration, orchestration, and runtime:
- REST API for administration
- WebSocket API for live voice and text sessions
- generated OpenAPI and JSON Schema contracts
- sandboxed scripts, webhooks, and LLM-powered tools
- provider portability across LLM, ASR, TTS, moderation, and storage
Bonsai is built around a project model for conversational applications:
Project
│
├── Agents ← who speaks (persona, ToV, behavior)
├── Stages ← conversation/runtime states
│ └── Stage
│ ├── General ← basic stage configuration
│ ├── Prompt ← stage instructions / prompt template
│ ├── Features ← enabled capabilities for this stage
│ ├── Memory ← stage-level memory settings
│ ├── Actions ← actions available in this stage
│ ├── Lifecycle ← transitions, enter/exit behavior
├── Classifiers ← input / intent / routing / safety classification
├── Global Actions ← actions available across the project
├── Guardrails & Moderation ← moderation and enforced policies
├── Context Transformers ← context shaping / enrichment before inference
├── Global Memory
│ ├── User Profile ← persistent user profile schema
│ └── Constants ← project-wide constants
├── Knowledge ← approved knowledge, grounding, glossary
└── Tools ← registered callable capabilities (tools for the frontend)
Shared infrastructure
├── Providers
│ ├── LLM
│ ├── TTS
│ ├── ASR
│ └──Storage
└── Environments
In practice, that means you can model a conversational experience as explicit stages, attach the right personality and voice to each stage, classify user intent, extract structured context, run effects, call tools or webhooks, and stream the result back over WebSocket.
Voice or text input
-> ASR (if voice)
-> Moderation
-> Classifiers + Guardrails + Context Transformers (in parallel)
-> Actions and Effects
- scripts
- webhooks
- tools
- variable updates
- user profile updates
- stage navigation
-> LLM response generation
-> TTS synthesis (if voice output is enabled)
-> streamed text and audio back to the client
Each conversation stores state, variables, events, artifacts, and timing data, so teams can inspect what happened after the fact instead of treating production as a black box.
- Structured conversation design: Build flows with projects, stages, agents, classifiers, transformers, actions, and lifecycle hooks.
- Real-time voice and text runtime: Run live sessions over WebSocket with streaming transcription, streamed AI output, and streamed voice.
- Explicit behavior orchestration: Use effects such as
go_to_stage,run_script,call_webhook,call_tool,modify_variables,modify_user_profile,generate_response,end_conversation, andabort_conversation. - Reusable cross-stage behaviors: Define global actions once and attach them across stages, including lifecycle hooks like
conversation_startandconversation_end. - Always-on guardrails: Enforce project-level safety or policy rules on every stage through a shared guardrail classifier.
- Content moderation: Screen user input before the main pipeline and route blocked content through a dedicated moderation hook.
- Knowledge grounding: Attach FAQ-style knowledge categories and items to stages using tags, so the model answers from approved material.
- Context extraction and flow control: Run LLM-powered context transformers in parallel with classifiers to write typed values into stage state.
- Voice configuration: Configure agent-level TTS settings and optional filler responses to reduce perceived latency.
- Observability and operations: Track conversations, events, artifacts, issues, audit logs, and latency analytics.
- Environment promotion: Preview and pull configuration from remote Bonsai instances into the current one.
- Schema-first integration: Generate and expose OpenAPI and WebSocket JSON Schema contracts for tooling and client generation.
| Type | Providers |
|---|---|
LLM |
OpenAI, OpenAI legacy chat completions, Anthropic, Gemini, Groq, Mistral, DeepSeek, OpenRouter, Together AI, Fireworks AI, Perplexity, Cohere, and xAI |
TTS |
ElevenLabs, OpenAI, Azure, Deepgram, Cartesia, and Amazon Polly |
ASR |
Azure, ElevenLabs, Deepgram, AssemblyAI, and Speechmatics |
Storage |
S3, Azure Blob Storage, Google Cloud Storage, and local filesystem storage |
The REST API is used for administration and configuration:
- base path:
/api - authentication: JWT bearer tokens
- interactive docs:
/api-docs - machine-readable schema:
/openapi.json
Use it to manage projects, stages, agents, classifiers, context transformers, tools, global actions, guardrails, knowledge, providers, users, conversations, issues, environments, migration, audit logs, analytics, and more.
The WebSocket API is used for live sessions:
- endpoint:
/ws - authentication: project-scoped API key sent in the
authmessage - schema:
/websocket-contracts.json
Use it to authenticate sessions, start or resume conversations, stream user voice input, send text input, receive AI text and audio output, inspect conversation events, run actions, call tools, and control stage navigation.
| Path | Purpose |
|---|---|
src/ |
Application runtime, controllers, services, providers, WebSocket host, and utilities |
docs/ |
VitePress documentation source for the guide and API reference |
compose/ |
Fast local stack for backend, console, docs, and PostgreSQL |
drizzle/ |
Database migrations and schema snapshots |
schemas/ |
Generated WebSocket JSON Schema contracts |
| Command | Description |
|---|---|
npm run dev |
Generate WebSocket schema, run migrations, and start the server |
npm start |
Run migrations and start the server |
npm run build |
Generate WebSocket schema and compile TypeScript |
npm run db:generate |
Generate a new Drizzle migration |
npm run db:migrate |
Apply pending migrations |
npm run db:push |
Push schema changes directly with Drizzle |
npm run db:studio |
Open Drizzle Studio |
npm run schemas:generate |
Regenerate schemas/websocket-contracts.json |
- Node.js and TypeScript
- Express for the REST API
wsfor real-time WebSocket communication- PostgreSQL with Drizzle ORM
- Zod for contracts and OpenAPI generation
tsyringefor dependency injectionisolated-vmfor sandboxed scripts
Licensed under the Apache License 2.0.