GitHub - b-vetter/bonsai: Build safe on-brand voice and chat agents with structured flows, guardrails, and compliance

Bonsai is a platform for building and operating customer-facing AI agents that must sound like your brand, follow your rules, and stay operable in production.

As a system, Bonsai has two core parts:

Bonsai Backend: the runtime, APIs, orchestration layer, provider integrations, conversation state, guardrails, auditability, and real-time voice/text session engine
Bonsai Console: the visual admin panel for designing experiences, monitoring conversations and users, and administering projects, providers, and access

Together they give teams one operating system for conversational AI: design flows and agents, manage knowledge and providers, enforce guardrails and permissions, run live sessions, and monitor what happened in production.

Most AI agent stacks optimize for raw model access. Bonsai optimizes for the full governed system around customer-facing AI: structured stages, explicit actions, approved knowledge, project-wide guardrails, real-time voice and text, and the operational controls brands and enterprises need after launch.

This repository contains the backend half of that system

The companion UI for the full Bonsai platform lives in Bonsai Console.

Quick Look

Stage Edit	Guardrails
Playground	Audit

Quick Start

The fastest way to get Bonsai running locally is through the compose stack

git clone https://github.com/utter-one/bonsai.git
cd bonsai/compose

cp env.example .env
# Set JWT_SECRET to a random value of at least 32 characters
# Change POSTGRES_PASSWORD before using this outside local development

docker compose up -d

Once the stack is up:

Console admin panel: http://localhost
Backend API: http://localhost:3000
Swagger UI: http://localhost:3000/api-docs
VitePress docs: http://localhost:8080

For the full setup details, configuration options, see compose/README.md.

Who is Bonsai for?

For brands

Brands do not need a chatbot that "usually behaves". They need an agent that protects tone of voice, respects boundaries, and stays aligned with approved messaging. Bonsai helps with that through:

structured multi-stage conversations instead of one giant prompt
reusable agents for personality and voice
approved FAQ-style knowledge injection
project-wide guardrails that fire on every stage
content moderation before the main AI pipeline
prescripted or generated responses depending on the risk level

For enterprises

Enterprises need more than generation quality. They need governance, traceability, security, and operational visibility. Bonsai includes:

JWT-based operator auth plus project-scoped API keys
RBAC permissions across operators, content, infrastructure, audit, and analytics
immutable audit logs with before/after snapshots
issue tracking linked to exact conversations, sessions, stages, and events
latency analytics and per-conversation timing timelines
environment records and migration workflows for moving configuration between instances
optimistic locking on mutable entities to prevent silent overwrites

For product and engineering teams

Bonsai is headless. You can bring your own frontend, channel, or client application while keeping one backend model for configuration, orchestration, and runtime:

REST API for administration
WebSocket API for live voice and text sessions
generated OpenAPI and JSON Schema contracts
sandboxed scripts, webhooks, and LLM-powered tools
provider portability across LLM, ASR, TTS, moderation, and storage

The Model

Bonsai is built around a project model for conversational applications:

Project  
│  
├── Agents                  ← who speaks (persona, ToV, behavior)  
├── Stages                  ← conversation/runtime states  
│   └── Stage  
│       ├── General         ← basic stage configuration  
│       ├── Prompt          ← stage instructions / prompt template  
│       ├── Features        ← enabled capabilities for this stage  
│       ├── Memory          ← stage-level memory settings  
│       ├── Actions         ← actions available in this stage  
│       ├── Lifecycle       ← transitions, enter/exit behavior  
├── Classifiers             ← input / intent / routing / safety classification  
├── Global Actions          ← actions available across the project  
├── Guardrails & Moderation ← moderation and enforced policies  
├── Context Transformers    ← context shaping / enrichment before inference  
├── Global Memory  
│   ├── User Profile        ← persistent user profile schema  
│   └── Constants           ← project-wide constants  
├── Knowledge               ← approved knowledge, grounding, glossary  
└── Tools                   ← registered callable capabilities (tools for the frontend)

Shared infrastructure
├── Providers
│   ├── LLM
│   ├── TTS
│   ├── ASR
│   └──Storage
└── Environments

In practice, that means you can model a conversational experience as explicit stages, attach the right personality and voice to each stage, classify user intent, extract structured context, run effects, call tools or webhooks, and stream the result back over WebSocket.

How a Turn Works

Voice or text input
  -> ASR (if voice)
  -> Moderation
  -> Classifiers + Guardrails + Context Transformers (in parallel)
  -> Actions and Effects
     - scripts
     - webhooks
     - tools
     - variable updates
     - user profile updates
     - stage navigation
  -> LLM response generation
  -> TTS synthesis (if voice output is enabled)
  -> streamed text and audio back to the client

Each conversation stores state, variables, events, artifacts, and timing data, so teams can inspect what happened after the fact instead of treating production as a black box.

Core Capabilities

Structured conversation design: Build flows with projects, stages, agents, classifiers, transformers, actions, and lifecycle hooks.
Real-time voice and text runtime: Run live sessions over WebSocket with streaming transcription, streamed AI output, and streamed voice.
Explicit behavior orchestration: Use effects such as go_to_stage, run_script, call_webhook, call_tool, modify_variables, modify_user_profile, generate_response, end_conversation, and abort_conversation.
Reusable cross-stage behaviors: Define global actions once and attach them across stages, including lifecycle hooks like conversation_start and conversation_end.
Always-on guardrails: Enforce project-level safety or policy rules on every stage through a shared guardrail classifier.
Content moderation: Screen user input before the main pipeline and route blocked content through a dedicated moderation hook.
Knowledge grounding: Attach FAQ-style knowledge categories and items to stages using tags, so the model answers from approved material.
Context extraction and flow control: Run LLM-powered context transformers in parallel with classifiers to write typed values into stage state.
Voice configuration: Configure agent-level TTS settings and optional filler responses to reduce perceived latency.
Observability and operations: Track conversations, events, artifacts, issues, audit logs, and latency analytics.
Environment promotion: Preview and pull configuration from remote Bonsai instances into the current one.
Schema-first integration: Generate and expose OpenAPI and WebSocket JSON Schema contracts for tooling and client generation.

Currently Supported Providers

Type	Providers
`LLM`	OpenAI, OpenAI legacy chat completions, Anthropic, Gemini, Groq, Mistral, DeepSeek, OpenRouter, Together AI, Fireworks AI, Perplexity, Cohere, and xAI
`TTS`	ElevenLabs, OpenAI, Azure, Deepgram, Cartesia, and Amazon Polly
`ASR`	Azure, ElevenLabs, Deepgram, AssemblyAI, and Speechmatics
`Storage`	S3, Azure Blob Storage, Google Cloud Storage, and local filesystem storage

API Surfaces

REST API

The REST API is used for administration and configuration:

base path: /api
authentication: JWT bearer tokens
interactive docs: /api-docs
machine-readable schema: /openapi.json

Use it to manage projects, stages, agents, classifiers, context transformers, tools, global actions, guardrails, knowledge, providers, users, conversations, issues, environments, migration, audit logs, analytics, and more.

WebSocket API

The WebSocket API is used for live sessions:

endpoint: /ws
authentication: project-scoped API key sent in the auth message
schema: /websocket-contracts.json

Use it to authenticate sessions, start or resume conversations, stream user voice input, send text input, receive AI text and audio output, inspect conversation events, run actions, call tools, and control stage navigation.

Documentation

Repository Layout

Path	Purpose
`src/`	Application runtime, controllers, services, providers, WebSocket host, and utilities
`docs/`	VitePress documentation source for the guide and API reference
`compose/`	Fast local stack for backend, console, docs, and PostgreSQL
`drizzle/`	Database migrations and schema snapshots
`schemas/`	Generated WebSocket JSON Schema contracts

Development Scripts

Command	Description
`npm run dev`	Generate WebSocket schema, run migrations, and start the server
`npm start`	Run migrations and start the server
`npm run build`	Generate WebSocket schema and compile TypeScript
`npm run db:generate`	Generate a new Drizzle migration
`npm run db:migrate`	Apply pending migrations
`npm run db:push`	Push schema changes directly with Drizzle
`npm run db:studio`	Open Drizzle Studio
`npm run schemas:generate`	Regenerate `schemas/websocket-contracts.json`

Tech Stack

Node.js and TypeScript
Express for the REST API
ws for real-time WebSocket communication
PostgreSQL with Drizzle ORM
Zod for contracts and OpenAPI generation
tsyringe for dependency injection
isolated-vm for sandboxed scripts

License

Licensed under the Apache License 2.0.

Name		Name	Last commit message	Last commit date
Latest commit History 505 Commits
.github		.github
compose		compose
docs		docs
drizzle		drizzle
schemas		schemas
src		src
.env.example		.env.example
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
drizzle.config.ts		drizzle.config.ts
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json
tsconfig.schema.json		tsconfig.schema.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Quick Look

Quick Start

Who is Bonsai for?

For brands

For enterprises

For product and engineering teams

The Model

How a Turn Works

Core Capabilities

Currently Supported Providers

API Surfaces

REST API

WebSocket API

Documentation

Repository Layout

Development Scripts

Tech Stack

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Quick Look

Quick Start

Who is Bonsai for?

For brands

For enterprises

For product and engineering teams

The Model

How a Turn Works

Core Capabilities

Currently Supported Providers

API Surfaces

REST API

WebSocket API

Documentation

Repository Layout

Development Scripts

Tech Stack

License

About

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages