GitHub - arthurbm/gambi: Share local LLMs across your network. Connect Ollama, LM Studio, or any OpenAI-compatible endpoint to a shared hub. Includes CLI for hosting/joining rooms, SDK for AI SDK integration, and TUI for monitoring.

 ██████╗  █████╗ ███╗   ███╗██████╗ ██╗
██╔════╝ ██╔══██╗████╗ ████║██╔══██╗██║
██║  ███╗███████║██╔████╔██║██████╔╝██║
██║   ██║██╔══██║██║╚██╔╝██║██╔══██╗██║
╚██████╔╝██║  ██║██║ ╚═╝ ██║██████╔╝██║
 ╚═════╝ ╚═╝  ╚═╝╚═╝     ╚═╝╚═════╝ ╚═╝

Share local LLMs across your network, with an agent-friendly control plane.

What is Gambi?

Gambi is a local-first system for sharing OpenAI-compatible LLM endpoints across a trusted network. A central hub tracks rooms and participants, proxies inference requests, and publishes real-time events over SSE.

Participants now connect through a hub-managed tunnel. The hub never needs direct network reachability to the participant's provider endpoint, so localhost and provider credentials can remain local to the participant machine.

The public name Gambi is the short form of gambiarra. Here it means the good kind: creative improvisation under constraints, turned into a practical tool.

Two planes

Gambi exposes two distinct surfaces:

Management plane: native Gambi HTTP endpoints under /v1, plus the operational CLI and SDK management client.
Inference plane: OpenAI-compatible room-scoped endpoints under /rooms/:code/v1/*, consumed by createGambi() and other OpenAI-compatible clients.

That split is deliberate. The management plane is optimized for agents and automation. The inference plane is optimized for application compatibility.

The default inference protocol is the OpenAI Responses API. Chat Completions remains available as a compatibility surface.
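
As a rough illustration of the split, both planes live on the same hub and differ only in their path prefix. The helper names below (`trimBase`, `managementUrl`, `inferenceUrl`) are hypothetical and not part of gambi-sdk; only the path shapes come from this README.

```typescript
// Hypothetical helpers, not part of gambi-sdk. They only illustrate how
// the two planes map onto URL paths on the same hub.

/** Strip trailing slashes so joined paths never contain "//". */
function trimBase(hubUrl: string): string {
  return hubUrl.replace(/\/+$/, "");
}

/** Management plane: native Gambi endpoints under /v1. */
function managementUrl(hubUrl: string, path: string): string {
  return `${trimBase(hubUrl)}/v1/${path.replace(/^\/+/, "")}`;
}

/** Inference plane: OpenAI-compatible endpoints scoped to one room. */
function inferenceUrl(hubUrl: string, roomCode: string, path: string): string {
  const room = encodeURIComponent(roomCode);
  return `${trimBase(hubUrl)}/rooms/${room}/v1/${path.replace(/^\/+/, "")}`;
}

console.log(managementUrl("http://localhost:3000", "rooms"));
console.log(inferenceUrl("http://localhost:3000/", "ABC123", "responses"));
```

An OpenAI-compatible client would typically be pointed at the room-scoped `/rooms/:code/v1` base, while automation talks to `/v1`.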

Installation

CLI

Linux / macOS:

curl -fsSL https://gambi.sh/install | bash

Windows:

irm https://gambi.sh/install.ps1 | iex

npm / bun:

npm install -g gambi
# or
bun add -g gambi

Verify:

gambi --version

SDK

npm install gambi-sdk
# or
bun add gambi-sdk

TUI

gambi-tui is the human-first monitoring interface. It is separate from the CLI.

bun add -g gambi-tui

Quick start

1. Start the hub

gambi hub serve

Machine-readable dry run:

gambi hub serve --dry-run --format ndjson

2. Create a room

gambi room create --name "Demo"

With room defaults from JSON:

gambi room create --name "Demo" --config ./room-defaults.json

3. Register a participant

gambi participant join \
  --room ABC123 \
  --participant-id worker-1 \
  --model llama3 \
  --endpoint http://localhost:11434

gambi participant join probes the local endpoint, registers the participant, opens a participant tunnel back to the hub, and keeps the session alive until interrupted. This works the same way for local hubs and remote hubs on the same trusted network: the endpoint can stay loopback-only on the participant machine.

Preview the registration flow:

gambi participant join \
  --room ABC123 \
  --participant-id worker-1 \
  --model llama3 \
  --dry-run \
  --format ndjson

4. Watch room events

gambi events watch --room ABC123

As NDJSON for scripts:

gambi events watch --room ABC123 --format ndjson

Room event streams include lifecycle signals such as llm.request, llm.complete, and llm.error.

llm.complete includes baseline observability metrics when available:

ttftMs
durationMs
inputTokens
outputTokens
totalTokens
tokensPerSecond
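
A script consuming the NDJSON stream might summarize these metrics along the following lines. This is a minimal sketch: the `{ type, data }` event shape matches the SDK example in this README, but placing the metric fields directly on `data` is an assumption, and the sample lines are made up.

```typescript
// Assumption: metric fields sit directly on `data`. The README lists the
// metric names but does not show the full event payload.
interface CompletionMetrics {
  ttftMs?: number;
  durationMs?: number;
  inputTokens?: number;
  outputTokens?: number;
  totalTokens?: number;
  tokensPerSecond?: number;
}

interface RoomEvent {
  type: string;
  data?: CompletionMetrics;
}

/** Parse NDJSON text: one JSON object per non-empty line. */
function parseNdjson(text: string): RoomEvent[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as RoomEvent);
}

/** Average durationMs over the llm.complete events that report it. */
function averageDurationMs(events: RoomEvent[]): number | undefined {
  const durations = events
    .filter((e) => e.type === "llm.complete")
    .map((e) => e.data?.durationMs)
    .filter((d): d is number => typeof d === "number");
  if (durations.length === 0) return undefined;
  return durations.reduce((sum, d) => sum + d, 0) / durations.length;
}

// Made-up sample lines, for illustration only.
const sample = [
  '{"type":"llm.request","data":{}}',
  '{"type":"llm.complete","data":{"durationMs":800,"outputTokens":40}}',
  '{"type":"llm.complete","data":{"durationMs":1200}}',
  '{"type":"llm.error","data":{}}',
].join("\n");

console.log(averageDurationMs(parseNdjson(sample)));
```

Because the metrics are only present "when available", every field is treated as optional and missing values are skipped rather than counted as zero.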

5. Use the SDK for inference

import { createGambi } from "gambi-sdk";
import { generateText } from "ai";

const gambi = createGambi({
  roomCode: "ABC123",
  hubUrl: "http://localhost:3000",
});

const result = await generateText({
  model: gambi.any(),
  prompt: "Explain how SSE works.",
});

console.log(result.text);

6. Resolve a room dynamically with resolveGambiTarget()

import { createGambi, resolveGambiTarget } from "gambi-sdk";
import { generateText } from "ai";

const target = await resolveGambiTarget({
  roomCode: "ABC123",
  timeoutMs: 1500,
});

const gambi = createGambi({
  hubUrl: target.hubUrl,
  roomCode: target.roomCode,
});

const result = await generateText({
  model: gambi.any(),
  prompt: "Hello from a discovered room.",
});

console.log(result.text);

Use this when your app is running on a local network and you want to resolve the hub and room before creating the provider. For fixed deployments, you can keep passing hubUrl and roomCode directly.

7. Use the SDK for management

import { createClient } from "gambi-sdk";

const client = createClient({ hubUrl: "http://localhost:3000" });

const created = await client.rooms.create({ name: "Ops" });
console.log(created.data.room.code);

const participants = await client.participants.list(created.data.room.code);
console.log(participants.data.length);

CLI overview

The CLI is resource-oriented:

gambi hub serve
gambi room create
gambi room list
gambi room get
gambi participant join
gambi participant leave
gambi participant heartbeat
gambi events watch
gambi self update

Agent-first behavior:

--format text|json|ndjson on the operational commands
--interactive and --no-interactive
default json or ndjson when stdout is piped
XDG config at ~/.config/gambi/config.json
--config - for stdin-driven JSON on commands that accept runtime config

Example config:

{
  "defaultEnv": "local",
  "envs": {
    "local": {
      "hubUrl": "http://localhost:3000",
      "endpoint": "http://localhost:11434"
    },
    "staging": {
      "hubUrl": "http://192.168.1.10:3000",
      "endpoint": "http://localhost:11434"
    }
  }
}
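
A tool reading this file could select an environment roughly as follows. The types mirror the example config above; how the CLI itself resolves environments is not documented here, so the `resolveEnv` helper and its fallback to `defaultEnv` are assumptions.

```typescript
// Types mirror the example config. The selection logic is an assumption,
// not the CLI's documented behavior.
interface GambiEnv {
  hubUrl: string;
  endpoint: string;
}

interface GambiConfig {
  defaultEnv: string;
  envs: Record<string, GambiEnv>;
}

/** Pick the named environment, falling back to defaultEnv. */
function resolveEnv(config: GambiConfig, name?: string): GambiEnv {
  const key = name ?? config.defaultEnv;
  const env = config.envs[key];
  if (env === undefined) {
    throw new Error(`Unknown environment: ${key}`);
  }
  return env;
}

const config: GambiConfig = {
  defaultEnv: "local",
  envs: {
    local: { hubUrl: "http://localhost:3000", endpoint: "http://localhost:11434" },
    staging: { hubUrl: "http://192.168.1.10:3000", endpoint: "http://localhost:11434" },
  },
};

console.log(resolveEnv(config).hubUrl);
console.log(resolveEnv(config, "staging").hubUrl);
```

Note that in both environments the endpoint stays on loopback; only the hub URL changes.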

SDK overview

Use createGambi() when your application wants inference through the OpenAI-compatible room endpoints:

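import { createGambi } from "gambi-sdk";

const gambi = createGambi({ roomCode: "ABC123" });

// Top-level helpers use the default protocol (openResponses).
gambi.any();
gambi.participant("worker-1");
gambi.model("llama3");

// Explicit protocol namespaces.
gambi.openResponses.any();
gambi.chatCompletions.any();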

The top-level helpers default to openResponses. Use the chatCompletions namespace only when you need explicit compatibility with legacy clients or providers.

Use resolveGambiTarget() when the room or hub should be discovered from the local network first:

import { createGambi, resolveGambiTarget } from "gambi-sdk";

const target = await resolveGambiTarget({
  roomCode: "ABC123",
});

const gambi = createGambi(target);

The SDK also exposes discoverHubs() and discoverRooms() for lower-level discovery workflows.

Use createClient() when your application needs operational control:

import { createClient } from "gambi-sdk";

const client = createClient({ hubUrl: "http://localhost:3000" });

await client.rooms.list();
await client.rooms.get("ABC123");
await client.participants.upsert("ABC123", "worker-1", {
  nickname: "worker-1",
  model: "llama3",
  endpoint: "http://192.168.1.25:11434",
});
await client.participants.heartbeat("ABC123", "worker-1");
await client.participants.remove("ABC123", "worker-1");

Room event watching:

for await (const event of client.events.watchRoom({ roomCode: "ABC123" })) {
  console.log(event.type, event.data);
}

HTTP API overview

Management API:

GET /v1/health
GET /v1/rooms
POST /v1/rooms
GET /v1/rooms/:code
GET /v1/rooms/:code/participants
PUT /v1/rooms/:code/participants/:id
DELETE /v1/rooms/:code/participants/:id
POST /v1/rooms/:code/participants/:id/heartbeat
GET /v1/rooms/:code/events

Inference API:

GET /rooms/:code/v1/models
POST /rooms/:code/v1/responses
GET /rooms/:code/v1/responses/:id
DELETE /rooms/:code/v1/responses/:id
POST /rooms/:code/v1/responses/:id/cancel
GET /rooms/:code/v1/responses/:id/input_items
POST /rooms/:code/v1/chat/completions

Management responses use envelopes:

{
  "data": {
    "status": "ok",
    "timestamp": 1743884000000
  },
  "meta": {
    "requestId": "req_123"
  }
}

Management errors are structured:

{
  "error": {
    "code": "ROOM_NOT_FOUND",
    "message": "Room 'ABC123' not found.",
    "hint": "Create the room first or verify the code."
  },
  "meta": {
    "requestId": "req_456"
  }
}
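
Client code that calls the management API directly could model the two envelope shapes as a union and unwrap them in one place. The type names and the `unwrap` helper below are hypothetical and not part of gambi-sdk; the field names follow the two JSON examples above, and treating `hint` as optional is an assumption.

```typescript
// Shapes follow the JSON examples above. Names are hypothetical.
interface Meta {
  requestId: string;
}

interface SuccessEnvelope<T> {
  data: T;
  meta: Meta;
}

interface ErrorEnvelope {
  error: { code: string; message: string; hint?: string };
  meta: Meta;
}

type Envelope<T> = SuccessEnvelope<T> | ErrorEnvelope;

/** Distinguish the two shapes by the presence of the `error` key. */
function isErrorEnvelope<T>(env: Envelope<T>): env is ErrorEnvelope {
  return "error" in env;
}

/** Return `data` on success, or throw with code, message, hint, and request ID. */
function unwrap<T>(env: Envelope<T>): T {
  if (isErrorEnvelope(env)) {
    const { code, message, hint } = env.error;
    const suffix = hint ? ` (${hint})` : "";
    throw new Error(`${code}: ${message}${suffix} [${env.meta.requestId}]`);
  }
  return env.data;
}

const health: Envelope<{ status: string; timestamp: number }> = {
  data: { status: "ok", timestamp: 1743884000000 },
  meta: { requestId: "req_123" },
};

const notFound: Envelope<unknown> = {
  error: {
    code: "ROOM_NOT_FOUND",
    message: "Room 'ABC123' not found.",
    hint: "Create the room first or verify the code.",
  },
  meta: { requestId: "req_456" },
};

console.log(unwrap(health).status);
try {
  unwrap(notFound);
} catch (err) {
  console.log((err as Error).message);
}
```

Keeping `requestId` in the thrown message makes it easier to correlate a client-side failure with hub logs.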

Runtime defaults

Rooms and participants can both provide runtime defaults. The hub merges them at proxy time in the order below, so later entries take precedence over earlier ones:

room defaults
participant defaults
request-time overrides
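
The precedence can be pictured as a layered merge. The sketch below assumes the list above runs from lowest to highest precedence and that the merge is shallow; the hub may treat nested fields differently, and the field names (`temperature`, `max_tokens`) are only illustrative.

```typescript
// Assumption: shallow merge where later sources win. This is an
// illustration of the precedence order, not the hub's actual code.
type RuntimeConfig = Record<string, unknown>;

function mergeRuntimeConfig(
  roomDefaults: RuntimeConfig,
  participantDefaults: RuntimeConfig,
  requestOverrides: RuntimeConfig,
): RuntimeConfig {
  // Object spread applies sources left to right, so the last one wins.
  return { ...roomDefaults, ...participantDefaults, ...requestOverrides };
}

const merged = mergeRuntimeConfig(
  { temperature: 0.2, max_tokens: 256 }, // room defaults
  { temperature: 0.7 },                  // participant defaults
  { max_tokens: 64 },                    // request-time overrides
);

console.log(merged);
```

Here the participant's `temperature` replaces the room's value, and the request's `max_tokens` replaces both defaults.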

Sensitive config is redacted from public management responses. Public room and participant payloads expose safe summaries instead of raw secrets or instructions.

Participant registrations also expose tunnel connection state through connection, including whether the tunnel is currently connected and the timestamp of the last tunnel heartbeat seen by the hub.

Streaming commands always emit NDJSON for machine-readable output. If you pass --format json to a streaming command, the CLI coerces it to ndjson.
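
The coercion rule can be stated as a small function. This is a hypothetical sketch of the behavior described above, not the CLI's actual implementation; the `effectiveFormat` name is an assumption.

```typescript
// Hypothetical sketch of the coercion rule described above.
type OutputFormat = "text" | "json" | "ndjson";

/** Streaming commands cannot emit a single JSON document, so json becomes ndjson. */
function effectiveFormat(requested: OutputFormat, streaming: boolean): OutputFormat {
  if (streaming && requested === "json") {
    return "ndjson";
  }
  return requested;
}

console.log(effectiveFormat("json", true));
console.log(effectiveFormat("json", false));
```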

Development

bun install
bun run dev
bun run dev:hub
bun run dev:cli -- --help
bun run dev:cli -- room list --format json
bun run dev:cli -- hub serve --dry-run --format ndjson
bun run build
bun run check-types

Root dev workflow:

bun run dev and bun run dev:hub start the hub with gambi hub serve
bun run dev:cli -- <subcommand...> forwards any CLI command from the repo root
bun run dev:monitor is a TUI alias for human-first monitoring
Prefer bun run dev:cli -- room create --help and bun run dev:cli -- participant join --help for CLI discovery during development

Workspace-specific:

bun run --cwd packages/core check-types
bun run --cwd packages/cli check-types
bun run --cwd packages/sdk check-types
bun run --cwd apps/tui test

Security

Gambi is designed for trusted local networks. The hub does not provide built-in authentication. Do not expose it directly to the public internet without an external proxy and auth layer.

For the current architecture and longer-term product direction, see:

docs/reference/architecture.md for the current transport and proxy model
docs/reference/observability.md for baseline metrics and future observability work
docs/product/vision.md for the future gambi agents direction above the current hub
