RFC: VTuber adapter — OpenAI-compatible skin frontend for OAB

> Discord discussion: https://discord.com/channels/1491295327620169908/1520790210320011274
> A PR implementing Tier 1 will follow and reference this issue.

## Problem

Desktop character apps (VRM / Live2D / pixel pets) — AniCompanion, Open-LLM-VTuber, ChatVRM, clawd-on-desk — are presentation shells with no real brain. Each talks to a raw LLM via `/v1/chat/completions` and can only chat: no tool use, no code editing, no memory, no MCP. They are "a face with no agent behind it."

OAB already has the brain (ACP agents) and the multi-platform plumbing. We want these character frontends to become first-class OAB clients — so the character inherits the agent's full capability and the same persona/steering as every other platform — instead of being a dumb LLM skin.

## Proposal in one line

Add a `vtuber` gateway adapter that exposes an **OpenAI-compatible `/v1/chat/completions` SSE endpoint**, backed by an OAB ACP agent. Any OpenAI-compatible skin points at it and gets a real agent with **zero client changes**.

## At a Glance

```
Skin (AniCompanion / Open-LLM-VTuber)
  │  POST /v1/chat/completions  (stream:true, Bearer key)
  │  SSE: choices[].delta.content   (incl. inline [emotion] tags)
  ▼
crates/openab-gateway/src/adapters/vtuber.rs
  messages[] → GatewayEvent → (request_id streaming) → GatewayReply → OpenAI deltas
  ▼
openab-gateway core → src/acp → coding agent (codex / claude / kiro)
```
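Here is a minimal sketch of the `messages[] → GatewayEvent` step. It assumes the adapter serializes each message as `role: content` and joins them with blank lines. `ChatMessage`, `InboundPrompt` and `messages_to_prompt` are hypothetical names and are not the gateway's real `GatewayEvent` type. The sketch adds no steering text of its own, matching the proposal.

```rust
/// One entry of the OpenAI `messages[]` array (only the fields used here).
#[derive(Debug, Clone)]
pub struct ChatMessage {
    pub role: String,
    pub content: String,
}

/// Hypothetical stand-in for the gateway's inbound event; the real
/// `GatewayEvent` in openab-gateway likely carries more fields.
#[derive(Debug)]
pub struct InboundPrompt {
    pub request_id: String,
    pub prompt: String,
}

/// Flatten `messages[]` (the skin's own system/persona included) into a single
/// prompt string. Nothing is added: no extra system text, no tag instructions.
pub fn messages_to_prompt(request_id: &str, messages: &[ChatMessage]) -> InboundPrompt {
    let prompt = messages
        .iter()
        .map(|m| format!("{}: {}", m.role, m.content))
        .collect::<Vec<_>>()
        .join("\n\n");
    InboundPrompt {
        request_id: request_id.to_string(),
        prompt,
    }
}
```

OpenAI-style clients resend the full history on every request. Deciding whether to forward all of `messages[]` or only the newest turn into a persistent OAB thread is therefore tied to open question 3.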

## Why OpenAI-compatible (not a bespoke protocol)

The target skins already speak `/v1/chat/completions` SSE and already parse inline `[emotion]` tags client-side (AniCompanion: 16 tags, stripped before TTS; Open-LLM-VTuber: 8; ChatdollKit: `[face:X]`). Being OpenAI-compatible means AniCompanion connects by typing our URL into its existing "OpenAI-compatible" backend — no Swift change — and we inherit the entire OpenAI-compatible frontend ecosystem for free.
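Tag handling stays on the skin side. For illustration only, here is a sketch of the kind of split a skin might perform before TTS. It assumes tags are `[word]` tokens made of ASCII letters, digits, `_`, `-` or `:`, which also covers ChatdollKit's `[face:X]`. `split_emotion_tags` is a hypothetical helper and is not code taken from any of the skins named above.

```rust
/// Characters allowed inside a tag in this sketch (an assumption; each skin
/// defines its own vocabulary).
fn is_tag_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_' || c == '-' || c == ':'
}

/// Split text into (tags in order, speakable text with tags removed).
/// Brackets that do not form a valid tag are kept literally.
pub fn split_emotion_tags(text: &str) -> (Vec<String>, String) {
    let mut tags = Vec::new();
    let mut clean = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(open) = rest.find('[') {
        clean.push_str(&rest[..open]);
        let after = &rest[open + 1..];
        match after.find(']') {
            Some(close) if close > 0 && after[..close].chars().all(is_tag_char) => {
                tags.push(after[..close].to_string());
                rest = &after[close + 1..];
            }
            _ => {
                // Not a tag: keep the '[' and continue scanning after it.
                clean.push('[');
                rest = after;
            }
        }
    }
    clean.push_str(rest);
    (tags, clean)
}
```

This sketch works on complete text. A tag can straddle two streamed deltas, so a streaming skin would need to buffer from a `[` until the matching `]` arrives.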

## Prior Art & Industry Research

**Hermes Agent** (AniCompanion's reference backend) — exposes `POST /v1/chat/completions` on `:8642`, auth `Authorization: Bearer` via `API_SERVER_KEY`, SSE `chat.completion.chunk`, plus a custom `event: hermes.tool.progress` for tool visibility "without polluting persisted assistant text." ([api-server docs](https://hermes-agent.nousresearch.com/docs/user-guide/features/api-server)) Hermes "Skins" are CLI-only (colors/spinners); no VRM/avatar; no structured agent-state push.
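Tier 1 could adopt the same auth shape, since the diagram above already lists a Bearer key. Below is a minimal sketch of the check. It assumes a single shared key from the adapter's config, though how that key is configured is not specified here. `bearer_ok` is a hypothetical helper.

```rust
/// Accept only `Authorization: Bearer <key>` where <key> matches `expected_key`.
/// Fails closed when no key is configured.
pub fn bearer_ok(header: Option<&str>, expected_key: &str) -> bool {
    if expected_key.is_empty() {
        return false;
    }
    let Some(value) = header else {
        return false;
    };
    // The auth scheme name is case-insensitive in HTTP; the token is not.
    let Some((scheme, token)) = value.trim().split_once(' ') else {
        return false;
    };
    scheme.eq_ignore_ascii_case("Bearer")
        && constant_time_eq(token.trim().as_bytes(), expected_key.as_bytes())
}

/// Byte comparison without an early exit on the first mismatch
/// (the length check still returns early).
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}
```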

**OpenClaw** — plugin/channel model + gateway WS control plane. Primary surface is `POST /v1/responses` (OpenResponses, item-based), but it **also lists OpenAI-compatible `POST /v1/chat/completions`, `/v1/models`, `/v1/embeddings`** (Bearer, SSE `response.*` events). ([openresponses-http-api](https://docs.openclaw.ai/gateway/openresponses-http-api), [plugin architecture](https://docs.openclaw.ai/plugins/architecture)) A community guide covers a browser VRM avatar frontend, but avatar state there is frontend-inferred from tokens/audio, not pushed by the agent.

**Skin ecosystem (feature-specific):** AniCompanion (VRM, 16 inline tags, pluggable OpenAI backend), Open-LLM-VTuber (Live2D, 8 tags, WS), ChatdollKit (`[face:X]`), clawd-on-desk (hook-driven agent *state* → pet animation; a proven state vocabulary for Tier 2).

**Conclusion:** Both incumbents validate the OpenAI-compatible passthrough as the way to let an external client drive an agent — so Tier 1 reuses a proven shape rather than inventing one. Neither pushes structured agent-state to drive character animation; that gap is deferred to a Tier-2 RFC.

## Proposed Solution

A `vtuber` adapter exposing `POST /v1/chat/completions` (SSE) inside the gateway:
- `messages[]` (including the skin's own system/persona, which already instructs the model to emit `[emotion]` tags) is forwarded as the agent prompt — **no steering added by us**.
- The agent's streamed tokens (inline tags intact) are re-emitted as OpenAI `chat.completion.chunk` deltas, ending `data: [DONE]`, reusing the gateway's existing `request_id` streaming.
- The skin parses + strips tags and maps them to its own motion system (VRM expression / Live2D exp3 / VTube Studio hotkey). The adapter stays motion-agnostic.
- Session: one OAB thread per connection for the MVP.
- ACP subprocesses spawned with `augmented_path()` (per CONTRIBUTING dev tip), never a login shell.
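Here is a sketch of the re-emission step. It assumes plain `String` frames rather than any particular HTTP framework's SSE type. `delta_frame`, `finish_frames`, `stream_body` and the model name `"oab"` in the usage are hypothetical. The fields written (`id`, `object`, `created`, `model`, `choices[].delta`, `finish_reason`) follow the standard `chat.completion.chunk` shape. Agent text, inline tags included, passes through untouched apart from JSON escaping.

```rust
/// Escape `s` for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// One SSE frame carrying a `chat.completion.chunk` with a content delta.
pub fn delta_frame(id: &str, model: &str, created: u64, content: &str) -> String {
    format!(
        "data: {{\"id\":\"{}\",\"object\":\"chat.completion.chunk\",\"created\":{},\"model\":\"{}\",\"choices\":[{{\"index\":0,\"delta\":{{\"content\":\"{}\"}},\"finish_reason\":null}}]}}\n\n",
        json_escape(id), created, json_escape(model), json_escape(content)
    )
}

/// Final chunk (empty delta, finish_reason "stop") followed by the `[DONE]` sentinel.
pub fn finish_frames(id: &str, model: &str, created: u64) -> String {
    format!(
        "data: {{\"id\":\"{}\",\"object\":\"chat.completion.chunk\",\"created\":{},\"model\":\"{}\",\"choices\":[{{\"index\":0,\"delta\":{{}},\"finish_reason\":\"stop\"}}]}}\n\ndata: [DONE]\n\n",
        json_escape(id), created, json_escape(model)
    )
}

/// Whole response body for a sequence of agent text pieces, in frame order.
pub fn stream_body<'a>(
    id: &str,
    model: &str,
    created: u64,
    pieces: impl IntoIterator<Item = &'a str>,
) -> String {
    let mut body = String::new();
    for piece in pieces {
        body.push_str(&delta_frame(id, model, created, piece));
    }
    body.push_str(&finish_frames(id, model, created));
    body
}
```

A real handler would write each frame as the agent's deltas arrive on the `request_id` stream rather than building one string. `stream_body` only shows the frame order: one frame per delta, then a stop chunk, then `data: [DONE]`.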

## Why this approach

- **Zero client change** for AniCompanion and every OpenAI-compatible skin — maximum reach for minimum code (one adapter module).
- **Validated** by Hermes and OpenClaw in production.
- Tradeoff accepted: OpenAI `/v1/chat/completions` is a stateless pull — it carries final text (and inline tags) but **not** structured agent events (tool-use stages, permission requests) and **cannot** push proactive/ambient messages. Fine for the MVP: AniCompanion's proactive behavior is already a client-side timer, and emotion via inline tags is unaffected.

## Alternatives Considered

- **Bespoke skin WebSocket protocol up front** — richer (native agent-state + ambient) but every skin, including AniCompanion, needs new code; slower to a working demo; reinvents what OpenAI-compatible already gives. Deferred to Tier 2.
- **Connect skins straight to a raw LLM / openab-auth-proxy** — no agent ability (tool use, code, MCP); defeats the purpose.
- **Inline agent-state as custom SSE events in the Tier-1 stream** (Hermes `hermes.tool.progress` style) — avoids a second connection, but AniCompanion won't parse unknown events without changes, so no MVP benefit. Revisit in Tier 2.

## Scope

**Tier 1 (this RFC / first PR):** the OpenAI-compatible chat path above.

**Tier 2 (future, separate RFC):** agent-state + ambient push that OpenAI's pull model can't carry — Clawd-style "watch the agent work" animation + true proactive notification. Out of scope here.

## Open questions for triage

1. Adapter lives in `crates/openab-gateway/src/adapters/vtuber.rs` alongside telegram/line/googlechat — confirm placement.
2. The OpenAI-compatible endpoint is a different shape from the existing webhook adapters (persistent SSE response vs request/response webhook). Is an HTTP+SSE route inside the gateway adapter module the right home, or should it be a distinct surface?
3. Session mapping: one OAB thread per skin connection for the MVP — acceptable?

