RFC: VTuber adapter — OpenAI-compatible skin frontend for OAB #1233

Description

@canyugs

Discord discussion: https://discord.com/channels/1491295327620169908/1520790210320011274
A PR implementing Tier 1 will follow and reference this issue.

Problem

Desktop character apps (VRM / Live2D / pixel pets) — AniCompanion, Open-LLM-VTuber, ChatVRM, clawd-on-desk — are presentation shells with no real brain. Each talks to a raw LLM via /v1/chat/completions and can only chat: no tool use, no code editing, no memory, no MCP. They are "a face with no agent behind it."

OAB already has the brain (ACP agents) and the multi-platform plumbing. We want these character frontends to become first-class OAB clients — so the character inherits the agent's full capability and the same persona/steering as every other platform — instead of being a dumb LLM skin.

Proposal in one line

Add a vtuber gateway adapter that exposes an OpenAI-compatible /v1/chat/completions SSE endpoint, backed by an OAB ACP agent. Any OpenAI-compatible skin points at it and gets a real agent with zero client changes.

At a Glance

Skin (AniCompanion / Open-LLM-VTuber)
  │  POST /v1/chat/completions  (stream:true, Bearer key)
  │  SSE: choices[].delta.content   (incl. inline [emotion] tags)
  ▼
crates/openab-gateway/src/adapters/vtuber.rs
  messages[] → GatewayEvent → (request_id streaming) → GatewayReply → OpenAI deltas
  ▼
openab-gateway core → src/acp → coding agent (codex / claude / kiro)
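
As a rough illustration of the first hop in the diagram (messages[] into an agent prompt), here is a minimal sketch. The `ChatMessage` type, the `flatten_messages` helper, and the `role: content` layout are assumptions made for illustration, not the gateway's actual `GatewayEvent` shape. The one property it is meant to show is that nothing is injected: the skin's own system/persona message passes through untouched.

```rust
/// One entry of the OpenAI `messages[]` array (text-only subset).
/// Hypothetical type for illustration; not the gateway's real struct.
#[derive(Debug, Clone, PartialEq)]
pub struct ChatMessage {
    pub role: String,
    pub content: String,
}

/// Flatten `messages[]` into a single prompt string, preserving order.
/// No steering is added: every message, including the skin's persona,
/// is copied through verbatim as `role: content`.
pub fn flatten_messages(messages: &[ChatMessage]) -> String {
    messages
        .iter()
        .map(|m| format!("{}: {}", m.role, m.content))
        .collect::<Vec<_>>()
        .join("\n\n")
}
```

Whether the real adapter flattens the history or forwards only the latest user turn to an existing thread is a design choice this sketch does not settle.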

Why OpenAI-compatible (not a bespoke protocol)

The target skins already speak /v1/chat/completions SSE and already parse inline [emotion] tags client-side (AniCompanion: 16 tags, stripped before TTS; Open-LLM-VTuber: 8; ChatdollKit: [face:X]). Being OpenAI-compatible means AniCompanion connects by typing our URL into its existing "OpenAI-compatible" backend — no Swift change — and we inherit the entire OpenAI-compatible frontend ecosystem for free.
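
To make the client-side contract concrete, the following is a minimal sketch of what a skin does with inline tags: pull known `[tag]` markers out of the text and hand the cleaned string to TTS. It is written in Rust only for consistency with the adapter; the function name and the example tag names are hypothetical, and each skin's real vocabulary and parser will differ. It also assumes the full text is available, whereas a streaming skin must buffer a tag that is split across deltas.

```rust
/// Split `input` into (clean_text, tags). Only bracketed names listed in
/// `known` are treated as tags; anything else (for example `arr[0]`) is
/// left in the text unchanged.
pub fn split_emotion_tags(input: &str, known: &[&str]) -> (String, Vec<String>) {
    let mut clean = String::new();
    let mut tags = Vec::new();
    let mut rest = input;
    while let Some(open) = rest.find('[') {
        // Copy everything before the bracket as ordinary text.
        clean.push_str(&rest[..open]);
        let after = &rest[open + 1..];
        match after.find(']').map(|c| &after[..c]) {
            Some(name) if known.contains(&name) => {
                // Known tag: record it and skip past the closing bracket.
                tags.push(name.to_string());
                rest = &after[name.len() + 1..];
            }
            _ => {
                // Not a tag: keep the bracket and continue scanning.
                clean.push('[');
                rest = after;
            }
        }
    }
    clean.push_str(rest);
    (clean, tags)
}
```

Because this step lives entirely in the skin, the adapter can pass tags through as opaque text.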

Prior Art & Industry Research

Hermes Agent (AniCompanion's reference backend): exposes POST /v1/chat/completions on :8642, authenticates with Authorization: Bearer (key set via API_SERVER_KEY), streams SSE chat.completion.chunk events, and adds a custom event: hermes.tool.progress for tool visibility "without polluting persisted assistant text." (api-server docs) Hermes "Skins" are CLI-only (colors/spinners): there is no VRM/avatar support and no structured agent-state push.

OpenClaw: a plugin/channel model plus a gateway WebSocket control plane. Its primary surface is POST /v1/responses (OpenResponses, item-based), but it also lists OpenAI-compatible POST /v1/chat/completions, /v1/models, and /v1/embeddings (Bearer auth, SSE response.* events). (openresponses-http-api, plugin architecture) A community guide covers a browser VRM avatar frontend, but avatar state there is inferred by the frontend from tokens/audio, not pushed by the agent.

Skin ecosystem (feature-specific): AniCompanion (VRM, 16 inline tags, pluggable OpenAI backend), Open-LLM-VTuber (Live2D, 8 tags, WS), ChatdollKit ([face:X]), clawd-on-desk (hook-driven agent state → pet animation; a proven state vocabulary for Tier 2).

Conclusion: Both incumbents validate the OpenAI-compatible passthrough as the way to let an external client drive an agent — so Tier 1 reuses a proven shape rather than inventing one. Neither pushes structured agent-state to drive character animation; that gap is deferred to a Tier-2 RFC.

Proposed Solution

A vtuber adapter exposing POST /v1/chat/completions (SSE) inside the gateway:

  • messages[] (including the skin's own system/persona, which already instructs the model to emit [emotion] tags) is forwarded as the agent prompt — no steering added by us.
  • The agent's streamed tokens (inline tags intact) are re-emitted as OpenAI chat.completion.chunk deltas, terminated by data: [DONE], reusing the gateway's existing request_id streaming.
  • The skin parses + strips tags and maps them to its own motion system (VRM expression / Live2D exp3 / VTube Studio hotkey). The adapter stays motion-agnostic.
  • Session: one OAB thread per connection for the MVP.
  • ACP subprocesses are spawned with augmented_path() (per the CONTRIBUTING dev tip), never through a login shell.
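
The re-emission step in the list above can be sketched as follows. This is a std-only illustration under stated assumptions: the helper names (`sse_delta`, `sse_finish`, `SSE_DONE`) are hypothetical, the JSON is hand-built to keep the example dependency-free, and a real adapter would more likely serialize typed structs with a JSON library. The field layout follows the public OpenAI chat.completion.chunk shape.

```rust
/// Escape a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build one SSE frame; `delta` and `finish` are already-valid JSON fragments.
fn chunk(id: &str, model: &str, created: u64, delta: &str, finish: &str) -> String {
    format!(
        r#"data: {{"id":"{}","object":"chat.completion.chunk","created":{},"model":"{}","choices":[{{"index":0,"delta":{},"finish_reason":{}}}]}}"#,
        json_escape(id), created, json_escape(model), delta, finish
    ) + "\n\n"
}

/// Frame one streamed agent token; inline [emotion] tags pass through as-is.
pub fn sse_delta(id: &str, model: &str, created: u64, content: &str) -> String {
    let delta = format!(r#"{{"content":"{}"}}"#, json_escape(content));
    chunk(id, model, created, &delta, "null")
}

/// Final chunk: empty delta with finish_reason "stop".
pub fn sse_finish(id: &str, model: &str, created: u64) -> String {
    chunk(id, model, created, "{}", r#""stop""#)
}

/// Stream terminator expected by OpenAI-compatible clients.
pub const SSE_DONE: &str = "data: [DONE]\n\n";
```

A response body is then the concatenation of one `sse_delta` frame per agent token, one `sse_finish` frame, and `SSE_DONE`, sent with Content-Type: text/event-stream.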

Why this approach

  • Zero client change for AniCompanion and every OpenAI-compatible skin — maximum reach for minimum code (one adapter module).
  • Validated by Hermes and OpenClaw in production.
  • Tradeoff accepted: OpenAI /v1/chat/completions is a stateless pull — it carries final text (and inline tags) but not structured agent events (tool-use stages, permission requests) and cannot push proactive/ambient messages. Fine for the MVP: AniCompanion's proactive behavior is already a client-side timer, and emotion via inline tags is unaffected.

Alternatives Considered

  • Bespoke skin WebSocket protocol up front — richer (native agent-state + ambient) but every skin, including AniCompanion, needs new code; slower to a working demo; reinvents what OpenAI-compatible already gives. Deferred to Tier 2.
  • Connect skins straight to a raw LLM / openab-auth-proxy — no agent ability (tool use, code, MCP); defeats the purpose.
  • Inline agent-state as custom SSE events in the Tier-1 stream (Hermes hermes.tool.progress style) — avoids a second connection, but AniCompanion won't parse unknown events without changes, so no MVP benefit. Revisit in Tier 2.

Scope

Tier 1 (this RFC / first PR): the OpenAI-compatible chat path above.

Tier 2 (future, separate RFC): agent-state + ambient push that OpenAI's pull model can't carry — Clawd-style "watch the agent work" animation + true proactive notification. Out of scope here.

Open questions for triage

  1. Adapter lives in crates/openab-gateway/src/adapters/vtuber.rs alongside telegram/line/googlechat — confirm placement.
  2. The OpenAI-compatible endpoint is a different shape from the existing webhook adapters (persistent SSE response vs request/response webhook). Is an HTTP+SSE route inside the gateway adapter module the right home, or should it be a distinct surface?
  3. Session mapping: one OAB thread per skin connection for the MVP — acceptable?
