Discord discussion: https://discord.com/channels/1491295327620169908/1520790210320011274
A PR implementing Tier 1 will follow and reference this issue.
Problem
Desktop character apps (VRM / Live2D / pixel pets) — AniCompanion, Open-LLM-VTuber, ChatVRM, clawd-on-desk — are presentation shells with no real brain. Each talks to a raw LLM via /v1/chat/completions and can only chat: no tool use, no code editing, no memory, no MCP. They are "a face with no agent behind it."
OAB already has the brain (ACP agents) and the multi-platform plumbing. We want these character frontends to become first-class OAB clients — so the character inherits the agent's full capability and the same persona/steering as every other platform — instead of being a dumb LLM skin.
Proposal in one line
Add a vtuber gateway adapter that exposes an OpenAI-compatible /v1/chat/completions SSE endpoint, backed by an OAB ACP agent. Any OpenAI-compatible skin points at it and gets a real agent with zero client changes.
At a Glance
Skin (AniCompanion / Open-LLM-VTuber)
│ POST /v1/chat/completions (stream:true, Bearer key)
│ SSE: choices[].delta.content (incl. inline [emotion] tags)
▼
crates/openab-gateway/src/adapters/vtuber.rs
messages[] → GatewayEvent → (request_id streaming) → GatewayReply → OpenAI deltas
▼
openab-gateway core → src/acp → coding agent (codex / claude / kiro)
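As a rough illustration of the first hop in the diagram (messages[] into a single agent prompt), the sketch below flattens an OpenAI-style message list into one prompt string. The `ChatMessage` struct, the `to_agent_prompt` function, and the `role: content` layout are all assumptions made for illustration; the actual adapter and the real `GatewayEvent` shape may differ.

```rust
/// Hypothetical mirror of one entry in the OpenAI `messages[]` array.
#[derive(Debug, Clone, PartialEq)]
struct ChatMessage {
    role: String,
    content: String,
}

/// Flatten the message list into one prompt, preserving order and leaving the
/// skin's own system/persona text untouched (no extra steering is added).
/// The "role: content" layout is an assumption, not the gateway's format.
fn to_agent_prompt(messages: &[ChatMessage]) -> String {
    messages
        .iter()
        .map(|m| format!("{}: {}", m.role, m.content))
        .collect::<Vec<_>>()
        .join("\n\n")
}

fn main() {
    let messages = vec![
        ChatMessage {
            role: "system".to_string(),
            content: "Emit [emotion] tags.".to_string(),
        },
        ChatMessage {
            role: "user".to_string(),
            content: "hi".to_string(),
        },
    ];
    println!("{}", to_agent_prompt(&messages));
}
```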
Why OpenAI-compatible (not a bespoke protocol)
The target skins already speak /v1/chat/completions SSE and already parse inline [emotion] tags client-side (AniCompanion: 16 tags, stripped before TTS; Open-LLM-VTuber: 8; ChatdollKit: [face:X]). Being OpenAI-compatible means AniCompanion connects by typing our URL into its existing "OpenAI-compatible" backend — no Swift change — and we inherit the entire OpenAI-compatible frontend ecosystem for free.
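To make the client-side contract concrete, here is a minimal sketch of how a skin might separate inline tags from the text it sends to TTS. It is not taken from AniCompanion or any other skin; the function names and the rule for what counts as a tag name (ASCII letters, `:` and `_`) are assumptions. It also works on complete text only, whereas a real skin has to handle tags split across streamed deltas.

```rust
/// Returns true if `s` looks like a tag name such as `happy` or `face:Joy`.
/// The allowed character set is an assumption for this sketch.
fn is_tag_name(s: &str) -> bool {
    !s.is_empty()
        && s.chars()
            .all(|c| c.is_ascii_alphabetic() || c == ':' || c == '_')
}

/// Split `text` into (tags in order of appearance, text with tags removed).
/// Brackets that do not enclose a valid tag name are kept literally.
fn split_emotion_tags(text: &str) -> (Vec<String>, String) {
    let mut tags = Vec::new();
    let mut clean = String::new();
    let mut rest = text;
    while let Some(open) = rest.find('[') {
        clean.push_str(&rest[..open]);
        let after = &rest[open + 1..];
        match after.find(']') {
            Some(close) if is_tag_name(&after[..close]) => {
                tags.push(after[..close].to_string());
                rest = &after[close + 1..];
            }
            _ => {
                // Not a tag: keep the bracket and continue scanning after it.
                clean.push('[');
                rest = after;
            }
        }
    }
    clean.push_str(rest);
    (tags, clean)
}

fn main() {
    let (tags, clean) = split_emotion_tags("[happy] Hello! [face:Joy]Nice to meet you.");
    println!("tags = {:?}", tags);
    println!("speech = {:?}", clean);
}
```

Because the adapter passes tags through untouched, this logic stays entirely on the skin side, which is what keeps the adapter motion-agnostic.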
Prior Art & Industry Research
Hermes Agent (AniCompanion's reference backend) — exposes POST /v1/chat/completions on :8642, auth Authorization: Bearer via API_SERVER_KEY, SSE chat.completion.chunk, plus a custom event: hermes.tool.progress for tool visibility "without polluting persisted assistant text." (api-server docs) Hermes "Skins" are CLI-only (colors/spinners); no VRM/avatar; no structured agent-state push.
OpenClaw — plugin/channel model + gateway WS control plane. Primary surface is POST /v1/responses (OpenResponses, item-based), but it also lists OpenAI-compatible POST /v1/chat/completions, /v1/models, /v1/embeddings (Bearer, SSE response.* events). (openresponses-http-api, plugin architecture) A community guide covers a browser VRM avatar frontend, but avatar state there is frontend-inferred from tokens/audio, not pushed by the agent.
Skin ecosystem (feature-specific): AniCompanion (VRM, 16 inline tags, pluggable OpenAI backend), Open-LLM-VTuber (Live2D, 8 tags, WS), ChatdollKit ([face:X]), clawd-on-desk (hook-driven agent state → pet animation; a proven state vocabulary for Tier 2).
Conclusion: Both incumbents validate the OpenAI-compatible passthrough as the way to let an external client drive an agent — so Tier 1 reuses a proven shape rather than inventing one. Neither pushes structured agent-state to drive character animation; that gap is deferred to a Tier-2 RFC.
Proposed Solution
A vtuber adapter exposing POST /v1/chat/completions (SSE) inside the gateway:
- messages[] (including the skin's own system/persona, which already instructs the model to emit [emotion] tags) is forwarded as the agent prompt; we add no steering.
- The agent's streamed tokens (inline tags intact) are re-emitted as OpenAI chat.completion.chunk deltas, ending with data: [DONE], reusing the gateway's existing request_id streaming.
- The skin parses + strips tags and maps them to its own motion system (VRM expression / Live2D exp3 / VTube Studio hotkey). The adapter stays motion-agnostic.
- Session: one OAB thread per connection for the MVP.
- ACP subprocesses are spawned with augmented_path() (per the CONTRIBUTING dev tip), never a login shell.
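The re-emission step in the list above can be sketched as follows: each agent token is wrapped in a reduced chat.completion.chunk SSE frame, and the stream ends with data: [DONE]. This is a std-only illustration, not the adapter's code. The helper names (`json_escape`, `sse_chunk`, `sse_done`) and the `id`/`model` values are hypothetical, and fields such as `created` are omitted for brevity. A real implementation would more likely use a JSON library than hand-rolled escaping.

```rust
/// Escape a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Wrap one streamed token in an OpenAI-style `chat.completion.chunk` frame.
/// Inline tags such as `[happy]` pass through unchanged as ordinary content.
fn sse_chunk(id: &str, model: &str, token: &str) -> String {
    format!(
        "data: {{\"id\":\"{}\",\"object\":\"chat.completion.chunk\",\"model\":\"{}\",\"choices\":[{{\"index\":0,\"delta\":{{\"content\":\"{}\"}},\"finish_reason\":null}}]}}\n\n",
        json_escape(id),
        json_escape(model),
        json_escape(token)
    )
}

/// Terminator frame that OpenAI-compatible clients expect at end of stream.
fn sse_done() -> &'static str {
    "data: [DONE]\n\n"
}

fn main() {
    // Hypothetical id and model values, for illustration only.
    for token in ["[happy] ", "Hello", "!"] {
        print!("{}", sse_chunk("req-1", "oab-agent", token));
    }
    print!("{}", sse_done());
}
```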
Why this approach
- Zero client change for AniCompanion and every OpenAI-compatible skin — maximum reach for minimum code (one adapter module).
- Validated by Hermes and OpenClaw in production.
- Tradeoff accepted: OpenAI /v1/chat/completions is a stateless pull. It carries final text (and inline tags) but not structured agent events (tool-use stages, permission requests), and it cannot push proactive/ambient messages. That is fine for the MVP: AniCompanion's proactive behavior is already a client-side timer, and emotion via inline tags is unaffected.
Alternatives Considered
- Bespoke skin WebSocket protocol up front — richer (native agent-state + ambient) but every skin, including AniCompanion, needs new code; slower to a working demo; reinvents what OpenAI-compatible already gives. Deferred to Tier 2.
- Connect skins straight to a raw LLM / openab-auth-proxy — no agent ability (tool use, code, MCP); defeats the purpose.
- Inline agent-state as custom SSE events in the Tier-1 stream (Hermes hermes.tool.progress style): avoids a second connection, but AniCompanion won't parse unknown events without changes, so no MVP benefit. Revisit in Tier 2.
Scope
Tier 1 (this RFC / first PR): the OpenAI-compatible chat path above.
Tier 2 (future, separate RFC): agent-state + ambient push that OpenAI's pull model can't carry — Clawd-style "watch the agent work" animation + true proactive notification. Out of scope here.
Open questions for triage
- Adapter lives in crates/openab-gateway/src/adapters/vtuber.rs alongside telegram/line/googlechat; confirm placement.
- The OpenAI-compatible endpoint is a different shape from the existing webhook adapters (persistent SSE response vs request/response webhook). Is an HTTP+SSE route inside the gateway adapter module the right home, or should it be a distinct surface?
- Session mapping: one OAB thread per skin connection for the MVP — acceptable?