From 75a80978e6c1b63d93b4e3e5d7e785de7e66fc34 Mon Sep 17 00:00:00 2001
From: chaodu-agent
Date: Mon, 29 Jun 2026 00:46:05 +0000
Subject: [PATCH 1/5] docs(adr): add multi-model aggregation endpoint proposal

Proposes an OpenAI-compatible MoA endpoint that leverages the existing
multi-agent Discord setup to fan out prompts, collect responses, and
return aggregated results. Includes a comparison with Hermes Agent MoA.
---
 docs/adr/multi-model-aggregation.md | 346 ++++++++++++++++++++++++++++
 1 file changed, 346 insertions(+)
 create mode 100644 docs/adr/multi-model-aggregation.md

diff --git a/docs/adr/multi-model-aggregation.md b/docs/adr/multi-model-aggregation.md
new file mode 100644
index 000000000..7e9568fc0
--- /dev/null
+++ b/docs/adr/multi-model-aggregation.md
@@ -0,0 +1,346 @@
+# ADR: Multi-Model Aggregation Endpoint (Mixture of Agents)
+
+- **Status:** Proposed
+- **Date:** 2026-06-29
+- **Author:** @chaodu-agent
+- **References:** [Hermes Agent — Mixture of Agents](https://hermes-agent.nousresearch.com/docs/user-guide/features/mixture-of-agents), [Ambient Mode](../ambient.md), [Multi-Agent Setup](../multi-agent.md)
+
+---
+
+## 1. User Story & Requirements
+
+As an OpenAB operator running multiple agents (Kiro, Claude, Codex, OpenCode, Copilot, Grok) in the same Discord channel, I want to expose a single OpenAI-compatible API endpoint that fans out a prompt to multiple agents, collects their responses, and returns an aggregated result — so that external callers get multi-model consensus through one standard LLM API call.
+
+As an API consumer, I want to call a single `POST /v1/chat/completions` endpoint and receive a response synthesized from multiple LLM backends, without needing to know which models are behind it or how they communicate.
+
+### Requirements
+
+- Expose an OpenAI-compatible HTTP endpoint (`/v1/chat/completions`) on `localhost`
+- Fan out the incoming prompt to N configured agents in a Discord channel
+- Collect responses within a configurable timeout window (30–60 seconds)
+- Aggregate collected responses into a single final response
+- Support multiple aggregation strategies (synthesis, best-of-N, majority vote)
+- Return the standard OpenAI response format to the caller
+- Work with the existing multi-agent Discord setup — no changes to agent containers
+- Gracefully handle partial results (some agents time out or fail)
+- Optional: support streaming (`stream: true`) after aggregation completes
+
+---
+
+## 2. High-Level Design
+
+### Prior Art: Hermes Agent MoA
+
+Hermes Agent implements Mixture of Agents (MoA) as a **virtual model provider** integrated into its agent loop:
+
+1. User selects an MoA preset via `/model --provider moa`
+2. For each model call, Hermes runs configured **reference models** (without tool schemas) to get diverse perspectives
+3. Reference outputs are appended as private context to the **aggregator** model
+4. The aggregator produces the final response and can emit tool calls
+5. MoA is NOT a separate API endpoint — it's a model-selection concept within the agent
+
+**Key difference for OpenAB:** Hermes directly calls each model's API. OpenAB's approach leverages Discord as the message bus — agents are already running as bots, each with their own backend. We route through Discord rather than making direct API calls.
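To make the routing difference concrete, here is a minimal simulation of the Discord-as-bus flow: the gateway posts once and collects whatever replies arrive, instead of calling each provider. It is a sketch under stated assumptions, not the gateway implementation. The channel is modelled as an in-process `asyncio.Queue`, `fake_agent` is a hypothetical bot that replies after a fixed delay, and the keyword arguments mirror the `timeout_seconds`, `min_responses`, `max_responses` and `early_complete_seconds` keys of the proposed `[moa]` config. The early-complete window is assumed to start when `min_responses` is first reached.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Reply:
    agent: str
    content: str


async def fake_agent(channel: asyncio.Queue, name: str, delay: float, prompt: str) -> None:
    """Hypothetical bot: sees the prompt on the channel and replies after `delay` seconds."""
    await asyncio.sleep(delay)
    await channel.put(Reply(name, f"{name}'s answer to: {prompt}"))


async def collect(channel: asyncio.Queue, *, timeout_seconds: float, min_responses: int,
                  max_responses: int, early_complete_seconds: float) -> List[Reply]:
    """Gather replies until max_responses is reached, the hard timeout expires, or the
    early-complete window (opened when min_responses is first met) closes."""
    loop = asyncio.get_running_loop()
    hard_deadline = loop.time() + timeout_seconds
    deadline = hard_deadline
    responses: List[Reply] = []
    while len(responses) < max_responses:
        remaining = deadline - loop.time()
        if remaining <= 0:
            break
        try:
            reply = await asyncio.wait_for(channel.get(), timeout=remaining)
        except asyncio.TimeoutError:
            break  # nothing else arrived before the active deadline
        responses.append(reply)
        if len(responses) == min_responses:
            # Minimum met: wait early_complete_seconds more, never past the hard timeout.
            deadline = min(hard_deadline, loop.time() + early_complete_seconds)
    return responses  # may be partial, even empty


async def fan_out(prompt: str, agents: Dict[str, float], **limits) -> List[Reply]:
    """Post once to the shared channel (an in-process queue here) and collect replies."""
    channel: asyncio.Queue = asyncio.Queue()
    tasks = [asyncio.create_task(fake_agent(channel, name, delay, prompt))
             for name, delay in agents.items()]
    try:
        return await collect(channel, **limits)
    finally:
        for task in tasks:
            task.cancel()  # stragglers past the deadline are dropped
        await asyncio.gather(*tasks, return_exceptions=True)


if __name__ == "__main__":
    replies = asyncio.run(fan_out(
        "Review this architecture",
        {"kiro": 0.01, "claude": 0.02, "codex": 0.06, "grok": 1.0},
        timeout_seconds=0.5, min_responses=2, max_responses=6, early_complete_seconds=0.15,
    ))
    print([r.agent for r in replies])
```

The simulation cancels late agents. A real gateway does not own the bots, so in practice it would simply ignore replies that arrive after the deadline. Either way, the caller receives a partial result rather than an error.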
+ +### OpenAB Architecture + +``` + External Caller + │ + POST /v1/chat/completions + │ + ▼ + ┌──────────────────────────────┐ + │ MoA Gateway Service │ + │ (localhost:8787) │ + │ │ + │ ┌────────────────────────┐ │ + │ │ Request Handler │ │ + │ │ • Auth (API key) │ │ + │ │ • Parse OAI format │ │ + │ └──────────┬─────────────┘ │ + │ │ │ + │ ▼ │ + │ ┌────────────────────────┐ │ + │ │ Fan-Out Engine │ │ + │ │ • Post prompt to │ │ + │ │ Discord channel │ │ + │ │ • Use coordinator bot │ │ + │ │ identity │ │ + │ └──────────┬─────────────┘ │ + │ │ │ + │ ▼ │ + │ ┌────────────────────────┐ │ + │ │ Response Collector │ │ + │ │ • Listen for replies │ │ + │ │ • Timeout window │ │ + │ │ • Partial results OK │ │ + │ └──────────┬─────────────┘ │ + │ │ │ + │ ▼ │ + │ ┌────────────────────────┐ │ + │ │ Aggregator │ │ + │ │ • Synthesis / Vote │ │ + │ │ • Format as OAI resp │ │ + │ └────────────────────────┘ │ + └──────────────────────────────┘ + │ + Discord Channel (message bus) + │ + ┌───────────┬───────┼───────┬───────────┐ + ▼ ▼ ▼ ▼ ▼ + ┌─────┐ ┌───────┐ ┌─────┐ ┌──────┐ ┌──────┐ + │Kiro │ │Claude │ │Codex│ │Grok │ │ ... │ + │Agent│ │Agent │ │Agent│ │Agent │ │ │ + └─────┘ └───────┘ └─────┘ └──────┘ └──────┘ +``` + +### Message Flow + +``` +1. Caller → MoA Gateway: POST /v1/chat/completions { messages: [...] } +2. Gateway → Discord: Posts prompt in designated MoA channel using coordinator bot +3. Discord → Agents: Each agent sees the message (ambient mode or @mention) +4. Agents → Discord: Each agent replies in the thread +5. Gateway ← Discord: Collector gathers replies within timeout window +6. Gateway (Aggregator): Synthesizes collected responses into one +7. Gateway → Caller: Returns OpenAI-format response +``` + +--- + +## 3. Fan-Out Strategies + +### Option A: Ambient Mode (Recommended) + +Leverage existing ambient mode. The MoA channel has all agents configured with `allow_bot_messages = true`. 
The gateway posts a prompt; agents naturally respond within their `flush_interval_seconds`.
+
+**Pros:** No per-agent @mention logic, scales by simply adding bots to the channel
+**Cons:** Relies on ambient flush timing, agents may not all respond
+
+### Option B: Explicit @mention
+
+Gateway posts a message @mentioning each configured agent. Each agent responds immediately to the mention.
+
+**Pros:** Every agent is triggered immediately, more predictable timing
+**Cons:** Requires knowing each agent's Discord ID, more intrusive
+
+### Option C: Hybrid
+
+Post the prompt normally (triggers ambient), but also @mention agents that haven't responded after half the timeout.
+
+---
+
+## 4. Response Collection
+
+The collector uses a mechanism similar to ambient mode's buffered collection:
+
+```toml
+[moa]
+enabled = true
+channel_id = "1234567890"      # Dedicated MoA channel
+timeout_seconds = 45           # Max wait for responses
+min_responses = 2              # Minimum responses before aggregating
+max_responses = 6              # Stop collecting after N responses
+early_complete_seconds = 10    # If min met, wait this long for stragglers
+```
+
+### Collection Logic
+
+```
+start_time = now(); min_met_at = none
+responses = []
+
+loop:
+  if len(responses) >= max_responses → break
+  if now() - start_time > timeout_seconds → break
+  if min_met_at != none AND now() - min_met_at > early_complete_seconds → break
+  reply = wait for next reply in thread until the earlier deadline (none arrived → break)
+  responses.push(reply); if len(responses) == min_responses → min_met_at = now()
+
+return responses # may be partial (>= 0)
+```
+
+---
+
+## 5. Aggregation Strategies
+
+### Strategy 1: Synthesis (Default)
+
+Call a designated aggregator model (e.g., the coordinator's own LLM backend) with all collected responses as context:
+
+```
+System: You are an aggregator. Multiple AI models have answered the same question.
+        Synthesize their responses into one high-quality answer.
+        Preserve the best insights from each. Resolve contradictions.
+
+User: [original prompt]
+
+Context:
+- Model A (Kiro/Claude): [response A]
+- Model B (Codex): [response B]
+- Model C (Grok): [response C]
+
+Produce a single, coherent response.
+```
+
+### Strategy 2: Best-of-N
+
+Use a judge model to rank responses and return the highest-quality one unchanged.
+
+### Strategy 3: Majority Vote
+
+For tasks with discrete answers (code review verdicts, yes/no decisions), count the majority answer.
+
+---
+
+## 6. API Interface
+
+### Request (OpenAI-compatible)
+
+```bash
+curl http://localhost:8787/v1/chat/completions \
+  -H "Authorization: Bearer $MOA_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "moa-default",
+    "messages": [
+      {"role": "user", "content": "Review this architecture and suggest improvements..."}
+    ],
+    "temperature": 0.7
+  }'
+```
+
+### Response (OpenAI-compatible)
+
+```json
+{
+  "id": "moa-abc123",
+  "object": "chat.completion",
+  "created": 1719619200,
+  "model": "moa-default",
+  "choices": [
+    {
+      "index": 0,
+      "message": {
+        "role": "assistant",
+        "content": "Based on analysis from multiple models..."
+      },
+      "finish_reason": "stop"
+    }
+  ],
+  "usage": {
+    "prompt_tokens": 0,
+    "completion_tokens": 0,
+    "total_tokens": 0
+  },
+  "metadata": {
+    "responses_collected": 4,
+    "agents_responded": ["kiro", "claude", "codex", "grok"],
+    "aggregation_strategy": "synthesis",
+    "collection_time_ms": 32450
+  }
+}
+```
+
+### Model Names
+
+Multiple presets can be configured, each mapping to a different channel or agent subset:
+
+| Model Name | Channel | Agents | Strategy |
+|------------|---------|--------|----------|
+| `moa-default` | #moa-general | All agents | Synthesis |
+| `moa-review` | #moa-review | Claude, Kiro, Codex | Synthesis |
+| `moa-vote` | #moa-vote | All agents | Majority vote |
+
+---
+
+## 7. Configuration
+
+```toml
+[moa]
+enabled = true
+listen_address = "127.0.0.1:8787"
+api_key = "sk-moa-..."
# Simple bearer token auth
+
+[moa.presets.default]
+channel_id = "1234567890"
+timeout_seconds = 45
+min_responses = 2
+max_responses = 6
+early_complete_seconds = 10
+aggregation_strategy = "synthesis"   # synthesis | best_of_n | majority_vote
+aggregator_model = "coordinator"     # which agent's LLM does the synthesis
+
+[moa.presets.review]
+channel_id = "9876543210"
+timeout_seconds = 60
+min_responses = 3
+aggregation_strategy = "synthesis"
+```
+
+---
+
+## 8. Who Calls This Endpoint?
+
+### Use Cases
+
+1. **Other services in the cluster** — A CI pipeline or internal tool calls the MoA endpoint for multi-model code review or analysis, treating it like any other LLM API.
+
+2. **Local development tools** — IDE extensions, CLI tools, or scripts configured to use `http://localhost:8787/v1/chat/completions` as their LLM endpoint get automatic multi-model consensus.
+
+3. **LLM routers / orchestrators** — Tools like LiteLLM, OpenRouter proxies, or custom orchestrators can register the MoA endpoint as a "model" and route specific tasks to it.
+
+4. **The coordinator agent itself** — The coordinator (超渡法師) could use this endpoint for tasks that benefit from multi-model consensus before producing a final answer.
+
+5. **Hermes Agent integration** — Configure Hermes to use the MoA endpoint as a custom provider, giving Hermes access to OpenAB's multi-agent consensus as a single model.
+
+### Exposure Options
+
+| Scope | How | When to use |
+|-------|-----|-------------|
+| Pod-local only | `127.0.0.1:8787` | Single-pod testing |
+| Cluster-internal | K8s Service (ClusterIP); requires a non-loopback `listen_address`, e.g. `0.0.0.0:8787` | Other services in the same cluster |
+| External | Ingress + auth, routed to the Service | Remote callers (with proper auth) |
+
+The default is **localhost-only** — safe by default, opt-in to broader exposure.
+
+---
+
+## 9.
Differences from Hermes MoA
+
+| Aspect | Hermes MoA | OpenAB MoA |
+|--------|-----------|------------|
+| Message bus | Direct API calls to each provider | Discord channel as message bus |
+| Agent management | Config file with provider/model pairs | Existing bot deployments |
+| Latency | ~5–15s (parallel API calls) | ~30–60s (Discord message flow) |
+| Tool calls | Aggregator can emit tool calls | Aggregator returns text only (v1) |
+| Exposure | Internal to agent loop, not an API | Standalone OpenAI-compatible endpoint |
+| Adding models | Edit config.yaml | Add a bot to the channel |
+| Cost model | Pay per API call to each provider | Each bot uses its own credentials |
+
+---
+
+## 10. Future Considerations
+
+- **Streaming support:** Buffer the aggregated response, then stream it back to the caller
+- **Tool-call passthrough:** Let the aggregator emit tool calls (requires tool schema in the MoA endpoint)
+- **Caching:** Cache responses to identical prompts to avoid re-querying agents
+- **Metrics:** Track per-agent response times, quality scores, participation rates
+- **Weighted aggregation:** Weight agent responses by historical quality on similar tasks
+- **Recursive MoA:** Allow a preset's aggregator to be another MoA preset (Hermes explicitly blocks this; we should evaluate)
+
+---
+
+## 11. Open Questions
+
+1. **Should MoA be a separate binary or built into the main OAB gateway?**
+   - Separate: simpler, independently deployable, clear boundary
+   - Built-in: shares Discord connection, less operational overhead
+
+2. **How to handle conversation context (multi-turn)?**
+   - Option A: Stateless — each call is independent, caller manages history
+   - Option B: Session-based — gateway maintains a thread per conversation
+
+3.
**Should agents know they're in MoA mode?** + - If yes: they can tailor responses (shorter, more analytical) + - If no: responses are natural but may be verbose for aggregation From 928311c1b1a82e7c20e4b58d853c0d3107411139 Mon Sep 17 00:00:00 2001 From: chaodu-agent Date: Mon, 29 Jun 2026 00:47:18 +0000 Subject: [PATCH 2/5] docs(adr): clarify pure aggregation vs synthesis modes Explicitly distinguish: - Mode A: Pure aggregation (no model, just merge/vote) - Mode B: Synthesis (aggregator model re-optimizes) - Mode C: Best-of-N (judge picks the best response) --- docs/adr/multi-model-aggregation.md | 44 +++++++++++++++++++++++------ 1 file changed, 36 insertions(+), 8 deletions(-) diff --git a/docs/adr/multi-model-aggregation.md b/docs/adr/multi-model-aggregation.md index 7e9568fc0..611e65976 100644 --- a/docs/adr/multi-model-aggregation.md +++ b/docs/adr/multi-model-aggregation.md @@ -162,16 +162,32 @@ return responses # may be partial (>= 0) --- -## 5. Aggregation Strategies +## 5. Operating Modes & Aggregation Strategies -### Strategy 1: Synthesis (Default) +The MoA endpoint is fundamentally a **virtual agent / proxy** — it has no opinion of its own. It routes the request to multiple downstream agents and aggregates the result. Two primary operating modes: -Call a designated aggregator model (e.g., the coordinator's own LLM backend) with all collected responses as context: +### Mode A: Pure Aggregation (No Model) + +The endpoint collects responses and merges them **without an additional LLM call**. It acts purely as a proxy + combiner. 
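Because pure mode involves no model call, each strategy reduces to a few lines of ordinary code. Here is a minimal sketch, assuming responses arrive as `(agent, text)` pairs in arrival order. Two choices are assumptions, because the ADR leaves them open: `majority_vote` normalises answers (trim and lower-case) before counting, and it returns `None` on a tie.

```python
from collections import Counter
from typing import List, Optional, Tuple

Response = Tuple[str, str]  # (agent, text), in arrival order


def majority_vote(responses: List[Response]) -> Optional[str]:
    """Most common answer after trimming and lower-casing; None if empty or tied."""
    counts = Counter(text.strip().lower() for _, text in responses)
    top = counts.most_common(2)
    if not top or (len(top) == 2 and top[0][1] == top[1][1]):
        return None
    return top[0][0]


def concatenate(responses: List[Response]) -> str:
    """Join every response under an attribution header."""
    return "\n\n".join(f"**{agent}:**\n{text}" for agent, text in responses)


def longest(responses: List[Response]) -> Optional[str]:
    """Most detailed response by character count; the earliest wins a tie."""
    return max((text for _, text in responses), key=len, default=None)


def first(responses: List[Response]) -> Optional[str]:
    """Fastest response (the list is assumed to be in arrival order)."""
    return responses[0][1] if responses else None


if __name__ == "__main__":
    verdicts = [("kiro", "Approve"), ("claude", "approve"), ("codex", "Request changes")]
    print(majority_vote(verdicts))  # → approve
```

None of these functions resolves conflicts between responses. That limitation is the trade-off this mode accepts in exchange for zero extra latency and cost.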
+ +| Strategy | How | Use Case | +|----------|-----|----------| +| Majority Vote | Count discrete answers, return the majority | Code review verdicts, yes/no, classification | +| Concatenation | Join all responses with attribution | "Give me all perspectives" | +| Longest / First | Return the most detailed or fastest response | Low-latency passthrough | + +**Pros:** No extra latency, no extra cost, no additional model needed +**Cons:** No conflict resolution, no optimization, raw output + +### Mode B: Aggregation + Synthesis (With Model) + +The endpoint collects responses, then calls an **aggregator model** to synthesize, resolve contradictions, and optimize the final output. The aggregator adds its own reasoning on top. ``` System: You are an aggregator. Multiple AI models have answered the same question. Synthesize their responses into one high-quality answer. Preserve the best insights from each. Resolve contradictions. + Add your own analysis where the responses are incomplete. User: [original prompt] @@ -180,16 +196,28 @@ Context: - Model B (Codex): [response B] - Model C (Grok): [response C] -Produce a single, coherent response. +Produce a single, coherent, optimized response. ``` -### Strategy 2: Best-of-N +**Pros:** Higher quality output, conflict resolution, coherent single voice +**Cons:** Extra LLM call adds latency + cost, requires an aggregator model + +### Mode C: Best-of-N (Judge Model) -Use a judge model to rank responses and return the highest-quality one unchanged. +Collect responses, use a judge model to score/rank them, return the best one unchanged. 
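Best-of-N reduces to scoring each response and returning the winner untouched. This sketch makes three assumptions. The judge is any callable `judge(prompt, text) -> float`, which is hypothetical; in the gateway it would wrap a call to the judge model. Each response is scored independently rather than ranked in a single prompt. Ties go to the earliest reply.

```python
from typing import Callable, List, Optional, Tuple

Response = Tuple[str, str]            # (agent, text)
Judge = Callable[[str, str], float]   # (prompt, text) -> score, higher is better


def best_of_n(prompt: str, responses: List[Response], judge: Judge) -> Optional[Response]:
    """Return the highest-scoring response unchanged, or None if nothing was collected."""
    if not responses:
        return None
    # Key: higher score wins; on equal scores the lower index (earlier reply) wins.
    best_index = max(range(len(responses)),
                     key=lambda i: (judge(prompt, responses[i][1]), -i))
    return responses[best_index]


if __name__ == "__main__":
    def toy_judge(prompt: str, text: str) -> float:
        """Toy stand-in for a judge model: counts words shared with the prompt."""
        return float(len(set(prompt.lower().split()) & set(text.lower().split())))

    collected = [("kiro", "add a cache"), ("codex", "add a cache and retry logic"),
                 ("grok", "ship it")]
    print(best_of_n("should we add a cache and retry logic", collected, toy_judge))
    # → ('codex', 'add a cache and retry logic')
```

Scoring responses one at a time costs N judge calls but keeps each call small. Ranking all responses in one prompt needs a single call, though the judge then has to parse and compare every answer at once.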
-### Strategy 3: Majority Vote +**Pros:** Returns a real model's full response (not a rewrite), quality selection +**Cons:** Requires a judge call, doesn't combine insights across responses + +### Configuration + +```toml +[moa.presets.default] +mode = "synthesis" # "pure" | "synthesis" | "best_of_n" +aggregator_model = "coordinator" # only needed for "synthesis" and "best_of_n" modes +``` -For tasks with discrete answers (code review verdicts, yes/no decisions), count the majority answer. +When `mode = "pure"`, no aggregator model is required — the gateway handles merging locally. --- From ca05156d5e9671d819eebb12d7e00421ffbd418c Mon Sep 17 00:00:00 2001 From: chaodu-agent Date: Mon, 29 Jun 2026 00:49:26 +0000 Subject: [PATCH 3/5] docs(adr): add cost considerations for aggregator model MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Aggregator doesn't need expensive models — downstream agents already did the reasoning. Synthesis mode is just text reorganization. --- docs/adr/multi-model-aggregation.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/docs/adr/multi-model-aggregation.md b/docs/adr/multi-model-aggregation.md index 611e65976..b522c8a2a 100644 --- a/docs/adr/multi-model-aggregation.md +++ b/docs/adr/multi-model-aggregation.md @@ -219,6 +219,15 @@ aggregator_model = "coordinator" # only needed for "synthesis" and "best_of_n" m When `mode = "pure"`, no aggregator model is required — the gateway handles merging locally. +### Cost Considerations + +The aggregator does **not** need an expensive model. The real inference/reasoning is already done by the downstream agents — the aggregator's job is purely editorial: + +- **Pure mode:** Zero model cost. Just programmatic concat/vote logic. +- **Synthesis mode:** Only text reorganization and deduplication. The cheapest available model (GPT-4o-mini, Claude Haiku, or even a local model) is sufficient. 
No complex reasoning required — it's a formatting task, not an inference task. + +This means the MoA overhead cost is negligible regardless of mode. + --- ## 6. API Interface From 5a6cab5cfb45be157f733061e20995bd9d66d055 Mon Sep 17 00:00:00 2001 From: chaodu-agent Date: Mon, 29 Jun 2026 00:51:24 +0000 Subject: [PATCH 4/5] docs(adr): add detailed Hermes MoA architecture diagram --- docs/adr/multi-model-aggregation.md | 55 +++++++++++++++++++++++++---- 1 file changed, 49 insertions(+), 6 deletions(-) diff --git a/docs/adr/multi-model-aggregation.md b/docs/adr/multi-model-aggregation.md index b522c8a2a..e545ac038 100644 --- a/docs/adr/multi-model-aggregation.md +++ b/docs/adr/multi-model-aggregation.md @@ -33,13 +33,56 @@ As an API consumer, I want to call a single `POST /v1/chat/completions` endpoint Hermes Agent implements Mixture of Agents (MoA) as a **virtual model provider** integrated into its agent loop: -1. User selects an MoA preset via `/model --provider moa` -2. For each model call, Hermes runs configured **reference models** (without tool schemas) to get diverse perspectives -3. Reference outputs are appended as private context to the **aggregator** model -4. The aggregator produces the final response and can emit tool calls -5. MoA is NOT a separate API endpoint — it's a model-selection concept within the agent +``` +┌─────────────────────────────────────────────────────────────────────┐ +│ Hermes Agent Loop │ +│ │ +│ User Prompt │ +│ │ │ +│ ▼ │ +│ ┌──────────────────────────────────────────────┐ │ +│ │ MoA Provider (selected via /model --provider moa) │ +│ │ │ +│ │ Step 1: Fan-out to Reference Models (parallel, no tools) │ +│ │ │ +│ │ ┌──────────┐ ┌──────────────┐ ┌──────────┐ │ +│ │ │ GPT-5.5 │ │ DeepSeek-V4 │ │ Model C │ ... 
│ +│ │ │(OpenAI) │ │(OpenRouter) │ │ │ │ +│ │ └────┬─────┘ └──────┬───────┘ └────┬─────┘ │ +│ │ │ │ │ │ +│ │ ▼ ▼ ▼ │ +│ │ response A response B response C │ +│ │ │ │ │ │ +│ │ └───────────────┼───────────────┘ │ +│ │ ▼ │ +│ │ Step 2: Inject as private context │ +│ │ │ │ +│ │ ▼ │ +│ │ Step 3: Call Aggregator (with full tool schema) │ +│ │ ┌────────────────────────────────┐ │ +│ │ │ Claude Opus (Aggregator) │ │ +│ │ │ • Sees: user prompt │ │ +│ │ │ • Sees: reference outputs │ │ +│ │ │ • Can: emit tool calls │ │ +│ │ │ • Produces: final response │ │ +│ │ └──────────────┬─────────────────┘ │ +│ │ │ │ +│ └─────────────────────┼────────────────────────────────────────────┘ +│ ▼ │ +│ Final Response (returned to user as if from a single model) │ +│ │ +│ If aggregator emits tool calls → Hermes executes tools │ +│ → next iteration runs the SAME MoA process again │ +└─────────────────────────────────────────────────────────────────────┘ +``` -**Key difference for OpenAB:** Hermes directly calls each model's API. OpenAB's approach leverages Discord as the message bus — agents are already running as bots, each with their own backend. We route through Discord rather than making direct API calls. 
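The request shaping in this diagram can be sketched with OpenAI-style message dicts. This illustrates the pattern as the ADR describes it; it is not Hermes' implementation. `reference_request` and `aggregator_request` are hypothetical helpers, and appending the reference outputs as a trailing system message is one assumed way to inject "private context".

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def reference_request(messages: List[Message]) -> Dict[str, Any]:
    """Reference models get only the conversation text: no system prompt, no tools."""
    convo = [{"role": m["role"], "content": m.get("content") or ""}
             for m in messages if m["role"] in ("user", "assistant")]
    return {"messages": convo}


def aggregator_request(messages: List[Message], references: Dict[str, str],
                       tools: List[Dict[str, Any]]) -> Dict[str, Any]:
    """The aggregator keeps the full conversation and tool schema, plus the reference
    outputs injected as one extra (assumed) system message the user never sees."""
    context = "\n\n".join(f"[{model}]\n{text}" for model, text in references.items())
    private = {"role": "system",
               "content": "Reference answers (not shown to the user):\n\n" + context}
    return {"messages": [*messages, private], "tools": tools}
```

Only the aggregator request carries the tool schema, which is why only the aggregator can emit tool calls in this pattern.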
+**Key characteristics:** +- All API calls are direct (Hermes → each provider's API) +- Reference models get only conversation text (no system prompt, no tools) — cheap calls +- Aggregator is the "real" model — it can use tools, iterate, do everything a normal model does +- Not a separate endpoint — it's a model selection within the existing agent loop +- Latency: ~5–15s (parallel reference calls + aggregator call) +- Config: `config.yaml` with explicit `provider/model` pairs per preset ### OpenAB Architecture From cf1fc31e0a849d9d428ffe0e0c741c12670ae902 Mon Sep 17 00:00:00 2001 From: chaodu-agent Date: Mon, 29 Jun 2026 00:54:14 +0000 Subject: [PATCH 5/5] docs(adr): add OpenAB MoA advantages vs Hermes comparison Key advantages: zero API key mgmt, agent diversity beyond public APIs, full-capability responses, trivial scaling, audit trail, distributed cost. --- docs/adr/multi-model-aggregation.md | 35 ++++++++++++++++++++++++----- 1 file changed, 29 insertions(+), 6 deletions(-) diff --git a/docs/adr/multi-model-aggregation.md b/docs/adr/multi-model-aggregation.md index e545ac038..de809917f 100644 --- a/docs/adr/multi-model-aggregation.md +++ b/docs/adr/multi-model-aggregation.md @@ -388,15 +388,38 @@ The default is **localhost-only** — safe by default, opt-in to broader exposur ## 9. Differences from Hermes MoA +### OpenAB Advantages (Discord-as-Bus) + +| # | Advantage | Why | +|---|-----------|-----| +| 1 | **Zero API key management** | MoA gateway holds no model credentials. Each agent manages its own auth — could even be free-tier accounts. | +| 2 | **Agent diversity beyond public APIs** | Hermes can only aggregate models with public LLM APIs. OpenAB can aggregate Copilot, Cursor, Kiro, OpenCode — things that have no callable LLM endpoint but can respond as Discord bots. | +| 3 | **Full-capability responses** | Hermes reference models get bare prompts (no tools, no system prompt). 
OpenAB agents respond with their full toolchain — code search, file read, web search, shell exec. Each "reference" is a complete agent, not a stripped-down model call. |
+| 4 | **Trivial horizontal scaling** | With ambient fan-out (Option A), adding a model means adding a bot to the channel: no API key, no config change, no gateway redeploy. Explicit @mention fan-out (Option B) also needs the bot's Discord ID. |
+| 5 | **Built-in audit trail** | All conversations live in Discord — traceable, debuggable, replayable. Hermes reference calls are internal and ephemeral. |
+| 6 | **Distributed cost** | Each agent pod pays its own model bill. No single concentrated API bill. Different team members can sponsor different agents. |
+
+### OpenAB Disadvantages
+
+| # | Disadvantage | Mitigation |
+|---|--------------|------------|
+| 1 | **Higher latency** (30–60s vs 5–15s) | Acceptable for async tasks (code review, analysis, research). Not suited for interactive chat. |
+| 2 | **Discord dependency** | Discord rate limits and outages affect the pipeline. Could add a direct-call fallback path later. |
+| 3 | **Less deterministic timing** | Agents respond at their own pace; some may skip. `timeout_seconds`, `min_responses` and early-complete bound the wait and tolerate missing replies, but cannot force a reply.
| + +### Comparison Table + | Aspect | Hermes MoA | OpenAB MoA | |--------|-----------|------------| -| Message bus | Direct API calls to each provider | Discord channel as message bus | -| Agent management | Config file with provider/model pairs | Existing bot deployments | +| Message bus | Direct API calls to each provider | Discord channel | +| Agent management | Config file with provider/model pairs | Bots in a channel | +| What gets aggregated | Bare model outputs (no tools) | Full agent responses (with tools) | | Latency | ~5–15s (parallel API calls) | ~30–60s (Discord message flow) | -| Tool calls | Aggregator can emit tool calls | Aggregator returns text only (v1) | -| Exposure | Internal to agent loop, not an API | Standalone OpenAI-compatible endpoint | -| Adding models | Edit config.yaml | Add a bot to the channel | -| Cost model | Pay per API call to each provider | Each bot uses its own credentials | +| Tool calls | Only aggregator can use tools | Every agent uses its own tools | +| Exposure | Internal to agent loop | Standalone OpenAI-compatible endpoint | +| Adding models | Edit config.yaml + add API key | Add a bot to the channel | +| Cost model | Centralized API bill | Each bot uses its own credentials | +| Audit | Ephemeral internal context | Persistent Discord history | ---
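On the OpenAB side, every aggregation mode ends the same way: the gateway returns the aggregated text to the caller as a standard OpenAI `chat.completion` object, which is the "Standalone OpenAI-compatible endpoint" in the Exposure row above. Below is a minimal sketch of that final step, using the response shape from section 6. `build_completion` and the `moa-<hex>` id format are illustrative assumptions, and token usage is reported as zero, as in the ADR's example.

```python
import json
import time
import uuid
from typing import Any, Dict, List


def build_completion(model: str, content: str, agents: List[str],
                     strategy: str, collection_ms: int) -> Dict[str, Any]:
    """Wrap aggregated text in the chat.completion shape shown in section 6."""
    return {
        "id": f"moa-{uuid.uuid4().hex[:12]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": content},
            "finish_reason": "stop",
        }],
        # Zeros, as in the ADR's example; this sketch does not count tokens.
        "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
        # Non-standard extension field proposed in section 6.
        "metadata": {
            "responses_collected": len(agents),
            "agents_responded": agents,
            "aggregation_strategy": strategy,
            "collection_time_ms": collection_ms,
        },
    }


if __name__ == "__main__":
    body = build_completion("moa-default", "Based on analysis from multiple models...",
                            ["kiro", "claude", "codex", "grok"], "synthesis", 32450)
    print(json.dumps(body, indent=2))
```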