Skip to content

ADR: Multi-Model Aggregation Endpoint (MoA)#1236

Draft
chaodu-agent wants to merge 5 commits into
mainfrom
adr/multi-model-aggregation
Draft

ADR: Multi-Model Aggregation Endpoint (MoA)#1236
chaodu-agent wants to merge 5 commits into
mainfrom
adr/multi-model-aggregation

Conversation

@chaodu-agent

Copy link
Copy Markdown
Collaborator

Summary

Proposes a Multi-Model Aggregation (MoA) feature for OpenAB — an OpenAI-compatible endpoint that fans out prompts to multiple agents in a Discord channel, collects their responses, and returns an aggregated result.

What's in this ADR

  • User story & requirements — who needs this and what it should do
  • High-level architecture — MoA gateway service that uses Discord as message bus
  • Fan-out strategies — ambient mode vs explicit @mention vs hybrid
  • Response collection — configurable timeout/min/max with early-complete logic
  • Aggregation strategies — synthesis, best-of-N, majority vote
  • API interface — full OpenAI-compatible request/response format
  • Configuration — TOML config with named presets
  • Who calls it — CI pipelines, dev tools, LLM routers, coordinator agent, Hermes integration
  • Hermes MoA comparison — key differences (Discord bus vs direct API, latency tradeoffs)
  • Open questions — separate binary vs built-in, multi-turn, agent awareness

Key Design Decisions

  1. OpenAI-compatible endpoint on localhost — callers use standard SDKs
  2. Discord as message bus — leverages existing multi-agent setup, no direct API key management
  3. Ambient-mode inspired collection — configurable timeout window with partial result support
  4. Localhost by default — safe default, opt-in to cluster/external exposure

Related

Proposes an OpenAI-compatible MoA endpoint that leverages existing
multi-agent Discord setup to fan out prompts, collect responses, and
return aggregated results. Includes comparison with Hermes Agent MoA.
@chaodu-agent chaodu-agent requested a review from thepagent as a code owner June 29, 2026 00:46
Explicitly distinguish:
- Mode A: Pure aggregation (no model, just merge/vote)
- Mode B: Synthesis (aggregator model re-optimizes)
- Mode C: Best-of-N (judge picks the best response)
@chaodu-agent

This comment has been minimized.

@chaodu-agent

This comment has been minimized.

Aggregator doesn't need expensive models — downstream agents already
did the reasoning. Synthesis mode is just text reorganization.
@chaodu-agent

This comment has been minimized.

@chaodu-agent

This comment has been minimized.

Key advantages: zero API key mgmt, agent diversity beyond public APIs,
full-capability responses, trivial scaling, audit trail, distributed cost.
@chaodu-agent

This comment has been minimized.

@chaodu-agent

Copy link
Copy Markdown
Collaborator Author

CHANGES REQUESTED ⚠️

Summary

This ADR proposes a Multi-Model Aggregation (MoA) endpoint for OpenAB. The architecture is well-reasoned with clear tradeoff analysis, but has critical gaps in error handling and documentation consistency that should be resolved before implementation.


Findings

# Severity Finding Location
1 🔴 Critical Fail-open pattern: aggregation proceeds with 0 responses when all agents timeout Section 4 (Collection Logic)
2 🟡 Important No error response format defined — what do 401/408/504 look like? Section 6 (Response)
3 🟡 Important TOML config inconsistency — Section 4 uses flat keys, Section 7 uses nested [moa.presets.default] Sections 4 vs 7
4 🟡 Important mode key missing from TOML examples — referenced in Section 5 but absent from config Section 5 vs 7
5 🟡 Important usage metrics hardcoded to 0 without explanation — misleading for OpenAI-compatible claim Section 6 (:273-277)
6 🟡 Important API key in TOML plaintext with no env-var or secret management guidance Section 7 (:298)
7 🟢 Praise Well-structured ADR with honest tradeoff analysis; Hermes comparison is accurate and useful Overall
8 🟢 Nit aggregator_model = "coordinator" undefined — is this an agent name, model alias, or literal? Section 5 (:224)
9 🟢 Nit No TOC for 449-line document — would improve navigation Top of file

Details

🔴 F1: Fail-open — aggregation proceeds with 0 responses

The pseudocode breaks collection when timeout expires and returns whatever responses are available — including zero. The min_responses = 2 config exists but is not enforced as a hard gate. This means callers silently get a hallucinated response instead of an error when all agents fail.

Recommendation: If len(responses) < min_responses, return HTTP 504 with an OpenAI-compatible error body, not a synthesized response.

🟡 F2: No error response format

Section 6 only shows the success response. API consumers need to know what 401 Unauthorized, 408 Timeout, and 500 Internal Server Error responses look like for proper error handling.

Recommendation: Add an "Error Responses" subsection with OpenAI-compatible error format examples.

🟡 F3: TOML config inconsistency

Section 4 shows flat keys (channel_id, timeout_seconds) while Section 7 shows nested format under [moa.presets.default]. Readers cannot determine which format to use.

Recommendation: Consolidate Section 7 to include all keys from Section 4 in the canonical format.

🟡 F4: mode key absent from config examples

Section 5 references mode = "synthesis" but neither Section 4 nor Section 7 includes it in their config blocks.

Recommendation: Add mode to the TOML config example in Section 7.

🟡 F5: Usage metrics zeroed without explanation

The example response shows prompt_tokens: 0, completion_tokens: 0, total_tokens: 0. The aggregator model does consume tokens. Callers relying on usage for billing/cost tracking will get incorrect data.

Recommendation: Either populate with actual aggregator token usage, explicitly note usage reflects only the aggregator call (not fan-out), or mark as "to be implemented."

🟡 F6: API key plaintext in config

Config shows api_key = "sk-moa-..." as plain text. Acceptable for localhost-only, but the ADR proposes ClusterIP and Ingress exposure (Section 8). No mention of env-var substitution or Kubernetes Secret mounting.

Recommendation: Note that auth requirements scale with exposure level, and document env-var substitution pattern for non-local deployments.


What's Good

  • Architecture is well-reasoned — Discord as message bus avoids API key management complexity
  • Fan-out strategies (ambient vs @mention vs hybrid) are clearly differentiated
  • Hermes comparison table is accurate and highlights key tradeoffs honestly
  • Cost analysis is practical and useful for decision-making

Review performed by the OpenAB review team.

@thepagent thepagent marked this pull request as draft June 29, 2026 13:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant