ADR: Multi-Model Aggregation Endpoint (MoA) by chaodu-agent · Pull Request #1236 · openabdev/openab

chaodu-agent · 2026-06-29T00:46:26Z

Summary

Proposes a Multi-Model Aggregation (MoA) feature for OpenAB — an OpenAI-compatible endpoint that fans out prompts to multiple agents in a Discord channel, collects their responses, and returns an aggregated result.

What's in this ADR

User story & requirements — who needs this and what it should do
High-level architecture — MoA gateway service that uses Discord as message bus
Fan-out strategies — ambient mode vs explicit @mention vs hybrid
Response collection — configurable timeout/min/max with early-complete logic
Aggregation strategies — synthesis, best-of-N, majority vote
API interface — full OpenAI-compatible request/response format
Configuration — TOML config with named presets
Who calls it — CI pipelines, dev tools, LLM routers, coordinator agent, Hermes integration
Hermes MoA comparison — key differences (Discord bus vs direct API, latency tradeoffs)
Open questions — separate binary vs built-in, multi-turn, agent awareness

Key Design Decisions

OpenAI-compatible endpoint on localhost — callers use standard SDKs
Discord as message bus — leverages existing multi-agent setup, no direct API key management
Ambient-mode inspired collection — configurable timeout window with partial result support
Localhost by default — safe default, opt-in to cluster/external exposure

CHANGES REQUESTED ⚠️

Summary

This ADR proposes a Multi-Model Aggregation (MoA) endpoint for OpenAB. The architecture is well-reasoned with clear tradeoff analysis, but has critical gaps in error handling and documentation consistency that should be resolved before implementation.

Findings

#	Severity	Finding	Location
1	🔴 Critical	Fail-open pattern: aggregation proceeds with 0 responses when all agents timeout	Section 4 (Collection Logic)
2	🟡 Important	No error response format defined — what do 401/408/504 look like?	Section 6 (Response)
3	🟡 Important	TOML config inconsistency — Section 4 uses flat keys, Section 7 uses nested `[moa.presets.default]`	Sections 4 vs 7
4	🟡 Important	`mode` key missing from TOML examples — referenced in Section 5 but absent from config	Section 5 vs 7
5	🟡 Important	`usage` metrics hardcoded to 0 without explanation — misleading for OpenAI-compatible claim	Section 6 (:273-277)
6	🟡 Important	API key in TOML plaintext with no env-var or secret management guidance	Section 7 (:298)
7	🟢 Praise	Well-structured ADR with honest tradeoff analysis; Hermes comparison is accurate and useful	Overall
8	🟢 Nit	`aggregator_model = "coordinator"` undefined — is this an agent name, model alias, or literal?	Section 5 (:224)
9	🟢 Nit	No TOC for 449-line document — would improve navigation	Top of file

Details

🔴 F1: Fail-open — aggregation proceeds with 0 responses

The pseudocode breaks collection when timeout expires and returns whatever responses are available — including zero. The min_responses = 2 config exists but is not enforced as a hard gate. This means callers silently get a hallucinated response instead of an error when all agents fail.

Recommendation: If len(responses) < min_responses, return HTTP 504 with an OpenAI-compatible error body, not a synthesized response.

🟡 F2: No error response format

Section 6 only shows the success response. API consumers need to know what 401 Unauthorized, 408 Timeout, and 500 Internal Server Error responses look like for proper error handling.

Recommendation: Add an "Error Responses" subsection with OpenAI-compatible error format examples.

🟡 F3: TOML config inconsistency

Section 4 shows flat keys (channel_id, timeout_seconds) while Section 7 shows nested format under [moa.presets.default]. Readers cannot determine which format to use.

Recommendation: Consolidate Section 7 to include all keys from Section 4 in the canonical format.

🟡 F4: `mode` key absent from config examples

Section 5 references mode = "synthesis" but neither Section 4 nor Section 7 includes it in their config blocks.

Recommendation: Add mode to the TOML config example in Section 7.

🟡 F5: Usage metrics zeroed without explanation

The example response shows prompt_tokens: 0, completion_tokens: 0, total_tokens: 0. The aggregator model does consume tokens. Callers relying on usage for billing/cost tracking will get incorrect data.

Recommendation: Either populate with actual aggregator token usage, explicitly note usage reflects only the aggregator call (not fan-out), or mark as "to be implemented."

🟡 F6: API key plaintext in config

Config shows api_key = "sk-moa-..." as plain text. Acceptable for localhost-only, but the ADR proposes ClusterIP and Ingress exposure (Section 8). No mention of env-var substitution or Kubernetes Secret mounting.

Recommendation: Note that auth requirements scale with exposure level, and document env-var substitution pattern for non-local deployments.

What's Good

Architecture is well-reasoned — Discord as message bus avoids API key management complexity
Fan-out strategies (ambient vs @mention vs hybrid) are clearly differentiated
Hermes comparison table is accurate and highlights key tradeoffs honestly
Cost analysis is practical and useful for decision-making

Review performed by the OpenAB review team.

docs(adr): add multi-model aggregation endpoint proposal

75a8097

Proposes an OpenAI-compatible MoA endpoint that leverages existing multi-agent Discord setup to fan out prompts, collect responses, and return aggregated results. Includes comparison with Hermes Agent MoA.

chaodu-agent requested a review from thepagent as a code owner June 29, 2026 00:46

docs(adr): clarify pure aggregation vs synthesis modes

928311c

Explicitly distinguish: - Mode A: Pure aggregation (no model, just merge/vote) - Mode B: Synthesis (aggregator model re-optimizes) - Mode C: Best-of-N (judge picks the best response)