ADR: Multi-Model Aggregation Endpoint (MoA) #1236
Proposes an OpenAI-compatible MoA endpoint that leverages the existing multi-agent Discord setup to fan out prompts, collect responses, and return aggregated results. Includes a comparison with Hermes Agent MoA.
Explicitly distinguish:
- Mode A: Pure aggregation (no model, just merge/vote)
- Mode B: Synthesis (aggregator model re-optimizes)
- Mode C: Best-of-N (judge picks the best response)
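Here is a minimal Python sketch of Mode A, which aggregates without a model by taking a majority vote. It assumes each agent response is a plain string and treats two responses as agreeing when they match after whitespace and case normalization. The function names and the tie-break rule (earliest response wins) are hypothetical and do not come from the ADR. Modes B and C would replace vote() with a call to the aggregator model or to a judge.

```python
from collections import Counter


# Hypothetical helpers illustrating Mode A; the voting rule is this sketch's choice.
def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivially different answers match."""
    return " ".join(text.split()).lower()


def vote(responses: list[str]) -> str:
    """Mode A (pure aggregation): return the most common response; no model call.

    Ties go to the response that arrived first, and that response's original
    (un-normalized) text is returned.
    """
    if not responses:
        raise ValueError("cannot aggregate zero responses")
    counts = Counter(normalize(r) for r in responses)
    top = max(counts.values())
    return next(r for r in responses if counts[normalize(r)] == top)
```

For example, vote(["Paris", "paris ", "London"]) returns "Paris". Exact-match voting probably suits only short, comparable answers. Free-form prose rarely matches verbatim, which is where Mode B or C would come in.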
The aggregator doesn't need an expensive model: the downstream agents have already done the reasoning, and synthesis mode is just text reorganization.
Key advantages: zero API key management, agent diversity beyond public APIs, full-capability responses, trivial scaling, an audit trail, and distributed cost.
CHANGES REQUESTED
| # | Severity | Finding | Location |
|---|---|---|---|
| 1 | 🔴 Critical | Fail-open pattern: aggregation proceeds with 0 responses when all agents time out | Section 4 (Collection Logic) |
| 2 | 🟡 Important | No error response format defined: what do 401/408/504 responses look like? | Section 6 (Response) |
| 3 | 🟡 Important | TOML config inconsistency: Section 4 uses flat keys, Section 7 uses nested [moa.presets.default] | Sections 4 vs 7 |
| 4 | 🟡 Important | mode key missing from TOML examples: referenced in Section 5 but absent from config | Sections 5 vs 7 |
| 5 | 🟡 Important | usage metrics hardcoded to 0 without explanation, which is misleading given the OpenAI-compatibility claim | Section 6 (:273-277) |
| 6 | 🟡 Important | API key stored in TOML as plaintext, with no env-var or secret-management guidance | Section 7 (:298) |
| 7 | 🟢 Praise | Well-structured ADR with honest tradeoff analysis; Hermes comparison is accurate and useful | Overall |
| 8 | 🟢 Nit | aggregator_model = "coordinator" is undefined: is this an agent name, a model alias, or a literal? | Section 5 (:224) |
| 9 | 🟢 Nit | No TOC for a 449-line document; one would improve navigation | Top of file |
Details
🔴 F1: Fail-open — aggregation proceeds with 0 responses
When the timeout expires, the pseudocode breaks out of collection and proceeds with whatever responses are available, including zero. The min_responses = 2 config key exists but is not enforced as a hard gate. So when all agents fail, callers silently get a hallucinated response instead of an error.
Recommendation: If len(responses) < min_responses, return HTTP 504 with an OpenAI-compatible error body, not a synthesized response.
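The recommended fix can be sketched in Python with asyncio. The sketch assumes each agent call is an awaitable that resolves to the agent's reply text. The names collect, InsufficientResponses, and gateway_timeout, along with the error code "insufficient_responses", are hypothetical. Only the {"error": {...}} envelope follows OpenAI's error convention. The essential change is that the min_responses check runs after the deadline and raises, instead of falling through to aggregation.

```python
import asyncio
from typing import Awaitable


class InsufficientResponses(Exception):
    """Fewer than min_responses agents answered before the deadline (hypothetical)."""

    def __init__(self, got: int, needed: int) -> None:
        super().__init__(f"only {got} of {needed} required agent responses arrived in time")
        self.got = got
        self.needed = needed


async def collect(
    agent_calls: list[Awaitable[str]],
    timeout_seconds: float,
    min_responses: int,
) -> list[str]:
    """Fan out, wait until the deadline, then enforce min_responses as a hard gate."""
    tasks = [asyncio.ensure_future(call) for call in agent_calls]
    done: set = set()
    if tasks:  # asyncio.wait rejects an empty task list
        done, pending = await asyncio.wait(tasks, timeout=timeout_seconds)
        for task in pending:
            task.cancel()  # late agents are dropped, not waited for
        await asyncio.gather(*pending, return_exceptions=True)
    # Keep fan-out order; skip agents that failed or were cancelled.
    responses = [
        t.result()
        for t in tasks
        if t in done and not t.cancelled() and t.exception() is None
    ]
    if len(responses) < min_responses:
        # Fail closed: never hand zero (or too few) responses to the aggregator.
        raise InsufficientResponses(len(responses), min_responses)
    return responses


def gateway_timeout(exc: InsufficientResponses) -> tuple[int, dict]:
    """Map the gate failure to HTTP 504 with an OpenAI-style error body."""
    return 504, {
        "error": {
            "message": str(exc),
            "type": "server_error",  # hypothetical type value
            "param": None,
            "code": "insufficient_responses",  # hypothetical error code
        }
    }
```

A request handler would call collect(), catch InsufficientResponses, and return gateway_timeout(exc) without invoking the aggregator. Late agents are cancelled rather than awaited, so endpoint latency stays roughly bounded by timeout_seconds.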
🟡 F2: No error response format
Section 6 only shows the success response. To handle errors properly, API consumers need to know what 401 Unauthorized, 408 Request Timeout, 500 Internal Server Error, and 504 Gateway Timeout responses look like.
Recommendation: Add an "Error Responses" subsection with OpenAI-compatible error format examples.
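One way to cover this is a single helper that renders every error in OpenAI's {"error": {"message", "type", "param", "code"}} envelope. In the Python sketch below, the mapping from status to (type, code) is a hypothetical choice for illustration. Neither the ADR nor OpenAI prescribes these values for these statuses.

```python
import json
from http import HTTPStatus

# Hypothetical (type, code) pairs per status; only the envelope shape
# {"error": {"message", "type", "param", "code"}} follows OpenAI's convention.
ERROR_KINDS = {
    HTTPStatus.UNAUTHORIZED: ("invalid_request_error", "invalid_api_key"),
    HTTPStatus.REQUEST_TIMEOUT: ("invalid_request_error", "request_timeout"),
    HTTPStatus.INTERNAL_SERVER_ERROR: ("server_error", "internal_error"),
    HTTPStatus.GATEWAY_TIMEOUT: ("server_error", "insufficient_responses"),
}


def error_response(status: HTTPStatus, message: str) -> tuple[int, str]:
    """Return (status code, JSON body) in an OpenAI-compatible error envelope."""
    err_type, code = ERROR_KINDS[status]
    body = {"error": {"message": message, "type": err_type, "param": None, "code": code}}
    return int(status), json.dumps(body)
```

Funnelling every failure path through one function keeps the body shape identical across statuses. Clients then need only one parser for errors.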
🟡 F3: TOML config inconsistency
Section 4 shows flat keys (channel_id, timeout_seconds), while Section 7 uses a nested format under [moa.presets.default]. Readers cannot tell which format to use.
Recommendation: Consolidate Section 7 to include all keys from Section 4 in the canonical format.
🟡 F4: mode key absent from config examples
Section 5 references mode = "synthesis", but neither Section 4 nor Section 7 includes it in its config block.
Recommendation: Add mode to the TOML config example in Section 7.
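Beyond adding the key to the example, the config loader could reject a missing or misspelled mode instead of silently defaulting. In this Python sketch, only "synthesis" comes from the ADR. The "aggregation" and "best_of_n" strings for Modes A and C, and the names Mode and parse_mode, are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    """Aggregation modes; only "synthesis" is a value taken from the ADR."""

    AGGREGATION = "aggregation"  # Mode A: merge/vote, no model (hypothetical value)
    SYNTHESIS = "synthesis"  # Mode B: aggregator model re-optimizes the responses
    BEST_OF_N = "best_of_n"  # Mode C: a judge picks one response (hypothetical value)


def parse_mode(preset: dict) -> Mode:
    """Read mode from a preset table, rejecting a missing or unknown value."""
    raw = preset.get("mode")
    if raw is None:
        raise ValueError("preset has no 'mode' key; set it explicitly")
    try:
        return Mode(raw)
    except ValueError:
        allowed = ", ".join(m.value for m in Mode)
        raise ValueError(f"unknown mode {raw!r}; expected one of: {allowed}") from None
```

Failing on a missing mode is a deliberate choice in this sketch. The three modes differ in cost and in whether a model runs at all, so an explicit setting seems safer than an implicit default.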
🟡 F5: Usage metrics zeroed without explanation
The example response shows prompt_tokens: 0, completion_tokens: 0, total_tokens: 0, yet the aggregator model consumes tokens whenever it runs (synthesis and best-of-N modes). Callers relying on usage for billing or cost tracking will get incorrect data.
Recommendation: Either populate usage with the aggregator's actual token counts, state explicitly that usage covers only the aggregator call (not the fan-out), or mark it as "to be implemented."
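The second option, where usage covers only the aggregator call, might look like the following Python sketch. It assumes the aggregator backend reports prompt and completion token counts. Usage and response_usage are hypothetical names. In Mode A no model runs, so the zeros become an accurate value rather than a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Usage:
    """Token counts for one model call (hypothetical type)."""

    prompt_tokens: int
    completion_tokens: int

    def to_openai(self) -> dict:
        """Render with the field names of OpenAI's usage object."""
        return {
            "prompt_tokens": self.prompt_tokens,
            "completion_tokens": self.completion_tokens,
            "total_tokens": self.prompt_tokens + self.completion_tokens,
        }


def response_usage(aggregator_call: Optional[Usage]) -> dict:
    """Usage reported to the caller: the aggregator call only, not the fan-out.

    Pass None when no model ran (Mode A); the zeros are then accurate.
    """
    usage = Usage(0, 0) if aggregator_call is None else aggregator_call
    return usage.to_openai()
```

Fan-out tokens are spent by the downstream agents under their own accounts. Leaving them out of usage, and documenting that omission, keeps the field truthful for the endpoint's own cost.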
🟡 F6: API key plaintext in config
The config shows api_key = "sk-moa-..." in plain text. That is acceptable for a localhost-only deployment, but the ADR also proposes ClusterIP and Ingress exposure (Section 8), and it does not mention env-var substitution or mounting a Kubernetes Secret.
Recommendation: Note that auth requirements scale with exposure level, and document env-var substitution pattern for non-local deployments.
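Here is a minimal Python sketch of env-var substitution. It assumes a hypothetical api_key = "${MOA_API_KEY}" placeholder syntax in the TOML, since the ADR does not define one. The allow_literal flag lets a localhost-only deployment keep a literal key while an exposed deployment refuses one, reflecting the point that auth requirements scale with exposure. resolve_secret and MOA_API_KEY are illustrative names.

```python
import os
import re
from typing import Mapping, Optional

# Hypothetical placeholder syntax: the whole value is "${VAR_NAME}".
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_secret(
    value: str,
    env: Optional[Mapping[str, str]] = None,
    allow_literal: bool = True,
) -> str:
    """Resolve a ${VAR} placeholder from the environment; optionally refuse literals."""
    env = os.environ if env is None else env
    match = _PLACEHOLDER.fullmatch(value)
    if match is None:
        if not allow_literal:
            # Exposed deployments (ClusterIP/Ingress) should not carry keys in the file.
            raise ValueError("literal api_key not allowed; use ${VAR} substitution")
        return value
    name = match.group(1)
    if name not in env:
        raise KeyError(f"environment variable {name} is not set")
    return env[name]
```

In Kubernetes, MOA_API_KEY could be populated from a Secret through the container's env[].valueFrom.secretKeyRef, so the key never appears in the config file or a ConfigMap.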
What's Good
- The architecture is well reasoned: using Discord as the message bus avoids API key management complexity
- Fan-out strategies (ambient vs @mention vs hybrid) are clearly differentiated
- Hermes comparison table is accurate and highlights key tradeoffs honestly
- Cost analysis is practical and useful for decision-making
Review performed by the OpenAB review team.
Summary
Proposes a Multi-Model Aggregation (MoA) feature for OpenAB — an OpenAI-compatible endpoint that fans out prompts to multiple agents in a Discord channel, collects their responses, and returns an aggregated result.
What's in this ADR
Key Design Decisions
Related