feat: add multi-agent orchestration pattern HUD-574 by ryantzr1 · Pull Request #268 · hud-evals/hud-python

ryantzr1 · 2026-01-12T08:54:27Z

Note

Introduces a complete multi-agent orchestration pattern with a conductor and sub-agents, plus a runnable example.

- Adds cookbooks/multi-agent with concepts, patterns, and code snippets, and links the runnable example examples/07_multi_agent.py (browser and coding AgentTools, a coordinator Environment, CLI flags)
- Updates the docs navigation to include the new cookbook page
- Refines hud/agents/resolver.resolve_cls so a request is routed to OpenAIAgent only when the matched model string is itself Codex-capable, which avoids alias-based misrouting
- Adds tests covering Codex routing and provider mapping in hud/agents/tests/test_resolver.py

Written by Cursor Bugbot for commit 971f613.

- Add multi-agent example with conductor and sub-agents
- Add skills documentation (overview, authoring guide, orchestration)
- Fix codex model routing to check only matched ID
- Remove unsafe eval() from docs example
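
For readers unfamiliar with the routing fix, here is a minimal sketch of "check only the matched ID". It is not the code in hud/agents/resolver: the catalogue layout, the model IDs, the `is_codex_capable` heuristic, and the returned strings are all hypothetical, chosen only to show why checking every alias of an entry misroutes and checking the matched ID does not.

```python
# Illustrative sketch only: CATALOGUE, the model IDs, and the helper names
# are hypothetical and do not reflect hud's real resolver or model list.

CATALOGUE = [
    # One entry can expose several IDs (aliases) for the same provider.
    {"ids": ["example-chat", "example-codex"], "provider": "openai"},
    {"ids": ["example-claude"], "provider": "anthropic"},
]


def is_codex_capable(model_id: str) -> bool:
    # Assumed heuristic for this sketch: the ID itself names a Codex model.
    return "codex" in model_id.lower()


def resolve_agent_name(requested: str) -> str:
    """Return an agent name based on the ID that actually matched."""
    for entry in CATALOGUE:
        matched = next((i for i in entry["ids"] if i == requested), None)
        if matched is None:
            continue
        # Check only the matched ID. Checking any(entry["ids"]) instead would
        # send "example-chat" to OpenAIAgent just because a sibling alias
        # happens to be Codex-capable.
        if is_codex_capable(matched):
            return "OpenAIAgent"
        return f"default:{entry['provider']}"
    raise ValueError(f"unknown model: {requested}")
```

With this shape, a test can assert that the non-Codex alias in the same entry does not resolve to OpenAIAgent, which is the misrouting case the PR describes.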

Ruthwik-Data · 2026-05-18T17:54:07Z

This is a solid foundation for multi-agent orchestration — the conductor/sub-agent pattern with typed Environment handoff is the right abstraction for CUA evals.

A few observations from an eval design perspective:

On eval attribution in multi-agent traces: When a sub-agent fails in the middle of a task, the current orchestration pattern doesn't surface which sub-agent's output caused the downstream failure. For eval purposes, this is the core challenge: TaskCompletionMetric at the conductor level only gives you a 0/1, so you can't tell whether the browser sub-agent retrieved the wrong context or the coding sub-agent wrote bad code. This connects to my open issue #388 about the gap between tool-call correctness and output correctness.

Would it make sense to add a sub_agent_eval_results field to the conductor's EvalResult, so each sub-agent's intermediate verdict is surfaced alongside the final task outcome? That would let you pinpoint which step broke without having to re-run the full trace.
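
A rough sketch of what that could look like, using plain dataclasses. The `EvalResult` below is a stand-in written for this comment, not hud's actual class, and `SubAgentVerdict`, its fields, and `first_failure` are hypothetical names.

```python
# Hypothetical shapes for illustration; not hud's real EvalResult.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SubAgentVerdict:
    agent: str          # e.g. "browser" or "coding"
    step: int           # position of the call within the conductor's trace
    passed: bool        # intermediate verdict for this sub-agent call
    detail: str = ""    # optional explanation from the grader


@dataclass
class EvalResult:
    reward: float       # final task outcome at the conductor level
    sub_agent_eval_results: List[SubAgentVerdict] = field(default_factory=list)

    def first_failure(self) -> Optional[SubAgentVerdict]:
        """Return the earliest failing sub-agent verdict, or None if all passed."""
        failures = [v for v in self.sub_agent_eval_results if not v.passed]
        return min(failures, key=lambda v: v.step, default=None)


result = EvalResult(
    reward=0.0,
    sub_agent_eval_results=[
        SubAgentVerdict(agent="browser", step=0, passed=False, detail="wrong page"),
        SubAgentVerdict(agent="coding", step=1, passed=True),
    ],
)
culprit = result.first_failure()
```

Here `culprit.agent` is `"browser"`, so a final reward of 0 can be attributed to the retrieval step without re-running the trace.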

On the resolver routing: The fix to route Codex-capable models via OpenAIAgent only when the model string is explicitly Codex-capable is the right call — alias-based misrouting is a silent failure mode that's hard to catch in evals.

On the example: 07_multi_agent.py would be even more useful as an eval scenario with a ground-truth expected output, so users can see how to score orchestrated agents end-to-end, not just run them.
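
To make that concrete, here is a toy version of such a scenario. It does not use hud or the real 07_multi_agent.py: the two sub-agents are deterministic stubs, and `SCENARIO`, `conductor`, and `score` are hypothetical names, so this only shows the shape of scoring an orchestrated run against a ground-truth answer.

```python
# Toy scenario with stubbed sub-agents; all names here are hypothetical.

SCENARIO = {
    "prompt": "Find the discount rate and compute the discount on a 100 unit order.",
    "expected": "25.00",  # ground-truth output for this scenario
}


def browser_stub(task: str) -> str:
    # Stands in for a browser sub-agent returning retrieved context.
    return "rate=0.25"


def coding_stub(context: str) -> str:
    # Stands in for a coding sub-agent that computes from the context.
    rate = float(context.split("=")[1])
    return f"{100 * rate:.2f}"


def conductor(task: str) -> str:
    # Conductor hands the browser output to the coding sub-agent.
    context = browser_stub(task)
    return coding_stub(context)


def score(output: str, expected: str) -> float:
    # Exact-match scoring against the ground truth.
    return 1.0 if output.strip() == expected.strip() else 0.0


reward = score(conductor(SCENARIO["prompt"]), SCENARIO["expected"])
```

Swapping the stubs for real sub-agents would keep the scoring step unchanged, which is the part users currently have no example of.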

feat: add multi-agent orchestration pattern

971f613

mintlify Bot deployed to staging - docs January 12, 2026 08:54 View deployment

ryantzr1 changed the title ~~feat: add multi-agent orchestration pattern~~ feat: add multi-agent orchestration pattern HUD-574 Jan 12, 2026

ryantzr1 requested a review from lorenss-m January 12, 2026 09:11

ryantzr1 wants to merge 1 commit into main from feat/agent-orchestrator-cookbook


2 participants
