feat: add multi-agent orchestration pattern HUD-574 #268

Open
ryantzr1 wants to merge 1 commit into main from feat/agent-orchestrator-cookbook


ryantzr1 (Contributor) commented Jan 12, 2026


Note

Introduces a complete multi-agent orchestration pattern with a conductor and sub-agents, plus a runnable example.

  • Adds cookbooks/multi-agent with concepts, patterns, and code snippets; links to the example examples/07_multi_agent.py (browser and coding AgentTools, coordinator Environment, CLI flags)
  • Updates docs navigation to include the new cookbook page
  • Refines hud/agents/resolver.resolve_cls to route a model to OpenAIAgent only when the matched model string is itself Codex-capable; avoids alias-based misrouting
  • Adds tests covering Codex routing and provider mapping in hud/agents/tests/test_resolver.py

Written by Cursor Bugbot for commit 971f613. This will update automatically on new commits.

- Add multi-agent example with conductor and sub-agents
- Add skills documentation (overview, authoring guide, orchestration)
- Fix codex model routing to check only matched ID
- Remove unsafe eval() from docs example
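
The routing fix listed above (check only the matched ID) can be illustrated with a minimal sketch. This is not the code in hud/agents/resolver; `ModelEntry`, `CATALOG`, the model names, the `"default"` route label, and the name-based `is_codex_capable` check are all hypothetical stand-ins chosen to show how alias-based misrouting can arise and how matching on the requested string avoids it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    # Hypothetical catalog record: one canonical ID plus alternative names.
    model_id: str
    aliases: tuple[str, ...] = ()

    def names(self) -> tuple[str, ...]:
        return (self.model_id, *self.aliases)


def is_codex_capable(name: str) -> bool:
    # Assumption for this sketch: capability is inferred from the name itself.
    return "codex" in name.lower()


# Hypothetical catalog: a non-Codex model that happens to carry a Codex alias.
CATALOG = (
    ModelEntry("example-chat", aliases=("example-codex",)),
    ModelEntry("other-model"),
)


def route_by_any_name(requested: str) -> str:
    """Buggy variant: inspects every name on the entry, not just the match."""
    for entry in CATALOG:
        if requested in entry.names():
            if any(is_codex_capable(n) for n in entry.names()):
                return "OpenAIAgent"
            return "default"
    raise ValueError(f"unknown model: {requested}")


def route_by_matched_name(requested: str) -> str:
    """Fixed variant: only the string that actually matched is inspected."""
    for entry in CATALOG:
        for name in entry.names():
            if name == requested:
                return "OpenAIAgent" if is_codex_capable(name) else "default"
    raise ValueError(f"unknown model: {requested}")
```

With this catalog, `route_by_any_name("example-chat")` returns `"OpenAIAgent"` because of the alias, while `route_by_matched_name("example-chat")` returns `"default"` and `route_by_matched_name("example-codex")` still returns `"OpenAIAgent"`.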
ryantzr1 changed the title from "feat: add multi-agent orchestration pattern" to "feat: add multi-agent orchestration pattern HUD-574" on Jan 12, 2026
ryantzr1 requested a review from lorenss-m on January 12, 2026 09:11
@Ruthwik-Data

This is a solid foundation for multi-agent orchestration. The conductor/sub-agent pattern with typed Environment handoff is the right abstraction for CUA evals.

A few observations from an eval design perspective:

On eval attribution in multi-agent traces: When a sub-agent fails in the middle of a task, the current orchestration pattern doesn't surface which sub-agent's output caused the downstream failure. For eval purposes, this is the core challenge: TaskCompletionMetric at the conductor level only gives you a 0/1, so you can't tell whether the browser sub-agent retrieved the wrong context or the coding sub-agent wrote bad code. This connects to my open issue #388 about the gap between tool-call correctness and output correctness.

Would it make sense to add a sub_agent_eval_results field to the conductor's EvalResult, so each sub-agent's intermediate verdict is surfaced alongside the final task outcome? That would let you pinpoint which step broke without having to re-run the full trace.
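
To make the suggestion concrete, here is a rough sketch of what such a field could look like. It does not reflect the existing EvalResult in this repo; `SubAgentVerdict`, `ConductorEvalResult`, and `first_failure` are hypothetical names, and only `sub_agent_eval_results` comes from the proposal above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SubAgentVerdict:
    # Hypothetical per-step record emitted for each sub-agent call.
    agent: str
    step: int
    passed: bool
    detail: str = ""


@dataclass
class ConductorEvalResult:
    # Hypothetical conductor-level result carrying intermediate verdicts.
    task_passed: bool
    sub_agent_eval_results: list[SubAgentVerdict] = field(default_factory=list)

    def first_failure(self) -> Optional[SubAgentVerdict]:
        """Return the earliest failing sub-agent step, or None if all passed."""
        for verdict in sorted(self.sub_agent_eval_results, key=lambda v: v.step):
            if not verdict.passed:
                return verdict
        return None


result = ConductorEvalResult(
    task_passed=False,
    sub_agent_eval_results=[
        SubAgentVerdict(agent="browser", step=1, passed=True),
        SubAgentVerdict(agent="coding", step=2, passed=False, detail="tests failed"),
    ],
)
culprit = result.first_failure()
```

Here `culprit.agent` is `"coding"`, which is the attribution a bare 0/1 at the conductor level cannot provide.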

On the resolver routing: The fix to route models via OpenAIAgent only when the matched model string is explicitly Codex-capable is the right call. Alias-based misrouting is a silent failure mode that's hard to catch in evals.

On the example: 07_multi_agent.py would be even more useful as an eval scenario with a ground-truth expected output, so users can see how to score orchestrated agents end-to-end, not just run them.
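
As a rough illustration of the end-to-end scoring idea, a scenario could pair a prompt with a ground-truth answer and score whatever the conductor returns. Everything here is an assumption for the sketch: `Scenario`, `normalize`, `score`, and `fake_conductor` are hypothetical, the exact-match rule is only one possible scoring choice, and the stub stands in for actually running 07_multi_agent.py.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Scenario:
    # Hypothetical eval scenario: a task prompt plus its ground-truth answer.
    prompt: str
    expected: str


def normalize(text: str) -> str:
    # Collapse whitespace and ignore case so formatting noise is not penalized.
    return " ".join(text.split()).lower()


def score(scenario: Scenario, run: Callable[[str], str]) -> float:
    """Run the orchestrated agent once and return 1.0 on a match, else 0.0."""
    actual = run(scenario.prompt)
    return 1.0 if normalize(actual) == normalize(scenario.expected) else 0.0


def fake_conductor(prompt: str) -> str:
    # Stand-in for the real orchestrated run; returns a fixed final answer.
    return "  The latest release is V1.2.3 "


scenario = Scenario(
    prompt="Find the latest release tag.",
    expected="the latest release is v1.2.3",
)
outcome = score(scenario, fake_conductor)
```

In this sketch `outcome` is `1.0`; swapping in a runner that returns a different answer yields `0.0`.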
