Add unit tests for OpenAIChatAgent by vaibhavhariram · Pull Request #408 · hud-evals/hud-python

vaibhavhariram · 2026-05-05T22:05:35Z

Summary

51 tests across 9 test classes in hud/agents/tests/test_openai_chat.py
Covers init (api_key, base_url, openai_client, completion_kwargs, checkpoint injection), schema sanitization (_sanitize_schema_for_openai: anyOf nullables, prefixItems→array, nested recursion, key allowlist), tool schema formatting to OpenAI function format, _oai_to_mcp conversion, system message formatting, format_blocks (text/image/mixed), format_tool_results (text/image/empty/structured content), error handling (API errors, JSON truncation), and get_response/_invoke_chat_completion params passthrough

Test plan

uv run pytest hud/agents/tests/test_openai_chat.py -q → 51 passed, 0 failed
All async tests use pytest-asyncio (auto mode via asyncio_mode = "auto" in pyproject.toml)
No real network calls — OpenAI client fully mocked

🤖 Generated with Claude Code

Note

Low Risk
Low risk: adds a new test module only, with no production code changes. Main risk is tightening behavior expectations that could require follow-up fixes if current behavior differs.

Overview
Adds hud/agents/tests/test_openai_chat.py, a comprehensive unit test suite for OpenAIChatAgent.

The tests cover initialization paths (HUD gateway vs. explicit api_key/base_url/openai_client, completion_kwargs, and checkpoint injection), schema sanitization and tool schema conversion to OpenAI function format, plus message/result formatting (get_system_messages, format_blocks, format_tool_results) including image handling.

Also adds coverage for get_response / _invoke_chat_completion behavior: tool call parsing, passthrough of allowed completion kwargs (while protecting model/messages/tools), finish-reason handling, and graceful error responses for API failures/truncated JSON.

^{Reviewed by Cursor Bugbot for commit 4ec040c. Bugbot is set up for automated code reviews on this repo. Configure here.}

51 tests covering init, schema sanitization, tool formatting, OAI→MCP conversion, system messages, format_blocks, format_tool_results, error handling, and get_response/invoke params passthrough. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Ruthwik-Data · 2026-05-11T17:38:17Z

Solid coverage — 51 tests across 9 classes with fully mocked OpenAI client is the right approach for a unit test suite at this layer.

A couple of observations:

reasoning_content handling not tested. The mocks set reasoning_content = None throughout, but the actual OpenAIChatAgent likely has a code path that handles non-None reasoning_content (e.g., o1/o3 models). Worth adding at least one test where reasoning_content is a non-empty string to ensure it doesn't get silently dropped or cause a formatting error downstream.

format_tool_results with mixed content types. The tests cover text/image/empty/structured content individually, but no test exercises a ToolResult with a mixed list (e.g., [TextContent, ImageContent] in the same result). In eval pipelines that return evidence alongside screenshots, this combination is common and worth a dedicated case.

Minor: The make_tool_call / make_tool_result factories are clean and reusable — if this project grows the test suite further, it may be worth moving them to a shared conftest.py so other agent tests can use them too.

Overall the test plan is well-structured and the async handling via pytest-asyncio in auto mode is the right call. Happy to see this merged once the above gaps are addressed or consciously scoped out.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add unit tests for OpenAIChatAgent#408

Add unit tests for OpenAIChatAgent#408
vaibhavhariram wants to merge 1 commit into
hud-evals:mainfrom
vaibhavhariram:feat/test-openai-chat-agent

vaibhavhariram commented May 5, 2026 •

edited by cursor Bot

Loading

Uh oh!

Ruthwik-Data commented May 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

vaibhavhariram commented May 5, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

Ruthwik-Data commented May 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vaibhavhariram commented May 5, 2026 •

edited by cursor Bot

Loading