Skip to content

Add unit tests for OpenAIChatAgent#408

Open
vaibhavhariram wants to merge 1 commit into
hud-evals:mainfrom
vaibhavhariram:feat/test-openai-chat-agent
Open

Add unit tests for OpenAIChatAgent#408
vaibhavhariram wants to merge 1 commit into
hud-evals:mainfrom
vaibhavhariram:feat/test-openai-chat-agent

Conversation

@vaibhavhariram
Copy link
Copy Markdown

@vaibhavhariram vaibhavhariram commented May 5, 2026

Summary

  • 51 tests across 9 test classes in hud/agents/tests/test_openai_chat.py
  • Covers init (api_key, base_url, openai_client, completion_kwargs, checkpoint injection), schema sanitization (_sanitize_schema_for_openai: anyOf nullables, prefixItems→array, nested recursion, key allowlist), tool schema formatting to OpenAI function format, _oai_to_mcp conversion, system message formatting, format_blocks (text/image/mixed), format_tool_results (text/image/empty/structured content), error handling (API errors, JSON truncation), and get_response/_invoke_chat_completion params passthrough

Test plan

  • uv run pytest hud/agents/tests/test_openai_chat.py -q → 51 passed, 0 failed
  • All async tests use pytest-asyncio (auto mode via asyncio_mode = "auto" in pyproject.toml)
  • No real network calls — OpenAI client fully mocked

🤖 Generated with Claude Code


Note

Low Risk
Low risk: adds a new test module only, with no production code changes. Main risk is tightening behavior expectations that could require follow-up fixes if current behavior differs.

Overview
Adds hud/agents/tests/test_openai_chat.py, a comprehensive unit test suite for OpenAIChatAgent.

The tests cover initialization paths (HUD gateway vs. explicit api_key/base_url/openai_client, completion_kwargs, and checkpoint injection), schema sanitization and tool schema conversion to OpenAI function format, plus message/result formatting (get_system_messages, format_blocks, format_tool_results) including image handling.

Also adds coverage for get_response / _invoke_chat_completion behavior: tool call parsing, passthrough of allowed completion kwargs (while protecting model/messages/tools), finish-reason handling, and graceful error responses for API failures/truncated JSON.

Reviewed by Cursor Bugbot for commit 4ec040c. Bugbot is set up for automated code reviews on this repo. Configure here.

51 tests covering init, schema sanitization, tool formatting, OAI→MCP
conversion, system messages, format_blocks, format_tool_results, error
handling, and get_response/invoke params passthrough.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Ruthwik-Data
Copy link
Copy Markdown

Solid coverage — 51 tests across 9 classes with fully mocked OpenAI client is the right approach for a unit test suite at this layer.

A couple of observations:

reasoning_content handling not tested. The mocks set reasoning_content = None throughout, but the actual OpenAIChatAgent likely has a code path that handles non-None reasoning_content (e.g., o1/o3 models). Worth adding at least one test where reasoning_content is a non-empty string to ensure it doesn't get silently dropped or cause a formatting error downstream.

format_tool_results with mixed content types. The tests cover text/image/empty/structured content individually, but no test exercises a ToolResult with a mixed list (e.g., [TextContent, ImageContent] in the same result). In eval pipelines that return evidence alongside screenshots, this combination is common and worth a dedicated case.

Minor: The make_tool_call / make_tool_result factories are clean and reusable — if this project grows the test suite further, it may be worth moving them to a shared conftest.py so other agent tests can use them too.

Overall the test plan is well-structured and the async handling via pytest-asyncio in auto mode is the right call. Happy to see this merged once the above gaps are addressed or consciously scoped out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants