Qa3 by Iamsdt · Pull Request #34 · 10xHub/agentflow-cli

Iamsdt · 2026-05-19T16:14:43Z

This pull request introduces a plan and an initial implementation for improving Agentflow's testing and evaluation CLI commands and its evaluation report dashboard. The main changes specify new CLI commands (agentflow test and agentflow eval), improve configuration and auto-discovery for evaluations, and plan a redesign of the HTML report for better usability and visual clarity. It also includes template updates, documentation additions, and minor fixes.

CLI Command Additions and Improvements

Added a detailed plan for the new agentflow test and agentflow eval commands, specifying their behavior, configuration, and integration with agentflow.json. The plan covers auto-discovery of evaluation files, improved report generation, and CLI options for customization (EVAL_CLI_PLAN.md, R1-R340).
Added the evaluation key to the dev template's agentflow.json to support the new evaluation features (agentflow_cli/cli/templates/dev/agentflow.json, L5-R11).
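A minimal sketch of how auto-discovery might read that setting. It assumes the key is named `eval`, that it may carry hypothetical `dir` and `pattern` fields, and that eval modules follow the prod template's `evals/*_eval.py` layout; `discover_eval_files` and every key name here are illustrative, not taken from the PR's code.

```python
import json
from pathlib import Path

# Hypothetical defaults modeled on the prod template layout (evals/weather_agents_eval.py).
DEFAULT_EVAL_DIR = "evals"
DEFAULT_EVAL_PATTERN = "*_eval.py"


def discover_eval_files(project_root: Path) -> list[Path]:
    """Return eval modules to run, sorted for a stable order.

    Reads an assumed "eval" section from agentflow.json. A missing file,
    missing section, or missing field falls back to the defaults above.
    """
    config_path = project_root / "agentflow.json"
    eval_cfg: dict = {}
    if config_path.is_file():
        data = json.loads(config_path.read_text(encoding="utf-8"))
        section = data.get("eval")  # hypothetical key name
        if isinstance(section, dict):
            eval_cfg = section

    eval_dir = project_root / eval_cfg.get("dir", DEFAULT_EVAL_DIR)
    pattern = eval_cfg.get("pattern", DEFAULT_EVAL_PATTERN)
    if not eval_dir.is_dir():
        return []
    # Only regular files that match the naming convention count as eval modules.
    return sorted(p for p in eval_dir.glob(pattern) if p.is_file())
```

With no agentflow.json present, this picks up `evals/weather_agents_eval.py` and skips helper modules such as `evals/helpers.py`; an `"eval": {"dir": "checks"}` entry would redirect the search.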

Evaluation Protocol and Discovery

Updated the evaluation file protocol to prioritize get_eval_set() for CLI discovery, with an optional get_eval_config() for configuration and run() as an escape hatch for custom evaluation flows. Also improved the warning messages shown when an entry point is missing (agentflow_cli/cli/commands/eval.py).
Updated the prod template's weather_agents_eval.py to export a get_eval_config() function, making it compatible with the new CLI auto-discovery (agentflow_cli/cli/templates/prod/evals/weather_agents_eval.py, L1-R6).
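The entry-point protocol can be sketched as a lookup on an imported module. This is an illustration, not the code in eval.py: `discover_eval`, `DiscoveredEval`, and the `kind` labels are hypothetical, the return values of the template functions are placeholders, and `warnings.warn` stands in for however the CLI actually reports a missing entry point. The sketch checks get_eval_set() before run(), following the description above; commit 834a33b mentions prioritizing run(), so if the shipped CLI gives run() precedence, the two checks would be swapped.

```python
import types
import warnings
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class DiscoveredEval:
    """What was found in one eval module (hypothetical structure)."""

    module_name: str
    kind: str  # "eval_set", "run", or "missing"
    eval_set: Any = None
    config: Optional[Any] = None
    runner: Optional[Callable[[], Any]] = None


def discover_eval(module: types.ModuleType) -> DiscoveredEval:
    name = module.__name__

    # Preferred protocol: get_eval_set(), with get_eval_config() as an optional companion.
    get_eval_set = getattr(module, "get_eval_set", None)
    if callable(get_eval_set):
        get_eval_config = getattr(module, "get_eval_config", None)
        config = get_eval_config() if callable(get_eval_config) else None
        return DiscoveredEval(name, "eval_set", eval_set=get_eval_set(), config=config)

    # Escape hatch: the module drives its own evaluation flow through run().
    run = getattr(module, "run", None)
    if callable(run):
        return DiscoveredEval(name, "run", runner=run)

    # Neither entry point: warn with a message that names what was expected.
    warnings.warn(
        f"{name}: no get_eval_set() or run() found; define one of them to make this module runnable",
        stacklevel=2,
    )
    return DiscoveredEval(name, "missing")


if __name__ == "__main__":
    # Stand-in for evals/weather_agents_eval.py with placeholder return values.
    weather = types.ModuleType("weather_agents_eval")
    weather.get_eval_set = lambda: ["case-1", "case-2"]
    weather.get_eval_config = lambda: {"threshold": 0.8}
    print(discover_eval(weather))
```

Keeping get_eval_config() optional means a module that only defines get_eval_set() still runs, with `config` left as `None` for the CLI to fill from its own defaults.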

HTML Report Redesign

Outlined a redesign of the HTML evaluation report, including a new visual dashboard, dark mode, inline SVG charts, and an improved layout for criteria and case results. The plan details splitting the template into focused files for maintainability and offline sharing (EVAL_CLI_PLAN.md, R1-R340).
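Inline SVG suits a report meant to be shared offline because the chart is plain markup inside the HTML file, with no script or external asset to load. A minimal sketch, assuming per-criterion scores between 0 and 1; `svg_score_bars` and its layout constants are hypothetical, not the PR's template code.

```python
from html import escape


def svg_score_bars(scores: dict[str, float], width: int = 320, bar_height: int = 18) -> str:
    """Render criterion scores (0.0-1.0) as a horizontal bar chart in inline SVG."""
    label_width = 120  # space reserved on the left for criterion names
    gap = 6
    bar_max = width - label_width
    rows = []
    for i, (criterion, score) in enumerate(scores.items()):
        clamped = max(0.0, min(1.0, score))  # keep bars inside the chart area
        y = i * (bar_height + gap)
        rows.append(
            f'<text x="0" y="{y + bar_height - 4}" font-size="12">{escape(criterion)}</text>'
            f'<rect x="{label_width}" y="{y}" width="{round(clamped * bar_max)}" '
            f'height="{bar_height}" fill="currentColor"/>'
        )
    height = max(1, len(scores)) * (bar_height + gap)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}" '
        f'role="img" aria-label="Score per criterion">' + "".join(rows) + "</svg>"
    )
```

Using `fill="currentColor"` lets the bars follow the surrounding CSS text color, which is one way a dark-mode stylesheet could restyle the chart without regenerating it.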

Documentation and Template Enhancements

Added references to new and updated documentation for unit testing and evaluation in the skill template's metadata and documentation lists (agentflow_cli/cli/templates/skills/agent-skills/SKILL.md).

Miscellaneous Fixes

Excluded .ruff_cache from template copying to avoid unnecessary files in initialized projects (agentflow_cli/cli/commands/init.py, L19-R19).
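If init.py copies the template tree with the standard library, `shutil.ignore_patterns` is one way to express this exclusion. The sketch below is an assumption about the approach, not the actual diff; `copy_template` and `TEMPLATE_IGNORE` are illustrative names.

```python
import shutil
from pathlib import Path

# Directories that should never land in a freshly initialized project.
TEMPLATE_IGNORE = shutil.ignore_patterns(".ruff_cache")


def copy_template(template_dir: Path, target_dir: Path) -> None:
    """Copy a project template, skipping .ruff_cache directories at any depth."""
    # copytree calls the ignore callable once per visited directory,
    # so nested .ruff_cache folders are skipped as well as the top-level one.
    shutil.copytree(template_dir, target_dir, ignore=TEMPLATE_IGNORE, dirs_exist_ok=True)
```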

These changes lay the groundwork for a more robust, user-friendly, and visually informative testing and evaluation workflow in Agentflow.

codecov · 2026-05-19T16:22:57Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

Iamsdt added 4 commits May 19, 2026 16:37

feat: add evaluation and unit testing documentation with detailed usage examples (bf5b623)
feat: add eval configuration retrieval and unit tests for EvalCommand discovery (5929874)
feat: update eval command configuration and improve handling of evaluation modules (789b0ea)
feat: streamline evaluation module execution by prioritizing run() protocol (834a33b)

Iamsdt merged commit dd9267e into main May 19, 2026
1 check passed

Iamsdt deleted the qa3 branch May 19, 2026 16:22

Iamsdt merged 4 commits into main from qa3