Qa3 #34

Merged: Iamsdt merged 4 commits into main from qa3 on May 19, 2026

@Iamsdt (Collaborator) commented May 19, 2026

This pull request introduces a plan and an initial implementation for improving Agentflow's testing and evaluation CLI commands and the evaluation report dashboard. The main changes are the new CLI commands (agentflow test and agentflow eval), better configuration and auto-discovery for evaluations, and a redesigned HTML report that is easier to use and read. It also includes template updates, documentation enhancements, and minor fixes.

CLI Command Additions and Improvements

  • Added a detailed plan for the new agentflow test and agentflow eval commands, specifying their behaviors, configuration, and integration with agentflow.json. The plan includes auto-discovery of evaluation files, improved report generation, and CLI options for customization. (EVAL_CLI_PLAN.md, R1-R340)
  • Implemented the evaluation key in the dev template's agentflow.json to support new evaluation features. (agentflow_cli/cli/templates/dev/agentflow.json, L5-R11)
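
A minimal sketch of how a CLI could read the evaluation key from agentflow.json and auto-discover evaluation files. This is not the code in this PR: the field names directory and pattern, their default values, and both function names are hypothetical, chosen only to illustrate the idea.

```python
import json
from pathlib import Path

# Hypothetical defaults: the PR does not list the fields under "evaluation".
DEFAULT_EVALUATION = {"directory": "evals", "pattern": "*_eval.py"}


def load_evaluation_settings(config_path):
    """Return the "evaluation" section of agentflow.json merged over defaults."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    section = config.get("evaluation") or {}
    if not isinstance(section, dict):
        raise ValueError('"evaluation" must be a JSON object')
    # Values from the file win over the assumed defaults.
    return {**DEFAULT_EVALUATION, **section}


def discover_eval_files(project_root, settings):
    """List candidate evaluation files, sorted for a stable run order."""
    eval_dir = Path(project_root) / settings["directory"]
    return sorted(eval_dir.glob(settings["pattern"]))
```

Merging over defaults keeps a project with no evaluation key working, which is one way to make the key optional in existing agentflow.json files.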

Evaluation Protocol and Discovery

  • Updated the evaluation file protocol to prioritize get_eval_set() for CLI discovery, with optional get_eval_config() for configuration, and run() as an escape hatch for custom evaluation flows. Improved warning messages for missing entry points. (agentflow_cli/cli/commands/eval.py [1] [2] [3])
  • Updated the prod template's weather_agents_eval.py to export a get_eval_config() function, making it compatible with the new CLI auto-discovery. (agentflow_cli/cli/templates/prod/evals/weather_agents_eval.py, L1-R6)
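
The entry-point protocol described above can be sketched roughly as follows. This is an illustration, not the code in eval.py: resolve_eval_entry, the returned dictionary shape, and the warning text are hypothetical, and the sketch assumes run() is consulted only when get_eval_set() is absent.

```python
import warnings


def resolve_eval_entry(module):
    """Decide how to run one evaluation module.

    Assumed order: get_eval_set() first, run() only as a fallback.
    """
    get_eval_set = getattr(module, "get_eval_set", None)
    get_eval_config = getattr(module, "get_eval_config", None)
    run = getattr(module, "run", None)

    if callable(get_eval_set):
        # get_eval_config() is optional; None stands for "use CLI defaults".
        config = get_eval_config() if callable(get_eval_config) else None
        return {"mode": "eval_set", "eval_set": get_eval_set(), "config": config}
    if callable(run):
        # Escape hatch: the module drives its own evaluation flow.
        return {"mode": "custom", "run": run}

    warnings.warn(
        f"{module.__name__}: no get_eval_set() or run() found; skipping",
        stacklevel=2,
    )
    return None
```

Under these assumptions, a file that exports get_eval_set(), get_eval_config(), and run() is treated as a standard eval set, and run() is ignored.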

HTML Report Redesign

  • Outlined a redesign for the HTML evaluation report, including a new visual dashboard, dark mode, inline SVG charts, and improved layout for criteria and case results. The plan details splitting the template into focused files for maintainability and offline sharing. (EVAL_CLI_PLAN.md, R1-R340)
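
To illustrate what an inline SVG chart could look like, here is a small sketch that renders per-criterion pass rates as a horizontal bar chart. The function name, its parameters, and the layout are hypothetical; the actual report template in the plan may be structured quite differently.

```python
from html import escape


def pass_rate_bars_svg(rates, width=320, bar_height=18, gap=6, label_width=120):
    """Render {criterion: pass_rate in [0, 1]} as a horizontal bar chart in SVG."""
    track = width - label_width  # horizontal space available for a full bar
    height = len(rates) * (bar_height + gap) + gap
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" '
        f'height="{height}" viewBox="0 0 {width} {height}" role="img">'
    ]
    for index, (name, rate) in enumerate(rates.items()):
        rate = min(max(float(rate), 0.0), 1.0)  # clamp out-of-range values
        y = gap + index * (bar_height + gap)
        parts.append(
            f'<text x="0" y="{y + bar_height - 5}" font-size="12" '
            f'fill="currentColor">{escape(name)}</text>'
        )
        parts.append(
            f'<rect x="{label_width}" y="{y}" width="{track * rate:.1f}" '
            f'height="{bar_height}" fill="currentColor" opacity="0.7"/>'
        )
    parts.append("</svg>")
    return "".join(parts)
```

Because the markup is a plain string with no external assets, it can be embedded directly in the HTML report and shared offline. Using currentColor lets the bars and labels follow the page's text color when the SVG is inlined, which is one simple way to stay readable in both light and dark modes.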

Documentation and Template Enhancements

  • Added references to new and updated documentation for unit testing and evaluation in the skill template's metadata and documentation lists. (agentflow_cli/cli/templates/skills/agent-skills/SKILL.md [1] [2])

Miscellaneous Fixes

These changes lay the groundwork for a more robust, user-friendly, and visually informative testing and evaluation workflow in Agentflow.

@Iamsdt Iamsdt merged commit dd9267e into main May 19, 2026
1 check passed
@Iamsdt Iamsdt deleted the qa3 branch May 19, 2026 16:22
codecov Bot commented May 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
