Codecov Report: ✅ All modified and coverable lines are covered by tests.
This pull request introduces a comprehensive plan and initial implementation for improving Agentflow's testing and evaluation CLI commands and the evaluation report dashboard. The main changes include adding new CLI commands (`agentflow test` and `agentflow eval`), enhancing configuration and auto-discovery for evaluations, and redesigning the HTML report for better usability and visual clarity. Additional improvements include template updates, documentation enhancements, and minor fixes.

CLI Command Additions and Improvements

- Plan for the `agentflow test` and `agentflow eval` commands, specifying their behaviors, configuration, and integration with `agentflow.json`. The plan includes auto-discovery of evaluation files, improved report generation, and CLI options for customization. (`EVAL_CLI_PLAN.md` R1-R340)
- An `evaluation` key in the dev template's `agentflow.json` to support new evaluation features. (`agentflow_cli/cli/templates/dev/agentflow.json` L5-R11)

Evaluation Protocol and Discovery

- The evaluation protocol uses `get_eval_set()` for CLI discovery, with optional `get_eval_config()` for configuration, and `run()` as an escape hatch for custom evaluation flows. Improved warning messages for missing entry points. (`agentflow_cli/cli/commands/eval.py` [1] [2] [3])
- Updated `weather_agents_eval.py` to export a `get_eval_config()` function, making it compatible with the new CLI auto-discovery. (`agentflow_cli/cli/templates/prod/evals/weather_agents_eval.py` L1-R6)

HTML Report Redesign

- `EVAL_CLI_PLAN.md` (R1-R340)

Documentation and Template Enhancements

- `agentflow_cli/cli/templates/skills/agent-skills/SKILL.md` [1] [2]

Miscellaneous Fixes

- Excluded `.ruff_cache` from template copying to avoid unnecessary files in initialized projects. (`agentflow_cli/cli/commands/init.py` L19-R19)

These changes lay the groundwork for a more robust, user-friendly, and visually informative testing and evaluation workflow in Agentflow.
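For reviewers who want a concrete picture of the entry-point protocol described under "Evaluation Protocol and Discovery", the following is a minimal sketch of how a CLI could probe an evaluation file. It is not the code in `agentflow_cli/cli/commands/eval.py`: the helper names `load_module`, `resolve_entry_point`, and `EntryPoint`, the precedence of `get_eval_set()` over `run()`, and the shape of the returned eval set and config (plain lists and dicts here) are all assumptions made for illustration. Only the three function names `get_eval_set`, `get_eval_config`, and `run` come from the description above.

```python
import importlib.util
import tempfile
import warnings
from dataclasses import dataclass
from pathlib import Path
from types import ModuleType
from typing import Any, Callable, Optional


@dataclass
class EntryPoint:
    """Hypothetical result of probing one evaluation module."""
    kind: str  # "eval_set", "run", or "skipped"
    eval_set: Any = None
    config: Any = None
    run: Optional[Callable[[], Any]] = None


def load_module(path: Path) -> ModuleType:
    """Import a Python file from an arbitrary filesystem path."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def resolve_entry_point(module: ModuleType) -> EntryPoint:
    """Prefer get_eval_set(); fall back to run(); otherwise warn and skip."""
    get_eval_set = getattr(module, "get_eval_set", None)
    if callable(get_eval_set):
        # get_eval_config() is optional, so a missing one is not an error.
        get_config = getattr(module, "get_eval_config", None)
        config = get_config() if callable(get_config) else None
        return EntryPoint("eval_set", eval_set=get_eval_set(), config=config)

    run = getattr(module, "run", None)
    if callable(run):
        # Escape hatch: hand the callable back and let the caller invoke it.
        return EntryPoint("run", run=run)

    warnings.warn(
        f"{module.__name__}: no get_eval_set() or run() found; skipping",
        stacklevel=2,
    )
    return EntryPoint("skipped")


# Usage: write a throwaway eval file and probe it.
with tempfile.TemporaryDirectory() as tmp:
    eval_file = Path(tmp) / "demo_eval.py"
    eval_file.write_text(
        "def get_eval_set():\n"
        "    return [{'input': 'hi', 'expected': 'hello'}]\n"
        "def get_eval_config():\n"
        "    return {'threshold': 0.8}\n"
    )
    entry = resolve_entry_point(load_module(eval_file))

print(entry.kind, entry.eval_set, entry.config)
```

Returning the `run` callable instead of invoking it keeps the probing step free of side effects, which is one reasonable way to separate discovery from execution.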
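The `.ruff_cache` exclusion listed under "Miscellaneous Fixes" can be illustrated with the standard library. This sketch assumes the template is copied with `shutil.copytree`, which the description does not confirm; `copy_template` is a hypothetical helper name, and the actual change in `agentflow_cli/cli/commands/init.py` may be implemented differently.

```python
import shutil
import tempfile
from pathlib import Path


def copy_template(src: Path, dst: Path) -> None:
    """Copy a template tree into dst, leaving out any .ruff_cache directory."""
    # ignore_patterns builds a callable that copytree consults for every
    # directory it visits, so nested .ruff_cache folders are skipped too.
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns(".ruff_cache"))


# Usage: build a tiny template containing a cache directory, then copy it.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "template"
    (src / ".ruff_cache").mkdir(parents=True)
    (src / ".ruff_cache" / "stale").write_text("cache")
    (src / "agentflow.json").write_text("{}")

    dst = Path(tmp) / "project"
    copy_template(src, dst)
    copied = sorted(p.name for p in dst.iterdir())

print(copied)
```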