feat: Add eval-harness for behavioral regression testing via Promptfoo and DeepEval #30

Merged
vinodvx merged 2 commits into main from feat/eval-harness on Jun 19, 2026
vinodvx (Contributor) commented Jun 19, 2026

Description

Adds eval-harness for behavioral regression testing. The harness runs the agent against a mock LLM and mock tools, emits each run as JSON, and asserts on content, llm_usage, and telemetry.

  • Go runner + shared run_agent.sh
  • Promptfoo (eval-harness/promptfoo/)
  • DeepEval (eval-harness/deepeval/)
  • CI job on PRs
  • README updates
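
To illustrate the kind of assertion described above, here is a minimal sketch of checking a JSON run output in Python. It is not taken from the PR: the sample payload, every field name inside `content`, `llm_usage`, and `telemetry`, the `get_weather` tool, and the `check_run` helper are all hypothetical, chosen only to show the shape of such a check.

```python
import json

# Hypothetical run output; every field name and value below is an assumption
# made for illustration, not the harness's actual schema.
SAMPLE_RUN_JSON = """
{
  "content": "The weather in Paris is sunny.",
  "llm_usage": {"prompt_tokens": 42, "completion_tokens": 9, "total_tokens": 51},
  "telemetry": {"llm_calls": 2, "tool_calls": [{"name": "get_weather", "ok": true}]}
}
"""


def check_run(raw: str) -> list:
    """Return a list of readable failures; an empty list means the run passed."""
    run = json.loads(raw)
    failures = []

    # Content assertion: the mocked reply should surface in the final answer.
    if "sunny" not in run.get("content", ""):
        failures.append("content does not mention the mocked weather")

    # Usage assertion: token counts should be internally consistent.
    usage = run.get("llm_usage", {})
    expected_total = usage.get("prompt_tokens", 0) + usage.get("completion_tokens", 0)
    if expected_total != usage.get("total_tokens"):
        failures.append("llm_usage totals do not add up")

    # Telemetry assertion: the expected (hypothetical) tool was called and succeeded.
    tools = run.get("telemetry", {}).get("tool_calls", [])
    if not any(t.get("name") == "get_weather" and t.get("ok") for t in tools):
        failures.append("expected a successful get_weather tool call")

    return failures


if __name__ == "__main__":
    print(check_run(SAMPLE_RUN_JSON) or "all checks passed")
```

Because the LLM and tools are mocked, a check like this can compare against exact values; a behavioral regression then shows up as a named failure rather than a flaky diff.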

Test plan

  • [*] go run ./eval-harness/runner
  • [*] cd eval-harness/promptfoo && npx promptfoo eval -c config.yaml
  • [*] cd eval-harness/deepeval && pytest test_agent.py -v
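
The three steps above can be chained so that a failure in any one stops the sequence. The sketch below is an assumption about how one might script that locally, not the CI job from this PR: the commands and directories come from the test plan, while `TEST_PLAN`, `run_plan`, and the `dry_run` flag are hypothetical names.

```python
import subprocess
from pathlib import Path

# (working directory relative to the repo root, command), copied from the test plan.
TEST_PLAN = [
    (".", ["go", "run", "./eval-harness/runner"]),
    ("eval-harness/promptfoo", ["npx", "promptfoo", "eval", "-c", "config.yaml"]),
    ("eval-harness/deepeval", ["pytest", "test_agent.py", "-v"]),
]


def run_plan(repo_root: str, dry_run: bool = False) -> list:
    """Run each step in order, stopping at the first failure; return the steps handled."""
    done = []
    for rel_dir, cmd in TEST_PLAN:
        cwd = Path(repo_root) / rel_dir
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # which aborts the remaining steps.
            subprocess.run(cmd, cwd=cwd, check=True)
        done.append((str(cwd), " ".join(cmd)))
    return done


if __name__ == "__main__":
    # Dry run only prints what would be executed.
    for cwd, cmd in run_plan(".", dry_run=True):
        print(f"{cwd}: {cmd}")
```

Passing `cwd` to `subprocess.run` replaces the `cd` in the original commands, so each step starts from the repository root regardless of what the previous step did.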

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Refactor / chore

Related issues

Closes #28

Checklist

  • I have run make check
  • I have run task examples:all
  • I have run make tidy if I added or removed dependencies
  • Commit messages follow conventional commits (e.g. feat:, fix:, docs:)
  • I have added/updated tests for my changes
  • Documentation is updated if needed

vinodvx merged commit 806ca42 into main on Jun 19, 2026
4 checks passed
vinodvx deleted the feat/eval-harness branch on June 19, 2026 04:11
Development

Successfully merging this pull request may close these issues.

Eval harness - Add AgentTelemetry to AgentRunResult