Add refund-agent eval-fix behavior pair (deterministic shield bypass + semantic policy reasoning)#82
Copilot wants to merge 2 commits into
changliu2 left a comment
Review — Request changes
The two spec files (refund_authorization_bypass.md, refund_policy_reasoning_errors.md) are well-scoped: clean deterministic / semantic split, useful as starting points for a refund-agent demo. README pointer is fine.
What's missing relative to the asked scope (build a runnable 3-step eval-fix story mirroring the banking demo on the refund-agent at https://github.com/changliu2/refund-agent-a365):
- No `examples/refund_agent_*/` directory (no agent code, no MCP/tool plumbing, no `target.callable`)
- No `eval_config_*.yaml` files (A/B/C variants)
- No `guardrails.yaml` with the deterministic shield gates the `authorization_bypass` spec implies
- No README with the 3-step story / headline table
- No measured numbers, so nothing to validate the specs against
Net delivery is ~10% of the asked scope. Recommend one of:
- Close this PR and re-delegate with a stricter prompt requiring agent + configs + a runnable n=100 baseline
- Re-scope this PR to "behavior specs only" in the title + description and merge as scaffolding for the demo work to follow
- Extend this PR with the missing pieces
Happy with any of the three — flagging because as-is this doesn't deliver the demo it was opened to build.
Closing in favor of #88, the bank-manager 4-axis demo that's now landed n=100 numbers across four variants (unguarded → ACS gates → naïve DO-NOT prompt → ACS + GEPA prompt-optimized) and is the converged //build 2026 demo storyline. The refund-agent eval-fix pair here was a useful early exploration of the deterministic-vs-semantic shield pattern, but PR #88 covers the same story arc more completely in the banking domain (9 judge dims, a real trade-off chart, and a GEPA notebook) and was where we converged the demo scope. Closing this PR. Thanks for the work; the framing influenced PR #88's 3-act structure.
This PR adds a demo-ready behavior pair for the refund-agent eval-fix loop: one deterministic behavior intended to be fully mitigated by Agent Shield, and one semantic behavior that captures policy reasoning quality.
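To make the deterministic side of the pair concrete, here is a minimal sketch of the kind of rule a deterministic gate could enforce for an authorization-bypass behavior. Everything in it is an assumption made for illustration: `RefundRequest`, `APPROVAL_LIMIT`, and `authorization_gate` are hypothetical names and values, not taken from Agent Shield, the spec files, or the refund-agent repo.

```python
from dataclasses import dataclass

# Hypothetical threshold; the real policy value is not stated in this PR.
APPROVAL_LIMIT = 100.00


@dataclass(frozen=True)
class RefundRequest:
    """Illustrative shape of a refund tool call (assumed, not from the repo)."""
    order_id: str
    amount: float
    manager_approved: bool = False


def authorization_gate(req: RefundRequest) -> tuple[bool, str]:
    """Deterministic check run before a refund tool call executes.

    Returns (allowed, reason). The same input always yields the same
    verdict, independent of how the agent was prompted.
    """
    if req.amount <= 0:
        return False, "amount must be positive"
    if req.amount > APPROVAL_LIMIT and not req.manager_approved:
        return False, "manager approval required above limit"
    return True, "ok"


if __name__ == "__main__":
    print(authorization_gate(RefundRequest("A-1", 40.0)))
    print(authorization_gate(RefundRequest("A-2", 250.0)))
    print(authorization_gate(RefundRequest("A-3", 250.0, manager_approved=True)))
```

A rule expressed as a predicate over tool-call arguments can be blocked on every occurrence, which is what makes a "fully mitigated" claim plausible for the deterministic behavior. Policy reasoning errors generally cannot be reduced to a predicate like this, so the semantic behavior would more likely be scored by a judge than gated.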
What this enables

Behavior specs added
- `examples/behavior_specs/refund_authorization_bypass.md`
- `examples/behavior_specs/refund_policy_reasoning_errors.md`

Reference index update
- `examples/behavior_specs/README.md`

Example usage in eval config:
Original prompt
Follow the demo storyline: find 2 behaviors (1 deterministic that Agent Shield can fix 100%, and 1 semantic) to build an eval-fix loop demo on this refund agent scenario. git@github.com:changliu2/refund-agent-a365.git
**Chronological Review:**
1. Session began with summarized prior context: //build 2026 demo prep, with prior work on validating RM-v3 and AS-MCP demos at n=400, and a planned 4-PR sequence.
2. User had just confirmed "1a yes 3 use the latest repo version and yes 4 yes": green light on all 4 PRs, plus two viewer issues (duplicate policy_viol column, BY BEHAVIOR header overlap).
3. Previous work in this session had: removed `policy_viol` from 9 YAMLs, fixed viewer compare page label/truncation, added `agent_shield` extra to pyproject.toml, rewrote AS-MCP README.
4. Current session resumed by: updating RM-v3 README with n=400 numbers, creating cli-override-flag branch via cherry-pick from RM-v3, smoke-testing the override logic, committing + pushing 4 branches in parallel, drafting 4 PR bodies, getting user approval, opening 4 PRs via `gh pr create`, then cleaning up worktrees.

**Intent Mapping:**

**Technical Inventory:**
- `gh pr create` for PR opening
- `microsoft/adaptive-eval` → `microsoft/ASSERT` (GitHub redirects transparent)
- `git add -A`, no public PR posting without user approval, Copilot co-author trailer

**Code Archaeology:**
- `test_set.sample_size=10` → 5/5 split

**Progress Assessment:**

**Context Validation:**

**Recent Commands Analysis:**
Last commands were: (1) parallel `gh pr create` for #80 and #81, (2) verification of PR list, (3) worktree cleanup (one needed `--force` due to validate-as-mcp containing modified/untracked files from prior validation run).

1. Conversation Overview:
- Primary Objectives: //build 2026 demo prep for Chang Liu (PM DRI). User's most recent confirmation: "1a yes 3 use the latest repo version and yes 4 yes", a green light to fix the rubric bug, fix duplicate viewer columns, and open 4 PRs in sequence (viewer → CLI override → RM-v3 → AS-MCP).
- Session Context: Continuation after compaction. Picked up mid-Wave-1 with most files already edited but uncommitted. This session: updated RM-v3 README, created cli-override-flag branch, smoke-tested override, committed + pushed all 4 branches, drafted PR bodies, got user approval ("Post all 4 as drafted"), opened all 4 PRs successfully, cleaned worktrees.
- User Intent Evolution: Simple linear execution of pre-approved 4-PR plan. No new asks during this session.

2. Technical Foundation:
- Git worktrees: 4 parallel worktrees (p2m-wt-viewer, p2m-wt-cli, p2m-validate-rm-v3, p2m-validate-as-mcp) plus main checkout; all 4 worktrees now removed.
- GitHub CLI: `gh pr create --base main --head <branch> --title ... --body-file .pr-body.md` pattern
- PowerShell here-strings (`@'...'@`) for multi-line PR bodies
- Repo rename: remote URL `https://github.com/microsoft/adaptive-eval.git` still works; GitHub redirects to actual repo `microsoft/ASSERT`
- Boundary audit regex on each commit's `--cached --name-only` (passed all 4)

3. Codebase Status:
- `examples/private_banking_rm_v3_langchain/README.md` (branch `private-banking-rm-v3-langchain`):
  - Updated this session: replaced old result tables with n=400 numbers
  - Deterministic table: email_domain_viol/rm_book_viol/sanctions_viol/overrefusal, with policy_violation master 76→12→2% note
  - Legal/tax table: legal_advice_viol 88.8%→88.0%→0.0%, overrefusal 72.3→72.0→66.3
  - Committed in `1b4746b` with 6 YAML files (policy_viol block removal)
- `p2m/cli.py` + `p2m/runner.py` (branch `cli-override-flag`):
  - Cherry-picked from RM-v3 commit `8620b14` via `git checkout origin/private-banking-rm-v3-langchain -- p2m/cli.py p2m/runner.py`
  - +44/-2 across 2 files; committed as `4a8a5b8`
  - `_apply_config_overrides(raw, overrides)` at runner.py:70, which supports the `test_set.sample_size=N` shortcut (splits N half-half: cei...

Created from Copilot CLI via the copilot delegate command.
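
The `test_set.sample_size=N` override shortcut mentioned in the session summary can be illustrated with a rough sketch. This is not the code in `p2m/runner.py`: the function below is a hypothetical stand-in, the bucket names `group_a` / `group_b` are placeholders because the summary is truncated before naming the two halves, and which half receives the extra item for odd N is also an assumption.

```python
import copy
from typing import Any


def apply_config_overrides(raw: dict[str, Any], overrides: list[str]) -> dict[str, Any]:
    """Return a copy of `raw` with dotted `key.path=value` overrides applied."""
    cfg = copy.deepcopy(raw)  # never mutate the caller's config
    for item in overrides:
        path, sep, value = item.partition("=")
        if not sep or not path:
            raise ValueError(f"expected key.path=value, got {item!r}")

        if path == "test_set.sample_size":
            # Shortcut: split N across two buckets (placeholder names).
            n = int(value)
            test_set = cfg.setdefault("test_set", {})
            test_set["group_a"] = n // 2        # floor half (assumed)
            test_set["group_b"] = n - n // 2    # ceiling half (assumed)
            continue

        # Generic case: walk the dotted path, creating dicts as needed.
        # Values are kept as strings in this sketch.
        node = cfg
        *parents, leaf = path.split(".")
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return cfg


if __name__ == "__main__":
    print(apply_config_overrides({"test_set": {}}, ["test_set.sample_size=10"]))
```

With `test_set.sample_size=10` the two buckets each get 5, matching the "5/5 split" noted under Code Archaeology.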