Release v0.3.19 by placerda · Pull Request #298 · Azure/agentops

placerda · 2026-06-10T17:36:28Z

Release v0.3.19

Automated release branch created from develop.

What happened

Branch release/v0.3.19 created from develop
CHANGELOG.md updated: versioned section [0.3.19] added
Plugin versions synced to 0.3.19 (package.json, plugin.json, marketplace.json)
Staging pipeline triggered automatically (build → TestPyPI + VSIX pre-release → verify)

Next steps

Wait for the Staging pipeline to pass
Review and approve this PR
Merge to main
Tag and push: git tag v0.3.19 && git push origin v0.3.19
Approve the PyPI publish and VSIX stable publish in the Release workflow
Sync develop: git checkout develop && git merge main && git push origin develop

Checklist

Staging pipeline passes (build + TestPyPI + VSIX pre-release + verify)
CHANGELOG entries reviewed
PR approved and merged to main
Tag v0.3.19 pushed
PyPI publish approved
VSIX stable publish approved
develop synced from main

The `What is smoke-core?` callout previously sat in step 10 right after `agentops eval init`, but the evaluator name is only actually needed in step 11 where `Find the evaluator name` already explains the same content in proper context. Removing the duplicate keeps step 10 focused on running the smoke gate.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Step 11 previously told readers to `find the entry with a local_uri` without saying what that evaluator (`smoke-core` in the seeded eval.yaml) actually is or where it came from. Add a short paragraph right before `Find the evaluator name` that explains the difference between the built-in evaluators and the auto-generated local rubric evaluator, and why `rubrics:` in agentops.yaml reference its name.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

) azd ai agent eval emits one aggregate pass-rate metric per evaluator (coherence, fluency, smoke-core), not one metric per rubric dimension. Step 11.3 previously instructed readers to set thresholds on the dimension ids (correct_itinerary, adherence_to_constraints, clear_practical_notes), which always fails with `threshold metric(s) not found in azd results`. Switch the example thresholds to the evaluator names azd actually emits (0..1 pass-rate scale) and add a callout explaining why dimension-level thresholds are not supported today.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
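The failure mode described above can be illustrated with a minimal sketch. This is not the agentops implementation: `check_thresholds`, the metric values, and the threshold values are hypothetical, and only the evaluator names, the dimension id, and the error text come from the commit message.

```python
def check_thresholds(metrics, thresholds):
    """Return the metric names that fall below their minimum pass rate.

    Raises ValueError when a threshold names a metric that is not present,
    which is what happens when a rubric dimension id is used instead of
    an evaluator name.
    """
    missing = sorted(name for name in thresholds if name not in metrics)
    if missing:
        raise ValueError(
            "threshold metric(s) not found in azd results: " + ", ".join(missing)
        )
    return [name for name, minimum in thresholds.items() if metrics[name] < minimum]


# Hypothetical aggregate pass rates, one per evaluator, on a 0..1 scale.
azd_metrics = {"coherence": 0.9, "fluency": 1.0, "smoke-core": 0.8}

# Thresholds keyed by evaluator name resolve against the emitted metrics.
failed = check_thresholds(azd_metrics, {"coherence": 0.8, "smoke-core": 0.9})

# A threshold keyed by a rubric dimension id cannot be resolved.
try:
    check_thresholds(azd_metrics, {"correct_itinerary": 0.8})
except ValueError as exc:
    dimension_error = str(exc)
```

In this sketch the lookup is by metric name only, so a dimension-level threshold can never match, regardless of how the judge scored that dimension.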

The tutorial previously stopped at the terminal's aggregate pass/fail output, which makes the eval steps feel like a black box. Add a reusable `See the run in the Foundry portal` section at the end of step 10 (deep-link via results report_url + manual nav + what each pane shows) and link back to it from the step 11 multi-turn re-run and the step 11.4 rubric re-run. Step 11.4 also gets an explicit walkthrough of the `View rubric details` modal so users see the per-dimension judge scores even though the CLI threshold lives on the aggregate.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…k, terser logs (#297)

Four user-visible issues with `execution: azd` runs:

1. `report.md` shipped an empty `Dataset:` line because the eval.yaml parser only recognized `dataset_reference:` and `azd ai agent eval init` actually emits `dataset_file:`. EvalRecipe now accepts `dataset_file:` and `_recipe_dataset_path` prefers it.
2. `report.md` shipped a header-only `## Rows` table on every azd run (azd does not expose per-row metrics through `agentops eval run`). The reporter now omits the row sections when `result.rows` is empty and instead emits a `## Per-row breakdown` callout linking to the Foundry run.
3. The CLI did not surface the Foundry run URL. `azd_runner.normalize_to_results` now captures `report_url` from the azd payload and the CLI prints a `Foundry run:` line next to the results paths.
4. `Running azd backend: azd --no-prompt ai agent eval run --config <long absolute path> --output json` was unreadable; replaced with `Running azd backend: azd ai agent eval run` (full command stays in the failure debug logs added in 0.3.18). The `delegating to azd ai agent eval` startup line also uses a workspace-relative recipe path.

921 unit tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
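A rough sketch of the first two fixes, written under stated assumptions rather than copied from the repository: `recipe_dataset_path` and `render_report` are hypothetical stand-ins, the recipe is modeled as a plain dict, and the row fields (`id`, `score`) and report wording are invented for illustration. Only the key names `dataset_file` and `dataset_reference`, the `## Rows` and `## Per-row breakdown` headings, and `report_url` come from the commit message.

```python
def recipe_dataset_path(recipe):
    """Prefer dataset_file, falling back to dataset_reference if it is absent."""
    return recipe.get("dataset_file") or recipe.get("dataset_reference")


def render_report(dataset, rows, report_url=None):
    """Build a small markdown report, skipping the row table when rows is empty."""
    lines = ["# Report", f"Dataset: {dataset or 'unknown'}"]
    if rows:
        lines.append("## Rows")
        lines.append("| id | score |")
        lines.append("| --- | --- |")
        lines.extend(f"| {row['id']} | {row['score']} |" for row in rows)
    else:
        # No per-row metrics available: point the reader at the hosted run.
        lines.append("## Per-row breakdown")
        target = report_url or "the Foundry run"
        lines.append(f"Per-row metrics are not available here; see {target}.")
    return "\n".join(lines)


# Hypothetical recipe shaped like the output of an init step that writes dataset_file.
recipe = {"dataset_file": "data/smoke.jsonl"}
report = render_report(recipe_dataset_path(recipe), rows=[],
                       report_url="https://example.invalid/run")
```

The point of the fallback order is that a recipe carrying either key still yields a non-empty `Dataset:` line, and an empty row list produces a pointer to the run instead of a header-only table.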

placerda and others added 6 commits June 10, 2026 11:22

chore: prepare release 0.3.19

62ddf96

placerda temporarily deployed to staging June 10, 2026 17:39 — with GitHub Actions Inactive

placerda merged commit 77b3dd7 into main Jun 10, 2026
5 checks passed

placerda deleted the release/v0.3.19 branch June 10, 2026 17:54

placerda merged 6 commits into main from release/v0.3.19
