Release v0.3.19 #298

Merged
placerda merged 6 commits into main from release/v0.3.19 on Jun 10, 2026
@placerda

Contributor

Release v0.3.19

Automated release branch created from develop.

What happened

  • Branch release/v0.3.19 created from develop
  • CHANGELOG.md updated: versioned section [0.3.19] added
  • Plugin versions synced to 0.3.19 (package.json, plugin.json, marketplace.json)
  • Staging pipeline triggered automatically (build → TestPyPI + VSIX pre-release → verify)

Next steps

  1. Wait for the Staging pipeline to pass
  2. Review and approve this PR
  3. Merge to main
  4. Tag and push: git tag v0.3.19 && git push origin v0.3.19
  5. Approve the PyPI publish and VSIX stable publish in the Release workflow
  6. Sync develop: git checkout develop && git merge main && git push origin develop

Checklist

  • Staging pipeline passes (build + TestPyPI + VSIX pre-release + verify)
  • CHANGELOG entries reviewed
  • PR approved and merged to main
  • Tag v0.3.19 pushed
  • PyPI publish approved
  • VSIX stable publish approved
  • develop synced from main

placerda and others added 6 commits June 10, 2026 11:22
The `What is smoke-core?` callout previously sat in step 10 right after `agentops eval init`, but the evaluator name is only actually needed in step 11 where `Find the evaluator name` already explains the same content in proper context. Removing the duplicate keeps step 10 focused on running the smoke gate.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Step 11 previously told readers to `find the entry with a local_uri` without saying what that evaluator (`smoke-core` in the seeded eval.yaml) actually is or where it came from. Add a short paragraph right before `Find the evaluator name` that explains the difference between the built-in evaluators and the auto-generated local rubric evaluator, and why `rubrics:` in agentops.yaml references its name.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
azd ai agent eval emits one aggregate pass-rate metric per evaluator (coherence, fluency, smoke-core), not one metric per rubric dimension. Step 11.3 previously instructed readers to set thresholds on the dimension ids (correct_itinerary, adherence_to_constraints, clear_practical_notes), which always fails with `threshold metric(s) not found in azd results`. Switch the example thresholds to the evaluator names azd actually emits (0..1 pass-rate scale) and add a callout explaining why dimension-level thresholds are not supported today.
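
The mismatch described above can be sketched as a plain name check. This is a hypothetical illustration, not the agentops implementation: `check_thresholds` and the pass-rate values are invented for the example, and only the evaluator names, the dimension ids, and the error text come from the commit message.

```python
from __future__ import annotations


def check_thresholds(
    thresholds: dict[str, float], azd_metrics: dict[str, float]
) -> list[str]:
    """Return the evaluators whose pass rate falls below its threshold.

    Raises ValueError when a threshold names a metric that was not emitted.
    (Hypothetical helper; the real check lives inside agentops.)
    """
    missing = sorted(name for name in thresholds if name not in azd_metrics)
    if missing:
        raise ValueError(
            "threshold metric(s) not found in azd results: " + ", ".join(missing)
        )
    return [
        name for name, minimum in thresholds.items() if azd_metrics[name] < minimum
    ]


# Illustrative pass rates on the 0..1 scale, one aggregate value per evaluator.
azd_metrics = {"coherence": 0.9, "fluency": 1.0, "smoke-core": 0.75}

# Thresholds keyed by evaluator name resolve against the emitted metrics.
failures = check_thresholds({"coherence": 0.8, "smoke-core": 0.8}, azd_metrics)

# Thresholds keyed by a rubric dimension id have nothing to resolve against.
try:
    check_thresholds({"correct_itinerary": 0.8}, azd_metrics)
    dimension_error = ""
except ValueError as exc:
    dimension_error = str(exc)
```

Under these assumed values, only `smoke-core` falls below its threshold, while the dimension-level threshold fails before any comparison happens.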

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The tutorial previously stopped at the terminal's aggregate pass/fail output, which makes the eval steps feel like a black box. Add a reusable `See the run in the Foundry portal` section at the end of step 10 (deep-link via results report_url + manual nav + what each pane shows) and link back to it from the step 11 multi-turn re-run and the step 11.4 rubric re-run. Step 11.4 also gets an explicit walkthrough of the `View rubric details` modal so users see the per-dimension judge scores even though the CLI threshold lives on the aggregate.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…k, terser logs (#297)

Four user-visible issues with `execution: azd` runs:

1. `report.md` shipped an empty `Dataset:` line because the eval.yaml parser only recognized `dataset_reference:` and `azd ai agent eval init` actually emits `dataset_file:`. EvalRecipe now accepts `dataset_file:` and `_recipe_dataset_path` prefers it.
2. `report.md` shipped a header-only `## Rows` table on every azd run (azd does not expose per-row metrics through `agentops eval run`). The reporter now omits the row sections when `result.rows` is empty and instead emits a `## Per-row breakdown` callout linking to the Foundry run.
3. The CLI did not surface the Foundry run URL. `azd_runner.normalize_to_results` now captures `report_url` from the azd payload and the CLI prints a `Foundry run:` line next to the results paths.
4. `Running azd backend: azd --no-prompt ai agent eval run --config <long absolute path> --output json` was unreadable; replaced with `Running azd backend: azd ai agent eval run` (full command stays in the failure debug logs added in 0.3.18). The `delegating to azd ai agent eval` startup line also uses a workspace-relative recipe path.
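
Items 1 and 2 above could look roughly like the following. This is a minimal sketch under stated assumptions, not the code from the PR: the function names, the dict-based recipe, the table columns, and the callout wording are all hypothetical. Only the `dataset_file:` / `dataset_reference:` keys, the `## Rows` and `## Per-row breakdown` headings, and `report_url` come from the commit message.

```python
from __future__ import annotations


def recipe_dataset_path(recipe: dict) -> str | None:
    """Prefer `dataset_file:` and fall back to `dataset_reference:`."""
    return recipe.get("dataset_file") or recipe.get("dataset_reference")


def render_rows_section(rows: list[dict], report_url: str | None) -> str:
    """Render the rows table, or a per-row callout when there are no rows."""
    if rows:
        # Hypothetical two-column layout; the real report has its own columns.
        lines = ["## Rows", "", "| id | passed |", "| --- | --- |"]
        lines += [f"| {row['id']} | {row['passed']} |" for row in rows]
        return "\n".join(lines)
    # No per-row data (as on azd runs): point the reader at the Foundry run.
    target = report_url or "the Foundry run"
    return (
        "## Per-row breakdown\n\n"
        f"> Per-row metrics are not available for this run; see {target}."
    )


# A recipe shaped like one that only carries `dataset_file:`.
dataset = recipe_dataset_path({"dataset_file": "data/smoke.jsonl"})

# An azd-style result with no rows and a placeholder run URL.
section = render_rows_section([], "https://example.invalid/run/1")
```

Checking `rows` before emitting any heading means the report never contains a header-only table.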

921 unit tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@placerda placerda merged commit 77b3dd7 into main Jun 10, 2026
5 checks passed
@placerda placerda deleted the release/v0.3.19 branch June 10, 2026 17:54