fix: omit unsupported GPT-5 temperature overrides by jakepresent · Pull Request #95 · responsibleai/ASSERT

jakepresent · 2026-05-26T18:29:32Z

Summary

- Omit non-default temperature values from GPT-5.x LiteLLM payloads, including Azure deployment names such as azure/gpt-5.4-1.
- Keep an explicit default temperature=1, and keep custom temperatures for non-GPT-5 models.
- Cover chat payload normalization with focused model-client tests.

Validation

python -m pytest tests/test_model_client.py -q
python -m pytest tests/test_model_client.py tests/test_runtime_modes.py -q

Bug bash context

During the setup smoke test, GPT-5.4 Azure deployments rejected configured temperatures such as 0.0 and 0.2 because they accept only the provider's default temperature. This change keeps existing configs from failing when the selected deployment is GPT-5.x.

Copilot

Pull request overview

This PR updates request payload construction in p2m so that unsupported non-default temperature values are no longer sent to GPT-5.x deployments (notably Azure deployment-name models). It also extends the viewer's metrics pipeline and UI to show policy-violation rates split by behavior permissibility (allowed vs. blocked requests), using behavior data loaded from the judge taxonomy.

Changes:

- Omit non-default temperature overrides for GPT-5.x models when building LiteLLM chat and Responses API payloads, while preserving an explicit temperature=1 and custom temperatures for non-GPT-5 models.
- Add server-side metric aggregation for policy-violation outcomes split by behavior permissibility, and surface it in the viewer run page UI on both the prompt and audit tabs.
- Add viewer-side taxonomy loading helpers, plus tests covering permissibility aggregation and taxonomy loading behavior.
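The permissibility split can be illustrated with a short Python sketch of the aggregation logic. The real implementation is TypeScript in viewer/src/lib/server/metrics.ts; the field names (`behavior_id`, `relevant`, `violation`) and the behavior-to-permissible mapping here are assumptions. Following the UI tooltip, the denominator for each group is the number of judgments the judge marked relevant for that behavior.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Judgment:
    behavior_id: str
    relevant: bool   # assumption: judge flags whether the behavior applies
    violation: bool  # assumption: True when the response violated policy


@dataclass
class ViolationAggregate:
    count: int       # denominator: relevant judgments in this group
    violations: int  # numerator: relevant judgments marked as violations

    @property
    def rate(self) -> Optional[float]:
        # None rather than 0.0 when there is nothing to divide by.
        return self.violations / self.count if self.count else None


def split_by_permissibility(
    judgments: Iterable[Judgment], permissible: dict[str, bool]
) -> tuple[ViolationAggregate, ViolationAggregate]:
    """Return (permissible, not_permissible) aggregates.

    `permissible` maps behavior id -> whether requests for that behavior
    should be allowed, as loaded from the judge taxonomy (hypothetical shape).
    Behaviors missing from the mapping are skipped.
    """
    perm = ViolationAggregate(0, 0)
    not_perm = ViolationAggregate(0, 0)
    for j in judgments:
        if not j.relevant or j.behavior_id not in permissible:
            continue
        bucket = perm if permissible[j.behavior_id] else not_perm
        bucket.count += 1
        bucket.violations += int(j.violation)
    return perm, not_perm
```

Skipping behaviors absent from the taxonomy keeps the two groups disjoint, which matches the UI showing the "Allowed" and "Blocked" cards side by side only when their combined count is non-zero.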

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.


File	Description
viewer/src/routes/suite/[suite_id]/[run_id]/+page.svelte	Renders new “Allowed/Blocked requests failed” summary cards for prompts and audits.
viewer/src/lib/types.ts	Extends `RunMetrics`/`AuditRunMetrics` types with permissibility-split fields.
viewer/src/lib/server/metrics.ts	Computes policy-violation aggregates split by permissibility from node judgments.
viewer/src/lib/server/data.ts	Loads behaviors from taxonomy and threads them into metrics computation + view models.
viewer/src/lib/server/artifacts.ts	Adds helpers to load judge taxonomy from artifacts/config/run directory.
tests/test_viewer_server_artifacts.py	Adds Node harness tests for permissibility metrics + taxonomy loading.
tests/test_model_client.py	Adds focused tests for GPT-5 temperature omission/retention behavior.
p2m/core/model_client.py	Implements GPT-5 temperature override omission in payload builders.

+		{#if data.metrics.policyViolationOnPermissible || data.metrics.policyViolationOnNotPermissible}
+			{@const promptPerm = data.metrics.policyViolationOnPermissible}
+			{@const promptNotPerm = data.metrics.policyViolationOnNotPermissible}
+			{#if (promptPerm?.count ?? 0) + (promptNotPerm?.count ?? 0) > 0}
+				<div class="mb-4 grid gap-3 sm:grid-cols-2" title="Per-behavior judgments aggregated across prompts. Denominator is judgments the judge marked relevant for that behavior.">


+	const rawTaxonomyPath = typeof judge?.taxonomy_path === 'string' ? judge.taxonomy_path : null;
+	if (!rawTaxonomyPath) return null;
+
+	const resolved = path.resolve(rawTaxonomyPath);
+	return readJsonFile<Taxonomy>(resolved, { missingOk: true });


+    temperature = _temperature_for_payload(model, resolved_options.temperature)
+    if temperature is not None:
+        payload["temperature"] = temperature


+	artifacts: Record<string, unknown> | null
+): Taxonomy | null {
+	const systematize = readObject(artifacts?.systematize);
+	const artifactTaxonomyPath = typeof systematize?.path === 'string' ? systematize.path : null;
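The two TypeScript excerpts read a taxonomy path either from `artifacts.systematize.path` or from the judge config's `taxonomy_path`, tolerating a missing file. A minimal Python sketch of such a fallback chain follows. The lookup order (artifact, then config, then a run-directory file), the `taxonomy.json` filename, and all function names are assumptions. Note that Node's `path.resolve` resolves a relative path against the process working directory; this sketch instead resolves relative paths against an explicit run directory, which is also an assumption about the intended behavior.

```python
import json
from pathlib import Path
from typing import Any, Optional


def read_json_file(path: Path, missing_ok: bool = False) -> Optional[Any]:
    """Parse JSON at `path`; return None if it is missing and missing_ok is set."""
    try:
        with path.open(encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        if missing_ok:
            return None
        raise


def load_judge_taxonomy(
    artifacts: Optional[dict[str, Any]],
    judge_config: Optional[dict[str, Any]],
    run_dir: Path,
) -> Optional[Any]:
    """Return the first taxonomy found, or None (hypothetical fallback order)."""
    candidates: list[Path] = []

    systematize = (artifacts or {}).get("systematize")
    if isinstance(systematize, dict) and isinstance(systematize.get("path"), str):
        candidates.append(Path(systematize["path"]))

    raw = (judge_config or {}).get("taxonomy_path")
    if isinstance(raw, str) and raw:
        candidates.append(Path(raw))

    # Assumption: a run directory may hold its own copy named taxonomy.json.
    candidates.append(Path("taxonomy.json"))

    for candidate in candidates:
        # Resolve relative paths against the run directory, not the process cwd.
        resolved = candidate if candidate.is_absolute() else run_dir / candidate
        taxonomy = read_json_file(resolved, missing_ok=True)
        if taxonomy is not None:
            return taxonomy
    return None
```

Returning None when nothing is found lets the caller skip the permissibility cards instead of failing the whole run page.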


changliu2 · 2026-05-30T16:30:23Z

@jakepresent: Build triage on the internal UX-testing chat feedback (5/28–5/29) flagged this PR as the central fix for #50. With #150 merged (which removed the literal temperature: 0.7 from the example YAMLs), this PR becomes the durable fix, so future GPT-5 SKUs won't paper-cut users who set custom temperatures.

The PR is still on the old p2m/ paths and needs a rebase onto current main, which has the assert_eval/ rename. Could you rebase? If not, I can do a path-rename pass on a side branch for you to cherry-pick.

Full triage rollup in session artifact: build-triage-final-rollup.md.

jakepresent added 2 commits May 26, 2026 14:21

viewer: split policy violation metrics by permissibility

5c794d3

fix: omit unsupported GPT-5 temperature overrides

86ae430

jakepresent requested a review from changliu2 May 26, 2026 18:35

jakepresent force-pushed the jakepresent/gpt5-temperature-normalization branch 2 times, most recently from f294981 to 86ae430 May 26, 2026 18:56

AaronAspinwall123 requested a review from Copilot May 26, 2026 19:31

Copilot started reviewing on behalf of AaronAspinwall123 May 26, 2026 19:31

Copilot AI reviewed May 26, 2026

jakepresent wants to merge 2 commits into main from jakepresent/gpt5-temperature-normalization

3 participants
