[Evaluation] Fix RedTeam.scan() decoding encoded attack prompts in stored results by huliang-microsoft · Pull Request #47536 · Azure/azure-sdk-for-python

huliang-microsoft · 2026-06-16T23:59:27Z

Problem

RedTeam.scan() was storing the decoded plaintext objective in
evaluation_results.json / results.json for every converter-based attack
strategy (Base64, Flip, Morse, ROT13, Caesar, Leetspeak,
AsciiArt, AnsiAttack, Atbash, Binary, CharacterSpace, CharSwap,
Diacritic, StringJoin, SuffixAppend, UnicodeConfusable,
UnicodeSubstitution, Url, AsciiSmuggler, Tense). The target callback
actually received the encoded converted_value, but persisted
attack_details[].conversation[].content was the pre-converter
original_value. Customer impact (from #47228):

Impossible to audit/verify the attack surface post-scan.
Cannot debug why specific encoding variants succeeded / failed.
Cannot correlate target responses to the exact encoded prompts received.
attack_technique metadata correctly said e.g. "base64", but
conversation content was plaintext — breaking reproducibility.

Root cause

In FoundryResultProcessor._build_messages_from_pieces, user turns
deliberately preferred original_value over converted_value, dropping the
on-wire payload from the persisted conversation.

Fix

content now always reflects what was sent on the wire
(converted_value, falling back to original_value only when
converted_value is empty) for both user and assistant turns.
The pre-converter adversarial objective is preserved as a new sibling
original_value field on user messages only when it differs from
content, so the audit trail of "what the attack meant to say" is not
lost and baseline (non-encoded) strategies stay byte-identical to the
old output.

Tests

Updated test_build_messages_from_pieces to assert no original_value
field appears when there is no encoding (no regression for baseline).
Added test_build_messages_preserves_encoded_user_prompt (Base64
example) — primary regression test for Azure AI Evaluation's RedTeam.scan() method decodes **all** encoded attack prompts when storing the result files #47228.
Added test_build_messages_falls_back_to_original_when_converted_missing
— covers converted_value is None path.

CHANGELOG

Added a 1.17.1 (Unreleased) entry under ### Bugs Fixed.

cc @singankit @w-javed

For converter-based attack strategies (Base64, Flip, Morse, ROT13, Caesar, Leetspeak, AsciiArt, AnsiAttack, Atbash, Binary, CharacterSpace, CharSwap, Diacritic, StringJoin, SuffixAppend, UnicodeConfusable, UnicodeSubstitution, Url, AsciiSmuggler, Tense), FoundryResultProcessor was emitting the decoded 'original_value' as the user-message content while the target was actually receiving 'converted_value'. This made evaluation_results.json / results.json show plaintext where the audit trail should show the encoded payload, breaking post-scan auditability and per-variant debugging. This change makes conversation[].content always reflect the on-wire value (converted_value) for both user and assistant turns, and preserves the pre-converter objective as a sibling 'original_value' field on user messages whenever it differs. Baseline (non-encoded) strategies are unaffected since original_value == converted_value. Adds two regression tests in TestFoundryResultProcessor and a CHANGELOG entry. Resolves Azure#47228.

github-actions · 2026-06-17T00:00:02Z

Thank you for your contribution @huliang-microsoft! We will review the pull request and get back to you soon.

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Fixes persisted red-team conversations so they reflect the actual encoded payload sent to the target (converter output), while preserving the pre-conversion prompt for auditability.

Changes:

Update FoundryResultProcessor._build_messages_from_pieces() to store converted_value as content and add original_value only when it differs.
Add unit tests covering encoded user prompts and fallback behavior when converted_value is missing.
Document the bug fix and new persisted field in the changelog.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/_foundry/_foundry_result_processor.py	Changes message serialization to persist wire payload and optionally include `original_value` for auditing.
sdk/evaluation/azure-ai-evaluation/tests/unittests/test_redteam/test_foundry.py	Adds regression + behavior tests for encoded prompts and fallback behavior.
sdk/evaluation/azure-ai-evaluation/CHANGELOG.md	Documents the behavior change and new `original_value` field for persisted conversations.

huliang-microsoft · 2026-06-17T00:06:12Z

+            original = getattr(piece, "original_value", None)
+            converted = getattr(piece, "converted_value", None)
+            if isinstance(converted, str) and converted:
+                content = converted
+            elif isinstance(original, str) and original:
+                content = original
            else:
-                content = getattr(piece, "converted_value", None) or getattr(piece, "original_value", "")
+                content = ""


Addressed in 493577f — dropped the isinstance(str) guards around content selection so non-string converted_value / original_value (bytes, structured/multimodal payloads) pass through unchanged. The str check is kept only on the original_value audit-field emission, where comparing two non-text values for inequality would be meaningless.

huliang-microsoft · 2026-06-17T00:06:13Z

+        assistant_piece.prompt_metadata = {}
+        assistant_piece.labels = {}
+
+        messages = processor._build_messages_from_pieces([user_piece, assistant_piece])


Added in 493577f — test_build_messages_preserves_non_string_payloads covers a list-of-parts user payload and a bytes assistant payload, asserting both survive on content without being coerced to "".

huliang-microsoft · 2026-06-17T00:06:15Z

+
+### Bugs Fixed
+
+- Fixed `RedTeam.scan()` storing decoded plaintext instead of the actual encoded payload for converter-based attack strategies (`Base64`, `Flip`, `Morse`, `ROT13`, `Caesar`, `Leetspeak`, `AsciiArt`, `AnsiAttack`, `Atbash`, `Binary`, `CharacterSpace`, `CharSwap`, `Diacritic`, `StringJoin`, `SuffixAppend`, `UnicodeConfusable`, `UnicodeSubstitution`, `Url`, `AsciiSmuggler`, `Tense`) in `evaluation_results.json` / `results.json`. The persisted `conversation[].content` for user turns now reflects what the target actually received (`converted_value`); the pre-converter adversarial objective is preserved on the same message as a new `original_value` field so the audit trail of what the attack meant to say is not lost. Baseline (non-encoded) strategies are unaffected. Resolves [Azure/azure-sdk-for-python#47228](https://github.com/Azure/azure-sdk-for-python/issues/47228).


Addressed in 493577f — wrapped the 1.17.1 entry across multiple lines, dropped the exhaustive strategy enumeration (now reads Base64, Flip, Morse, ROT13, etc.), and kept the key behavior change (content uses converted_value, new original_value audit field) plus the issue link.

…en changelog - _foundry_result_processor.py: stop forcing converted_value/original_value through isinstance(str) when computing content. Bytes / structured multimodal payloads now pass through unchanged; the original_value audit field is still gated on both sides being str so cross-type inequality cannot produce a misleading field. - test_foundry.py: add test_build_messages_preserves_non_string_payloads covering list-of-parts and bytes payloads. - CHANGELOG.md: wrap the 1.17.1 entry across multiple lines and drop the exhaustive strategy enumeration.

Copilot AI review requested due to automatic review settings June 16, 2026 23:59

huliang-microsoft requested a review from a team as a code owner June 16, 2026 23:59

github-actions Bot added Community Contribution Community members are working on the issue customer-reported Issues that are reported by GitHub users external to the Azure organization. Evaluation Issues related to the client library for Azure AI Evaluation labels Jun 17, 2026

Copilot AI reviewed Jun 17, 2026

View reviewed changes

tangym approved these changes Jun 17, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Evaluation] Fix RedTeam.scan() decoding encoded attack prompts in stored results#47536

[Evaluation] Fix RedTeam.scan() decoding encoded attack prompts in stored results#47536
huliang-microsoft wants to merge 2 commits into
Azure:mainfrom
huliang-microsoft:fix/redteam-encoded-prompts-fidelity

huliang-microsoft commented Jun 16, 2026

Uh oh!

github-actions Bot commented Jun 17, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

huliang-microsoft Jun 17, 2026

Uh oh!

huliang-microsoft Jun 17, 2026

Uh oh!

huliang-microsoft Jun 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants


		### Bugs Fixed

		- Fixed `RedTeam.scan()` storing decoded plaintext instead of the actual encoded payload for converter-based attack strategies (`Base64`, `Flip`, `Morse`, `ROT13`, `Caesar`, `Leetspeak`, `AsciiArt`, `AnsiAttack`, `Atbash`, `Binary`, `CharacterSpace`, `CharSwap`, `Diacritic`, `StringJoin`, `SuffixAppend`, `UnicodeConfusable`, `UnicodeSubstitution`, `Url`, `AsciiSmuggler`, `Tense`) in `evaluation_results.json` / `results.json`. The persisted `conversation[].content` for user turns now reflects what the target actually received (`converted_value`); the pre-converter adversarial objective is preserved on the same message as a new `original_value` field so the audit trail of what the attack meant to say is not lost. Baseline (non-encoded) strategies are unaffected. Resolves [Azure/azure-sdk-for-python#47228](https://github.com/Azure/azure-sdk-for-python/issues/47228).

Conversation

huliang-microsoft commented Jun 16, 2026

Problem

Root cause

Fix

Tests

CHANGELOG

Uh oh!

github-actions Bot commented Jun 17, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

huliang-microsoft Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

huliang-microsoft Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

huliang-microsoft Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants