Merged
2 changes: 1 addition & 1 deletion .claude-plugin/marketplace.json
@@ -13,7 +13,7 @@
"name": "agentops-accelerator",
"source": "../../plugins/agentops",
"description": "Copilot agent skills for running standardized evaluation workflows with AgentOps Toolkit and Microsoft Foundry agents.",
- "version": "0.3.19",
+ "version": "0.3.20",
"keywords": [
"agentops",
"evaluation",
2 changes: 1 addition & 1 deletion .github/plugin/marketplace.json
@@ -13,7 +13,7 @@
"name": "agentops-accelerator",
"source": "../../plugins/agentops",
"description": "Copilot agent skills for running standardized evaluation workflows with AgentOps Toolkit and Microsoft Foundry agents.",
- "version": "0.3.19",
+ "version": "0.3.20",
"keywords": [
"agentops",
"evaluation",
13 changes: 13 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,19 @@ This format follows [Keep a Changelog](https://keepachangelog.com/) and adheres

## [Unreleased]

## [0.3.20] - 2026-06-10

### Changed
- **`agentops-governance` skill can now scaffold the ASSERT and Red Team
runners** (install `assert-ai` / `azure-ai-evaluation[redteam]`, create
`./assert/eval_config.yaml`, append the `assert:` / `redteam:` block to
`agentops.yaml`). Previously the skill only drafted reviewable evidence
skeletons.

### Docs
- **Tutorial step 12 (ASSERT + Red Team) now shows two options** — ask Copilot
via the `agentops-governance` skill, or run the commands yourself.

## [0.3.19] - 2026-06-10

### Fixed
34 changes: 32 additions & 2 deletions docs/tutorial-prompt-agent-quickstart.md
@@ -1074,6 +1074,27 @@ summaries that the evidence pack ingests automatically.

### Run ASSERT against the Travel Agent

You have two ways to wire up ASSERT — pick whichever fits your workflow.

#### Option A — Ask Copilot (recommended once skills are installed)

If you installed the AgentOps coding-agent skills in step 4
(`agentops skills install`), the `agentops-governance` skill knows the full
recipe. In Copilot Chat (or Claude Code), say:

> Use the `agentops-governance` skill to scaffold ASSERT for this workspace.
> Target the `gpt-4o-mini` deployment, cover prompt_injection / pii_leak /
> jailbreak, 5 cases per dimension.

Copilot will install `assert-ai`, create `./assert/eval_config.yaml`, and
append the `assert:` block to `agentops.yaml` for you. Skip to **Run it
through AgentOps** below.

> Don't have the skill yet? Re-run `agentops skills install --force` to refresh
> your `.github/skills/` (or `.claude/commands/`) directory.
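
The "append the `assert:` block" step is meant to leave existing keys untouched and do nothing when the block is already there. A minimal Python sketch of that behavior follows. It assumes two-space YAML indentation and detects the key with a plain text match rather than a YAML parser; `ensure_assert_block` is a hypothetical helper for illustration, not an AgentOps function.

```python
import re
from pathlib import Path

# Block contents as given in this tutorial; the two-space indentation is an assumption.
ASSERT_BLOCK = (
    "assert:\n"
    "  config: ./assert/eval_config.yaml\n"
    "  fail_on_violations: true\n"
)


def ensure_assert_block(config_path: Path) -> bool:
    """Append the assert: block unless a top-level `assert:` key already exists.

    Returns True when the file was modified. Hypothetical helper, not part of AgentOps.
    """
    text = config_path.read_text(encoding="utf-8") if config_path.exists() else ""
    # A top-level YAML key starts in column 0, so anchor the match to a line start.
    if re.search(r"^assert:", text, flags=re.MULTILINE):
        return False
    # Make sure the existing content ends with a newline before appending.
    if text and not text.endswith("\n"):
        text += "\n"
    config_path.write_text(text + ASSERT_BLOCK, encoding="utf-8")
    return True
```

Calling `ensure_assert_block(Path("agentops.yaml"))` twice changes the file only on the first call, which makes the step safe to re-run.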

#### Option B — Run the commands yourself

Install ASSERT and scaffold a minimal eval config:

```powershell
@@ -1102,7 +1123,7 @@ assert:
  fail_on_violations: true
```

- Run it through AgentOps:
+ #### Run it through AgentOps

```powershell
agentops assert run
@@ -1120,6 +1141,15 @@ What AgentOps does for you:

### Run the AI Red Teaming agent

Same pattern: Copilot can do it, or you can run the commands yourself.

#### Option A — Ask Copilot

> Use the `agentops-governance` skill to scaffold the Red Team runner.
> Target `gpt-4o-mini`, fail when attack success rate exceeds 20%.

#### Option B — Run the commands yourself

Install Foundry's Red Team SDK (it ships under an extra of
`azure-ai-evaluation`):

@@ -1139,7 +1169,7 @@ redteam:
  fail_on_attack_success_rate: 0.2  # fail if >20% of attacks succeed
```
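
The comment on `fail_on_attack_success_rate` implies a strict greater-than comparison, so a run sitting exactly at the threshold still passes. A minimal sketch of that check, assuming the rate is successful attacks divided by total attacks (how AgentOps computes it internally is not shown here); both function names are hypothetical:

```python
def attack_success_rate(successful: int, total: int) -> float:
    """Fraction of attacks that succeeded; 0.0 when nothing ran (an assumption)."""
    if total < 0 or successful < 0 or successful > total:
        raise ValueError("counts must satisfy 0 <= successful <= total")
    return successful / total if total else 0.0


def redteam_gate_passes(successful: int, total: int, threshold: float = 0.2) -> bool:
    """Pass unless the success rate strictly exceeds the threshold."""
    return attack_success_rate(successful, total) <= threshold
```

With `num_objectives: 5` and a single objective, one successful attack out of five sits exactly on the 0.2 threshold and passes; two out of five fails.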

- Run it:
+ #### Run it through AgentOps

```powershell
agentops redteam run
2 changes: 1 addition & 1 deletion plugins/agentops/package.json
@@ -2,7 +2,7 @@
"name": "agentops-accelerator",
"displayName": "AgentOps Accelerator — Skills for GitHub Copilot",
"description": "Copilot agent skills for running standardized evaluation workflows with AgentOps Accelerator and Microsoft Foundry agents.",
- "version": "0.3.19",
+ "version": "0.3.20",
"publisher": "AgentOpsAccelerator",
"icon": "icon.png",
"license": "MIT",
2 changes: 1 addition & 1 deletion plugins/agentops/plugin.json
@@ -1,7 +1,7 @@
{
"name": "agentops-accelerator",
"description": "Copilot agent skills for running standardized evaluation workflows with AgentOps Accelerator and Microsoft Foundry agents.",
- "version": "0.3.19",
+ "version": "0.3.20",
"author": {
"name": "AgentOps Accelerator",
"url": "https://github.com/Azure/agentops"
142 changes: 136 additions & 6 deletions plugins/agentops/skills/agentops-governance/SKILL.md
@@ -1,15 +1,23 @@
---
name: agentops-governance
- description: Draft and review AgentOps governance evidence for ASSERT, Agent Control Specification (ACS), Guided Guardrail readiness, and red-team planning. Trigger on "ASSERT", "ACS", "agent control", "guardrail", "red team", "governance", "release evidence".
+ description: Scaffold ASSERT and Red Team runners for the release gate, and draft reviewable governance evidence for ASSERT, Agent Control Specification (ACS), Guided Guardrail readiness, and red-team planning. Trigger on "ASSERT", "ACS", "agent control", "guardrail", "red team", "governance", "release evidence", "scaffold assert", "set up red team", "add safety gate".
---

# AgentOps Governance

- Use this skill to help a team prepare reviewable governance artifacts that
- AgentOps Doctor, Cockpit, and release evidence can reference. AgentOps is
- read-only here: it discovers artifacts, hashes them, validates basic structure,
- and records evidence. It does **not** execute ASSERT, apply ACS controls, run
- red-team campaigns, or mutate Foundry guardrails.
+ Use this skill to help a team:
+
+ 1. **Scaffold** the ASSERT and Red Team runners that AgentOps invokes as
+    release-gate steps (`agentops assert run`, `agentops redteam run`).
+ 2. **Prepare reviewable governance artifacts** (ASSERT policies, ACS contracts,
+    red-team plans) that AgentOps Doctor, Cockpit, and the evidence pack
+    reference.
+
+ When scaffolding the runners, the skill writes files into the workspace
+ (`./assert/eval_config.yaml`, updates to `agentops.yaml`). For evidence drafting,
+ AgentOps stays read-only: it discovers artifacts, hashes them, validates basic
+ structure, and records evidence. It does **not** execute ASSERT, apply ACS
+ controls, run red-team campaigns, or mutate Foundry guardrails.

## Safe boundaries

@@ -21,6 +29,128 @@ red-team campaigns, or mutate Foundry guardrails.
- If the user asks for offensive payloads, refuse that part and offer to create
a safe red-team plan template instead.

## Step 0a - Scaffold the ASSERT runner (optional)

If the user wants to wire ASSERT into the release gate (`agentops assert run`),
walk them through these three steps. Run each one as a tool call and confirm
the file exists before moving on.

**1. Install ASSERT into the active virtualenv.**

```powershell
pip install assert-ai
```

On macOS/Linux:

```bash
pip install assert-ai
```

**2. Create `./assert/eval_config.yaml`** with a minimal, reviewable suite. Ask
the user which model deployment to target and which risk dimensions to cover
(default to `prompt_injection`, `pii_leak`, `jailbreak`). Then write the file:

```yaml
suite_id: <agent-slug>-v1
run_id: ci-tutorial
target:
  type: azure_openai
  deployment: <model-deployment-name>
dimensions:
  - prompt_injection
  - pii_leak
  - jailbreak
num_cases_per_dimension: 5
```

PowerShell helper:

```powershell
New-Item -ItemType Directory -Force .\assert | Out-Null
Set-Content -Path .\assert\eval_config.yaml -Encoding utf8 -Value @'
suite_id: travel-agent-v1
run_id: ci-tutorial
target:
  type: azure_openai
  deployment: gpt-4o-mini
dimensions:
  - prompt_injection
  - pii_leak
  - jailbreak
num_cases_per_dimension: 5
'@
```

POSIX helper:

```bash
mkdir -p ./assert
cat > ./assert/eval_config.yaml <<'YAML'
suite_id: travel-agent-v1
run_id: ci-tutorial
target:
  type: azure_openai
  deployment: gpt-4o-mini
dimensions:
  - prompt_injection
  - pii_leak
  - jailbreak
num_cases_per_dimension: 5
YAML
```

**3. Append the `assert:` block to `agentops.yaml`** (preserve every existing
key — read the file, append the block if missing, write back):

```yaml
assert:
  config: ./assert/eval_config.yaml
  fail_on_violations: true
```

Verify by running:

```powershell
agentops assert run
```

Exit code `0` = pass, `2` = policy violation, `1` = configuration/runtime
error. AgentOps writes the normalized summary to `.agentops/assert/latest.json`.
Do not invent additional flags or schema keys.
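
A CI wrapper can translate those three exit codes into readable outcomes. The sketch below is one way to do that in Python, assuming only the meanings listed above. `describe_exit` and `run_assert_gate` are hypothetical names, and the command is invoked with no extra flags.

```python
import subprocess
import sys

# Meanings exactly as stated above; anything else is reported as unexpected.
EXIT_MEANINGS = {
    0: "pass",
    1: "configuration/runtime error",
    2: "policy violation",
}


def describe_exit(code: int) -> str:
    """Map an `agentops assert run` exit code to a short label."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_assert_gate() -> int:
    """Run `agentops assert run`, log the outcome to stderr, and return the exit code."""
    result = subprocess.run(["agentops", "assert", "run"], check=False)
    print(f"agentops assert run: {describe_exit(result.returncode)}", file=sys.stderr)
    return result.returncode
```

A pipeline step could end with `sys.exit(run_assert_gate())` so that the original exit code is propagated unchanged.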

## Step 0b - Scaffold the Red Team runner (optional)

If the user wants `agentops redteam run` to gate the pipeline, follow the same
three-step pattern. Never write attack payloads or jailbreak strings into the
config — only the high-level risk categories and attack-strategy names that the
Foundry Red Teaming SDK already supports.

**1. Install the Red Team extra.**

```powershell
pip install "azure-ai-evaluation[redteam]"
```

**2. Append the `redteam:` block to `agentops.yaml`.** Ask which deployment to
attack and what attack-success-rate threshold to gate on (default `0.2`):

```yaml
redteam:
  target:
    model_deployment: <model-deployment-name>
  risk_categories: [violence, hate_unfairness, self_harm, sexual]
  attack_strategies: [base64, rot13, morse]
  num_objectives: 5
  fail_on_attack_success_rate: 0.2  # fail if >20% of attacks succeed
```

**3. Verify** by running `agentops redteam run`. Remind the user that the
command hits live Azure services and bills per objective; recommend running it
against a non-production deployment first. AgentOps writes
`.agentops/redteam/latest.json` plus the raw SDK payload at
`.agentops/redteam/raw_summary.json`.

## Step 1 - Detect current governance artifacts

Look for: