Add evaluation cases for agent-factory, issue-ops-architect, and rule-distiller by weslleycapelari · Pull Request #3 · weslleycapelari/github-patterns

weslleycapelari · 2026-05-13T13:38:08Z

Introduce evaluation cases that strengthen agent testing by covering bad-input handling and expected high-quality output. Update MEMORY.md with records from sprint N4 Hardening, and set up sprint N5 (locale synchronization and L4 stabilization).

…-distiller; enhance MEMORY.md with sprint N4 Hardening records

Copilot

Pull request overview

This PR expands the repository’s AI governance/eval hardening by adding new eval cases for core agents, recording sprint decisions, and introducing a PR governance audit step.

Changes:

- Added bad-input and high-quality-output eval fixtures for agent-factory, issue-ops-architect, and rule-distiller.
- Added sprint N4/N5 mission and memory records, plus roadmap updates for locale sync and L4 stabilization.
- Added a marketplace-integrity workflow step to report agent eval coverage on PRs.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| `ROADMAP.md` | Updates sprint status, completed N4 work, and N5 backlog planning. |
| `library/github-baseline/MEMORY.md` | Adds sprint decision-memory entries for N4/N5. |
| `library/evals/rule-distiller/case-02-expected-high-quality-output.md` | Adds rule-distiller high-quality-output eval. |
| `library/evals/rule-distiller/case-01-bad-input.md` | Adds rule-distiller bad-input eval. |
| `library/evals/issue-ops-architect/case-02-expected-high-quality-output.md` | Adds issue-ops high-quality-output eval. |
| `library/evals/issue-ops-architect/case-01-bad-input.md` | Adds issue-ops bad-input eval. |
| `library/evals/agent-factory/case-02-expected-high-quality-output.md` | Adds agent-factory high-quality-output eval. |
| `library/evals/agent-factory/case-01-bad-input.md` | Adds agent-factory bad-input eval. |
| `.github/workflows/marketplace-integrity.yml` | Adds PR eval coverage audit reporting. |
| `.github/MISSIONS/roadmap-2026-05-26-sprint-n5-locale-sync.md` | Adds N5 locale-sync/L4 stabilization mission plan. |
| `.github/MISSIONS/roadmap-2026-05-12-sprint-n4-hardening.md` | Adds N4 hardening mission record. |

Comments suppressed due to low confidence (3)

.github/MISSIONS/roadmap-2026-05-26-sprint-n5-locale-sync.md:30

The mission protocol is defined as one delegation hop per mission file, but this mission assigns Tasks 1/2 to documentation-steward and Task 3 to project-auditor. A valid Child_Agent field can contain only one child agent, so this should be split into separate mission files or represented as a single valid delegation hop before CI can pass.

## Parent Agent & Delegation

- **Parent Agent:** `@roadmap-steward`
- **Child Agents:**
  - `@documentation-steward` — Tasks 1 & 2
  - `@project-auditor` — Task 3
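
The one-hop rule described in this comment could be checked mechanically. The following is a minimal sketch, assuming the mission file lists child agents as backticked `@handle` bullets as in the excerpt above; the function names and the regular expression are hypothetical and are not taken from the repository's CI.

```python
import re

# Assumption: agent handles appear as backticked `@name` tokens.
AGENT_HANDLE = re.compile(r"`@([a-z0-9-]+)`")


def child_agents(mission_text: str) -> list[str]:
    """Return the agent handles listed on or under the child-agent bullet."""
    agents: list[str] = []
    in_children = False
    for line in mission_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- **Child Agent"):
            # Handles may sit on the same line or on nested bullets below.
            in_children = True
            agents += AGENT_HANDLE.findall(stripped)
        elif in_children and line.startswith("  ") and stripped.startswith("- "):
            agents += AGENT_HANDLE.findall(stripped)
        elif stripped:
            # Any other non-blank line ends the child-agent list.
            in_children = False
    return agents


def is_single_hop(mission_text: str) -> bool:
    """True when exactly one child agent is delegated to."""
    return len(child_agents(mission_text)) == 1
```

Run against the excerpt above, `child_agents` would return two handles, so `is_single_hop` would report a violation.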

library/evals/rule-distiller/case-02-expected-high-quality-output.md:44

This PASS criterion expects the section title New Best Practices for English output, but the current rule-distiller agent contract classifies best-practice findings under Novas Boas Práticas. Without updating the agent contract in the same PR, this eval will fail the repository's current rule-distiller behavior rather than validating it.

1. File is created at the exact path `library/instructions/laravel-11.md`.
2. Breaking Changes and New Best Practices are in separate sections (not mixed). When output language preference is English, all section titles must be in English — section must be titled "New Best Practices" (not "Novas Boas Práticas").
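
Criterion 2 lends itself to an automated check. A minimal sketch follows, assuming the generated file uses ATX headings (`#` through `######`) and that the English titles are exactly "Breaking Changes" and "New Best Practices"; the helper names are hypothetical.

```python
import re

# Matches one ATX heading per line and captures its title text.
HEADING = re.compile(r"^#{1,6}[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def section_titles(markdown: str) -> list[str]:
    """Collect the title text of every ATX heading in the document."""
    return [match.group(1) for match in HEADING.finditer(markdown)]


def passes_title_criterion(markdown: str) -> bool:
    """Both sections exist under English titles and the PT-BR title is absent."""
    titles = section_titles(markdown)
    return (
        "Breaking Changes" in titles
        and "New Best Practices" in titles
        and "Novas Boas Práticas" not in titles
    )
```

Because the two titles are matched as separate headings, an output that merges both topics under one heading would also fail this check.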

.github/MISSIONS/roadmap-2026-05-26-sprint-n5-locale-sync.md:139

This CI requirement checks for Spanish text in the EN locale, but the documented drift this sprint is meant to catch is Spanish wording in PT-BR files (for example, Ejecute). As written, the planned validation could pass while the known PT-BR issue remains undetected.

2. Step should validate:
   - Files in `docs/en/` have PT-BR equivalents (and vice versa)
   - No Spanish text in EN locale (sanity check)
   - Files contain valid locale sync declaration block
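
A validation step that also covers the PT-BR drift raised in this comment might look like the sketch below. It assumes each locale tree is represented as a set of relative paths and uses a small, hypothetical list of Spanish marker words; only "Ejecute" comes from the review, the other markers are illustrative.

```python
import re

# Hypothetical marker list: Spanish words that should not occur in PT-BR text.
SPANISH_MARKERS = {"ejecute", "usted", "entonces"}


def missing_counterparts(en_files: set[str], pt_br_files: set[str]) -> dict[str, set[str]]:
    """Compare relative paths from both locale trees in each direction."""
    return {
        "missing_in_pt_br": en_files - pt_br_files,
        "missing_in_en": pt_br_files - en_files,
    }


def spanish_hits(text: str) -> set[str]:
    """Return the marker words found in text, compared case-insensitively."""
    words = {word.lower() for word in re.findall(r"[^\W\d_]+", text)}
    return words & SPANISH_MARKERS
```

Applying `spanish_hits` to PT-BR files as well as EN files would surface the known issue, which an EN-only check would miss.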

+**Mission ID:** `MISSION-N5-2026-05-26`  
+**Sprint:** N5 — Locale Sync & L4 Stabilization  
+**Duration:** 2026-05-26 to 2026-06-09 (2 weeks)  
+**Status:** Approved & Active


 > This file is maintained by the `@roadmap-steward` agent. Do not edit manually unless updating strategic direction. See [Manual Edit Policy](#manual-edit-policy) below.

-## 🗓️ Current Sprint (MVP 1.9.x → 2.0.0)
+## 🗓️ Current Sprint (N5 — Locale Sync & L4 Stabilization — 2026-05-26 to 2026-06-09)


+- **agent**: `issue-ops-architect`
+- **case_id**: `IOA-01`
+- **status**: `draft`
+- **scenario_type**: `bad_input`


+- **agent**: `rule-distiller`
+- **case_id**: `RD-01`
+- **status**: `draft`
+- **scenario_type**: `bad_input`
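
The metadata bullets added in these eval files follow a regular shape, so they can be read back for auditing. This is a minimal sketch, assuming each field is written as ``- **key**: `value` `` exactly as in the hunks above; the parser and the required-key tuple are hypothetical.

```python
import re

# One metadata bullet: - **key**: `value`
META_LINE = re.compile(r"^- \*\*(\w+)\*\*: `([^`]*)`$")

# Assumption: these four keys are the ones every case file should declare.
REQUIRED_KEYS = ("agent", "case_id", "status", "scenario_type")


def parse_metadata(text: str) -> dict[str, str]:
    """Extract key/value pairs from metadata bullets, ignoring other lines."""
    meta: dict[str, str] = {}
    for line in text.splitlines():
        match = META_LINE.match(line.strip())
        if match:
            meta[match.group(1)] = match.group(2)
    return meta


def missing_keys(text: str) -> list[str]:
    """List required keys that the case file does not declare."""
    meta = parse_metadata(text)
    return [key for key in REQUIRED_KEYS if key not in meta]
```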


+Migration checklist: yes
+Output language preference: English
+May I create/modify files after preview: yes


+| Operational overhead for a small team             | Medium     | Binary criteria in CI; do not demand perfectionism in first-version evals.       |
+| Eval coverage may be generic at first             | Medium     | Mandatory review gate; fast iteration in the next cycle with real feedback.      |
+| MEMORY.md requires continuous discipline          | Low        | Assign ownership to context-steward; lean weekly review (15 min max).            |
+| Audit may produce false positives                 | Medium     | Use simple rules in the first version; refine based on experience.               |


+
+**Why This Matters:**
+
+- Post-N4 audit identified PT-BR as out of sync (typos in Spanish, stale L4 references)


+- **ID:** MEM-20260513-01
+- **Date:** 2026-05-13
+- **Owner:** roadmap-steward
+- **Record:** Sprint N5 (Locale Sync & L4 Stabilization) approved. Post-N4 audit identified PT-BR as out of sync (stale L4 references, typos in Spanish). Sync automation never operationalized. N5 scope: 3 tasks — sync PT-BR, update docs, operationalize CI validation. Target: 2026-06-09. Mission file created.


+
+              cases = [
+                  f for f in os.listdir(agent_dir)
+                  if f.endswith(".md") and f != ".gitkeep"
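
The hunk above counts only `.md` files, which makes the `.gitkeep` test redundant and leaves fixtures with other extensions uncounted. One way the count could be written is sketched below, assuming every case fixture is a regular file whose name starts with `case-`; that naming rule and the function name are assumptions, not the workflow's actual code.

```python
import os


def count_cases(evals_root: str) -> dict[str, int]:
    """Map each agent directory under evals_root to its number of case files."""
    coverage: dict[str, int] = {}
    for agent in sorted(os.listdir(evals_root)):
        agent_dir = os.path.join(evals_root, agent)
        if not os.path.isdir(agent_dir):
            continue
        cases = [
            f for f in os.listdir(agent_dir)
            # Assumption: any regular file named case-* is a fixture,
            # whatever its extension, so .gitkeep never counts.
            if f.startswith("case-") and os.path.isfile(os.path.join(agent_dir, f))
        ]
        coverage[agent] = len(cases)
    return coverage
```

An agent that maps to `0` has a directory but no fixtures, which is the condition a coverage report would want to flag.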


weslleycapelari · 2026-05-13T16:52:36Z

@copilot apply changes based on the comments in this thread

Agent-Logs-Url: https://github.com/weslleycapelari/github-patterns/sessions/577201a0-68d1-468c-b38c-e3f9468cf6e8 Co-authored-by: weslleycapelari <28955078+weslleycapelari@users.noreply.github.com>

Copilot · 2026-05-13T17:04:03Z

> @copilot apply changes based on the comments in this thread

Implemented the requested updates from the review thread in commit 07e2925. This includes mission protocol key compliance, eval matrix/index fixes, IOA/RD eval contract alignment (explicit strategy/CONFIRM behavior), wording/grammar fixes, workflow audit counting for non-.md case fixtures, and PR locale sync declaration fields (en_status, pt_br_status, pending_sync_tasks).
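
The three locale sync declaration fields named above could be validated with a few lines of code. This is a minimal sketch, assuming the declaration block is written as plain `key: value` lines; the block format and the helper names are assumptions, and only the field names come from the comment.

```python
REQUIRED_FIELDS = ("en_status", "pt_br_status", "pending_sync_tasks")


def parse_declaration(block: str) -> dict[str, str]:
    """Parse key: value lines; lines without a colon are ignored."""
    fields: dict[str, str] = {}
    for line in block.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def missing_fields(block: str) -> list[str]:
    """List the required declaration fields absent from the block."""
    fields = parse_declaration(block)
    return [name for name in REQUIRED_FIELDS if name not in fields]
```

An empty result from `missing_fields` means all three fields are declared; it says nothing about whether their values are sensible.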

weslleycapelari added 2 commits May 13, 2026 08:11

feat: add eval cases for agent-factory, issue-ops-architect, and rule…

b209beb

…-distiller; enhance MEMORY.md with sprint N4 Hardening records

chore: N5 sprint setup — locale sync & L4 stabilization

801dfec

Copilot AI review requested due to automatic review settings May 13, 2026 13:38

Copilot started reviewing on behalf of weslleycapelari May 13, 2026 13:38

Copilot AI reviewed May 13, 2026

weslleycapelari assigned Copilot May 13, 2026

Copilot started work on behalf of weslleycapelari May 13, 2026 16:52

fix: resolve PR review feedback for mission/evals/workflow consistency

07e2925

Copilot finished work on behalf of weslleycapelari May 13, 2026 17:04

weslleycapelari wants to merge 3 commits into `main` from `n5-locale-sync-setup`

3 participants

