science: sync structured systematization output #111

Open
jakepresent wants to merge 1 commit into main from jake/science-sync-systematization

Conversation

@jakepresent

Collaborator

Summary

  • Syncs the latest science-side systematization shape from Omni into ASSERT while preserving ASSERT's current behavior/taxonomy terminology.
  • Adds the validation-criteria rubric to the systematization prompt and requires the model to return structured validation scores alongside the behavior spec.
  • Stores the structured systematization JSON directly, feeds that structured artifact into taxonomy conversion, and updates the viewer systematization modal to read source patterns / validation from the new artifact shape.

Science-side comparison

Recent Omni omni/measurements changes since May 16 were:

  • e5025090 systematization: literature-grounded retrieval + validation criteria + structured output shape
  • 36813a2f viewer split of policy violation by permissibility
  • cb5b2874 wording clarification for policy violation on permissible requests

The viewer/permissibility split is already covered separately by #94. This PR ports the applicable systematization delta. It intentionally does not copy Omni wholesale because Omni's branch has different concept/policy naming and science-only repo assumptions.

Validation

  • /home/jakepresent/git/adaptive-eval-ms-import/.venv/bin/python -m pytest tests/test_systematization_stage.py tests/test_systematization_convert_stage.py tests/test_viewer_run_page_server.py -q
  • npm --prefix viewer run check reports only pre-existing unrelated diagnostics in PrimerDropdown.svelte, PrimerPagination.svelte, routes/+page.svelte, and routes/new/+page.svelte; no diagnostics in touched viewer files.
  • git diff --check

Copilot AI review requested due to automatic review settings May 28, 2026 18:09

Copilot AI left a comment

Pull request overview

This PR syncs ASSERT’s “systematization” stage and viewer to a newer science-side structured JSON output shape, adding validation scoring alongside the behavior spec and updating taxonomy conversion and UI to consume the new fields.

Changes:

  • Update the systematization prompt + runtime to produce/validate a structured JSON artifact (including concept_spec patterns and validation scores) and write it directly.
  • Update systematization→taxonomy conversion to consume the structured artifact (instead of Markdown + summary_items) and embed the JSON into the converter prompt.
  • Update viewer pages/modal to display pattern counts and render systematization sections (including validation) from the new artifact shape.
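
A minimal sketch of the conversion input described in the second bullet, assuming the artifact is a plain JSON object and that the converter prompt exposes a placeholder token. The function name, the `{{systematization_json}}` token, and the template text are hypothetical and are not taken from p2m/stages/systematization_convert.py; dropping `meta` follows the per-file summary in this review.

```python
import json
from typing import Any

# Hypothetical template and placeholder token; the real converter prompt is not shown here.
CONVERTER_TEMPLATE = (
    "Convert the structured systematization below into a taxonomy.\n\n"
    "{{systematization_json}}\n"
)


def build_converter_prompt(artifact: dict[str, Any], template: str = CONVERTER_TEMPLATE) -> str:
    """Embed the artifact, without its meta block, into the converter prompt."""
    # Drop the meta block, as the file summary for the convert stage describes.
    payload = {key: value for key, value in artifact.items() if key != "meta"}
    embedded = json.dumps(payload, indent=2, sort_keys=True)
    return template.replace("{{systematization_json}}", embedded)


# Illustrative artifact; the field contents are made up for this example.
example_artifact = {
    "behavior": "Harmful advice",
    "concept_spec": {"patterns": [{"name": "example pattern"}]},
    "validation": [{"attribute": "example attribute", "score": 3}],
    "meta": {"model": "example-model"},
}
converter_prompt = build_converter_prompt(example_artifact)
```

Serializing with `sort_keys=True` is a design choice in this sketch so the embedded JSON is stable across runs; the actual stage may serialize differently.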

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| viewer/src/routes/suite/[suite_id]/+page.svelte | Switches suite header count from legacy summary_items to structured concept_spec.patterns and updates label text. |
| viewer/src/routes/suite/[suite_id]/[run_id]/+page.svelte | Improves results header layout responsiveness and uses axis labels for grouping controls. |
| viewer/src/lib/SystematizationModal.svelte | Updates modal to read structured patterns and validation, and renders a synthesized Markdown view of key sections. |
| tests/test_systematization_stage.py | Updates stage tests/fixtures for structured systematization artifacts and new validation requirements. |
| tests/test_systematization_convert_stage.py | Updates conversion tests/fixtures to reflect structured artifact ingestion and prompt expectations. |
| prompts/validation_criteria.md | Adds the validation rubric content to be embedded into the systematization prompt. |
| prompts/systematization_single.md | Extends the output contract to include validation + nested slot components and adds deep-research/validation requirements. |
| p2m/stages/systematization.py | Implements structured schema/models, prompt assembly with embedded validation rubric, and stricter structured validation. |
| p2m/stages/systematization_convert.py | Loads/validates structured systematization JSON and feeds it (minus meta) into taxonomy conversion. |

-SYSTEMATIZATION_PROMPT = load_prompt_text("systematization_single.md")
+VALIDATION_CRITERIA_PROMPT = load_prompt_text("validation_criteria.md")
+SYSTEMATIZATION_PROMPT = load_prompt_text("systematization_single.md").replace(
+    "{validation_criteria}", VALIDATION_CRITERIA_PROMPT

@jakepresent jakepresent May 28, 2026

Collaborator Author

Fixed in the latest push by replacing the exact double-braced validation_criteria token and updating the prompt assertion. Re-ran the targeted systematization/viewer tests (14 passed) and git diff --check.
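
For illustration, a small sketch of why the brace count matters, assuming the prompt file marks the insertion point as `{{validation_criteria}}`. The template text and the helper name are hypothetical; only the token name comes from this thread.

```python
TOKEN = "{{validation_criteria}}"


def embed_validation_criteria(prompt_template: str, criteria: str) -> str:
    """Replace the exact double-braced token with the rubric text."""
    if TOKEN not in prompt_template:
        # Failing loudly here is an assumption of this sketch, not confirmed behavior.
        raise ValueError(f"prompt template is missing the {TOKEN} token")
    return prompt_template.replace(TOKEN, criteria)


# Hypothetical template text, used only to show the difference.
template = "Rubric:\n{{validation_criteria}}\nReturn JSON."

# Replacing the single-braced form matches inside the token and leaves stray braces.
single_brace_result = template.replace("{validation_criteria}", "RUBRIC")

# Replacing the exact double-braced token leaves only the rubric text.
double_brace_result = embed_validation_criteria(template, "RUBRIC")
```

A prompt assertion in the tests can then check that neither the token nor stray braces around the rubric remain in the assembled prompt.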

Comment on lines +145 to +150
+    for index, lens in enumerate(parsed.stakeholder_lenses):
+        _require_nonempty(lens.label, f"stakeholder_lenses[{index}].label")
+        _require_nonempty(lens.expertise, f"stakeholder_lenses[{index}].expertise")
+    if not parsed.validation:
+        raise ValueError("systematization validation must include at least one item")
+    for index, item in enumerate(parsed.validation):

Collaborator Author

Fixed in the latest push. Validation now requires exactly the six utility attributes, checks them case-insensitively, and rejects missing, duplicate, or unexpected attributes. Added focused missing/duplicate tests and re-ran the systematization/viewer targeted tests.
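
A minimal sketch of the stricter check described above, assuming each validation item carries an attribute name. The six names below are hypothetical placeholders, since this thread does not list the actual utility attributes, and the function name is likewise invented for illustration.

```python
from collections import Counter
from typing import Iterable

# Hypothetical placeholder names; the real six utility attributes are not listed here.
EXPECTED_ATTRIBUTES = frozenset(
    {"attribute_a", "attribute_b", "attribute_c", "attribute_d", "attribute_e", "attribute_f"}
)


def check_validation_attributes(
    attributes: Iterable[str], expected: frozenset[str] = EXPECTED_ATTRIBUTES
) -> None:
    """Require exactly the expected attributes, compared case-insensitively."""
    normalized = [name.strip().casefold() for name in attributes]
    counts = Counter(normalized)
    seen = set(counts)

    problems = []
    duplicates = sorted(name for name, count in counts.items() if count > 1)
    if duplicates:
        problems.append(f"duplicate attributes: {duplicates}")
    missing = sorted(expected - seen)
    if missing:
        problems.append(f"missing attributes: {missing}")
    unexpected = sorted(seen - expected)
    if unexpected:
        problems.append(f"unexpected attributes: {unexpected}")
    if problems:
        raise ValueError("systematization validation is invalid: " + "; ".join(problems))
```

Collecting all three kinds of problem before raising is a choice made for this sketch so that one error message covers every issue; the actual stage may fail on the first problem it finds.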

Comment on lines +76 to +83
+function systematizationPatternCount(systematization: Record<string, unknown> | null | undefined): number {
+  const conceptSpec = systematization?.concept_spec;
+  if (!conceptSpec || typeof conceptSpec !== 'object' || Array.isArray(conceptSpec)) return 0;
+  const patterns = (conceptSpec as Record<string, unknown>).patterns;
+  return Array.isArray(patterns) ? patterns.length : 0;
+}
+
+let summaryItemCount = $derived(systematizationPatternCount(data.systematization));

@jakepresent jakepresent May 28, 2026

Collaborator Author

Fixed in the latest push by renaming the derived value to patternCount, matching the structured concept_spec.patterns source used by the suite header.
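
For reference, a Python analogue of the same defensive count, as it might look in a server-side or test helper. The function name is hypothetical and nothing in this PR indicates that such a helper exists on the Python side; it only mirrors the viewer logic quoted above.

```python
from collections.abc import Mapping
from typing import Any, Optional


def systematization_pattern_count(systematization: Optional[Mapping[str, Any]]) -> int:
    """Count concept_spec.patterns, returning 0 for any unexpected shape."""
    if not isinstance(systematization, Mapping):
        return 0
    concept_spec = systematization.get("concept_spec")
    # A list or scalar here is treated like a missing concept_spec.
    if not isinstance(concept_spec, Mapping):
        return 0
    patterns = concept_spec.get("patterns")
    return len(patterns) if isinstance(patterns, list) else 0
```

Legacy artifacts that only carry summary_items fall through to 0 under this sketch, which matches the viewer function's behavior for inputs without concept_spec.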

Comment on lines 221 to +230
                 model_cfg=ModelConfig(name="azure/gpt-5.4"),
             )

-    async def test_run_systematization_to_taxonomy_rejects_missing_systematization(self) -> None:
+    async def test_run_systematization_to_taxonomy_rejects_missing_systematization_field(self) -> None:
         with TemporaryDirectory() as tmp_dir:
             tmp_path = Path(tmp_dir)
             systematization_path = tmp_path / "systematization.json"
-            systematization_path.write_text(
-                json.dumps(
-                    {
-                        "behavior": "Harmful advice",
-                        "summary_items": [],
-                    }
-                ),
-                encoding="utf-8",
-            )
+            payload = _structured_systematization()
+            del payload["validation"]
+            systematization_path.write_text(json.dumps(payload), encoding="utf-8")

@jakepresent jakepresent May 28, 2026

Collaborator Author

Fixed in the latest push by renaming the test to test_run_systematization_to_taxonomy_rejects_missing_validation, since the fixture intentionally deletes the validation field.
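
A self-contained sketch of the renamed test's intent, assuming a loader that rejects artifacts with missing top-level fields. The loader, its required-field list, and the fixture contents are hypothetical stand-ins; the real `_structured_systematization()` fixture and conversion entry point are not reproduced in this thread.

```python
import json
import unittest
from pathlib import Path
from tempfile import TemporaryDirectory

# Assumed subset of required top-level fields, for illustration only.
REQUIRED_FIELDS = ("behavior", "concept_spec", "validation")


def load_structured_systematization(path: Path) -> dict:
    """Hypothetical loader: parse the artifact and reject missing fields."""
    payload = json.loads(path.read_text(encoding="utf-8"))
    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        raise ValueError(f"systematization is missing required fields: {missing}")
    return payload


def _structured_systematization() -> dict:
    """Stand-in fixture; the real fixture's contents are not shown in the PR."""
    return {
        "behavior": "Harmful advice",
        "concept_spec": {"patterns": []},
        "validation": [{"attribute": "example attribute", "score": 3}],
    }


class MissingValidationTest(unittest.TestCase):
    def test_rejects_missing_validation(self) -> None:
        with TemporaryDirectory() as tmp_dir:
            path = Path(tmp_dir) / "systematization.json"
            payload = _structured_systematization()
            del payload["validation"]  # the field this test is named after
            path.write_text(json.dumps(payload), encoding="utf-8")
            with self.assertRaisesRegex(ValueError, "validation"):
                load_structured_systematization(path)
```

Matching on the field name in the error message keeps the test tied to the specific field it deletes, which is the same reason the test name was changed.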

@jakepresent jakepresent force-pushed the jake/science-sync-systematization branch from 296520b to 309f361 Compare May 29, 2026 22:03

3 participants