fix: slug->UUID canonicalisation in paper_flow + repair magic resolver structured-output schemas by keola808hunt-dot · Pull Request #78 · LLMQuant/quant-mind

keola808hunt-dot · 2026-06-04T22:54:49Z

Summary

Fixes two openai-agents SDK structured-output failures that prevented paper_flow and the magic natural-language resolver from running end-to-end. Both stem from QuantMind's rich Pydantic models (UUID-keyed trees, discriminated-union inputs, ModelSettings) meeting the SDK's strict-schema requirements.

1. `paper_flow` — 94 `uuid_parsing` errors (the crash)

With strict_json_schema=False, the model emits human-readable node-id slugs ("root", "intro", "methodology") into the UUID-typed Paper/TreeNode fields → 94 uuid_parsing validation errors. The slug tree is internally coherent (parent/child/citation refs all consistent) — the model's behaviour is correct for one-shot tree generation; it just collides with the storage type.

The domain UUID is load-bearing (tested node-id uniqueness, dict[UUID, TreeNode] JSON round-trips, stable dedup identity), so rather than weaken it to str, this adds a slug-tolerant extraction boundary:

- quantmind/knowledge/_extraction.py: a pure canonicalize_tree_ids() maps each distinct slug to one UUID and rewrites every id slot (nodes keys, node_id, parent_id, children_ids, root_node_id, citation anchors), passing values that are already UUIDs through unchanged.
- PaperExtraction(Paper): carries a model_validator(mode="before") that runs the canonicalizer; paper_flow defaults output_type to it. The domain model is untouched.

2. `magic` resolver — un-schemable `ModelSettings` + strict-mode rejection (latent)

ResolvedFlowConfig embeds BaseFlowCfg.model_settings: ModelSettings, whose callable fields cannot be JSON-schema'd (PydanticInvalidForJsonSchema), and the discriminated-union / knowledge schemas trip strict mode's additionalProperties guard. This was hidden because the existing resolver tests mock Runner.run.

Two layers, both required:

- SkipJsonSchema on model_settings: it is an execution knob (set programmatically), never LLM-populated, so it is skipped during schema generation. This lets the schema build at all.
- AgentOutputSchema(..., strict_json_schema=False) on the resolver output type: accepts the additionalProperties the union/knowledge models emit (same pattern as paper_flow).
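The first layer can be reproduced with pydantic alone. The sketch below is an assumption-laden illustration, not the PR's code: `BrokenCfg`, `FixedCfg`, `on_retry` and `schema_builds` are hypothetical names, and a bare callable field stands in for the callable fields inside `ModelSettings`. The second layer (the non-strict `AgentOutputSchema` wrap) needs the openai-agents SDK and is not shown.

```python
from typing import Callable

from pydantic import BaseModel
from pydantic.errors import PydanticInvalidForJsonSchema
from pydantic.json_schema import SkipJsonSchema


class BrokenCfg(BaseModel):
    """Hypothetical config whose callable field blocks schema generation."""

    name: str
    on_retry: Callable[[str], str] = str.strip


class FixedCfg(BaseModel):
    """Same config, with the callable field hidden from the JSON schema."""

    name: str
    on_retry: SkipJsonSchema[Callable[[str], str]] = str.strip


def schema_builds(model: type[BaseModel]) -> bool:
    """Return True if pydantic can generate a JSON schema for the model."""
    try:
        model.model_json_schema()
    except PydanticInvalidForJsonSchema:
        return False
    return True


print(schema_builds(BrokenCfg))
print(schema_builds(FixedCfg))
print(sorted(FixedCfg.model_json_schema()["properties"]))
```

The skipped field still exists on the model and can be set in code; it simply disappears from the schema the LLM is asked to fill in.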

3. Portable local-file provenance

_fetch_and_format records local paths via Path.as_posix() so provenance is consistent cross-platform (fixes a Windows-only test).
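As a small illustration of why `as_posix()` makes the recorded path platform-independent, the sketch below uses pathlib's pure path classes so that it behaves the same on any OS. `provenance_for` is a hypothetical helper, not the PR's `_fetch_and_format`.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def provenance_for(path: PurePath) -> str:
    """Hypothetical helper: record a local path with forward slashes."""
    return path.as_posix()


# The same relative location, spelled with each platform's separator.
windows_style = provenance_for(PureWindowsPath(r"papers\2606.05138.pdf"))
posix_style = provenance_for(PurePosixPath("papers/2606.05138.pdf"))

print(windows_style)  # papers/2606.05138.pdf
print(posix_style)    # papers/2606.05138.pdf
```

Calling `str()` on the Windows path would instead keep the backslashes, so a test comparing provenance strings would pass on one platform and fail on the other.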

Tests & verification

- New: tests/knowledge/test_extraction.py (slug→UUID + UUID passthrough); ResolverOutputSchemaTests (real, non-mocked schema build exercising both magic layers); a fallback-coverage test.
- Updated: test_output_type_override_propagated and test_basemodel_renders_json_schema to the post-fix contracts; a non-strict wrap-guard assertion on the resolver.
- Full suite: 238 passing. ruff and basedpyright clean.
- Live verification (real models, not mocks): paper_flow on arXiv 2606.05138 (the exact paper that produced the 94 errors) builds a valid Paper; the magic resolver via gpt-4o-mini parses natural language into a correct ArxivIdentifier + config. The strict-mode layer of the magic fix was caught by the live run; the mocked tests could not surface it.

Notes

- No new dependencies (SkipJsonSchema ships with pydantic); the lockfile is unchanged.
- DOI input remains an intentional NotImplementedError (documented follow-up).

🤖 Generated with Claude Code

…ed-output schemas

Two openai-agents SDK structured-output failures stopped paper_flow and the magic NL resolver from running end-to-end. Both stem from QuantMind's rich Pydantic models meeting the SDK's strict-schema requirements.

paper_flow (the crash): With strict_json_schema=False the model emits human-readable node-id slugs ("root", "intro") into UUID-typed Paper/TreeNode fields -> 94 uuid_parsing errors. The domain UUID is load-bearing (tested node-id uniqueness, UUID-keyed JSON round-trips, stable dedup identity), so rather than weaken it, new PaperExtraction(Paper) carries a mode="before" validator (canonicalize_tree_ids) that maps each distinct slug to one UUID and rewrites every id slot (nodes keys, node_id, parent_id, children_ids, root_node_id, citation anchors), passing values that are already UUIDs through unchanged. paper_flow defaults output_type to it; the domain model is untouched.

magic resolver (latent, hidden by mocked-Runner tests): ResolvedFlowConfig embeds BaseFlowCfg.model_settings: ModelSettings, whose callable fields cannot be JSON-schema'd (PydanticInvalidForJsonSchema), and the discriminated-union/knowledge schemas trip strict mode's additionalProperties guard. Both layers fixed: SkipJsonSchema on model_settings (an execution knob, never LLM-set) + AgentOutputSchema(..., strict_json_schema=False) on the resolver output type (same pattern as paper_flow).

Local-file provenance now uses Path.as_posix() for portable, cross-platform paths (fixes a Windows-only test).

Tests: new test_extraction.py (slug->uuid + uuid passthrough); new ResolverOutputSchemaTests (real non-mocked schema build, both layers); repaired test_output_type_override_propagated + test_basemodel_renders_json_schema to the post-fix contracts; fallback + wrap guards added. Full suite 238/238; ruff + basedpyright clean. Verified by live-fire: paper_flow on arXiv 2606.05138 (the exact paper that 94-errored) and the magic resolver via real gpt-4o-mini.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

wanghaoxue0 requested a review from keli-wen June 5, 2026 11:19

keola808hunt-dot wants to merge 1 commit into LLMQuant:master from keola808hunt-dot:fix/quantmind-paper-extraction-and-magic-schema
