feat(okf): add a validate subcommand checking a bundle against SPEC §9 #125

Open
mftnakrsu wants to merge 1 commit into GoogleCloudPlatform:main from mftnakrsu:feat/okf-conformance-validate

@mftnakrsu commented Jun 20, 2026

Implements the §9 conformance runner requested in #62 — a credential-free, offline validator for OKF bundles.

Rebased onto main after the enrichment_agent → reference_agent package rename (#129); the runner now lives under reference_agent/ and is invoked as python -m reference_agent validate.

What

A new validate subcommand and a small bundle/conformance.py library that check an on-disk bundle against the machine-checkable SPEC §9 rules:

  • §9.1 — every non-reserved .md has a parseable YAML frontmatter block
  • §9.2 — every frontmatter has a non-empty type
  • §6/§11 — reserved index.md files carry no frontmatter except a bundle-root okf_version

Usage:

    python -m reference_agent validate --bundle ./bundles/<name>          # §9
    python -m reference_agent validate --bundle ./bundles/<name> --strict # + §4.1 keys

Prints one "path: [rule] message" line per violation and exits non-zero if any are found, so it drops cleanly into CI.
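For readers who want to picture the reporting contract, here is a minimal sketch of the line format and exit status described above. It is not the code in this PR: the `Violation` class and `report` function are hypothetical names, and the exit status 1 is an assumption (the description only promises "non-zero").

```python
import sys
from dataclasses import dataclass
from typing import Iterable, Optional, TextIO


@dataclass(frozen=True)
class Violation:
    """One finding; field names are illustrative, not taken from the PR."""
    path: str
    rule: str
    message: str

    def render(self) -> str:
        # Matches the documented shape: path: [rule] message
        return f"{self.path}: [{self.rule}] {self.message}"


def report(violations: Iterable[Violation], out: Optional[TextIO] = None) -> int:
    """Print one line per violation and return a process exit status."""
    stream = out if out is not None else sys.stdout
    found = list(violations)
    for violation in found:
        print(violation.render(), file=stream)
    # Assumption: 1 stands in for "non-zero"; 0 means nothing was found.
    return 1 if found else 0
```

A CLI entry point would pass the return value to `sys.exit`, which is what lets a CI step fail on any violation.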

Scope of the conformance claim

§9.3 also covers log.md date headings (§7) and index.md body sections (§6). These are not validated — faithfully telling a real ## heading apart from fenced-code content needs a full CommonMark parser, which is out of scope for this dependency-light checker. So on success the tool reports exactly what it checked:

OK: … — no OKF v0.1 violations found (§9.1, §9.2, index.md §6/§11)

rather than asserting full v0.1 conformance.
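To illustrate why heading checks were left out, here is a small sketch of the false positive a line-based scan produces. The sample log text and the `naive_headings` helper are made up for this example; the sample uses a tilde fence, which CommonMark accepts alongside backtick fences.

```python
import re

# A line-based pattern that treats every line starting with "## " as a heading.
NAIVE_HEADING = re.compile(r"^## .+$", re.MULTILINE)

# Hypothetical log.md content: one real heading, one look-alike inside a fence.
SAMPLE_LOG = """\
## 2026-06-20
Added the validate subcommand.

~~~text
## not a heading, just fenced content
~~~
"""


def naive_headings(markdown: str) -> list:
    """Return every line the naive pattern believes is a level-2 heading."""
    return NAIVE_HEADING.findall(markdown)


print(naive_headings(SAMPLE_LOG))
# ['## 2026-06-20', '## not a heading, just fenced content']
```

Tracking fences by hand narrows the gap but does not close it: fences may use backticks or tildes, a closing fence must be at least as long as the opening one, and indented code blocks have no fence at all.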

Design notes

  • Reuses OKFDocument.parse and adds no new dependencies (pyyaml already required).
  • A frontmatter block's presence is checked explicitly, because parse returns an empty mapping for a no-block file — so §9.1 ("no block") is distinguished from §9.2 ("empty/missing type") without changing parse's behavior.
  • Robust against bad input, so the validator never crashes on what it validates:
    • an out-of-range frontmatter value (e.g. timestamp: 2026-13-45) makes PyYAML raise a bare ValueError; it is caught and reported as §9.1 instead of propagating;
    • a directory or broken symlink whose name ends in .md is skipped, not read as a file (it no longer produces a spurious failure);
    • a bad --bundle path prints a one-line error and exits 2 (usage error) instead of dumping a traceback.
  • Intentionally stricter than SPEC §9's permissive consumption model: a validator surfaces problems; a consumer must tolerate them. The producer-level bar (title/description/timestamp) is opt-in via --strict.
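The first two robustness notes can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: it does not call OKFDocument.parse, the function names are hypothetical, and the YAML loader is passed in as a parameter so the sketch stays dependency-free (with PyYAML one would pass `yaml.safe_load` and catch `(yaml.YAMLError, ValueError)` rather than a broad `Exception`).

```python
from pathlib import Path
from typing import Callable, Iterator, Optional, Tuple


def split_frontmatter(text: str) -> Optional[str]:
    """Return the raw frontmatter text, or None when there is no block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return "\n".join(lines[1:index])
    return None  # opening delimiter without a closing one


def check_text(text: str, load: Callable[[str], object]) -> Optional[Tuple[str, str]]:
    """Return (rule, message) for the first violation, or None if clean."""
    raw = split_frontmatter(text)
    if raw is None:
        return ("§9.1", "no frontmatter block")
    try:
        data = load(raw)
    except Exception as exc:  # covers a bare ValueError from a bad timestamp
        return ("§9.1", f"unparseable frontmatter: {exc}")
    if data is None:  # an empty block parses to nothing; treat as empty mapping
        data = {}
    if not isinstance(data, dict):
        return ("§9.1", "frontmatter is not a mapping")
    value = data.get("type")
    if not isinstance(value, str) or not value.strip():
        return ("§9.2", "missing or empty type")
    return None


def iter_markdown_files(root: Path) -> Iterator[Path]:
    """Yield regular files ending in .md, skipping look-alike entries."""
    for path in sorted(root.rglob("*.md")):
        if path.is_file():  # False for directories and broken symlinks
            yield path
```

Checking for the block before parsing is what keeps "no block" (§9.1) apart from "block without a usable type" (§9.2), even if the underlying parser returns an empty mapping in both cases.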

Tests

  • okf/tests/test_conformance.py — all three committed bundles pass clean (guards against future drift), plus a fixture per violation type, --strict, the bad-timestamp case, a directory named .md, a non-string type, and the CLI bad-path exit code.
  • Full okf suite: pytest → 54 passed.

Addresses #62.

🤖 Generated with Claude Code

@mftnakrsu force-pushed the feat/okf-conformance-validate branch 2 times, most recently from 4b4359f to ece28c8 (June 20, 2026 20:47)
@mftnakrsu marked this pull request as ready for review (June 20, 2026 20:48)
@mftnakrsu force-pushed the feat/okf-conformance-validate branch from ece28c8 to 4aae068 (June 20, 2026 20:56)
A credential-free, offline conformance runner for OKF bundles (addresses GoogleCloudPlatform#62).

Adds reference_agent/bundle/conformance.py and a `validate` CLI subcommand
that check an on-disk bundle against the machine-checkable SPEC §9 rules:
§9.1 (parseable YAML frontmatter), §9.2 (non-empty `type`), and the hard
index.md rule (§6/§11). Prints one `path: [rule] message` line per
violation and exits non-zero, so it drops into CI. `--strict` additionally
enforces the producer-level recommended keys (§4.1).

log.md date headings (§7) and index.md body sections (§6) are out of scope
(they need a full CommonMark parser), so success reports exactly what was
checked rather than asserting full v0.1 conformance.

Reuses OKFDocument.parse, adds no new dependencies. Tests cover all three
committed bundles plus a fixture per violation type; full okf suite: 54 passed.
@mftnakrsu force-pushed the feat/okf-conformance-validate branch from 4aae068 to 044ed63 (June 21, 2026 12:08)
@mftnakrsu (Author)

@amirhormati when you have a moment, would you be able to review? This adds the SPEC §9 validate subcommand (addresses #62). I rebased it onto the latest main after the enrichment_agent → reference_agent rename; checks are green and it's mergeable. Thanks!

@lirik173

Really like this framing: "a validator surfaces problems; a consumer must tolerate them." That line quietly does more than §9: it's also the natural home for a trust check, and I don't think anyone's pulled that thread yet.

Here's the itch. A consumed bundle is untrusted input, but a consumer has nothing to check it against. Not integrity -- is this byte-for-byte what the producer shipped? Not provenance - who asserts this concept, and from where? Identity is just the path (§2); there's no manifest, no signature, no author binding. So a Playbook whose body carries a one-line — the kind a human reviewer skims past but the model runs — sails straight through a conformant consumer. Indirect prompt injection at the knowledge layer, and the format gives you no way to even notice.

So I prototyped the trust sibling to your validate: an opt-in verify. Per-document sha256 manifest + optional ed25519 signature + an optional provenance: block — all additive, a §11 minor bump, default report-only so permissive consumption stays exactly as it is. I ran it against the bundles/stackoverflow already checked into this repo: it signs and verifies all 53 concepts clean, then catches a single tampered one under --strict. Needle, meet haystack.

One thing I'll say plainly so it isn't oversold: an in-bundle hash manifest is tamper-evidence, not tamper-proofing -- real authenticity still needs a trusted issuer key out-of-band. But that's exactly the gap: today the format has nowhere to carry the hash or the signature, so no trust root (Sigstore, a key registry, signed tags) can be bolted on interoperably. The carrier is the missing primitive; the root is a layer above it.
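For concreteness, the per-document hash part of that idea could look roughly like the sketch below. It is not the prototype mentioned above: the function names and manifest shape are assumptions, the ed25519 signature and provenance block are left out (they need a library outside the standard library and a key distributed out-of-band), and on its own it gives tamper-evidence only.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each .md file's bundle-relative path to its sha256 hex digest."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*.md"))
        if path.is_file()
    }


def verify_manifest(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Compare the bundle on disk with a manifest and list every difference."""
    current = build_manifest(root)
    problems = []
    for rel in sorted(set(manifest) | set(current)):
        if rel not in current:
            problems.append(f"{rel}: missing")
        elif rel not in manifest:
            problems.append(f"{rel}: not in manifest")
        elif current[rel] != manifest[rel]:
            problems.append(f"{rel}: hash mismatch")
    return problems
```

A report-only mode would print the returned list and exit 0; a strict mode would exit non-zero when the list is non-empty.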

I didn't want to derail your PR, so the real question is for the maintainers: is provenance/integrity something OKF wants as an optional layer, or is it deliberately left to the consuming system, the way serving is? If there's appetite I'll take it to its own issue with the threat model and the verifier, framed as a sibling to this. Either way, nice work on validate; the validator-stricter-than-consumer split is the right backbone.
