release: v0.2.0 #3

Merged
random-walks merged 4 commits into main from release/v0.2.0 on Apr 24, 2026

random-walks (Owner) commented Apr 24, 2026

Summary

Minor version bump covering the post-v0.1.0 expansion already merged via #2, plus folded-in CI perf tuning and a post-announcement image polish. Merging this one PR ships all of it at once.

Why minor (not patch)

  • Two new backends (GeminiBackend, MistralBackend) — new API surface.
  • New Source.from_pdf(..., extractor="grobid") keyword path.
  • New benchmarks/alce_subset.py harness.

Why not major

  • The three §10 contracts (grammar shape, Source.metadata CSL shape, GenerationResult/VerificationReport schema_version) are all untouched.
  • Dep-floor raises (torch>=2.4 → torch>=2.8, mistralai>=1.0 → >=2.0) are narrow: the mistralai floor only applies to a new extra, and the torch raise is a CI-correctness move past a known-bad wheel window.
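
The bump reasoning above can be sketched as a small decision function. This is a hypothetical illustration of ordinary semantic-versioning rules, not code from this repository: `bump_kind`, `apply_bump`, and their parameters are invented names, and it assumes the §10 contracts are what define a breaking change.

```python
def bump_kind(breaks_contract: bool, adds_api: bool) -> str:
    """Return the semantic-versioning bump a change set calls for."""
    if breaks_contract:
        return "major"  # a §10 contract changed incompatibly
    if adds_api:
        return "minor"  # new backends, keyword paths, or harnesses
    return "patch"      # fixes only, no new surface


def apply_bump(version: str, kind: str) -> str:
    """Apply a bump kind to a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if kind == "major":
        return f"{major + 1}.0.0"
    if kind == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


# This PR: new API surface, no contract changes.
kind = bump_kind(breaks_contract=False, adds_api=True)
print(kind, apply_bump("0.1.0", kind))
```

With new backends but untouched contracts, the sketch lands on a minor bump from 0.1.0 to 0.2.0, matching the choice made here.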

What's folded in

Previously three separate PRs, consolidated here so merging this one PR ships the full v0.2.0:

| Commit | Origin PR | Gist |
| --- | --- | --- |
| c77dcc8 release: v0.2.0 | #3 (this) | Version bump + CHANGELOG promote |
| e3d9955 ci: commit uv.lock + cpu-only torch + --locked installs | was #4 (closed) | CI speedup: ~3m47s → ~2m20s per Linux test cell |
| 2991cc9 docs(multi-image): swap Phi for Claude Haiku | was #5 (closed) | Multi-model thread image shows one backend per enforcement tier |
| d7589d2 changelog: fold CI + multi-image updates into v0.2.0 section | this | CHANGELOG reflects the folded-in work |

Release checklist

  • make lint && make test && make docs-build green locally
  • make release-check builds sdist + wheel cleanly (0.2.0)
  • uv sync --locked --extra dev --extra all succeeds (lockfile consistent)
  • CHANGELOG.md has the [0.2.0] section (dated 2026-04-24) with CI + docs bullets folded in
  • _version.py matches the target tag (0.2.0)
  • No §10 contracts touched — /contract-check clean
  • CI green on this PR (pending — with the new --locked + CPU-only torch config)
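
Two of the checklist items (version file matches the tag, CHANGELOG has the section) are easy to check mechanically. The following is a minimal sketch, not the repository's `make release-check`. It assumes `_version.py` contains a `__version__ = "..."` assignment and that CHANGELOG headings look like `## [0.2.0] - 2026-04-24`; both are assumptions, and all function names are hypothetical.

```python
import re


def read_version(version_py_text: str) -> str:
    """Extract the version string, assuming a `__version__ = "..."` line."""
    match = re.search(r'__version__\s*=\s*["\']([^"\']+)["\']', version_py_text)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def changelog_has_section(changelog_text: str, version: str) -> bool:
    """True if a markdown heading of the assumed form `## [X.Y.Z]` exists."""
    pattern = rf"^#+\s*\[{re.escape(version)}\]"
    return re.search(pattern, changelog_text, flags=re.MULTILINE) is not None


def check_release(tag: str, version_py_text: str, changelog_text: str) -> list[str]:
    """Return a list of problems; an empty list means both checks pass."""
    target = tag.removeprefix("v")
    problems = []
    found = read_version(version_py_text)
    if found != target:
        problems.append(f"_version.py says {found}, tag expects {target}")
    if not changelog_has_section(changelog_text, target):
        problems.append(f"CHANGELOG.md has no [{target}] section")
    return problems


print(check_release("v0.2.0", '__version__ = "0.2.0"\n', "## [0.2.0] - 2026-04-24\n"))
```

Returning a list of problems rather than raising on the first one lets a single run report every mismatch.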

Test plan

  • CI green (this is also the first real-world test of the new CI config — the CI run itself should take materially less time than previous runs on this repo)
  • After merge, tag v0.2.0 on main → release.yml publishes to PyPI via OIDC

Post-merge

Tag + push:

git checkout main && git pull
git tag v0.2.0 && git push --tags

🤖 Generated with Claude Code

Minor bump capturing the post-v0.1.0 expansion landed in #2:

- Two new API backends: GeminiBackend + MistralBackend (schema-tier
  enforcement mirroring the OpenAI pattern).
- GROBID opt-in PDF extractor: `Source.from_pdf(..., extractor="grobid")`.
- ALCE-flavoured benchmark harness (toy subset + real-JSONL support).
- Live OpenAI + Anthropic integration test suite (env-gated).
- HF Space deploy automation.
- PREPRINT.md + community files (CODE_OF_CONDUCT, SECURITY, issue
  templates).
- Cover + four thread images rendered by benchmarks/generate_cover.py.
- Multiprompt sweep: 3→5 seeds, still 0.0 ± 0.0 fabrication.
- Smaller-NLI calibration (Finding 4b — DeBERTa-base caps at F1 0.63).

Dep floors raised (narrow impact, no §10 contract changes):
- `torch>=2.4` → `torch>=2.8`   (CI cp313 wheel resolution)
- `mistralai>=1.0` → `>=2.0`    (only relevant to the new extra)

Three §10 contracts (grammar shape, Source.metadata CSL shape,
GenerationResult/VerificationReport schema_version) all untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
random-walks and others added 3 commits April 24, 2026 00:28
Three modest wins on the ~3-7min test-matrix cells. All lossless w/r/t
correctness; the trade is a bigger git repo (1.4MB uv.lock).

**1. Commit `uv.lock`.** Previously gitignored, which meant every CI run
resolved dependencies from scratch against PyPI. Committed now so:
- Resolution is deterministic — same run twice, same wheels, no
  drift from upstream floor-version updates.
- Cache keys stabilize (see item 2 below).
- `uv sync --locked` can assert the lock matches pyproject.toml
  as a config-drift tripwire (see item 3).

Libraries have historically not committed lockfiles because users
consume floors, not locks — but astral now recommends committing them
even for libraries (the lockfile is CI's dependency graph, not the
users'). Modern uv-based projects do.

**2. Add `uv.lock` to the cache-dependency-glob** on all three jobs so
setup-uv's cache invalidates precisely on lockfile bumps rather than
generously on any pyproject edit.

**3. `uv sync --locked` everywhere.** Enforces the lock is in sync with
pyproject on every CI run (fails loudly if someone edits pyproject
without regenerating). Side effect: resolution work is skipped.

**4. CPU-only torch on Linux test jobs.** The default torch wheel ships
with CUDA (~800MB); the CPU-only variant is ~200MB. CI has no GPU
anyway, so this is a pure ~600MB / ~60s download-and-extract saving
per Linux matrix cell. Configured via env: `UV_EXTRA_INDEX_URL=
https://download.pytorch.org/whl/cpu` set only when
`runner.os == 'Linux'`. macOS keeps the standard wheel (no CUDA
variant there anyway). Top-level `UV_INDEX_STRATEGY=unsafe-best-match`
lets uv compare candidates across all configured indexes, so everything
else still resolves from PyPI.
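
The per-OS environment described in item 4 can be modelled in a few lines. This is an illustrative Python sketch of the selection logic only; the real configuration lives in the workflow file, and `uv_env_for` is a hypothetical helper name. The variable names and the index URL are the ones quoted above.

```python
CPU_TORCH_INDEX = "https://download.pytorch.org/whl/cpu"


def uv_env_for(runner_os: str) -> dict[str, str]:
    """Environment variables uv would see for a given CI runner OS."""
    # Set at the top level for every job.
    env = {"UV_INDEX_STRATEGY": "unsafe-best-match"}
    # The CPU-only wheel index is added only on Linux runners.
    if runner_os == "Linux":
        env["UV_EXTRA_INDEX_URL"] = CPU_TORCH_INDEX
    return env


for runner in ("Linux", "macOS"):
    print(runner, uv_env_for(runner))
```

Keeping the extra index conditional means macOS jobs resolve exactly as before, so only the Linux cells change behavior.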

Expected test-job runtime after (rough estimates, py3.12 Linux cell):
- before:  ~3m47s  (install ~3m, test ~30s)
- after:   ~2m20s  (install ~1m40s cold / ~15s cached, test ~30s)
- warm rerun on same PR: ~1m30s (cache hits dominate).
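
As a quick check on the headline numbers, the before/after totals above work out as follows. The helper name `to_seconds` is invented for this sketch, and the inputs are the rough estimates quoted in the list, not measurements.

```python
def to_seconds(stamp: str) -> int:
    """Convert a duration written like '3m47s' or '3m' to seconds."""
    minutes, _, rest = stamp.partition("m")
    seconds = rest.rstrip("s")
    return int(minutes) * 60 + (int(seconds) if seconds else 0)


before = to_seconds("3m47s")  # estimated total before the change
after = to_seconds("2m20s")   # estimated total after, cold cache
saved = before - after
percent = 100 * saved / before

print(f"{saved}s saved per Linux cell ({percent:.0f}% faster)")
```

On these estimates the cold-cache saving is a bit under a minute and a half per Linux cell.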

None of the three §10 contracts change. Only CI config + the lockfile
addition.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The v0.2 post should lead on three-tier coverage (v0.2 is the release
that shipped Gemini + Mistral, bringing the backend count to
seven). The previous multi-model image showed Qwen +
Phi-3.5-mini + GPT-4o-mini — two locals + one API, which undersells
the cross-tier story when the whole point of v0.2 is that the
contract holds across enforcement mechanisms.

New column lineup:

- Qwen 2.5 0.5B       — local · logit-mask (XGrammar)
- GPT-4o-mini         — API · schema-layer (strict JSON)
- Claude Haiku 4.5    — API · provider-native (Citations)

Each column is visually distinct (blue / orange / green) and reads as
one representative per enforcement tier. Subtitle rewritten to name
the tiers explicitly so a viewer catches the "three mechanisms, one
contract" in one glance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@random-walks random-walks merged commit e09e49e into main Apr 24, 2026
7 checks passed