release: v0.2.0 #3
Merged
Minor bump capturing the post-v0.1.0 expansion landed in #2:

- Two new API backends: GeminiBackend + MistralBackend (schema-tier enforcement mirroring the OpenAI pattern).
- GROBID opt-in PDF extractor: `Source.from_pdf(..., extractor="grobid")`.
- ALCE-flavoured benchmark harness (toy subset + real-JSONL support).
- Live OpenAI + Anthropic integration test suite (env-gated).
- HF Space deploy automation.
- PREPRINT.md + community files (CODE_OF_CONDUCT, SECURITY, issue templates).
- Cover + four thread images rendered by benchmarks/generate_cover.py.
- Multiprompt sweep: 3→5 seeds, still 0.0 ± 0.0 fabrication.
- Smaller-NLI calibration (Finding 4b: DeBERTa-base caps at F1 0.63).

Dep floors raised (narrow impact, no §10 contract changes):

- `torch>=2.4` → `torch>=2.8` (CI cp313 wheel resolution)
- `mistralai>=1.0` → `>=2.0` (only relevant to the new extra)

Three §10 contracts (grammar shape, Source.metadata CSL shape, GenerationResult/VerificationReport schema_version) all untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four modest wins on the ~3-7min test-matrix cells. All lossless w/r/t correctness; the trade is a bigger git repo (1.4MB uv.lock).

**1. Commit `uv.lock`.** Previously gitignored, which meant every CI run resolved dependencies from scratch against PyPI. Committed now so:

- Resolution is deterministic: same run twice, same wheels, no drift from upstream floor-version updates.
- Cache keys stabilize (see item 2 below).
- `uv sync --locked` can assert the lock matches pyproject.toml as a config-drift tripwire (see item 3).

Libraries have historically not committed lockfiles because users consume floors, not locks, but Astral now recommends committing them even for libraries (the lockfile is CI's dependency graph, not the users'). Modern uv-based projects do.

**2. Add `uv.lock` to the cache-dependency-glob** on all three jobs so setup-uv's cache invalidates precisely on lockfile bumps rather than generously on any pyproject edit.

**3. `uv sync --locked` everywhere.** Enforces that the lock is in sync with pyproject on every CI run (fails loudly if someone edits pyproject without regenerating the lock). Side effect: resolution work is skipped.

**4. CPU-only torch on Linux test jobs.** The default torch wheel ships with CUDA (~800MB); the CPU-only variant is ~200MB. CI has no GPU anyway, so this is a pure ~600MB / ~60s download-and-extract saving per Linux matrix cell. Configured via env: `UV_EXTRA_INDEX_URL=https://download.pytorch.org/whl/cpu`, set only when `runner.os == 'Linux'`. macOS keeps the standard wheel (no CUDA variant there anyway). Top-level `UV_INDEX_STRATEGY=unsafe-best-match` so uv considers every index for each package and still resolves everything other than torch from PyPI.

Expected test-job runtime after (rough estimates, py3.12 Linux cell):

- before: ~3m47s (install ~3m, test ~30s)
- after: ~2m20s (install ~1m40s cold / ~15s cached, test ~30s)
- warm rerun on same PR: ~1m30s (cache hits dominate).

None of the three §10 contracts change. Only CI config + the lockfile addition.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
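The four CI changes described above could be wired together roughly as follows. This is a minimal sketch, not the workflow from this PR: the job name, matrix values, action versions, and the use of `$GITHUB_ENV` for the Linux-only variable are assumptions. Only the `uv.lock` cache glob, `uv sync --locked`, and the two `UV_*` settings come from the commit message.

```yaml
# Hypothetical test job; names and versions are illustrative assumptions.
env:
  # Top-level: let uv consider every configured index for each package.
  UV_INDEX_STRATEGY: unsafe-best-match

jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest]
        python-version: ["3.12"]
    steps:
      - uses: actions/checkout@v4

      # The runner context is available in step-level `if:`, so the
      # CPU-only index is exported only on Linux cells.
      - name: Use CPU-only torch index on Linux
        if: runner.os == 'Linux'
        run: echo "UV_EXTRA_INDEX_URL=https://download.pytorch.org/whl/cpu" >> "$GITHUB_ENV"

      - uses: astral-sh/setup-uv@v5
        with:
          python-version: ${{ matrix.python-version }}
          enable-cache: true
          # Cache invalidates when the lockfile (or pyproject) changes.
          cache-dependency-glob: |
            pyproject.toml
            uv.lock

      # Fails if uv.lock is out of date relative to pyproject.toml.
      - run: uv sync --locked --extra dev --extra all

      # Test step omitted; it would run against the synced environment.
```

Writing the variable to `$GITHUB_ENV` in a conditional step keeps macOS cells free of the extra index without relying on an empty-string value.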
The v0.2 post should lead on three-tier coverage (v0.2 is the release that shipped Gemini + Mistral, bringing the total API surface to the seven-backend count). The previous multi-model image showed Qwen + Phi-3.5-mini + GPT-4o-mini: two locals + one API, which undersells the cross-tier story when the whole point of v0.2 is that the contract holds across enforcement mechanisms.

New column lineup:

- Qwen 2.5 0.5B: local · logit-mask (XGrammar)
- GPT-4o-mini: API · schema-layer (strict JSON)
- Claude Haiku 4.5: API · provider-native (Citations)

Each column is visually distinct (blue / orange / green) and reads as one representative per enforcement tier. Subtitle rewritten to name the tiers explicitly so a viewer catches the "three mechanisms, one contract" message at a glance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Minor version bump covering the post-v0.1.0 expansion already merged via #2, plus folded-in CI perf tuning and a post-announcement image polish. Merging this one PR ships all of it at once.
Why minor (not patch)
- Two new API backends (`GeminiBackend`, `MistralBackend`): new API surface.
- `Source.from_pdf(..., extractor="grobid")` keyword path.
- `benchmarks/alce_subset.py` harness.

Why not major
- The three §10 contracts (grammar shape, `Source.metadata` CSL shape, `GenerationResult`/`VerificationReport` `schema_version`) are all untouched.
- The dep floor raises (`torch>=2.4` → `>=2.8`, `mistralai>=1.0` → `>=2.0`) are narrow: the mistralai floor only applies to a new extra; the torch raise is a CI-correctness move past a known-bad wheel window.

What's folded in
Previously three separate PRs, consolidated here so merging this one PR ships the full v0.2.0:
- c77dcc8 release: v0.2.0
- e3d9955 ci: commit uv.lock + cpu-only torch + --locked installs
- 2991cc9 docs(multi-image): swap Phi for Claude Haiku
- d7589d2 changelog: fold CI + multi-image updates into v0.2.0 section

Release checklist
- `make lint && make test && make docs-build` green locally
- `make release-check` builds sdist + wheel cleanly (0.2.0)
- `uv sync --locked --extra dev --extra all` succeeds (lockfile consistent)
- Changelog `[0.2.0]` section (dated 2026-04-24) with CI + docs bullets folded in
- `_version.py` matches the target tag (0.2.0)
- `/contract-check` clean
- … `--locked` + CPU-only torch config)

Test plan
- `release.yml` publishes to PyPI via OIDC

Post-merge
Tag + push:
🤖 Generated with Claude Code