feat(MachineLearning/PACLearning): add VersionSpace abstraction by Zetetic-Dhruv · Pull Request #592 · leanprover/cslib

Zetetic-Dhruv · 2026-05-22T06:49:36Z

Adds the classical version space abstraction (Mitchell 1982, Angluin 1980) as a companion to the PAC learning definitions from #492.

VersionSpace C S: the subset of C whose concepts agree with S on every sample point
versionSpace_subset, versionSpace_empty_sample: sanity lemmas
versionSpace_antitone: more data gives a smaller version space
IsConsistent A C: predicate on learners whose output always lies in the version space
mem_versionSpace_of_realizable, versionSpace_nonempty_of_realizable: realizable-case bridge

Foundation for downstream proofs (Sauer-Shelah, PAC lower bounds, infinite NFL).

Adds the classical version space abstraction (Mitchell 1982, Angluin 1980) to the PAC learning module. Version space — the subset of a concept class consistent with observed labeled data — is the structural substrate every sample complexity theorem operates on (Baby PAC, Sauer-Shelah, PAC lower bounds, NFL). This PR complements leanprover#492 by providing: - VersionSpace C S: the consistent subset of C given sample S - versionSpace_subset, versionSpace_empty_sample: sanity lemmas - versionSpace_antitone: structural monotonicity (more data → smaller VS), dual to sample complexity monotonicity - IsConsistent A C: predicate on learners (output always in version space) - IsConsistent.output_mem_conceptClass, output_consistent: consistent learner properties - mem_versionSpace_of_realizable: under realizable data, the target concept lies in the version space (the realizable-case bridge) - versionSpace_nonempty_of_realizable: corollary No measure theory, no new Mathlib dependencies, ~150 lines, 0 sorry. Together with leanprover#492 these suffice to state the Baby PAC theorem, Sauer-Shelah sample complexity, PAC lower bounds, and NFL as structural statements rather than ad-hoc computations.

Zetetic-Dhruv · 2026-05-22T07:35:45Z

@SamuelSchlesinger

SamuelSchlesinger

I want to see the bridge that makes this useful: version-space membership is exactly zero empirical error. I think that is what will motivate this PR and make the downstream work easier. The stuff you do have is sound and I appreciate this PR overall, my requests are mostly just stylistic commentary and generalization requests.

SamuelSchlesinger · 2026-05-22T08:41:49Z

+/-- *Realizable version-space nonemptiness.* If a target concept `c` lies in
+`C` and the sample `S` is labeled by `c` (i.e. every `(S i).2 = c (S i).1`),
+then `c` itself lies in the version space `VersionSpace C S`. -/
+theorem mem_versionSpace_of_realizable {m : ℕ} {C : ConceptClass α β}


Lets take c implicit here to match the following theorem.

I also think something like:

theorem ae_mem_versionSpace_of_realizable [MeasurableSpace α] [MeasurableSpace β] {C : ConceptClass α β} {c : α → β} (hc : c ∈ C) (hcm : Measurable c) (P : Measure α) [IsProbabilityMeasure P] (m : ℕ) : ∀ᵐ S : LabeledSample α β m ∂(Measure.pi (fun _ : Fin m => P.map (fun x => (x, c x)))), c ∈ VersionSpace C S := ...

might be desirable here?

SamuelSchlesinger · 2026-05-22T08:41:54Z

+
+@[expose] public section
+
+/-! # Version Space


Lets move this above the section to match the rest of the files in cslib.

SamuelSchlesinger · 2026-05-22T08:42:35Z

+
+/-- A consistent learner's output agrees with the sample on every observed
+point. -/
+theorem IsConsistent.output_consistent {m : ℕ} {A : Learner α β m}


Lets rename this to output_agrees or just output cause Consistent is already in the namespace.

SamuelSchlesinger · 2026-05-22T08:54:10Z

+  pages        = {203--226},
+  year         = {1982},
+  doi          = {10.1016/0004-3702(82)90040-6}
+}


I think we should cite "Version Spaces: A Candidate Elimination Approach to Rule Learning" instead, or at least in addition.

SamuelSchlesinger · 2026-05-22T08:55:25Z

+    (S : LabeledSample α β n) :
+    VersionSpace C S ⊆ VersionSpace C (fun i => S (Fin.castLE hmn i)) := by
+  intro h hh
+  exact ⟨hh.1, fun i => hh.2 (Fin.castLE hmn i)⟩


Lets generalize this, something like:

theorem versionSpace_reindex {m n : ℕ} (f : Fin m → Fin n) (C : ConceptClass α β) (S : LabeledSample α β n) : VersionSpace C S ⊆ VersionSpace C (S ∘ f) := fun _ hh => ⟨hh.1, fun i => hh.2 (f i)⟩

this theorem pops out with f := Fin.castLE hmn.

Zetetic-Dhruv requested review from chenson2018 and fmontesi as code owners May 22, 2026 06:49

SamuelSchlesinger suggested changes May 22, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(MachineLearning/PACLearning): add VersionSpace abstraction#592

feat(MachineLearning/PACLearning): add VersionSpace abstraction#592
Zetetic-Dhruv wants to merge 1 commit into
leanprover:mainfrom
Zetetic-Dhruv:version-space-v2

Zetetic-Dhruv commented May 22, 2026

Uh oh!

Zetetic-Dhruv commented May 22, 2026

Uh oh!

SamuelSchlesinger left a comment

Uh oh!

SamuelSchlesinger May 22, 2026

Uh oh!

SamuelSchlesinger May 22, 2026

Uh oh!

SamuelSchlesinger May 22, 2026

Uh oh!

SamuelSchlesinger May 22, 2026

Uh oh!

SamuelSchlesinger May 22, 2026

Uh oh!

SamuelSchlesinger May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Zetetic-Dhruv commented May 22, 2026

Uh oh!

Zetetic-Dhruv commented May 22, 2026

Uh oh!

SamuelSchlesinger left a comment

Choose a reason for hiding this comment

Uh oh!

SamuelSchlesinger May 22, 2026

Choose a reason for hiding this comment

Uh oh!

SamuelSchlesinger May 22, 2026

Choose a reason for hiding this comment

Uh oh!

SamuelSchlesinger May 22, 2026

Choose a reason for hiding this comment

Uh oh!

SamuelSchlesinger May 22, 2026

Choose a reason for hiding this comment

Uh oh!

SamuelSchlesinger May 22, 2026

Choose a reason for hiding this comment

Uh oh!

SamuelSchlesinger May 22, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants