Official repository for the paper
MathlibLemma: Folklore Lemma Generation and Benchmark for Formal Mathematics
Xinyu Liu, Zixuan Xie, Amir Moeini, Claire Chen, Shuze Daniel Liu, Yu Meng, Aidong Zhang, Shangtong Zhang
This repository contains MathlibLemma, a benchmark and proof library for folklore lemmas in Lean 4. The project studies automated folklore mining: discovering, filtering, formalizing, and proving missing intermediate results that are natural to mathematicians but not always available in Mathlib in reusable form.
At the paper level, MathlibLemma contributes:
- a benchmark of 4028 non-trivial, type-checked Lean statements
- a screened proof library of 1506 Lean-checked proofs
- a modular pipeline organized around Discovery, Judge, Formalizer, and Prover modules
Reference:
Repository layout:
- MathlibLemma/: the main released artifact view
- MathlibLemma/Benchmark/: the 4028 benchmark statements, organized into Foundational, Applied, and Abstract
- MathlibLemma/Proved/ProofLibrary/: the 1506 screened proofs used for the main proof-library count
- MathlibLemma/Proved/ByModel/: model-specific solved outputs
- main_pipeline/: the main pipeline code for the core benchmark and proof-library workflow
- source_batches/: the three source batch trees underlying the main release artifacts
- supplementary_experiments/: appendix-style follow-up materials, including the explicit-necessary-conditions ablation and the held-out residual provability check
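As a quick sanity check on a local clone, the .lean files under each benchmark category can be counted. The script below is a minimal sketch that is not shipped with the repository; the function name count_lean_files is hypothetical, and it assumes each numbered file holds a single benchmark statement. That assumption is not confirmed here, so a total that differs from 4028 is a prompt to inspect the files rather than evidence of a problem.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print the number of
# .lean files in each top-level category directory under a root.
# Assumes one benchmark statement per numbered file.
set -euo pipefail

count_lean_files() {
  local root="$1" dir n
  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue   # skip when the glob matched nothing
    n="$(find "$dir" -type f -name '*.lean' | wc -l | tr -d ' ')"
    printf '%s %s\n' "$(basename "$dir")" "$n"
  done
}

# Default root is the benchmark folder; pass another path as the first argument.
count_lean_files "${1:-MathlibLemma/Benchmark}"
```

Passing MathlibLemma/Proved/ProofLibrary as the first argument gives the analogous per-category counts for the proof library.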
Compared with the earlier public release, the main benchmark and proof-library artifacts now live under MathlibLemma/Benchmark/ and MathlibLemma/Proved/ProofLibrary/ rather than in the top-level benchmark/ and lemma/ folders.
This repository includes Lean project metadata (lean-toolchain, lakefile.toml, and lake-manifest.json) so the released files can be checked against a consistent Mathlib environment.
Typical setup:
- Install Lean 4 via elan.
- Clone the repository:

      git clone https://github.com/Sequential-Intelligence-Lab/MathlibLemma.git
      cd MathlibLemma

- Fetch cached Mathlib artifacts:

      lake exe cache get
In practice, the numbered benchmark and proof artifacts are most naturally checked file by file. For example:

    lake env lean MathlibLemma/Benchmark/Foundational/Data/Fintype/Basic/0.lean
    lake env lean MathlibLemma/Proved/ProofLibrary/Foundational/Data/Fintype/Basic/0.lean

The repository also includes .lake/ to make the packaged dependency state easier to reproduce.
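Because checking is done per file, a whole directory of numbered files can be checked in a loop. The sketch below is a hypothetical helper, not part of the repository: check_lean_dir runs lake env lean on each .lean file under a directory in sorted order, prints the files that fail, and returns non-zero if any did. The LEAN_CHECK override is an assumption added only so the loop can be exercised without a Lean installation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: type-check every .lean file under a directory,
# one file at a time. Run from the repository root.
set -uo pipefail

check_lean_dir() {
  local dir="$1" file failed=0
  local -a checker
  # Checker command: defaults to `lake env lean`; LEAN_CHECK overrides it
  # (split on whitespace).
  read -r -a checker <<< "${LEAN_CHECK:-lake env lean}"
  while IFS= read -r -d '' file; do
    # Keep stdin away from the checker so it cannot consume the file list.
    if ! "${checker[@]}" "$file" < /dev/null > /dev/null 2>&1; then
      echo "FAIL $file"
      failed=$((failed + 1))
    fi
  done < <(find "$dir" -type f -name '*.lean' -print0 | sort -z)
  echo "$failed file(s) failed"
  [ "$failed" -eq 0 ]
}
```

For example, after sourcing the script, run check_lean_dir MathlibLemma/Benchmark/Foundational/Data/Fintype/Basic. The loop discards compiler output, so rerun lake env lean on a reported file to see its error messages.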
At the paper level, the framework is modular rather than monolithic. The four modules are:
- Discovery Module generates candidate folklore lemmas from Mathlib seed files
- Judge Module filters candidates semantically using an LLM-as-a-judge
- Formalizer Module repairs syntax and type issues until statements become Lean-compilable
- Prover Module attempts Lean-checked proofs and applies the proof-bypass filter
The main implementation for this workflow is under main_pipeline/. The appendix-style follow-up scripts live under supplementary_experiments/supplementary_pipeline/.
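This README does not describe how the proof-bypass filter works. As a rough illustration only, one simple form of such a check scans proof files for the escape hatches that let Lean accept a file without a genuine proof: sorry, admit, and newly declared axioms. The helper flag_bypass and its token list are hypothetical, and the filter in main_pipeline/ may work quite differently.

```shell
#!/usr/bin/env bash
# Hypothetical, simplified bypass check (not the repository's filter):
# flag Lean files containing tokens that let a proof pass without being
# proved. Matching is whole-word, so a name like foo_axiom is not flagged.
set -uo pipefail

BYPASS_TOKENS='sorry|admit|axiom'

flag_bypass() {
  # Arguments: Lean files to scan. Prints each offending file and
  # returns 1 if at least one file matched, 0 otherwise.
  local file found=0
  for file in "$@"; do
    if grep -Eqw "$BYPASS_TOKENS" "$file"; then
      echo "BYPASS $file"
      found=1
    fi
  done
  return "$found"
}
```

A plain text scan also flags these words inside comments and strings. A sturdier check compiles each file and inspects the output of Lean's #print axioms command for the theorem, where a use of sorry shows up as sorryAx.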
A key contribution of this project is the discovery of missing lemmas that have been accepted into Mathlib. Representative merged pull requests include:
- Mathlib4 PR #32170 adds gronwallBound_mono: gronwallBound is monotone (non-decreasing) in the time variable x, given non-negative parameters
- Mathlib4 PR #32167 adds Kernel.restrict_const: restricting a constant kernel to a measurable set commutes with restricting the underlying measure
- Mathlib4 PR #31985 adds centralMoment_congr_ae: central moments agree for almost-everywhere-equal random variables
Contributed by Sequential Intelligence Lab (SIL), University of Virginia.
If you find this repository useful, please cite the paper:
    @article{liu2026mathliblemma,
      title={MathlibLemma: Folklore Lemma Generation and Benchmark for Formal Mathematics},
      author={Xinyu Liu and Zixuan Xie and Amir Moeini and Claire Chen and Shuze Daniel Liu and Yu Meng and Aidong Zhang and Shangtong Zhang},
      year={2026},
      journal={arXiv preprint arXiv:2602.02561}
    }

This project is licensed under the Apache License 2.0; see LICENSE.