Legibility

Which models are illegible under what conditions, and why? How does that impact monitorability?

Inspect View

Evaluation logs are viewable at: https://safety-research.github.io/legibility

The site is built with inspect view bundle and deployed to GitHub Pages automatically on every push to main.

Experiments

2026/15-4-2026 — CoT Legibility Phase 1: Classification

A systematic evaluation of whether external "reader" models can follow the chain-of-thought (CoT) reasoning produced by "generator" models. The experiment tests whether legibility is a property of the CoT itself or depends on the reader's capabilities.

Pipeline:

Step 1 — CoT Generation: 3 generators produce K=6 CoTs per question on GPQA-Diamond + MATH-500
Step 2 — Reader Evaluation: 4 readers evaluate under 3 conditions (self-CoT, crossfill, no-CoT)
Step 3 — Classification: each CoT classified as ANSWER_LEAKED, REASONING_LEGIBLE, or ILLEGIBLE
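For a sense of scale, the number of CoTs produced in Step 1 can be worked out from the figures above. This is a rough sketch, not part of the repository: it assumes the full public datasets are used (198 questions in GPQA-Diamond, 500 in MATH-500) with no subsampling, and the function names are made up for illustration.

```python
# Assumed dataset sizes: the full public releases, with no subsampling.
DATASET_SIZES = {"GPQA-Diamond": 198, "MATH-500": 500}

N_GENERATORS = 3  # from Step 1
K = 6             # CoTs per question per generator, from Step 1


def cots_per_generator(sizes=DATASET_SIZES, k=K):
    """CoTs one generator produces across all questions."""
    return k * sum(sizes.values())


def total_cots(n_generators=N_GENERATORS, sizes=DATASET_SIZES, k=K):
    """CoTs produced by all generators in Step 1."""
    return n_generators * cots_per_generator(sizes, k)


print(cots_per_generator())  # 6 * (198 + 500) = 4188
print(total_cots())          # 3 * 4188 = 12564
```

The Step 2 count is left out on purpose: it depends on how crossfill pairs readers with generators, which is defined in SPEC.md rather than here.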

See experiments/2026/15-4-2026/SPEC.md for the full protocol.
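To make the three labels from Step 3 concrete, here is one plausible decision rule. It is an illustration only: the actual criteria live in SPEC.md and classify.py and may differ. The ReaderOutcome fields and the rule itself (leak check first, then "the reader is correct with the CoT but not without it") are assumptions, not the documented protocol.

```python
from dataclasses import dataclass
from enum import Enum


class Label(str, Enum):
    ANSWER_LEAKED = "ANSWER_LEAKED"
    REASONING_LEGIBLE = "REASONING_LEGIBLE"
    ILLEGIBLE = "ILLEGIBLE"


@dataclass(frozen=True)
class ReaderOutcome:
    """Hypothetical summary of one reader's results on one CoT."""
    answer_in_cot: bool        # the CoT text states the final answer outright
    correct_with_cot: bool     # reader is correct when given the CoT
    correct_without_cot: bool  # reader is correct in the no-CoT condition


def classify(outcome: ReaderOutcome) -> Label:
    # Assumed rule, not the protocol from SPEC.md.
    if outcome.answer_in_cot:
        # The reader could copy the answer without following the reasoning.
        return Label.ANSWER_LEAKED
    if outcome.correct_with_cot and not outcome.correct_without_cot:
        # The CoT helped and did not leak the answer.
        return Label.REASONING_LEGIBLE
    return Label.ILLEGIBLE


print(classify(ReaderOutcome(False, True, False)).value)
```

Checking for leakage first keeps a CoT that simply announces its answer from being counted as legible reasoning.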

Structure

experiments/
  2026/
    15-4-2026/
      eval.py        # @task definitions + eval_set() orchestration
      config.py      # Model IDs, constants, log directories
      data.py        # Dataset loading + CoT extraction from logs
      solvers.py     # Custom solvers: cot_generation, crossfill, self_cot, no_cot
      scorers.py     # Custom scorers: generator/reader correctness
      classify.py    # Post-processing: 3-tier classification + validation
      plot.py        # Visualization of classification results
      logs/          # Inspect eval logs (tracked with Git LFS)

Development

pip install inspect-ai datasets scikit-learn matplotlib numpy

# Run Step 1 (CoT generation)
cd experiments/2026/15-4-2026
inspect eval-set eval.py@cot_gen_G1 eval.py@cot_gen_G2 eval.py@cot_gen_G3 \
  --log-dir logs/step1_generation

# View logs locally
inspect view --log-dir logs --recursive
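
If the list of generators grows, the Step 1 command can be assembled programmatically instead of typed by hand. A minimal sketch, assuming the task names keep the cot_gen_G<n> pattern shown above; eval_set_argv is a hypothetical helper, not part of the repository.

```python
import shlex


def eval_set_argv(task_names, log_dir, task_file="eval.py"):
    """Build the argv for `inspect eval-set` over tasks defined in task_file."""
    targets = [f"{task_file}@{name}" for name in task_names]
    return ["inspect", "eval-set", *targets, "--log-dir", log_dir]


argv = eval_set_argv(
    [f"cot_gen_G{i}" for i in (1, 2, 3)],
    log_dir="logs/step1_generation",
)

# Print a copy-pasteable command; pass argv to subprocess.run to execute it.
print(shlex.join(argv))
```

Run from experiments/2026/15-4-2026, this prints the same command as the Step 1 example above, on a single line.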
