Adaptive Chess Tutoring

Efficient LLM Reasoning via Programmatic Supervision and Cognitive Distillation

CSE 537 (Artificial Intelligence), Stony Brook University, Spring 2025
Authors: Shang-Jui (Ray) Kuo, Adebayo Braimah · Instructor: Prof. Niranjan Balasubramanian

A locally-deployable chess tutor that combines move prediction and human-like explanation in a single 1.1B-parameter LLM (TinyLlama-1.1B-Chat) via two LoRA adapters trained sequentially:

  • Phase 1 — Programmatic Supervision. ~60K rule-grounded samples generated from Lichess PGN data using python-chess as a verifier. Teaches chess fundamentals: legal moves, piece identification, attacked squares, comment parsing, next-move prediction.
  • Phase 2 — Cognitive Distillation. ~10K explanation samples distilled from a Mixtral-8x7B-Instruct teacher. A second LoRA adapter is trained on top of the frozen Phase 1 adapter, and at inference the two adapters are blended via PEFT's add_weighted_adapter with tunable (α, β) coefficients.
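The α/β blend can be illustrated on a single weight matrix. The following is a minimal sketch that assumes each adapter contributes a low-rank update ΔW_i = s_i·B_i·A_i, where s_i is the LoRA scaling factor. Under that assumption the blended update is α·ΔW_1 + β·ΔW_2. Stacking the scaled A factors row-wise and the B factors column-wise reproduces this blend exactly as one rank-(r_1 + r_2) adapter. That is the idea behind concatenation-style merging (PEFT exposes a "cat" combination_type). The other combination types (linear, svd, ties, dare_*) merge factors differently, and PEFT's internal implementation is not reproduced here. The matrices and helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product: rows of x times columns of y."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def scale(x: Matrix, c: float) -> Matrix:
    return [[c * v for v in row] for row in x]


def add(x: Matrix, y: Matrix) -> Matrix:
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def blended_delta(B1, A1, s1, B2, A2, s2, alpha, beta) -> Matrix:
    """Weighted sum of two LoRA updates: alpha*s1*(B1 @ A1) + beta*s2*(B2 @ A2)."""
    return add(scale(matmul(B1, A1), alpha * s1), scale(matmul(B2, A2), beta * s2))


def cat_adapter(B1, A1, s1, B2, A2, s2, alpha, beta):
    """Fuse both adapters into one rank-(r1 + r2) adapter.

    A factors (r x d_in) are scaled by weight * scaling and stacked row-wise;
    B factors (d_out x r) are concatenated column-wise and left unscaled.
    """
    A_cat = scale(A1, alpha * s1) + scale(A2, beta * s2)  # list concat = row stack
    B_cat = [r1 + r2 for r1, r2 in zip(B1, B2)]           # per-row concat = column stack
    return B_cat, A_cat


if __name__ == "__main__":
    # Toy shapes (made-up values): d_out=2, d_in=3, ranks r1=1 and r2=2.
    B1, A1 = [[1.0], [2.0]], [[0.5, -1.0, 2.0]]
    B2, A2 = [[0.0, 1.0], [1.0, -1.0]], [[1.0, 0.0, 1.0], [0.0, 2.0, -1.0]]
    args = (B1, A1, 2.0, B2, A2, 0.5, 0.25, 1.5)  # s1, s2, alpha, beta
    B_cat, A_cat = cat_adapter(*args)
    merged, direct = matmul(B_cat, A_cat), blended_delta(*args)
    assert all(abs(a - b) < 1e-12 for ra, rb in zip(merged, direct) for a, b in zip(ra, rb))
    print("rank of merged adapter:", len(A_cat))
```

The α=0.25, β=1.5 pair in the toy call is the README's peak-F1 cell. The matrices themselves are arbitrary. Because the stacked adapter's rank grows with each blended adapter, concatenation is exact but costs more parameters than merging into a fixed rank.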


Reproduction status (NuWulf cluster, May 2026)

A full reproduction was performed in May 2026 on Stony Brook's NuWulf HPC cluster (NVIDIA H200 sm_90 GPUs, SLURM). 47 evaluations ran across α, β, combination_type, BERTScore backbone, and training recipe.

What reproduces

  • Base-model BERTScore F1 (claimed 0.4744, measured 0.4773; Δ ≈ 0.003)
  • Phase 1 chess fundamentals. can_piece_move 2.9 % → 94.2 %; is_square_attacked 31.8 % → 49.7 %; list_legal_moves F1 0 → 0.23.
  • α/β trade-off curve. Over 17 points the curve is smooth and monotone. F1 drops from 0.480 to 0.357 as α grows from 0.25 to 1.5. Over the same range, SSD@1 (Stockfish Score Delta; lower = better moves) spans 238 to 727 cp, a 3× spread. The dual-LoRA design therefore yields a real, characterizable trade-off between move quality and explanation similarity.
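The following is a minimal sketch of how a list_legal_moves F1 and a per-move Stockfish score delta could be computed. Both definitions are assumptions for illustration:

- The F1 is set-level, over UCI move strings.
- SSD@1 is the mean centipawn gap between the engine's best move and the model's top-1 move, seen from the side to move and clamped at zero.

The repo's harness in src/chess_tutor/eval/ may define either metric differently, and the function names here are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def set_f1(predicted: Iterable[str], gold: Iterable[str]) -> float:
    """F1 between a predicted and a reference set of moves (e.g. UCI strings)."""
    pred, ref = set(predicted), set(gold)
    if not pred and not ref:
        return 1.0  # both empty: treat as a perfect match
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)


def mean_score_delta(evals: Sequence[Tuple[int, int]]) -> float:
    """Average centipawn loss of the model's move versus the engine's best move.

    Each pair is (best_cp, chosen_cp), both from the side to move's point of view.
    Negative gaps are clamped to 0, so a move never scores better than "best".
    """
    if not evals:
        raise ValueError("need at least one evaluated position")
    return sum(max(best - chosen, 0) for best, chosen in evals) / len(evals)


if __name__ == "__main__":
    # Model lists 3 of the 4 legal moves and nothing illegal: precision 1.0, recall 0.75.
    print(set_f1(["e2e4", "d2d4", "g1f3"], ["e2e4", "d2d4", "c2c4", "g1f3"]))
    # One near-best move (15 cp lost) and one blunder (420 cp lost).
    print(mean_score_delta([(35, 20), (120, -300)]))
```

Under this definition a lower SSD means the model's moves sit closer to Stockfish's choice, which matches the README's "lower = better moves".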

What does NOT reproduce

  • The "Phase 2 BERTScore F1 0.4744 → 0.5891" claim. Peak F1 across our full grid is 0.4801, reached at (α=0.25, β=1.5). The grid covers:
      – every α and β value tested
      – every combination_type ∈ {linear, svd, ties, dare_linear, dare_ties, cat}
      – both training recipes (5-epoch and 10-epoch)
      – both BERTScore backbones {deberta-xlarge-mnli, roberta-large}
    The Phase 2 adapter does not push BERTScore meaningfully above base and stays far below the claimed 0.5891.

See REPORT.md for the full writeup and docs/PROJECT_SUMMARY.md for the claim-by-claim verdict matrix.


Quick start

1. Clone + environment

git clone git@github.com:raykuo18/2025Spring_AI_Project.git
cd 2025Spring_AI_Project

# Create a fresh conda env (~5 min) and pin the working stack.
# peft 0.13.2 keeps PeftModel.add_weighted_adapter delegation (newer PEFT drops it).
conda create -n chess-tutor python=3.11 -y
conda activate chess-tutor
pip install --index-url https://download.pytorch.org/whl/cu121 'torch>=2.1.0'
pip install -e .                    # installs chess_tutor + all pinned deps from pyproject.toml
# Optional: GUI demo + Llama-2 GPTQ benchmark
pip install -e '.[gui,benchmarks]'

If you don't want to install the package, the entry points also work via PYTHONPATH=src (already set by scripts/env.sh).
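The α/β blending step depends on the peft 0.13.2 pin noted above, so a quick pre-flight check can catch a stack that was silently upgraded. This is a minimal sketch that uses only importlib.metadata from the standard library. The PINNED mapping and the helper names are hypothetical, and scripts/verify_env.sh may already perform a similar check.

```python
from importlib import metadata
from typing import Callable, Dict, Optional

# Pins taken from the install notes above; extend with other packages as needed.
PINNED = {"peft": "0.13.2"}


def installed_version(dist: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def check_pins(
    pins: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, str]:
    """Return {package: problem} for every pin that is missing or mismatched."""
    problems = {}
    for dist, wanted in pins.items():
        found = lookup(dist)
        if found is None:
            problems[dist] = "not installed"
        elif found != wanted:
            problems[dist] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    issues = check_pins(PINNED)
    for dist, msg in issues.items():
        print(f"[pin] {dist}: {msg}")
    print("environment OK" if not issues else "environment has pin mismatches")
```

The lookup parameter exists so the check can be tested without the real packages installed.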

2. Set up your data + model paths

Edit scripts/env.sh and point $PROJ at a large-storage location. $PROJ holds the HF model cache (~100 GB for Mixtral 4-bit), the raw Lichess dumps, trained adapters, and eval outputs. Put your HF token in .hf_token (gitignored). Required models: TinyLlama (public) and Mixtral-8x7B-Instruct (gated; accept the license on Hugging Face first).
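If you drive the pipeline from Python instead of the shell scripts, the same settings can be read from the environment. The following is a minimal sketch that assumes $PROJ is exported by scripts/env.sh. The subdirectory names come from the repository layout notes (hf_cache/, training_output/, evaluation_results/ under $PROJ), but the exact layout inside $PROJ is an assumption. HF_HOME is the standard Hugging Face cache-root variable. The helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def resolve_proj_paths(env: Mapping[str, str]) -> Dict[str, Path]:
    """Derive large-storage locations from $PROJ (layout under $PROJ is assumed)."""
    proj = env.get("PROJ")
    if not proj:
        raise RuntimeError("PROJ is not set; source scripts/env.sh first")
    root = Path(proj)
    return {
        "hf_cache": root / "hf_cache",
        "training_output": root / "training_output",
        "evaluation_results": root / "evaluation_results",
    }


def read_hf_token(path: Path = Path(".hf_token")) -> Optional[str]:
    """Return the stripped token from .hf_token, or None if absent or empty."""
    if not path.is_file():
        return None
    token = path.read_text(encoding="utf-8").strip()
    return token or None


if __name__ == "__main__":
    if "PROJ" in os.environ:
        paths = resolve_proj_paths(os.environ)
        # Point the Hugging Face cache at large storage unless already set.
        os.environ.setdefault("HF_HOME", str(paths["hf_cache"]))
        print({name: str(p) for name, p in paths.items()})
    else:
        print("PROJ is not set; source scripts/env.sh first")
    # Never print the token itself.
    print("token found" if read_hf_token() else "no .hf_token found")
```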

3. End-to-end pipeline

# Stage 1 — download Stockfish and Mixtral (~100 GB, ~15 min)
sbatch scripts/download_mixtral.sh
# TinyLlama is not fetched by this script; it is small, so download it separately

# Stage 2 — download + parse Lichess broadcasts, generate Phase 1/2 data
sbatch scripts/full_data_gen.sh                     # ~12 min

# Stage 3 — Phase 1 fine-tuning (60K samples × 3 epochs)
sbatch scripts/phase1_full.sh                       # ~8.5 h on 1 H200

# Stage 4 — Phase 2 distillation (Mixtral teacher on 10K prompts)
sbatch scripts/phase2_distill.sh                    # ~4.5 h on 1 H200

# Stage 5 — Phase 2 LoRA training (frozen P1 + trainable P2)
sbatch scripts/phase2_full.sh                       # ~30 min on 1 H200

# Stage 6 — full α/β eval sweep (base + P1 + P2 + 9-cell grid)
sbatch scripts/eval_full_sweep.sh                   # ~2 h on 1 H200
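The stages above are submitted one at a time. If you prefer to queue the whole pipeline in one go, SLURM job dependencies can chain them. The following is a minimal sketch under two assumptions: the Stage 1 downloads are already done, and each script is a self-contained sbatch job. STAGES, sbatch_cmd, and submit_chain are hypothetical helpers, not part of the repo. The --parsable and --dependency=afterok options are standard sbatch flags.

```python
import subprocess
from typing import List, Optional

# Stages 2-6 in the order listed above.
STAGES = [
    "scripts/full_data_gen.sh",
    "scripts/phase1_full.sh",
    "scripts/phase2_distill.sh",
    "scripts/phase2_full.sh",
    "scripts/eval_full_sweep.sh",
]


def sbatch_cmd(script: str, after: Optional[str] = None) -> List[str]:
    """Build an sbatch command; --parsable makes sbatch print only the job ID."""
    cmd = ["sbatch", "--parsable"]
    if after is not None:
        # Start only after the previous job finishes with exit code 0.
        cmd.append(f"--dependency=afterok:{after}")
    cmd.append(script)
    return cmd


def submit_chain(scripts: List[str], dry_run: bool = True) -> List[List[str]]:
    """Queue each stage behind the previous one; dry_run only builds commands."""
    cmds: List[List[str]] = []
    prev: Optional[str] = None
    for i, script in enumerate(scripts):
        cmd = sbatch_cmd(script, prev)
        cmds.append(cmd)
        if dry_run:
            prev = f"<job{i}>"  # placeholder job ID for display
        else:
            out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
            prev = out.strip().split(";")[0]  # --parsable may print "jobid;cluster"
    return cmds


if __name__ == "__main__":
    for cmd in submit_chain(STAGES, dry_run=True):
        print(" ".join(cmd))
```

A strictly linear chain is the conservative choice. If Stage 4 (teacher distillation) does not actually consume Stage 3's adapter, it could be queued in parallel with Phase 1 training to save wall-clock time.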

Smoke runs (~5 min each) for plumbing validation: scripts/phase1_smoke.sh, scripts/phase2_smoke.sh, scripts/eval_smoke.sh.

See docs/REPRO_LOG.md for the full reproduction log, including bugs encountered.


Repository layout

.
├── README.md, REPORT.md             ← intro + full results writeup
├── AGENT.md, CLUSTER_POLICY.md      ← original mission brief + SLURM routing
├── pyproject.toml                   ← Python package metadata + pinned deps
├── requirements.txt                 ← pip requirements (subset)
├── .gitignore, .hf_token            ← (.hf_token is gitignored)
│
├── docs/                            ← deliverable documentation
│   ├── CODEBASE_NOTES.md            ← faithful map of the code
│   ├── REPRO_LOG.md                 ← time-ordered reproduction log
│   ├── PROJECT_SUMMARY.md           ← claim verdict matrix + honest assessment
│   └── RELATED_WORK.md              ← 6-axis literature survey
│
├── src/chess_tutor/                 ← Python package (importable via PYTHONPATH=src without pip install)
│   ├── training/
│   │   ├── phase1.py                ← Phase 1 entry: `python -m chess_tutor.training.phase1`
│   │   ├── phase2.py                ← Phase 2 entry: dual-LoRA recipe
│   │   └── phase1_continue.py
│   ├── eval/
│   │   ├── single.py                ← single-adapter eval harness
│   │   ├── combined.py              ← dual-adapter eval with α/β blending
│   │   └── tables.py                ← post-process eval JSONs
│   ├── data/
│   │   ├── parse_broadcast.py       ← Lichess PGN → processed JSON
│   │   ├── generate_phase1.py       ← Phase 1 JSONL generation
│   │   ├── generate_phase2_prompts.py
│   │   ├── generate_phase2_explanations.py  ← Mixtral teacher loop
│   │   ├── organize_phase2.py       ← schema check + split
│   │   ├── simple_split.py          ← train/val/test splitter
│   │   ├── extract_comments.py
│   │   ├── parse_broadcast_parallel.py
│   │   └── make_hf_dataset.py       ← HuggingFace dataset format
│   ├── inference/
│   │   └── lora.py                  ← Load adapter + run inference
│   ├── benchmarks/
│   │   └── llama2_mixtral.py        ← Llama-2 GPTQ + Mixtral benchmark
│   └── gui/                         ← PyQt5 chess board demo
│       ├── chess_gui.py
│       └── images/pieces-basic-svg/
│
├── scripts/                         ← SLURM submission + reproduction scripts
│   ├── env.sh                       ← env vars + conda activation (source me!)
│   ├── env-freeze.txt               ← pinned pip versions
│   ├── verify_env.sh
│   ├── full_data_gen.sh             ← end-to-end Phase 1+2 data generation
│   ├── download_mixtral.sh
│   ├── phase1_{smoke,full}.sh
│   ├── phase2_{smoke,distill,full,full_10ep,smoke_finish}.sh
│   └── eval_*.sh                    ← single, alpha grid, combination_type sweep, etc.
│
├── tests/
│   ├── test_stockfish.py            ← stockfish smoke (`python -m tests.test_stockfish`)
│   ├── test_resources.py            ← GPU/memory pre-flight check
│   └── test_pipeline.py             ← end-to-end smoke
│
├── examples/                        ← example PGN samples
│   ├── example_games.pgn
│   └── short_example.pgn
│
├── training_data/                   ← data-location symlinks pointing at $PROJ
└── exp-outputs/                     ← historical SLURM logs from old runs

evaluation_results/, training_output/, hf_cache/, and the targets of the training_data/phase* symlinks all live under $PROJ (the large-storage area), so the repo itself stays small (~20 MB tracked).
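On a fresh checkout, the training_data/ symlinks have to be recreated to match your own $PROJ. The following is a minimal sketch using pathlib. The link names ("phase1", "phase2") and the target layout $PROJ/training_data/<name> are hypothetical stand-ins for the repo's actual phase* directories, and link_data_dirs is not a repo function.

```python
from pathlib import Path
from typing import Iterable, List


def link_data_dirs(repo_root: Path, proj: Path, names: Iterable[str]) -> List[Path]:
    """Create repo_root/training_data/<name> -> proj/training_data/<name> symlinks.

    Target directories are created on the large-storage side if missing;
    existing links or directories in the repo are left untouched.
    """
    link_dir = repo_root / "training_data"
    link_dir.mkdir(parents=True, exist_ok=True)
    created: List[Path] = []
    for name in names:
        target = proj / "training_data" / name
        target.mkdir(parents=True, exist_ok=True)
        link = link_dir / name
        if link.is_symlink() or link.exists():
            continue  # never clobber an existing link or real directory
        link.symlink_to(target, target_is_directory=True)
        created.append(link)
    return created
```

Re-running the helper is safe: the second call finds the links already present and creates nothing.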


Citation / acknowledgement

This is coursework from CSE 537, Spring 2025. If you build on it, please cite the original course project plus this reproduction:

Kuo, S-J. and Braimah, A. (2025). "Adaptive Chess Tutoring: Efficient LLM Reasoning
via Programmatic Supervision and Cognitive Distillation." CSE 537 final project,
Stony Brook University.

Kuo, S-J. (2026). "Reproduction of CSE 537 Chess-Tutor project on NuWulf HPC cluster."
Internal report. https://github.com/raykuo18/2025Spring_AI_Project
