Experimental toolkit for routing prompts to talent-oriented language-model specialists.
The value of this project is in the learning approach, not in claiming a new learning algorithm.
Most techniques used here are already known in the AI community: probes, talent vectors, distillation, LoRA/adapters, quantization, pruning, early exit, routing, and guarded generation. The interesting part is putting them together in one transparent local workflow so you can see what each method does, where it helps, and where it fails.
In that sense, this repository is closer to an experimental lab than to an algorithmic breakthrough. It is useful for learning how model specialization behaves in practice: measuring trade-offs, comparing quality/cost, exposing routing decisions, and understanding why a small or pruned model may need guardrails.
The project explores:
- hidden-state talent probes
- talent-vector scoring
- model benchmarking with Pareto reports
- knowledge distillation
- LoRA/adapters
- dynamic quantization
- layer pruning and early-exit probes
- automatic routing to specialist models
- a local web UI for transparent prompt routing
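
As a rough illustration of how talent-vector scoring can drive routing, the sketch below scores a prompt embedding against one vector per talent using cosine similarity and routes to the best match. Everything in it is hypothetical: the `TALENT_VECTORS` values, the `route` function, the `threshold` parameter, and the three-dimensional embeddings are invented for illustration and are not taken from this repository, which works with model hidden states.

```python
import math

# Hypothetical talent vectors; a real setup would derive these from hidden states.
TALENT_VECTORS = {
    "coding": [0.9, 0.1, 0.0],
    "math": [0.1, 0.9, 0.1],
    "writing": [0.0, 0.2, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route(embedding, threshold=0.5):
    """Return (talent, scores); talent is 'general' when no score reaches threshold."""
    scores = {name: cosine(embedding, vec) for name, vec in TALENT_VECTORS.items()}
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        best = "general"
    return best, scores


# Example: an embedding that points mostly along the hypothetical "coding" axis.
talent, scores = route([0.8, 0.2, 0.1])
print(talent, {name: round(value, 3) for name, value in scores.items()})
```

Returning the full score dictionary alongside the chosen talent keeps the routing decision inspectable, which matches the transparency goal of the UI.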

Create a virtual environment and install the dependencies (Windows paths shown):
python -m venv .venv
.\.venv\Scripts\python.exe -m pip install -r requirements.txt

Large downloaded models, caches, virtual environments, and generated specialist checkpoints are intentionally ignored by Git.

Generate a basic analysis report:
.\.venv\Scripts\python.exe .\examples\generate_report.py --model distilgpt2

Run the optimization dashboard builder:
.\.venv\Scripts\python.exe .\examples\build_optimization_dashboard.py

Serve the local routing UI:
.\.venv\Scripts\python.exe .\examples\serve_talent_router.py --local-files-only

Then open:
http://127.0.0.1:8765
| Script | Purpose | Example command | HTML / JSON output |
|---|---|---|---|
| `examples/generate_report.py` | Attention, hidden-state and talent report for one model. | `.\.venv\Scripts\python.exe .\examples\generate_report.py --model distilgpt2 --local-files-only` | `reports/distilgpt2_report.html`, `reports/distilgpt2_report.json` |
| `examples/talent_vectors_usage.py` | Console example for talent vectors and supervised probes. | `.\.venv\Scripts\python.exe .\examples\talent_vectors_usage.py --local-files-only` | No HTML/JSON report; prints scores and metrics to the console. |
| `examples/benchmark_models.py` | Model quality/cost Pareto benchmark. | `.\.venv\Scripts\python.exe .\examples\benchmark_models.py --models distilgpt2 .\models\talent_distilled_distilgpt2 --local-files-only` | `reports/benchmark_pareto.html`, `reports/benchmark_pareto.json` |
| `examples/distill_student.py` | Talent-oriented knowledge distillation. | `.\.venv\Scripts\python.exe .\examples\distill_student.py --teacher gpt2 --student distilgpt2 --model-output-dir .\models\talent_distilled_distilgpt2 --local-files-only` | `reports/distillation_report.html`, `reports/distillation_report.json` |
| `examples/train_lora_adapters.py` | Train one LoRA adapter per talent. | `.\.venv\Scripts\python.exe .\examples\train_lora_adapters.py --base-model distilgpt2 --local-files-only` | `reports/lora_adapters_report.html`, `reports/lora_adapters_report.json` |
| `examples/use_lora_adapter.py` | Load a saved LoRA adapter and generate a sample. | `.\.venv\Scripts\python.exe .\examples\use_lora_adapter.py --base-model distilgpt2 --adapter .\models\lora_adapters\distilgpt2\coding.pt --local-files-only` | No HTML/JSON report; prints one generation to the console. |
| `examples/quantization_benchmark.py` | fp32 vs dynamic int8 benchmark. | `.\.venv\Scripts\python.exe .\examples\quantization_benchmark.py --models distilgpt2 .\models\talent_distilled_distilgpt2 --local-files-only` | `reports/quantization_report.html`, `reports/quantization_report.json` |
| `examples/layer_pruning_early_exit.py` | Layer pruning and early-exit probe analysis. | `.\.venv\Scripts\python.exe .\examples\layer_pruning_early_exit.py --model .\models\talent_distilled_distilgpt2 --local-files-only` | `reports/layer_pruning_report.html`, `reports/layer_pruning_report.json` |
| `examples/train_talent_models.py` | Small pruned specialist checkpoints from a local base model. | `.\.venv\Scripts\python.exe .\examples\train_talent_models.py --base-model .\models\talent_distilled_distilgpt2 --local-files-only` | `reports/talent_specialists_report.html`, `reports/talent_specialists_report.json` |
| `examples/minify_distilgpt2_specialists.py` | distilgpt2 minified specialists using pruning plus light alignment. | `.\.venv\Scripts\python.exe .\examples\minify_distilgpt2_specialists.py --teacher-model distilgpt2 --student-seed-model distilgpt2 --local-files-only` | `reports/distilgpt2_minified_specialists_report.html`, `reports/distilgpt2_minified_specialists_report.json` |
| `examples/specialize_recent_llm.py` | Recent instruction LLM pruning / specialist export. | `.\.venv\Scripts\python.exe .\examples\specialize_recent_llm.py --base-model Qwen/Qwen3-1.7B --keep-layers 24 --epochs 0 --local-files-only` | `reports/recent_llm_specialists_report.html`, `reports/recent_llm_specialists_report.json` |
| `examples/build_optimization_dashboard.py` | Aggregate existing reports into one dashboard. | `.\.venv\Scripts\python.exe .\examples\build_optimization_dashboard.py` | `reports/optimization_suite.html`, `reports/optimization_suite.json` |
| `examples/serve_talent_router.py` | Local web UI and routing API. | `.\.venv\Scripts\python.exe .\examples\serve_talent_router.py --local-files-only` | No report; serves http://127.0.0.1:8765. |
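
To make the idea behind a quality/cost Pareto report concrete, here is a minimal sketch of selecting the non-dominated models from a list of benchmark results. It is not the code used by `examples/benchmark_models.py`; the `pareto_front` function, the tuple layout, and all numbers are hypothetical placeholders rather than measured results.

```python
def pareto_front(results):
    """Return the names of entries not dominated by any other entry.

    Each entry is (name, quality, cost): higher quality and lower cost are better.
    An entry is dominated if another is at least as good on both axes
    and strictly better on at least one.
    """
    front = []
    for name, quality, cost in results:
        dominated = any(
            q >= quality and c <= cost and (q > quality or c < cost)
            for _, q, c in results
        )
        if not dominated:
            front.append(name)
    return front


# Made-up numbers for illustration only; not measured results.
results = [
    ("model_a", 0.80, 120.0),
    ("model_b", 0.75, 60.0),
    ("model_c", 0.70, 90.0),  # lower quality and higher cost than model_b
    ("model_d", 0.60, 30.0),
]
print(pareto_front(results))
```

A model on the front is not necessarily the best choice; it only means no other benchmarked model beats it on both quality and cost at once.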
This is a research/learning project, not a production LLM serving stack. The route is shown transparently in the UI, and guarded responses are used for some code cases where small local models are not reliable enough.
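
One simple form a guard for code generations could take is sketched below: the generated text is accepted only if it parses as Python, and a fixed fallback is returned otherwise. This is an assumption about what a guard might look like, not a description of the guard in this repository; `guarded_code_response` and `FALLBACK` are hypothetical names.

```python
import ast

# Hypothetical fallback text returned when the generation is rejected.
FALLBACK = "# The local model could not produce valid Python for this request."


def guarded_code_response(generated, fallback=FALLBACK):
    """Return (text, guarded); guarded is True when the fallback replaced the output."""
    if not generated.strip():
        return fallback, True
    try:
        # Syntax check only: this does not run the code or verify that it is correct.
        ast.parse(generated)
    except (SyntaxError, ValueError):
        return fallback, True
    return generated, False


print(guarded_code_response("def add(a, b):\n    return a + b"))
print(guarded_code_response("def add(a, b:\n    return a + b"))
```

A syntax check catches only the most obvious failures, but returning the `guarded` flag lets a UI show when the fallback was used instead of the model output.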