This repository contains a learning-focused LoRA fine-tuning setup for an English legal support assistant. The current default config is sized for local experimentation on a Mac with Qwen/Qwen2.5-0.5B-Instruct. Quantization is included in the codepath but disabled by default because bitsandbytes 4-bit QLoRA requires a CUDA environment.

The code:
- loads an instruction-tuned base model from Hugging Face
- applies LoRA adapters with PEFT
- formats chat-style JSONL data into the model's chat template
- fine-tunes with trl.SFTTrainer
- saves the LoRA adapter separately from the base model
- reloads the base model + adapter for inference
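
The chat-template step in the list above can be pictured with a small hand-written sketch. It assumes the ChatML-style layout that Qwen instruct models are generally understood to use (`<|im_start|>` and `<|im_end|>` markers); `render_chatml` is a hypothetical helper for illustration, not a function from this repository. In the real pipeline the tokenizer's own `apply_chat_template` should be treated as the source of truth.

```python
# Illustrative only: approximates a ChatML-style chat template by hand.
# The marker strings are an assumption about the model's template; real code
# should call tokenizer.apply_chat_template(messages, tokenize=False) instead.

def render_chatml(messages, add_generation_prompt=False):
    """Render a list of {"role", "content"} dicts as one ChatML-style string."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the answer.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


example = [
    {"role": "system", "content": "System instruction"},
    {"role": "user", "content": "User question"},
]
prompt = render_chatml(example, add_generation_prompt=True)
print(prompt)
```

For training, the assistant message is included in the rendered text; for inference, the generation prompt is appended instead so the model produces the assistant turn.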
qlora-project/
├── configs/
│   └── qlora.yaml
├── data/
│   └── processed/
│       └── train.jsonl
├── models/
│   ├── base/
│   └── lora/
├── outputs/
│   ├── checkpoints/
│   └── logs/
├── src/
│   ├── dataset.py
│   ├── inference.py
│   ├── model.py
│   ├── prompts.py
│   ├── train.py
│   └── utils.py
├── .gitignore
├── README.md
└── requirements.txt
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Each line in data/processed/train.jsonl must be a single JSON object with a messages array. The record is shown pretty-printed here; in the file itself it sits on one line:
{
  "messages": [
    {"role": "system", "content": "System instruction"},
    {"role": "user", "content": "User question"},
    {"role": "assistant", "content": "Assistant answer"}
  ]
}

The assistant responses should stay within general legal information, mention jurisdiction differences, and recommend consulting a licensed attorney for specific advice.
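
Before training, it can help to check that every record matches the format described above. The following is a minimal sketch using only the standard library; `validate_record` and `validate_file` are hypothetical helpers, and the rules about allowed roles and the final message being from the assistant are assumptions layered on top of the documented format.

```python
import json

# Assumption: only these three roles appear in the training data.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_record(line):
    """Return a list of problems found in one JSONL line (empty if none)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' array"]
    problems = []
    for index, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {index} is not an object")
            continue
        role = message.get("role")
        if not isinstance(role, str) or role not in ALLOWED_ROLES:
            problems.append(f"message {index} has unexpected role {role!r}")
        content = message.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {index} has empty or non-string content")
    last = messages[-1]
    # Assumption: supervised examples should end with the assistant's answer.
    if isinstance(last, dict) and last.get("role") != "assistant":
        problems.append("last message should come from the assistant")
    return problems


def validate_file(path):
    """Map 1-based line numbers to problems for each bad line in a JSONL file."""
    report = {}
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if line.strip():  # skip blank lines
                found = validate_record(line)
                if found:
                    report[number] = found
    return report
```

Calling `validate_file("data/processed/train.jsonl")` returns an empty dict when every line passes these checks.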
python -m src.train --config configs/qlora.yaml

Notes:
- the default config uses Qwen/Qwen2.5-0.5B-Instruct
- save_strategy: "no" avoids the shared-tensor checkpoint error encountered during local training
- the LoRA adapter is saved to models/lora/legal-support-qwen-lora
python -m src.inference \
--config configs/qlora.yaml \
--question "My employer fired me without warning. What general legal issues should I review?"

configs/qlora.yaml controls:
- model name and dtype
- optional quantization settings
- LoRA hyperparameters
- training arguments
- inference generation settings
To move from local LoRA to true QLoRA later:
- switch to a CUDA environment
- set quantization.enabled: true
- update model.torch_dtype and training flags for the target hardware
- replace the small base model with your target Qwen checkpoint
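
The config changes listed above can be sketched as a small transformation on the loaded config dictionary. This is an illustration under assumptions: `quantization.enabled` and `model.torch_dtype` come from this README, while the `model.name` key, the `bfloat16` default, and the `to_qlora_config` helper are hypothetical and may not match the actual layout of configs/qlora.yaml.

```python
import copy


def to_qlora_config(local_config, target_model, torch_dtype="bfloat16"):
    """Return a copy of the config with the QLoRA-related settings switched on.

    The CUDA requirement is not checked here; this only edits the config.
    """
    config = copy.deepcopy(local_config)  # leave the local config untouched
    config.setdefault("quantization", {})["enabled"] = True
    model = config.setdefault("model", {})
    model["torch_dtype"] = torch_dtype  # assumed default; pick per hardware
    model["name"] = target_model  # hypothetical key name
    return config


# Hypothetical local config; the values are placeholders for illustration.
local = {
    "model": {"name": "Qwen/Qwen2.5-0.5B-Instruct", "torch_dtype": "float32"},
    "quantization": {"enabled": False},
}
qlora = to_qlora_config(local, "your-target-qwen-checkpoint")
print(qlora["quantization"], qlora["model"])
```

Working on a deep copy keeps the local Mac config usable alongside the CUDA variant.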