Lomesh2000/text2SQL

# T2S

## Folder Structure

```
t2s/
├── configs/
│   └── config.py           # ModelConfig, TrainConfig
├── data/
│   ├── tokenizer.py        # encode / decode / special tokens
│   ├── dataset.py          # AlignedSQLDataset, BinDataset, CurriculumSampler
│   ├── preprocess.py       # clean_txt_file, split_and_tokenize
│   └── loader.py           # load_datasets (builds/caches .pt files)
├── model/
│   └── gpt.py              # LayerNorm, Attention, MLP, Block, GPT, masked_sql_loss
├── training/
│   ├── trainer.py          # train_stage, get_lr, evaluate
│   └── checkpoint.py       # save_checkpoint, load_checkpoint
├── inference/
│   └── generate.py         # generate_sql, extract_sql, load_best_model
├── utils/
│   └── plot.py             # training curve plots
├── scripts/
│   ├── train.py            # full curriculum training entrypoint
│   ├── finetune.py         # fine-tune from checkpoint
│   └── evaluate_spider.py  # Spider benchmark evaluation
└── README.md
```

## Usage

### 1. Clean raw data
```python
from data.preprocess import clean_all
clean_all()
```

### 2. Build datasets
```python
from data.loader import load_datasets
train_datasets, val_datasets = load_datasets()
```

### 3. Train from scratch
```bash
python scripts/train.py
```

### 4. Fine-tune from checkpoint
```bash
python scripts/finetune.py --checkpoint checkpoints/best_moderate.pt --steps 3000 --lr 6e-5
```

### 5. Evaluate on Spider
```bash
python scripts/evaluate_spider.py --checkpoint checkpoints/best_highly_complex.pt
```
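
The fine-tuning command in step 4 passes three flags: `--checkpoint`, `--steps`, and `--lr`. This README does not show how `scripts/finetune.py` reads them, so the following is only a minimal sketch of how such flags could be parsed with the standard-library `argparse` module. The function name `build_parser`, the argument types, the help strings, and the choice to make every flag required are assumptions for illustration, not the repository's actual code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the flags shown in step 4 (not the repo's code)."""
    parser = argparse.ArgumentParser(
        description="Fine-tune from a checkpoint (illustrative sketch)"
    )
    # Flag names are taken from the README command; types are assumed.
    parser.add_argument("--checkpoint", type=str, required=True,
                        help="path to the checkpoint to resume from")
    parser.add_argument("--steps", type=int, required=True,
                        help="number of fine-tuning steps")
    parser.add_argument("--lr", type=float, required=True,
                        help="learning rate used during fine-tuning")
    return parser


# Parse the exact arguments from the README command instead of sys.argv,
# so the sketch runs on its own without the repository.
args = build_parser().parse_args([
    "--checkpoint", "checkpoints/best_moderate.pt",
    "--steps", "3000",
    "--lr", "6e-5",
])
print(args.checkpoint, args.steps, args.lr)
```

Declaring `--steps` as `int` and `--lr` as `float` means a malformed value such as `--steps many` is rejected at startup rather than partway through a training run.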
