Skip to content

Releases: SahilKumar75/TuneOS

v0.2.0 — Modal Cloud GPU, DPO Training, Phase 4 Feature Complete

07 Jun 04:47

Choose a tag to compare

What's New in v0.2.0

This release brings TuneOS to feature-complete Phase 4, adding cloud GPU training, DPO preference learning, advanced trainer capabilities, and a completely refreshed UI.


🚀 New Features

Cloud GPU Training (Modal.com)

  • Jobs can now run on a free Modal T4 GPU instead of the local device — ideal when no local GPU is available
  • New Compute backend selector in Step 4 (Local GPU / Modal / HF Spaces)
  • workers/modal_runner.py serialises the dataset, runs the full training pipeline remotely, and streams adapter + eval metrics back to local disk
  • Enable by setting MODAL_TOKEN_ID + MODAL_TOKEN_SECRET; modal is optional (poetry install --with modal)

DPO Preference Training (P4-C)

  • Full DPO recipe wired end-to-end through trainer, API, and UI
  • Step 4 now exposes DPO hyperparameters; dataset step validates preference columns before build
  • TRL DPOTrainer integration with configurable beta and reference model

Advanced Trainer (P4-A / P4-B)

  • Prompt template registry — Alpaca, ChatML, Llama-3, Phi-3, and raw formats; sample packing for throughput
  • Configurable report_to for HF experiment-tracker integration
  • Flash Attention 2 / SDPA and RoPE scaling plumbed into model loading
  • LoRA init strategy exposed via init_lora_weights
  • Extended architecture support: Qwen3, Phi-4, Cohere, OLMo, Mixtral, MPT, StarCoder2, GPT-BigCode
  • use_4bit now optional instead of forced; all checkpoints use safe_serialization=True

Distillation, FSDP & INT8 Export (P4-F)

  • Knowledge distillation from a teacher model
  • FSDP (Fully Sharded Data Parallel) for multi-GPU training
  • Hyperparameter sweep support
  • INT8 export via bitsandbytes

Richer Eval Metrics & Live Streaming (P4-D / P4-E)

  • ROUGE-1 and BLEU computed from held-out evaluation sample after training
  • Live Modal training log streaming in Step 5
  • Eval metrics persisted to SQLite and surfaced in Step 6 Results

Multi-run Compare & Benchmark (P4-F)

  • New compare page with overlaid loss curves across runs
  • lm-eval benchmark wrapper for standardised model evaluation

API Hardening (P4-A)

  • LRU inference model cache (maxsize=3) — avoids GB-scale model reloads
  • GET /api/jobs pagination (default 50, max 500)
  • GET /api/gpu now reports device_count, vram_total_gb, vram_free_gb, cuda_version
  • Eval metrics fall back to durable SQLite when Redis copy has expired

PostgreSQL Experiment Backend

  • Set EXPERIMENTS_DB_URL to a postgresql:// DSN to share experiment data across workers
  • All upserts migrated to portable ON CONFLICT … DO UPDATE (SQLite 3.24+ and PostgreSQL compatible)

UI Polish

  • Chat panel consolidated into a single, cleaner component
  • Step 5/6 surfaces all eval metrics + Modal badge
  • Register-to-model-registry card on Results step
  • Hyperparameter comparison table (Technique, LR, LoRA r, Batch columns)

🔧 CI Improvements

  • Monolithic CI split into dedicated path-scoped jobs (lint, test-core, test-ui, docker, docs)
  • Docker build significantly faster via BuildKit cache injection for pip/poetry downloads
  • mypy static typing added to lint job

📖 Docs

  • Quickstart updated for Celery fallback, HF_TOKEN, and Modal cloud GPU
  • docs/api.md documents GET /api/health/workers and compute_backend
  • docs/DEPLOY.md gains full .env reference table
  • docs/supported-models.md reflects auto target_modules detection

🔄 Changed

  • State data models now use pydantic.BaseModel (Reflex rx.Base removed in newer versions)
  • Minimum trl version raised to >=0.12.0

Full Changelog: v0.1.0...v0.2.0