The single entry-point repo for learning MinT (Mind Lab Toolkit), from first API call to advanced RL training.
Important: All experiments run against an already deployed MinT server. This repo does not start MinT backend services locally. You only need a valid server endpoint and an API key.
| # | Demo | Track | Reward Source / Shape | Script |
|---|---|---|---|---|
| 1 | RL-1 Verifiable Math | RL | Deterministic verifier | demos/rl/adapters/verifiable_math.py |
| 2 | RL-2 Preference Chat | RL | Pairwise/judge preference | demos/rl/adapters/preference_chat.py |
| 3 | RL-3 Environment Tool Use | RL | Code execution feedback | demos/rl/adapters/environment_tooluse.py |
| 4 | Sampling Log | Sampling | Train then inspect model responses | quickstart/sampling_log.py |
| 5 | Embodied-1 OpenPI FAST SDK | Embodied | MinT-only mintx OpenPI client over 3 camera images + state + action-token supervision | demos/embodied/openpi_vla_sdk.py |
| Demo | Track | Why it exists | Script |
|---|---|---|---|
| OpenPI FAST HTTP | Embodied | Shows the raw wire protocol directly for debugging and request-shape reference | demos/embodied/openpi_vla_http.py |
| # | Demo | Track | Description | Status |
|---|---|---|---|---|
| 6 | VLM-1 Vision QA | VLM | Image + question -> grounded answer | Planned (M2) |
| 7 | VLM-2 Vision Instruction | VLM | Image + task -> action/decision | Planned (M2) |
Requirements: Python >= 3.11, a MinT API key
pip install git+https://github.com/MindLab-Research/mindlab-toolkit.git python-dotenv matplotlib numpy
Create .env in the repo root:
MINT_API_KEY=sk-your-api-key-here
Use the MinT endpoint that matches your region:
- Mainland China: https://mint-cn.macaron.xin/
- Outside Mainland China: https://mint.macaron.xin/
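As a rough illustration of the setup above, the sketch below reads MINT_API_KEY from a .env file using only the standard library and pairs it with one of the two endpoints. The helper names (read_env_file, resolve_settings) and the region labels "cn" and "global" are assumptions made for this example, not part of MinT; the repo's own scripts may load .env differently, for example through python-dotenv.

```python
import os
from pathlib import Path

# Endpoints copied from the list above. The keys "cn" and "global" are
# illustrative labels chosen for this sketch, not MinT identifiers.
MINT_ENDPOINTS = {
    "cn": "https://mint-cn.macaron.xin/",
    "global": "https://mint.macaron.xin/",
}


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_settings(env_path=".env", region="global"):
    """Return (endpoint, api_key); a real environment variable wins over .env."""
    file_values = read_env_file(env_path) if Path(env_path).exists() else {}
    api_key = os.environ.get("MINT_API_KEY") or file_values.get("MINT_API_KEY")
    if not api_key:
        raise RuntimeError("MINT_API_KEY is not set in the environment or .env")
    return MINT_ENDPOINTS[region], api_key
```

Calling resolve_settings(region="cn") from the repo root would return the Mainland China endpoint together with the key found in .env.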
- Use SFT when you already know what the model should say or do and you have labeled target outputs.
- Use RL when you do not have one fixed target answer but you can score the model's behavior with a reward, verifier, test suite, or environment feedback.
- If you have both, you can combine them. The common pattern is SFT for the basic behavior, then RL for optimization, but that is not a required order for every task.
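To make the "score the behavior with a verifier" case concrete, here is a minimal sketch of a deterministic exact-match reward of the kind a math task could use. The function names and the rule "take the last number in the completion" are assumptions for illustration; they are not taken from demos/rl/adapters/verifiable_math.py.

```python
import re


def extract_final_number(text):
    """Return the last number that appears in the text, or None."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text)
    return matches[-1] if matches else None


def exact_match_reward(completion, expected):
    """Return 1.0 if the completion's last number equals the expected answer, else 0.0."""
    answer = extract_final_number(completion)
    if answer is None:
        return 0.0
    return 1.0 if float(answer) == float(expected) else 0.0
```

A reward like this needs no labeled target text, only a way to check the final answer, which is the situation where RL fits better than SFT.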
MinT supports SFT directly.
The standard SFT path is:
- forward_backward(..., loss_fn="cross_entropy")
- optim_step(...)
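The two-call SFT path can be sketched as a loop that runs one forward_backward and one optim_step per batch. Only the method names and loss_fn="cross_entropy" come from the text above. FakeTrainingClient is a hypothetical in-memory stand-in so the sketch runs without a server, and the optim_step argument (a bare learning rate) and the returned dictionary are assumptions, not the real MinT signatures.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTrainingClient:
    """Hypothetical stand-in that only records calls; not the MinT client."""
    calls: list = field(default_factory=list)

    def forward_backward(self, batch, loss_fn):
        # A real client would compute the loss and accumulate gradients remotely.
        self.calls.append(("forward_backward", loss_fn, len(batch)))
        return {"loss": 0.0}

    def optim_step(self, learning_rate):
        # A real client would apply the accumulated gradients here.
        self.calls.append(("optim_step", learning_rate))


def run_sft(client, batches, learning_rate=1e-4):
    """Run forward_backward then optim_step for each batch; return the losses."""
    losses = []
    for batch in batches:
        result = client.forward_backward(batch, loss_fn="cross_entropy")
        client.optim_step(learning_rate)
        losses.append(result["loss"])
    return losses
```

See quickstart/quickstart.py for the actual client construction and data format.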
Choose by your network path:
- Mainland China -> https://mint-cn.macaron.xin/
- Outside Mainland China -> https://mint.macaron.xin/
If you are unsure, try the one that matches your region first. The practical goal is lower latency and stable connectivity.
MINT_API_KEY is currently issued by the Mind Lab team.
To request access:
- go to https://macaron.im/mindlab
- use Schedule a Demo
- or email contact@mindlab.ltd
Run the quickstart (SFT then RL in one script):
python quickstart/quickstart.py
Or open the interactive notebook:
jupyter notebook quickstart/mint_quickstart.ipynb
Or run a focused quickstart recipe:
python quickstart/custom_reward.py
python quickstart/custom_loss.py
Run an individual demo:
python demos/rl/adapters/verifiable_math.py # RL-1: math with exact-match reward
python demos/rl/adapters/preference_chat.py # RL-2: chat with helpfulness proxy
python demos/rl/adapters/environment_tooluse.py # RL-3: code gen with execution reward
python demos/embodied/openpi_vla_sdk.py # Embodied-1: OpenPI via mintx / mint.mint
python demos/embodied/openpi_vla_http.py # Reference: raw OpenPI FAST HTTP wire shape
All demos are configurable via environment variables. See demos/rl/README.md for details.
If you want a full checkpoint lifecycle:
python advanced/checkpoint.py save --name my-ckpt
python advanced/checkpoint.py download tinker://<run-id>/weights/<ckpt-name> -o ./ckpts
python advanced/checkpoint.py upload ./ckpts/<archive>.tar.gz
python advanced/checkpoint.py resume tinker://<run-id>/weights/<ckpt-name> --with-optimizer --steps 3
See advanced/README.md for the full command matrix, the optimizer-preserving resume shape (create_lora_training_client(...) + load_state_with_optimizer(...)), and guardrails (sampler_weights vs weights).
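The checkpoint paths in the commands above follow the shape tinker://<run-id>/weights/<ckpt-name>. A small parser makes the sampler_weights vs weights distinction explicit before a resume is attempted. This is a sketch under two assumptions: that sampler checkpoints use the same layout with a sampler_weights segment, and that resuming training requires a weights checkpoint. advanced/README.md is the authority on both. CheckpointRef, parse_checkpoint_uri, and require_resumable are hypothetical names.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class CheckpointRef(NamedTuple):
    run_id: str
    kind: str  # "weights" or "sampler_weights"
    name: str


def parse_checkpoint_uri(uri):
    """Split tinker://<run-id>/<kind>/<ckpt-name> into its three parts."""
    parsed = urlparse(uri)
    parts = parsed.path.strip("/").split("/")
    if parsed.scheme != "tinker" or not parsed.netloc or len(parts) != 2:
        raise ValueError(f"expected tinker://<run-id>/<kind>/<ckpt-name>, got {uri!r}")
    kind, name = parts
    if kind not in ("weights", "sampler_weights"):
        raise ValueError(f"unknown checkpoint kind {kind!r}")
    return CheckpointRef(parsed.netloc, kind, name)


def require_resumable(ref):
    """Assumed guardrail: only "weights" checkpoints can resume training."""
    if ref.kind != "weights":
        raise ValueError(f"cannot resume training from a {ref.kind!r} checkpoint")
    return ref
```

Failing early on the path shape gives a clearer error than a rejected request on the server side.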
If you want a focused end-to-end check for session-level Seq-MIS wiring:
python advanced/validate_mis_rollout_correction.py --base-model Qwen/Qwen3-30B-A3B-Instruct-2507
See docs/mis_rollout_correction.md for prerequisites, env vars, expected output, and failure modes.
Monitor queue position and estimated wait time for pending sample requests:
python advanced/queue_status.py
Uses the low-level AsyncTinker client with backpressure headers to read queue fields from 408 responses.
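The general pattern of treating a 408 as "still queued" and reading queue fields from it can be sketched as a polling loop. Everything specific here is hypothetical: the header names x-queue-position and x-estimated-wait-seconds, the (status, headers, body) tuple returned by fetch, and the function name are placeholders for illustration. The real field names and client calls are in advanced/queue_status.py.

```python
import time


def poll_until_ready(fetch, max_attempts=5, sleep=time.sleep):
    """Call fetch() until it stops returning 408, recording queue info each time.

    fetch is any callable returning (status_code, headers, body).
    Returns (body, history), where history lists (position, wait_seconds).
    """
    history = []
    for _ in range(max_attempts):
        status, headers, body = fetch()
        if status != 408:
            return body, history
        # Hypothetical header names; the real ones may differ.
        position = int(headers.get("x-queue-position", -1))
        wait_s = float(headers.get("x-estimated-wait-seconds", 1.0))
        history.append((position, wait_s))
        sleep(min(wait_s, 30.0))  # cap the pause so one bad estimate cannot stall the loop
    raise TimeoutError(f"request still queued after {max_attempts} attempts")
```

Passing sleep as a parameter keeps the loop easy to test without real delays.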
mint-quickstart/
  .env.example # Template for API key configuration
  quickstart/
    quickstart.py # SFT -> RL in one script
    custom_reward.py # Client-side reward shaping + importance_sampling
    custom_loss.py # Pairwise preference training via forward_backward_custom
    sampling_log.py # Train then inspect model responses
    mint_quickstart.ipynb # Interactive notebook version
  demos/
    rl/ # 3 RL demos (available)
      rl_core.py # Shared GRPO training loop
      adapters/
        verifiable_math.py
        preference_chat.py
        environment_tooluse.py
    vlm/ # 2 VLM demos (coming soon)
    embodied/ # primary SDK demo + low-level HTTP reference
  advanced/ # Checkpoint workflows, MIS validation, queue status
  docs/
    roadmap.md # 6-demo roadmap with status tags
    troubleshooting.md # Common issues and fixes
    migration-from-minT-demo.md
    experiments/ # Validation reports for quickstart flows
  .pi/
    skills/ # Project-local pi skills for API, debugging, and issue reporting
  mint-skill/ # AI coding agent migration skill
If you have existing code using import tinker, the lowest-friction MinT migration is:
import mint as tinker
Then point the Tinker-style client surface at MinT:
TINKER_BASE_URL=<your-region-endpoint>
TINKER_API_KEY=<your-mint-api-key>
Use the MinT endpoint that matches your region:
- Mainland China: https://mint-cn.macaron.xin/
- Outside Mainland China: https://mint.macaron.xin/
Why this is the recommended path:
- raw upstream import tinker still validates API keys with the tml- prefix
- MinT API keys start with sk-
- import mint as tinker keeps the Tinker-style code shape while enabling MinT compatibility patches
If you must keep the exact import tinker statement, import mint earlier in the same process before constructing Tinker clients.
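A small pre-flight check can catch the key-prefix mismatch described above before any client is constructed. The sketch below classifies a key by its prefix and builds the two TINKER_* variables. The prefixes and variable names come from this section, while the helper names classify_api_key and tinker_env_for_mint are hypothetical.

```python
def classify_api_key(key):
    """Return "mint" for sk- keys, "tinker" for tml- keys, otherwise "unknown"."""
    if key.startswith("sk-"):
        return "mint"
    if key.startswith("tml-"):
        return "tinker"
    return "unknown"


def tinker_env_for_mint(api_key, base_url):
    """Build the TINKER_* variables that point Tinker-style code at MinT."""
    if classify_api_key(api_key) != "mint":
        raise ValueError("expected a MinT API key starting with 'sk-'")
    return {"TINKER_BASE_URL": base_url, "TINKER_API_KEY": api_key}


# Typical use, run before `import mint as tinker`:
#   import os
#   os.environ.update(tinker_env_for_mint(key, "https://mint.macaron.xin/"))
```

Setting the variables in the process environment is equivalent to exporting them in the shell or placing them in .env, as long as it happens before the client is created.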
- Roadmap — all 6 demos with availability status
- Troubleshooting — common issues and solutions
- Migration Guide — moving from old MinT-demo repo
- Quickstart Guide — first run plus focused custom reward / custom loss recipes
- RL Demos — detailed docs for the 3 available RL demos
- Embodied Demos — primary OpenPI SDK example plus low-level HTTP reference
- Advanced — checkpoint workflows and MIS validation entry points
- MIS Rollout Correction — targeted Seq-MIS validation flow and troubleshooting
- Experiment Report — quickstart upload-download-resume validation template/results
- Pi Skills — project-local pi skills for API, debugging, and issue reporting
- Migration Skill — AI agent skill for migrating from verl/TRL/OpenRLHF
- 中文 README — Chinese version of this document