
Mateces/c3


Mahjong AI — Riichi Mahjong (4-player) Model

Chinese documentation

A competitive Riichi Mahjong AI trained via offline reinforcement learning (Conservative Q-Learning) on expert-level game records.

Model

Architecture: 1D CNN with channel attention

  • 192 channels, 40 residual blocks
  • Based on the Mortal v4 observation encoder
  • Auxiliary heads for multi-task learning

Training method: CQL (Conservative Q-Learning) with auxiliary supervision

  • Primary: DQN action-value + CQL regularization + next-rank prediction
  • Auxiliary heads: score prediction, rank prediction (4-player), score-gap prediction
  • Curriculum: initial training on top ~250 players, then expanded to top ~750 players
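As a rough illustration of how a CQL regularizer can be combined with a DQN-style loss, the sketch below works on a single state in plain Python. The function names, the squared-error TD term, and the legal-action mask are assumptions made for illustration; scripts/train_main.py may compute these terms differently. The default weight of 5.0 mirrors min_q_weight from the hyperparameter table.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def cql_penalty(q_values, legal_mask, data_action):
    """CQL term: logsumexp over legal Q-values minus Q of the logged action."""
    legal_q = [q for q, ok in zip(q_values, legal_mask) if ok]
    return logsumexp(legal_q) - q_values[data_action]


def primary_loss(q_values, legal_mask, data_action, td_target, min_q_weight=5.0):
    """Squared TD error plus the weighted CQL penalty (hypothetical combination)."""
    td_error = q_values[data_action] - td_target
    return td_error ** 2 + min_q_weight * cql_penalty(q_values, legal_mask, data_action)


q = [1.0, 2.0, 3.0, 9.0]
mask = [True, True, True, False]  # the last action is illegal and is ignored
print(round(cql_penalty(q, mask, data_action=2), 4))
print(round(primary_loss(q, mask, data_action=2, td_target=2.5), 4))
```

The penalty is non-negative whenever the logged action is legal, so it only ever pushes Q-values of unseen actions down relative to the action the expert actually took.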

Training data:

  • ~1.38 million games from ~750 top-level 4-player Riichi Mahjong players
  • East+South (hanchan, 半荘) format, competitive lobby level

Hyperparameters

| Parameter | Value |
| --- | --- |
| conv_channels | 192 |
| num_blocks | 40 |
| batch_size | 256 |
| lr_peak | 1e-4 |
| lr_final | 1e-5 |
| warmup_steps | 200 |
| weight_decay | 0.1 |
| max_grad_norm | 1.0 |
| gamma | 1.0 |
| min_q_weight (CQL) | 5.0 |
| next_rank_weight | 0.2 |
| score_weight | 1.0 |
| rank_weight | 0.5 |
| gap_weight | 0.3 |
| DDP | 2× GPU |
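The table gives lr_peak, lr_final, and warmup_steps but not the shape of the decay. The sketch below assumes linear warmup followed by cosine decay, with max_steps taken from the --max-steps value in the training command; the function name and the cosine shape are illustrative assumptions, not a description of what train_main.py does.

```python
import math


def learning_rate(step, lr_peak=1e-4, lr_final=1e-5, warmup_steps=200, max_steps=200_000):
    """Linear warmup to lr_peak, then cosine decay to lr_final (assumed shape)."""
    if step < warmup_steps:
        return lr_peak * (step + 1) / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, max_steps - warmup_steps))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return lr_final + (lr_peak - lr_final) * cosine


for step in (0, 199, 200, 100_100, 200_000):
    print(step, f"{learning_rate(step):.2e}")
```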

Performance

Tested over 4000 games (random seating, East+South):

| Model | Avg Rank | 1st % | 2nd % | 3rd % | 4th % |
| --- | --- | --- | --- | --- | --- |
| v4 (baseline) | 2.419 | 27.3 | 26.2 | 23.7 | 22.8 |
| c3 (ours) | 2.492 | 25.5 | 24.5 | 25.2 | 24.8 |

c3 approaches but does not yet surpass the Mortal v4 baseline: it trails by 0.073 in average rank (2.492 vs. 2.419). Training is ongoing.
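To put the gap in context, the average rank can be recomputed from the placement percentages, along with a rough standard error. This sketch assumes each row reflects 4000 independent placements for that model, which the README does not state explicitly; the helper name is hypothetical. Because the percentages are rounded, the recomputed averages can differ from the reported ones in the last digit.

```python
import math


def rank_stats(placement_pct, games):
    """Return (average rank, standard error) from 1st..4th placement percentages."""
    probs = [p / 100.0 for p in placement_pct]
    mean = sum(rank * p for rank, p in enumerate(probs, start=1))
    second_moment = sum(rank * rank * p for rank, p in enumerate(probs, start=1))
    std_err = math.sqrt((second_moment - mean * mean) / games)
    return mean, std_err


results = {
    "v4 (baseline)": [27.3, 26.2, 23.7, 22.8],
    "c3 (ours)": [25.5, 24.5, 25.2, 24.8],
}
for name, pct in results.items():
    mean, std_err = rank_stats(pct, games=4000)
    print(f"{name}: avg rank {mean:.3f} +/- {std_err:.3f}")
```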

Files

weights/                    # Download from Releases
  model-c3-best.pth         # Main model checkpoint (132MB)
  grp-best.pth              # GRP (Game Result Predictor) network (2.2MB)
scripts/
  train_main.py             # Training script (CQL + auxiliary heads)
  dataloader.py             # Data loader (reads .mjson via libriichi)
  mortal_bot_server.py      # Inference bridge (connects model to game engine)
  verify_worker_http.sh     # Distributed verification worker
mahjong/                    # Game engine + AI controller (TypeScript)
  src/                      # Engine source (game logic, shanten, scoring, AI)
  scripts/verify.ts         # Verification orchestrator
  scripts/verify-worker.ts  # Per-worker game runner
  package.json
cf-verify/                  # Cloudflare Worker coordinator for distributed testing

Download weights

Download from Releases:

mkdir -p weights
curl -L -o weights/model-c3-best.pth https://github.com/lynkas/c3/releases/download/v1.0/model-c3-best.pth
curl -L -o weights/grp-best.pth https://github.com/lynkas/c3/releases/download/v1.0/grp-best.pth

Usage

Loading the model for inference

import torch
import sys
sys.path.insert(0, "path/to/mortal/mortal")
from model import Brain

# Load checkpoint
ckpt = torch.load("weights/model-c3-best.pth", map_location="cpu")
model = Brain(version=4, conv_channels=192, num_blocks=40)
model.load_state_dict(ckpt["mortal"])
model.eval()

Training

Requires a compiled build of Mortal's libriichi, a Python virtual environment with PyTorch, and game data in .mjson format.

# Single GPU
python scripts/train_main.py \
  --grp checkpoints/grp-best.pth \
  --train-glob "data/train/*.mjson" \
  --val-glob "data/val/*.mjson" \
  --save checkpoints/model.pth \
  --tensorboard runs/experiment \
  --device cuda \
  --conv-channels 192 --num-blocks 40 \
  --batch-size 256 \
  --lr-peak 1e-4 --lr-final 1e-5 \
  --warmup-steps 200 --max-steps 200000 \
  --save-every 400 --val-steps 50 --patience 100 \
  --weight-decay 0.1 --max-grad-norm 1.0 \
  --score-weight 1.0 --rank-weight 0.5 --gap-weight 0.3

# Multi-GPU (DDP)
CUDA_VISIBLE_DEVICES=0,1 torchrun --standalone --nproc_per_node=2 scripts/train_main.py \
  [same args as above]

Key arguments:

| Argument | Description |
| --- | --- |
| --grp | Path to GRP (Game Result Predictor) checkpoint |
| --train-glob | Glob pattern for training data files |
| --val-glob | Glob pattern for validation data files |
| --save | Output checkpoint path (also used for resume) |
| --patience | Early stopping patience (in validation cycles) |
| --score-weight | Weight for score prediction auxiliary loss |
| --rank-weight | Weight for rank prediction auxiliary loss |
| --gap-weight | Weight for gap prediction auxiliary loss |
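The --patience argument counts validation cycles without improvement. A minimal sketch of that kind of early stopping is shown below; the class name and the choice of validation loss as the monitored metric are assumptions, and the actual logic in train_main.py may differ.

```python
class EarlyStopping:
    """Stop when validation loss has not improved for `patience` validation cycles."""

    def __init__(self, patience):
        self.patience = patience
        self.best = float("inf")
        self.bad_cycles = 0

    def update(self, val_loss):
        """Record one validation result; return True if training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_cycles = 0  # improvement: a "-best" checkpoint would be saved here
        else:
            self.bad_cycles += 1
        return self.bad_cycles >= self.patience


stopper = EarlyStopping(patience=3)
for cycle, loss in enumerate([0.90, 0.85, 0.86, 0.87, 0.88, 0.80]):
    if stopper.update(loss):
        print(f"stopping after validation cycle {cycle}, best loss {stopper.best}")
        break
```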

To resume training, point --save at an existing checkpoint.

Distributed Verification System

The verification system runs model-vs-model games across multiple machines, coordinated by a Cloudflare Worker.

1. Deploy the coordinator

cd cf-verify
npm install
# Edit wrangler.toml with your account_id, KV namespace, and tokens
npx wrangler deploy

2. Create a job

curl -X POST "https://your-worker.workers.dev/job" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_ADMIN_TOKEN" \
  -d '{
    "total": 1000,
    "strategies": "mortal:model-a.pth,mortal:model-b.pth,mortal:model-c.pth,mortal:model-d.pth",
    "difficulties": "0,0,0,0",
    "end_round": 8,
    "shuffle_seats": true
  }'
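The same request can be assembled from Python when jobs are created by a script. The sketch below only builds the request object and does not send it; the field names come from the curl example, while the helper name and the four-strategy check are assumptions for illustration.

```python
import json
import urllib.request


def build_job_request(coordinator, admin_token, strategies, total=1000, end_round=8):
    """Build (but do not send) the POST /job request shown in the curl example."""
    if len(strategies) != 4:
        raise ValueError("a 4-player job needs exactly four strategies")
    payload = {
        "total": total,
        "strategies": ",".join(strategies),
        "difficulties": ",".join("0" for _ in strategies),
        "end_round": end_round,
        "shuffle_seats": True,
    }
    return urllib.request.Request(
        url=coordinator.rstrip("/") + "/job",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {admin_token}",
        },
        method="POST",
    )


request = build_job_request(
    "https://your-worker.workers.dev",
    "YOUR_ADMIN_TOKEN",
    ["mortal:model-a.pth", "mortal:model-b.pth", "mortal:model-c.pth", "mortal:model-d.pth"],
)
print(request.get_method(), request.full_url)
# To submit: urllib.request.urlopen(request)  (requires a deployed coordinator)
```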

3. Start workers

Each worker claims batches, runs games locally, and reports results back.

COORDINATOR="https://your-worker.workers.dev" \
TOKEN="YOUR_WORKER_TOKEN" \
WORKER_NAME="my-machine" \
WORKERS=2 \
MODELS_DIR="./weights" \
DEVICE_MAP="cuda:0,cuda:0,cuda:0,cuda:0" \
MAHJONG_DIR="/path/to/mahjong" \
MORTAL_PYTHON="/path/to/venv/bin/python3" \
MORTAL_SERVER="/path/to/mortal_bot_server.py" \
bash scripts/verify_worker_http.sh

Worker environment variables:

| Variable | Required | Description |
| --- | --- | --- |
| COORDINATOR | | Coordinator URL |
| TOKEN | | Worker auth token |
| WORKER_NAME | | Identifier shown on dashboard |
| WORKERS | | Parallel game workers (default: 4) |
| MODELS_DIR | | Directory containing model .pth files |
| DEVICE_MAP | | Comma-separated devices for each strategy (e.g. cuda:0,cuda:0,cuda:1,cuda:1) |
| MAHJONG_DIR | | Path to mahjong project (with scripts/verify.ts) |
| MORTAL_PYTHON | | Python interpreter with torch + libriichi |
| MORTAL_SERVER | | Path to mortal_bot_server.py |
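DEVICE_MAP lists one device per strategy, in the same order as the job's strategies string. The sketch below shows that positional pairing; the function name and the length check are assumptions, and the worker script may parse the variable differently.

```python
def parse_device_map(device_map, strategies):
    """Pair each comma-separated strategy with the device at the same position."""
    devices = [d.strip() for d in device_map.split(",")]
    names = [s.strip() for s in strategies.split(",")]
    if len(devices) != len(names):
        raise ValueError(f"{len(names)} strategies but {len(devices)} devices")
    return list(zip(names, devices))


pairs = parse_device_map(
    "cuda:0,cuda:0,cuda:1,cuda:1",
    "mortal:model-a.pth,mortal:model-b.pth,mortal:model-c.pth,mortal:model-d.pth",
)
for name, device in pairs:
    print(f"{name} -> {device}")
```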

4. Monitor progress

  • Dashboard: Visit the coordinator URL in a browser
  • API:
    curl "https://your-worker.workers.dev/status"     # Job progress + worker status
    curl "https://your-worker.workers.dev/aggregate"  # Overall rankings
    curl "https://your-worker.workers.dev/aggregate?worker=my-machine"  # Per-worker stats

5. Manage jobs

# Extend a job's target
curl -X PATCH "https://your-worker.workers.dev/job" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_ADMIN_TOKEN" \
  -d '{"job_id": "abc123", "total": 4000, "set_current": true}'

# List all jobs
curl https://your-worker.workers.dev/jobs

Auxiliary Heads

The model uses multi-task learning with three auxiliary prediction heads that share the backbone:

  1. ScoreHead — Predicts current 4-player scores (MSE loss)
  2. RankHead — Predicts rank distribution of all 4 players (Cross-entropy)
  3. GapHead — Predicts log-scale score gap to top/bottom player (Huber loss)

These heads push the backbone to encode game-state awareness (score positions, rank dynamics) that is hard to learn from the raw observation encoding through the Q-learning objective alone.
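The three loss types named above can be written out in a few lines of plain Python. This is a sketch only: the log-scale gap transform, the 1000-point scale, and the weighted-sum combination are assumptions, with the default weights taken from the c3 hyperparameters.

```python
import math


def mse(pred, target):
    """Mean squared error, as named for ScoreHead."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(logits, label):
    """Cross-entropy of one softmax distribution, as named for RankHead."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[label]


def huber(pred, target, delta=1.0):
    """Huber loss, as named for GapHead: quadratic near zero, linear in the tails."""
    err = abs(pred - target)
    return 0.5 * err * err if err <= delta else delta * (err - 0.5 * delta)


def log_gap(points):
    """Signed log-scale score gap (hypothetical target transform)."""
    return math.copysign(math.log1p(abs(points) / 1000.0), points)


def auxiliary_loss(score, rank, gap, score_weight=1.0, rank_weight=0.5, gap_weight=0.3):
    """Weighted sum of the three head losses using the c3 weights."""
    return score_weight * score + rank_weight * rank + gap_weight * gap


score_loss = mse([0.25, 0.30, 0.20, 0.25], [0.25, 0.35, 0.15, 0.25])
rank_loss = cross_entropy([2.0, 1.0, 0.0, -1.0], label=0)
gap_loss = huber(log_gap(6000), log_gap(8000))
print(auxiliary_loss(score_loss, rank_loss, gap_loss))
```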

Training Pipeline

The model is trained in stages (curriculum learning):

Stage 1: Base model (c)

Starting from Mortal v4 pre-trained weights, fine-tune with CQL on games from the top ~242 players:

python scripts/train_main.py \
  --grp checkpoints/grp-best.pth \
  --train-glob "data/top_players_242/*.mjson" \
  --val-glob "data/val/*.mjson" \
  --save checkpoints/model-c.pth \
  --lr-peak 3e-5 --lr-final 1e-6 \
  --weight-decay 0.2 --score-weight 1.0 \
  --rank-weight 0 --gap-weight 0

Stage 2: Expand data (c2)

From model-c, continue training on ~750 players:

cp checkpoints/model-c-best.pth checkpoints/model-c2.pth
python scripts/train_main.py \
  --save checkpoints/model-c2.pth \
  --train-glob "data/top_players_750/*.mjson" \
  --lr-peak 3e-5 --lr-final 1e-6 \
  --weight-decay 0.2 --score-weight 1.0 \
  --rank-weight 0 --gap-weight 0

Stage 3: Add auxiliary heads (c3)

From model-c2, enable the rank and gap prediction heads and raise the learning rate (peak 1e-4 instead of 3e-5):

cp checkpoints/model-c2-best.pth checkpoints/model-c3.pth
python scripts/train_main.py \
  --save checkpoints/model-c3.pth \
  --train-glob "data/top_players_750/*.mjson" \
  --lr-peak 1e-4 --lr-final 1e-5 \
  --weight-decay 0.1 \
  --score-weight 1.0 --rank-weight 0.5 --gap-weight 0.3

The c3-best checkpoint was reached after ~14,400 steps of stage 3 training.

Data format

Training data uses .mjson format (gzipped JSON lines), where each file contains one game in mjai event format. The dataloader (scripts/dataloader.py) handles parsing via libriichi.
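For a quick look inside a file without going through libriichi, gzipped JSON lines can be read with the standard library alone. The sketch below writes a tiny stand-in file first so that it runs without real game logs; the events are stripped down to their type field, and the helper name is hypothetical. It is meant for inspection only and is not a replacement for the dataloader.

```python
import gzip
import json
import os
import tempfile
from collections import Counter


def read_events(path):
    """Yield one decoded mjai event per non-empty line of a gzipped .mjson file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


# Write a tiny stand-in file so the example runs without real game logs.
sample = [{"type": "start_game"}, {"type": "start_kyoku"}, {"type": "end_kyoku"}, {"type": "end_game"}]
path = os.path.join(tempfile.mkdtemp(), "sample.mjson")
with gzip.open(path, "wt", encoding="utf-8") as handle:
    for event in sample:
        handle.write(json.dumps(event) + "\n")

counts = Counter(event["type"] for event in read_events(path))
print(dict(counts))
```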

Requirements

  • Python 3.10+
  • PyTorch 2.0+
  • libriichi (Rust-compiled Python extension)
  • Node.js + tsx (for verification scripts)

License

MIT

Model weights, training code, and verification tools are released under the MIT License.

Note: The runtime inference environment requires Mortal (AGPL-3.0) components (libriichi, observation encoder). This repository does not include Mortal source code.

Acknowledgments

Built on the Mortal framework by Equim.
