Mahjong AI — Riichi Mahjong (4-player) Model

A competitive Riichi Mahjong AI trained via offline reinforcement learning (Conservative Q-Learning) on expert-level game records.

Model

Architecture: 1D CNN with channel attention

192 channels, 40 residual blocks
Based on the Mortal v4 observation encoder
Auxiliary heads for multi-task learning
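To make "channel attention" concrete, here is a minimal pure-Python sketch of one attention step on a 1D feature map of shape [channels][length]. It assumes a CBAM-style design: average-pooled and max-pooled channel descriptors go through a shared bottleneck MLP, and each channel is rescaled by a sigmoid gate. The exact layer in Mortal's model.py may differ in biases, activation, or reduction ratio, and the helper names here are hypothetical.

```python
import math


def _matvec(weights, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def shared_mlp(vec, w1, w2):
    """Bottleneck MLP C -> C/ratio -> C with a ReLU in between (no biases in this sketch)."""
    hidden = [max(0.0, h) for h in _matvec(w1, vec)]
    return _matvec(w2, hidden)


def channel_attention(x, w1, w2):
    """Rescale each channel of x ([C][L]) by sigmoid(mlp(avg_pool) + mlp(max_pool))."""
    avg_desc = [sum(ch) / len(ch) for ch in x]  # squeeze along the length axis
    max_desc = [max(ch) for ch in x]
    logits = [a + m for a, m in zip(shared_mlp(avg_desc, w1, w2), shared_mlp(max_desc, w1, w2))]
    gates = [1.0 / (1.0 + math.exp(-z)) for z in logits]  # one gate in (0, 1) per channel
    return [[g * v for v in ch] for g, ch in zip(gates, x)]


# Two channels, length 3, bottleneck of 1 hidden unit
x = [[1.0, 2.0, 3.0], [-1.0, 0.0, 1.0]]
out = channel_attention(x, w1=[[1.0, 0.0]], w2=[[1.0], [-1.0]])
```

In a residual block this gate is applied to the convolution output before the skip connection is added, so the network can amplify or suppress whole feature channels based on global context.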

Training method: CQL (Conservative Q-Learning) with auxiliary supervision

Primary: DQN action-value + CQL regularization + next-rank prediction
Auxiliary heads: score prediction, rank prediction (4-player), score-gap prediction
Curriculum: initial training on the top ~242 players, then expanded to the top ~750 players
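A minimal per-sample sketch of the primary objective, assuming this repo keeps the structure of upstream Mortal's offline trainer: a Monte Carlo Q target (reward discounted by gamma over the remaining steps, with rewards derived from the GRP) plus a CQL penalty of log-sum-exp over legal actions minus the Q-value of the expert's action. The next-rank prediction term is left out, and `offline_loss` and its parameter names are hypothetical.

```python
import math


def masked_logsumexp(q_values, mask):
    """Numerically stable log-sum-exp over legal actions only (mask must allow at least one)."""
    legal = [q for q, ok in zip(q_values, mask) if ok]
    top = max(legal)
    return top + math.log(sum(math.exp(q - top) for q in legal))


def cql_penalty(q_values, mask, action):
    """CQL regularizer: soft maximum over legal actions minus Q of the logged action.

    Non-negative whenever `action` is legal; minimizing it pushes down Q-values of
    actions the expert did not take, keeping the learned policy close to the data.
    """
    return masked_logsumexp(q_values, mask) - q_values[action]


def offline_loss(q_values, mask, action, kyoku_reward, steps_to_done, min_q_weight=5.0, gamma=1.0):
    """DQN regression to a Monte Carlo target plus the weighted CQL penalty."""
    target = gamma ** steps_to_done * kyoku_reward  # gamma = 1.0 means no discounting
    dqn_loss = 0.5 * (q_values[action] - target) ** 2
    return dqn_loss + min_q_weight * cql_penalty(q_values, mask, action)
```

With `min_q_weight = 5.0` (from the Hyperparameters table), the conservative term dominates unless the Q-value of the expert action already sits near the top of the legal actions.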

Training data:

~1.38 million games from ~750 top-level 4-player Riichi Mahjong players
East+South (半荘, hanchan) format, competitive lobby level

Hyperparameters

Parameter	Value
conv_channels	192
num_blocks	40
batch_size	256
lr_peak	1e-4
lr_final	1e-5
warmup_steps	200
weight_decay	0.1
max_grad_norm	1.0
gamma	1.0
min_q_weight (CQL)	5.0
next_rank_weight	0.2
score_weight	1.0
rank_weight	0.5
gap_weight	0.3
DDP	2× GPU
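The table gives `warmup_steps`, `lr_peak`, and `lr_final`, and the training command uses `--max-steps 200000`, but the shape of the decay is not stated. The sketch below assumes linear warmup followed by cosine decay; `lr_at` is a hypothetical helper, not a function from `train_main.py`.

```python
import math


def lr_at(step, lr_peak=1e-4, lr_final=1e-5, warmup_steps=200, max_steps=200_000):
    """Learning rate at a given step: linear warmup, then (assumed) cosine decay to lr_final."""
    if step < warmup_steps:
        return lr_peak * step / warmup_steps
    progress = min(1.0, (step - warmup_steps) / (max_steps - warmup_steps))
    return lr_final + 0.5 * (lr_peak - lr_final) * (1.0 + math.cos(math.pi * progress))
```

After `max_steps` the rate stays clamped at `lr_final`.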

Performance

Tested over 4000 games (random seating, East+South):

Model	Avg Rank	1st %	2nd %	3rd %	4th %
v4 (baseline)	2.419	27.3	26.2	23.7	22.8
c3 (ours)	2.492	25.5	24.5	25.2	24.8

With an average rank of 2.492 versus 2.419 (lower is better), the model approaches but does not yet surpass the Mortal v4 baseline. Training is ongoing.
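The average rank follows directly from the placement distribution. This sketch recomputes it from the table and adds a rough standard error, assuming the 4,000 games are independent samples. That assumption may not hold if both models sat at the same tables, so treat the interval as indicative only.

```python
import math


def rank_stats(placement_pct, games):
    """Mean finishing rank and its standard error from 1st-4th place percentages."""
    probs = [p / 100.0 for p in placement_pct]
    mean = sum(rank * p for rank, p in enumerate(probs, start=1))
    second_moment = sum(rank * rank * p for rank, p in enumerate(probs, start=1))
    std_err = math.sqrt((second_moment - mean ** 2) / games)
    return mean, std_err


for name, pct in [("v4", [27.3, 26.2, 23.7, 22.8]), ("c3", [25.5, 24.5, 25.2, 24.8])]:
    mean, se = rank_stats(pct, games=4000)
    print(f"{name}: avg rank {mean:.3f} +/- {se:.3f}")
```

The recomputed means (about 2.420 and 2.493) agree with the table to within rounding of the percentages, and each carries a standard error of roughly 0.018.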

Files

weights/                    # Download from Releases
  model-c3-best.pth         # Main model checkpoint (132MB)
  grp-best.pth              # GRP (Game Result Predictor) network (2.2MB)
scripts/
  train_main.py             # Training script (CQL + auxiliary heads)
  dataloader.py             # Data loader (reads .mjson via libriichi)
  mortal_bot_server.py      # Inference bridge (connects model to game engine)
  verify_worker_http.sh     # Distributed verification worker
mahjong/                    # Game engine + AI controller (TypeScript)
  src/                      # Engine source (game logic, shanten, scoring, AI)
  scripts/verify.ts         # Verification orchestrator
  scripts/verify-worker.ts  # Per-worker game runner
  package.json
cf-verify/                  # Cloudflare Worker coordinator for distributed testing

Download weights

Download from Releases:

mkdir -p weights
curl -L -o weights/model-c3-best.pth https://github.com/lynkas/c3/releases/download/v1.0/model-c3-best.pth
curl -L -o weights/grp-best.pth https://github.com/lynkas/c3/releases/download/v1.0/grp-best.pth

Usage

Loading the model for inference

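import sys

import torch

# Make Mortal's model definitions importable (point this at the mortal/ directory containing model.py)
sys.path.insert(0, "path/to/mortal/mortal")
from model import Brain

# Load the checkpoint on CPU; the Brain (encoder) weights are stored under the "mortal" key
ckpt = torch.load("weights/model-c3-best.pth", map_location="cpu")
model = Brain(version=4, conv_channels=192, num_blocks=40)
model.load_state_dict(ckpt["mortal"])
model.eval()  # switch layers such as dropout and batch norm to inference behavior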
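In upstream Mortal, `Brain` maps an observation to a latent feature vector, and a separate DQN head turns that into one Q-value per action, so the snippet above does not produce action values on its own. Once Q-values and a legal-action mask are available, a greedy bot picks the best legal action. The torch-free sketch below shows only that last step; whether `mortal_bot_server.py` decodes greedily or samples is not documented here, and `select_action` is a hypothetical name.

```python
def select_action(q_values, mask):
    """Return the index of the highest-valued legal action (greedy decoding)."""
    legal = [i for i, ok in enumerate(mask) if ok]
    if not legal:
        raise ValueError("mask contains no legal actions")
    return max(legal, key=lambda i: q_values[i])


# Action 1 has the highest Q-value but is illegal, so action 2 is chosen.
choice = select_action([0.1, 0.9, 0.5], [True, False, True])
```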

Training

Requirements: a compiled build of Mortal's libriichi, a Python virtual environment with PyTorch, and game data in .mjson format.

# Single GPU
python scripts/train_main.py \
  --grp checkpoints/grp-best.pth \
  --train-glob "data/train/*.mjson" \
  --val-glob "data/val/*.mjson" \
  --save checkpoints/model.pth \
  --tensorboard runs/experiment \
  --device cuda \
  --conv-channels 192 --num-blocks 40 \
  --batch-size 256 \
  --lr-peak 1e-4 --lr-final 1e-5 \
  --warmup-steps 200 --max-steps 200000 \
  --save-every 400 --val-steps 50 --patience 100 \
  --weight-decay 0.1 --max-grad-norm 1.0 \
  --score-weight 1.0 --rank-weight 0.5 --gap-weight 0.3

# Multi-GPU (DDP)
CUDA_VISIBLE_DEVICES=0,1 torchrun --standalone --nproc_per_node=2 scripts/train_main.py \
  [same args as above]

Key arguments:

Argument	Description
`--grp`	Path to GRP (Game Result Predictor) checkpoint
`--train-glob`	Glob pattern for training data files
`--val-glob`	Glob pattern for validation data files
`--save`	Output checkpoint path (also used for resume)
`--patience`	Early stopping patience (in validation cycles)
`--score-weight`	Weight for score prediction auxiliary loss
`--rank-weight`	Weight for rank prediction auxiliary loss
`--gap-weight`	Weight for gap prediction auxiliary loss

To resume training, simply point --save to an existing checkpoint.
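The `--patience` flag counts validation cycles without improvement. The sketch below illustrates that logic under two assumptions: lower validation loss is better, and one cycle corresponds to one validation pass. The class is hypothetical and not taken from `train_main.py`.

```python
class EarlyStopping:
    """Stop after `patience` consecutive validation cycles without a new best loss."""

    def __init__(self, patience):
        self.patience = patience
        self.best = float("inf")
        self.bad_cycles = 0

    def update(self, val_loss):
        """Record one validation cycle; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss  # a new best is presumably when a "-best" checkpoint is written
            self.bad_cycles = 0
        else:
            self.bad_cycles += 1
        return self.bad_cycles >= self.patience


stopper = EarlyStopping(patience=2)
decisions = [stopper.update(loss) for loss in [1.0, 0.9, 0.95, 0.93]]
```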

Distributed Verification System

The verification system runs model-vs-model games across multiple machines, coordinated by a Cloudflare Worker.

1. Deploy the coordinator

cd cf-verify
npm install
# Edit wrangler.toml with your account_id, KV namespace, and tokens
npx wrangler deploy

2. Create a job

curl -X POST "https://your-worker.workers.dev/job" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_ADMIN_TOKEN" \
  -d '{
    "total": 1000,
    "strategies": "mortal:model-a.pth,mortal:model-b.pth,mortal:model-c.pth,mortal:model-d.pth",
    "difficulties": "0,0,0,0",
    "end_round": 8,
    "shuffle_seats": true
  }'
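The same job can be created from Python with only the standard library. This sketch builds the request without sending it; the payload fields mirror the curl example, while `build_job_request` and its one-difficulty-per-strategy check are illustrative assumptions about how the coordinator expects the lists to line up.

```python
import json
import urllib.request


def build_job_request(base_url, admin_token, total, strategies, difficulties,
                      end_round=8, shuffle_seats=True):
    """Build (but do not send) a POST /job request for the verification coordinator."""
    if len(strategies) != len(difficulties):
        raise ValueError("expected one difficulty per strategy")
    payload = {
        "total": total,
        "strategies": ",".join(strategies),
        "difficulties": ",".join(str(d) for d in difficulties),
        "end_round": end_round,
        "shuffle_seats": shuffle_seats,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/job",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {admin_token}"},
        method="POST",
    )


req = build_job_request(
    "https://your-worker.workers.dev/", "YOUR_ADMIN_TOKEN", 1000,
    ["mortal:model-a.pth", "mortal:model-b.pth", "mortal:model-c.pth", "mortal:model-d.pth"],
    [0, 0, 0, 0],
)
```

Passing the request to `urllib.request.urlopen(req)` sends it.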

3. Start workers

Each worker claims batches, runs games locally, and reports results back.

COORDINATOR="https://your-worker.workers.dev" \
TOKEN="YOUR_WORKER_TOKEN" \
WORKER_NAME="my-machine" \
WORKERS=2 \
MODELS_DIR="./weights" \
DEVICE_MAP="cuda:0,cuda:0,cuda:0,cuda:0" \
MAHJONG_DIR="/path/to/mahjong" \
MORTAL_PYTHON="/path/to/venv/bin/python3" \
MORTAL_SERVER="/path/to/mortal_bot_server.py" \
bash scripts/verify_worker_http.sh
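The claim, run, and report cycle described above can be modeled with an in-memory stand-in for the coordinator. `InMemoryCoordinator` and `run_worker` are hypothetical. The real Cloudflare Worker's HTTP endpoints, batch sizes, and handling of workers that crash mid-batch are not documented here.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryCoordinator:
    """Hypothetical coordinator: hands out game batches until the job total is reached."""
    total: int
    claimed: int = 0
    results: list = field(default_factory=list)

    def claim(self, batch_size):
        """Reserve up to batch_size games; 0 means the job is finished."""
        n = min(batch_size, self.total - self.claimed)
        self.claimed += n
        return n

    def report(self, game_results):
        self.results.extend(game_results)


def run_worker(coordinator, play_game, batch_size=4):
    """Claim a batch, play it locally, report results, and repeat until nothing is left."""
    while (n := coordinator.claim(batch_size)) > 0:
        coordinator.report([play_game() for _ in range(n)])


coord = InMemoryCoordinator(total=10)
run_worker(coord, lambda: [1, 2, 3, 4])  # fake game: final rank of each seat
```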

Worker environment variables:

Variable	Required	Description
`COORDINATOR`	✅	Coordinator URL
`TOKEN`	✅	Worker auth token
`WORKER_NAME`		Identifier shown on dashboard
`WORKERS`		Parallel game workers (default: 4)
`MODELS_DIR`		Directory containing model .pth files
`DEVICE_MAP`		Comma-separated devices for each strategy (e.g. `cuda:0,cuda:0,cuda:1,cuda:1`)
`MAHJONG_DIR`		Path to mahjong project (with scripts/verify.ts)
`MORTAL_PYTHON`		Python interpreter with torch + libriichi
`MORTAL_SERVER`		Path to mortal_bot_server.py
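A sketch of how these variables might be validated before a worker starts, written in Python for illustration even though the actual entry point is a shell script. The required variables and the `WORKERS` default come from the table. The empty defaults for the other variables, and the rule that `DEVICE_MAP` lists exactly one device per seat (four strategies), are assumptions.

```python
def load_worker_config(env, seats=4):
    """Validate worker settings from an environment mapping (e.g. os.environ)."""
    missing = [key for key in ("COORDINATOR", "TOKEN") if not env.get(key)]
    if missing:
        raise ValueError(f"missing required variables: {', '.join(missing)}")
    devices = [d.strip() for d in env.get("DEVICE_MAP", "").split(",") if d.strip()]
    if devices and len(devices) != seats:
        raise ValueError(f"DEVICE_MAP lists {len(devices)} devices, expected {seats}")
    return {
        "coordinator": env["COORDINATOR"].rstrip("/"),
        "token": env["TOKEN"],
        "worker_name": env.get("WORKER_NAME", ""),
        "workers": int(env.get("WORKERS", "4")),  # documented default
        "device_map": devices,
        "models_dir": env.get("MODELS_DIR", ""),
    }


cfg = load_worker_config({
    "COORDINATOR": "https://your-worker.workers.dev/",
    "TOKEN": "YOUR_WORKER_TOKEN",
    "DEVICE_MAP": "cuda:0,cuda:0,cuda:1,cuda:1",
})
```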

4. Monitor progress

Dashboard: Visit the coordinator URL in a browser

API:

curl "https://your-worker.workers.dev/status"                       # Job progress + worker status
curl "https://your-worker.workers.dev/aggregate"                    # Overall rankings
curl "https://your-worker.workers.dev/aggregate?worker=my-machine"  # Per-worker stats (quote URLs containing "?")

5. Manage jobs

# Extend a job's target
curl -X PATCH "https://your-worker.workers.dev/job" \
  -H "Authorization: Bearer YOUR_ADMIN_TOKEN" \
  -d '{"job_id": "abc123", "total": 4000, "set_current": true}'

# List all jobs
curl https://your-worker.workers.dev/jobs

Auxiliary Heads

The model uses multi-task learning with three auxiliary prediction heads that share the backbone:

ScoreHead: predicts the current scores of all 4 players (MSE loss)
RankHead: predicts the rank distribution of all 4 players (cross-entropy loss)
GapHead: predicts the log-scale score gap to the top and bottom players (Huber loss)

These heads push the shared backbone to internalize game-state awareness (score positions, rank dynamics), which is hard to learn from the raw observation encoding through the Q-learning objective alone.
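The three auxiliary losses and their weighting can be sketched per sample in plain Python, using the loss weights from the Hyperparameters table. The shapes (4 scores, 4 seats with 4 rank logits each, 2 gaps to the top and bottom players), the Huber delta of 1.0, and the signed `log1p` transform standing in for "log-scale" are assumptions, since the README does not specify them.

```python
import math


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(logits, target):
    """Negative log-softmax probability of the target class."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_z - logits[target]


def huber(pred, target, delta=1.0):
    err = abs(pred - target)
    return 0.5 * err ** 2 if err <= delta else delta * (err - 0.5 * delta)


def signed_log(x):
    """Compress a point gap to log scale while keeping its sign (assumed transform)."""
    return math.copysign(math.log1p(abs(x)), x)


def auxiliary_loss(score_pred, score_true, rank_logits, rank_true, gap_pred, gap_true,
                   score_weight=1.0, rank_weight=0.5, gap_weight=0.3):
    score_loss = mse(score_pred, score_true)
    # One 4-way classification per seat: which rank does that player finish in?
    rank_loss = sum(cross_entropy(l, r) for l, r in zip(rank_logits, rank_true)) / len(rank_true)
    gap_loss = sum(huber(p, signed_log(t)) for p, t in zip(gap_pred, gap_true)) / len(gap_true)
    return score_weight * score_loss + rank_weight * rank_loss + gap_weight * gap_loss
```

With uniform rank logits and otherwise perfect predictions, the total is `0.5 * log(4)`: the rank head pays for its uncertainty, while the score and gap terms vanish.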

Training Pipeline

The model is trained in stages (curriculum learning):

Stage 1: Base model (c)

Starting from Mortal v4 pre-trained weights, fine-tune with CQL on games from the top ~242 players:

python scripts/train_main.py \
  --grp checkpoints/grp-best.pth \
  --train-glob "data/top_players_242/*.mjson" \
  --val-glob "data/val/*.mjson" \
  --save checkpoints/model-c.pth \
  --lr-peak 3e-5 --lr-final 1e-6 \
  --weight-decay 0.2 --score-weight 1.0 \
  --rank-weight 0 --gap-weight 0

Stage 2: Expand data (c2)

Starting from the best model-c checkpoint, continue training on games from the top ~750 players:

cp checkpoints/model-c-best.pth checkpoints/model-c2.pth
python scripts/train_main.py \
  --save checkpoints/model-c2.pth \
  --train-glob "data/top_players_750/*.mjson" \
  --lr-peak 3e-5 --lr-final 1e-6 \
  --weight-decay 0.2 --score-weight 1.0 \
  --rank-weight 0 --gap-weight 0

Stage 3: Add auxiliary heads (c3)

Starting from the best model-c2 checkpoint, enable the rank and gap prediction heads and raise the learning rate:

cp checkpoints/model-c2-best.pth checkpoints/model-c3.pth
python scripts/train_main.py \
  --save checkpoints/model-c3.pth \
  --train-glob "data/top_players_750/*.mjson" \
  --lr-peak 1e-4 --lr-final 1e-5 \
  --weight-decay 0.1 \
  --score-weight 1.0 --rank-weight 0.5 --gap-weight 0.3

The c3-best checkpoint was reached after ~14,400 steps of Stage 3 training.
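The three stages differ only in their starting checkpoint, data, learning rate, weight decay, and auxiliary loss weights. This hypothetical helper collects the values from the commands above side by side and rebuilds each command as an argument list. How Stage 1 loads the Mortal v4 weights is not shown in this README, so its starting checkpoint is left as `None`.

```python
STAGES = {
    "c": {
        "init": None,  # starts from Mortal v4 weights; loading mechanism not documented here
        "train_glob": "data/top_players_242/*.mjson",
        "lr": ("3e-5", "1e-6"),
        "weight_decay": "0.2",
        "loss_weights": ("1.0", "0", "0"),  # score, rank, gap
        "extra": ["--grp", "checkpoints/grp-best.pth", "--val-glob", "data/val/*.mjson"],
    },
    "c2": {
        "init": "checkpoints/model-c-best.pth",
        "train_glob": "data/top_players_750/*.mjson",
        "lr": ("3e-5", "1e-6"),
        "weight_decay": "0.2",
        "loss_weights": ("1.0", "0", "0"),
        "extra": [],
    },
    "c3": {
        "init": "checkpoints/model-c2-best.pth",
        "train_glob": "data/top_players_750/*.mjson",
        "lr": ("1e-4", "1e-5"),
        "weight_decay": "0.1",
        "loss_weights": ("1.0", "0.5", "0.3"),
        "extra": [],
    },
}


def stage_command(name):
    """Return (checkpoint to copy into the --save path, argv for train_main.py)."""
    cfg = STAGES[name]
    lr_peak, lr_final = cfg["lr"]
    score_w, rank_w, gap_w = cfg["loss_weights"]
    argv = ["python", "scripts/train_main.py", *cfg["extra"],
            "--save", f"checkpoints/model-{name}.pth",
            "--train-glob", cfg["train_glob"],
            "--lr-peak", lr_peak, "--lr-final", lr_final,
            "--weight-decay", cfg["weight_decay"],
            "--score-weight", score_w, "--rank-weight", rank_w, "--gap-weight", gap_w]
    return cfg["init"], argv
```

To run a stage, copy the returned checkpoint to the `--save` path with `shutil.copy` (the `cp` step above), then call `subprocess.run(argv, check=True)`.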

Data format

Training data uses the .mjson format (gzipped JSON Lines): each file contains one game as a sequence of mjai events. The dataloader (scripts/dataloader.py) parses these files via libriichi.
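For inspecting a game file outside the training pipeline, the standard library is enough. This sketch only reads raw events; feature encoding for training still goes through libriichi. `read_mjson` is a hypothetical helper.

```python
import gzip
import json


def read_mjson(path):
    """Yield mjai events (one dict per line) from a gzipped JSON Lines game log."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
```

For example, `sum(e["type"] == "dahai" for e in read_mjson(path))` counts the discards in a game.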

Requirements

Python 3.10+
PyTorch 2.0+
libriichi (Rust-compiled Python extension)
Node.js + tsx (for verification scripts)

License

MIT

Model weights, training code, and verification tools are released under the MIT License.

Note: The runtime inference environment requires Mortal (AGPL-3.0) components (libriichi, observation encoder). This repository does not include Mortal source code.

Acknowledgments

Built on the Mortal framework by Equim.
