MuZero Knockoff – IT3105 Spring 2025

Course: AI Programming (IT-3105) – NTNU
Due Date: May 2, 2025

Team Members

Aleksander Olsvik
Edvard Schøyen
Kristian Vaula Jensen
Vetle Ekern

📖 Project Overview

This project is our implementation of a MuZero-inspired reinforcement learning system, developed as the main project for AI Programming (IT3105). MuZero, introduced by DeepMind in 2019, combines model-based and model-free reinforcement learning by simultaneously learning:

A representation network (NNr) – maps real game states to abstract latent states.
A dynamics network (NNd) – predicts the next latent state and immediate reward given a latent state and action.
A prediction network (NNp) – outputs a policy distribution and value estimate from a latent state.
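The sketch below shows how the three networks split the work. `ToyMuZeroNets` is hypothetical and is not the project's `NeuralNetManager`. It uses fixed random weights in pure Python and does no training, so it only illustrates the signatures. It assumes that NNr maps an observation vector to a latent vector, that NNd takes a latent state plus an integer action, and that NNp returns a softmax policy and a value in [-1, 1].

```python
import math
import random
from typing import List, Tuple

Latent = List[float]


class ToyMuZeroNets:
    """Hypothetical stand-ins for NNr, NNd and NNp with fixed random weights."""

    def __init__(self, obs_dim: int, latent_dim: int, num_actions: int, seed: int = 0):
        rng = random.Random(seed)
        self.num_actions = num_actions

        def matrix(rows: int, cols: int) -> List[List[float]]:
            return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

        self.w_repr = matrix(latent_dim, obs_dim)
        self.w_dyn = matrix(latent_dim, latent_dim + num_actions)
        self.w_rew = matrix(1, latent_dim + num_actions)[0]
        self.w_pol = matrix(num_actions, latent_dim)
        self.w_val = matrix(1, latent_dim)[0]

    @staticmethod
    def _matvec(m: List[List[float]], v: List[float]) -> List[float]:
        return [sum(a * b for a, b in zip(row, v)) for row in m]

    def represent(self, observation: List[float]) -> Latent:
        # NNr: real game state -> abstract latent state
        return [math.tanh(x) for x in self._matvec(self.w_repr, observation)]

    def dynamics(self, state: Latent, action: int) -> Tuple[Latent, float]:
        # NNd: (latent state, action) -> (next latent state, immediate reward)
        one_hot = [1.0 if a == action else 0.0 for a in range(self.num_actions)]
        x = state + one_hot
        next_state = [math.tanh(v) for v in self._matvec(self.w_dyn, x)]
        reward = sum(w * v for w, v in zip(self.w_rew, x))
        return next_state, reward

    def predict(self, state: Latent) -> Tuple[List[float], float]:
        # NNp: latent state -> (softmax policy over actions, value in [-1, 1])
        logits = self._matvec(self.w_pol, state)
        top = max(logits)
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        value = math.tanh(sum(w * v for w, v in zip(self.w_val, state)))
        return [e / total for e in exps], value


nets = ToyMuZeroNets(obs_dim=4, latent_dim=3, num_actions=2)
s0 = nets.represent([0.1, 0.2, 0.3, 0.4])  # only this first state comes from the game
s1, r1 = nets.dynamics(s0, action=1)       # imagined transition, no environment call
policy, value = nets.predict(s1)
```

Chaining `dynamics` after a single `represent` call is what allows MuZero to plan without querying the real environment.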

Using these three components, MuZero runs Monte Carlo Tree Search over the learned latent states instead of real game states (u-MCTS). The search guides the agent's action choices and also generates the policy and value targets used for training.
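A compressed sketch of such a search loop follows. It rests on several assumptions. `predict` and `dynamics` are hypothetical callables standing in for NNp and NNd. Selection uses a simplified PUCT-style score in the spirit of the MuZero paper, without its Q-value normalization or root exploration noise. The returned visit-count distribution is what would serve as the policy target. This is not the project's `uMCTS` class.

```python
import math
from typing import Dict, List, Tuple


class Node:
    """One node in the latent search tree (hypothetical, simplified)."""

    def __init__(self, prior: float):
        self.prior = prior          # P(a | parent) from NNp
        self.visits = 0
        self.value_sum = 0.0
        self.reward = 0.0           # reward predicted by NNd on the edge into this node
        self.state = None           # latent state, set when the node is expanded
        self.children: Dict[int, "Node"] = {}

    def value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def ucb_score(parent: Node, child: Node, discount: float, c: float) -> float:
    # Simplified PUCT: Q(s, a) plus an exploration bonus weighted by the prior.
    exploration = c * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    q = child.reward + discount * child.value() if child.visits else 0.0
    return q + exploration


def expand(node: Node, state, predict, num_actions: int) -> float:
    # Query NNp once and create one child per action, using the policy as priors.
    policy, value = predict(state)
    node.state = state
    for a in range(num_actions):
        node.children[a] = Node(policy[a])
    return value


def u_mcts(root_state, predict, dynamics, num_actions: int,
           num_simulations: int = 50, discount: float = 0.99, c: float = 1.25):
    """Search from an abstract root state (e.g. the output of NNr)."""
    root = Node(prior=1.0)
    expand(root, root_state, predict, num_actions)
    for _ in range(num_simulations):
        node, path, action = root, [root], 0
        # 1. Selection: descend until reaching a node that has not been expanded.
        while node.children:
            parent = node
            action, node = max(parent.children.items(),
                               key=lambda item: ucb_score(parent, item[1], discount, c))
            path.append(node)
        # 2. Expansion: NNd gives the leaf's latent state and reward, NNp its value.
        next_state, reward = dynamics(path[-2].state, action)
        node.reward = reward
        value = expand(node, next_state, predict, num_actions)
        # 3. Backup: propagate the discounted return towards the root.
        for n in reversed(path):
            n.value_sum += value
            n.visits += 1
            value = n.reward + discount * value
    total = sum(child.visits for child in root.children.values())
    policy = [root.children[a].visits / total for a in range(num_actions)]
    return policy, root.value()


# Hypothetical toy networks: action 1 earns reward 1, action 0 earns nothing.
def toy_predict(state: float) -> Tuple[List[float], float]:
    return [0.5, 0.5], math.tanh(state)


def toy_dynamics(state: float, action: int) -> Tuple[float, float]:
    return state + 1.0, float(action)


policy, root_value = u_mcts(0.0, toy_predict, toy_dynamics, num_actions=2, num_simulations=100)
```

In this sketch, unvisited children score Q = 0. MuZero-style implementations instead normalize Q-values into [0, 1] using the minimum and maximum values seen in the tree. That refinement is left out to keep the example short.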

We re-implemented the core MuZero pipeline from scratch, including training with backpropagation through time (BPTT), an episode buffer, and integration with custom environments.

🧩 Project Structure

src/
├── config/      # Configuration handling (YAML)
├── envs/        # Game environments (SnakePac, RiverRaid, wrappers)
├── gsm/         # Game State Managers
├── networks/    # Representation, Dynamics, Prediction networks
├── self_play/   # u-MCTS implementation
├── storage/     # Episode buffer for replay and training
└── rlm.py       # Reinforcement Learning Manager (main training loop)
tests/           # Unit tests
models/          # Saved models/checkpoints
episode_data/    # Training episode storage

Key modules:

ReinforcementLearningManager – orchestrates training and evaluation.
uMCTS – abstract-state Monte Carlo Tree Search implementation.
NeuralNetManager – coordinates training of the three MuZero networks.
SnakePacEnv / RiverraidEnv – environment classes used for training and testing (SnakePac is a custom game; RiverraidEnv wraps Atari River Raid).
EpisodeBuffer – stores episodic data for BPTT training.

🎮 Environments

We implemented and tested MuZero in two environments:

SnakePac (custom)
- A simplified grid-world with coins and movement actions.
- Designed for debugging and rapid iteration.
River Raid (Atari)
- Uses the Arcade Learning Environment (ALE) via Gymnasium.
- Its high-dimensional pixel observations tested the scalability of our MuZero implementation.

⚙️ Training Pipeline

Episode Generation
- The agent plays episodes using u-MCTS to select actions.
- Each step records:
  - State
  - Value estimate
  - Policy distribution
  - Action taken
  - Reward received
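The per-step record can be sketched as a small dataclass. The names `Step`, `Episode` and `play_episode` are hypothetical, as are the `reset`, `step` and `search` callables. `search` stands in for u-MCTS and is assumed to return a policy distribution and a value estimate. The sketch picks the greedy action for simplicity. A training run would usually sample from the search policy so that the agent keeps exploring.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class Step:
    state: Any            # game state the agent observed
    value: float          # value estimate from the search
    policy: List[float]   # search policy (e.g. normalized visit counts)
    action: int           # action actually taken
    reward: float         # reward received after taking the action


@dataclass
class Episode:
    steps: List[Step] = field(default_factory=list)


def play_episode(reset: Callable[[], Any],
                 step: Callable[[Any, int], Tuple[Any, float, bool]],
                 search: Callable[[Any], Tuple[List[float], float]],
                 max_steps: int = 500) -> Episode:
    episode = Episode()
    state = reset()
    for _ in range(max_steps):
        policy, value = search(state)                              # stand-in for u-MCTS
        action = max(range(len(policy)), key=policy.__getitem__)   # greedy for the sketch
        next_state, reward, done = step(state, action)
        episode.steps.append(Step(state, value, policy, action, reward))
        state = next_state
        if done:
            break
    return episode


# Hypothetical toy game: walk right for 5 steps; action 1 pays reward 1.
episode = play_episode(
    reset=lambda: 0,
    step=lambda s, a: (s + 1, 1.0 if a == 1 else 0.0, s + 1 >= 5),
    search=lambda s: ([0.2, 0.8], 0.0),
)
```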
Episode Buffer
- Stores episode histories.
- Supports sampling windows of states and actions for training.
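One way to implement window sampling is sketched below. It is a hypothetical `EpisodeBuffer`, not the project's class, and it assumes an index convention. For a sampled position k, it returns the look-back states s_{k-q}..s_k, the actions a_k..a_{k+w-1} taken from s_k onwards, and the matching policy, value and reward targets. For simplicity, it only samples positions where the whole window fits inside the episode, so an episode needs at least q + w + 1 steps. It also uses the stored search values directly as value targets. A fuller implementation would pad windows at episode boundaries and might compute discounted n-step returns instead.

```python
import random
from collections import deque
from typing import Any, List, NamedTuple, Optional


class Step(NamedTuple):
    state: Any
    value: float
    policy: List[float]
    action: int
    reward: float


class Window(NamedTuple):
    states: List[Any]                   # s_{k-q} .. s_k (look-back, input to NNr)
    actions: List[int]                  # a_k .. a_{k+w-1} (roll-ahead, input to NNd)
    target_policies: List[List[float]]  # pi_k .. pi_{k+w}
    target_values: List[float]          # v_k .. v_{k+w}
    target_rewards: List[float]         # r_k .. r_{k+w-1}, one per NNd step


class EpisodeBuffer:
    def __init__(self, capacity: int = 1000, seed: Optional[int] = None):
        self.episodes: deque = deque(maxlen=capacity)  # oldest episodes are evicted first
        self.rng = random.Random(seed)

    def add(self, episode: List[Step]) -> None:
        self.episodes.append(episode)

    def sample_window(self, q: int, w: int) -> Window:
        candidates = [ep for ep in self.episodes if len(ep) >= q + w + 1]
        if not candidates:
            raise ValueError("no stored episode is long enough for this window")
        ep = self.rng.choice(candidates)
        k = self.rng.randint(q, len(ep) - w - 1)  # inclusive bounds keep the window in range
        return Window(
            states=[s.state for s in ep[k - q:k + 1]],
            actions=[s.action for s in ep[k:k + w]],
            target_policies=[s.policy for s in ep[k:k + w + 1]],
            target_values=[s.value for s in ep[k:k + w + 1]],
            target_rewards=[s.reward for s in ep[k:k + w]],
        )


buffer = EpisodeBuffer(seed=0)
buffer.add([Step(i, 0.0, [0.5, 0.5], i % 2, float(i)) for i in range(10)])
window = buffer.sample_window(q=2, w=3)
```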
Backpropagation Through Time (BPTT)
- Trains all three networks jointly.
- Uses a sliding window of past and future states (q look-back, w roll-ahead).
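The objective that BPTT minimizes over one sampled window can be written as a plain forward pass. In this hypothetical sketch, `represent`, `dynamics` and `predict` stand in for NNr, NNd and NNp. The loss combines three terms: a policy cross-entropy and a squared value error at each of the w + 1 unrolled positions, and a squared reward error for each of the w dynamics steps. All terms are weighted equally, which is an assumption. Pure Python has no autodiff, so the function only computes the loss. In a framework such as PyTorch or TensorFlow, backpropagating this loss sends gradients through the chain of `dynamics` calls back into the representation network, and that is what makes the training backpropagation through time.

```python
import math
from typing import Any, Callable, List, Sequence, Tuple


def unrolled_loss(represent: Callable[[Sequence[Any]], Any],
                  dynamics: Callable[[Any, int], Tuple[Any, float]],
                  predict: Callable[[Any], Tuple[List[float], float]],
                  states: Sequence[Any],
                  actions: Sequence[int],
                  target_policies: Sequence[List[float]],
                  target_values: Sequence[float],
                  target_rewards: Sequence[float]) -> float:
    latent = represent(states)          # NNr encodes the q + 1 look-back states
    loss = 0.0
    for i in range(len(actions) + 1):   # w + 1 prediction points: k, k+1, ..., k+w
        policy, value = predict(latent)
        # Policy loss: cross-entropy between the search policy target and NNp's output.
        loss -= sum(t * math.log(max(p, 1e-12)) for t, p in zip(target_policies[i], policy))
        # Value loss: squared error against the value target.
        loss += (target_values[i] - value) ** 2
        if i < len(actions):
            # NNd rolls the latent state forward; its reward is compared to the real one.
            latent, reward = dynamics(latent, actions[i])
            loss += (target_rewards[i] - reward) ** 2
    return loss


# Hypothetical toy networks, only to exercise the function.
loss = unrolled_loss(
    represent=lambda states: float(sum(states)),
    dynamics=lambda s, a: (s + a, 1.0),
    predict=lambda s: ([0.5, 0.5], 0.0),
    states=[0, 1, 2], actions=[1, 0],
    target_policies=[[0.5, 0.5]] * 3, target_values=[0.0] * 3, target_rewards=[1.0] * 2,
)
```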
Checkpointing & Logging
- Checkpoints saved under checkpoints/.
- Optional logging with Weights & Biases (wandb) for monitoring.

🔑 Key Learnings

Implementing MuZero required tightly integrating planning and learning: the networks that guide the search are trained on targets produced by that same search, an extreme form of bootstrapping.

Working with Atari-scale environments highlighted the challenges of training efficiency and compute limitations.

Producing educational visualizations was just as important as the code, sharpening our ability to communicate complex AI concepts clearly.
