Mac Studio AI Cluster Guide

Build a 2TB Unified Memory AI Supercomputer with 4 Mac Studios

Watch the full video: https://youtu.be/bFgTxr5yst0

This is the companion guide to my YouTube video where I cluster 4 Mac Studios together to create a local AI powerhouse capable of running trillion-parameter models.

The Setup

Component	Specs
Nodes	4x Mac Studio M4 Ultra
RAM per Node	512GB Unified Memory
Total Memory	2TB (GPU-accessible)
GPU Cores	320 (80 per node)
Storage	32TB total (8TB per node)
Interconnect	Thunderbolt 5 Mesh + Ethernet

Cost Comparison

Setup	Memory	Approx. Cost
This Mac Cluster	2TB	~$50,000
Equivalent NVIDIA H100s	2TB (26x 80GB)	$780,000+

Requirements

Hardware

4x Mac Studio (M4 Ultra recommended, M4 Max works too)
- 512GB unified memory each (for 2TB total)
- Or mix configs (minimum 2 nodes for meaningful clustering)
Thunderbolt 5 cables (6 cables for full mesh with 4 nodes)
Ethernet switch (2.5GbE minimum, 10GbE recommended for model downloads)
Ethernet cables (1 per node)

Software

macOS Sequoia 15.3+ (Tahoe 26.2 beta or later with RDMA support)
Exo Labs v1.1+ (Download)

Network Topology

Thunderbolt Mesh Connection

Connect all 4 Mac Studios in a mesh topology using Thunderbolt 5:

        Mac 1
       /  |  \
      /   |   \
   Mac 2--+--Mac 4
      \   |   /
       \  |  /
        Mac 3

Cable Connections (6 total):

Mac 1 ↔ Mac 2
Mac 1 ↔ Mac 3
Mac 1 ↔ Mac 4
Mac 2 ↔ Mac 3
Mac 2 ↔ Mac 4
Mac 3 ↔ Mac 4

Ethernet Connection

Connect all nodes to the same Ethernet switch for:

Node discovery
Model downloads
API access

Setup Instructions

Step 1: Enable RDMA

This is the secret sauce. RDMA (Remote Direct Memory Access) reduces latency from 300μs to 3μs - a 100x improvement.

Boot into Recovery Mode
- Shut down your Mac
- Press and hold the power button until "Loading startup options" appears
- Click Options → Continue
Open Terminal (Utilities → Terminal)
Enable RDMA
```
rdma_ctl enable
```
Restart and repeat for each node

Step 2: Install Exo Labs

Download and install from: https://github.com/exo-explore/exo

Or via Homebrew:

brew install exo

Step 3: Connect Hardware

Connect all Thunderbolt 5 cables in mesh configuration
Connect all nodes to Ethernet switch
Power on all nodes

Step 4: Launch Cluster

Open Exo Labs app on all nodes
Nodes should auto-discover via Ethernet
Select your parallelism mode:
- Tensor + RDMA (recommended) - All nodes work on every layer together
- Pipeline (legacy) - Sequential layer processing

Step 5: Verify Cluster

In Exo Labs, you should see all 4 nodes connected with their combined resources displayed.

Understanding Parallelism

Pipeline Parallelism (Old Way)

Each Mac processes different layers sequentially:

Prompt → [Mac 1: Layers 1-20] → [Mac 2: Layers 21-40] → [Mac 3: Layers 41-60] → [Mac 4: Layers 61-80] → Response

Problem: Sequential = waiting. Each node waits for the previous one.

Tensor Parallelism (New Way with RDMA)

All Macs work on every layer together:

Layer 1: Mac1(25%) + Mac2(25%) + Mac3(25%) + Mac4(25%) → Combine → Layer 2...

Benefit: True parallel processing. ~3.5x faster than pipeline.

Why RDMA Matters

Metric	Without RDMA	With RDMA
Latency	300 μs	3 μs
Improvement	-	100x
Parallelism	Pipeline only	Tensor enabled

RDMA bypasses the traditional TCP/IP stack, allowing direct GPU-to-GPU memory access over Thunderbolt.

Benchmarks

Model Performance (4-Node Cluster with RDMA)

Model	Parameters	Size	Tokens/sec	Notes
Llama 3.2 3B	3B	~2GB	240	Small model baseline
Llama 3.3 70B FP16	70B	~140GB	16	Full precision
Qwen 3 Coder 480B	480B (MoE)	~280GB	40	Mixture of Experts
Kimi K2	1T (MoE)	~658GB	28-30	Thinking model
DeepSeek V3.1 671B	671B	~713GB	26-27	8-bit quantized

Single Node vs Cluster

Model	1 Node	4 Nodes	Speedup
Llama 3.2 3B	147 tok/s	240 tok/s	1.6x
Llama 3.3 70B	5 tok/s	16 tok/s	3.2x
Qwen 3 Coder 480B	27 tok/s	40 tok/s	1.5x

Pipeline vs Tensor (Same Cluster)

Mode	Llama 70B	Improvement
Pipeline (no RDMA)	5 tok/s	baseline
Tensor (no RDMA)	3 tok/s	slower (too much chatter)
Tensor + RDMA	16 tok/s	3.2x faster

Running Multiple Models

One of the coolest features - run multiple models simultaneously:

Loaded Models (tested simultaneously):
├── Kimi K2 (1T params) - 33% RAM per node
├── DeepSeek V3.1 671B - ~18% RAM per node
├── Llama 3.3 70B FP16 - ~9% RAM per node
├── Llama 3.3 70B 4-bit - ~5% RAM per node
└── Llama 3.2 3B - <1% RAM per node

Total: 5 models loaded and responsive simultaneously.

Integration with Apps

Open WebUI

Exo Labs exposes an OpenAI-compatible API. Point Open WebUI to your cluster:

# docker-compose.yml addition
environment:
  - OPENAI_API_BASE=http://<mac-studio-ip>:8000/v1
  - OPENAI_API_KEY=not-needed

Xcode

Works with Xcode's AI coding features when configured as a local model endpoint.

Claude Code / Cursor / Continue

Any tool that supports OpenAI-compatible endpoints can use your cluster:

export OPENAI_API_BASE="http://<cluster-ip>:8000/v1"
export OPENAI_API_KEY="local"

Troubleshooting

Nodes Not Discovering Each Other

Cycle the Ethernet interfaces to trigger discovery:

# Run on each node (or use the script in /scripts)
ETH=$(networksetup -listallnetworkservices | grep -i ethernet | head -1)
sudo networksetup -setnetworkserviceenabled "$ETH" off
sleep 2
sudo networksetup -setnetworkserviceenabled "$ETH" on

Cluster Crashes Under Load

This can happen with beta software. Full restart sequence:

# See scripts/restart-cluster.sh for full automation

Model Loading Fails

Ensure all nodes have the model downloaded
Check available memory across cluster
Try with fewer nodes first

Thunderbolt Connection Issues

Use Apple-certified Thunderbolt 5 cables
Check System Information → Thunderbolt for connection status
Reconnect cables if nodes drop out

Scripts

See the /scripts directory for automation helpers:

check-status.sh - Check if all nodes are online
restart-cluster.sh - Full cluster reboot sequence
fix-discovery.sh - Cycle network for node discovery
start-exo.sh - Launch Exo on all nodes

Power Consumption

State	Per Node	4-Node Cluster
Idle	~30W	~120W
Light Load	~80W	~320W
Full Inference	~130-150W	~520-600W

Resources

Exo Labs: https://github.com/exo-explore/exo
MLX Framework: https://github.com/ml-explore/mlx
NetworkChuck Discord: https://discord.gg/networkchuck
NetworkChuck Academy: https://academy.networkchuck.com

FAQ

Q: Do I need 4 identical Mac Studios? A: No, you can mix configurations. Even 2 nodes with different RAM amounts will work.

Q: Can I use M3 or M2 Macs? A: Yes, but you need macOS with RDMA support (Sequoia 15.3+) and Thunderbolt 4/5.

Q: Is RDMA available on all Macs? A: Currently requires Apple Silicon with Thunderbolt and the appropriate macOS version.

Q: Can I add more than 4 nodes? A: Exo Labs supports larger clusters, but Thunderbolt mesh topology becomes complex beyond 4 nodes.

Q: What about fine-tuning? A: This setup is optimized for inference. Fine-tuning workflows are still evolving for MLX.

Credits

Apple - For enabling RDMA over Thunderbolt
Exo Labs - For the clustering software
MLX Team - For the machine learning framework that makes this possible

License

MIT License - Feel free to use, modify, and share.

Built by NetworkChuck
YouTube • Twitter • Discord

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
scripts		scripts
.gitignore		.gitignore
BENCHMARKS.md		BENCHMARKS.md
LICENSE		LICENSE
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

Mac Studio AI Cluster Guide

The Setup

Cost Comparison

Requirements

Hardware

Software

Network Topology

Thunderbolt Mesh Connection

Ethernet Connection

Setup Instructions

Step 1: Enable RDMA

Step 2: Install Exo Labs

Step 3: Connect Hardware

Step 4: Launch Cluster

Step 5: Verify Cluster

Understanding Parallelism

Pipeline Parallelism (Old Way)

Tensor Parallelism (New Way with RDMA)

Why RDMA Matters

Benchmarks

Model Performance (4-Node Cluster with RDMA)

Single Node vs Cluster

Pipeline vs Tensor (Same Cluster)

Running Multiple Models

Integration with Apps

Open WebUI

Xcode

Claude Code / Cursor / Continue

Troubleshooting

Nodes Not Discovering Each Other

Cluster Crashes Under Load

Model Loading Fails

Thunderbolt Connection Issues

Scripts

Power Consumption

Resources

Related Videos

FAQ

Credits

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages