Wire mlx-lm Vulkan fused kernels by goniz · Pull Request #52 · goniz/mlx-vulkan

goniz · 2026-06-01T18:47:17Z

Summary

Closes #51.

This PR wires the Vulkan fused-kernel work into the parent mlx-vulkan workspace:

- Adds the mlx-lm submodule at goniz/mlx-lm.
- Updates the mlx submodule to include Vulkan fused fast-kernel/API support from mlx#50 ("Add Vulkan fused fast kernel paths").
- Updates the mlx-lm submodule to include the Vulkan gated-delta implementation from mlx-lm#1 ("Add Vulkan gated delta kernel").
- Adjusts the workflow/dev script paths for the new mlx-lm submodule layout.

Related PRs: mlx#50, mlx-lm#1

Validation

Run from this parent mlx-vulkan workspace:

./dev.sh test-cpp
- 249 test cases passed
- 3314 assertions passed
./dev.sh test-py
- 699 passed
- 13 skipped
./dev.sh generate
- coherent output
./dev.sh generate --model mlx-community/Qwen3.6-35B-A3B-8bit
- coherent output
./dev.sh run python scripts/model_generation_report.py --json-output /tmp/model_generation_report.json
- all 10 model entries generated coherent output

Benchmarks

`./dev.sh benchmark bf16`

Running benchmark for model: mlx-community/Qwen3-0.6B-bf16
Running warmup..
Timing with prompt_tokens=4096, generation_tokens=128, batch_size=1.
Trial 1:  prompt_tps=2433.556, generation_tps=65.807, peak_memory=2.614, total_time=3.732
Trial 2:  prompt_tps=2419.242, generation_tps=65.866, peak_memory=2.614, total_time=3.746
Trial 3:  prompt_tps=2432.330, generation_tps=65.850, peak_memory=2.614, total_time=3.735
Trial 4:  prompt_tps=2425.489, generation_tps=65.864, peak_memory=2.615, total_time=3.740
Trial 5:  prompt_tps=2386.391, generation_tps=65.856, peak_memory=2.615, total_time=3.768
Averages: prompt_tps=2419.402, generation_tps=65.849, peak_memory=2.614

`./dev.sh benchmark 8bit`

Running benchmark for model: mlx-community/Qwen3-0.6B-8bit
Running warmup..
Timing with prompt_tokens=4096, generation_tokens=128, batch_size=1.
Trial 1:  prompt_tps=1349.699, generation_tps=88.834, peak_memory=2.055, total_time=4.586
Trial 2:  prompt_tps=1349.641, generation_tps=88.652, peak_memory=2.055, total_time=4.585
Trial 3:  prompt_tps=1345.601, generation_tps=88.767, peak_memory=2.056, total_time=4.589
Trial 4:  prompt_tps=1353.543, generation_tps=88.747, peak_memory=2.056, total_time=4.573
Trial 5:  prompt_tps=1343.560, generation_tps=88.527, peak_memory=2.056, total_time=4.600
Averages: prompt_tps=1348.409, generation_tps=88.705, peak_memory=2.056
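
As a cross-check on the `Averages:` lines, here is a minimal sketch that parses benchmark output of the shape shown above and recomputes the per-field means from the `Trial` lines. The helper names (`parse_benchmark_log`, `recompute_averages`) and the regex are illustrative assumptions, not part of mlx-lm or `dev.sh`; the sample text is the bf16 run copied from this PR.

```python
import re
from statistics import mean

# Matches key=value pairs such as "prompt_tps=2433.556" (hypothetical helper regex).
FIELD_RE = re.compile(r"(\w+)=([0-9]+(?:\.[0-9]+)?)")

BF16_LOG = """\
Trial 1:  prompt_tps=2433.556, generation_tps=65.807, peak_memory=2.614, total_time=3.732
Trial 2:  prompt_tps=2419.242, generation_tps=65.866, peak_memory=2.614, total_time=3.746
Trial 3:  prompt_tps=2432.330, generation_tps=65.850, peak_memory=2.614, total_time=3.735
Trial 4:  prompt_tps=2425.489, generation_tps=65.864, peak_memory=2.615, total_time=3.740
Trial 5:  prompt_tps=2386.391, generation_tps=65.856, peak_memory=2.615, total_time=3.768
Averages: prompt_tps=2419.402, generation_tps=65.849, peak_memory=2.614
"""


def parse_benchmark_log(text):
    """Return (trials, averages): a list of per-trial dicts and the reported averages."""
    trials, averages = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Trial"):
            trials.append({k: float(v) for k, v in FIELD_RE.findall(line)})
        elif line.startswith("Averages:"):
            averages = {k: float(v) for k, v in FIELD_RE.findall(line)}
    return trials, averages


def recompute_averages(trials, keys):
    """Mean of each requested field across trials, rounded to three decimals."""
    return {k: round(mean(t[k] for t in trials), 3) for k in keys}


if __name__ == "__main__":
    trials, reported = parse_benchmark_log(BF16_LOG)
    recomputed = recompute_averages(trials, reported.keys())
    for key, value in reported.items():
        print(f"{key}: reported={value:.3f} recomputed={recomputed[key]:.3f}")
```

For both runs above, the recomputed means agree with the reported `Averages:` line to three decimal places. The same parser can be pointed at the 8bit output by swapping the sample text.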

Additional Qwen3.6 35B prefill check:

MLX_MPI_LIBNAME=/dev/null OMPI_MCA_accelerator=^rocm ./dev.sh run mlx_lm.benchmark --model mlx-community/Qwen3.6-35B-A3B-8bit -p 4096 -g 1 -n 3
Averages: prompt_tps=119.139, generation_tps=24232.064, peak_memory=40.349


goniz added 2 commits May 31, 2026 23:55

added mlx-lm submodule

c58e2b6

updated mlx

2925621

goniz marked this pull request as ready for review June 1, 2026 18:48

Merge branch 'main' of https://github.com/goniz/mlx-vulkan into feat/…

008b440


goniz merged commit c0c3da7 into main Jun 4, 2026
1 check passed

goniz merged 3 commits into main from feat/mlxlm
