Wire mlx-lm Vulkan fused kernels#52

Merged: goniz merged 3 commits into main from feat/mlxlm on Jun 4, 2026
@goniz commented Jun 1, 2026

Summary

Closes #51.

This PR wires the Vulkan fused-kernel work into the parent mlx-vulkan workspace:

Related PRs:

Validation

Run from this parent mlx-vulkan workspace:

  • ./dev.sh test-cpp
    • 249 test cases passed
    • 3314 assertions passed
  • ./dev.sh test-py
    • 699 passed
    • 13 skipped
  • ./dev.sh generate
    • coherent output
  • ./dev.sh generate --model mlx-community/Qwen3.6-35B-A3B-8bit
    • coherent output
  • ./dev.sh run python scripts/model_generation_report.py --json-output /tmp/model_generation_report.json
    • all 10 model entries generated coherent output

Benchmarks

./dev.sh benchmark bf16

Running benchmark for model: mlx-community/Qwen3-0.6B-bf16
Running warmup..
Timing with prompt_tokens=4096, generation_tokens=128, batch_size=1.
Trial 1:  prompt_tps=2433.556, generation_tps=65.807, peak_memory=2.614, total_time=3.732
Trial 2:  prompt_tps=2419.242, generation_tps=65.866, peak_memory=2.614, total_time=3.746
Trial 3:  prompt_tps=2432.330, generation_tps=65.850, peak_memory=2.614, total_time=3.735
Trial 4:  prompt_tps=2425.489, generation_tps=65.864, peak_memory=2.615, total_time=3.740
Trial 5:  prompt_tps=2386.391, generation_tps=65.856, peak_memory=2.615, total_time=3.768
Averages: prompt_tps=2419.402, generation_tps=65.849, peak_memory=2.614
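
For anyone re-checking the numbers, here is a minimal sketch that parses the `Trial` lines above and recomputes the averages. It assumes the log format shown in this PR (comma-separated `key=value` pairs after `Trial N:`); `parse_trials` and `averages` are hypothetical helper names, not part of `dev.sh` or mlx-lm.

```python
import re
from statistics import mean

# Trial lines copied verbatim from the bf16 run above.
LOG = """\
Trial 1:  prompt_tps=2433.556, generation_tps=65.807, peak_memory=2.614, total_time=3.732
Trial 2:  prompt_tps=2419.242, generation_tps=65.866, peak_memory=2.614, total_time=3.746
Trial 3:  prompt_tps=2432.330, generation_tps=65.850, peak_memory=2.614, total_time=3.735
Trial 4:  prompt_tps=2425.489, generation_tps=65.864, peak_memory=2.615, total_time=3.740
Trial 5:  prompt_tps=2386.391, generation_tps=65.856, peak_memory=2.615, total_time=3.768
"""

# Matches one key=value pair, e.g. "prompt_tps=2433.556".
FIELD = re.compile(r"(\w+)=([0-9.]+)")


def parse_trials(log: str) -> list[dict[str, float]]:
    """Return one dict of metrics per 'Trial' line (hypothetical helper)."""
    trials = []
    for line in log.splitlines():
        if line.startswith("Trial"):
            trials.append({key: float(value) for key, value in FIELD.findall(line)})
    return trials


def averages(trials: list[dict[str, float]]) -> dict[str, float]:
    """Average every metric across trials, rounded to 3 decimals like the log."""
    return {key: round(mean(t[key] for t in trials), 3) for key in trials[0]}


trials = parse_trials(LOG)
avg = averages(trials)
print(avg)
```

The recomputed `prompt_tps`, `generation_tps`, and `peak_memory` values should match the `Averages:` line reported by the benchmark.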

./dev.sh benchmark 8bit

Running benchmark for model: mlx-community/Qwen3-0.6B-8bit
Running warmup..
Timing with prompt_tokens=4096, generation_tokens=128, batch_size=1.
Trial 1:  prompt_tps=1349.699, generation_tps=88.834, peak_memory=2.055, total_time=4.586
Trial 2:  prompt_tps=1349.641, generation_tps=88.652, peak_memory=2.055, total_time=4.585
Trial 3:  prompt_tps=1345.601, generation_tps=88.767, peak_memory=2.056, total_time=4.589
Trial 4:  prompt_tps=1353.543, generation_tps=88.747, peak_memory=2.056, total_time=4.573
Trial 5:  prompt_tps=1343.560, generation_tps=88.527, peak_memory=2.056, total_time=4.600
Averages: prompt_tps=1348.409, generation_tps=88.705, peak_memory=2.056
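
To put the two runs side by side, a small sketch that computes the 8bit-to-bf16 ratio for each averaged metric. The dictionaries are copied from the two `Averages:` lines above; `relative_to` is a hypothetical helper name used only for illustration.

```python
# Averages copied from the bf16 and 8bit benchmark runs above.
BF16 = {"prompt_tps": 2419.402, "generation_tps": 65.849, "peak_memory": 2.614}
Q8 = {"prompt_tps": 1348.409, "generation_tps": 88.705, "peak_memory": 2.056}


def relative_to(baseline: dict[str, float], other: dict[str, float]) -> dict[str, float]:
    """Return other/baseline for every metric in baseline (hypothetical helper)."""
    return {key: round(other[key] / baseline[key], 3) for key in baseline}


rel = relative_to(BF16, Q8)
for metric, ratio in rel.items():
    print(f"{metric}: 8bit is {ratio:.3f}x bf16")
```

On these numbers, the 8bit model prefills at a bit over half the bf16 rate, generates roughly a third faster, and uses less peak memory.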

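Additional Qwen3.6 35B prefill check (run with -g 1, so only prompt_tps is meaningful here; generation_tps is measured over a single token):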

MLX_MPI_LIBNAME=/dev/null OMPI_MCA_accelerator=^rocm ./dev.sh run mlx_lm.benchmark --model mlx-community/Qwen3.6-35B-A3B-8bit -p 4096 -g 1 -n 3
Averages: prompt_tps=119.139, generation_tps=24232.064, peak_memory=40.349

@goniz goniz marked this pull request as ready for review June 1, 2026 18:48
@goniz goniz merged commit c0c3da7 into main Jun 4, 2026
1 check passed
Development

Successfully merging this pull request may close these issues.

Track Vulkan fused fast kernels for Qwen3.6 performance
