BUG REPORT: Inverse Boolean Logic in "Limit Model Offloading to GPU Dedicated Memory" Toggle affecting MoE Models #232

@jayvysson

Description

System Environment:

  • LM Studio Version: 0.4.17
  • OS: Windows 10
  • Hardware: AMD Ryzen 7 5700X3D | NVIDIA RTX 5060 Ti (16GB VRAM) | 32GB RAM
  • Model Tested: Qwen 3.6 35B A3B (GGUF) with a large 64K context

Description of the Bug & Visual Evidence:

The "Limit Model Offloading to GPU Dedicated Memory" toggle appears to use inverted boolean logic when running Qwen 3.6 35B A3B. The switch behaves as the opposite of what its label says:

  • When ON (toggle enabled): VRAM usage caps at ~13.5 GB and processing is offloaded to the CPU (CPU usage spikes). Performance drops to ~23 tok/s.
    [Screenshot: toggle ON]
  • When OFF (toggle disabled): The backend allocates the maximum VRAM (~15.2 GB) and shared memory stays at 0 GB. Performance rises to 72+ tok/s with low CPU usage.
    [Screenshot: toggle OFF]
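
To put the two measurements side by side, here is a minimal Python sketch that uses only the figures reported above (13.5 GB / 23 tok/s with the toggle ON, 15.2 GB / 72 tok/s with it OFF, on a 16 GB card). The `Run` class and the helper names are hypothetical, introduced only for illustration, and 72 tok/s is taken as a lower bound. With these numbers the OFF state is roughly 3.1x faster.

```python
from dataclasses import dataclass

# Total dedicated VRAM of the RTX 5060 Ti, as stated in the report.
TOTAL_VRAM_GB = 16.0


@dataclass(frozen=True)
class Run:
    """One observation from the report (hypothetical container type)."""
    toggle_on: bool
    vram_gb: float      # dedicated VRAM in use
    tok_per_s: float    # generation throughput


# Figures copied from the two bullets above.
toggle_on = Run(toggle_on=True, vram_gb=13.5, tok_per_s=23.0)
toggle_off = Run(toggle_on=False, vram_gb=15.2, tok_per_s=72.0)


def speedup(fast: Run, slow: Run) -> float:
    """Throughput ratio between two runs."""
    return fast.tok_per_s / slow.tok_per_s


def unused_vram_gb(run: Run, total_gb: float = TOTAL_VRAM_GB) -> float:
    """Dedicated VRAM left unallocated during a run."""
    return total_gb - run.vram_gb


for run in (toggle_on, toggle_off):
    state = "ON" if run.toggle_on else "OFF"
    print(f"toggle {state}: {run.vram_gb} GB used, "
          f"{unused_vram_gb(run):.1f} GB unused, {run.tok_per_s} tok/s")

print(f"OFF vs ON speedup: {speedup(toggle_off, toggle_on):.2f}x")
```

The comparison shows about 2.5 GB of dedicated VRAM left unused with the toggle ON, against about 0.8 GB with it OFF.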
