I use the following command to run llama.cpp:
llama-server --batch-size 4096 --ubatch-size 4096 \
--model ~/Downloads/Models/Qwen3.6-35B-A3B-BF16-00001-of-00002.gguf \
--mmproj ~/Downloads/Models/Qwen3.6-35B-A3B-mmproj-BF16.gguf \
--alias qwen3.6:35b \
--chat-template-kwargs '{"preserve_thinking": true}' \
--kv-unified --cache-type-k q8_0 --cache-type-v q8_0 \
--flash-attn on --fit on \
--spec-default \
--temp 0.60 --top-p 0.95 --top-k 20 --min-p 0.00 --presence-penalty 0.00 --repeat-penalty 1.00 \
--n-gpu-layers all \
--offline --grammar-file structured_cot_code.gbnf --reasoning-format none
However, it does not seem to work in Codex:
