
cli: fix missing enumeration in switch #2

Merged: Acly merged 20 commits into main from dev on Aug 13, 2025.


@Acly (Owner) commented Aug 1, 2025

No description provided.

Acly added 18 commits August 1, 2025 10:33
…o gguf

* introduce `model_file` to read key-value data from gguf files
* conditionally set `cwhn` flag based on gguf tensor data layout
migan: can now run in cwhn and whcn mode (but cwhn remains faster in all cases)
* convert weights on CPU after model load
* whcn is slower on both CPU and Vulkan
* whcn is more correct on Vulkan; there is likely a bug in the cwhn version of conv2d/deform
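The load-time weight conversion mentioned above amounts to an index permutation. This is a minimal sketch, not the project's code. It assumes the layout names list dimensions from fastest- to slowest-varying (as ggml's `ne[]` does), so a WHCN tensor has `ne = [W, H, C, N]` and a CWHN tensor has `ne = [C, W, H, N]`. The helpers `whcn_to_cwhn` and `prepare_weight` and the backend strings are hypothetical.

```python
# Hypothetical sketch: repack a weight tensor from WHCN to CWHN order.
# Assumption: the first letter of the layout name is the fastest-varying dimension,
# so in WHCN element (w, h, c, n) lives at   w + W*(h + H*(c + C*n))
# and in CWHN the same element lives at      c + C*(w + W*(h + H*n))


def whcn_to_cwhn(data, W, H, C, N):
    """Return a new flat list holding `data` (WHCN order) in CWHN order."""
    assert len(data) == W * H * C * N
    out = [0.0] * len(data)
    for n in range(N):
        for c in range(C):
            for h in range(H):
                for w in range(W):
                    src = w + W * (h + H * (c + C * n))
                    dst = c + C * (w + W * (h + H * n))
                    out[dst] = data[src]
    return out


def prepare_weight(data, shape_whcn, backend):
    """Hypothetical load-time hook: weights are stored as WHCN; only the
    CPU backend gets a CWHN copy. Returns (data, shape) in the new layout."""
    W, H, C, N = shape_whcn
    if backend == "cpu":
        return whcn_to_cwhn(data, W, H, C, N), (C, W, H, N)
    return list(data), shape_whcn


if __name__ == "__main__":
    w = [float(i) for i in range(2 * 1 * 3 * 1)]  # W=2, H=1, C=3, N=1
    print(whcn_to_cwhn(w, 2, 1, 3, 1))  # [0.0, 2.0, 4.0, 1.0, 3.0, 5.0]
```

Keeping the stored weights in WHCN and repacking only for the CPU backend mirrors the split described in this PR (WHCN for model weights and Vulkan, CWHN on CPU); a real implementation would operate on ggml tensors rather than Python lists.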
…meter

* even though almost all tests use cwhn right now, whcn is the default in both ggml and PyTorch.
* it's only available for WHCN layout for now, but faster than anything else
* WHCN for model weights
* WHCN for Vulkan
* CWHN for CPU (converted at model load)
* the CWHN version of birefnet is probably still somewhat broken, but since WHCN doesn't have the issue and is much faster, there's little incentive to fix it at the moment
* times (old -> new / new with coopmat2):
  * birefnet: 268ms -> 315ms / 243ms
  * birefnet-lite: 109ms -> 119ms / 87ms
* deform conv2d is a bit slower without coopmat2 support, but it also requires much less VRAM, so it's still worth it
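To put the timings above in relative terms, the snippet below computes the latency change against the old implementation. It only restates the figures from the commit message and assumes all three numbers for each model were measured on the same hardware and settings.

```python
# Relative latency change, using the figures from the commit message.
timings_ms = {
    "birefnet": (268, 315, 243),       # old, new, new with coopmat2
    "birefnet-lite": (109, 119, 87),
}


def pct_change(old, new):
    """Percent change in latency; negative means faster."""
    return (new - old) / old * 100.0


for model, (old, new, new_cm2) in timings_ms.items():
    print(f"{model}: new {pct_change(old, new):+.1f}%, "
          f"new+coopmat2 {pct_change(old, new_cm2):+.1f}%")
# birefnet: new +17.5%, new+coopmat2 -9.3%
# birefnet-lite: new +9.2%, new+coopmat2 -20.2%
```

So without coopmat2 the new path costs roughly 9 to 18% more time, while with coopmat2 it is roughly 9 to 20% faster than before.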
@Acly merged commit ddeea58 into main on Aug 13, 2025
3 checks passed
dbrain added a commit to dbrain/hbd-vision.cpp that referenced this pull request on May 30, 2026
…s, OpenBLAS slower)

Documents an honest negative result for the zero-quality-loss CPU speedup
attempt: F16-weights-on-CPU segfaults at load (cwhn layout + CPU ops assume
F32), OpenBLAS measures ~2% slower than ggml's AVX2 MMQ on Zen3, and the 2x
encoder / CONT levers have no lossless redundancy to remove. CPU floor stays
~9.9s @ YAVG 180.974. No source change kept; ggml untouched (af69870c).