chore: bump llama.cpp to b9557 #17

Open: github-actions[bot] wants to merge 1 commit into main from automation/bump-llama-cpp
github-actions bot commented on May 19, 2026

llama.cpp update

Upstream changelog

Release notes for b9557

cuda: reset cuda context after reading memory size (#23935)

  • cuda: reset device in get_memory function if no backend is active

  • also count device and host buffers

  • exclude hip and musa from counting and device reset

  • use device mutex instead of atomic

  • undo backend_free function move

macOS/iOS:

Linux:

Android:

Windows:

openEuler:

  • DISABLED
  • openEuler x86 (310p)
  • openEuler x86 (910b, ACL Graph)
  • openEuler aarch64 (310p)
  • openEuler aarch64 (910b, ACL Graph)

UI:

Commit range

Commits from b9165 to b9557 (first 80)
  • ci : move most slim jobs to self-hosted runners (#23619) (28123a3)
  • perplexity : fix even more integer overflows (#23623) (6d57c26)
  • server: fix checkpoints creation (#22929) (e2ef8fe)
  • vendor : update cpp-httplib to 0.45.1 (#23639) (9627d0f)
  • ui: media attachments before text (#23467) (b964876)
  • ggml : Parallelize quant LUT init (#23595) (826539c)
  • ci : install host compiler on android-ndk build (#23630) (d55fb97)
  • llama : document that only one on-device state can be saved per sequence (#23520) (314e729)
  • ci : fix pre-tokenizer-hashes check (#23651) (062d311)
  • ci : update spacemit toolchain url and enhance curl command (#23642) (5fdf07e)
  • server: MTP layer kv-cache should respect draft type ctk (#23646) (6c4cbdc)
  • ggml: gguf_init_from_callback and gguf_init_from_buffer (#22341) (66efd13)
  • TP: fix ggml context size calculation (#22616) (ae251b5)
  • ggml-alloc: fix out-of-bounds read in ggml_dyn_tallocr_remove_block (ggml/1492) (fa97041)
  • ggml.h: correct ggml_silu_back arg docstring (a=dy, b=x) (ggml/1500) (b251f74)
  • ggml : bump version to 0.12.1 (ggml/1508) (ce5890b)
  • sync : ggml (22307b3)
  • ggml : bump version to 0.13.0 (ggml/1510) (45158f4)
  • sync : ggml (d161ea7)
  • convert : add compressed-tensors NVFP4 support (#21095) (a4d2d4a)
  • ui: fix stop/continue during an agentic loop (#23356) (5a4126a)
  • CUDA: add fast walsh-hadamard transform (#23615) (c1f1e28)
  • model: tag ffn_latent as MUL_MAT to fix buft probe (#23664) (328874d)
  • ci : reduce PR jobs by matching backend paths (#23675) (302e2c2)
  • snapdragon: bump toolchain docker to v0.7 to fix ui build issues (#23680) (4bead4e)
  • metal : add apple device id (#23566) (35c9b1f)
  • CUDA: missing PDL sync for FWHT, better fallback (#23690) (192d8ae)
  • [WebGPU] Check batch_compute_passes before sending passes when not doing GPU profiling (#23457) (54121f7)
  • ggml-webgpu: Add MMVQ path for Q4/Q8/Q2_K/Q4_K and clean up legacy MUL_MAT pipeline (#23594) (1506d39)
  • model : add support for talkie-1930-13b (#22596) (c9d9829)
  • tests: test-backend-ops -j to run tests in parallel (#23637) (7623de1)
  • SYCL: implement ggml_sycl_pool_vmm (#22862) (581d020)
  • models : Attach Mistral3 NVFP4 weight scales (#23629) (6fe90de)
  • convert : support Gemma4ForCausalLM architecture (#23682) (dbe9c0c)
  • ci : reduce (disable SYCL and CANN builds/releases) (#23705) (3dc7684)
  • ci : move sanitizer jobs to self-hosted runners (#23713) (ef41a69)
  • ci : move more CPU jobs to self-hosted runners (#23715) (678d43d)
  • hexagon: add support for CONCAT op (#23648) (ef66bfa)
  • ci : remove vulkan SDK dep from webgpu job (#23718) (3a3ed15)
  • vulkan: optimize conv2d and implement coopmat1 support (#22620) (7799d31)
  • ci : move macos jobs to the apple workflow + fix names (#23721) (5190c2e)
  • ci : add [no release] keyword + fix sanitizer builds (#23728) (35a74c8)
  • ci : move [no release] check to dedicated check_release job (#23734) (08bc21b)
  • ci : do not allocate ccache for 3rd-party hosted runners (#23730) (0d18aaa)
  • ggml-zendnn : fixed naming of matmul function (#20964) (b4c0549)
  • server : fix the log message when using SSL (#23393) (7085492)
  • convert: add MiniCPM5 tokenizer support (#23384) (9777256)
  • docs : fix duplicated "the" in granitevision and model-conversion docs (#23767) (1d971bb)
  • ci : add ccache to server builds + fix undefined sanitizer build (#23763) (0d227ec)
  • vulkan: avoid preferring transfer queue on AMD UMA devices (#22455) (4d8cc0c)
  • ci : remove wasm test (#23733) (b3a739c)
  • ci : fix windows ccaches (#23777) (9f0e4b1)
  • common : fix env names to all have LLAMA_ARG_ prefix (#23778) (6b4e4bd)
  • ci : bump cuda release to 13.3 (#23749) (2d0656f)
  • CUDA: restrict PDL to CTK >= 12.3 due to MSVC issues (#23742) (fda8528)
  • pyproject : add conversion folder and update dependencies (#23746) (87b0a60)
  • vendor : update cpp-httplib to 0.46.0 (#23650) (617255d)
  • ci : move ARM jobs to self-hosted + disable kleidiai mac release (#23780) (ba4dd0b)
  • vulkan: add REPEAT op support for f16 to f16. (#23298) (837bb6b)
  • vulkan: use GL_NV_cooperative_matrix_decode_vector for faster matmul (#23541) (b36eefc)
  • vulkan: Switch MUL_MAT_VEC to 4 K per iteration for F16/32 (#22887) (c6e4088)
  • ggml-webgpu: Fix how to dispatch WG to some ops (#23750) (c40006a)
  • hexagon: add support for Q4_1 in MUL_MAT and MUL_MAT_ID (#23647) (aa50b2c)
  • ggml-webgpu: remove legacy constants (#23672) (f12cc6d)
  • opencl: OP_GATED_DELTA_NET (#23312) (8ad8aef)
  • Hexagon: OP_GATED_DELTA_NET K>1 support (#23531) (939a7dd)
  • ci : refactor (#23789) (491c4d7)
  • ggml: fixed Arm SVE usage bug in vec.h, vec.cpp (#22841) (e31cdaa)
  • convert : add FP8 to Q8 conversion (#23250) (c522908)
  • perplexity : fix format specifier in LOG_ERR (#23788) (48e7eae)
  • cuda : fix KQ mask offset integer overflow in fattn MMA kernel (#23610) (09e7b76)
  • docker : add ZenDNN Dockerfile (#23716) (e8d2567)
  • server, ui : Add support for HTTP ETags in llama-server (#23701) (d205df6)
  • vulkan: Fix memory logger unsafe iterator access (#23667) (91eb8f4)
  • vulkan: fix wrong index variable in inner loop (#23665) (7c48fb8)
  • chat : add Granite 4.1 chat template (#23518) (bb771cb)
  • vulkan: fast path for walsh-hadamard transform (#23687) (48e7078)
  • hexagon: minor refresh for HMX FA and MM (#23796) (a919001)
  • server: minor tweaks to use more cpp features (#23785) (0b24686)
  • CUDA: route batch>=4 quantized matmul to MMQ on AMD MFMA hardware (#23227) (bc81d47)

Web bridge review focus

Please pay extra attention to upstream changes touching:

  • WebGPU, WASM, Emscripten, pthreads, or memory64 build behavior
  • ggml backend APIs used by the bridge
  • model loading, tokenizer, chat template, context/state persistence, or cache semantics
  • CMake/build flags that can affect the generated JS/WASM artifacts
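
One way to triage the commit range against the focus areas above is a simple keyword filter over commit titles. This is a minimal sketch, not part of the automation: the function name `flag_for_review` and the keyword list are hypothetical, chosen only to mirror the bullets above, and plain substring matching will miss changes whose titles do not mention the affected area.

```python
# Hypothetical keyword list derived from the review-focus bullets above.
FOCUS_KEYWORDS = (
    "webgpu", "wasm", "emscripten", "pthread", "memory64",
    "tokenizer", "chat template", "cmake",
)

def flag_for_review(commit_titles: list[str]) -> list[str]:
    """Return the commit titles that mention any focus keyword (case-insensitive)."""
    return [
        title
        for title in commit_titles
        if any(keyword in title.lower() for keyword in FOCUS_KEYWORDS)
    ]

if __name__ == "__main__":
    sample = [
        "ggml-webgpu: remove legacy constants (#23672)",
        "ci : remove wasm test (#23733)",
        "docker : add ZenDNN Dockerfile (#23716)",
    ]
    for title in flag_for_review(sample):
        print(title)
```

A match only marks a commit as worth reading; it says nothing about whether the change actually affects the bridge.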

Validation

  • Emscripten build passed
  • Browser WebGPU/state-persistence smoke passed
  • Generated bridge artifacts include wasm32 and memory64 outputs
  • No stale hard-coded llama.cpp tag remains in CI/publish defaults
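
The stale-tag check in the last item could look roughly like the sketch below. It assumes llama.cpp tags have the form `b` followed by digits (as in b9165 and b9557 above) and that the caller supplies the text of each CI or publish config file; the function name `find_stale_tags` is hypothetical and this is not the workflow's actual implementation.

```python
import re

# Assumption: release tags look like "b9557" (the letter b followed by 4+ digits).
TAG_PATTERN = re.compile(r"\bb(\d{4,})\b")

def find_stale_tags(text: str, expected_tag: str) -> list[str]:
    """Return every tag-shaped token in `text` that differs from `expected_tag`."""
    found = {f"b{digits}" for digits in TAG_PATTERN.findall(text)}
    return sorted(found - {expected_tag})

if __name__ == "__main__":
    config = "LLAMA_CPP_TAG: b9165\nfallback_ref: b9557\n"
    print(find_stale_tags(config, "b9557"))
```

The pattern can also match all-digit commit hash prefixes that happen to start with `b`, so any hit should be confirmed by hand.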

Automation behavior

This PR is managed from the stable branch automation/bump-llama-cpp. If another llama.cpp release appears before merge, the scheduled workflow updates this same PR instead of opening a duplicate. The workflow skips if a non-automation PR already changes llama_cpp.version.
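
The behavior described above amounts to a three-way decision each time the scheduled workflow sees a new release. The sketch below illustrates that decision under stated assumptions: the `OpenPR` type and the `decide` function are hypothetical names, and the real workflow's inputs and ordering of checks may differ.

```python
from dataclasses import dataclass

# Stable branch named in the PR body.
AUTOMATION_BRANCH = "automation/bump-llama-cpp"

@dataclass
class OpenPR:
    head_branch: str
    touches_version: bool  # True if the PR changes llama_cpp.version

def decide(open_prs: list[OpenPR]) -> str:
    """Return "skip", "update", or "create" for a newly seen llama.cpp release."""
    # A non-automation PR that already changes the version takes priority.
    if any(pr.touches_version and pr.head_branch != AUTOMATION_BRANCH for pr in open_prs):
        return "skip"
    # Reuse the existing automation PR instead of opening a duplicate.
    if any(pr.head_branch == AUTOMATION_BRANCH for pr in open_prs):
        return "update"
    return "create"

if __name__ == "__main__":
    print(decide([OpenPR(AUTOMATION_BRANCH, True)]))
```

Checking for a competing PR first means a manual version bump always takes precedence, even when the automation PR is already open.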

github-actions bot force-pushed the automation/bump-llama-cpp branch from c374d7d to b0e1e3f on May 19, 2026 13:32
github-actions bot force-pushed the automation/bump-llama-cpp branch from b0e1e3f to dcacf23 on May 20, 2026 12:39
github-actions bot changed the title from "chore: bump llama.cpp to b9222" to "chore: bump llama.cpp to b9247" on May 20, 2026
github-actions bot changed the title from "chore: bump llama.cpp to b9247" to "chore: bump llama.cpp to b9264" on May 21, 2026
github-actions bot force-pushed the automation/bump-llama-cpp branch from dcacf23 to d82afc2 on May 21, 2026 13:32
github-actions bot changed the title from "chore: bump llama.cpp to b9264" to "chore: bump llama.cpp to b9279" on May 22, 2026
github-actions bot force-pushed the automation/bump-llama-cpp branch from d82afc2 to 74a6dbd on May 22, 2026 12:35
github-actions bot changed the title from "chore: bump llama.cpp to b9279" to "chore: bump llama.cpp to b9310" on May 25, 2026
github-actions bot force-pushed the automation/bump-llama-cpp branch from 74a6dbd to 56845d4 on May 25, 2026 13:43
github-actions bot changed the title from "chore: bump llama.cpp to b9310" to "chore: bump llama.cpp to b9360" on May 27, 2026
github-actions bot force-pushed the automation/bump-llama-cpp branch from 56845d4 to a8ccf0f on May 27, 2026 13:49
github-actions bot changed the title from "chore: bump llama.cpp to b9360" to "chore: bump llama.cpp to b9374" on May 28, 2026
github-actions bot force-pushed the automation/bump-llama-cpp branch from a8ccf0f to c6e61ba on May 28, 2026 14:06
github-actions bot changed the title from "chore: bump llama.cpp to b9374" to "chore: bump llama.cpp to b9406" on May 29, 2026
github-actions bot force-pushed the automation/bump-llama-cpp branch from c6e61ba to d5f6ea3 on May 29, 2026 13:33
github-actions bot changed the title from "chore: bump llama.cpp to b9406" to "chore: bump llama.cpp to b9453" on June 1, 2026
github-actions bot force-pushed the automation/bump-llama-cpp branch from d5f6ea3 to 7dd05aa on June 1, 2026 16:20
github-actions bot changed the title from "chore: bump llama.cpp to b9453" to "chore: bump llama.cpp to b9479" on June 2, 2026
github-actions bot force-pushed the automation/bump-llama-cpp branch 2 times, most recently from df4139e to 414160e on June 3, 2026 15:05
github-actions bot changed the title from "chore: bump llama.cpp to b9479" to "chore: bump llama.cpp to b9491" on June 3, 2026
github-actions bot force-pushed the automation/bump-llama-cpp branch from 414160e to 4f6bee8 on June 4, 2026 13:32
github-actions bot changed the title from "chore: bump llama.cpp to b9491" to "chore: bump llama.cpp to b9505" on June 4, 2026
github-actions bot force-pushed the automation/bump-llama-cpp branch from 4f6bee8 to 91ddc1e on June 5, 2026 13:25
github-actions bot changed the title from "chore: bump llama.cpp to b9505" to "chore: bump llama.cpp to b9528" on June 5, 2026
github-actions bot force-pushed the automation/bump-llama-cpp branch from 91ddc1e to cdb5b75 on June 8, 2026 14:31
github-actions bot changed the title from "chore: bump llama.cpp to b9528" to "chore: bump llama.cpp to b9557" on June 8, 2026