[pull] main from huggingface:main#41

Open
pull[bot] wants to merge 416 commits into
EricLBuehler:main from
huggingface:main
Conversation

@pull pull Bot commented Nov 19, 2024


See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

coderabbitai Bot commented May 8, 2025


Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Join our Discord community for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

greenrazer and others added 26 commits June 26, 2025 15:11
* fixed clippy and new rust fmt

* pyo3 clippy changes

* updated workflow for PYO3

* tell maturin that version will be determined dynamically

* changed names of maturin workflow to avoid naming conflicts when uploading wheels

* switch to macos-13 bc ring crypto dependency not working on arm based macs

* back to failing macOS-latest since we should be compiling against arm mac, and this error isn't our fault
* qwen3 bugfix: compute rope freqs in f32

* qwen example: run model in bf16 on metal
* Support new arch of GLM4 models

* Clippy fix & update ReadMe

* Integrate old and new GLM4 into one example & fix eos and chat template bugs for old GLM4

* Remove either crate usage

* clippy

---------

Co-authored-by: keighbee <kb@huggingface.co>
* feat: added Elu operator

* feat: added implementation of onehot

* added test for one hot

* feat: add handling of negative indices values in OneHot operator

* lint

* lint

---------

Co-authored-by: misadowsk <michalsad.protondynamic@gmail.com>
Co-authored-by: keighbee <kb@huggingface.co>
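The negative-index handling mentioned above can be sketched in plain Rust. This is a minimal illustration that assumes the ONNX OneHot convention (a negative index `i` means `i + depth`, and anything still out of range yields a row of `off` values); the function name `one_hot` and its signature are hypothetical and are not the candle-onnx code.

```rust
/// Hypothetical helper: builds one row per index, each of length `depth`.
/// Negative indices count from the end, as in the ONNX OneHot convention.
fn one_hot(indices: &[i64], depth: usize, on: f32, off: f32) -> Vec<Vec<f32>> {
    let d = depth as i64;
    indices
        .iter()
        .map(|&idx| {
            let mut row = vec![off; depth];
            // Wrap a negative index once; -1 becomes depth - 1.
            let wrapped = if idx < 0 { idx + d } else { idx };
            // Indices that are still out of range leave the row all `off`.
            if (0..d).contains(&wrapped) {
                row[wrapped as usize] = on;
            }
            row
        })
        .collect()
}
```

Calling `one_hot(&[0, -1, 5], 3, 1.0, 0.0)` sets position 0 in the first row, position 2 in the second, and nothing in the third.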
…rator logic

- Re-exported the FileReader trait from the parquet crate so users can call methods
  like .metadata(), .num_row_groups(), etc., without needing to depend on parquet directly.
- Simplified the from_hub iterator logic by replacing filter_map<Option<Result<_>>> with
  a clean filter + map + collect<Result<_, _>>() chain.
- Added doc comments and example usage to clarify trait import and improve API ergonomics.
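The `filter` + `map` + `collect<Result<_, _>>()` shape described above can be sketched with the standard library alone. The names `open_shard` and `open_all` are hypothetical stand-ins (parsing a number plays the role of a fallible "open this file" step); this is not the `from_hub` implementation.

```rust
use std::num::ParseIntError;

/// Hypothetical fallible step: parse the numeric stem of a file name.
fn open_shard(name: &str) -> Result<u32, ParseIntError> {
    name.trim_end_matches(".parquet").parse::<u32>()
}

/// Keep only `.parquet` entries, run the fallible step on each, and let
/// `collect` short-circuit on the first error instead of hiding it.
fn open_all(names: &[&str]) -> Result<Vec<u32>, ParseIntError> {
    names
        .iter()
        .filter(|n| n.ends_with(".parquet"))
        .map(|n| open_shard(n))
        .collect::<Result<Vec<_>, _>>()
}
```

Compared with a `filter_map` over `Option<Result<_>>`, the selection step and the fallible step stay separate, and the caller receives either every item or the first error.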
* feat: added Elu operator

* feat: added selu implementation in candle-nn

* feat: added selu onnx operator implementation

* test: added test for selu onnx operator

* test: added more tests for selu

* deleted elu

* tests: added test based on onnx specification

* lint

---------

Co-authored-by: misadowsk <michalsad.protondynamic@gmail.com>
Co-authored-by: keighbee <kb@huggingface.co>
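For reference, the SELU activation added above is a scaled ELU. The sketch below assumes the default constants from the ONNX Selu definition (alpha about 1.67326, gamma about 1.05070), rounded to `f32`; the constant and function names are illustrative, and the candle-nn version operates on tensors rather than scalars.

```rust
/// Assumed ONNX default constants, rounded to f32 precision.
const ALPHA: f32 = 1.673_263_2;
const SCALE: f32 = 1.050_700_9;

/// selu(x) = SCALE * x                     for x > 0
///         = SCALE * ALPHA * (exp(x) - 1)  otherwise
fn selu(x: f32) -> f32 {
    if x > 0.0 {
        SCALE * x
    } else {
        // exp_m1 computes exp(x) - 1 accurately for x close to zero.
        SCALE * ALPHA * x.exp_m1()
    }
}
```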
* Add GGUF BF16 support (#17)

* Add GGUF bf16 type support

* Add non avx impl for vec_dot_bf16

* Fix from_u32

* Fix loading

* Fix dequant of bf16

* Update kernels for metal bf16 (#19)

* Update kernels for metal bf16

* Fix typo

* Check if have bfloat

* Sync ggml metal kernels (#33)

* Metal qmatmul mat-mat product (#39)

* Test passes

* All tests pass

* Now all the tests really pass

* Try out always using mm

* Mirror llama.cpp metric

* Mirror llama.cpp metric

* Update test

* Update test

* fixed merge error

---------

Co-authored-by: keighbee <kb@huggingface.co>
This change corrects an issue causing AVX2 intrinsics to be called on
CPUs that do not support them. These intrinsics were conditionally
compiled in behind an `avx` feature gate. This change correctly uses
them only if `avx2` is available. This change should have no impact
on modern CPUs, but it allows older CPUs to work properly using the
unoptimized code path.
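
A related way to keep AVX2 code off CPUs that lack it is runtime detection. The sketch below is illustrative only: it swaps the compile-time gate described in the commit for `is_x86_feature_detected!`, and the functions `sum`, `sum_avx2`, and `sum_scalar` are hypothetical, not candle code.

```rust
/// Portable fallback used when AVX2 is unavailable (or on non-x86 targets).
fn sum_scalar(xs: &[i32]) -> i32 {
    xs.iter().fold(0i32, |a, &x| a.wrapping_add(x))
}

/// AVX2 path: `_mm256_add_epi32` is an AVX2 instruction, so this function
/// must only be called after AVX2 support has been confirmed.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[i32]) -> i32 {
    use std::arch::x86_64::*;
    unsafe {
        let mut acc = _mm256_setzero_si256();
        let chunks = xs.chunks_exact(8);
        let tail = chunks.remainder();
        for c in chunks {
            // Unaligned load of eight i32 lanes.
            let v = _mm256_loadu_si256(c.as_ptr() as *const __m256i);
            acc = _mm256_add_epi32(acc, v);
        }
        let mut lanes = [0i32; 8];
        _mm256_storeu_si256(lanes.as_mut_ptr() as *mut __m256i, acc);
        sum_scalar(&lanes).wrapping_add(sum_scalar(tail))
    }
}

/// Dispatch: use the AVX2 path only when the running CPU reports AVX2.
pub fn sum(xs: &[i32]) -> i32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just verified at runtime.
            return unsafe { sum_avx2(xs) };
        }
    }
    sum_scalar(xs)
}
```

Checking for `avx` alone would not be enough here, because AVX-only CPUs do not implement the 256-bit integer instructions.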
* feat: implement some configs in voxtral

* fix: fixed imports, implement more func

* feat: implemented full version, need fixes

* fix: fixed some compile errors

* feat: add initial examples

* fix: fixed voxtral.rs

* fix: fixed compile errors in examples

* fix: fixed compile errors

* fix: update model integration

* First working example

* Remove unused melfilters code

* Remove unused code

* Reuse whisper's pcm_decode

* Simplify generation function

* Remove unnecessary post-process fun

* Reuse snac's resample

* Apply clippy suggestions

* Remove unused filters

* Improve example

* Update tekken-rs

* Clippy fixes

---------

Co-authored-by: Max <naturale@hufs.ac.kr>
* fp8 support

* use float8 crate with cudarc 16.x, fix errors

* fix Tensor::ones for fp8

* fp8: fix failing tests

* more fp8

* add fp8 where bf16 is in tests

* skip fp8 testing on metal

* fixed onnx eval match statements that didn't have full coverage

* Unused import backend::BackendDevice

* kernels: fix cuda_arch guards for fp8 ops

---------

Co-authored-by: keighbee <kb@huggingface.co>
)

* Apply timestamp rules in whisper decoder and add support for maximum initial timestamp index

* Optimize mask generation in decoder by pre-allocating a reusable buffer

* Refactor timestamp probability calculations in decoder to use log-softmax for numerical stability
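
A rough sketch of why log-softmax helps with the timestamp rule: the comparison "total timestamp probability versus the best text-token probability" can be done entirely in log space without overflow. The function names and the `timestamp_begin` parameter are assumptions for illustration, following the rule used in the reference Whisper decoder; this is not the candle example code.

```rust
/// Numerically stable log-softmax over a non-empty slice of logits.
fn log_softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // Subtracting the max keeps every exponent <= 0, so exp cannot overflow.
    let log_sum = logits.iter().map(|&x| (x - max).exp()).sum::<f32>().ln();
    logits.iter().map(|&x| x - max - log_sum).collect()
}

/// Log of the summed probability mass of a slice of log-probabilities.
fn log_sum_exp(log_probs: &[f32]) -> f32 {
    let max = log_probs.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    if max == f32::NEG_INFINITY {
        return f32::NEG_INFINITY; // empty slice or zero total mass
    }
    max + log_probs.iter().map(|&x| (x - max).exp()).sum::<f32>().ln()
}

/// True when the timestamp tokens (ids >= `timestamp_begin`) together are
/// more likely than any single text token.
/// Panics if `timestamp_begin` is greater than `logits.len()`.
fn prefer_timestamp(logits: &[f32], timestamp_begin: usize) -> bool {
    let log_probs = log_softmax(logits);
    let (text, timestamps) = log_probs.split_at(timestamp_begin);
    let max_text = text.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    log_sum_exp(timestamps) > max_text
}
```

With logits as large as 1000.0, a naive `exp` followed by normalization would produce infinities, while the version above stays finite.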
- Change tensor b from [1,2] row vector to [2,1] column vector
- Fix assertion to match expected result after column replacement
- Resolves shape mismatch error that prevented example from running
* Add simple atomics to ulong via atomic_uintx2 struct

* Remove u32::max restriction from metal device.set_seed
agerasev and others added 30 commits April 22, 2026 14:08
* GradStore: Allow user to merge them together and to create an empty one

* Remove unnecessary ?

---------

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
* fix paths and improve formatting for book (#1584)

* Comment  in book TOML (#1584)

---------

Co-authored-by: Timothy Cronin <tcroniniv@gmail.com>
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
…3254)

* add flash attn to qwen3

* feature flash-attn

* flash generative true

* no mask in cpu flash

* add causal loop-bound optimization to cpu_flash_attention

* attempt at hybrid

* working tiling

* working but poor performance tiled flash

* factorized attention cpu flash

* logging

* causal cpu flash

* deprecation warning

* deprecation warning specific

* formatting

* interleaved will live in cpu_flash

* clippy resolve

* AttnMask: remove unnecessary lifetime

* fix: flash-attn KV cache masking during decode

* single batch CPU flash + integrate CPU varlen

* fail B>1 my cpu flash

* benchmark qwen3

* clean up

* varlen force

* dispatch logic smolLM3

* interleaved cpu

* interleaved smollm3

* memory leak

* back to exact exp

* simplify

* cargo fmt

* clippy

* errors; fmt

* cpu flash comment

* remove bench script

* merge main + resolve; default CPU flash (note that quantized varlen CPU flash not implemented) and remove --use-flash-attn CLI (note there is no path to standard attention for qwen3 and quantized qwen3)

* debug cfg lazylock only

* remove unused cli

* programmatic commentary address

* proposed comments for internal use

---------

Co-authored-by: michaelfeil <me@michaelfeil.eu>
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
* Concurrent metal dispatching with automatic memory barriers from read/write dependency detection

* fmt
* Impl new inter-encoder scheduling/synchronization scheme using dependency tracking, fences, and untracked buffers

* Add mlx gemv and gemv_t kernels

* Fix docs

* Simplify Commands::end_encoding

* Avoid double blit end_encoding in QMetalStorage::data

* Make candle-metal-kernels resource options match definition in candle-core
)

Updates the requirements on [fancy-regex](https://github.com/fancy-regex/fancy-regex) to permit the latest version.
- [Release notes](https://github.com/fancy-regex/fancy-regex/releases)
- [Changelog](https://github.com/fancy-regex/fancy-regex/blob/main/CHANGELOG.md)
- [Commits](fancy-regex/fancy-regex@0.17.0...0.18.0)

---
updated-dependencies:
- dependency-name: fancy-regex
  dependency-version: 0.18.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Updates the requirements on [parquet](https://github.com/apache/arrow-rs) to permit the latest version.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/main/CHANGELOG.md)
- [Commits](apache/arrow-rs@57.0.0...58.1.0)

---
updated-dependencies:
- dependency-name: parquet
  dependency-version: 58.1.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
Updates the requirements on [cpal](https://github.com/RustAudio/cpal) to permit the latest version.
- [Release notes](https://github.com/RustAudio/cpal/releases)
- [Changelog](https://github.com/RustAudio/cpal/blob/master/CHANGELOG.md)
- [Commits](RustAudio/cpal@v0.15.2...v0.17.3)

---
updated-dependencies:
- dependency-name: cpal
  dependency-version: 0.17.3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
Updates the requirements on [yew](https://github.com/yewstack/yew) to permit the latest version.
- [Release notes](https://github.com/yewstack/yew/releases)
- [Changelog](https://github.com/yewstack/yew/blob/master/CHANGELOG.md)
- [Commits](yewstack/yew@yew-v0.20.0...yew-v0.23.0)

---
updated-dependencies:
- dependency-name: yew
  dependency-version: 0.23.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
Updates the requirements on [zip](https://github.com/zip-rs/zip2) to permit the latest version.
- [Release notes](https://github.com/zip-rs/zip2/releases)
- [Changelog](https://github.com/zip-rs/zip2/blob/master/CHANGELOG.md)
- [Commits](zip-rs/zip2@v7.2.0...v8.6.0)

---
updated-dependencies:
- dependency-name: zip
  dependency-version: 8.6.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
Mirrors the pattern already used in qwen3, gemma3, phi, smollm3, and
several other transformer models. Lets callers free cached attention
state between independent conversations without recreating the model.
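
The pattern can be sketched without any tensor library. Everything below is a hypothetical stand-in: `Vec<f32>` plays the role of the cached key/value tensors, and the method name `clear_kv_cache` is assumed from the convention in the models listed above.

```rust
/// Stand-in for one attention layer; real models cache tensors, not Vecs.
#[derive(Default)]
struct AttentionLayer {
    kv_cache: Option<(Vec<f32>, Vec<f32>)>,
}

impl AttentionLayer {
    /// Append this step's keys and values to the cache, creating it if needed.
    fn append(&mut self, k: &[f32], v: &[f32]) {
        let (ks, vs) = self
            .kv_cache
            .get_or_insert_with(|| (Vec::new(), Vec::new()));
        ks.extend_from_slice(k);
        vs.extend_from_slice(v);
    }

    /// Drop the cached state; the layer weights would be untouched.
    fn clear_kv_cache(&mut self) {
        self.kv_cache = None;
    }

    fn cached_len(&self) -> usize {
        self.kv_cache.as_ref().map_or(0, |(k, _)| k.len())
    }
}

struct Model {
    layers: Vec<AttentionLayer>,
}

impl Model {
    fn new(num_layers: usize) -> Self {
        let layers = (0..num_layers).map(|_| AttentionLayer::default()).collect();
        Self { layers }
    }

    /// Forward the reset to every layer so a new conversation starts clean.
    fn clear_kv_cache(&mut self) {
        for layer in &mut self.layers {
            layer.clear_kv_cache();
        }
    }
}
```

A caller would invoke `model.clear_kv_cache()` between unrelated prompts and keep reusing the same loaded model.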
* Fix FAv3 packing and symbol overlap

* cuda: cache small htod copies for graph capture

* Add CUDA graph HtoD cache guard
* Initial fixes

* Cleanup

* Fixes
…g) (#3558)

CUDA 12.x removed the software fallback for __hmax_nan/__hmin_nan on
architectures below sm_80. The previous guard __CUDA_ARCH__ < 750 excluded
exactly sm_75 (GTX 1650/1660/T4). Changed boundary to < 800 since these
become native hardware instructions on sm_80 (Ampere) and above.

Co-authored-by: 4aMgLn4 <aadaa88@wedatalab.com>
Updates the requirements on [symphonia](https://github.com/pdeljanov/Symphonia) to permit the latest version.
- [Release notes](https://github.com/pdeljanov/Symphonia/releases)
- [Commits](pdeljanov/Symphonia@v0.5.3...v0.6.0)

---
updated-dependencies:
- dependency-name: symphonia
  dependency-version: 0.6.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
Updates the requirements on [hf-hub](https://github.com/huggingface/hf-hub) to permit the latest version.
- [Release notes](https://github.com/huggingface/hf-hub/releases)
- [Changelog](https://github.com/huggingface/hf-hub/blob/main/RELEASE.md)
- [Commits](huggingface/hf-hub@v0.4.1...v0.5.0)

---
updated-dependencies:
- dependency-name: hf-hub
  dependency-version: 0.5.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
Updates the requirements on [rubato](https://github.com/HEnquist/rubato) to permit the latest version.
- [Release notes](https://github.com/HEnquist/rubato/releases)
- [Commits](HEnquist/rubato@v1.0.0...v2.0.0)

---
updated-dependencies:
- dependency-name: rubato
  dependency-version: 2.0.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
* Add scalar support to metal binary kernels

* Add Layout::is_scalar and is_scalar_like helpers. Let kernel name decide dispatch in metal binary
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
* Add vec_dot benchmark

* Improve neon cpu impl for f32/f16/bf16

Uses inline asm where rust API is unstable.
When fp16/bf16 target features are missing we use load into f32 fallback.
Slight improvements to generic vec_dot algorithm (easier for compiler to unroll/vectorize)

* Simplify Cpu* simd trait abstractions

* Add debug print to track down windows CI bug. Remove later

* hail mary avx attempt without an avx machine

* Fix vec_dot_f16/bf16 CurrentCpuF16/BF16::STEP usage

* Temporarily break AVX CurrentCpuBF16 to investigate

* bug confirmed, adding transmute

* Use is_x86_feature_detected instead of cfg gate
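
As an illustration of a "generic vec_dot that is easier for the compiler to unroll/vectorize", one common shape uses fixed-size chunks with independent accumulators. This is a sketch under that assumption; the lane count and function signature are hypothetical and not taken from the candle source.

```rust
/// Dot product with eight independent accumulators. Fixed-size chunks give
/// the compiler a constant trip count for the inner loop, and separate
/// accumulators avoid one long dependency chain of additions.
fn vec_dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    const LANES: usize = 8;
    let mut acc = [0f32; LANES];
    let (ac, bc) = (a.chunks_exact(LANES), b.chunks_exact(LANES));
    // Leftover elements that do not fill a whole chunk.
    let (at, bt) = (ac.remainder(), bc.remainder());
    for (x, y) in ac.zip(bc) {
        for i in 0..LANES {
            acc[i] += x[i] * y[i];
        }
    }
    let tail: f32 = at.iter().zip(bt).map(|(x, y)| x * y).sum();
    acc.iter().sum::<f32>() + tail
}
```

Because the additions are reordered, results can differ from a strictly sequential sum in the last bits for general floating-point inputs.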
* Cap allocations in the GGUF loader

Fixes #3533. The loader passed caller-controlled length fields into
allocation calls without bounds checks. Adds size caps matching
ggml-org/llama.cpp#19856, a remaining-bytes check, a GGML_MAX_DIMS
cap on tensor dimensions, and a recursion depth cap on Value::Array.

* Avoid re-seeking on every GGUF length check

Capture the file size once in Content::read and pass it through
read_string/Value::read instead of seeking to the end and back on
every length-prefixed read, which roughly doubled load time.

---------

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
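
The two ideas in this change (bound every length field before allocating, and capture the file size once) can be sketched with `std::io` alone. The cap value and the names `MAX_STRING_LEN`, `file_size_of`, and `read_string` are assumptions for illustration; the actual limits and signatures in the GGUF loader may differ.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical cap on a single string; the real loader's limits may differ.
const MAX_STRING_LEN: u64 = 1 << 20;

/// Determine the total size once, then restore the read position.
fn file_size_of<R: Seek>(r: &mut R) -> io::Result<u64> {
    let pos = r.stream_position()?;
    let end = r.seek(SeekFrom::End(0))?;
    r.seek(SeekFrom::Start(pos))?;
    Ok(end)
}

/// Read a little-endian u64 length followed by that many UTF-8 bytes,
/// rejecting lengths above the cap or beyond the bytes left in the file.
fn read_string<R: Read + Seek>(r: &mut R, file_size: u64) -> io::Result<String> {
    let mut len_buf = [0u8; 8];
    r.read_exact(&mut len_buf)?;
    let len = u64::from_le_bytes(len_buf);
    // Cheap position query; no seek to the end on every read.
    let remaining = file_size.saturating_sub(r.stream_position()?);
    if len > MAX_STRING_LEN || len > remaining {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("string length {len} exceeds limit or remaining {remaining} bytes"),
        ));
    }
    // Safe to allocate: `len` is bounded by both checks above.
    let mut buf = vec![0u8; len as usize];
    r.read_exact(&mut buf)?;
    String::from_utf8(buf).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Without the check, a crafted file declaring a length near `u64::MAX` would trigger a huge allocation before any payload byte is read.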
)

Updates the requirements on [enterpolation](https://github.com/NicolasKlenert/enterpolation) to permit the latest version.
- [Release notes](https://github.com/NicolasKlenert/enterpolation/releases)
- [Changelog](https://github.com/NicolasKlenert/enterpolation/blob/main/RELEASES.md)
- [Commits](https://github.com/NicolasKlenert/enterpolation/commits/v0.3.0)

---
updated-dependencies:
- dependency-name: enterpolation
  dependency-version: 0.3.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
* chore(deps): update web-sys requirement from =0.3.70 to =0.3.99

Updates the requirements on [web-sys](https://github.com/wasm-bindgen/wasm-bindgen) to permit the latest version.
- [Release notes](https://github.com/wasm-bindgen/wasm-bindgen/releases)
- [Changelog](https://github.com/wasm-bindgen/wasm-bindgen/blob/main/CHANGELOG.md)
- [Commits](https://github.com/wasm-bindgen/wasm-bindgen/commits)

---
updated-dependencies:
- dependency-name: web-sys
  dependency-version: 0.3.99
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* Use set_stroke_style_str over deprecated set_stroke_style

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ivar Flakstad <69173633+ivarflakstad@users.noreply.github.com>
Updates the requirements on [gloo](https://github.com/rustwasm/gloo) to permit the latest version.
- [Release notes](https://github.com/rustwasm/gloo/releases)
- [Changelog](https://github.com/ranile/gloo/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rustwasm/gloo/commits)

---
updated-dependencies:
- dependency-name: gloo
  dependency-version: 0.11.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
* Add LFM2.5 (Liquid Foundation Model 2.5) support

Add model implementation and example for LiquidAI's LFM2.5 hybrid
architecture that combines attention and short convolution layers.

Supports LFM2.5-1.2B and LFM2.5-1.2B-Thinking variants.

* fix typo issue and crate::utils::build_causal_mask(seq_len, index_pos, device) as quantized_lfm2.rs

* Apply rustfmt

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Fix clippy warnings (manual_div_ceil, large_enum_variant)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Labels

⤵️ pull merge-conflict Resolve conflicts manually
