[pull] main from huggingface:main #41
Open
pull[bot] wants to merge 416 commits into
* fixed clippy and new rust fmt
* pyo3 clippy changes
* updated workflow for PyO3
* tell maturin that version will be determined dynamically
* changed names of maturin workflow to avoid naming conflicts when uploading wheels
* switch to macos-13 because the ring crypto dependency is not working on ARM-based Macs
* back to failing macOS-latest since we should be compiling against ARM Mac, and this error isn't our fault
* qwen3 bugfix: compute rope freqs in f32 * qwen example: run model in bf16 on metal
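As a rough illustration of why the rotary-embedding frequencies are computed in f32, the sketch below builds the inverse-frequency table in f32 and shows how much resolution a bf16-sized value loses. The names `rope_inv_freq` and `to_bf16_trunc` are hypothetical and are not taken from the qwen3 code; the truncation helper only approximates a bf16 cast (real conversions usually round to nearest).

```rust
/// Inverse RoPE frequencies, computed entirely in f32:
/// inv_freq[i] = 1 / theta^(2i / head_dim).
fn rope_inv_freq(head_dim: usize, theta: f32) -> Vec<f32> {
    (0..head_dim)
        .step_by(2)
        .map(|i| 1.0 / theta.powf(i as f32 / head_dim as f32))
        .collect()
}

/// Illustrative only: keep the top 16 bits of an f32, which is the bf16 bit
/// layout (sign, 8 exponent bits, 7 mantissa bits). Truncates instead of rounding.
fn to_bf16_trunc(x: f32) -> f32 {
    f32::from_bits(x.to_bits() & 0xFFFF_0000)
}
```

With only 7 mantissa bits, values near 1000 are spaced 4 apart in bf16, so position-times-frequency angles for long sequences would collapse onto each other if the table were built in that precision.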
* Support new arch of GLM4 models
* Clippy fix & update ReadMe
* Integrate old and new GLM4 into one example & fix eos and chat template bugs for old GLM4
* Remove either crate usage
* clippy
---------
Co-authored-by: keighbee <kb@huggingface.co>
* feat: added Elu operator
* feat: added implementation of onehot
* added test for one hot
* feat: add handling of negative indices in OneHot operator
* lint
* lint
---------
Co-authored-by: misadowsk <michalsad.protondynamic@gmail.com>
Co-authored-by: keighbee <kb@huggingface.co>
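A minimal sketch of the negative-index handling described above, written in plain Rust and not taken from candle-onnx. It follows the ONNX OneHot rule that indices in `[-depth, depth - 1]` are valid, with negative values counted from the end, and that anything outside that range yields a row of `off` values. The function name and signature are assumptions for illustration.

```rust
/// One-hot encode `indices` along a new last axis of size `depth`.
/// Negative indices wrap once (idx + depth); out-of-range indices give an all-`off` row.
fn one_hot(indices: &[i64], depth: usize, on: f32, off: f32) -> Vec<Vec<f32>> {
    let d = depth as i64;
    indices
        .iter()
        .map(|&idx| {
            let mut row = vec![off; depth];
            let wrapped = if idx < 0 { idx + d } else { idx };
            if (0..d).contains(&wrapped) {
                row[wrapped as usize] = on;
            }
            row
        })
        .collect()
}
```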
…rator logic
- Re-exported the FileReader trait from the parquet crate so users can call methods like .metadata(), .num_row_groups(), etc., without needing to depend on parquet directly.
- Simplified the from_hub iterator logic by replacing filter_map<Option<Result<_>>> with a clean filter + map + collect<Result<_, _>>() chain.
- Added doc comments and example usage to clarify trait import and improve API ergonomics.
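The filter + map + collect pattern mentioned above can be sketched with the standard library alone. This is not the actual `from_hub` code: `open` is a hypothetical stand-in for opening one file, and the file names are made up.

```rust
/// Hypothetical stand-in for opening one file; fails when the name has no stem.
fn open(name: &str) -> Result<usize, String> {
    if name == ".parquet" {
        Err(format!("invalid file name: {name:?}"))
    } else {
        Ok(name.len())
    }
}

/// Keep only parquet files, open each one, and stop at the first error.
fn open_parquet_files(names: &[&str]) -> Result<Vec<usize>, String> {
    names
        .iter()
        .copied()
        .filter(|n| n.ends_with(".parquet"))
        .map(open)
        .collect() // collecting into Result short-circuits on the first Err
}
```

Compared with `filter_map` over `Option<Result<_>>`, the selection step and the fallible step stay separate, and the first error propagates to the caller through `collect`.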
* feat: added Elu operator
* feat: added selu implementation in candle-nn
* feat: added selu onnx operator implementation
* test: added test for selu onnx operator
* test: added more tests for selu
* deleted elu
* tests: added test based on onnx specification
* lint
---------
Co-authored-by: misadowsk <michalsad.protondynamic@gmail.com>
Co-authored-by: keighbee <kb@huggingface.co>
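For reference, the SELU activation can be written as a scalar function. This is a sketch of the formula from the ONNX Selu operator (`y = gamma * x` for `x > 0`, otherwise `y = gamma * alpha * (exp(x) - 1)`), not the candle-nn implementation; the constants are the ONNX defaults rounded to f32 precision.

```rust
/// ONNX default Selu parameters, rounded to f32.
const ALPHA: f32 = 1.673_263_2;
const GAMMA: f32 = 1.050_701;

/// Scaled exponential linear unit applied to one value.
fn selu(x: f32, alpha: f32, gamma: f32) -> f32 {
    if x > 0.0 {
        gamma * x
    } else {
        gamma * alpha * (x.exp() - 1.0)
    }
}
```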
* Add GGUF BF16 support (#17)
* Add GGUF bf16 type support
* Add non avx impl for vec_dot_bf16
* Fix from_u32
* Fix loading
* Fix dequant of bf16
* Update kernels for metal bf16 (#19)
* Update kernels for metal bf16
* Fix typo
* Check if have bfloat
* Sync ggml metal kernels (#33)
* Metal qmatmul mat-mat product (#39)
* Test passes
* All tests pass
* Now all the tests really pass
* Try out always using mm
* Mirror llama.cpp metric
* Mirror llama.cpp metric
* Update test
* Update test
* fixed merge error
---------
Co-authored-by: keighbee <kb@huggingface.co>
This change corrects an issue causing AVX2 intrinsics to be called on CPUs that do not support them. These intrinsics were conditionally compiled in behind an `avx` feature gate. This change correctly uses them only if `avx2` is available. This change should have no impact on modern CPUs, but it allows older CPUs to work properly using the unoptimized code path.
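The distinction matters because 256-bit integer instructions such as `_mm256_add_epi32` belong to AVX2, not AVX. A minimal sketch of compile-time gating on the right feature follows; the function `add_i32x8` is hypothetical and is not candle code.

```rust
/// AVX2 path: only compiled when the build enables `avx2` (not merely `avx`).
#[cfg(all(target_arch = "x86_64", target_feature = "avx2"))]
fn add_i32x8(a: [i32; 8], b: [i32; 8]) -> [i32; 8] {
    use std::arch::x86_64::{__m256i, _mm256_add_epi32, _mm256_loadu_si256, _mm256_storeu_si256};
    let mut out = [0i32; 8];
    // SAFETY: avx2 is guaranteed by the cfg above; each array is exactly 32 bytes
    // and the unaligned load/store variants are used.
    unsafe {
        let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
        let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
        _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, _mm256_add_epi32(va, vb));
    }
    out
}

/// Portable fallback used on every other target, including AVX-only CPUs.
#[cfg(not(all(target_arch = "x86_64", target_feature = "avx2")))]
fn add_i32x8(a: [i32; 8], b: [i32; 8]) -> [i32; 8] {
    let mut out = [0i32; 8];
    for i in 0..8 {
        out[i] = a[i].wrapping_add(b[i]);
    }
    out
}
```

Both paths wrap on overflow, so callers see the same result whichever one is compiled in.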
* feat: implement some configs in voxtral
* fix: fixed imports, implement more func
* feat: implemented full version, need fixes
* fix: fixed some compile errors
* feat: add initial examples
* fix: fixed voxtral.rs
* fix: fixed compile errors in examples
* fix: fixed compile errors
* fix: update model integration
* First working example
* Remove unused melfilters code
* Remove unused code
* Reuse whisper's pcm_decode
* Simplify generation function
* Remove unnecessary post-process fun
* Reuse snac's resample
* Apply clippy suggestions
* Remove unused filters
* Improve example
* Update tekken-rs
* Clippy fixes
---------
Co-authored-by: Max <naturale@hufs.ac.kr>
* fp8 support
* use float8 crate with cudarc 16.x, fix errors
* fix Tensor::ones for fp8
* fp8: fix failing tests
* more fp8
* add fp8 where bf16 is in tests
* skip fp8 testing on metal
* fixed onnx eval match statements that didn't have full coverage
* Unused import backend::BackendDevice
* kernels: fix cuda_arch guards for fp8 ops
---------
Co-authored-by: keighbee <kb@huggingface.co>
- Change tensor b from [1,2] row vector to [2,1] column vector
- Fix assertion to match expected result after column replacement
- Resolves shape mismatch error that prevented example from running
* Add simple atomics to ulong via atomic_uintx2 struct * Remove u32::max restriction from metal device.set_seed
…3254)
* add flash attn to qwen3
* feature flash-attn
* flash generative true
* no mask in cpu flash
* add causal loop-bound optimization to cpu_flash_attention
* attempt at hybrid
* working tiling
* working but poor performance tiled flash
* factorized attention cpu flash
* logging
* causal cpu flash
* deprecation warning
* deprecation warning specific
* formatting
* interleaved will live in cpu_flash
* clippy resolve
* AttnMask: remove unnecessary lifetime
* fix: flash-attn KV cache masking during decode
* single batch CPU flash + integrate CPU varlen
* fail B>1 my cpu flash
* benchmark qwen3
* clean up
* varlen force
* dispatch logic smolLM3
* interleaved cpu
* interleaved smollm3
* memory leak
* back to exact exp
* simplify
* cargo fmt
* clippy
* errors; fmt
* cpu flash comment
* remove bench script
* merge main + resolve; default CPU flash (note that quantized varlen CPU flash not implemented) and remove --use-flash-attn CLI (note there is no path to standard attention for qwen3 and quantized qwen3)
* debug cfg lazylock only
* remove unused cli
* programmatic commentary address
* proposed comments for internal use
---------
Co-authored-by: michaelfeil <me@michaelfeil.eu>
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
* Concurrent metal dispatching with automatic memory barriers from read/write dependency detection * fmt
* Impl new inter-encoder scheduling/synchronization scheme using dependency tracking, fences, and untracked buffers
* Add mlx gemv and gemv_t kernels
* Fix docs
* Simplify Commands::end_encoding
* Avoid double blit end_encoding in QMetalStorage::data
* Make candle-metal-kernels resource options match definition in candle-core
) Updates the requirements on [fancy-regex](https://github.com/fancy-regex/fancy-regex) to permit the latest version. - [Release notes](https://github.com/fancy-regex/fancy-regex/releases) - [Changelog](https://github.com/fancy-regex/fancy-regex/blob/main/CHANGELOG.md) - [Commits](fancy-regex/fancy-regex@0.17.0...0.18.0) --- updated-dependencies: - dependency-name: fancy-regex dependency-version: 0.18.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Updates the requirements on [parquet](https://github.com/apache/arrow-rs) to permit the latest version. - [Release notes](https://github.com/apache/arrow-rs/releases) - [Changelog](https://github.com/apache/arrow-rs/blob/main/CHANGELOG.md) - [Commits](apache/arrow-rs@57.0.0...58.1.0) --- updated-dependencies: - dependency-name: parquet dependency-version: 58.1.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
Updates the requirements on [cpal](https://github.com/RustAudio/cpal) to permit the latest version. - [Release notes](https://github.com/RustAudio/cpal/releases) - [Changelog](https://github.com/RustAudio/cpal/blob/master/CHANGELOG.md) - [Commits](RustAudio/cpal@v0.15.2...v0.17.3) --- updated-dependencies: - dependency-name: cpal dependency-version: 0.17.3 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
Updates the requirements on [yew](https://github.com/yewstack/yew) to permit the latest version. - [Release notes](https://github.com/yewstack/yew/releases) - [Changelog](https://github.com/yewstack/yew/blob/master/CHANGELOG.md) - [Commits](yewstack/yew@yew-v0.20.0...yew-v0.23.0) --- updated-dependencies: - dependency-name: yew dependency-version: 0.23.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
Updates the requirements on [zip](https://github.com/zip-rs/zip2) to permit the latest version. - [Release notes](https://github.com/zip-rs/zip2/releases) - [Changelog](https://github.com/zip-rs/zip2/blob/master/CHANGELOG.md) - [Commits](zip-rs/zip2@v7.2.0...v8.6.0) --- updated-dependencies: - dependency-name: zip dependency-version: 8.6.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
Mirrors the pattern already used in qwen3, gemma3, phi, smollm3, and several other transformer models. Lets callers free cached attention state between independent conversations without recreating the model.
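The cache-reset pattern can be sketched with a toy model in plain Rust. The types below are hypothetical and hold plain `Vec<f32>` values instead of tensors; only the method name `clear_kv_cache` mirrors the convention referred to above.

```rust
/// Toy attention layer that accumulates one key/value pair per decoding step.
struct Attention {
    kv_cache: Option<(Vec<f32>, Vec<f32>)>,
}

impl Attention {
    fn new() -> Self {
        Self { kv_cache: None }
    }

    /// Appends this step's key and value; returns the number of cached positions.
    fn step(&mut self, k: f32, v: f32) -> usize {
        let (ks, vs) = self.kv_cache.get_or_insert_with(|| (Vec::new(), Vec::new()));
        ks.push(k);
        vs.push(v);
        ks.len()
    }

    /// Drops cached state; weights (none in this toy) would be left untouched.
    fn clear_kv_cache(&mut self) {
        self.kv_cache = None;
    }
}

/// Toy model: forwards the reset to every layer.
struct Model {
    layers: Vec<Attention>,
}

impl Model {
    fn new(num_layers: usize) -> Self {
        Self { layers: (0..num_layers).map(|_| Attention::new()).collect() }
    }

    /// Runs one step through all layers; returns the cache length of the last layer.
    fn step(&mut self, k: f32, v: f32) -> usize {
        let mut len = 0;
        for layer in &mut self.layers {
            len = layer.step(k, v);
        }
        len
    }

    fn clear_kv_cache(&mut self) {
        for layer in &mut self.layers {
            layer.clear_kv_cache();
        }
    }
}
```

A caller would invoke `clear_kv_cache` between unrelated prompts so that the second conversation does not attend to positions from the first.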
* Fix FAv3 packing and symbol overlap * cuda: cache small htod copies for graph capture * Add CUDA graph HtoD cache guard
* Initial fixes
* Cleanup
* Fixes
…g) (#3558) CUDA 12.x removed the software fallback for __hmax_nan/__hmin_nan on architectures below sm_80. The previous guard __CUDA_ARCH__ < 750 excluded exactly sm_75 (GTX 1650/1660/T4). Changed boundary to < 800 since these become native hardware instructions on sm_80 (Ampere) and above. Co-authored-by: 4aMgLn4 <aadaa88@wedatalab.com>
Updates the requirements on [symphonia](https://github.com/pdeljanov/Symphonia) to permit the latest version. - [Release notes](https://github.com/pdeljanov/Symphonia/releases) - [Commits](pdeljanov/Symphonia@v0.5.3...v0.6.0) --- updated-dependencies: - dependency-name: symphonia dependency-version: 0.6.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
Updates the requirements on [hf-hub](https://github.com/huggingface/hf-hub) to permit the latest version. - [Release notes](https://github.com/huggingface/hf-hub/releases) - [Changelog](https://github.com/huggingface/hf-hub/blob/main/RELEASE.md) - [Commits](huggingface/hf-hub@v0.4.1...v0.5.0) --- updated-dependencies: - dependency-name: hf-hub dependency-version: 0.5.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
Updates the requirements on [rubato](https://github.com/HEnquist/rubato) to permit the latest version. - [Release notes](https://github.com/HEnquist/rubato/releases) - [Commits](HEnquist/rubato@v1.0.0...v2.0.0) --- updated-dependencies: - dependency-name: rubato dependency-version: 2.0.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
* Add scalar support to metal binary kernels * Add Layout::is_scalar and is_scalar_like helpers. Let kernel name decide dispatch in metal binary
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
* Add vec_dot benchmark
* Improve neon cpu impl for f32/f16/bf16
  Uses inline asm where rust API is unstable. When fp16/bf16 target features are missing we use load into f32 fallback. Slight improvements to generic vec_dot algorithm (easier for compiler to unroll/vectorize)
* Simplify Cpu* simd trait abstractions
* Add debug print to track down windows CI bug. Remove later
* hail mary avx attempt without an avx machine
* Fix vec_dot_f16/bf16 CurrentCpuF16/BF16::STEP usage
* Temporarily break AVX CurrentCpuBF16 to investigate
* bug confirmed, adding transmute
* Use is_x86_feature_detected instead of cfg gate
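The last item, switching from a `cfg` gate to `is_x86_feature_detected!`, moves the decision from build time to run time. A minimal sketch of that dispatch for an f32 dot product is shown below; the function names are hypothetical and the kernel is far simpler than candle's `vec_dot` implementations.

```rust
/// Portable reference implementation.
fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// AVX kernel. Compiled on every x86_64 build, but only safe to call once the
/// CPU has been checked at run time.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn dot_avx(a: &[f32], b: &[f32]) -> f32 {
    use std::arch::x86_64::{
        _mm256_add_ps, _mm256_loadu_ps, _mm256_mul_ps, _mm256_setzero_ps, _mm256_storeu_ps,
    };
    let n = a.len().min(b.len());
    let chunks = n / 8;
    unsafe {
        let mut acc = _mm256_setzero_ps();
        for i in 0..chunks {
            // In bounds: (i + 1) * 8 <= n for every i < chunks.
            let va = _mm256_loadu_ps(a.as_ptr().add(i * 8));
            let vb = _mm256_loadu_ps(b.as_ptr().add(i * 8));
            acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
        }
        let mut lanes = [0f32; 8];
        _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
        let mut sum: f32 = lanes.iter().sum();
        // Scalar tail for the elements that do not fill a full 8-lane chunk.
        for i in chunks * 8..n {
            sum += a[i] * b[i];
        }
        sum
    }
}

/// Picks the kernel at run time, so one binary works on CPUs with and without AVX.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: the feature check above guarantees AVX is available.
            return unsafe { dot_avx(a, b) };
        }
    }
    dot_scalar(a, b)
}
```

The two kernels sum in a different order, so results can differ in the last bits for general inputs; they agree exactly only when every intermediate value is representable in f32.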
* Cap allocations in the GGUF loader
  Fixes #3533. The loader passed caller-controlled length fields into allocation calls without bounds checks. Adds size caps matching ggml-org/llama.cpp#19856, a remaining-bytes check, a GGML_MAX_DIMS cap on tensor dimensions, and a recursion depth cap on Value::Array.
* Avoid re-seeking on every GGUF length check
  Capture the file size once in Content::read and pass it through read_string/Value::read instead of seeking to the end and back on every length-prefixed read, which roughly doubled load time.
---------
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
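The idea behind the length checks can be sketched as follows. This is not the loader's code: `MAX_STRING_LEN` is a hypothetical cap chosen for illustration, and `remaining` stands in for the byte count derived from the file size captured once up front. The only format detail assumed is that a GGUF string is a little-endian u64 length followed by that many bytes.

```rust
use std::io::{self, Read};

/// Hypothetical upper bound on a single string; the real cap may differ.
const MAX_STRING_LEN: u64 = 1 << 20;

/// Reads a u64-length-prefixed string, refusing to allocate for lengths that
/// exceed the cap or the number of bytes still available in the input.
fn read_string<R: Read>(reader: &mut R, remaining: u64) -> io::Result<String> {
    let mut len_buf = [0u8; 8];
    reader.read_exact(&mut len_buf)?;
    let len = u64::from_le_bytes(len_buf);
    let left = remaining.saturating_sub(8);
    if len > MAX_STRING_LEN || len > left {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("string length {len} exceeds limit or remaining bytes ({left})"),
        ));
    }
    // Safe to allocate: len is bounded by both checks above.
    let mut buf = vec![0u8; len as usize];
    reader.read_exact(&mut buf)?;
    String::from_utf8(buf).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Passing `remaining` in as a plain number avoids seeking to the end of the file and back on every read, which is the cost the second commit removes.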
) Updates the requirements on [enterpolation](https://github.com/NicolasKlenert/enterpolation) to permit the latest version. - [Release notes](https://github.com/NicolasKlenert/enterpolation/releases) - [Changelog](https://github.com/NicolasKlenert/enterpolation/blob/main/RELEASES.md) - [Commits](https://github.com/NicolasKlenert/enterpolation/commits/v0.3.0) --- updated-dependencies: - dependency-name: enterpolation dependency-version: 0.3.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
* chore(deps): update web-sys requirement from =0.3.70 to =0.3.99 Updates the requirements on [web-sys](https://github.com/wasm-bindgen/wasm-bindgen) to permit the latest version. - [Release notes](https://github.com/wasm-bindgen/wasm-bindgen/releases) - [Changelog](https://github.com/wasm-bindgen/wasm-bindgen/blob/main/CHANGELOG.md) - [Commits](https://github.com/wasm-bindgen/wasm-bindgen/commits) --- updated-dependencies: - dependency-name: web-sys dependency-version: 0.3.99 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> * Use set_stroke_style_str over deprecated set_stroke_style --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Ivar Flakstad <69173633+ivarflakstad@users.noreply.github.com>
Updates the requirements on [gloo](https://github.com/rustwasm/gloo) to permit the latest version. - [Release notes](https://github.com/rustwasm/gloo/releases) - [Changelog](https://github.com/ranile/gloo/blob/master/CHANGELOG.md) - [Commits](https://github.com/rustwasm/gloo/commits) --- updated-dependencies: - dependency-name: gloo dependency-version: 0.11.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
* Add LFM2.5 (Liquid Foundation Model 2.5) support
  Add model implementation and example for LiquidAI's LFM2.5 hybrid architecture that combines attention and short convolution layers. Supports LFM2.5-1.2B and LFM2.5-1.2B-Thinking variants.
* fix typo issue and crate::utils::build_causal_mask(seq_len, index_pos, device) as quantized_lfm2.rs
* Apply rustfmt
  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Fix clippy warnings (manual_div_ceil, large_enum_variant)
  Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)