Open
Changes from all commits
Commits
416 commits
a6e8aae
fixed errors with hardswish merge (#3006)
greenrazer Jun 26, 2025
0cd4fc4
Fixed Failing CI (#3007)
greenrazer Jun 26, 2025
ab14581
Qwen3: fix quality loss due to rope freq precision (#3005)
zackangelo Jun 26, 2025
d0a3b33
fixed ring mac error (#3008)
greenrazer Jun 27, 2025
317a3ae
Support new arch of GLM4 models (#2991)
guoqingbao Jul 7, 2025
be411aa
candle-onnx: Implement One Hot operator (#2979)
Michall00 Jul 7, 2025
9c8a02f
fix (candle-datasets): re-export FileReader and simplify from_hub ite…
xavierforge Jul 16, 2025
16b7b77
candle-datasets: add fashion-mnist (#3021)
slckl Jul 16, 2025
1f07074
candle-onnx: Implement Selu operator (#2978)
Michall00 Jul 16, 2025
6c95317
fix: DAC model prefix (#3020)
piedshag Jul 17, 2025
1ef1341
*Major T/s improvement* Use the Metal qmatmul MM kernels (#2615)
EricLBuehler Jul 18, 2025
42bd33e
Fix discord badge (#3033)
strickvl Jul 23, 2025
da5498c
Added GradStore::insert_id(id, grad)
EthanAlmloff Jul 29, 2025
26a3222
Support building on CPUs with AVX but not AVX2 (#3040)
jncraton Jul 31, 2025
21032cb
[FEAT] Voxtral Support (#3036)
jorge-menjivar Aug 4, 2025
96415a4
ignored url that was interpreted as a secret by trufflehog (#3046)
greenrazer Aug 4, 2025
af5a69e
fp8 support (#2989)
zackangelo Aug 4, 2025
86bcf1e
Load safetensors i8 (#3042)
chadvoegele Aug 5, 2025
1829812
Fix sort kernel launch bug when nrows exceed gridDim.y limit (65535) …
guoqingbao Aug 11, 2025
be4f920
clippy fixes (#3053)
greenrazer Aug 12, 2025
d7c5c8a
Add timestamp rules and constraints to decoder in Whisper example (#3…
rsb-tbg Aug 18, 2025
f1286e6
Fix wasm build by enabling getrandom wasm_js backend (#3055)
lucky-bai Aug 18, 2025
16e1d73
pick seed <= u32::MAX when using metal (#3045)
rgbkrk Aug 20, 2025
730fa9c
Fix broken slice_scatter example in basics.rs
davenpi Aug 21, 2025
5d6407f
Run cargo fmt on basics.rs
davenpi Aug 22, 2025
98c64c0
Metal device.set_seed full u64 support (#3067)
ivarflakstad Aug 25, 2025
03e9ce0
disable affine fp8 bench on metal as it is not supported yet (#3065)
ivarflakstad Aug 25, 2025
02cf3eb
Bench using chosen device only (#3066)
ivarflakstad Aug 26, 2025
fd350c4
Fixes metal randn determinism. Ensure we use the 2 atomic_uints buffe…
ivarflakstad Aug 27, 2025
bf82629
build: Make build.rs candle-kernels compatible with Nix and sandboxed…
joeldsouzax Aug 28, 2025
06387ae
[Metal] update to objc2_metal (#3064)
ivarflakstad Aug 29, 2025
d4a9179
Fused CPU attention kernels (~4x performance increase) (#2973)
EricLBuehler Aug 29, 2025
41b1e95
Fix typos
szepeviktor Aug 30, 2025
93845ed
Merge pull request #3072 from szepeviktor/typos
ivarflakstad Aug 30, 2025
390b87a
Fix iOS app store validation issues (#3071)
greenrazer Sep 3, 2025
402782c
Merge pull request #3038 from NoodlesOfWrath/gradstore_insert_id
ivarflakstad Sep 6, 2025
f62e725
clean candle-core typos.
zhanluxianshen Sep 7, 2025
0bbf9c7
Ensure metal tensors are send/sync via thread isolated command buffer…
ivarflakstad Sep 8, 2025
3b35cfc
Update kv_cache.rs (#3035)
jhqxxx Sep 8, 2025
0cf516d
[Metal] Refactor (#3070)
ivarflakstad Sep 8, 2025
87fadf6
Merge pull request #3077 from zhanluxianshen/typo-candle-core
ivarflakstad Sep 8, 2025
0950959
Fix metal exports (#3081)
ivarflakstad Sep 8, 2025
a7fbc63
Merge branch 'main' into metal-tensor-fix-send-sync
ivarflakstad Sep 9, 2025
65055f6
Merge pull request #3079 from huggingface/metal-tensor-fix-send-sync
ivarflakstad Sep 9, 2025
b1dbce0
Merge pull request #3062 from davenpi/fix/core-basics-example
ivarflakstad Sep 9, 2025
8045af9
Add CUDA 13 support (#3078)
jfernandez Sep 9, 2025
97594d2
Fix indentation
ivarflakstad Sep 9, 2025
038e28b
Fix indentation (ok but for real)
ivarflakstad Sep 9, 2025
372c9cf
Merge pull request #2937 from ljt019/fix-phi3-kv-cache-reset
ivarflakstad Sep 9, 2025
41a674c
add impl for mish activation function (#3051)
oa-root Sep 12, 2025
dd12467
Upgrade ug dep for CUDA 13 support
grahamking Sep 18, 2025
1a699fb
Merge pull request #3089 from grahamking/main
ivarflakstad Sep 20, 2025
ec3d92e
Various minor improvements, some suggested by clippy
ivarflakstad Sep 22, 2025
f583891
Merge pull request #3023 from xavierforge/bug/metadata-method-not-found
ivarflakstad Sep 22, 2025
944947a
Add command buffer thread map. Remove unecessary failure points
ivarflakstad Sep 30, 2025
b06d2fd
Merge pull request #3092 from huggingface/metal-clippy-fixes
ivarflakstad Sep 30, 2025
bc13c4b
Merge branch 'main' into improve-metal-command-buffer-map
ivarflakstad Sep 30, 2025
d205fb4
Fix multiple clippy warnings (#3101)
ivarflakstad Sep 30, 2025
d16eaf5
Merge branch 'main' into improve-metal-command-buffer-map
ivarflakstad Oct 1, 2025
7bfc5af
Wait until completed on command buffer status: scheduled as well
ivarflakstad Oct 1, 2025
df50343
Add metal conv for more dtypes
ivarflakstad Oct 2, 2025
c16785b
Allow based to run with bf16 on metal
ivarflakstad Oct 2, 2025
26c7868
Add backtracing to metal kernel errors for clarity
ivarflakstad Oct 2, 2025
7c5a8f2
Merge pull request #3103 from huggingface/metal-fix-conv
ivarflakstad Oct 2, 2025
e3fd0da
bump gemm dependency to 0.18.2 to match ug
slckl Oct 2, 2025
0ad167d
Merge pull request #3100 from huggingface/improve-metal-command-buffe…
ivarflakstad Oct 3, 2025
58811e8
Merge pull request #3105 from slckl/gemm-bump
ivarflakstad Oct 3, 2025
e677576
[Metal] Buffer improvements (#3093)
ivarflakstad Oct 3, 2025
a708b7a
Various quantization improvements. Direct copy. Verified block sizes.…
ivarflakstad Oct 3, 2025
742dfef
make cuda benches run again (#3111)
slckl Oct 4, 2025
9b476b2
Capture command buffer errors if they exist (#3106)
ivarflakstad Oct 4, 2025
716e126
[Metal] Improve wait_for_completed command buffers locking (#3107)
ivarflakstad Oct 4, 2025
671de1d
Skip unsupported quantized matmul tests for metal (#3115)
ivarflakstad Oct 5, 2025
bcc34bc
Fix beit on metal by adding additional affine implementations (#3116)
ivarflakstad Oct 6, 2025
a1350d6
Rough example of inlining model files into binary (#3104)
matthewhaynesonline Oct 7, 2025
ca35cf9
Where cond get_strided_index conditionally based on function constant…
ivarflakstad Oct 7, 2025
0374ff3
feat(stable-diffusion): add build_unet_sharded method (#3118)
hoodiecollin Oct 8, 2025
ad1da34
Fix metal get_function error (#3114)
ivarflakstad Oct 8, 2025
256c4e2
Quantization use debug_assert in hot paths (#3109)
ivarflakstad Oct 8, 2025
6fb56c3
Adding inference for GraniteMoeHybrid models from IBM (#3117)
atilag Oct 8, 2025
7b8f2b4
Fix failing `cuda` build (#3121)
LLukas22 Oct 9, 2025
cc967fc
feat: add metal_if_available method for graceful Metal fallback (#3041)
xavierforge Oct 9, 2025
bffa5e1
Fix metal quantized to_float calls (#3123)
ivarflakstad Oct 9, 2025
41fa5f1
Add more conv2d bench cases to candle-nn benches (#3131)
slckl Oct 13, 2025
9fe6232
Fix single file binary builder to only run when env var is set (#3126)
ivarflakstad Oct 13, 2025
f601fd8
Update modernbert.rs (#3010)
whitebox2 Oct 16, 2025
701205a
Update dependencies (#3135)
ivarflakstad Oct 16, 2025
1febb7b
Ensure output of Transpose is contiguous to prevent downstream MatMul…
kshitijl Oct 17, 2025
2bce4e5
In the BERT example: apply the attention mask from tokenization durin…
kshitijl Oct 18, 2025
a52f22f
Skip q8k and q8_1 tests on cuda (#3140)
ivarflakstad Oct 20, 2025
36b7517
Implement qwen3 vl
EricLBuehler Oct 23, 2025
fd379c5
Clippy
EricLBuehler Oct 23, 2025
59aeed4
Bump candle version to 0.9.2-alpha.1 (#3146)
ivarflakstad Oct 23, 2025
5b7858c
Remove unused
EricLBuehler Oct 23, 2025
e3228c1
Add Qwen 3 VL to candle-transformers
EricLBuehler Oct 23, 2025
d312da2
Improve candle example buildtime downloader (#3147)
ivarflakstad Oct 23, 2025
a23a48f
CPU Conv2d: separate module, tiled im2col, specialization (#3136)
slckl Oct 25, 2025
31d6698
rust-ci: add --benches to clippy, fix warnings (#3148)
slckl Oct 25, 2025
df618f8
candle-core: add `broadcast_add` benches (#3149)
slckl Oct 25, 2025
fab0c45
fix: build errors for compute cap 7.5 (#3142)
neksodebe Oct 28, 2025
a05b549
Update cargo build instructions to use double colon syntax (#3132)
matthewhaynesonline Oct 28, 2025
8f27f5c
Add flash attn v3: `candle-flash-attn-v3` (#3152)
EricLBuehler Oct 28, 2025
7669ed1
Add nccl feature to candle-core (#3155)
EricLBuehler Oct 30, 2025
3c7a63d
clippy default fixes (#3160)
ivarflakstad Oct 31, 2025
b8c2ee8
Fix Metal matmul failure in `ModernBertHead::forward` by ensuring con…
whitebox2 Oct 31, 2025
ca3aee8
Add varbuilder get_unchecked methods (#3157)
EricLBuehler Oct 31, 2025
d4545eb
Add unsafe from_storage apis (#3156)
EricLBuehler Nov 1, 2025
b06a02c
[Metal] Ensure metal backend is send/sync via status semaphore (#3164)
ivarflakstad Nov 6, 2025
ade0918
Add sqrt2 as constant for gelu_erf and use `libm` erf (#3168)
vrdn-23 Nov 7, 2025
4ff99ba
candle-core: strided-index inline next + size_hint + exact size itera…
slckl Nov 8, 2025
836540f
Fix DINOv2 no-interpolation shortcut (#3172)
pcuenca Nov 8, 2025
bf3d3f2
Use Tensor::argmax instead of manual cpu impl (#3173)
ivarflakstad Nov 9, 2025
87653ca
Fix argmax. Higher index should also be taken into account (#3179)
ivarflakstad Nov 11, 2025
db08cc0
Add command buffer pool for improved multi-threaded Metal performance…
anonenity Nov 11, 2025
60252cc
feat(candle-nn) ConcatKvCache for 2-5x GPU speedup on autoregressive …
DrJesseGlass Nov 14, 2025
8ebfc22
Add `cublas_handle` api, update safetensors (#3192)
EricLBuehler Nov 17, 2025
ab56dfe
Update CI (#3194)
ivarflakstad Nov 17, 2025
549eacb
Add initial support for imatrix quantization (#3193)
EricLBuehler Nov 18, 2025
eb651c8
add clear kv cache to quantized qwen3 weights (#3189)
anonenity Nov 18, 2025
3390caa
fix typo preventing usage on mac (#3201)
amritsingh183 Nov 20, 2025
27cd43c
CUDA: Fix integer reductions by removing +/-INF initialization (#3200)
TimmyOVO Nov 20, 2025
9ca71de
fix for https://github.com/huggingface/candle/issues/3203 (#3204)
amritsingh183 Nov 20, 2025
b801ef6
Add lld installation and test steps for Linux (#3213)
haricot Nov 25, 2025
01bea21
Add dummy dtypes (#3195)
EricLBuehler Nov 25, 2025
95ea453
Add more misc. changes from candle fork (#3196)
EricLBuehler Nov 25, 2025
2ac3fe0
.gitignore: add .zed to ignored editor configs (#3218)
slckl Nov 30, 2025
c39d5f0
chore(dep): bump cudarc to 0.18.1 (#3219)
mayocream Dec 2, 2025
08d7b64
Hotfix: Bump float8 to 0.5.0 (#3223)
EricLBuehler Dec 3, 2025
2664a21
[Metal] Make fast math mode optional (#3205)
ivarflakstad Dec 4, 2025
9ede204
Update pyo3 (#3202)
ivarflakstad Dec 4, 2025
3d3cc49
[Metal] unary and affine improvements (#3230)
ivarflakstad Dec 6, 2025
72238a7
[Metal] binary improvements (#3231)
ivarflakstad Dec 8, 2025
d91be02
fix(metal): add missing softcapping field to AttnParams struct (#3233)
amritsingh183 Dec 8, 2025
2a797ea
Format sdpa (#3235)
EricLBuehler Dec 8, 2025
d23664f
Fix metal argmax (#3238)
EricLBuehler Dec 9, 2025
73fd9c3
[Metal] further improve unary and binary (#3239)
ivarflakstad Dec 10, 2025
e33d776
[Metal] cast improvements (#3241)
ivarflakstad Dec 10, 2025
4b46187
[Metal] Improve ternary further (#3242)
ivarflakstad Dec 14, 2025
8839457
Bump candle version to 0.9.2-alpha.2 (#3248)
ivarflakstad Dec 16, 2025
689d255
add candle flash attention 3 copyright markers (#3256)
michaelfeil Dec 21, 2025
ab6d97e
fix: replace deprecated cudarc memcpy methods (#3228)
DrJesseGlass Dec 23, 2025
f2d5aab
Support Fused MoE & Qwen3 GGUF MoE models (#3221)
guoqingbao Dec 23, 2025
049c06d
Upgrade GitHub Actions for Node 24 compatibility (#3255)
salmanmkc Dec 24, 2025
0e4dc02
Adds onnx ops to support debertav3/piiranha (#3260)
skeet70 Dec 26, 2025
5498dff
Add bilinear interpolation support (upsample_bilinear2d) (#3237)
SpenserCai Dec 26, 2025
63437a4
Fix remnant memcpy_stod call (#3267)
ivarflakstad Dec 27, 2025
f2bd79e
Sort on cuda fails when tensor size exceeds 1024 (#3271)
slckl Dec 30, 2025
e717779
make candle ops public (#3226)
zackangelo Dec 30, 2025
4ea88fa
fix(quantized_gemma3): auto-detect GGUF metadata prefix for gemma-emb…
clocksmith Dec 30, 2025
5de3d0f
add HuberLoss (#3252)
donjuanplatinum Dec 31, 2025
d8fb848
feat!: Make `ug` dependency optional (#3268)
DanikVitek Dec 31, 2025
3a0d1cb
Add Z-Image Text-to-Image Generation Support (#3261)
SpenserCai Jan 2, 2026
43be23c
fix(candle-kernels): conditionally link stdc++ for non-MSVC targets (…
Elvis339 Jan 3, 2026
a4ad7c7
replace cutlass submodule references with explicit build step (#3234)
jacobgorm Jan 4, 2026
fd8448d
Rename compute capability defines in CUDA kernels (#3275)
FerrisMind Jan 6, 2026
c3ed240
Fix MoE WMMA kernel on V100 (#3282)
guoqingbao Jan 6, 2026
db3d5d9
[Metal] improve normalization (#3283)
ivarflakstad Jan 6, 2026
54131f1
Fix BF16 conv_transpose2d using wrong kernel on Metal (#3279)
amritsingh183 Jan 6, 2026
42a4edc
Mamba2 implementation (#3264)
Anri-Lombard Jan 7, 2026
f526033
feat: paddleocr-vl model and example (#3273)
danielclough Jan 7, 2026
f2d30fb
chore(dep): bump cudarc to 0.18.2 (#3293)
staceymelville Jan 14, 2026
dbb8c2d
Add SmolLM3: Full and Quantized Implementation (#3180)
DrJesseGlass Jan 14, 2026
a2029da
example: add quantized qwen3 wasm with SIMD optimizations (#3159)
DrJesseGlass Jan 14, 2026
aaf5c86
Hotfix: Remove fastmath from candle-kernels (#3309)
EricLBuehler Jan 17, 2026
261f727
feat: add quantized lfm2 model support (#3244)
fffonion Jan 17, 2026
23182cf
Support new arch of GLM4 GGUF models (#2992)
guoqingbao Jan 17, 2026
cc8ec5e
feat: simplify metal reduce kernels and standardize on u32 indexing (…
drbh Jan 18, 2026
a3969ed
rms/layer norm accumulate in f32 for improved precision (#3315)
ivarflakstad Jan 19, 2026
0f6b303
Remove `test.onnx` (uploaded by mistake?) (#3316)
alvarobartt Jan 19, 2026
06cb713
Metal GEMM Dynamic Tile Selection and Batch Collapse Optimization (#3…
SpenserCai Jan 21, 2026
8d5873b
Update deps (#3320)
ivarflakstad Jan 22, 2026
88ed791
CUDA Tensor::toDevice copies directly via memcopy_dtod (#3312)
krampenschiesser Jan 23, 2026
f041b87
[Cuda] Use upstream bindgen_cuda crate (#3328)
ivarflakstad Jan 24, 2026
e53310d
Bump candle version to 0.9.2 (#3329)
ivarflakstad Jan 24, 2026
3b39794
Add dep versioning for candle-flash-attn-build (#3330)
ivarflakstad Jan 24, 2026
061c392
Bump float8 to 0.7.0, cudarc to 0.19.1 (#3360)
EricLBuehler Feb 4, 2026
971e7ed
Bump float8 to 0.7.0, cudarc to 0.19.1 (#3360) (#3361)
EricLBuehler Feb 4, 2026
c3bb5bf
Use cudaforge for kernel build (#3346)
guoqingbao Feb 10, 2026
f2cb5b4
Add candle-video library for text-to-video generation in README.md (#…
FerrisMind Feb 10, 2026
a0dbd8b
fix: disable bf16 WMMA kernels on pre-Ampere GPUs (compute cap < 80) …
asglover Feb 15, 2026
2bd0256
feat: rwkv_v7 model and examples (#3372)
danielclough Feb 17, 2026
6ce61d3
fix: add chat template for Qwen3 models in qwen example (#3167) (#3377)
olafurjohannsson Feb 19, 2026
026b4f2
feat: nomic-embed-text-v1.5 model and examples (#3374)
danielclough Feb 19, 2026
8cc682c
fix(metal-kernel): index select with u32 indices and i64 source (#3371)
Elvis339 Feb 19, 2026
38e7202
feat: allow tokenizer to load from GGUF metadata (#3245)
fffonion Feb 19, 2026
bf9e950
fix: conv2d_tiled produces wrong results when C == H == W (#3405)
developer0hye Mar 22, 2026
7769e3b
Add support for head dim 512 to FA v2/v3 and metal SDPA, CUDA 13.2 (#…
EricLBuehler Mar 28, 2026
df22d80
fix: correct CPU scatter RequiresContiguous op for non-contiguous ids…
cuiweixie Mar 28, 2026
15969b5
fix(metal): seed buffer size 4 → 8 bytes (u64) (#3408)
jamesbrink Mar 28, 2026
f7e39a4
feat: add eps() and remove_mean() getters to LayerNorm and RmsNorm (#…
jgremes Mar 29, 2026
e0e33e9
[Metal] Use StorageModePrivate for intermediate compute buffers (#3416)
jamisonl Mar 29, 2026
6b4d8a1
fix: exclude tokenizers dependency on wasm32 targets (#3414)
ilnmtlbnm Mar 29, 2026
c6a4649
Guard against NULL architecture pointer on simulators (#3392)
setoelkahfi Mar 29, 2026
a7d8427
fix(metal): buffer pool rounds 2 down by 1 (#3394)
setoelkahfi Mar 29, 2026
25805ff
Fix some NaNs with GGML quantized (#3428)
EricLBuehler Mar 30, 2026
58694bb
feat: add #[non_exhaustive] to DType enum (#3412)
eyupcanakman Mar 30, 2026
a9060dc
Run conv2d_c_eq_h_eq_w test in f32 as it is still precise and metal d…
ivarflakstad Mar 30, 2026
96066e8
fix(flash-attn-v3): add -fPIC on non-MSVC targets (#3380)
glaziermag Mar 30, 2026
153cadd
Bump candle version to 0.10.0 (#3432)
ivarflakstad Mar 31, 2026
cb94e4f
Use AnyObject instead of c_void, as it has the @ encoding that Object…
ivarflakstad Apr 1, 2026
904bf22
Bump candle version to 0.10.1 (#3436)
ivarflakstad Apr 1, 2026
46928bc
Fix sliding window full sdpa corner case (#3438)
EricLBuehler Apr 1, 2026
aa812d7
fix(models): rectangular causal mask for prefix KV caching (#3437)
ArthurZucker Apr 1, 2026
7c7a8c5
Bump candle version to 0.10.2 (#3441)
ivarflakstad Apr 1, 2026
1f3cc15
🔒 pin maturin.yml actions to commit SHAs
paulinebm Apr 2, 2026
c8ddf6d
🔒 pin python.yml actions to commit SHAs
paulinebm Apr 2, 2026
321b894
🔒 pin ci_cuda.yaml actions to commit SHAs
paulinebm Apr 2, 2026
6f9afa3
🔒 pin trufflehog.yml actions to commit SHAs
paulinebm Apr 2, 2026
097655a
Implement the new Google model (#3443)
EricLBuehler Apr 2, 2026
c42e1fe
Merge pull request #3442 from huggingface/security/pin-actions-to-sha
paulinebm Apr 8, 2026
34625ab
Handle case with gemma4 without safetensors index (#3457)
EricLBuehler Apr 9, 2026
b503458
Add fast CUDA MMVQ GGUF kernels (#3463)
EricLBuehler Apr 13, 2026
aff7c10
Add fast CUDA MMQ GGUF kernels (#3465)
EricLBuehler Apr 15, 2026
4bc89dd
Fix RoPE convention for NEOX-style models in quantized_llama.rs (#3411)
joelteply Apr 15, 2026
5d20d1f
Fix metal GgmlDType::BF16 mapping (#3471)
ivarflakstad Apr 16, 2026
cce7901
Fix fmt and clippy remnants (#3472)
ivarflakstad Apr 16, 2026
c5d7d49
Fix: use MTLCopyAllDevices() for reliable Metal device enumeration (#…
romnn Apr 22, 2026
bb2d400
fix clippy lints surfaced by Rust 1.95 (#3481)
tomsanbear Apr 22, 2026
c8c7663
update `erf::polynomial` (#1413)
chris-ha458 Apr 22, 2026
2bbbb4e
Add rustup wasm doc to wasm example (#1438)
jk2K Apr 22, 2026
9e71760
Extend `GradStore` public functionality (#1483)
agerasev Apr 22, 2026
7fa19d2
Remove unnecessary task (#1925)
kejcao Apr 22, 2026
b43326e
fix: candle-book paths (#3386)
IamPhytan Apr 22, 2026
5bd5618
Remove unwrap()s from candle-metal-kernels/src/metal/device.rs (#3382)
jacobgorm Apr 22, 2026
5447a87
Optimization for CPU Causal Flash Attention (integrated into Qwen3) (…
DrJesseGlass Apr 23, 2026
b7eb2f0
[Metal] Concurrent dispatching (#3511)
ivarflakstad May 14, 2026
ad4298f
[Metal] Improved inter-encoder sync and gemv (#3532)
ivarflakstad May 15, 2026
05eeac9
chore(deps): update fancy-regex requirement from 0.17.0 to 0.18.0 (#3…
dependabot[bot] May 16, 2026
f2a9852
chore(deps): update parquet requirement from 57 to 58 (#3501)
dependabot[bot] May 16, 2026
691f1af
chore(deps): update cpal requirement from 0.15.2 to 0.17.3 (#3499)
dependabot[bot] May 16, 2026
0dcfd62
chore(deps): update yew requirement from 0.20.0 to 0.23.0 (#3498)
dependabot[bot] May 16, 2026
538934b
chore(deps): update zip requirement from 7.2.0 to 8.6.0 (#3500)
dependabot[bot] May 16, 2026
525b81e
Add clear_kv_cache to quantized_llama and quantized_qwen2 (#3536)
Sok205 May 16, 2026
3df8203
Introduce new StridedBlocks type UniformBlocks (#3495)
ivarflakstad May 16, 2026
6a05b4b
Fix FAv3 packing and symbol overlap (#3559)
EricLBuehler May 24, 2026
cabbc30
Add htod cache for cuda graphs (#3564)
EricLBuehler May 24, 2026
3ba428d
Improve CUDA stream consistency (#3565)
EricLBuehler May 26, 2026
1d7e927
fix(kernels): provide __hmax_nan/__hmin_nan fallback for sm_75 (Turin…
4mGLn May 26, 2026
dcc708e
chore(deps): update symphonia requirement from 0.5.3 to 0.6.0 (#3546)
dependabot[bot] May 28, 2026
14f923b
chore(deps): update hf-hub requirement from 0.4.1 to 0.5.0 (#3545)
dependabot[bot] May 28, 2026
d2ba757
chore(deps): update rubato requirement from 1 to 2 (#3544)
dependabot[bot] May 28, 2026
09b9145
Binary broadcast scalar support (#3487)
ivarflakstad May 29, 2026
5404348
metal: add copy2d kernels for I16 / I32 (#3478)
tomsanbear Jun 1, 2026
39355c6
Improved neon vec dot (#3570)
ivarflakstad Jun 1, 2026
1a63038
metal: fix RMSNorm NaN on large-magnitude F32 inputs (#3477)
tomsanbear Jun 4, 2026
c72a17f
Cap allocations in the GGUF loader (#3556)
HueCodes Jun 6, 2026
b5a101a
chore(deps): update enterpolation requirement from 0.2.1 to 0.3.0 (#3…
dependabot[bot] Jun 6, 2026
33b00d2
chore(deps): update web-sys requirement from =0.3.70 to =0.3.99 (#3575)
dependabot[bot] Jun 7, 2026
fea3a99
chore(deps): update gloo requirement from 0.11 to 0.12 (#3573)
dependabot[bot] Jun 7, 2026
3d3d9c4
Add LFM2.5 (Liquid Foundation Model 2.5) support (#3400)
Jacqkues Jun 8, 2026
2 changes: 1 addition & 1 deletion .cargo/config.toml
@@ -2,7 +2,7 @@
  rustflags = ["-C", "target-cpu=native"]

  [target.wasm32-unknown-unknown]
- rustflags = ["-C", "target-feature=+simd128"]
+ rustflags = ["-C", "target-feature=+simd128", "--cfg", 'getrandom_backend="wasm_js"']

  [target.x86_64-apple-darwin]
  rustflags = ["-C", "target-feature=-avx,-avx2"]
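
The wasm32 entry in `.cargo/config.toml` passes `--cfg getrandom_backend="wasm_js"` to rustc through `rustflags`. As a minimal sketch of how such a flag becomes visible to Rust code at compile time, the snippet below checks the same cfg key with `cfg!`. The function name `selected_backend` is hypothetical and is not part of candle or getrandom; how getrandom itself consumes the flag is not shown here.

```rust
/// Returns a label for the backend cfg seen at compile time.
/// `getrandom_backend` is a custom cfg: it is only set when rustc is
/// invoked with `--cfg 'getrandom_backend="wasm_js"'`, for example via
/// `rustflags` in `.cargo/config.toml`.
#[allow(unexpected_cfgs)] // the cfg name is not known to rustc's check-cfg
fn selected_backend() -> &'static str {
    if cfg!(getrandom_backend = "wasm_js") {
        "wasm_js"
    } else {
        "default"
    }
}

fn main() {
    // Without the extra `--cfg` flag this reports the fallback label.
    println!("getrandom backend cfg: {}", selected_backend());
}
```

Because the flag is scoped to `[target.wasm32-unknown-unknown]`, builds for other targets are expected to take the fallback branch.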
40 changes: 0 additions & 40 deletions .github/workflows/book-cd.yml

This file was deleted.

29 changes: 0 additions & 29 deletions .github/workflows/book.yml

This file was deleted.

15 changes: 8 additions & 7 deletions .github/workflows/ci_cuda.yaml
@@ -10,10 +10,9 @@ jobs:
  group: ${{ github.workflow }}-${{ github.job }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true
  runs-on:
- group: aws-g4dn-2xlarge
+ group: aws-g5-4xlarge-cache
  container:
- image: nvidia/cuda:12.3.1-devel-ubuntu22.04
- options: --gpus 0
+ image: nvidia/cuda:13.0.2-cudnn-devel-ubuntu24.04
  if: ${{ github.event.pull_request.head.repo.full_name == github.event.pull_request.base.repo.full_name }}
  permissions:
  contents: write
@@ -22,13 +21,15 @@ jobs:
  # with sigstore/fulcio when running outside of PRs.
  id-token: write
  security-events: write
+ env:
+ CUDA_COMPUTE_CAP: 86
  steps:
  - name: Checkout repository
- uses: actions/checkout@v3
+ uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
  - name: Install dependencies
- run: apt-get update && apt install curl build-essential libssl-dev protobuf-compiler pkg-config -y
+ run: apt update && apt install curl build-essential libssl-dev protobuf-compiler pkg-config -y
  - name: Install Rust Stable
- uses: actions-rust-lang/setup-rust-toolchain@v1
- - uses: Swatinem/rust-cache@v2
+ uses: dtolnay/rust-toolchain@29eef336d9b2848a0b548edc03f92a220660cdb8 # stable
+ - uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
  - name: Test (cuda)
  run: cargo test --features cuda
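
Several workflow changes in this diff replace tag references such as `actions/checkout@v3` with full commit SHAs followed by a version comment. A rough sketch of how that convention could be checked is shown below. The helper `is_pinned_to_sha` is hypothetical and not part of the repository; it assumes a pinned reference is anything whose revision after `@` is exactly 40 hexadecimal characters.

```rust
/// Returns true when a workflow `uses:` value is pinned to a full
/// 40-character hexadecimal commit SHA, e.g. `owner/repo@<sha> # v1`.
fn is_pinned_to_sha(uses: &str) -> bool {
    // Drop a trailing `# v6.0.2`-style comment, then split at the last `@`.
    let reference = uses.split('#').next().unwrap_or("").trim();
    match reference.rsplit_once('@') {
        Some((_, rev)) => rev.len() == 40 && rev.chars().all(|c| c.is_ascii_hexdigit()),
        None => false,
    }
}

fn main() {
    // Values copied from the ci_cuda.yaml diff, old and new.
    let refs = [
        "actions/checkout@v3",
        "actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2",
        "Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2",
    ];
    for r in refs {
        println!("{:<72} pinned: {}", r, is_pinned_to_sha(r));
    }
}
```

A check like this only inspects the shape of the reference. It does not confirm that the SHA actually corresponds to the version named in the comment.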
Binary file modified .github/workflows/maturin.yml
Binary file not shown.
18 changes: 8 additions & 10 deletions .github/workflows/python.yml
@@ -20,30 +20,28 @@ jobs:
  os: [ubuntu-latest] # For now, only test on Linux
  steps:
  - name: Checkout repository
- uses: actions/checkout@v4
+ uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

  - name: Install Rust
- uses: actions-rs/toolchain@v1
- with:
- toolchain: stable
+ uses: dtolnay/rust-toolchain@29eef336d9b2848a0b548edc03f92a220660cdb8 # stable

  - name: Install Python
- uses: actions/setup-python@v4
+ uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6
  with:
- python-version: 3.11
+ python-version: 3.13
  architecture: "x64"

  - name: Cache Cargo Registry
- uses: actions/cache@v1
+ uses: actions/cache@668228422ae6a00e4ad889ee87cd7109ec5666a7 # v5
  with:
  path: ~/.cargo/registry
  key: ${{ runner.os }}-cargo-registry-${{ hashFiles('**/Cargo.lock') }}

  - name: Install Protoc
- uses: arduino/setup-protoc@v2
+ uses: arduino/setup-protoc@a8b67ba40b37d35169e222f3bb352603327985b6 # v2
  with:
- version: "25.0"
- repo-token: ${{ secrets.GITHUB_TOKEN }}
+ version: "25.0"
+ repo-token: ${{ secrets.GITHUB_TOKEN }}

  - name: Install
  working-directory: ./candle-pyo3
105 changes: 64 additions & 41 deletions .github/workflows/rust-ci.yml
@@ -11,68 +11,91 @@ jobs:
name: Check
runs-on: ${{ matrix.os }}
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest, windows-latest, macOS-latest]
rust: [stable]
os: [ubuntu-latest, ubuntu-24.04, windows-latest, macOS-latest, ubuntu-24.04-arm]
steps:
- uses: actions/checkout@v4
- uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: ${{ matrix.rust }}
override: true
- uses: actions-rs/cargo@v1
- uses: actions/checkout@v6
- uses: actions/setup-python@v6
with:
command: check
args: --workspace
python-version: "3.13"
- name: Remove cargo config (macOS ring crate fix)
if: runner.os == 'macOS'
run: rm -f .cargo/config.toml
- uses: dtolnay/rust-toolchain@stable

- name: Run macos with metal
if: matrix.os == 'macOS-latest'
run: cargo check --workspace --features metal

- name: Run normal cpu
if: matrix.os == 'ubuntu-latest' || matrix.os == 'windows-latest'
run: cargo check --workspace

- name: Run with avx2
if: matrix.os == 'ubuntu-24.04'
run: |
export RUSTFLAGS="-C target-feature=avx2"
cargo check --workspace

- name: Run with arm neon
if: matrix.os == 'ubuntu-24.04-arm'
run: |
export RUSTFLAGS="-C target-feature=neon"
cargo check --workspace

test:
name: Test Suite
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest, windows-latest, macOS-latest]
rust: [stable]
steps:
- uses: actions/checkout@v4
- uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: ${{ matrix.rust }}
override: true
- uses: actions-rs/cargo@v1
- name: Free disk space (Linux)
if: runner.os == 'Linux'
run: |
sudo rm -rf /opt/hostedtoolcache
sudo rm -rf /usr/share/dotnet
sudo rm -rf /usr/local/lib/android
sudo rm -rf /opt/ghc
df -h
- uses: actions/checkout@v6
- uses: actions/setup-python@v6
with:
command: test
args: --workspace
python-version: "3.13"
- name: Remove cargo config (macOS ring crate fix)
if: runner.os == 'macOS'
run: rm -f .cargo/config.toml
- uses: dtolnay/rust-toolchain@stable
- name: Install lld (Linux only)
if: runner.os == 'Linux'
run: sudo apt-get update && sudo apt-get install -y lld
- name: Run tests (with lld on Linux)
if: runner.os == 'Linux'
env:
RUSTFLAGS: "-C link-arg=-fuse-ld=lld"
run: cargo test --workspace
- name: Run tests (Windows & macOS)
if: runner.os != 'Linux'
run: cargo test --workspace

fmt:
name: Rustfmt
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions-rs/toolchain@v1
- uses: actions/checkout@v6
- uses: dtolnay/rust-toolchain@stable
with:
profile: minimal
toolchain: stable
override: true
- run: rustup component add rustfmt
- uses: actions-rs/cargo@v1
with:
command: fmt
args: --all -- --check
components: rustfmt
- run: cargo fmt --all -- --check

clippy:
name: Clippy
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions-rs/toolchain@v1
with:
profile: minimal
toolchain: stable
override: true
- run: rustup component add clippy
- uses: actions-rs/cargo@v1
- uses: actions/checkout@v6
- uses: dtolnay/rust-toolchain@stable
with:
command: clippy
args: --workspace --tests --examples -- -D warnings
components: clippy
- run: cargo clippy --workspace --tests --examples --benches -- -D warnings
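
The `check` job in `rust-ci.yml` sets `RUSTFLAGS` with `target-feature=avx2` on `ubuntu-24.04` and `target-feature=neon` on `ubuntu-24.04-arm`. As an illustrative sketch, the snippet below shows how Rust code can observe a compile-time target feature with `cfg!` and, separately, query the running CPU with the standard `is_x86_feature_detected!` macro. The function names are hypothetical and do not come from candle.

```rust
/// Reports which SIMD feature, if any, was enabled at compile time
/// (for example through `RUSTFLAGS` and `-C target-feature`).
fn compiled_with_simd() -> &'static str {
    if cfg!(target_feature = "avx2") {
        "avx2"
    } else if cfg!(target_feature = "neon") {
        "neon"
    } else {
        "none"
    }
}

/// Reports whether the CPU running this binary supports AVX2.
#[cfg(target_arch = "x86_64")]
fn runtime_avx2() -> bool {
    is_x86_feature_detected!("avx2")
}

/// On other architectures AVX2 does not exist, so report false.
#[cfg(not(target_arch = "x86_64"))]
fn runtime_avx2() -> bool {
    false
}

fn main() {
    println!("compile-time SIMD feature: {}", compiled_with_simd());
    println!("runtime AVX2 support: {}", runtime_avx2());
}
```

The two answers can differ: a binary built without AVX2 enabled may still run on a CPU that supports it, which is presumably why CI exercises the feature-specific builds as separate matrix entries.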

12 changes: 6 additions & 6 deletions .github/workflows/trufflehog.yml
@@ -7,9 +7,9 @@ jobs:
  trufflehog:
  runs-on: ubuntu-latest
  steps:
- - name: Checkout code
- uses: actions/checkout@v4
- with:
- fetch-depth: 0
- - name: Secret Scanning
- uses: trufflesecurity/trufflehog@main
+ - name: Checkout code
+ uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+ with:
+ fetch-depth: 0
+ - name: Secret Scanning
+ uses: trufflesecurity/trufflehog@6bd2d14f7a4bc1e569fa3550efa7ec632a4fa67b # main
2 changes: 2 additions & 0 deletions .gitignore
@@ -12,6 +12,7 @@ Cargo.lock
  # editor config
  .helix
  .vscode
+ .zed

  # These are backup files generated by rustfmt
  **/*.rs.bk
@@ -46,3 +47,4 @@ out.wav
  bria.mp3
  bria.safetensors
  bria.wav
+ bench_results/
3 changes: 0 additions & 3 deletions .gitmodules

This file was deleted.

11 changes: 0 additions & 11 deletions .vscode/settings.json

This file was deleted.