Why?
- Smaller memory footprint
- Possibly faster
- Can possibly either have a smaller model (on disk) or a more accurate one

https://deepmind.google/models/gemma/gemma-4/
https://github.com/huggingface/candle/blob/main/candle-examples/examples/gemma4/main.rs