Investigate replacing Mistral with the Gemma 4 model

Why?

- Smaller memory footprint
- Possibly faster inference
- Could let us choose either a smaller model (on disk) or a more accurate one

Gemma 4 overview: https://deepmind.google/models/gemma/gemma-4/

candle Gemma 4 example: https://github.com/huggingface/candle/blob/main/candle-examples/examples/gemma4/main.rs