Add TranslateGemma support and reorganize Gemma module structure CAN-34#10
Add TranslateGemma support and reorganize Gemma module structure CAN-34#10DrJesseGlass wants to merge 15 commits into
Conversation
Key and value states become non-contiguous after transpose but KvCache::append() requires contiguous tensors for slice_set. This worked for some model dimensions but failed for others (e.g., TranslateGemma 4B with different GQA ratios).
|
Full precision models auto-download from HuggingFace. Quantized inference requires a local GGUF file via --model-path, as there are no official GGUF conversions from Google. Community conversions are available on HuggingFace. |
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b83bcc5add
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let prompt = format_translate_prompt(text, source.code(), target.code()); | ||
|
|
||
| let encoding = self | ||
| .tokenizer | ||
| .encode(prompt.as_str(), true) | ||
| .map_err(E::msg)?; |
There was a problem hiding this comment.
Avoid double-adding special tokens in TranslateGemma
format_translate_prompt already embeds <bos>/turn markers, so calling encode(..., true) will run the tokenizer post-processor again. For Gemma tokenizers that add BOS/EOS, this yields a duplicated <bos> and a trailing <eos> in the input, which can cause empty/early-terminated translations or otherwise shift the generation context. Consider encode(..., false) here (or remove <bos> from the prompt) so the prompt is not post-processed twice.
Useful? React with 👍 / 👎.
…ls; however this is because quantized_gemma3 and gemma3 have different activation functions
This PR adds support for Google's TranslateGemma translation models and reorganizes the Gemma model family into a unified module structure.
Changes
Module reorganization:
Moved gemma.rs → gemma/gemma1.rs
Consolidated gemma2.rs, gemma3.rs, quantized_gemma3.rs under gemma/
Added gemma/mod.rs with re-exports for backward compatibility
New: TranslateGemma support:
Added gemma/translate_gemma.rs with prompt formatting utilities and ISO 639-1 language codes (55 languages)
Added examples/translate-gemma.rs supporting both full precision and quantized inference
Bug fixes:
gemma3.rs: Make KV tensors contiguous before cache append. This fixes slice-set only supports contiguous tensors error that surfaces with certain GQA ratios (e.g., TranslateGemma 4B)
quantized_gemma3.rs: Added clear_kv_cache() method for multi-turn inference