Large Llama3 GGUF models can emit incoherent text even when decoding stops normally.
Root cause found in the portable path: Llama3 needed the correct chat prologue and normal RoPE handling, not the Qwen/Phi NEOX RoPE path. The downstream app reproduced this with Meta-Llama-3.1-8B-Instruct-Q4_K_M and verified the fix with a deterministic 123+456 smoke returning 579.
Follow-up: keep the Llama3 prompt/RoPE behavior covered on the pie.app/v1-base-shmem line after PR #388.
Large Llama3 GGUF models can emit incoherent text even when decoding stops normally.
Root cause found in the portable path: Llama3 needed the correct chat prologue and normal RoPE handling, not the Qwen/Phi NEOX RoPE path. The downstream app reproduced this with Meta-Llama-3.1-8B-Instruct-Q4_K_M and verified the fix with a deterministic 123+456 smoke returning 579.
Follow-up: keep the Llama3 prompt/RoPE behavior covered on the pie.app/v1-base-shmem line after PR #388.