Skip to content

Llama3 GGUF emits incoherent output #389

Description

@shsym

Large Llama3 GGUF models can emit incoherent text even when decoding stops normally.

Root cause found in the portable path: Llama3 needed the correct chat prologue and normal RoPE handling, not the Qwen/Phi NEOX RoPE path. The downstream app reproduced this with Meta-Llama-3.1-8B-Instruct-Q4_K_M and verified the fix with a deterministic 123+456 smoke returning 579.

Follow-up: keep the Llama3 prompt/RoPE behavior covered on the pie.app/v1-base-shmem line after PR #388.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions