Robust SafeTensors/GGUF parsing, quantization fixes, and download file handling#2
NightVibes33 wants to merge 6 commits into
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5c40678d52
    case "F16", "float16", "BF16", "bfloat16":
        return .float16
Convert BF16 tensors before treating them as Float16
Mapping "BF16"/"bfloat16" to .float16 causes BF16 payloads to be decoded with IEEE FP16 semantics later in tensorFloatValues(from:), which corrupts tensor values: the BF16 bit pattern for 1.0 is 0x3F80, which decodes as 1.875 under FP16. This produces incorrect quantized output for BF16 SafeTensors models, which are common on Hugging Face, so these tensors need an explicit BF16-to-Float conversion instead of reusing the FP16 path.
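One possible shape for that conversion is sketched below. It relies on the fact that a bfloat16 value is the upper 16 bits of an IEEE 754 binary32 value, so widening is a 16-bit left shift. The names bfloat16ToFloat and decodeBF16 are hypothetical and do not come from this PR, and the sketch assumes the payload is stored little-endian, as SafeTensors does.

```swift
import Foundation

/// Widens one bfloat16 bit pattern to Float.
/// BF16 keeps the sign, the full 8-bit exponent, and the top 7 mantissa bits
/// of a binary32 value, so shifting left by 16 reconstructs it exactly.
func bfloat16ToFloat(_ bits: UInt16) -> Float {
    Float(bitPattern: UInt32(bits) << 16)
}

/// Decodes a little-endian BF16 payload into [Float].
/// Hypothetical helper for illustration; a trailing odd byte is ignored.
func decodeBF16(_ data: Data) -> [Float] {
    let bytes = [UInt8](data)
    var values: [Float] = []
    values.reserveCapacity(bytes.count / 2)
    var i = 0
    while i + 1 < bytes.count {
        // Assemble the 16-bit pattern byte by byte so alignment never matters.
        let bits = UInt16(bytes[i]) | (UInt16(bytes[i + 1]) << 8)
        values.append(bfloat16ToFloat(bits))
        i += 2
    }
    return values
}
```

With a helper like this, the dtype mapping could return a distinct BF16 case and route those tensors through the widening path before quantization, rather than sharing the FP16 decoder.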
    guard count >= 0, offset >= 0, offset + count <= data.count else {
        return Data()
    }
Fail fast on out-of-bounds GGUF reads
Returning empty Data() on bounds failures without advancing offset or throwing means parsing can keep rereading the same position indefinitely while parseHeader() loops over file-declared counts. With a truncated or malformed GGUF that advertises large metadata/tensor counts, this can turn into very long/hanging parse loops instead of a deterministic format error.
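A minimal sketch of the fail-fast alternative follows, assuming a small cursor type that throws on any out-of-range read. ByteReader and GGUFReadError are hypothetical names for illustration and are not the parser in this PR. Because the error propagates, a loop over file-declared counts stops at the first bad read instead of spinning on the same offset.

```swift
import Foundation

/// Hypothetical error type for this sketch.
enum GGUFReadError: Error, Equatable {
    case outOfBounds(offset: Int, count: Int, available: Int)
}

/// Minimal cursor over a byte buffer (illustrative only).
struct ByteReader {
    let data: Data
    var offset = 0

    init(_ data: Data) { self.data = data }

    /// Returns `count` bytes and advances the cursor,
    /// or throws and leaves the cursor where it was.
    mutating func readData(count: Int) throws -> Data {
        // Comparing against `data.count - offset` avoids overflow in `offset + count`.
        guard count >= 0, offset >= 0, offset <= data.count,
              count <= data.count - offset else {
            throw GGUFReadError.outOfBounds(offset: offset, count: count,
                                            available: data.count)
        }
        let start = data.startIndex + offset
        let chunk = data.subdata(in: start..<(start + count))
        offset += count
        return chunk
    }

    /// Reads a little-endian UInt32 byte by byte, so alignment never matters.
    mutating func readUInt32() throws -> UInt32 {
        let bytes = try readData(count: 4)
        return bytes.reversed().reduce(UInt32(0)) { ($0 << 8) | UInt32($1) }
    }
}
```

A caller such as parseHeader() would mark its reads with try and let the error surface as a format error for truncated or malformed files.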
Motivation
Description
- […] FileHandle in HuggingFaceAPI to avoid errors when writing streamed bytes and keep progress updates intact.
- […] data_offsets instead of relying on sequential offsets and padding in QuantizationEngine, and skip the __metadata__ entry when present.
- […] ggmlType(for:) and map to GGMLType, and use tensorFloatValues(from:) to extract [Float] from .float32 and .float16 tensors for quantizers.
- […] (q4_0, q4_1, q8_0, convertToFP16, etc.) to use the extracted float arrays, compute block sizes correctly, and produce consistent output data.
- […] readData bounds checks, little-endian unaligned loads for UInt32/UInt64, safer readString, and explicit metadata array element parsing via readMetadataArrayElement.
- […] GGUFParser.floatToHalf and halfToFloat conversions for Float16 instead of a CI stub.
- […] filterModels and use filterLocalModels(query:) when merging API results in QuantizeViewModel.
Testing
- […] swift build to validate compilation and platform type changes, which succeeded.
- […] swift test covering GGUF parsing and quantization helpers, which passed.
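To illustrate the data_offsets point from the description, here is a hedged sketch of resolving tensor byte ranges from a SafeTensors header. It assumes the published layout: an 8-byte little-endian header length, a JSON header, then the tensor payload, with each data_offsets pair relative to the start of that payload. TensorSlice, SafeTensorsError, and parseSafeTensorsIndex are hypothetical names and are not the code in QuantizationEngine.

```swift
import Foundation

/// Hypothetical record of where one tensor lives in the file.
struct TensorSlice: Equatable {
    let dtype: String
    let shape: [Int]
    let range: Range<Int>   // absolute byte range within the file
}

enum SafeTensorsError: Error { case malformed(String) }

/// One tensor entry as it appears in the JSON header.
struct Entry: Decodable {
    let dtype: String
    let shape: [Int]
    let data_offsets: [Int]
}

/// Wrapper that tolerates non-tensor values such as `__metadata__`.
struct MaybeEntry: Decodable {
    let entry: Entry?
    init(from decoder: Decoder) throws { entry = try? Entry(from: decoder) }
}

func parseSafeTensorsIndex(_ file: Data) throws -> [String: TensorSlice] {
    let bytes = [UInt8](file)
    guard bytes.count >= 8 else { throw SafeTensorsError.malformed("missing header length") }
    // First 8 bytes: little-endian UInt64 giving the JSON header size.
    let length = bytes[0..<8].reversed().reduce(UInt64(0)) { ($0 << 8) | UInt64($1) }
    guard length <= UInt64(bytes.count - 8) else {
        throw SafeTensorsError.malformed("header exceeds file size")
    }
    let dataStart = 8 + Int(length)
    let header = try JSONDecoder().decode([String: MaybeEntry].self,
                                          from: Data(bytes[8..<dataStart]))
    let payloadSize = bytes.count - dataStart

    var index: [String: TensorSlice] = [:]
    for (name, value) in header where name != "__metadata__" {
        // Use the declared offsets rather than assuming tensors are packed in order.
        guard let e = value.entry, e.data_offsets.count == 2,
              e.data_offsets[0] >= 0, e.data_offsets[0] <= e.data_offsets[1],
              e.data_offsets[1] <= payloadSize else {
            throw SafeTensorsError.malformed("bad entry for \(name)")
        }
        index[name] = TensorSlice(dtype: e.dtype, shape: e.shape,
                                  range: (dataStart + e.data_offsets[0])..<(dataStart + e.data_offsets[1]))
    }
    return index
}
```

Validating each range against the payload size up front means later slicing cannot run past the end of the file, and gaps or padding between tensors need no special handling.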