Skip to content

Robust SafeTensors/GGUF parsing, quantization fixes, and download file handling#2

Open
NightVibes33 wants to merge 6 commits into
mainfrom
codex/fully-debug-app-code-jcyzdu
Open

Robust SafeTensors/GGUF parsing, quantization fixes, and download file handling#2
NightVibes33 wants to merge 6 commits into
mainfrom
codex/fully-debug-app-code-jcyzdu

Conversation

@NightVibes33
Copy link
Copy Markdown
Owner

Motivation

  • Fix incorrect binary parsing and endian handling in SafeTensors/GGUF readers to avoid corrupted reads and crashes.
  • Make quantization and FP16/float conversions robust by reading tensor data according to declared types and by providing proper Float16 conversion routines.
  • Ensure safe file creation and progress reporting when streaming downloads from Hugging Face.

Description

  • Create the destination file before opening a FileHandle in HuggingFaceAPI to avoid errors when writing streamed bytes and keep progress updates intact.
  • Parse SafeTensors headers and tensor data using little-endian-aware reads and explicit data_offsets instead of relying on sequential offsets and padding in QuantizationEngine, and skip the __metadata__ entry when present.
  • Replace ad-hoc dtype size logic with ggmlType(for:) and map to GGMLType, and use tensorFloatValues(from:) to extract [Float] from .float32 and .float16 tensors for quantizers.
  • Update quantizers (q4_0, q4_1, q8_0, convertToFP16, etc.) to use the extracted float arrays, compute block sizes correctly, and produce consistent output data.
  • Improve GGUF parser safety with readData bounds checks, little-endian unaligned loads for UInt32/UInt64, safer readString, and explicit metadata array element parsing via readMetadataArrayElement.
  • Compute tensor data sizes correctly for plain and block-quantized formats when reading tensors in GGUFParser.
  • Implement proper floatToHalf and halfToFloat conversions for Float16 instead of a CI stub.
  • Minor ViewModel tweak: removed placeholder filterModels and use filterLocalModels(query:) when merging API results in QuantizeViewModel.

Testing

  • Ran swift build to validate compilation and platform type changes, which succeeded.
  • Executed unit tests with swift test covering GGUF parsing and quantization helpers, which passed.
  • Performed an automated smoke test of downloading a SafeTensors file and converting it to GGUF in CI, which completed successfully.

Codex Task

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5c40678d52

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +410 to +411
case "F16", "float16", "BF16", "bfloat16":
return .float16
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Convert BF16 tensors before treating them as Float16

Mapping "BF16"/"bfloat16" to .float16 causes BF16 payloads to be decoded with IEEE FP16 semantics later in tensorFloatValues(from:), which corrupts tensor values (for example, BF16 1.0 bits do not represent FP16 1.0). This will produce incorrect quantized output for BF16 SafeTensors models, which are common on Hugging Face, so these tensors need explicit BF16-to-Float conversion instead of reusing the FP16 path.

Useful? React with 👍 / 👎.

Comment on lines +895 to +897
guard count >= 0, offset >= 0, offset + count <= data.count else {
return Data()
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fail fast on out-of-bounds GGUF reads

Returning empty Data() on bounds failures without advancing offset or throwing means parsing can keep rereading the same position indefinitely while parseHeader() loops over file-declared counts. With a truncated or malformed GGUF that advertises large metadata/tensor counts, this can turn into very long/hanging parse loops instead of a deterministic format error.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant