feat: add M3 model loading and llama runner#14
Conversation
Add mmap-backed GGUF and safetensors tensor directory loading, expose tensor views through the public C API, and add a CLI inspect command for the new loader seam. Signed-off-by: Zach Bennett <zach@worldflowai.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a complete Llama inference stack: POSIX ChangesLlama inference stack: model loading, tokenization, runtime, and CLI
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Parse GGUF metadata into model-file views, expose scalar/string accessors, and cover tokenizer string arrays in the model loader tests. Signed-off-by: Zach Bennett <zach@worldflowai.com>
Add a GGUF llama tokenizer backed by retained metadata, token buffer APIs, encode/decode helpers, and synthetic tokenizer fixtures. Signed-off-by: Zach Bennett <zach@worldflowai.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/model/model_file.c (1)
681-695:⚠️ Potential issue | 🔴 CriticalDo not reject valid JSON strings with escapes.
Safetensors headers are JSON, but
json_string()fails on any\. This rejects valid files with escaped tensor names/keys or escaped string values inside__metadata__. The code needs escape-aware string consumption and decoding for strings whose logical value is used as a tensor name/key.The error message "escaped/control JSON strings are not supported yet" suggests this is a known limitation, but there is no documentation explicitly restricting safetensors headers to unescaped JSON only. Any safetensors file with escaped characters (e.g.,
"format":"p\"t") in the header will fail to load.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/model/model_file.c` around lines 681 - 695, The `json_string()` function currently rejects any backslash character in JSON strings, which prevents parsing valid escaped characters like `\"` or `\\`. Instead of treating backslashes as an error, modify the while loop logic to properly handle escape sequences by detecting when a backslash is encountered and then skipping over both the backslash and the following escaped character to continue parsing. This will allow valid JSON escape sequences in safetensors headers (tensor names, keys, and metadata values) to be properly consumed and decoded.
🧹 Nitpick comments (2)
include/peregrine/model.h (2)
47-55: ⚡ Quick winAdd a bool conversion helper for the public bool metadata type.
The API exposes
PG_METADATA_TYPE_BOOL, but callers must inspectvalue->as.booleandirectly because there is nopg_metadata_as_bool()helper alongside the other scalar conversions.API shape suggestion
int pg_metadata_as_u64(const PgMetadataValue *value, uint64_t *out); int pg_metadata_as_i64(const PgMetadataValue *value, int64_t *out); int pg_metadata_as_f64(const PgMetadataValue *value, double *out); +int pg_metadata_as_bool(const PgMetadataValue *value, int *out); int pg_metadata_as_string(const PgMetadataValue *value, PgStringView *out); int pg_metadata_array_string(const PgMetadataValue *value, size_t idx, PgStringView *out);Also applies to: 110-115
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@include/peregrine/model.h` around lines 47 - 55, Add a `pg_metadata_as_bool()` helper function to match the pattern of other scalar metadata conversion helpers like those for u64, i64, and f64 types. This function should accept a metadata value parameter and return the boolean value from the metadata structure, allowing callers to safely convert PG_METADATA_TYPE_BOOL values without directly accessing the internal union field.
19-22: ⚡ Quick winDocument the
PgStringViewlifetime and terminator contract.
PgStringViewis now used for metadata keys/values, but only tensor names are documented as non-NUL-terminated. Add the non-owning/lifetime contract so API users do not treatdataas a C string or keep it afterpg_model_file_free.Suggested header wording
+/* Non-owning byte view. `data` is not NUL-terminated and remains valid only + * while the `PgModelFile` that returned it remains alive. */ typedef struct PgStringView { const char *data; size_t len; } PgStringView;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@include/peregrine/model.h` around lines 19 - 22, The PgStringView struct typedef lacks documentation about its lifetime and NUL-termination contract, which can lead to API misuse. Add a documentation comment above the PgStringView typedef that clearly documents: it is a non-owning view into string data, the data is not guaranteed to be NUL-terminated, the pointer validity is bounded by the owning structure's lifetime, users should not treat it as a C string, and the pointer becomes invalid after pg_model_file_free is called.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/model/model_file.c`:
- Around line 356-458: The code performs calloc() allocations for metadata
arrays without first validating that the multiplication of count and the element
size would not overflow. Before each calloc() call in the switch statement for
cases PG_METADATA_TYPE_U64, PG_METADATA_TYPE_I64, PG_METADATA_TYPE_F64,
PG_METADATA_TYPE_BOOL, and PG_METADATA_TYPE_STRING, add an overflow check that
verifies count does not exceed SIZE_MAX divided by the size of each array type's
element (e.g., count > SIZE_MAX / sizeof(uint64_t) for the U64 case). If the
check fails, call reader_err() with an appropriate message and return false
before attempting the allocation.
---
Outside diff comments:
In `@src/model/model_file.c`:
- Around line 681-695: The `json_string()` function currently rejects any
backslash character in JSON strings, which prevents parsing valid escaped
characters like `\"` or `\\`. Instead of treating backslashes as an error,
modify the while loop logic to properly handle escape sequences by detecting
when a backslash is encountered and then skipping over both the backslash and
the following escaped character to continue parsing. This will allow valid JSON
escape sequences in safetensors headers (tensor names, keys, and metadata
values) to be properly consumed and decoded.
---
Nitpick comments:
In `@include/peregrine/model.h`:
- Around line 47-55: Add a `pg_metadata_as_bool()` helper function to match the
pattern of other scalar metadata conversion helpers like those for u64, i64, and
f64 types. This function should accept a metadata value parameter and return the
boolean value from the metadata structure, allowing callers to safely convert
PG_METADATA_TYPE_BOOL values without directly accessing the internal union
field.
- Around line 19-22: The PgStringView struct typedef lacks documentation about
its lifetime and NUL-termination contract, which can lead to API misuse. Add a
documentation comment above the PgStringView typedef that clearly documents: it
is a non-owning view into string data, the data is not guaranteed to be
NUL-terminated, the pointer validity is bounded by the owning structure's
lifetime, users should not treat it as a C string, and the pointer becomes
invalid after pg_model_file_free is called.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 97ac3c8b-d825-4b12-9a73-0d3ecf9d861e
📒 Files selected for processing (3)
include/peregrine/model.hsrc/model/model_file.ctests/test_model_file.c
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_model_file.c
| if (!rd_u32(r, &elem_raw_type) || !rd_u64(r, &len)) | ||
| return false; | ||
| elem_type = gguf_metadata_type(elem_raw_type); | ||
| if (elem_type == PG_METADATA_TYPE_UNKNOWN || elem_type == PG_METADATA_TYPE_ARRAY) { | ||
| reader_err(r, "unsupported GGUF metadata array type %u", elem_raw_type); | ||
| return false; | ||
| } | ||
| if (len > SIZE_MAX) { | ||
| reader_err(r, "metadata array too large"); | ||
| return false; | ||
| } | ||
|
|
||
| count = (size_t)len; | ||
| value->type = PG_METADATA_TYPE_ARRAY; | ||
| value->elem_type = elem_type; | ||
| value->count = count; | ||
|
|
||
| switch (elem_type) { | ||
| case PG_METADATA_TYPE_U64: { | ||
| uint64_t *arr = calloc(count ? count : 1, sizeof(*arr)); | ||
| if (!arr) { | ||
| reader_err(r, "out of memory"); | ||
| return false; | ||
| } | ||
| for (i = 0; i < count; i++) { | ||
| PgMetadataValue elem; | ||
| if (!gguf_read_scalar(r, elem_raw_type, &elem)) { | ||
| free(arr); | ||
| return false; | ||
| } | ||
| arr[i] = elem.as.u64; | ||
| } | ||
| value->as.u64_array = arr; | ||
| return true; | ||
| } | ||
| case PG_METADATA_TYPE_I64: { | ||
| int64_t *arr = calloc(count ? count : 1, sizeof(*arr)); | ||
| if (!arr) { | ||
| reader_err(r, "out of memory"); | ||
| return false; | ||
| } | ||
| for (i = 0; i < count; i++) { | ||
| PgMetadataValue elem; | ||
| if (!gguf_read_scalar(r, elem_raw_type, &elem)) { | ||
| free(arr); | ||
| return false; | ||
| } | ||
| arr[i] = elem.as.i64; | ||
| } | ||
| value->as.i64_array = arr; | ||
| return true; | ||
| } | ||
| case PG_METADATA_TYPE_F64: { | ||
| double *arr = calloc(count ? count : 1, sizeof(*arr)); | ||
| if (!arr) { | ||
| reader_err(r, "out of memory"); | ||
| return false; | ||
| } | ||
| for (i = 0; i < count; i++) { | ||
| PgMetadataValue elem; | ||
| if (!gguf_read_scalar(r, elem_raw_type, &elem)) { | ||
| free(arr); | ||
| return false; | ||
| } | ||
| arr[i] = elem.as.f64; | ||
| } | ||
| value->as.f64_array = arr; | ||
| return true; | ||
| } | ||
| case PG_METADATA_TYPE_BOOL: { | ||
| int *arr = calloc(count ? count : 1, sizeof(*arr)); | ||
| if (!arr) { | ||
| reader_err(r, "out of memory"); | ||
| return false; | ||
| } | ||
| for (i = 0; i < count; i++) { | ||
| PgMetadataValue elem; | ||
| if (!gguf_read_scalar(r, elem_raw_type, &elem)) { | ||
| free(arr); | ||
| return false; | ||
| } | ||
| arr[i] = elem.as.boolean; | ||
| } | ||
| value->as.bool_array = arr; | ||
| return true; | ||
| } | ||
| case PG_METADATA_TYPE_STRING: { | ||
| PgStringView *arr = calloc(count ? count : 1, sizeof(*arr)); | ||
| if (!arr) { | ||
| reader_err(r, "out of memory"); | ||
| return false; | ||
| } | ||
| for (i = 0; i < count; i++) { | ||
| PgMetadataValue elem; | ||
| if (!gguf_read_scalar(r, elem_raw_type, &elem)) { | ||
| free(arr); | ||
| return false; | ||
| } | ||
| arr[i] = elem.as.string; | ||
| } | ||
| value->as.string_array = arr; | ||
| return true; | ||
| } |
There was a problem hiding this comment.
Preflight GGUF metadata array sizes before allocating.
len is only narrowed to size_t before the per-type calloc(). A malformed GGUF can declare a huge array and force a large allocation attempt before the reader discovers the file does not contain that many elements. Check the minimum encoded payload size against r->end - r->p, and check count <= SIZE_MAX / sizeof(*arr) for each allocated array type before calling calloc().
Guard the declared array length before allocation
+static size_t gguf_scalar_min_size(uint32_t raw_type)
+{
+ switch (raw_type) {
+ case 0:
+ case 1:
+ case 7:
+ return 1;
+ case 2:
+ case 3:
+ return 2;
+ case 4:
+ case 5:
+ case 6:
+ return 4;
+ case 8:
+ return 8; /* string length prefix; payload is checked by rd_string */
+ case 10:
+ case 11:
+ case 12:
+ return 8;
+ default:
+ return 0;
+ }
+}
+
count = (size_t)len;
+ {
+ size_t min_size = gguf_scalar_min_size(elem_raw_type);
+ size_t min_bytes;
+
+ if (min_size == 0 ||
+ !checked_mul_size(count, min_size, &min_bytes) ||
+ min_bytes > (size_t)(r->end - r->p)) {
+ reader_err(r, "GGUF metadata array extends past end of file");
+ return false;
+ }
+ }
value->type = PG_METADATA_TYPE_ARRAY;Apply the same explicit count > SIZE_MAX / sizeof(*arr) guard before each per-type calloc().
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/model/model_file.c` around lines 356 - 458, The code performs calloc()
allocations for metadata arrays without first validating that the multiplication
of count and the element size would not overflow. Before each calloc() call in
the switch statement for cases PG_METADATA_TYPE_U64, PG_METADATA_TYPE_I64,
PG_METADATA_TYPE_F64, PG_METADATA_TYPE_BOOL, and PG_METADATA_TYPE_STRING, add an
overflow check that verifies count does not exceed SIZE_MAX divided by the size
of each array type's element (e.g., count > SIZE_MAX / sizeof(uint64_t) for the
U64 case). If the check fails, call reader_err() with an appropriate message and
return false before attempting the allocation.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/token/tokenizer.c`:
- Around line 259-262: The tokenizer vocabulary is borrowing token text memory
directly from the model file without copying it, creating a dangling pointer
risk if the model is freed before the tokenizer. In the loop where
tok->vocab[i].text is assigned from tokens_entry->value.as.string_array[i], you
need to allocate and copy the string data instead of storing the borrowed
pointer. Ensure proper memory allocation for each token text and use string copy
functions to make the tokenizer own the memory. Apply the same fix to the second
occurrence mentioned in lines 272-279 where similar borrowing occurs.
- Around line 67-68: The buffer capacity doubling arithmetic at line 67
(buf->capacity * 2) and the size multiplication at line 68 (cap * sizeof(*next))
lack overflow protection, which can cause integer wraparound on large inputs and
result in under-allocation followed by out-of-bounds writes. Add overflow checks
before both arithmetic operations to ensure buf->capacity * 2 and cap *
sizeof(*next) do not overflow; if overflow is detected, handle the error
appropriately (such as returning an error or terminating gracefully). Apply the
same fix to the other buffer growth locations mentioned at lines 299-301,
344-346, and 398.
- Around line 238-240: The three metadata_i32 calls for
"tokenizer.ggml.bos_token_id", "tokenizer.ggml.eos_token_id", and
"tokenizer.ggml.unknown_token_id" are casting their return values to void, which
silently ignores any parsing or validation failures. Instead of discarding these
return values, capture the return status from each metadata_i32 call for
tok->bos_id, tok->eos_id, and tok->unk_id, then validate that when these special
token metadata keys are present in the file, they contain valid values (not
malformed or out-of-range). If validation fails, log an appropriate error and
fail the tokenizer initialization rather than falling back to defaults with
potentially corrupt data.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1e828d22-563d-4b03-af90-62f476e8aa16
📒 Files selected for processing (7)
Makefile.bootstrapinclude/peregrine/peregrine.hinclude/peregrine/tokenizer.hmeson.buildsrc/token/tokenizer.ctests/meson.buildtests/test_tokenizer.c
✅ Files skipped from review due to trivial changes (1)
- include/peregrine/peregrine.h
🚧 Files skipped from review as they are similar to previous changes (2)
- meson.build
- Makefile.bootstrap
| for (i = 0; i < tok->vocab_size; i++) { | ||
| tok->vocab[i].text = tokens_entry->value.as.string_array[i]; | ||
| tok->vocab[i].score = scores_entry ? scores_entry->value.as.f64_array[i] : 0.0; | ||
| } |
There was a problem hiding this comment.
PgTokenizer currently borrows token text memory from PgModelFile, creating a dangling-pointer risk.
Line 260 stores PgStringView pointers owned by the model metadata. If the model is freed before the tokenizer, later encode/decode reads invalid memory.
Suggested fix (make tokenizer own token text)
struct PgTokenizer {
TokenEntry *vocab;
+ char **owned_text;
size_t vocab_size;
@@
tok->vocab = calloc(tok->vocab_size ? tok->vocab_size : 1, sizeof(*tok->vocab));
+ tok->owned_text = calloc(tok->vocab_size ? tok->vocab_size : 1, sizeof(*tok->owned_text));
- if (!tok->vocab) {
+ if (!tok->vocab || !tok->owned_text) {
pg_tokenizer_free(tok);
set_err(err, err_len, "out of memory");
return NULL;
}
@@
for (i = 0; i < tok->vocab_size; i++) {
- tok->vocab[i].text = tokens_entry->value.as.string_array[i];
+ PgStringView src = tokens_entry->value.as.string_array[i];
+ char *dup = malloc(src.len ? src.len : 1);
+ if (!dup) {
+ pg_tokenizer_free(tok);
+ set_err(err, err_len, "out of memory");
+ return NULL;
+ }
+ if (src.len)
+ memcpy(dup, src.data, src.len);
+ tok->owned_text[i] = dup;
+ tok->vocab[i].text.data = dup;
+ tok->vocab[i].text.len = src.len;
tok->vocab[i].score = scores_entry ? scores_entry->value.as.f64_array[i] : 0.0;
}
@@
void pg_tokenizer_free(PgTokenizer *tok)
{
+ size_t i;
if (!tok)
return;
+ for (i = 0; i < tok->vocab_size; i++)
+ free(tok->owned_text ? tok->owned_text[i] : NULL);
+ free(tok->owned_text);
free(tok->vocab);
free(tok->hash_ids);
free(tok);
}Also applies to: 272-279
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/token/tokenizer.c` around lines 259 - 262, The tokenizer vocabulary is
borrowing token text memory directly from the model file without copying it,
creating a dangling pointer risk if the model is freed before the tokenizer. In
the loop where tok->vocab[i].text is assigned from
tokens_entry->value.as.string_array[i], you need to allocate and copy the string
data instead of storing the borrowed pointer. Ensure proper memory allocation
for each token text and use string copy functions to make the tokenizer own the
memory. Apply the same fix to the second occurrence mentioned in lines 272-279
where similar borrowing occurs.
Signed-off-by: Zach Bennett <zach@worldflowai.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/cli/peregrine.c (1)
77-89: 💤 Low value
strtoulsilently accepts negative numbers.
strtoul("-5", ...)wraps to a very large positive value (≈ULONG_MAX) instead of failing. If a user accidentally passes a negative-nvalue, generation would run for an extremely long time.Consider adding a leading minus check:
Proposed defensive check
static int parse_size_arg(const char *s, size_t *out) { char *end = NULL; unsigned long v; - if (!s || !*s) + if (!s || !*s || *s == '-') return -1; v = strtoul(s, &end, 10); if (!end || *end != '\0') return -1; *out = (size_t)v; return 0; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cli/peregrine.c` around lines 77 - 89, The parse_size_arg function does not validate against negative input values, allowing strtoul to silently wrap negative numbers to large positive values. After the initial null/empty string check in parse_size_arg, add a validation check to detect if the first character of the input string s is a minus sign and return an error code if it is. This prevents negative values from being passed to strtoul and wrapped to unexpectedly large positive numbers.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/cli/peregrine.c`:
- Line 191: The main usage text for the `run` subcommand is missing the optional
`-n <tokens>` argument which is documented in the internal usage text. Update
the format string in the help text (the line containing " %s run -m
<model.gguf> -p <txt> run greedy f32 GGUF inference\n") to include the optional
`-n <tokens>` option between the existing options and the description, ensuring
consistency with the internal usage documentation shown at lines 116/121.
---
Nitpick comments:
In `@src/cli/peregrine.c`:
- Around line 77-89: The parse_size_arg function does not validate against
negative input values, allowing strtoul to silently wrap negative numbers to
large positive values. After the initial null/empty string check in
parse_size_arg, add a validation check to detect if the first character of the
input string s is a minus sign and return an error code if it is. This prevents
negative values from being passed to strtoul and wrapped to unexpectedly large
positive numbers.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: caf89cdb-2416-4c5f-afbf-05d66876e7d3
📒 Files selected for processing (12)
.github/workflows/ci.ymlMakefile.bootstrapinclude/peregrine/llama.hinclude/peregrine/peregrine.hmeson.buildsrc/cli/peregrine.csrc/runtime/llama.csrc/tensor/kernels/rope/rope_init.csrc/tensor/kernels/rope/x86/rope_avx2.asmtests/checkasm/checkasm_rope.ctests/meson.buildtests/test_llama.c
✅ Files skipped from review due to trivial changes (1)
- tests/checkasm/checkasm_rope.c
🚧 Files skipped from review as they are similar to previous changes (1)
- include/peregrine/peregrine.h
Signed-off-by: Zach Bennett <zach@worldflowai.com>
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/runtime/llama.c (2)
323-334:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReject invalid RMSNorm/RoPE scalar metadata at load time.
NaN,inf, zero, or negative values currently load successfully and then propagate invalid logits through RMSNorm/RoPE. Validate the casted floats before accepting the model.Proposed validation
if (metadata_f64_default(m->file, "llama.attention.layer_norm_rms_epsilon", 1.0e-5, &f) != 0) { set_err(err, err_len, "invalid metadata 'llama.attention.layer_norm_rms_epsilon'"); return -1; } m->norm_eps = (float)f; + if (!isfinite(m->norm_eps) || m->norm_eps <= 0.0f) { + set_err(err, err_len, + "invalid metadata 'llama.attention.layer_norm_rms_epsilon'"); + return -1; + } if (metadata_f64_default(m->file, "llama.rope.freq_base", 10000.0, &f) != 0) { set_err(err, err_len, "invalid metadata 'llama.rope.freq_base'"); return -1; } m->rope_theta = (float)f; + if (!isfinite(m->rope_theta) || m->rope_theta <= 0.0f) { + set_err(err, err_len, "invalid metadata 'llama.rope.freq_base'"); + return -1; + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/runtime/llama.c` around lines 323 - 334, Add validation checks immediately after casting the metadata values to float in both the norm_eps and rope_theta assignments. After the cast in the line assigning to m->norm_eps (the layer_norm_rms_epsilon value), validate that the float is not NaN, not infinite, and greater than zero. Similarly, after the cast in the line assigning to m->rope_theta (the rope.freq_base value), validate that the float is not NaN, not infinite, and greater than zero. If any validation fails, set an appropriate error message using set_err and return -1 to reject the invalid model at load time.
375-380:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDo not silently tie embeddings when
output.weightis present but invalid.Any shape/type mismatch from
load_mat()falls back totoken_embd.weight, masking malformed or unsupported GGUF tensors and producing wrong logits. Only use tied embeddings whenoutput.weightis actually absent.Proposed fix
- if (load_mat(m->file, "output.weight", m->vocab, m->n_embd, - &m->output, NULL, 0) != 0) { + if (pg_model_file_find_tensor(m->file, "output.weight", + strlen("output.weight"))) { + if (load_mat(m->file, "output.weight", m->vocab, m->n_embd, + &m->output, err, err_len) != 0) + return -1; + } else { m->output.data = m->token_embd; m->output.rows = m->vocab; m->output.cols = m->n_embd; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/runtime/llama.c` around lines 375 - 380, The current code at the load_mat call for output.weight unconditionally falls back to using token_embd whenever load_mat returns non-zero, which masks shape and type mismatches. Distinguish between output.weight being genuinely absent from the file versus being present but having incompatible shape or type. Only apply the fallback that ties output to token_embd when output.weight is actually missing; when load_mat detects a shape or type mismatch, the error should propagate so the model loading fails rather than silently producing incorrect logits.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@include/peregrine/llama.h`:
- Around line 20-24: The comments in the PgSamplerParams struct definition on
lines 21 and 23 claim that temperature <= 0 enables greedy mode and top_p <= 0
disables sampling, but the pg_sampler_new() constructor validation actually
rejects temperature < 0 and top_p < 0. Update the comments for the temperature
and top_p fields in the struct to reflect the actual validation behavior by
changing the threshold from <= 0 to < 0, so the documented contract matches what
the constructor enforces.
In `@src/cli/peregrine.c`:
- Around line 91-103: The parse_u64_arg function does not validate for negative
inputs or overflow conditions. Add a check to reject inputs that start with a
minus sign (negative values) before calling strtoull to prevent signed values
like -1 from being wrapped into large unsigned values. Additionally, check the
errno variable after the strtoull call to detect overflow conditions and return
an error code if errno is set to ERANGE, ensuring that values exceeding the
uint64_t range are properly rejected instead of being accepted as valid.
In `@tests/test_llama_cpp_parity.py`:
- Around line 17-23: The env_path function only validates that a path exists but
does not verify it is an executable file, allowing directories or non-executable
files to pass validation and fail later during subprocess execution. Add
validation in the env_path function to check that the path is both a regular
file (using path.is_file()) and is executable (using os.access(path, os.X_OK))
before returning it, ensuring only valid executable paths are returned and
preventing downstream subprocess errors.
---
Outside diff comments:
In `@src/runtime/llama.c`:
- Around line 323-334: Add validation checks immediately after casting the
metadata values to float in both the norm_eps and rope_theta assignments. After
the cast in the line assigning to m->norm_eps (the layer_norm_rms_epsilon
value), validate that the float is not NaN, not infinite, and greater than zero.
Similarly, after the cast in the line assigning to m->rope_theta (the
rope.freq_base value), validate that the float is not NaN, not infinite, and
greater than zero. If any validation fails, set an appropriate error message
using set_err and return -1 to reject the invalid model at load time.
- Around line 375-380: The current code at the load_mat call for output.weight
unconditionally falls back to using token_embd whenever load_mat returns
non-zero, which masks shape and type mismatches. Distinguish between
output.weight being genuinely absent from the file versus being present but
having incompatible shape or type. Only apply the fallback that ties output to
token_embd when output.weight is actually missing; when load_mat detects a shape
or type mismatch, the error should propagate so the model loading fails rather
than silently producing incorrect logits.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: dbad5b90-4d27-438c-ab54-8f81942fa456
📒 Files selected for processing (10)
include/peregrine/llama.hmeson.buildsrc/cli/peregrine.csrc/runtime/llama.csrc/token/tokenizer.ctests/meson.buildtests/test_llama.ctests/test_llama_cpp_parity.pytests/test_tokenizer.ctools/llama_cpp_parity.py
🚧 Files skipped from review as they are similar to previous changes (2)
- meson.build
- tests/test_tokenizer.c
| typedef struct PgSamplerParams { | ||
| float temperature; /* <= 0: greedy */ | ||
| size_t top_k; /* 0: disabled */ | ||
| float top_p; /* <= 0 or >= 1: disabled */ | ||
| uint64_t seed; |
There was a problem hiding this comment.
Align sampler parameter comments with constructor validation.
Line 21 and Line 23 document negative values as greedy/disabled, but pg_sampler_new() rejects temperature < 0 and top_p < 0. Update the comments or allow those values in the constructor so callers can rely on the public contract.
Proposed header-only fix
typedef struct PgSamplerParams {
- float temperature; /* <= 0: greedy */
+ float temperature; /* 0: greedy; < 0 invalid */
size_t top_k; /* 0: disabled */
- float top_p; /* <= 0 or >= 1: disabled */
+ float top_p; /* 0 or >= 1: disabled; < 0 invalid */
uint64_t seed;
} PgSamplerParams;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| typedef struct PgSamplerParams { | |
| float temperature; /* <= 0: greedy */ | |
| size_t top_k; /* 0: disabled */ | |
| float top_p; /* <= 0 or >= 1: disabled */ | |
| uint64_t seed; | |
| typedef struct PgSamplerParams { | |
| float temperature; /* 0: greedy; < 0 invalid */ | |
| size_t top_k; /* 0: disabled */ | |
| float top_p; /* 0 or >= 1: disabled; < 0 invalid */ | |
| uint64_t seed; |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@include/peregrine/llama.h` around lines 20 - 24, The comments in the
PgSamplerParams struct definition on lines 21 and 23 claim that temperature <= 0
enables greedy mode and top_p <= 0 disables sampling, but the pg_sampler_new()
constructor validation actually rejects temperature < 0 and top_p < 0. Update
the comments for the temperature and top_p fields in the struct to reflect the
actual validation behavior by changing the threshold from <= 0 to < 0, so the
documented contract matches what the constructor enforces.
| static int parse_u64_arg(const char *s, uint64_t *out) | ||
| { | ||
| char *end = NULL; | ||
| unsigned long long v; | ||
|
|
||
| if (!s || !*s) | ||
| return -1; | ||
| v = strtoull(s, &end, 0); | ||
| if (!end || *end != '\0') | ||
| return -1; | ||
| *out = (uint64_t)v; | ||
| return 0; | ||
| } |
There was a problem hiding this comment.
Reject signed/overflowed --seed inputs in parse_u64_arg.
parse_u64_arg currently accepts values like -1 (wrapped) and does not reject overflow via errno, so invalid seed input can be treated as a valid large value instead of returning usage error.
Suggested fix
+#include <errno.h>
@@
static int parse_u64_arg(const char *s, uint64_t *out)
{
char *end = NULL;
unsigned long long v;
if (!s || !*s)
return -1;
- v = strtoull(s, &end, 0);
- if (!end || *end != '\0')
+ if (*s == '-')
+ return -1;
+ errno = 0;
+ v = strtoull(s, &end, 10);
+ if (errno == ERANGE || end == s || !end || *end != '\0')
return -1;
*out = (uint64_t)v;
return 0;
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/cli/peregrine.c` around lines 91 - 103, The parse_u64_arg function does
not validate for negative inputs or overflow conditions. Add a check to reject
inputs that start with a minus sign (negative values) before calling strtoull to
prevent signed values like -1 from being wrapped into large unsigned values.
Additionally, check the errno variable after the strtoull call to detect
overflow conditions and return an error code if errno is set to ERANGE, ensuring
that values exceeding the uint64_t range are properly rejected instead of being
accepted as valid.
| def env_path(name): | ||
| value = os.environ.get(name) | ||
| if not value: | ||
| return None | ||
| path = pathlib.Path(value) | ||
| return path if path.exists() else None | ||
|
|
There was a problem hiding this comment.
Validate executable tool paths before invoking subprocesses.
Current checks only verify existence. A directory or non-executable file can pass validation and then fail at subprocess.run instead of cleanly skipping with a prerequisite message.
Suggested fix
def env_path(name):
@@
path = pathlib.Path(value)
return path if path.exists() else None
+
+
+def is_executable(path):
+ return path is not None and path.is_file() and os.access(path, os.X_OK)
@@
- if not peregrine.exists():
+ if not is_executable(peregrine):
return skip("peregrine CLI was not built")
@@
- if llama_tokenize is None:
+ if not is_executable(llama_tokenize):
return skip("set PG_LLAMA_CPP_TOKENIZE to llama-tokenize")
- if llama_debug is None:
+ if not is_executable(llama_debug):
return skip("set PG_LLAMA_CPP_DEBUG to llama-debug")Also applies to: 37-44
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/test_llama_cpp_parity.py` around lines 17 - 23, The env_path function
only validates that a path exists but does not verify it is an executable file,
allowing directories or non-executable files to pass validation and fail later
during subprocess execution. Add validation in the env_path function to check
that the path is both a regular file (using path.is_file()) and is executable
(using os.access(path, os.X_OK)) before returning it, ensuring only valid
executable paths are returned and preventing downstream subprocess errors.
Signed-off-by: Zach Bennett <zach@worldflowai.com>
Signed-off-by: Zach Bennett <zach@worldflowai.com>
|
M3 decode perf update (28f9a5d):
Validation run locally: ARM release/debug Meson tests, x86 Meson tests, bootstrap checkasm. GitHub arm64/x86_64/CodeQL/CodeRabbit are green. |
Signed-off-by: Zach Bennett <zach@worldflowai.com>
|
Update after |
What & why
Builds the first runnable M3 slice:
peregrine run -m model.gguf -p "..."now loads an open GGUF model, tokenizes a prompt, executes a f32 Llama-class decoder, and greedily emits tokens.include/peregrine/llama.hplus an f32 GGUF Llama decoder with contiguous KV cache.runcommand for greedy generation.rope_f32.avx2under forced AVX2.Current limits are explicit: f32 GGUF Llama only, greedy sampling only, approximate tokenizer parity, and unfused attention. Quantized/f16 decode, top-k/top-p/temp, and llama.cpp parity checks remain follow-ups.
Verification
Forced x86 AVX2 correctness for the new RoPE assembly is covered by the updated GitHub Actions QEMU/SDE expected-variant lists.
Sign-off
Signed-off-by:(DCO)Summary by CodeRabbit
peregrineCLI subcommands:inspect,tokenize,logits, and a fullrunloop for greedy generation.