Conversation
example/ (singular) was added to the build unconditionally but no CI job ever compiled it, so it drifted against the current factory layer-init API and no longer builds (cmake --build --preset examples on the default 'all' target fails). Delete it: examples/ (plural) is the canonical example location, and UnitTestMnistSmoke keeps MNIST end-to-end coverage in CI. Also switch the clang-format gate (ci.yml + its devenv.nix mirror) from the now-deleted 'example' to 'examples'. This removes the dangling find path and closes the pre-existing blind spot where the plural examples' C files were never format-checked (verified all currently clean). Closes #235 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… parity
The Conv1d/Linear/Conv1dTransposed factories hardcoded KAIMING_UNIFORM gain=sqrt(2) (He) for weights and zeroed biases - flagged in-code as deferred to 'Issue C'. That could not reproduce PyTorch's default init, so train-from-scratch parity demos diverged.
Add a weightInit_t {initScheme_t scheme; float gain;} field to each init struct (zero-init -> INIT_DEFAULT, mirroring the existing bias_t idiom). Route all three factories' weight+bias allocation through shared helpers initWeightTensor/initBiasTensor (new src/userApi/LayerCommon.c, now a compiled static lib).
Schemes: INIT_DEFAULT -> kaimingUniform(gain=sqrt(1/3), fan_in) = uniform(+/-1/sqrt(fan_in)), exactly PyTorch's kaiming_uniform_(a=sqrt(5)) weight default; bias (all schemes) -> uniform(+/-1/sqrt(fan_in)) per PyTorch; INIT_KAIMING_UNIFORM -> He (gain sqrt(2) default, overridable); INIT_XAVIER_UNIFORM -> Glorot (gain 1 default, overridable). Fan modes match PyTorch _calculate_fan_in_and_fan_out per layout (Conv1d in*k, Linear in, ConvT out*k).
TDD: seeded statistical default-bound + override value tests added to UnitTestConv1dApi / UnitTestLinear / UnitTestConv1dTransposedApi; mutation-verified non-vacuous. Existing structural tests unaffected; 62/62 ctest. Implements the deferred Issue C (distribution parametrization for PyTorch-compatible init).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…r example The factory-API migration left a parallel _v2 C trainer per example (har_classifier_v2, ecg_anomaly_ae_v2) alongside the original Legacy-API train_c.c, which was redundant, never built in CI, and pinned to deprecated APIs. Collapse to one directory per example. Move each _v2 train_c.c (factory API + StateDictApi) into its parent dir; delete the Legacy train_c.c and the _v2 dirs; targets keep their canonical names (train_c_har_classifier, train_c_ecg_anomaly_ae). Rewire the CI bit-parity job + examples/CMakeLists.txt to the consolidated targets/paths. Rewrite the two READMEs to the real flow (bit-parity primary). Make ECG compare.py informational: the train-from-scratch comparison is a sanity check, not a gate (independent init + a C-vs-PyTorch training-dynamics difference push the anomaly AUC outside tolerance; bit-parity is the exact gate). Bit-parity verified identical for both examples (HAR int32, ECG float allclose). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…guard) The bit-parity job built only the two named example targets, so any other example executable was never compiled in CI - exactly how example/MnistExperiment (#235) and the legacy v1 trainers rotted unnoticed. Build the default 'all' target instead; a future example that fails to compile now fails CI. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
torchaudio 2.11 (maintenance mode) routes its dataset decode through torchcodec (needs a system FFmpeg), so iterating SPEECHCOMMANDS raised ImportError at prepare time. Switch to ds.get_metadata (no decode) + a stdlib `wave` reader (int16 PCM / 32768) — the fallback the spec blessed, byte-identical output, no torchcodec/FFmpeg/scipy dependency. Also make the loader path-based: collect paths per label, then decode only the clips a split keeps (all 4 keywords + the sampled "unknown"). Peak RAM for 6-class drops ~5.8 GB -> ~1.4 GB, fitting the 7 GB CI runner.
The stdlib wave reader interprets frames as int16/32768; a non-mono or non-16-bit clip would be silently misdecoded. Assert the format (the corpus is uniformly 16 kHz mono 16-bit, so this never trips — it guards a future corpus swap). Flagged by the PR-A final review as the one silent-decode path.
…ence The raw model at the spec lr=0.001/15ep never escapes random init (flat loss, all-class-0 predictions) → a degenerate bit-parity gate. Add a rate-agnostic LayerNorm(64) on the pooled features before fc and tune to lr=0.005 / 20 epochs: the model now reaches test_acc 0.588 with predictions spread across all 6 classes, making the AvgPool[1,16000] §11.3 bit-parity proof meaningful (C reproduces the diverse PyTorch predictions exactly: 2483/2483 int32-identical). LayerNorm is the framework's only bit-parity-covered normalizer (BatchNorm is not), so the gate is preserved. C model: MODEL_SIZE 14->15, layerNormLayerInit at model[12] (normalizedShape=[64], eps 1e-5 = PyTorch default), 5-entry state-dict (+ln gamma/beta), +LayerNormApi/LayerNorm link libs. kws_mfcc (shipped) is unchanged.
…ing fragile end-LN
A 10-seed x 3-placement x 3-lr sweep (50 epochs) showed the previously-shipped
end-feature LayerNorm(64) is the WORST option: 0.47 +/- 0.25 test_acc, collapsing
to a one-class reference on ~40% of seeds (the original seed-42 pick was lucky).
Per-conv LayerNorm([C,L]) over each conv's full feature map (pre-ReLU) is the best
and most stable: 0.72 +/- 0.01, all 10 seeds converge across all 6 classes. (Plain
no-LayerNorm also trains fine at 50 epochs, 0.70 +/- 0.02 — the raw model was never
un-trainable, just slow; LayerNorm is kept as the framework's bit-parity-covered
normalizer and to exercise it in the gate.)
Model: 3x [Conv1d -> LayerNorm([C,L]) -> ReLU -> MaxPool(4)] (shapes [16,1000],
[32,250], [64,62]), lr=0.005, 50 epochs. C: MODEL_SIZE 15->17, three
layerNormLayerInit(numNormDims=2, eps=1e-5) at model[2]/[6]/[10], 7-entry
state-dict {conv1,ln1,conv2,ln2,conv3,ln3,fc}. Gate PASSES bit-identical (2483/2483)
with diverse predictions across all 6 classes -- first bit-parity exercise of a
multi-dim [C,L] LayerNorm in an example.
…ce sink Adds TraceApi.h with traceSink_t typedef; extracts the body of calculateGradsSequential into a static calculateGradsImpl(..., sink, sinkCtx) with three guarded sink calls (fwd/lossgrad/agrad). calculateGradsSequential becomes a NULL-sink wrapper — production path byte-identical. Characterisation test (UnitTestCalculateGradsSequential) pins closed-form CE+softmax gradients and guards the refactor.
Subagent edits skipped the clang-format PostToolUse hook; this brings the Phase-1 unit test into clang-format-21 compliance (braces on the guard return, call-arg reflow). Production .c/.h were already clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…yer dumps Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…yTorch hooks) The "agrad" probe was firing gradCurr (∂L/∂layer_input) after backward, so e.g. conv1.agrad had shape [1,1,1000] instead of the expected [1,16,1000], breaking comparison with PyTorch's activation.grad. Move the sink call to the top of the backward loop and fire gradNext (the upstream wire gradient = ∂L/∂layer_output) before initiating gradCurr or calling backward. With sink==NULL the production path is byte-identical to before. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eep aggregator
Add --abs-floor (default 1e-4) to trace_compare.py: drift now requires BOTH
abs error > floor AND relative jump > JUMP_FACTOR, preventing near-zero act-grads
(abs ~3e-7) from firing a spurious flag while still surfacing param-grad signal.
Extract compare_pairs(c_dir, pt_dir) -> list[dict] as a reusable function
returning {probe, phase, tier, max_abs, max_rel} for every matched pair.
Add examples/_shared/trace_sweep.py: runs N non-overlapping batches, calls
compare_pairs per batch, aggregates mean/max per (probe, phase), and prints
a full tier-sorted table plus a focused param-grad summary sorted descending
by mean_abs — the key signal for which layer diverges robustly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Subagent edits to npy_dump_sink.c and trace_c.c skipped the clang-format PostToolUse hook; bring them into clang-format-21 compliance. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… cleanup, messages - Assert ndim==2 for all 4 events in testTracedGradsFiresInOrder (spec §7 discharged) - Add structural comment: tracedGrads/calculateGradsSequential share calculateGradsImpl - compare_dir returns (rc, first_drift) tuple; self_test asserts first-drift probe+phase - trace_pytorch: assert list(acts)==FWD_PROBES before return (manifest drift guard) - npy_dump_sink.h: fix comment "asserts" -> "hard-errors (exit 1)" - trace_c.c batch-clamp fprintf: show requested g_batch alongside clamped effB - trace_sweep: rmtree dump dirs before each batch (stale .sNN guard); relabel mean_abs -> mean(maxabs) in aggregate + focused-summary tables Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ed docs Honesty fixes: - Quantization.h: drop the stale "grad accumulators are value-sums, int16" comment -- weightGrad is a sum of products; grad storage-type is under #261. - QuantizationApi.h: fix the quantizationInitSymInt32WithBits docstring -- the plain initializer uses the int12 operand default (not "hardcodes 16"), and drop the "32 bits for full int32 range" over-promise (not cast-safe, #202). - CONVENTIONS.md: correct the value-sums claim for grad accumulators. Conventions reorg: - Split docs/CONVENTIONS.md into per-subsystem docs/conventions/*.md (verbatim moves); CONVENTIONS.md becomes a thin index + the memory-over-accuracy vision. - Add the "SYM_INT32 is a compute format, not storage (#261)" convention. No behavior change. Build clean; 63/63 unit tests pass. Relates to #261. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Warning Review limit reached
Next review available in: 45 minutes Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available. How can I continue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews. How do review limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please refer docs for additional details. Review details⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (9)
📝 WalkthroughWalkthroughThis PR adds new MNIST MLP/CNN and KWS (MFCC and raw-waveform) parity examples with shared MNIST/SpeechCommands data loaders, trace/dump/compare tooling, and a BIT_PARITY exact-match mode; introduces a shared weight-initialization API adopted by Conv1d/Conv1dTransposed/Linear; adds a gradient-tracing API; updates CI, CMake, and READMEs; and removes legacy v2/MnistExperiment examples. ChangesWeight Initialization and Gradient Tracing APIs
Estimated code review effort: 4 (Complex) | ~75 minutes New Example Suite and Shared Infrastructure
Estimated code review effort: 5 (Critical) | ~120 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 inconclusive)
✅ Passed checks (3 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 13
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
examples/ecg_anomaly_ae/train_c.c (1)
361-369: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winRestore
c_train_recons.npyfor the train-from-scratch path.Lines 361-363 now only write
c_reconstructions.npy, butexamples/ecg_anomaly_ae/compare.pystill unconditionally loadsoutputs/c_train_recons.npy, and the README tells users to run that script after training. As-is, the documented demo crashes before it can emit the informational parity report or plots.Proposed fix
int status = 0; + if (bitParity == NULL || bitParity[0] == '\0') { + int trainRc = writeAllReconstructions(model, MODEL_SIZE, getTrainSample, getTrainSize(), + "examples/ecg_anomaly_ae/outputs/c_train_recons.npy"); + if (trainRc != 0) { + fprintf(stderr, "ERROR: c_train_recons.npy write failed (rc=%d)\n", trainRc); + status = 1; + } + } + int rc = writeAllReconstructions(model, MODEL_SIZE, getTestSample, getTestSize(), "examples/ecg_anomaly_ae/outputs/c_reconstructions.npy"); if (rc != 0) { fprintf(stderr, "ERROR: c_reconstructions.npy write failed (rc=%d)\n", rc); status = 1;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/ecg_anomaly_ae/train_c.c` around lines 361 - 369, The train-from-scratch path in writeAllReconstructions currently only emits c_reconstructions.npy, but compare.py and the README still expect outputs/c_train_recons.npy. Update the reconstruction-writing block around writeAllReconstructions/getTestSample so the training path also produces c_train_recons.npy (or otherwise matches the filename compare.py loads), keeping the existing c_reconstructions.npy output intact if it is still needed.
🟡 Minor comments (10)
examples/kws_mfcc/train_c.c-120-124 (1)
120-124: 🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick winFail fast on labels outside
g_numClasses.With the runtime class count, a stale/mismatched label like
34in 6-class mode silently becomes an all-zero target. Validateclsbefore filling the one-hot tensor.Proposed fix
int32_t cls = ((int32_t *)intLabels->array[i]->data)[0]; + if (cls < 0 || (size_t)cls >= g_numClasses) { + fprintf(stderr, "Invalid label %d for %zu classes\n", cls, g_numClasses); + exit(1); + } float *data = (float *)t->data;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/kws_mfcc/train_c.c` around lines 120 - 124, The one-hot label construction in the training path silently accepts out-of-range class labels and produces an all-zero target; add a bounds check for cls before the loop that fills data. In the label handling logic around intLabels->array[i]->data, validate that cls is within [0, g_numClasses) and fail fast if it is not, rather than writing the tensor in train_c.c.examples/kws_mfcc/train_c.c-68-73 (1)
68-73: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winReject partially parsed
KWS_CLASSESvalues.strtol(env, NULL, 10)accepts prefixes like6fooand still returns 6; require anendptrcheck so only exact6or35are accepted.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/kws_mfcc/train_c.c` around lines 68 - 73, The KWS_CLASSES parsing in the environment-value helper currently accepts partially parsed strings because `strtol` is called with a null end pointer. Update the parsing logic in the function that validates `KWS_CLASSES` to use an `endptr` and reject any value where extra characters remain after the number, so only exact `6` or `35` are accepted; keep the existing fallback and error message behavior for invalid input.src/userApi/training_loop/calculate_grads/include/TraceApi.h-23-25 (1)
23-25: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winFix the
agradcontract text.The implementation emits
"agrad"beforebackward(...)and exposes the gradient entering that layer's backward pass, not a post-backward tensor. Saying "after each layer's backward" points consumers at the wrong quantity.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/userApi/training_loop/calculate_grads/include/TraceApi.h` around lines 23 - 25, Update the `TraceApi.h` contract comment for the trace points emitted by `calculateGradsSequential` / the `sink` callback so the `"agrad"` description matches the implementation. The issue is that `"agrad"` is fired before `backward(...)` and represents the gradient entering each layer’s backward pass, not a post-backward result; revise the wording in the `calculateGradsSequential` comment to reflect that pre-backward gradient input semantics.examples/mnist_mlp/train_c.c-217-223 (1)
217-223: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winHonor the logged seed on the C training path.
Line 221 builds the randomly initialized layers before any C-side RNG seeding happens, but Lines 252-255 still record
"seed": 42inc.json. That makes the train-from-scratch run non-reproducible from its own logged config. Seed the framework RNG immediately beforebuildModel()or drop the field if this path is intentionally unseeded.Also applies to: 252-255
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/mnist_mlp/train_c.c` around lines 217 - 223, The C training path is initializing the model in buildModel() before the framework RNG is seeded, while the generated c.json still hardcodes "seed": 42. Update the training flow around buildModel() to seed the RNG first so the logged seed matches the actual initialization, or remove the seed field from the config if this path is meant to stay unseeded. Make sure the change keeps the logged configuration and the model initialization order consistent.examples/mnist_mlp/compare.py-39-43 (1)
39-43: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winReject mismatched prediction and label lengths before building the confusion matrix.
zip(preds, labels)silently truncates to the shorter input here. A partialc_predictions.npywould still produce plots and a parity report over a truncated test set instead of failing fast.Proposed fix
def confusion_matrix(preds: np.ndarray, labels: np.ndarray, num_classes: int) -> np.ndarray: + if preds.shape[0] != labels.shape[0]: + raise ValueError( + f"prediction/label length mismatch: {preds.shape[0]} != {labels.shape[0]}" + ) cm = np.zeros((num_classes, num_classes), dtype=np.int64) for p, a in zip(preds, labels): cm[int(p), int(a)] += 1 return cm🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/mnist_mlp/compare.py` around lines 39 - 43, The confusion_matrix helper currently iterates with zip(preds, labels), which silently truncates mismatched inputs and can produce a partial report. Update confusion_matrix to validate that preds and labels have the same length before the loop and raise a clear error if they differ, so callers like the compare.py plotting/parity flow fail fast instead of using incomplete data.examples/kws_raw/README.md-8-11 (1)
8-11: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winMention the LayerNorm lengths in the rate-change note.
Changing
Kalso changes the hard-codedLayerNorm([C, L])shapes intrain_c.c/trace_c.c; updating only the MaxPoolinputLengths leaves the C model inconsistent.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/kws_raw/README.md` around lines 8 - 11, Update the rate-change note to mention that changing K also affects the hard-coded LayerNorm([C, L]) shapes in train_c.c and trace_c.c, not just the MaxPool inputLength values. Make the README guidance around K and effective rate explicit that both the pooling input lengths and LayerNorm lengths must be kept in sync with the new sample rate, using the existing AdaptiveAvgPool1d(1) and train_c.c/trace_c.c references to locate the affected model setup.examples/kws_raw/trace_pytorch.py-35-40 (1)
35-40: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick winSplit the semicolon-chained statements.
Ruff is already flagging E702 on Lines 35-40, 60, and 68, so this file will not pass the configured Python lint as written.
Also applies to: 60-60, 68-68
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/kws_raw/trace_pytorch.py` around lines 35 - 40, The semicolon-chained statements in trace_pytorch.py trigger Ruff E702, so split them into separate lines to satisfy linting. Update the chained assignments in the tracing logic around the activations collection, including the blocks that use acts and the later occurrences in the same function, so each assignment or function call stands alone. Keep the existing behavior in the traced forward path and preserve the same variable names like acts, h, model.conv1, model.ln1, F.relu, and F.max_pool1d while removing all semicolons.Source: Linters/SAST tools
examples/_shared/mnist_data.py-19-23 (1)
19-23: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winUse a real split check here.
With
python -O, thisassertis removed and any unsupportedsplitfalls through totrain=False, so"validation"silently loads the test set. RaiseValueErrorbefore constructing the dataset instead.Proposed fix
def load_mnist(root: str | Path, split: str) -> tuple[np.ndarray, np.ndarray]: - assert split in ("train", "test"), split + if split not in ("train", "test"): + raise ValueError(f"unsupported MNIST split: {split}") ds = datasets.MNIST(🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/_shared/mnist_data.py` around lines 19 - 23, The split validation in load_mnist relies on an assert, which disappears under optimized Python and can let unsupported values fall through to datasets.MNIST as the test split. Replace the assert with an explicit runtime check in load_mnist that raises ValueError for any split other than "train" or "test", and only construct datasets.MNIST after the split has been validated.examples/_shared/npy_dump_sink.c-23-23 (1)
23-23: 🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick winReject probe indexes above
numProbes.The header contract only reserves
layerIdx == numProbesfor the loss probe. Mapping every larger index to"loss"silently aliases bad indexes onto the same output filename and can mask an upstream tracing bug.Proposed fix
- const char *probe = (layerIdx < ctx->numProbes) ? ctx->probeNames[layerIdx] : "loss"; + const char *probe = NULL; + if (layerIdx < ctx->numProbes) { + probe = ctx->probeNames[layerIdx]; + } else if (layerIdx == ctx->numProbes) { + probe = "loss"; + } else { + fprintf(stderr, "npyDumpSink: invalid probe index %zu (numProbes=%zu)\n", layerIdx, + ctx->numProbes); + exit(1); + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/_shared/npy_dump_sink.c` at line 23, Reject probe indexes greater than numProbes in the probe selection logic: in npy_dump_sink.c, the ternary assigning probe currently maps every layerIdx above ctx->numProbes to "loss", which can hide invalid inputs. Update the probe resolution so only layerIdx == ctx->numProbes uses the loss probe and any layerIdx > ctx->numProbes is treated as an error or otherwise rejected before reaching the filename/output path.examples/_shared/speechcommands_data.py-133-150 (1)
133-150: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winValidate
num_classeswithoutassert.Under
python -O, the assertion disappears and unsupported values fall into the 6-class branch, so the loader returns the wrong label space instead of failing fast.Proposed fix
def load_speechcommands(root, num_classes: int) -> dict: - assert num_classes in (6, 35), num_classes + if num_classes not in (6, 35): + raise ValueError(f"unsupported num_classes: {num_classes}") root = Path(root)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/_shared/speechcommands_data.py` around lines 133 - 150, The num_classes guard in load_speechcommands currently relies on assert, which can be stripped under optimized Python and let invalid values proceed into the 6-class path. Replace the assert in load_speechcommands with an explicit runtime check that rejects unsupported num_classes values and raises a clear exception before any dataset splitting happens. Keep the existing validation logic around the 35-class keyword set, but ensure the initial num_classes validation is always enforced regardless of interpreter flags.
🧹 Nitpick comments (5)
.github/workflows/ci.yml (1)
156-170: 🚀 Performance & Scalability | 🔵 Trivial | 💤 Low valueMinor inconsistency: MNIST raw data cached per-example instead of shared.
examples/mnist_mlp/data/rawandexamples/mnist_cnn/data/rawcache the same underlying MNIST dataset separately, whereas the new SpeechCommands raw cache (Lines 166-170) is shared once underexamples/_shared/data/speech_commandsfor bothkws_mfccandkws_raw. Ifexamples/_shared/mnist_data.pydownloads to a shared location, consider pointing both MNIST examples' raw cache at that single shared directory to avoid duplicate downloads/cache entries. Given MNIST's small size, impact is minimal.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/ci.yml around lines 156 - 170, The raw dataset cache setup is duplicating MNIST downloads across both examples instead of using a shared location like the SpeechCommands cache. Update the cache paths in the CI workflow so `mnist_mlp` and `mnist_cnn` point to the shared MNIST raw directory used by `examples/_shared/mnist_data.py`, and adjust the cache key inputs to reference the shared prepare/download script or other shared source of truth. Keep the `actions/cache` block aligned with the shared-data pattern used by `examples/_shared/data/speech_commands` so both MNIST examples reuse the same cached raw dataset.test/unit/userAPI/UnitTestConv1dApi.c (1)
324-336: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAssert the override-path bias is still sampled.
biasMaxAbs == 0still passes this test today, so a regression that accidentally zero-initializes bias whenweightInitis overridden would slip through. Add the same> 0.0fcheck used in the default-init case.Suggested patch
TEST_ASSERT_TRUE_MESSAGE(weightMaxAbs > defaultBound, "He override did not widen weights beyond the PyTorch default bound"); TEST_ASSERT_TRUE_MESSAGE(weightMaxAbs <= heBound * 1.001f, "He weights exceed the sqrt(6)/sqrt(fan_in) bound"); + TEST_ASSERT_TRUE_MESSAGE(biasMaxAbs > 0.0f, + "Bias became zero under the override path"); TEST_ASSERT_TRUE_MESSAGE(biasMaxAbs <= defaultBound * 1.001f, "Bias must stay PyTorch default uniform regardless of weight scheme");🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/unit/userAPI/UnitTestConv1dApi.c` around lines 324 - 336, The override-path bias assertion in UnitTestConv1dApi.c is missing a non-zero sampling check, so a zero-initialized bias can still pass; update the bias validation near the existing maxAbsFloat(cfg->bias->param) check to also assert biasMaxAbs > 0.0f, matching the default-init test behavior while keeping the current default-bound check intact.test/unit/layer/UnitTestLinear.c (1)
1450-1455: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAssert the override-path bias is still sampled.
The current assertion set would still pass if the Xavier path accidentally produced all-zero bias. Adding the same nonzero check as the default-init test will pin the intended contract.
Suggested patch
TEST_ASSERT_TRUE_MESSAGE(weightMaxAbs > defaultBound, "Xavier override did not change weights away from the default bound"); TEST_ASSERT_TRUE_MESSAGE(weightMaxAbs <= xavierBound * 1.001f, "Xavier weights exceed the sqrt(6/(fan_in+fan_out)) bound"); + TEST_ASSERT_TRUE_MESSAGE(biasMaxAbs > 0.0f, + "Bias became zero under the override path"); TEST_ASSERT_TRUE_MESSAGE(biasMaxAbs <= defaultBound * 1.001f, "Bias must stay PyTorch default uniform regardless of weight scheme");🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/unit/layer/UnitTestLinear.c` around lines 1450 - 1455, The Xavier override test in UnitTestLinear should also verify that the bias is actually sampled and not left all-zero. Update the assertions around the existing weight and bias checks in the linear unit test to include the same nonzero bias validation used by the default-initialization test, so the override path is pinned to a real sampled bias rather than only checking it stays within bounds.test/unit/userAPI/UnitTestConv1dTransposedApi.c (1)
293-298: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAssert the override-path bias is still sampled.
This only proves the bias stays within the old bound. It does not fail if the override path starts zero-initializing bias, because
0 <= defaultBoundstill passes.Suggested patch
TEST_ASSERT_TRUE_MESSAGE(weightMaxAbs > defaultBound, "He override did not widen weights beyond the PyTorch default bound"); TEST_ASSERT_TRUE_MESSAGE(weightMaxAbs <= heBound * 1.001f, "He weights exceed the sqrt(6)/sqrt(fan_in) bound"); + TEST_ASSERT_TRUE_MESSAGE(biasMaxAbs > 0.0f, + "Bias became zero under the override path"); TEST_ASSERT_TRUE_MESSAGE(biasMaxAbs <= defaultBound * 1.001f, "Bias must stay PyTorch default uniform regardless of weight scheme");🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/unit/userAPI/UnitTestConv1dTransposedApi.c` around lines 293 - 298, The override-path bias check in the transposed conv unit test only verifies the value stays within the default bound, so it can miss a zero-initialized bias. Update the assertions in UnitTestConv1dTransposedApi to also verify the bias is actually sampled/non-zero for the override path, using the existing weight and bias max-abs checks in the same test case, while keeping the current bound validation intact.examples/har_classifier/CMakeLists.txt (1)
51-55: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winMake
Distributionstransitively owned byLayerCommonif that's what Line 55 is satisfying.
examples/har_classifier/train_c.cno longer uses the distribution API directly, so linkingDistributionshere makes every consumer ofLayerCommonknow one of its private implementation details. If this was added only because the new init helpers pull it in, move that dependency to theLayerCommontarget insrc/userApi/CMakeLists.txtinstead.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/har_classifier/CMakeLists.txt` around lines 51 - 55, `Distributions` should not be linked directly from the `examples/har_classifier` target list if `train_c.c` no longer uses it; this leaks a private dependency through `LayerCommon`. Update the `LayerCommon` target in `src/userApi/CMakeLists.txt` so it owns the `Distributions` dependency transitively, and remove `Distributions` from the `har_classifier` link list so consumers only depend on `LayerCommon`.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/_shared/trace_sweep.py`:
- Around line 120-121: The aggregate in trace_sweep.py is currently allowing
mismatched dump filename sets to slip through because compare_pairs() skips
missing counterparts and batch_results still gets appended. In the loop that
builds pairs and appends to batch_results, add a validation step using the
dump_c and dump_pt filename sets before calling trace_compare.compare_pairs() or
recording the result, and fail fast when they differ. Use the existing
trace_compare.compare_pairs and batch_results flow as the place to enforce this
check.
In `@examples/kws_raw/compare.py`:
- Around line 78-84: The compare.py parity report is meant to be informational
only, but the current final return in the main flow still turns any mismatch
into a nonzero exit. Update the result handling near the overall_pass summary so
the script always exits successfully after printing the report, and keep the
existing parity output logic in compare.py and any surrounding main entrypoint
consistent with that non-failing contract.
- Around line 47-50: The confusion_matrix function currently uses zip(preds,
labels), which silently truncates when the arrays differ in length. Add an
explicit length check at the start of confusion_matrix (or in the caller that
prepares preds and labels) and raise a clear error if the lengths do not match
before building the matrix. Keep the fix localized around confusion_matrix so
the mismatch is caught before any counting happens.
In `@examples/kws_raw/trace_c.c`:
- Around line 368-372: The per-step dump directory naming in the trace
generation loop is deterministic, so reruns can reuse stale contents from prior
executions. Update the step directory handling in the loop around ensureDir and
snprintf in trace_c.c so each step dump directory is either freshly cleared
before use or creation fails on an existing directory (including treating EEXIST
as an error), preventing mixed old/new .npy files from being read by the
comparer.
In `@examples/kws_raw/trace_pytorch.py`:
- Around line 11-13: The fixed output path in trace_pytorch.py can leave stale
*.sNNN.npy files from earlier runs, so start each run with a clean
dump_pt/step000 directory or write into a fresh step-specific directory. Update
the step/activation dump logic in the main tracing flow (where the output path
is created and files are written) to remove any existing directory before saving
new samples, using the existing path-building symbols in trace_pytorch.py.
In `@examples/kws_raw/train_c.c`:
- Around line 44-45: `buildModel()` is using random initialization without
applying `SEED`, so training is not actually reproducible even though the log
says seed 42. In `train_c.c`, make sure the C RNG is seeded with `SEED`
immediately before `buildModel()` is called, and keep the existing
`SEED`/`SHUFFLE_SEED` constants aligned with that setup so the model
construction uses the intended deterministic state. Also apply the same seeding
fix in the other referenced location so both paths behave consistently.
- Around line 414-419: Guard the prediction buffer allocation in train_c.c by
validating the size calculation before calling malloc: the numTest value from
getTestSize() can overflow when multiplied by sizeof(int32_t), causing an
undersized predictions buffer. Update the allocation path near getTestSize() and
predictions so it checks for overflow or ensures numTest is within a safe bound
before allocating, and fail cleanly if the requested buffer size is too large.
In `@examples/mnist_cnn/compare.py`:
- Around line 39-43: The prediction comparison flow is silently truncating
mismatched arrays because confusion_matrix zips preds and labels, so a partial
predictions file can pass instead of failing. Add an explicit length check in
the compare.py validation path before calling confusion_matrix, using the loaded
test_y and each prediction array, and raise an error if any prediction length
differs; keep the check near the code that loads predictions and invokes
confusion_matrix so failures surface early.
- Around line 70-76: Update the compare script so the informational parity
report never fails the process; in compare.py, keep the reporting in the main
compare routine (the print loop and overall summary) but change the return value
logic at the end so it always exits successfully instead of returning 1 when
overall_pass is false. Make sure the behavior of the compare command stays
aligned with the module docstring and README description of the step as
informational, not a CI gate.
In `@examples/mnist_mlp/compare.py`:
- Around line 70-76: The parity comparison in compare.py is informational-only,
but the current return path in the main comparison flow still exits with a
failure status when overall_pass is false. Update the logic around the results
loop and final return so the script always completes successfully for tolerance
drift, keeping the printed PASS/FAIL summary as informational output only. Use
the compare.py comparison function and the overall_pass/result handling to
locate the change, and remove any nonzero exit behavior from this reporting
path.
In `@src/userApi/layer/Conv1dApi.c`:
- Around line 108-112: The Conv1d Xavier initialization is computing fan-out
with grouped channels instead of the full output-channel count. Update the
fan-out calculation in the Conv1d weight initialization path so the code in the
Conv1dApi logic uses outChannels * kernelSize rather than dividing by groups,
while keeping fan-in based on the per-group input channels. Preserve the
existing initWeightTensor call and adjust the local fanOut symbol accordingly.
In `@src/userApi/training_loop/calculate_grads/CalculateGradsSequential.c`:
- Around line 140-156: `traceModelParams()` currently calls `sink(ctx, ...)`
unconditionally, so a NULL trace sink will crash even though the no-op path is
expected; add an early guard in `traceModelParams()` (before the loop) to return
immediately when `sink` is NULL, keeping `traceModelWeights()` and
`traceModelGrads()` behavior consistent with the existing `tracedGrads(...,
NULL, NULL)` no-op case.
In `@src/userApi/training_loop/calculate_grads/include/TraceApi.h`:
- Around line 17-21: Clarify the lifetime of the phase argument in traceSink_t:
it is currently undocumented even though tracedGrads() passes literals and
traceModelWeights()/traceModelGrads() may build it from temporary stack storage.
Update the comment in TraceApi.h near traceSink_t to state that phase is
borrowed and valid only for the duration of the callback, or change the callers
that synthesize phase to use stable storage before invoking the sink.
---
Outside diff comments:
In `@examples/ecg_anomaly_ae/train_c.c`:
- Around line 361-369: The train-from-scratch path in writeAllReconstructions
currently only emits c_reconstructions.npy, but compare.py and the README still
expect outputs/c_train_recons.npy. Update the reconstruction-writing block
around writeAllReconstructions/getTestSample so the training path also produces
c_train_recons.npy (or otherwise matches the filename compare.py loads), keeping
the existing c_reconstructions.npy output intact if it is still needed.
---
Minor comments:
In `@examples/_shared/mnist_data.py`:
- Around line 19-23: The split validation in load_mnist relies on an assert,
which disappears under optimized Python and can let unsupported values fall
through to datasets.MNIST as the test split. Replace the assert with an explicit
runtime check in load_mnist that raises ValueError for any split other than
"train" or "test", and only construct datasets.MNIST after the split has been
validated.
In `@examples/_shared/npy_dump_sink.c`:
- Line 23: Reject probe indexes greater than numProbes in the probe selection
logic: in npy_dump_sink.c, the ternary assigning probe currently maps every
layerIdx above ctx->numProbes to "loss", which can hide invalid inputs. Update
the probe resolution so only layerIdx == ctx->numProbes uses the loss probe and
any layerIdx > ctx->numProbes is treated as an error or otherwise rejected
before reaching the filename/output path.
In `@examples/_shared/speechcommands_data.py`:
- Around line 133-150: The num_classes guard in load_speechcommands currently
relies on assert, which can be stripped under optimized Python and let invalid
values proceed into the 6-class path. Replace the assert in load_speechcommands
with an explicit runtime check that rejects unsupported num_classes values and
raises a clear exception before any dataset splitting happens. Keep the existing
validation logic around the 35-class keyword set, but ensure the initial
num_classes validation is always enforced regardless of interpreter flags.
In `@examples/kws_mfcc/train_c.c`:
- Around line 120-124: The one-hot label construction in the training path
silently accepts out-of-range class labels and produces an all-zero target; add
a bounds check for cls before the loop that fills data. In the label handling
logic around intLabels->array[i]->data, validate that cls is within [0,
g_numClasses) and fail fast if it is not, rather than writing the tensor in
train_c.c.
- Around line 68-73: The KWS_CLASSES parsing in the environment-value helper
currently accepts partially parsed strings because `strtol` is called with a
null end pointer. Update the parsing logic in the function that validates
`KWS_CLASSES` to use an `endptr` and reject any value where extra characters
remain after the number, so only exact `6` or `35` are accepted; keep the
existing fallback and error message behavior for invalid input.
In `@examples/kws_raw/README.md`:
- Around line 8-11: Update the rate-change note to mention that changing K also
affects the hard-coded LayerNorm([C, L]) shapes in train_c.c and trace_c.c, not
just the MaxPool inputLength values. Make the README guidance around K and
effective rate explicit that both the pooling input lengths and LayerNorm
lengths must be kept in sync with the new sample rate, using the existing
AdaptiveAvgPool1d(1) and train_c.c/trace_c.c references to locate the affected
model setup.
In `@examples/kws_raw/trace_pytorch.py`:
- Around line 35-40: The semicolon-chained statements in trace_pytorch.py
trigger Ruff E702, so split them into separate lines to satisfy linting. Update
the chained assignments in the tracing logic around the activations collection,
including the blocks that use acts and the later occurrences in the same
function, so each assignment or function call stands alone. Keep the existing
behavior in the traced forward path and preserve the same variable names like
acts, h, model.conv1, model.ln1, F.relu, and F.max_pool1d while removing all
semicolons.
In `@examples/mnist_mlp/compare.py`:
- Around line 39-43: The confusion_matrix helper currently iterates with
zip(preds, labels), which silently truncates mismatched inputs and can produce a
partial report. Update confusion_matrix to validate that preds and labels have
the same length before the loop and raise a clear error if they differ, so
callers like the compare.py plotting/parity flow fail fast instead of using
incomplete data.
In `@examples/mnist_mlp/train_c.c`:
- Around line 217-223: The C training path is initializing the model in
buildModel() before the framework RNG is seeded, while the generated c.json
still hardcodes "seed": 42. Update the training flow around buildModel() to seed
the RNG first so the logged seed matches the actual initialization, or remove
the seed field from the config if this path is meant to stay unseeded. Make sure
the change keeps the logged configuration and the model initialization order
consistent.
In `@src/userApi/training_loop/calculate_grads/include/TraceApi.h`:
- Around line 23-25: Update the `TraceApi.h` contract comment for the trace
points emitted by `calculateGradsSequential` / the `sink` callback so the
`"agrad"` description matches the implementation. The issue is that `"agrad"` is
fired before `backward(...)` and represents the gradient entering each layer’s
backward pass, not a post-backward result; revise the wording in the
`calculateGradsSequential` comment to reflect that pre-backward gradient input
semantics.
---
Nitpick comments:
In @.github/workflows/ci.yml:
- Around line 156-170: The raw dataset cache setup is duplicating MNIST
downloads across both examples instead of using a shared location like the
SpeechCommands cache. Update the cache paths in the CI workflow so `mnist_mlp`
and `mnist_cnn` point to the shared MNIST raw directory used by
`examples/_shared/mnist_data.py`, and adjust the cache key inputs to reference
the shared prepare/download script or other shared source of truth. Keep the
`actions/cache` block aligned with the shared-data pattern used by
`examples/_shared/data/speech_commands` so both MNIST examples reuse the same
cached raw dataset.
In `@examples/har_classifier/CMakeLists.txt`:
- Around line 51-55: `Distributions` should not be linked directly from the
`examples/har_classifier` target list if `train_c.c` no longer uses it; this
leaks a private dependency through `LayerCommon`. Update the `LayerCommon`
target in `src/userApi/CMakeLists.txt` so it owns the `Distributions` dependency
transitively, and remove `Distributions` from the `har_classifier` link list so
consumers only depend on `LayerCommon`.
In `@test/unit/layer/UnitTestLinear.c`:
- Around line 1450-1455: The Xavier override test in UnitTestLinear should also
verify that the bias is actually sampled and not left all-zero. Update the
assertions around the existing weight and bias checks in the linear unit test to
include the same nonzero bias validation used by the default-initialization
test, so the override path is pinned to a real sampled bias rather than only
checking it stays within bounds.
In `@test/unit/userAPI/UnitTestConv1dApi.c`:
- Around line 324-336: The override-path bias assertion in UnitTestConv1dApi.c
is missing a non-zero sampling check, so a zero-initialized bias can still pass;
update the bias validation near the existing maxAbsFloat(cfg->bias->param) check
to also assert biasMaxAbs > 0.0f, matching the default-init test behavior while
keeping the current default-bound check intact.
In `@test/unit/userAPI/UnitTestConv1dTransposedApi.c`:
- Around line 293-298: The override-path bias check in the transposed conv unit
test only verifies the value stays within the default bound, so it can miss a
zero-initialized bias. Update the assertions in UnitTestConv1dTransposedApi to
also verify the bias is actually sampled/non-zero for the override path, using
the existing weight and bias max-abs checks in the same test case, while keeping
the current bound validation intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 1251b5b8-2278-4fef-86ed-4deda8727b24
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (68)
.github/workflows/ci.ymlCMakeLists.txtdevenv.nixexample/MnistExperiment.cexamples/CMakeLists.txtexamples/README.mdexamples/_shared/CMakeLists.txtexamples/_shared/mnist_data.pyexamples/_shared/npy_dump_sink.cexamples/_shared/npy_dump_sink.hexamples/_shared/speechcommands_data.pyexamples/_shared/trace_compare.pyexamples/_shared/trace_sweep.pyexamples/ecg_anomaly_ae/CMakeLists.txtexamples/ecg_anomaly_ae/README.mdexamples/ecg_anomaly_ae/compare.pyexamples/ecg_anomaly_ae/train_c.cexamples/ecg_anomaly_ae_v2/train_c.cexamples/har_classifier/CMakeLists.txtexamples/har_classifier/README.mdexamples/har_classifier/train_c.cexamples/kws_mfcc/CMakeLists.txtexamples/kws_mfcc/README.mdexamples/kws_mfcc/compare.pyexamples/kws_mfcc/prepare_data.pyexamples/kws_mfcc/train_c.cexamples/kws_mfcc/train_pytorch.pyexamples/kws_raw/CMakeLists.txtexamples/kws_raw/README.mdexamples/kws_raw/compare.pyexamples/kws_raw/prepare_data.pyexamples/kws_raw/probe_manifest.hexamples/kws_raw/trace_c.cexamples/kws_raw/trace_pytorch.pyexamples/kws_raw/train_c.cexamples/kws_raw/train_pytorch.pyexamples/mnist_cnn/CMakeLists.txtexamples/mnist_cnn/README.mdexamples/mnist_cnn/compare.pyexamples/mnist_cnn/prepare_data.pyexamples/mnist_cnn/train_c.cexamples/mnist_cnn/train_pytorch.pyexamples/mnist_mlp/CMakeLists.txtexamples/mnist_mlp/README.mdexamples/mnist_mlp/compare.pyexamples/mnist_mlp/prepare_data.pyexamples/mnist_mlp/train_c.cexamples/mnist_mlp/train_pytorch.pypyproject.tomlsrc/userApi/CMakeLists.txtsrc/userApi/LayerCommon.csrc/userApi/include/LayerCommon.hsrc/userApi/layer/Conv1dApi.csrc/userApi/layer/Conv1dTransposedApi.csrc/userApi/layer/LinearApi.csrc/userApi/layer/include/Conv1dApi.hsrc/userApi/layer/include/Conv1dTransposedApi.hsrc/userApi/layer/include/LinearApi.hsrc/userApi/training_loop/calculate_grads/CalculateGradsSequential.csrc/userApi/training_loop/calculate_grads/include/TraceApi.htest/unit/CMakeLists.txttest/unit/layer/CMakeLists.txttest/unit/layer/UnitTestLinear.ctest/unit/training_loop/CMakeLists.txttest/unit/training_loop/UnitTestCalculateGradsSequential.ctest/unit/userAPI/CMakeLists.txttest/unit/userAPI/UnitTestConv1dApi.ctest/unit/userAPI/UnitTestConv1dTransposedApi.c
💤 Files with no reviewable changes (3)
- example/MnistExperiment.c
- examples/ecg_anomaly_ae_v2/train_c.c
- CMakeLists.txt
| pairs = trace_compare.compare_pairs(c_dump, pt_dump) | ||
| batch_results.append((c_loss, pt_loss, pairs)) |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Fail when the dump filename sets do not match.
trace_compare.compare_pairs() intentionally skips missing counterparts. In this loop, that means a missing probe/phase on either side just disappears from the aggregate tables, so trace_sweep.py can report a clean summary while traces are incomplete. Validate the dump_c and dump_pt filename sets before appending pairs.
Proposed fix
print(f" C: {c_out}")
print(f" PT: {pt_out}", flush=True)
c_loss = extract_loss(c_out)
pt_loss = extract_loss(pt_out)
+ c_names = {p.name for p in c_dump.glob("*.npy")}
+ pt_names = {p.name for p in pt_dump.glob("*.npy")}
+ if not c_names or not pt_names or c_names != pt_names:
+ missing_in_pt = sorted(c_names - pt_names)
+ missing_in_c = sorted(pt_names - c_names)
+ raise RuntimeError(
+ "trace dump mismatch\n"
+ f" missing in PyTorch: {missing_in_pt[:10]}\n"
+ f" missing in C: {missing_in_c[:10]}"
+ )
pairs = trace_compare.compare_pairs(c_dump, pt_dump)
batch_results.append((c_loss, pt_loss, pairs))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| pairs = trace_compare.compare_pairs(c_dump, pt_dump) | |
| batch_results.append((c_loss, pt_loss, pairs)) | |
| c_names = {p.name for p in c_dump.glob("*.npy")} | |
| pt_names = {p.name for p in pt_dump.glob("*.npy")} | |
| if not c_names or not pt_names or c_names != pt_names: | |
| missing_in_pt = sorted(c_names - pt_names) | |
| missing_in_c = sorted(pt_names - c_names) | |
| raise RuntimeError( | |
| "trace dump mismatch\n" | |
| f" missing in PyTorch: {missing_in_pt[:10]}\n" | |
| f" missing in C: {missing_in_c[:10]}" | |
| ) | |
| pairs = trace_compare.compare_pairs(c_dump, pt_dump) | |
| batch_results.append((c_loss, pt_loss, pairs)) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/_shared/trace_sweep.py` around lines 120 - 121, The aggregate in
trace_sweep.py is currently allowing mismatched dump filename sets to slip
through because compare_pairs() skips missing counterparts and batch_results
still gets appended. In the loop that builds pairs and appends to batch_results,
add a validation step using the dump_c and dump_pt filename sets before calling
trace_compare.compare_pairs() or recording the result, and fail fast when they
differ. Use the existing trace_compare.compare_pairs and batch_results flow as
the place to enforce this check.
| def confusion_matrix(preds: np.ndarray, labels: np.ndarray, num_classes: int) -> np.ndarray: | ||
| cm = np.zeros((num_classes, num_classes), dtype=np.int64) | ||
| for p, a in zip(preds, labels): | ||
| cm[int(p), int(a)] += 1 |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Fail fast on prediction/label length mismatches.
zip(preds, labels) silently truncates. If either predictions file has the wrong length, the confusion matrix still renders while dropping samples and hiding the contract violation.
Proposed fix
def confusion_matrix(preds: np.ndarray, labels: np.ndarray, num_classes: int) -> np.ndarray:
+ if preds.shape[0] != labels.shape[0]:
+ raise ValueError(
+ f"expected {labels.shape[0]} predictions, got {preds.shape[0]}"
+ )
cm = np.zeros((num_classes, num_classes), dtype=np.int64)
for p, a in zip(preds, labels):
cm[int(p), int(a)] += 1
return cm📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def confusion_matrix(preds: np.ndarray, labels: np.ndarray, num_classes: int) -> np.ndarray: | |
| cm = np.zeros((num_classes, num_classes), dtype=np.int64) | |
| for p, a in zip(preds, labels): | |
| cm[int(p), int(a)] += 1 | |
| def confusion_matrix(preds: np.ndarray, labels: np.ndarray, num_classes: int) -> np.ndarray: | |
| if preds.shape[0] != labels.shape[0]: | |
| raise ValueError( | |
| f"expected {labels.shape[0]} predictions, got {preds.shape[0]}" | |
| ) | |
| cm = np.zeros((num_classes, num_classes), dtype=np.int64) | |
| for p, a in zip(preds, labels): | |
| cm[int(p), int(a)] += 1 |
🧰 Tools
🪛 Ruff (0.15.20)
[warning] 49-49: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/kws_raw/compare.py` around lines 47 - 50, The confusion_matrix
function currently uses zip(preds, labels), which silently truncates when the
arrays differ in length. Add an explicit length check at the start of
confusion_matrix (or in the caller that prepares preds and labels) and raise a
clear error if the lengths do not match before building the matrix. Keep the fix
localized around confusion_matrix so the mismatch is caught before any counting
happens.
Source: Linters/SAST tools
| print("\nParity report (PyTorch vs C) — INFORMATIONAL:") | ||
| print(f"{'metric':<14} {'pt':>10} {'c':>10} {'diff':>10} {'tol':>8} {'type':>5} {'pass':>6}") | ||
| for r in results: | ||
| print(f"{r.metric:<14} {r.pt_value:>10.5f} {r.c_value:>10.5f} {r.diff:>10.5f} " | ||
| f"{r.tolerance:>8.4f} {r.tolerance_type:>5} {str(r.passed):>6}") | ||
| print(f"\nOverall: {'PASS' if overall_pass else 'FAIL'} (informational; not a CI gate)") | ||
| return 0 if overall_pass else 1 |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Keep the informational compare non-failing.
This script says the parity report is informational only, but return 0 if overall_pass else 1 makes every tolerance miss a hard failure. That contradicts both the module docstring and this PR’s stated contract for compare.py.
Proposed fix
- return 0 if overall_pass else 1
+ return 0📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| print("\nParity report (PyTorch vs C) — INFORMATIONAL:") | |
| print(f"{'metric':<14} {'pt':>10} {'c':>10} {'diff':>10} {'tol':>8} {'type':>5} {'pass':>6}") | |
| for r in results: | |
| print(f"{r.metric:<14} {r.pt_value:>10.5f} {r.c_value:>10.5f} {r.diff:>10.5f} " | |
| f"{r.tolerance:>8.4f} {r.tolerance_type:>5} {str(r.passed):>6}") | |
| print(f"\nOverall: {'PASS' if overall_pass else 'FAIL'} (informational; not a CI gate)") | |
| return 0 if overall_pass else 1 | |
| print("\nParity report (PyTorch vs C) — INFORMATIONAL:") | |
| print(f"{'metric':<14} {'pt':>10} {'c':>10} {'diff':>10} {'tol':>8} {'type':>5} {'pass':>6}") | |
| for r in results: | |
| print(f"{r.metric:<14} {r.pt_value:>10.5f} {r.c_value:>10.5f} {r.diff:>10.5f} " | |
| f"{r.tolerance:>8.4f} {r.tolerance_type:>5} {str(r.passed):>6}") | |
| print(f"\nOverall: {'PASS' if overall_pass else 'FAIL'} (informational; not a CI gate)") | |
| return 0 |
🧰 Tools
🪛 Ruff (0.15.20)
[warning] 82-82: Use explicit conversion flag
Replace with conversion flag
(RUF010)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/kws_raw/compare.py` around lines 78 - 84, The compare.py parity
report is meant to be informational only, but the current final return in the
main flow still turns any mismatch into a nonzero exit. Update the result
handling near the overall_pass summary so the script always exits successfully
after printing the report, and keep the existing parity output logic in
compare.py and any surrounding main entrypoint consistent with that non-failing
contract.
| ensureDir("examples/kws_raw/dump_c"); | ||
| for (size_t step = 0; step < g_steps; step++) { | ||
| char dir[256]; | ||
| snprintf(dir, sizeof(dir), "examples/kws_raw/dump_c/step%03zu", step); | ||
| ensureDir(dir); |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Avoid reusing existing step dump directories.
step%03zu is deterministic, so rerunning with fewer steps or a smaller --act-samples leaves stale .npy files behind and the comparer can read mixed old/new traces. Clear dir first, or treat EEXIST as an error for per-step dumps.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/kws_raw/trace_c.c` around lines 368 - 372, The per-step dump
directory naming in the trace generation loop is deterministic, so reruns can
reuse stale contents from prior executions. Update the step directory handling
in the loop around ensureDir and snprintf in trace_c.c so each step dump
directory is either freshly cleared before use or creation fails on an existing
directory (including treating EEXIST as an error), preventing mixed old/new .npy
files from being read by the comparer.
| import argparse, sys | ||
| from pathlib import Path | ||
| import numpy as np, torch, torch.nn.functional as F |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Start from a clean dump_pt/step000 directory.
This reuses a fixed output path. If a rerun uses a smaller batch or fewer activation samples, old *.sNNN.npy files remain and the directory no longer represents one step. Remove the directory first or write to a fresh one.
Proposed fix
-import numpy as np, torch, torch.nn.functional as F
+import shutil
+
+import numpy as np, torch, torch.nn.functional as F
@@
- out = HERE / "dump_pt" / "step000"; out.mkdir(parents=True, exist_ok=True)
+ out = HERE / "dump_pt" / "step000"
+ shutil.rmtree(out, ignore_errors=True)
+ out.mkdir(parents=True, exist_ok=True)Also applies to: 68-68
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/kws_raw/trace_pytorch.py` around lines 11 - 13, The fixed output
path in trace_pytorch.py can leave stale *.sNNN.npy files from earlier runs, so
start each run with a clean dump_pt/step000 directory or write into a fresh
step-specific directory. Update the step/activation dump logic in the main
tracing flow (where the output path is created and files are written) to remove
any existing directory before saving new samples, using the existing
path-building symbols in trace_pytorch.py.
| print("\nParity report (PyTorch vs C) — INFORMATIONAL:") | ||
| print(f"{'metric':<14} {'pt':>10} {'c':>10} {'diff':>10} {'tol':>8} {'type':>5} {'pass':>6}") | ||
| for r in results: | ||
| print(f"{r.metric:<14} {r.pt_value:>10.5f} {r.c_value:>10.5f} {r.diff:>10.5f} " | ||
| f"{r.tolerance:>8.4f} {r.tolerance_type:>5} {str(r.passed):>6}") | ||
| print(f"\nOverall: {'PASS' if overall_pass else 'FAIL'} (informational; not a CI gate)") | ||
| return 0 if overall_pass else 1 |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Keep the “informational” compare step non-failing.
The module docstring and examples/mnist_cnn/README.md both describe this command as informational, but Line 76 returns 1 on tolerance misses. That turns the report into a gate for any scripted run.
Suggested fix
print(f"\nOverall: {'PASS' if overall_pass else 'FAIL'} (informational; not a CI gate)")
- return 0 if overall_pass else 1
+ return 0📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| print("\nParity report (PyTorch vs C) — INFORMATIONAL:") | |
| print(f"{'metric':<14} {'pt':>10} {'c':>10} {'diff':>10} {'tol':>8} {'type':>5} {'pass':>6}") | |
| for r in results: | |
| print(f"{r.metric:<14} {r.pt_value:>10.5f} {r.c_value:>10.5f} {r.diff:>10.5f} " | |
| f"{r.tolerance:>8.4f} {r.tolerance_type:>5} {str(r.passed):>6}") | |
| print(f"\nOverall: {'PASS' if overall_pass else 'FAIL'} (informational; not a CI gate)") | |
| return 0 if overall_pass else 1 | |
| print("\nParity report (PyTorch vs C) — INFORMATIONAL:") | |
| print(f"{'metric':<14} {'pt':>10} {'c':>10} {'diff':>10} {'tol':>8} {'type':>5} {'pass':>6}") | |
| for r in results: | |
| print(f"{r.metric:<14} {r.pt_value:>10.5f} {r.c_value:>10.5f} {r.diff:>10.5f} " | |
| f"{r.tolerance:>8.4f} {r.tolerance_type:>5} {str(r.passed):>6}") | |
| print(f"\nOverall: {'PASS' if overall_pass else 'FAIL'} (informational; not a CI gate)") | |
| return 0 |
🧰 Tools
🪛 Ruff (0.15.20)
[warning] 74-74: Use explicit conversion flag
Replace with conversion flag
(RUF010)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/mnist_cnn/compare.py` around lines 70 - 76, Update the compare
script so the informational parity report never fails the process; in
compare.py, keep the reporting in the main compare routine (the print loop and
overall summary) but change the return value logic at the end so it always exits
successfully instead of returning 1 when overall_pass is false. Make sure the
behavior of the compare command stays aligned with the module docstring and
README description of the step as informational, not a CI gate.
| print("\nParity report (PyTorch vs C) — INFORMATIONAL:") | ||
| print(f"{'metric':<14} {'pt':>10} {'c':>10} {'diff':>10} {'tol':>8} {'type':>5} {'pass':>6}") | ||
| for r in results: | ||
| print(f"{r.metric:<14} {r.pt_value:>10.5f} {r.c_value:>10.5f} {r.diff:>10.5f} " | ||
| f"{r.tolerance:>8.4f} {r.tolerance_type:>5} {str(r.passed):>6}") | ||
| print(f"\nOverall: {'PASS' if overall_pass else 'FAIL'} (informational; not a CI gate)") | ||
| return 0 if overall_pass else 1 |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Do not fail the process from an informational-only comparison.
Lines 70-76 currently make compare.py exit 1 on tolerance drift, which turns this script into a gate. That contradicts this module’s own docstring and examples/mnist_mlp/README.md, both of which describe the train-from-scratch comparison as informational-only.
Proposed fix
print("\nParity report (PyTorch vs C) — INFORMATIONAL:")
print(f"{'metric':<14} {'pt':>10} {'c':>10} {'diff':>10} {'tol':>8} {'type':>5} {'pass':>6}")
for r in results:
print(f"{r.metric:<14} {r.pt_value:>10.5f} {r.c_value:>10.5f} {r.diff:>10.5f} "
f"{r.tolerance:>8.4f} {r.tolerance_type:>5} {str(r.passed):>6}")
print(f"\nOverall: {'PASS' if overall_pass else 'FAIL'} (informational; not a CI gate)")
- return 0 if overall_pass else 1
+ return 0📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| print("\nParity report (PyTorch vs C) — INFORMATIONAL:") | |
| print(f"{'metric':<14} {'pt':>10} {'c':>10} {'diff':>10} {'tol':>8} {'type':>5} {'pass':>6}") | |
| for r in results: | |
| print(f"{r.metric:<14} {r.pt_value:>10.5f} {r.c_value:>10.5f} {r.diff:>10.5f} " | |
| f"{r.tolerance:>8.4f} {r.tolerance_type:>5} {str(r.passed):>6}") | |
| print(f"\nOverall: {'PASS' if overall_pass else 'FAIL'} (informational; not a CI gate)") | |
| return 0 if overall_pass else 1 | |
| print("\nParity report (PyTorch vs C) — INFORMATIONAL:") | |
| print(f"{'metric':<14} {'pt':>10} {'c':>10} {'diff':>10} {'tol':>8} {'type':>5} {'pass':>6}") | |
| for r in results: | |
| print(f"{r.metric:<14} {r.pt_value:>10.5f} {r.c_value:>10.5f} {r.diff:>10.5f} " | |
| f"{r.tolerance:>8.4f} {r.tolerance_type:>5} {str(r.passed):>6}") | |
| print(f"\nOverall: {'PASS' if overall_pass else 'FAIL'} (informational; not a CI gate)") | |
| return 0 |
🧰 Tools
🪛 Ruff (0.15.20)
[warning] 74-74: Use explicit conversion flag
Replace with conversion flag
(RUF010)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/mnist_mlp/compare.py` around lines 70 - 76, The parity comparison in
compare.py is informational-only, but the current return path in the main
comparison flow still exits with a failure status when overall_pass is false.
Update the logic around the results loop and final return so the script always
completes successfully for tolerance drift, keeping the printed PASS/FAIL
summary as informational output only. Use the compare.py comparison function and
the overall_pass/result handling to locate the change, and remove any nonzero
exit behavior from this reporting path.
| /* fan_in = inPerGroup*kernelSize; fan_out = outPerGroup*kernelSize | ||
| * (PyTorch _calculate_fan_in_and_fan_out for the Conv1d weight layout). */ | ||
| size_t fanIn = inPerGroup * kernelSize; | ||
| size_t fanOut = (outChannels / groups) * kernelSize; | ||
| initWeightTensor(paramTensor, weightInit, fanIn, fanOut); |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify the current PyTorch source still derives fan_out from tensor.size(0).
curl -fsSL https://raw.githubusercontent.com/pytorch/pytorch/main/torch/nn/init.py |
sed -n '/def _calculate_fan_in_and_fan_out/,/return fan_in, fan_out/p'Repository: es-ude/OnDeviceTraining
Length of output: 908
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the Conv1d initialization code and the tensor initializer it calls.
git ls-files 'src/userApi/layer/Conv1dApi.c' 'src/**/initWeightTensor*' 'src/**/weightInit*' 'src/**/Tensor*' |
sed -n '1,200p'
printf '\n--- Conv1dApi.c ---\n'
sed -n '1,180p' src/userApi/layer/Conv1dApi.c
printf '\n--- Search initWeightTensor references ---\n'
rg -n "initWeightTensor|fanIn|fanOut" srcRepository: es-ude/OnDeviceTraining
Length of output: 12338
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n--- LayerCommon.c (initWeightTensor) ---\n'
sed -n '1,90p' src/userApi/LayerCommon.c
printf '\n--- Conv1dTransposedApi.c (grouped fanOut reference) ---\n'
sed -n '55,90p' src/userApi/layer/Conv1dTransposedApi.cRepository: es-ude/OnDeviceTraining
Length of output: 4423
Use the full output-channel count for Conv1d Xavier fan-out.
For weights shaped [outChannels, inChannels/groups, kernelSize], fan-out should be outChannels * kernelSize, not (outChannels / groups) * kernelSize; otherwise grouped Xavier gets a wider bound than intended.
Proposed fix
- /* fan_in = inPerGroup*kernelSize; fan_out = outPerGroup*kernelSize
+ /* fan_in = inPerGroup*kernelSize; fan_out = outChannels*kernelSize
* (PyTorch _calculate_fan_in_and_fan_out for the Conv1d weight layout). */
size_t fanIn = inPerGroup * kernelSize;
- size_t fanOut = (outChannels / groups) * kernelSize;
+ size_t fanOut = outChannels * kernelSize;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /* fan_in = inPerGroup*kernelSize; fan_out = outPerGroup*kernelSize | |
| * (PyTorch _calculate_fan_in_and_fan_out for the Conv1d weight layout). */ | |
| size_t fanIn = inPerGroup * kernelSize; | |
| size_t fanOut = (outChannels / groups) * kernelSize; | |
| initWeightTensor(paramTensor, weightInit, fanIn, fanOut); | |
| /* fan_in = inPerGroup*kernelSize; fan_out = outChannels*kernelSize | |
| * (PyTorch _calculate_fan_in_and_fan_out for the Conv1d weight layout). */ | |
| size_t fanIn = inPerGroup * kernelSize; | |
| size_t fanOut = outChannels * kernelSize; | |
| initWeightTensor(paramTensor, weightInit, fanIn, fanOut); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/userApi/layer/Conv1dApi.c` around lines 108 - 112, The Conv1d Xavier
initialization is computing fan-out with grouped channels instead of the full
output-channel count. Update the fan-out calculation in the Conv1d weight
initialization path so the code in the Conv1dApi logic uses outChannels *
kernelSize rather than dividing by groups, while keeping fan-in based on the
per-group input channels. Preserve the existing initWeightTensor call and adjust
the local fanOut symbol accordingly.
| static void traceModelParams(layer_t **model, size_t modelSize, const char *tag, bool wantGrad, | ||
| traceSink_t sink, void *ctx) { | ||
| char phase[64]; | ||
| for (size_t i = 0; i < modelSize; i++) { | ||
| parameter_t *w = NULL, *b = NULL; | ||
| if (!layerParameters(model[i], &w, &b)) { | ||
| continue; | ||
| } | ||
| tensor_t *wt = wantGrad ? getGradFromParameter(w) : getParamFromParameter(w); | ||
| snprintf(phase, sizeof(phase), "%s.weight", tag); | ||
| sink(ctx, i, model[i]->type, phase, wt); | ||
| if (b != NULL) { | ||
| tensor_t *bt = wantGrad ? getGradFromParameter(b) : getParamFromParameter(b); | ||
| snprintf(phase, sizeof(phase), "%s.bias", tag); | ||
| sink(ctx, i, model[i]->type, phase, bt); | ||
| } | ||
| } |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Guard traceModelParams() against a NULL sink.
traceModelWeights() and traceModelGrads() both route here, and sink(ctx, ...) is unconditional. Passing NULL currently crashes, even though tracedGrads(..., NULL, NULL) is already treated as a valid no-op trace path.
Suggested fix
static void traceModelParams(layer_t **model, size_t modelSize, const char *tag, bool wantGrad,
traceSink_t sink, void *ctx) {
+ if (sink == NULL) {
+ return;
+ }
+
char phase[64];
for (size_t i = 0; i < modelSize; i++) {
parameter_t *w = NULL, *b = NULL;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| static void traceModelParams(layer_t **model, size_t modelSize, const char *tag, bool wantGrad, | |
| traceSink_t sink, void *ctx) { | |
| char phase[64]; | |
| for (size_t i = 0; i < modelSize; i++) { | |
| parameter_t *w = NULL, *b = NULL; | |
| if (!layerParameters(model[i], &w, &b)) { | |
| continue; | |
| } | |
| tensor_t *wt = wantGrad ? getGradFromParameter(w) : getParamFromParameter(w); | |
| snprintf(phase, sizeof(phase), "%s.weight", tag); | |
| sink(ctx, i, model[i]->type, phase, wt); | |
| if (b != NULL) { | |
| tensor_t *bt = wantGrad ? getGradFromParameter(b) : getParamFromParameter(b); | |
| snprintf(phase, sizeof(phase), "%s.bias", tag); | |
| sink(ctx, i, model[i]->type, phase, bt); | |
| } | |
| } | |
| static void traceModelParams(layer_t **model, size_t modelSize, const char *tag, bool wantGrad, | |
| traceSink_t sink, void *ctx) { | |
| if (sink == NULL) { | |
| return; | |
| } | |
| char phase[64]; | |
| for (size_t i = 0; i < modelSize; i++) { | |
| parameter_t *w = NULL, *b = NULL; | |
| if (!layerParameters(model[i], &w, &b)) { | |
| continue; | |
| } | |
| tensor_t *wt = wantGrad ? getGradFromParameter(w) : getParamFromParameter(w); | |
| snprintf(phase, sizeof(phase), "%s.weight", tag); | |
| sink(ctx, i, model[i]->type, phase, wt); | |
| if (b != NULL) { | |
| tensor_t *bt = wantGrad ? getGradFromParameter(b) : getParamFromParameter(b); | |
| snprintf(phase, sizeof(phase), "%s.bias", tag); | |
| sink(ctx, i, model[i]->type, phase, bt); | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/userApi/training_loop/calculate_grads/CalculateGradsSequential.c` around
lines 140 - 156, `traceModelParams()` currently calls `sink(ctx, ...)`
unconditionally, so a NULL trace sink will crash even though the no-op path is
expected; add an early guard in `traceModelParams()` (before the loop) to return
immediately when `sink` is NULL, keeping `traceModelWeights()` and
`traceModelGrads()` behavior consistent with the existing `tracedGrads(...,
NULL, NULL)` no-op case.
| * - phase: "fwd" | "agrad" | "lossgrad" for tracedGrads (Task 2); | ||
| * "<tag>.weight" / "<tag>.bias" for traceModelWeights/Grads (Task 3). | ||
| * - tensor: borrowed; valid only for the duration of the call. */ | ||
| typedef void (*traceSink_t)(void *ctx, size_t layerIdx, layerType_t layerType, const char *phase, | ||
| tensor_t *tensor); |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Clarify that phase is borrowed too.
tracedGrads() passes string literals, but traceModelWeights() / traceModelGrads() synthesize phase in stack storage before invoking the sink. A sink that retains the pointer past the callback can end up with dangling or corrupted phase names. Please either document phase as valid only for the duration of the call or switch those synthesized names to stable storage.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/userApi/training_loop/calculate_grads/include/TraceApi.h` around lines 17
- 21, Clarify the lifetime of the phase argument in traceSink_t: it is currently
undocumented even though tracedGrads() passes literals and
traceModelWeights()/traceModelGrads() may build it from temporary stack storage.
Update the comment in TraceApi.h near traceSink_t to state that phase is
borrowed and valid only for the duration of the callback, or change the callers
that synthesize phase to use stable storage before invoking the sink.
No description provided.