ML Frame

A low-level C11 machine learning framework built around a deterministic Memory Arena, explicit Tensor primitives, a compact Autograd engine, optimized GEMM backends, and an end-to-end quantization stack for edge-oriented workloads.

This repository focuses on correctness, reproducibility, and performance in a compact systems-style implementation.

Technical Highlights

Arena-based memory management with checkpoint/restore semantics.
Dense Tensor core with shape/stride metadata and guarded allocation paths.
Reverse-topological Autograd traversal via parent links and operator callbacks.
Runtime-dispatched matrix multiplication with SCALAR, AVX2/FMA, NEON, and optional CBLAS paths.
Blocked/Packed GEMM with tunable tile configuration and optional OpenMP parallel execution.
INT8 quantization features:
- Per-tensor quantization
- Per-channel quantization
- Grouped activation calibration
- Fully integer INT8 x INT8 -> INT32 -> INT8 requantization
FP16 conversion utilities for compact representation.
Versioned checksummed binary I/O with strict metadata and integrity validation.
Broad validation, stability, fuzz/property, and benchmark tests.

Repository Layout

include/
  ml_core.h        # Arena and Tensor definitions
  ml_autograd.h    # Backward pass API
  ml_math.h        # GEMM, backend dispatch, quantization, FP16 utilities
  ml_ops.h         # Differentiable ops (add/mul/matmul)
  ml_nn.h          # Activations and losses
  ml_layer.h       # Linear and quantized linear layers
  ml_optim.h       # SGD and gradient reset
  ml_io.h          # Weight serialization/deserialization

src/
  ml_core.c
  ml_autograd.c
  ml_math.c
  ml_ops.c
  ml_nn.c
  ml_layer.c
  ml_optim.c
  ml_io.c

test/
  test_step1.c
  test_step2.c
  test_step3_4.c
  test_step5_to_7.c
  test_step8_to_10.c
  test_validation_hardening.c
  test_io_hardening.c
  test_nn_stability.c
  test_quantization_backend.c
  test_int8_linear_inference.c
  test_fuzz_quant_io.c
  test_benchmark_limits.c

Core Architecture

1. Memory Model

The runtime is built on a single-owner Memory Arena:

arena_init initializes a fixed backing buffer.
arena_alloc performs aligned bump allocation.
arena_checkpoint and arena_restore allow scoped temporary allocations.
arena_reset resets all transient state in O(1).

This model avoids frequent heap operations and enables deterministic allocation behavior.
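
The bump-allocation and checkpoint/restore behavior described above can be sketched as follows. This is a minimal illustration, not the code in ml_core.c: the DemoArena type and the demo_arena_* names and signatures are hypothetical stand-ins, and the real arena_* functions may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; the real Arena struct in ml_core.h may differ. */
typedef struct {
    uint8_t *base;  /* caller-owned backing buffer */
    size_t   cap;   /* capacity in bytes */
    size_t   off;   /* current bump offset, always <= cap */
} DemoArena;

void demo_arena_init(DemoArena *a, void *buf, size_t cap) {
    assert(a != NULL && buf != NULL);
    a->base = (uint8_t *)buf;
    a->cap = cap;
    a->off = 0;
}

/* Aligned bump allocation. align must be a power of two.
 * Returns NULL when the request does not fit. */
void *demo_arena_alloc(DemoArena *a, size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t cur = (uintptr_t)a->base + a->off;
    uintptr_t aligned = (cur + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t pad = (size_t)(aligned - cur);
    size_t left = a->cap - a->off;
    if (pad > left || size > left - pad) return NULL;
    a->off += pad + size;
    return (void *)aligned;
}

/* A checkpoint is just the current offset. */
size_t demo_arena_checkpoint(const DemoArena *a) { return a->off; }

/* Restoring drops every allocation made after the checkpoint. */
void demo_arena_restore(DemoArena *a, size_t mark) {
    assert(mark <= a->off);
    a->off = mark;
}

/* O(1) reset: nothing is freed individually. */
void demo_arena_reset(DemoArena *a) { a->off = 0; }
```

In this sketch a scoped temporary is a checkpoint taken before the scratch allocations and a restore after them, so the cost of releasing scratch memory does not depend on how many allocations were made.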

2. Tensor Model

A Tensor stores:

float* data
float* grad
shape[], strides[], ndim, size
Autograd fields (requires_grad, parents, backward, visited)

Tensor creation validates dimensionality and guards overflow-sensitive size computations.
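
One way to guard the size computation is to check each multiplication before performing it. The sketch below derives row-major strides and the element count from a shape. The function name, the DEMO_MAX_DIMS limit, and the return convention are assumptions made for illustration and are not taken from tensor_create.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_DIMS 8  /* assumed rank limit, not taken from ml_core.h */

/* Fills row-major strides (in elements) and the total element count.
 * Returns 0 on success, -1 on a bad rank, a zero-sized dimension,
 * or a size that would overflow size_t. */
int demo_shape_to_strides(const size_t *shape, int ndim,
                          size_t *strides, size_t *size_out) {
    if (shape == NULL || strides == NULL || size_out == NULL) return -1;
    if (ndim <= 0 || ndim > DEMO_MAX_DIMS) return -1;

    size_t total = 1;
    for (int i = ndim - 1; i >= 0; --i) {
        if (shape[i] == 0) return -1;
        strides[i] = total;
        /* Check before multiplying so the product never wraps. */
        if (total > SIZE_MAX / shape[i]) return -1;
        total *= shape[i];
    }
    /* The byte count for a float buffer must fit as well. */
    if (total > SIZE_MAX / sizeof(float)) return -1;

    assert(strides[ndim - 1] == 1);  /* innermost dimension is contiguous */
    *size_out = total;
    return 0;
}
```

For a shape of {2, 3, 4} this yields strides {12, 4, 1} and a size of 24 elements.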

3. Autograd Model

tensor_backward executes reverse traversal over a topologically sorted node list and invokes registered backward callbacks.

Gradient propagation is implemented for the following operators:

op_add with broadcast-aware gradient accumulation
op_mul
op_matmul
op_relu
op_sigmoid
op_softmax
loss_mse
loss_crossentropy
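
The traversal can be illustrated on scalars. The sketch below builds a topological order with a post-order walk over parent links, then runs the backward callbacks in reverse. DemoNode and every demo_* name are hypothetical, and the real Tensor fields and tensor_backward internals may be organized differently.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_NODES 64  /* assumed graph size limit for this sketch */

typedef struct DemoNode DemoNode;
struct DemoNode {
    float value, grad;
    DemoNode *parents[2];
    int n_parents;
    void (*backward)(DemoNode *self);  /* NULL for leaves */
    int visited;
};

/* d(a+b)/da = 1, d(a+b)/db = 1 */
void demo_add_backward(DemoNode *n) {
    n->parents[0]->grad += n->grad;
    n->parents[1]->grad += n->grad;
}

/* d(a*b)/da = b, d(a*b)/db = a */
void demo_mul_backward(DemoNode *n) {
    n->parents[0]->grad += n->grad * n->parents[1]->value;
    n->parents[1]->grad += n->grad * n->parents[0]->value;
}

DemoNode demo_leaf(float v) {
    DemoNode n = { v, 0.0f, { NULL, NULL }, 0, NULL, 0 };
    return n;
}

DemoNode demo_binary(DemoNode *a, DemoNode *b, float v,
                     void (*bw)(DemoNode *)) {
    DemoNode n = { v, 0.0f, { a, b }, 2, bw, 0 };
    return n;
}

/* Post-order DFS: every parent is appended before the node itself. */
static void demo_topo(DemoNode *n, DemoNode **order, int *count) {
    if (n->visited) return;
    n->visited = 1;
    for (int i = 0; i < n->n_parents; ++i)
        demo_topo(n->parents[i], order, count);
    assert(*count < DEMO_MAX_NODES);
    order[(*count)++] = n;
}

/* Seeds the root gradient with 1 and walks the order in reverse, so a
 * node's gradient is complete before it is pushed to its parents. */
void demo_backward(DemoNode *root) {
    DemoNode *order[DEMO_MAX_NODES];
    int count = 0;
    demo_topo(root, order, &count);
    root->grad = 1.0f;
    for (int i = count - 1; i >= 0; --i) {
        if (order[i]->backward != NULL) order[i]->backward(order[i]);
        order[i]->visited = 0;  /* clear the flag for a later pass */
    }
}
```

With x = 2 and y = 3, building m = x * y and z = m + x and then calling demo_backward on z leaves x.grad at 4 (y + 1) and y.grad at 2 (x). The node x is reached through two paths, which is why gradients are accumulated with += rather than assigned.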

4. Compute Backends

tensor_matmul_simd selects backend implementations at runtime.

Supported backend identifiers:

ML_MATMUL_BACKEND_SCALAR
ML_MATMUL_BACKEND_AVX2_FMA
ML_MATMUL_BACKEND_NEON
ML_MATMUL_BACKEND_CBLAS (when enabled)

Additional compute features:

Cache-blocked GEMM (tensor_matmul_blocked)
Packed panel strategy
Microkernel dispatch (including 8x8 paths)
Configurable tiling through MlGemmConfig
ml_gemm_autotune and thread controls
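
To make the cache-blocking idea concrete, here is a plain scalar sketch of a tiled matrix multiply for row-major, contiguous float matrices. DemoGemmConfig is a hypothetical stand-in for MlGemmConfig, whose actual fields are not listed here, and the sketch omits the packing, microkernels, and threading that the real backends are described as providing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile configuration: block sizes along M, N, and K. */
typedef struct { size_t bm, bn, bk; } DemoGemmConfig;

static size_t demo_min(size_t a, size_t b) { return a < b ? a : b; }

/* C (M x N) = A (M x K) * B (K x N), all row-major and contiguous. */
void demo_matmul_blocked(const float *A, const float *B, float *C,
                         size_t M, size_t N, size_t K, DemoGemmConfig cfg) {
    assert(cfg.bm > 0 && cfg.bn > 0 && cfg.bk > 0);
    for (size_t i = 0; i < M * N; ++i) C[i] = 0.0f;

    /* Walk the problem in tiles so each A and B tile is reused
     * while it is still likely to be in cache. */
    for (size_t i0 = 0; i0 < M; i0 += cfg.bm) {
        size_t i1 = demo_min(i0 + cfg.bm, M);
        for (size_t k0 = 0; k0 < K; k0 += cfg.bk) {
            size_t k1 = demo_min(k0 + cfg.bk, K);
            for (size_t j0 = 0; j0 < N; j0 += cfg.bn) {
                size_t j1 = demo_min(j0 + cfg.bn, N);
                for (size_t i = i0; i < i1; ++i) {
                    for (size_t k = k0; k < k1; ++k) {
                        float a = A[i * K + k];
                        /* Innermost loop runs along a row of B and C. */
                        for (size_t j = j0; j < j1; ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
                }
            }
        }
    }
}
```

The demo_min calls handle edge tiles, so M, N, and K do not need to be multiples of the block sizes. Tile sizes change only the order of accumulation, which can shift floating-point results in the last bits when compared against an unblocked loop.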

5. Quantization and Reduced Precision

The quantization API covers:

Per-tensor INT8 parameter generation (tensor_calc_qparams_i8)
Per-channel INT8 parameter generation (tensor_calc_qparams_i8_per_channel)
Activation calibration (global and grouped percentile)
Quantization and dequantization APIs for INT8 tensors
Integer kernel path:
- tensor_matmul_i8i8_i8pc
FP16 conversion and tensor helpers:
- ml_float_to_fp16
- ml_fp16_to_float
- tensor_quantize_fp16
- tensor_dequantize_fp16
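
As an illustration of per-tensor INT8 parameter generation, the sketch below uses a common asymmetric (affine) min/max scheme. Whether tensor_calc_qparams_i8 uses this exact scheme is an assumption. The DemoQParams type and the demo_* names are hypothetical, and NaN or infinite inputs are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { float scale; int32_t zero_point; } DemoQParams;

/* Round half away from zero; callers keep v within the int8 range. */
static long demo_round(float v) {
    return (long)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}

static float demo_clamp_i8(float v) {
    if (v < -128.0f) return -128.0f;
    if (v > 127.0f) return 127.0f;
    return v;
}

/* Maps the observed [lo, hi] range onto [-128, 127]. */
DemoQParams demo_calc_qparams_i8(const float *x, size_t n) {
    assert(x != NULL && n > 0);
    /* Start at 0 so the range always contains 0.0 and zero stays exact. */
    float lo = 0.0f, hi = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        if (x[i] < lo) lo = x[i];
        if (x[i] > hi) hi = x[i];
    }
    DemoQParams q;
    q.scale = (hi - lo) / 255.0f;
    if (q.scale == 0.0f) q.scale = 1.0f;  /* all-zero input */
    q.zero_point = (int32_t)demo_round(demo_clamp_i8(-128.0f - lo / q.scale));
    return q;
}

int8_t demo_quantize_i8(float v, DemoQParams q) {
    float t = v / q.scale + (float)q.zero_point;
    return (int8_t)demo_round(demo_clamp_i8(t));
}

float demo_dequantize_i8(int8_t v, DemoQParams q) {
    return q.scale * (float)((int32_t)v - q.zero_point);
}
```

In this scheme 0.0 always quantizes to the zero point and dequantizes back to exactly 0.0, and values inside the observed range round-trip with an error of roughly half a scale step. A per-channel variant would compute one DemoQParams per output channel instead of one per tensor.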

QuantizedLinearLayer provides native INT8 linear inference.
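
The INT8 x INT8 -> INT32 -> INT8 path can be sketched for a single output element. The example assumes symmetric weights (weight zero point 0) and a combined rescale factor input_scale * weight_scale / output_scale that lies strictly between 0 and 1, represented as a fixed-point multiplier. These are common conventions, not details confirmed for tensor_matmul_i8i8_i8pc, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Real multiplier is approximately mult * 2^-(31 + shift). */
typedef struct { int32_t mult; int shift; } DemoFixedMult;

/* Converts a real multiplier in (0, 1) to fixed point. Done once, offline. */
DemoFixedMult demo_quantize_multiplier(double m) {
    assert(m > 0.0 && m < 1.0);
    DemoFixedMult f = { 0, 0 };
    while (m < 0.5) { m *= 2.0; ++f.shift; }       /* normalize to [0.5, 1) */
    assert(f.shift < 31);
    int64_t q = (int64_t)(m * 2147483648.0 + 0.5);  /* scale by 2^31 */
    if (q == 2147483648LL) { q /= 2; --f.shift; }   /* rounding hit 2^31 */
    f.mult = (int32_t)q;
    return f;
}

/* INT8 x INT8 products accumulated in INT32. */
int32_t demo_dot_i8(const int8_t *a, int32_t a_zp,
                    const int8_t *w, size_t k) {
    int32_t acc = 0;
    for (size_t i = 0; i < k; ++i)
        acc += ((int32_t)a[i] - a_zp) * (int32_t)w[i];
    return acc;
}

/* INT32 accumulator -> INT8 output using integer arithmetic only. */
int8_t demo_requantize(int32_t acc, DemoFixedMult f, int32_t out_zp) {
    int total = 31 + f.shift;
    int64_t prod = (int64_t)acc * f.mult;
    int64_t mag = prod < 0 ? -prod : prod;
    /* Round half away from zero on the magnitude, then restore the sign. */
    int64_t r = (mag + ((int64_t)1 << (total - 1))) >> total;
    if (prod < 0) r = -r;
    r += out_zp;
    if (r < -128) r = -128;
    if (r > 127) r = 127;
    return (int8_t)r;
}
```

With a multiplier of 0.25, an accumulator of 100 requantizes to 25 and an accumulator of 1000 saturates at 127. For per-channel weights, each output channel would carry its own DemoFixedMult.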

6. Serialization and Integrity

io_save_weights and io_load_weights implement a versioned binary format with:

Fixed header metadata
Tensor descriptor validation
Header checksum validation
Data checksum validation
Corruption rejection and strict mismatch handling
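
The layered checks listed above might look like the following sketch. The header layout, the magic and version values, the error codes, and the choice of CRC-32 (reflected polynomial 0xEDB88320) are all assumptions for illustration. The actual on-disk format written by io_save_weights is not described here and may use a different checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAGIC   0x4D4C4657u  /* hypothetical file magic */
#define DEMO_VERSION 1u           /* hypothetical format version */

/* Hypothetical fixed header; header_crc covers the fields before it. */
typedef struct {
    uint32_t magic;
    uint32_t version;
    uint32_t data_len;
    uint32_t data_crc;
    uint32_t header_crc;
} DemoHeader;

/* Bitwise CRC-32 (init 0xFFFFFFFF, final XOR 0xFFFFFFFF). */
uint32_t demo_crc32(const void *data, size_t len) {
    const uint8_t *p = (const uint8_t *)data;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= p[i];
        for (int b = 0; b < 8; ++b)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Fills in the header for a payload, checksums last. */
void demo_header_seal(DemoHeader *h, const void *data, uint32_t len) {
    assert(h != NULL && data != NULL);
    h->magic = DEMO_MAGIC;
    h->version = DEMO_VERSION;
    h->data_len = len;
    h->data_crc = demo_crc32(data, len);
    h->header_crc = demo_crc32(h, offsetof(DemoHeader, header_crc));
}

/* Returns 0 when everything matches, a negative code otherwise. */
int demo_header_check(const DemoHeader *h, const void *data, size_t len) {
    assert(h != NULL && data != NULL);
    if (h->magic != DEMO_MAGIC || h->version != DEMO_VERSION) return -1;
    if (h->header_crc != demo_crc32(h, offsetof(DemoHeader, header_crc)))
        return -2;                       /* header corrupted */
    if (h->data_len != len) return -3;   /* metadata mismatch */
    if (h->data_crc != demo_crc32(data, len)) return -4;  /* payload corrupted */
    return 0;
}
```

Checking the header checksum before trusting data_len means a corrupted length field is rejected before it is used to size a read. A portable on-disk format would also need to fix the byte order, which this sketch leaves out.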

Build and Toolchain

Prerequisites

C11-compatible compiler (GCC or Clang)
Math library (-lm)
Optional:
- OpenMP (-fopenmp)
- CBLAS (-DML_USE_CBLAS + BLAS link flags)

Example Build Commands

Compile one test target:

gcc -O3 -std=c11 -Wall -Wextra -Iinclude src/*.c test/test_step2.c -lm -o test_step2

Compile with OpenMP (optional):

gcc -O3 -std=c11 -Wall -Wextra -fopenmp -Iinclude src/*.c test/test_benchmark_limits.c -lm -o test_benchmark_limits

Compile with CBLAS (optional, platform-specific link flags may differ):

gcc -O3 -std=c11 -Wall -Wextra -DML_USE_CBLAS -Iinclude src/*.c test/test_quantization_backend.c -lblas -lm -o test_quantization_backend

Running Tests

Representative execution flow:

./test_step1
./test_step2
./test_step3_4
./test_step5_to_7
./test_step8_to_10
./test_validation_hardening
./test_io_hardening
./test_nn_stability
./test_quantization_backend
./test_int8_linear_inference
./test_fuzz_quant_io

On Windows, generated executables typically carry the .exe suffix.

Benchmarking and Runtime Tuning

Benchmark executable:

./test_benchmark_limits

Key runtime variables:

ML_MATMUL_BACKEND=scalar|avx2|avx2_fma|neon|cblas
ML_GEMM_AUTOTUNE=1
ML_GEMM_BM
ML_GEMM_BN
ML_GEMM_BK
ML_GEMM_APACK_THRESHOLD
ML_GEMM_NUM_THREADS
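
How the framework parses these variables is not shown here. As a hedged sketch, a numeric setting such as ML_GEMM_BM could be read with getenv and strtol, falling back to a default when the variable is unset or malformed. The helper names and any default values passed to them are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parses a positive decimal integer. Returns fallback when text is NULL,
 * empty, has trailing junk, is out of range, or is not positive. */
long demo_parse_positive(const char *text, long fallback) {
    assert(fallback > 0);
    if (text == NULL || *text == '\0') return fallback;
    char *end = NULL;
    errno = 0;
    long v = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || v <= 0) return fallback;
    return v;
}

/* Reads an environment variable such as ML_GEMM_BM. */
long demo_env_positive(const char *name, long fallback) {
    return demo_parse_positive(getenv(name), fallback);
}
```

A call such as demo_env_positive("ML_GEMM_BM", 64) returns 64 when the variable is unset. Rejecting malformed values instead of passing them through keeps a typo in the environment from producing a zero or negative tile size.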

Backend introspection APIs:

ml_matmul_last_backend()
ml_matmul_backend_name()

Test Coverage Summary

Step tests validate baseline arena, tensor, math, and training flow.
Validation hardening tests verify invalid input rejection and non-finite update guards.
I/O hardening tests verify corruption detection and shape consistency checks.
NN stability tests stress Softmax and Cross-Entropy numerical behavior.
Quantization/backend tests verify parity and error thresholds across quantized paths.
Fuzz/property tests exercise randomized quantization and serialization invariants.
Benchmark tests report throughput and latency observations as CSV-style metrics.

API Surface (High-Level)

Core

tensor_create
tensor_set_requires_grad
tensor_backward
optim_sgd_step
optim_zero_grad

Math/Ops

tensor_add
tensor_matmul
tensor_matmul_blocked
tensor_matmul_simd
op_add
op_mul
op_matmul

NN

op_relu
op_sigmoid
op_softmax
loss_mse
loss_crossentropy

Layer

layer_linear_create
layer_linear_forward
layer_linear_quantize
layer_linear_forward_int8

Quantization

tensor_calc_qparams_i8
tensor_quantize_i8
tensor_dequantize_i8
tensor_calc_qparams_i8_per_channel
tensor_quantize_i8_per_channel
tensor_calibrate_activation_qparams_i8
tensor_calibrate_activation_qparams_i8_grouped
tensor_matmul_int8
tensor_matmul_int8_per_channel
tensor_matmul_i8i8_i8pc
tensor_quantize_fp16
tensor_dequantize_fp16

I/O

io_save_weights
io_load_weights

Current Scope and Limitations

This repository currently targets dense tensor operations and related training/inference primitives.

Not in scope (at present):

Convolution and pooling operator families
Graph compilation/fusion passes
Distributed training
Multi-format model import/export ecosystem
Full package/distribution pipeline

Engineering Notes

The project is intentionally designed for explicit control over memory, numerics, and compute paths.
Safety checks are integrated into allocation, validation, serialization, and optimizer flows.
The implementation favors predictable behavior and direct inspectability over abstraction-heavy runtime layers.

Contributing

Recommended contribution areas:

Additional operator coverage
Extended backend kernels
Cross-platform CI and benchmark automation
Expanded calibration algorithms
Documentation and usage examples

Documentation References

The following references were used to guide implementation details and technical decisions:

C language standard library references: https://en.cppreference.com/w/c
GCC compiler options and target tuning: https://gcc.gnu.org/onlinedocs/
Clang compiler documentation: https://clang.llvm.org/docs/
Intel x86 Intrinsics Guide (AVX2/FMA): https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html
Arm NEON intrinsics reference: https://developer.arm.com/architectures/instruction-sets/intrinsics/
OpenMP specification: https://www.openmp.org/specifications/
CBLAS reference (Netlib BLAS): https://www.netlib.org/blas/
Valgrind user manual (Memcheck): https://valgrind.org/docs/manual/manual.html
IEEE 754 floating-point standard overview: https://ieeexplore.ieee.org/document/8766229
CRC background and polynomial references: https://reveng.sourceforge.io/crc-catalogue/all.htm
