A pure-C, fully serial, fully observable inference engine for BERT-base-uncased.
Every floating-point operation is an explicit loop, so an operation-level hook can be inserted at any point of the forward pass. There is no OpenMP and no BLAS. The engine can run as a standalone dev binary, or in installation mode where it streams every operation as an event for downstream visualization and audio.
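As a rough illustration of what "explicit loop plus hook" can look like, here is a minimal sketch of a serial matrix-vector product that calls a callback after every multiply-accumulate. The names `op_hook_fn`, `matmul_hooked`, and `count_ops` are hypothetical and are not taken from bert.c; the per-operation hook granularity and the row-major weight layout are also assumptions.

```c
#include <stddef.h>

/* Hypothetical hook signature: invoked once per multiply-accumulate. */
typedef void (*op_hook_fn)(size_t row, size_t col,
                           float w, float x, float acc, void *user);

/* out[i] = sum_j w[i*n + j] * x[j], with w stored row-major as d rows of n.
 * No BLAS, no threads: every multiply-accumulate is visible to the hook. */
static void matmul_hooked(float *out, const float *x, const float *w,
                          size_t n, size_t d, op_hook_fn hook, void *user)
{
    for (size_t i = 0; i < d; i++) {
        float acc = 0.0f;
        for (size_t j = 0; j < n; j++) {
            float wij = w[i * n + j];
            acc += wij * x[j];
            if (hook) hook(i, j, wij, x[j], acc, user);
        }
        out[i] = acc;
    }
}

/* Example hook: counts how many multiply-accumulates were observed. */
static void count_ops(size_t row, size_t col,
                      float w, float x, float acc, void *user)
{
    (void)row; (void)col; (void)w; (void)x; (void)acc;
    (*(size_t *)user)++;
}
```

Passing `NULL` as the hook gives a plain serial matmul, so the same loop can serve both an observed and an unobserved run.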
This engine is the computation core of Nayuta: The Transformer, an art installation and computational study that runs a single BERT forward pass over each of roughly 62,400 institutional art texts at about 1,000 operations per second. At that pace the full run would take on the order of 159,000 years. Every floating-point operation is surfaced as a visual or auditory event, so the model's computation is rendered as transparent arithmetic rather than an opaque result. (Nayuta is a Buddhist term for an immense, almost uncountable number, echoing that runtime.)
bert.c is factored out of that project as a reusable, self-contained component. The full
installation mounts this repository as a submodule at bert_inference/engine/ and lives at
github.com/oudeis01/nayuta.
Based on llama2.c by Andrej Karpathy
(MIT). The bidirectional attention pattern references modernbert.c by Hardik
Vala. Key differences from Llama: word + position + token_type embeddings with
LayerNorm, bidirectional N×N attention (no causal mask), no RoPE, GELU
activation, standard 2-layer FFN with bias, and post-norm. See the header
comment in bert.c for the full list.
Two targets are produced from the single bert.c:
cmake -B build
cmake --build build

bert: dev mode, pure C, no dependencies (-lm only).
bert_install: installation mode (-DINSTALLATION_MODE); streams FMA batches over ZeroMQ (PUSH) and structural events over OSC. Requires libzmq and liblo (resolved via pkg-config).
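One common way to get two binaries out of a single source file is to gate the streaming path behind the compile-time macro. The sketch below shows that pattern only; `fma_event`, `emit_batch`, `EMIT_BATCH`, and `build_mode` are hypothetical names, and the transport call is replaced by a counter so that no ZeroMQ or OSC API is assumed.

```c
#include <stddef.h>

/* Hypothetical record for one fused multiply-add. */
typedef struct { float w, x, acc; } fma_event;

#ifdef INSTALLATION_MODE
/* Installation build: a real engine would hand the batch to its transport
 * here. This stand-in only counts events. */
static size_t g_sent = 0;
static void emit_batch(const fma_event *batch, size_t n)
{
    (void)batch;
    g_sent += n;
}
#define EMIT_BATCH(b, n) emit_batch((b), (n))
#else
/* Dev build: the call site compiles to nothing, so no extra libraries
 * are needed at link time. */
#define EMIT_BATCH(b, n) ((void)(b), (void)(n))
#endif

/* Reports which variant this translation unit was compiled as. */
static const char *build_mode(void)
{
#ifdef INSTALLATION_MODE
    return "installation";
#else
    return "dev";
#endif
}
```

Compiling the same file with and without `-DINSTALLATION_MODE` then yields the two variants, which is consistent with the dev target needing only `-lm`.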
The engine consumes a weights file exported from a Hugging Face BERT-base checkpoint plus a pre-tokenized corpus. The exporters and corpus builders are not part of this engine repo; they live alongside the installation that drives it.
Dev mode takes an exported weights file and a token source and runs one forward pass:
# token IDs from a file
./build/bert bert_base.bin tokens.txt
# or pipe token IDs on stdin
echo "101 2057 2024 2204 102" | ./build/bert bert_base.bin

The operation-level hook fires at each weight-matrix computation as the pass runs. In installation mode (bert_install), the same stream is pushed over ZeroMQ and OSC instead.
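The two invocations above differ only in where the token IDs come from. A minimal sketch of that input handling follows, assuming the IDs are whitespace-separated decimal integers as in the `echo` example; `read_token_ids` and `load_tokens` are hypothetical names, not functions from bert.c.

```c
#include <stdio.h>

/* Reads up to max_tokens whitespace-separated integer IDs from `in`.
 * Returns the number of IDs read; stops at EOF or the first non-integer. */
static int read_token_ids(FILE *in, int *ids, int max_tokens)
{
    int n = 0;
    while (n < max_tokens && fscanf(in, "%d", &ids[n]) == 1)
        n++;
    return n;
}

/* Loads IDs from `path`, or from stdin when path is NULL.
 * Returns the count, or -1 if the file cannot be opened. */
static int load_tokens(const char *path, int *ids, int max_tokens)
{
    FILE *in = path ? fopen(path, "r") : stdin;
    if (!in) return -1;
    int n = read_token_ids(in, ids, max_tokens);
    if (in != stdin) fclose(in);
    return n;
}
```

A caller would pass `argv[2]` when a token file is given and `NULL` otherwise, with `max_tokens` capped at the model's maximum sequence length (512 for BERT-base).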
MIT. See LICENSE. Original llama2.c copyright retained.