A Forth-flavored language for numeric and matrix work, statistics and
regression, set/array/frame manipulation, string/regex processing, logic
programming, and multi-core data parallelism — with embedded SQLite and a
runtime C FFI. A compact C interpreter built with clang -O3.
make # builds ./logicforth
make test # runs the golden-output test suite
make bench # runs the benchmark suite (logicforth vs CPython)
./logicforth # REPL
Self-contained: its vendored dependencies — PCRE2 (regex), isocline (REPL line
editing), and SQLite (embedded SQL) — live under external/ and are built from
source into the binary, so make needs only a C compiler and the system
libffi. Refresh them with make vendor-pcre2, sh tools/vendor-isocline.sh,
and sh tools/vendor-sqlite.sh (see each directory's PROVENANCE).
On macOS, make also builds liblapacke_accel.dylib, a thin LAPACK-over-
Accelerate shared library that the statistics library dlopens through the FFI
for SVD and least-squares; only the linear-algebra fits depend on it. Re-vendor
it with make vendor-lapacke.
\ Arithmetic
3 4 + . \ 7
\ Matrices: * is element-wise; matrix multiply is dgemm (αAB + βC)
[ 1 2 3 4 ] 2 2 matrix dup transpose * \ element-wise product of M and Mᵀ
\ Sets and set algebra
< 1 2 3 > < 2 3 4 > + . \ < 1 2 3 4 > (union via polymorphic +)
\ Set-builder { x² | x ∈ 1..10, even x } — literal + filter/map + destruct
< 1 10 range [: 2 mod 0= :] filter [: fsq :] map destruct > . \ < 4 16 36 64 100 >
\ Frames — symbol-keyed nested maps
{ :a 1 :b { :c 2 } } /b/c @ . \ 2
\ Path queries — * (any child), // (any depth), [pred] filters
{ :a { :n 1 } :b { :n 2 } } /*/n select-values . \ [ 1 2 ]
\ JSON: parse to frames/arrays, serialize back
"[1, 2, 3]" json>frame frame>json . \ [1, 2, 3]
\ Higher-order operations
[ 1 2 3 4 5 ] [: dup * :] map . \ [ 1 4 9 16 25 ]
\ Strings and regex (PCRE2)
"x=42" "(\w+)=(\d+)" match . \ [ "x=42" "x" "42" ]
"hello world" "o" "0" replace . \ hell0 w0rld
\ Exceptions
[: "missing" throw :]
[: "got " . . cr :] try-catch \ prints "got missing"
\ Coroutines via delimited continuations
: yield shift ;
: producer 1 yield 2 yield 3 ;
reset producer \ leaves (1, k) — next value via resume
\ Logic: unify binds variables; amb is a committed choice
[ 1 2 3 ] [ X Y Z ] ~ drop X $ . Y $ . Z $ . cr \ 1 2 3
[: fail :] [: "fallback" :] amb . \ fallback
\ Multi-core: run a quotation across the array on every core
[ 1 2 3 4 5 6 7 8 ] [: dup * :] pmap . \ [ 1 4 9 16 25 36 49 64 ]
\ Statistics over a matrix column: mean and the median (0.5 quantile)
[ 2 4 4 4 5 5 7 9 ] 8 1 matrix dup mean . 0.5 quantile . \ 5 4.5
\ SQLite, in-memory: create, insert a bound param, query back
":memory:" db-open
dup "create table t(x)" [ ] db-exec drop
dup "insert into t values (?)" [ 42 ] db-exec drop
"select x from t" [ ] db-query :rows @ 0 @i :x @ . \ 42- Tagged Vals — floats, strings, symbols, sets, arrays, cons pairs, frames, matrices, execution tokens, dictionary addresses, continuations, logic variables, process streams, internal marks. A single 8-byte NaN-boxed representation; the tag determines interpretation.
- Direct-threaded inner interpreter — each dictionary cell is a handler function pointer, dispatched by an indirect tail call (
musttail); a colon call, literal, or branch carries its operand in the cell(s) right after the handler. The dictionary is the threaded code — no separate bytecode. - Compile-time instruction fusion — adjacent variable-reads and float ops collapse into single instructions (
var var f+→ one op;… var f+!fuses the store),f*+/f*-are fused multiply-add/subtract, and a comparison immediately before a branch (= if,gt while,0= until) fuses into a single compare-and-branch op. Variable-fused float words (vf+/vf*/… on one named variable,vvf+/vvf*/… on two) collapse the variable load into the float op. - Program and execution state separated — the dictionary, symbol pool, and object heap live in global structures (
Vocabulary,Compiler,Arena) that are read-only during a run; the per-run mutable state — the three stacks, instruction pointer, locals, and GC roots — lives in anInterpreter, so one program can be shared across multiple execution contexts. - Three stacks — data, return, and a side stack for stashing values that mustn't sit on the other two.
- Colon definitions —
: name body ;. The body is captured as source text forseeand the text-formsave. - Anonymous quotations —
[: ... :]pushes a fresh xt. Works at top level and inside colon defs. - Control flow —
if/else/then, thebegin/until/againandbegin/while/repeatloops, countedtimes/i-times,exit, and>r/r>/r@for return-stack access. - Tick and execute —
' word executefor first-class invocation by name. forget— truncate the dictionary back to a named word; symbol identities survive.- Variables and symbols —
variable foodeclares a global; read it by bare name, assign with42 to foo(toalso auto-creates a global on first assignment at the REPL).symbol bardefines a symbol;:foois a symbol literal interned on use;string>symbolinterns a computed string. - Word-local variables —
| x y |at the head of a colon definition or quotation declares scoped slots (initialized to0.0); read by bare name, assign withto name.++ name/-- nameincrement/decrement a local in place (f++/f--the unsafe float-only forms). Locals nest through quotations and survive continuation capture. - Mark-and-sweep GC — walks data/return/side stacks, dictionary, and a small
gc_rootsarray for in-flight C-level temporaries. It triggers on object-table pressure and on live-byte pressure, and runs at a safepoint between words so popped C-level operands stay live.
- Polymorphic arithmetic —
+/-/*//dispatch on operand tags: floats compute, strings concatenate (+), sets union/difference/intersection, matrices element-wise, a scalar broadcasts over a matrix, and arrays concatenate (+). - In-place matrix ops —
+!/-!/*!//!mutate the left matrix in place (explicit; the programmer decides). Float-only fast paths (f+,f-,f*,f/,f^, …) skip the type dispatch when both operands are known floats. - Matrix construction —
R C 0-matrix(zeros),[ ... ] R C matrix,V N diagonal-matrix(N×N with V on the diagonal),N identity-matrix,start end step matrix-range(a 1×N row over a stepped range). - DGEMM —
dgemm-nn/tn/nt/tt(αAB + βC) for all four transpose variants. The non-transposednnpath is ikj-ordered withrestrictpointers for cache-friendly access; the transposed variants use a straightforward triple loop. - Indexing —
@i/@j/@i,jto read rows, columns, or single cells. - Shape —
dim,reshape,flatten,transpose,diagonal. - Selection —
augment(concatenate two matrices column-wise),submatrix(copy a half-open row×column block),select-rows(gather rows named by a float index array). - Reductions —
sum,row-sums,column-sums,max,min,row-maxes,row-mins,column-maxes,column-mins. Librarymean,row-means,column-meanson top. - Descriptive statistics —
var(sample variance) andquantile(linearly interpolated at p ∈ [0,1]) over all elements; the loadable statistics library layersstd,se,median,percentile,iqr, andcion these. - Element-wise math —
abs,sqrt,exp,log,ln,sin,cos,tan,tanh,asin,acos,atan,round,truncate,round-up,round-down. Polymorphic over floats and matrices. - Total ordering —
=/lt/gtcompare matrices by shape then row-major contents, so matrices work as set members.
Flat, fixed-length typed numeric buffers stored off the arena (one allocation, freed by GC), for dense numeric data without per-element boxing and as FFI scratch.
int-segment/double-segment—( n -- seg )an n-element zero-filled buffer; both store doubles internally, so@ireads and!iwrites a float, sharing the array indexing words.segment>pointer— intern the backing buffer as aT_PTRfor an FFI:ptrargument, no copy.
- Set literals —
< 1 2 3 >, set operations,member?,size, in-placeset-add!/set-remove!, andarray>set(sort-and-dedup an array into a set in one pass). group-by—array :col group-bygroups frames by a symbol field into a frame from each value to a set of rows (the engine behind fast indexing and aggregation).- Array literals —
[ 1 2 3 ], thearrayconstructor (gather N from the stack),array-of(fill),range( from to -- arr ) for an ascending or descending integer sequence,iota( n -- [0..n-1] ), indexed access via@i, in-place store via!i. - Array operations —
sort(a sorted copy inval_cmporder),reverse,take,concat,flatten-array(flatten one level), andsample( arr count repl -- arr ) drawing elements with or without replacement. - Map, fold, zip-map, filter —
mapfor a single source,reducefor a left fold with an accumulator,mapnfor N-ary zip,filterto select by predicate, with anonymous quotations as the higher-order argument. - Destructuring —
destructspreads a set/array/frame's elements onto the stack (a frame as alternating symbol/value).destruct-to( values names -- ) takes two equal-length arrays and assigns each value to the global variable named by the corresponding symbol, creating it if absent. - In-place slicing —
slice!copies a strided run from one array into another,reverse-slice!reverses a run in place,to-slice!stores values from the stack into a range.
A thread-local xoshiro256** stream. Each worker thread derives its own stream from the shared base seed, so parallel draws are deterministic per worker.
seed—( n -- )set the global base seed and reset the stream.random—( -- f )a uniform float in[0, 1);random-int—( bound -- f )a uniform integer in[0, bound).sample(arrays) andresample-indices(datasets) draw on this stream.
Worker threads over one shared object heap (not fork): a quotation runs across the collection on several cores, results joining back by handle with no copy. Allocation in a region is per-worker; a region whose results don't escape is rewound wholesale, and live results are retained by handle.
pmap—( arr xt -- arr )parallelmap;pfilter—( arr pred -- arr )parallelfilter, order preserved;pmap-reduce—( arr id map-xt combine-xt -- val )fused parallel map+fold, withcombine-xtassociative andidits neutral element.-extforms —pmap-ext/pfilter-ext/pmap-reduce-exttake an explicit worker count and items-per-claim; the bare forms default tonum-coresworkers.num-cores— online CPU count.
Symbol-keyed nested maps — the associative type, and the compound term the logic layer builds on. The three bracket families are distinct: [ ] arrays, { } frames, < > sets.
- Literals —
{ :a 1 :b 2 }; values may be any Val, including nested frames, arrays, and sets. - Builders —
frame( keys values -- frame ) from two parallel collections,array>frame( kv-array -- frame ) from an alternating key/value array, andframe>array( frame -- kv-array ) the inverse, flattening to a key-sorted alternating array. - Path literals —
/a/b/cis a symbol array[ :a :b :c ], built once at compile time, used to address into the tree — and usable as a key when constructing a frame ({ /a/b/c v }/array>frame), where it vivifies nested frames. A path may also be a search pattern:*matches any child at that level,//matches at any depth (descendant-or-self), and[…]filters by predicate ([city=:NYC],[age>30],[.>0]on the node itself,[addr/zip]on a sub-path). - Access —
@( frame key/path -- value ) get,!( frame key/path value -- frame ) set with auto-vivified intermediates,has?existence test,delete-atremove,update-atapply a quotation to a leaf,mergecombine two frames (right wins), pluskeys/values/size. The single-location words (@,!,delete-at,update-at) take a:symbolkey or a plain/a/b/clocator and reject a search pattern;has?accepts either, answering whether any node matches. - Path queries —
select-values( frame pattern -- array ) returns every value matched by a*////predicate search pattern, in document order;select-keysreturns the full root-to-match path for each match (each round-trips back through@). Convert the result witharray>setfor distinct values orarray>consto feed matches tochoose. - Representation — sorted parallel key/value arrays with binary-search lookup; mutable in place, reference semantics. Structurally comparable, so frames work as set members and round-trip through their
{ }literal.
- String literals are raw (newlines allowed;
""is the one escape → a literal");formatfills{n}placeholders from the stack —"got {0} of {1}" format; polymorphic concatenation via+. - Regex on PCRE2 (Perl-compatible, JIT-compiled):
match(first match as a flat[ whole cap… ]),match-all(all matches, nested),replace(replace-all, with&/\1–\9backrefs), and thehas?string overload (does the pattern match?). Patterns are plain"..."literals — PCRE2 reads\d,\w,\n, lookaround,\p{...}. - Slicing / building —
substring(half-open codepoint range),char-at(the one-character string at a codepoint index),split(split at each non-overlapping match of a pattern, empty fields kept),join(concatenate an array of strings with a separator). - Unicode — strings are UTF-8 and the bare words work in codepoints:
size/substring/char-at/codepoint-atcount and index by codepoint, with byte-level forms (byte-size,byte-substring) for the raw layer and to pair with regex byte offsets.string>chars/string>codepointsdecompose a string,codepoint>char/codepoints>stringrebuild one, andemitUTF-8-encodes a codepoint. Regex runs in UTF + UCP mode:.matches a codepoint,\w/\d/\bare Unicode-aware, and invalid byte sequences are tolerated rather than erroring.
json>frame— parse a JSON string into native values: objects → frames (keys interned as symbols), arrays → arrays, strings → strings (escapes and\uXXXXdecoded to UTF-8), numbers → floats,true/false→ the reserved:1/:0boolean symbols,null→null(the none value). Recursive-descent, GC-safe, rejects trailing garbage.frame>json— serialize a value back to a JSON string: floats use a shortest round-trip representation, strings are escaped,:1/:0→true/false, none →null.
- Interactive REPL with full isocline line editing: theme-adaptive syntax highlighting, matching-brace highlighting, inline hints and Tab completion (word names from the live dictionary, filenames inside string literals), persistent history (
.logicforth_history), and multi-line editing —Ctrl+Jinserts a line,Entersubmits the whole buffer. Acount|topprompt shows stack depth and the top value, green on a terminal, red on error..pretty-prints a nested array across lines with the opening brackets aligned; strings print quoted inside a collection and in.s, raw when printed bare. loadruns a source file as if typed.savewrites the user's vocabulary as a re-loadable.l4source file.save-image/load-image— binary image with full state preservation (dictionary, objects, stacks, continuations).reloadtruncates user state and re-runs every fileloaded this session, in order.read-file/write-file/append-file— read a whole file as one (byte-safe) string; write or append a string's bytes to a path.env/env!— read an environment variable as a string (the none value if unset) and set one (process-wide, sostart-processchildren inherit it).
Drive external programs over pipes (fork/execvp/pipe/waitpid; binary-safe, no shell):
argv start-process— launch from an argv array; returns a frame{ :pid :in :out :err }with the child's pid and its stdin/stdout/stderr asT_STREAMvalues.write/read/close— write a string to a stream, read a stream to EOF, close one (closing:insends EOF).running?/wait/stop— non-blocking liveness check, block-until-exit, signal-and-reap.lib.l4conveniences:run(split a command line and start it),read-out/read-err/write-in.commands width parallel-run— run a batch of argv arrays concurrently, at mostwidthat a time, collecting{ :out :err :status }per command in input order (refills a slot as each child finishes). Process-level parallelism — e.g. firing off manycurlrequests at once.
Embedded relational storage via the vendored SQLite amalgamation — built into the binary, no external dependency. A database is a T_DB handle.
db-open/db-close— open (creating if absent, or":memory:"for an in-memory DB) and push a handle; close frees the connection and is idempotent.db-exec—( db statement params -- n )— run an INSERT/UPDATE/DELETE/CREATE withparamsbound to its?placeholders; returns the affected-row count (0 for DDL).db-query—( db query params -- rel )— run a query; returns a fact-database relation{ :rows <bag of row frames> :index { } }, each row keyed by column-name symbols (INTEGER/REAL → float, TEXT → string, NULL →null, BLOB → raw bytes). Duplicates are kept, in result order; the result drops straight intoquery/inner-join.- Bound parameters —
paramsis an array bound positionally to the?placeholders ([ ]for none); floats, strings, symbols, andnullbind, so string values need no hand-escaping. create-index—( rel cols -- rel ),lib.l4— index a query result oncols, interning those columns to symbols so the fact-db index andquerycan use them.
TSV is the one tabular file format (convert other formats to TSV before loading).
read-tsv/write-tsv— read a file into an array of row-arrays (a numeric cell becomes a float, an empty cellnone, everything else a string) and write one back.rows>dataset—( table header? -- frame )a column-oriented frame, one array per column;rows>relation—( table index-cols header? -- relation )a deduped, indexed fact-database relation;dataset>matrix—( dataset cols -- m )an observations×columns numeric matrix from named columns.resample-indices—( n -- arr )n indices drawn from[0,n)with replacement, for bootstrap resampling.
The statistics library (lib/statistics.l4, loaded on demand) builds on the matrix and FFI layers:
- Descriptive —
std,se,median,percentile,iqr,ci(percentile confidence interval). - Resampling —
bootstrap/pbootstrap(parallel) over a fit quotation. - Linear algebra —
svdandfit-linear(least-squares) call LAPACK'sdgesvd/dgelsdthrough the FFI (the Accelerate-backed dylib on macOS). - Regression —
linear-regressionandlogistic-regression(IRLS with Firth correction), each returning per-coefficient estimate, standard error, bias, and confidence interval from a bootstrap.
Call C functions in any shared library at runtime via libffi — no per-library glue. An opaque C pointer is a T_PTR handle (a registry index, since a 64-bit pointer doesn't fit a Val).
ffi-open—( path -- lib )—dlopena library and push a handle;""opens the running process for already-linked symbols.ffi-function—( lib symbol arg-types ret-type -- ) <name>— resolve a symbol and define the following word<name>to call it. Types are symbols::void :int :long :double :ptr :string. Floats marshal to/from Cint/long/double, strings pass asconst char*(a returnedchar*is copied back into a string),:ptris an opaque handle. The call interface is prepared once; calls are ~30–100 ns.ffi-variadic—( lib symbol arg-types ret-type n-fixed -- ) <name>— the same for a variadic C function (ffi_prep_cif_var);n-fixedleading args are fixed, the rest variadic, with the variadic types fixed per binding. Enough to driveprintf,curl_easy_setopt, etc.ffi-free—( ptr -- )—freea C buffer held as aT_PTR.matrix>pointer/segment>pointer— intern a matrix's or segment's element buffer as aT_PTR(no copy, aliasing the live buffer) to pass dense numeric data to a:ptrparameter.- Examples:
"/usr/lib/libcurl.4.dylib" ffi-openplus a few declarations drives a real libcurl HTTPS request in-process, no subprocess; the statistics library drives LAPACK'sdgesvd/dgelsdthe same way (matrices in viamatrix>pointer). FFI is unsafe — a wrong signature corrupts or crashes; arg count is checked, types are the caller's responsibility.
A four-primitive substrate the rest of the control story is built on. See docs/continuations.md for the full treatment.
reset— installs a delimiter (a uniquely-tagged mark on the return stack).shift— captures the slice up to the nearest reset, removes the mark and captured frames, pushes the continuation as aT_CONTVal. Used for coroutines and generators.shift-with— same capture, but runs a handler xt in the outer context after the unwind. Used for exceptions and restarts.resume— re-enters a captured continuation. Multi-shot.
Coroutines on the continuation primitives, in lib.l4:
yield— emit a value to the driver and suspend until resumed.start-generator— run a producer to its firstyield, leaving the yielded value and a resumable continuation.gen-take— collect the first N values a producer yields into an array;gen-each— run a consumer on each yielded value until the producer falls off.
A third stack for stashing arbitrary Vals without disturbing the data or return stack: >side, side>, side-drop, side-depth.
Built in lib.l4 on top of the continuation primitives:
throw— non-local exit with a value.catch— wraps an xt; returns(result 0)on success,(exc 1)on throw.try-catch— wraps an xt with a recovery handler. Arity-agnostic.
The shift-with handler can also resume the captured continuation, giving the Common Lisp restart pattern — exceptions can recover rather than just abort.
Unification and committed choice, on the trail and the continuation machinery:
- Logic variables —
lvarmakes a fresh one; a capitalized identifier is a logic-var literal: a persistent global at the REPL, or a fresh per-call variable when declared in| X |inside a definition or quotation. unify(~) — unifies two terms, binding logic vars through a trail so they match: atoms by value, arrays element-wise, frames as open records (shared keys must unify, extras allowed); on a mismatch it fails.deref($) follows a variable's binding chain.amb/fail— committed choice: run the first branch; if it fails (aunifymismatch or an explicitfail), roll its bindings back through the trail and run the second, committing to whichever succeeds.choosegeneralizes it to a cons list, running a continuation with each element until one succeeds._— the anonymous wildcard: unifies with anything, binds nothing, and allocates nothing.matches?— a non-destructiveunifytest: marks the trail, unifies, rolls back, and pushes whether the two unified — so it composes in straight-line code.- Cons lists —
[( a b c )]builds cons pairs and[( H T )]is the[H|T]head/tail pattern underunify; withcons,head-tail, andarray↔consconversions. - Fact database —
relation/assert/query/retract/count-matches/inner-join. A relation is a frame of a row-set plus per-column indexes (declared symbol columns); rows are column-keyed frames that dedup;querymatches a pattern by unification, narrowing through the index (and returning the bucket directly when the index covers the whole pattern).inner-joinmerges two relations on a shared column via index probing;bulk-loadbuilds a whole relation in one sorted pass (array>setfor the rows,group-byper index) instead of row-by-row. The same row-frame shape is what a SQLite query would return.
dup,drop,swap,over,rot,depth,roll,clear— stack-manipulation primitives.copy/reify— deep copy of a value (strings, arrays, sets, frames, matrices);reifyadditionally renames unbound logic vars to canonical:_0/:_1/… for a ground, storable, comparable snapshot.now— current Unix time as a float (seconds since epoch).see— prints a word's source definition;see-compileddisassembles its threaded body.man—( xt -- fr ), returns a frame of a word's reference entry (stack effect, one-line summary, cost notes).help nameprints it for the named word.words,forget,bye,gc,clear,.s,.a— interpreter utilities.
See PLAN.md.
src/c/logicforth.h — types, global program structs (Vocabulary/Arena/Compiler), per-run Interpreter, prototypes
src/c/core.c — engine: interpreter, dictionary, GC, printing, image, REPL
src/c/words.c — arithmetic, stack, I/O, control flow, defining words, continuations
src/c/collections.c — sets, arrays, and frames
src/c/matrix.c — matrix words and numeric kernels
src/c/functional.c — higher-order operations (map, mapn, …)
src/c/superwords.c — compile-time instruction fusion (superwords)
src/c/strings.c — string and PCRE2 regex operations
src/c/logic.c — logic variables, unification, amb, fact database
src/c/database.c — SQLite integration
src/c/foreign.c — FFI (libffi), pointer registry, matrix/segment bridges
src/c/help_table.c — generated help/man text (from docs/reference.md)
src/forth/lib.l4 — standard library (embedded, auto-loaded at startup)
lib/ — loadable libraries: statistics.l4, files.l4, claude.l4
external/ — vendored deps: pcre2, sqlite, isocline, lapacke
tests/ — golden-output test files
bench/ — benchmark suite (logicforth vs CPython) and inventory
docs/ — design documents and the word reference
examples/ — sample programs
PLAN.md — future work
See LICENSE.