A small disk-backed database engine in C++17. The project is being built bottom-up: page-level storage first, then a tuple/catalog layer, then the SQL front end (parser → analyzer), then a planner that lowers the bound AST into a tree of Volcano-style operators, and finally an executor that drives that tree to produce rows.
SQL string
→ Parser → SelectQuery (string-based AST)
→ Analyzer → BoundSelect (numeric indices + types)
→ Planner → PlanNode tree (SeqScan / NestedLoopJoin / Filter)
→ Executor → ExecResult (rows of Values)
The pieces below are wired end-to-end and exercised by main.cpp.
A layered stack over a single OS file of fixed-size 4 KiB pages. Each
layer adds meaning over the bytes below; only DiskManager ever
touches the file.
┌───────────────────────────────────────────────────┐
│ Catalog name → (schema, root_page) │
│ TupleCodec Values ↔ bytes (schema-aware) │
├───────────────────────────────────────────────────┤
│ HeapFile chain of slotted pages │
│ SlottedPage one page's header + slots + rows │
├───────────────────────────────────────────────────┤
│ BufferPool RAM cache: LRU, pin/dirty bits, │
│ PageGuard (RAII) │
├───────────────────────────────────────────────────┤
│ DiskManager page N ↔ byte offset N × 4096 │
└────────────────────────┬──────────────────────────┘
▼
┌─────────────────────┐
│ one OS file │
│ all durable state │
└─────────────────────┘
The four core storage layers, bottom-up:
- DiskManager: read/write/allocate raw pages by PageId. Page N lives at byte offset N * PAGE_SIZE. Files only grow.
- BufferPool: fixed-size frame array caching pages from the DiskManager. LRU eviction over unpinned frames; PageGuard (RAII) handles pin/unpin and dirty marking automatically.
- SlottedPage: byte-level page format. Header + slot directory growing down, packed tuple bytes growing up. Slot IDs are stable across compaction; remove leaves a tombstone that a later insert may reuse.
- HeapFile: an unordered collection of tuples spread across a chain of slotted pages linked by each page's next_page_id. Supports insert, remove, point lookup by RID, and forward iteration.
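
The pin/unpin behaviour of PageGuard can be pictured with a minimal RAII sketch. The Frame struct, its field names, and the explicit markDirty() call are assumptions made for illustration; the README only states that the guard handles pinning and dirty marking, not how.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical buffer-pool frame; field names are assumptions.
struct Frame {
    std::uint32_t page_id = 0;
    int pin_count = 0;   // frames with pin_count > 0 must not be evicted
    bool dirty = false;  // dirty frames need a write-back before reuse
};

// Pins the frame on construction and unpins it on destruction.
class PageGuard {
public:
    explicit PageGuard(Frame& f) : frame_(&f) { ++frame_->pin_count; }

    // Copying would double-unpin, so only moves are allowed.
    PageGuard(const PageGuard&) = delete;
    PageGuard& operator=(const PageGuard&) = delete;
    PageGuard(PageGuard&& other) noexcept
        : frame_(std::exchange(other.frame_, nullptr)) {}
    PageGuard& operator=(PageGuard&&) = delete;

    ~PageGuard() {
        if (frame_ != nullptr) {  // a moved-from guard owns nothing
            assert(frame_->pin_count > 0);
            --frame_->pin_count;
        }
    }

    // Assumption: writers flag the page explicitly.
    void markDirty() {
        assert(frame_ != nullptr);
        frame_->dirty = true;
    }

private:
    Frame* frame_;
};
```

Because the unpin happens in the destructor, an early return or an exception cannot leave a frame pinned, which is what keeps LRU eviction over unpinned frames safe.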
After seeding, the demo's database file looks like this on disk:
/tmp/dbms_demo.db (16 KiB total = 4 pages)
┌──────────┬──────────┬──────────┬──────────┐
│ page 0 │ page 1 │ page 2 │ page 3 │
│ __tables │ __columns│ users │ posts │
└──────────┴──────────┴──────────┴──────────┘
0–4095 4096–8191 8192–12287 12288–16383 (byte offsets)
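
The byte offsets in the layout above follow directly from the page size. A small sketch of that arithmetic, where kPageSize stands in for the project's PAGE_SIZE and the function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

using PageId = std::uint32_t;               // assumed width of a page id
constexpr std::uint64_t kPageSize = 4096;   // 4 KiB pages, as stated in this README

// Byte offset at which page `id` starts in the database file.
constexpr std::uint64_t pageOffset(PageId id) {
    return static_cast<std::uint64_t>(id) * kPageSize;
}

// Number of pages in a file of `file_bytes` bytes.
// Assumes the file length is always a whole multiple of the page size.
inline std::uint64_t pageCount(std::uint64_t file_bytes) {
    assert(file_bytes % kPageSize == 0);
    return file_bytes / kPageSize;
}
```

With these definitions, pageOffset(3) is 12288 (the start of the posts page) and pageCount(16384) is 4, matching the demo file.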
Each page is a 4 KiB SlottedPage:
┌─────────┬─────────────────┬──────────┬───────────────┐
│ header │ slot[0] slot[1] │ ← free │ … tuples … │
│ ~16 B │ (grows down) │ │ (grow up) │
└─────────┴─────────────────┴──────────┴───────────────┘
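
To make the slot/tombstone behaviour concrete, here is a simplified in-memory sketch. It is not the project's SlottedPage: the 16-byte header size, the 4-byte slot entries, the use of offset 0 as the tombstone marker, and all names are assumptions, and compaction is left out. The slot directory is kept in a std::vector for brevity, but free space is accounted for as if it were stored in the page.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <string>
#include <vector>

class SlottedPageSketch {
public:
    static constexpr std::size_t kPageSize = 4096;
    static constexpr std::size_t kHeaderSize = 16;  // "~16 B"; exact value assumed
    static constexpr std::size_t kSlotSize = 4;     // offset + length, 2 bytes each

    // Stores `bytes` and returns its slot id, or nullopt if the page is full.
    std::optional<std::uint16_t> insert(const std::string& bytes) {
        // Prefer reusing a tombstoned slot so slot ids stay dense.
        std::size_t slot_id = slots_.size();
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            if (slots_[i].offset == 0) { slot_id = i; break; }
        }
        const bool new_slot = (slot_id == slots_.size());
        const std::size_t dir_end =
            kHeaderSize + (slots_.size() + (new_slot ? 1 : 0)) * kSlotSize;
        if (dir_end > tuple_start_ || bytes.size() > tuple_start_ - dir_end) {
            return std::nullopt;  // directory and tuple area would collide
        }
        tuple_start_ -= bytes.size();  // tuple area grows toward the directory
        std::memcpy(data_.data() + tuple_start_, bytes.data(), bytes.size());
        const Slot s{static_cast<std::uint16_t>(tuple_start_),
                     static_cast<std::uint16_t>(bytes.size())};
        if (new_slot) slots_.push_back(s); else slots_[slot_id] = s;
        return static_cast<std::uint16_t>(slot_id);
    }

    // Leaves a tombstone; the tuple bytes are not reclaimed in this sketch.
    void remove(std::uint16_t slot) {
        assert(slot < slots_.size());
        slots_[slot] = Slot{0, 0};
    }

    // Returns the tuple bytes, or nullopt for an unknown or removed slot.
    std::optional<std::string> get(std::uint16_t slot) const {
        if (slot >= slots_.size() || slots_[slot].offset == 0) return std::nullopt;
        return std::string(data_.data() + slots_[slot].offset, slots_[slot].length);
    }

private:
    struct Slot { std::uint16_t offset; std::uint16_t length; };
    std::array<char, kPageSize> data_{};
    std::vector<Slot> slots_;
    std::size_t tuple_start_ = kPageSize;  // lowest offset used by tuple bytes
};
```

Offset 0 always falls inside the header, so it can never be a valid tuple position, which is why it works as a tombstone marker here.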
__tables and __columns are the catalog's two system tables (rows
of (table_id, name, root_page) and (table_id, position, name, type, nullable)). They live at hard-coded page IDs so cold-open requires no
manifest — bootstrap is "open the heap files at pages 0 and 1, walk
them, rebuild every user table's TableInfo in memory."
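
A row of __tables has to become bytes before it can sit in a slotted page. The sketch below encodes a (table_id, name, root_page) row with fixed-width Int32 fields and a length-prefixed Text field. The little-endian byte order, the 4-byte length prefix, and every name here are assumptions; the project's TupleCodec may lay bytes out differently.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory form of one __tables row.
struct TableRow {
    std::int32_t table_id;
    std::string name;
    std::int32_t root_page;
};

// Appends a 32-bit value in little-endian order (assumed byte order).
inline void putU32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Reads a little-endian 32-bit value and advances `pos`.
inline std::uint32_t getU32(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    assert(pos + 4 <= in.size());
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) v |= static_cast<std::uint32_t>(in[pos + i]) << (8 * i);
    pos += 4;
    return v;
}

inline std::vector<std::uint8_t> encodeRow(const TableRow& r) {
    std::vector<std::uint8_t> out;
    putU32(out, static_cast<std::uint32_t>(r.table_id));      // Int32: fixed width
    putU32(out, static_cast<std::uint32_t>(r.name.size()));   // Text: length prefix
    out.insert(out.end(), r.name.begin(), r.name.end());      // Text: payload
    putU32(out, static_cast<std::uint32_t>(r.root_page));     // Int32: fixed width
    return out;
}

inline TableRow decodeRow(const std::vector<std::uint8_t>& in) {
    std::size_t pos = 0;
    TableRow r;
    r.table_id = static_cast<std::int32_t>(getU32(in, pos));
    const std::uint32_t len = getU32(in, pos);
    assert(pos + len <= in.size());
    r.name.assign(in.begin() + pos, in.begin() + pos + len);
    pos += len;
    r.root_page = static_cast<std::int32_t>(getU32(in, pos));
    return r;
}
```

Decoding only works because both sides agree on the field order and types, which is the role the schema plays in a schema-aware codec.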
- tuple.{h,cpp}: Type, Value, Schema, and TupleCodec for encoding/decoding a row to/from bytes. Int32/Int64/Bool are fixed width; Text is length-prefixed.
- catalog.{h,cpp}: persistent table catalog stored as two bootstrap heap files at hard-coded pages (__tables at page 0, __columns at page 1). On startup the catalog is reconstructed into an in-memory cache; createTable updates both the disk system tables and the cache.
- analyzer.{h,cpp}: walks a parsed SelectQuery against a Catalog, resolves every name to numeric indices, and type-checks every operator. Produces a BoundSelect (the parallel of the parser AST, but with strings replaced by (table_index, column_index) pairs and every node carrying a result_type). Throws on name-resolution failure or type mismatch.
- plan_node.{h,cpp}: the PlanNode interface (Volcano-style: open() / next() / close() / describe()) and the wide-row ExecRow = vector<vector<Value>> shape every operator emits, plus the free functions evalExpr / evalBinaryOp for use by Filter and projection.
- operators.{h,cpp}: three concrete PlanNodes:
  - SeqScan: walks one heap file, decodes each tuple, populates its assigned slot in the wide row.
  - NestedLoopJoin: materializes the right child on open(), nested-loops over (left × right), overlays the two children's disjoint slots, emits combined rows where the ON predicate holds.
  - Filter: pulls from its child until the predicate is true.
- planner.{h,cpp}: Planner::plan(BoundSelect&) is a 1:1 lowering to a left-deep operator tree (outer SeqScan, then a stack of NestedLoopJoins, then an optional Filter on top). There are no alternatives yet; once there are (hash join, index scan, predicate pushdown), this is the seam where an optimizer plugs in.
- executor.{h,cpp}: thin coordinator. Builds the plan, drives root->open() / next() / close(), applies the SELECT list to each wide row to produce a flat-row ExecResult { column_names, column_types, rows }. execute(BoundSelect bs) consumes its argument because the planner moves the WHERE expression out.
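
The open() / next() / close() protocol is easier to see in a stripped-down form. In the sketch below, Row is a plain vector<int> rather than the wide-row ExecRow, VectorScan stands in for SeqScan over a heap file, and the next() signature returning std::optional is an assumption; none of this is taken from the project's headers.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

using Row = std::vector<int>;  // simplification of the README's ExecRow

// Volcano-style operator: each next() call yields one row or end-of-stream.
struct PlanNode {
    virtual ~PlanNode() = default;
    virtual void open() = 0;
    virtual std::optional<Row> next() = 0;  // nullopt signals end of stream
    virtual void close() = 0;
};

// Stand-in for SeqScan: iterates an in-memory table instead of a heap file.
class VectorScan : public PlanNode {
public:
    explicit VectorScan(std::vector<Row> rows) : rows_(std::move(rows)) {}
    void open() override { pos_ = 0; }
    std::optional<Row> next() override {
        if (pos_ >= rows_.size()) return std::nullopt;
        return rows_[pos_++];
    }
    void close() override {}
private:
    std::vector<Row> rows_;
    std::size_t pos_ = 0;
};

// Pulls from its child until the predicate holds or the child is exhausted.
class Filter : public PlanNode {
public:
    Filter(std::unique_ptr<PlanNode> child, std::function<bool(const Row&)> pred)
        : child_(std::move(child)), pred_(std::move(pred)) {
        assert(child_ != nullptr);
    }
    void open() override { child_->open(); }
    std::optional<Row> next() override {
        while (auto row = child_->next()) {
            if (pred_(*row)) return row;
        }
        return std::nullopt;
    }
    void close() override { child_->close(); }
private:
    std::unique_ptr<PlanNode> child_;
    std::function<bool(const Row&)> pred_;
};

// What an executor loop does with the root of the tree.
inline std::vector<Row> drain(PlanNode& root) {
    std::vector<Row> out;
    root.open();
    while (auto row = root.next()) out.push_back(std::move(*row));
    root.close();
    return out;
}
```

Because every operator exposes the same three calls, the driver loop never needs to know whether the root is a scan, a filter, or a join.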
A query like SELECT name FROM users WHERE age > 18 flows through four
stages, each producing a different intermediate representation:
SQL string
"SELECT name FROM users WHERE age > 18"
│
▼ Parser (tokenize + recursive-descent)
│
SelectQuery (string-based AST)
table = "users"
columns = ["name"]
where = { column: "age", op: Gt, value: "18" }
│
▼ Analyzer (resolve names against Catalog, type-check)
│
BoundSelect (numeric indices + types)
from_tables = [ users_info ]
select_list = [ ColumnRef{0, 1, Text} ] // "name"
where = BinaryOp{
Gt,
ColumnRef{0, 2, Int32}, // "age"
Literal{18, Int32}
}
│
▼ Planner (1:1 lowering to operator tree)
│
PlanNode tree
Filter(age > 18)
└── SeqScan(users)
│
▼ Executor (open / next / close, project)
│
▼
ExecResult
column_names = ["name"]
column_types = [Text]
rows = [ ["alice"], ["carol"], ["eve"] ]
Every layer's job in one phrase: Parser turns characters into a
tree of strings; Analyzer turns strings into integers (against the
Catalog); Planner turns the tree into operators; Executor
drives the operators and projects the SELECT list.
Hand-written recursive-descent parser for a single SELECT statement.
Supported grammar:
SELECT (* | <col-list>)
FROM <table>
{JOIN <table> ON <col> = <col>}
[WHERE <col> <op> <literal>]
Columns may be qualified (users.id). Comparison ops: = != < > <= >=.
Joins are inner joins on a single equality between two columns.
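
The first step of a recursive-descent parser for this grammar is splitting the SQL string into tokens. The following tokenizer is a hedged sketch, not the project's lexer: the function name is hypothetical, qualified names such as users.id are kept as a single token, and string literals and negative numbers are deliberately left out.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>
#include <vector>

// Splits a SQL string into keyword/identifier, number, operator, and
// punctuation tokens. Throws std::runtime_error on characters outside
// the small grammar shown above.
inline std::vector<std::string> tokenize(const std::string& sql) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < sql.size()) {
        const unsigned char c = static_cast<unsigned char>(sql[i]);
        if (std::isspace(c)) { ++i; continue; }
        const std::size_t start = i;
        if (std::isalnum(c) || c == '_') {
            // Keyword, identifier, number, or qualified name (users.id).
            while (i < sql.size() &&
                   (std::isalnum(static_cast<unsigned char>(sql[i])) ||
                    sql[i] == '_' || sql[i] == '.')) {
                ++i;
            }
        } else if (c == '<' || c == '>' || c == '!') {
            ++i;
            if (i < sql.size() && sql[i] == '=') ++i;  // <=, >=, !=
            if (c == '!' && i - start != 2) {
                throw std::runtime_error("expected '=' after '!'");
            }
        } else if (c == '=' || c == '*' || c == ',') {
            ++i;
        } else {
            throw std::runtime_error(std::string("unexpected character: ") + sql[i]);
        }
        assert(i > start);  // every branch consumes at least one character
        tokens.push_back(sql.substr(start, i - start));
    }
    return tokens;
}
```

A parser built on top of this would then consume the token list left to right, with one function per grammar rule (select list, FROM, JOIN, WHERE).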
- SELECT * and explicit column lists, qualified or bare.
- One or more JOIN ... ON <col> = <col> clauses; ON columns may reference any table in the FROM/JOIN list and must share a type.
- A WHERE <col> <op> <literal> clause; the literal is parsed under the column's expected type, with range checking for Int32.
- Bare-column ambiguity, unknown table/column, and type-mismatch errors are all reported as std::runtime_error.
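
The name-resolution rules in the list above can be sketched as a single function that maps a bare or qualified column reference to a (table_index, column_index) pair. TableSketch and resolveColumn are hypothetical names, and the real analyzer works against the Catalog and also tracks types, which this sketch omits.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified view of one table in the FROM/JOIN list.
struct TableSketch {
    std::string name;
    std::vector<std::string> columns;
};

// Resolves "col" or "table.col" to (table_index, column_index).
// Throws std::runtime_error for unknown tables, unknown columns, and
// bare names that match a column in more than one table.
inline std::pair<std::size_t, std::size_t>
resolveColumn(const std::vector<TableSketch>& from, const std::string& ref) {
    assert(!ref.empty());
    const auto dot = ref.find('.');
    const std::string table =
        dot == std::string::npos ? std::string() : ref.substr(0, dot);
    const std::string column =
        dot == std::string::npos ? ref : ref.substr(dot + 1);

    std::optional<std::pair<std::size_t, std::size_t>> found;
    bool table_seen = table.empty();  // bare names search every table
    for (std::size_t ti = 0; ti < from.size(); ++ti) {
        if (!table.empty() && from[ti].name != table) continue;
        table_seen = true;
        for (std::size_t ci = 0; ci < from[ti].columns.size(); ++ci) {
            if (from[ti].columns[ci] != column) continue;
            if (found) throw std::runtime_error("ambiguous column: " + ref);
            found.emplace(ti, ci);
        }
    }
    if (!table_seen) throw std::runtime_error("unknown table: " + table);
    if (!found) throw std::runtime_error("unknown column: " + ref);
    return *found;
}
```

With users(id, name, age) and a posts table that also has an id column (the posts columns are made up for this example), resolving name succeeds, posts.id succeeds, and a bare id throws because it is ambiguous.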
Aliases are not yet supported, so the same physical table cannot appear twice in one query.

Not yet implemented:
- UPDATE/DELETE at the SQL surface (CREATE TABLE and INSERT are wired through Parser → Analyzer → Executor; see grammar.md).
- ORDER BY, LIMIT, aggregates, expressions in the SELECT list, and table aliases.
- Alternative operators (HashJoin, IndexScan, ...) and a real optimizer that picks between them. The plan tree is now a data structure, so these slot in as new PlanNode subclasses plus rewrite passes, but neither exists yet. The current planner does a fixed 1:1 lowering.
- EXPLAIN at the SQL surface. Operators already have a describe() method, so it's a thin addition once we want to wire it up.
- Indexes, transactions, recovery, concurrency control.

- g++ with C++17 support (or clang++; adjust CXX in the Makefile)
- make

Build the dbms binary:
make dbms
Run the demo (main.cpp):
./dbms
main.cpp is one end-to-end flow that exercises every layer:
- open a fresh database file and bootstrap the catalog (system-table pages 0/1);
- issue CREATE TABLE + multi-row INSERT for users and posts through Parser → Analyzer → Executor, then flush;
- cold-reopen the file with a brand-new BufferPool and Catalog;
- run a handful of SELECT strings (including filters and a join) through Parser → Analyzer → Planner → Executor and print each result as a padded table.
Or do both build + run in one step:
make run
Tests use doctest (vendored at tests/vendor/doctest.h, no install needed):
make test
This builds and runs build/run_tests, which covers the parser, every
storage layer (disk manager, buffer pool, slotted page, heap file, plus
an end-to-end integration test), the tuple codec, the catalog, the
analyzer, the executor (end-to-end SQL → ExecResult cases including a
cold-reopen), and the operators directly (SeqScan / Filter /
NestedLoopJoin state machines, empty-input edge cases, and the
describe() EXPLAIN output). Pass doctest flags by invoking the binary
directly, e.g.:
build/run_tests --help
build/run_tests --test-case="SELECT *"
make clean
Removes build/ and the dbms binary.