Add model config system (MULTI-1382) by RobbieMcKinstry · Pull Request #3 · wack/wubbie

RobbieMcKinstry · 2026-06-29T14:44:05Z

Build out config.rs into the single source of truth the model, loss, and
training loop instantiate against:

- ModelConfig with model dims (vocab_size, context_length, d_model, d_ff,
  num_layers, num_heads) plus the norm_first layer-norm-placement toggle
  (post-norm = paper-faithful, pre-norm = modern/stable; default deferred to
  Phase 3, ships provisional pre-norm).
- Named sizes: the headline ~100M GPT-2-small-class config (12L/768/12H, 4x
  FFN, 1024 context) and a debug-tiny size, enumerated by ModelSize.
- TrainingConfig hyperparameters and a RunConfig bundle so a run is fully
  described by, and reproducible from, a single serialized config.
- vocab_size wired to a single tokenizer::VOCAB_SIZE placeholder, to be
  finalized when the byte-level BPE tokenizer locks (MULTI-1379).
- approx_parameter_count / head_dim helpers and serde round-trip tests.
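
To make the two helpers concrete, here is a minimal std-only sketch. The field names come from the description above; the struct layout, the `Option` return of `head_dim`, the meaning of `norm_first = true`, and the counting formula (weight matrices only, tied output head, no biases or layer-norm parameters) are assumptions for illustration, not necessarily what `config.rs` does. The tiny dimensions in `main` are made up.

```rust
/// Hypothetical stand-in for the PR's `ModelConfig`.
#[derive(Debug, Clone, PartialEq)]
pub struct ModelConfig {
    pub vocab_size: usize,
    pub context_length: usize,
    pub d_model: usize,
    pub d_ff: usize,
    pub num_layers: usize,
    pub num_heads: usize,
    /// Assumed convention: true selects pre-norm, false selects post-norm.
    pub norm_first: bool,
}

impl ModelConfig {
    /// Per-head width; `None` when `d_model` does not divide evenly by `num_heads`.
    pub fn head_dim(&self) -> Option<usize> {
        if self.num_heads == 0 || self.d_model % self.num_heads != 0 {
            return None;
        }
        Some(self.d_model / self.num_heads)
    }

    /// Rough count of weight-matrix parameters only (assumed formula):
    /// token + position embeddings, then per layer the four attention
    /// projections and the two FFN projections.
    pub fn approx_parameter_count(&self) -> usize {
        let embeddings = (self.vocab_size + self.context_length) * self.d_model;
        let attention = 4 * self.d_model * self.d_model; // Q, K, V, output
        let ffn = 2 * self.d_model * self.d_ff; // up and down projections
        embeddings + self.num_layers * (attention + ffn)
    }
}

fn main() {
    // Made-up debug-sized dimensions, not the PR's debug-tiny preset.
    let tiny = ModelConfig {
        vocab_size: 256,
        context_length: 64,
        d_model: 32,
        d_ff: 128,
        num_layers: 2,
        num_heads: 4,
        norm_first: true,
    };
    println!("head_dim = {:?}", tiny.head_dim());
    println!("approx params = {}", tiny.approx_parameter_count());
}
```

Returning `Option` from `head_dim` is one way to surface a `d_model` / `num_heads` mismatch early; a validating constructor would work as well.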

The on-disk config-file layout and the CLI that loads it follow the house
convention and are deferred pending sign-off (see MULTI-1382).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012VozmaFNnAz5SgEuzdY6dR


Reorganize the CLI to mirror the established multitool/aviary structure (file layout + dispatch pattern), per owner direction:

- src/bin/main.rs: thin entrypoint that calls `Cli::parse()` then `dispatch_command`, which inits logging from the global `--log-level` flag and delegates; an empty invocation prints long help.
- src/config/: the clap layer, consisting of `cli.rs` (the `Cli` struct, derive-getters + global flags), `command.rs` (the `Command` subcommand enum + `dispatch`), and one `<Sub>Subcommand` args struct per subcommand (train/generate/serve). The model/run config (`ModelConfig`, `ModelSize`, `TrainingConfig`, `RunConfig`) moves here too, split into `model.rs`/`run.rs`.
- src/cmd/: one handler per subcommand, each `new(args)` + `dispatch()`, delegated to from `Command::dispatch`. Handlers are wired but still return "not implemented yet"; later tickets fill them in and inherit this layout.

`ModelSize` gains `clap::ValueEnum` so `--size` selects a named architecture. Crate root re-exports `Cli`. README and CLAUDE.md updated for the new tree. Adds derive-getters (locked via Cargo.lock) to match the house getter style.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012VozmaFNnAz5SgEuzdY6dR
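
The `new(args)` + `dispatch()` handler shape described in this commit might look roughly like the following std-only sketch. It leaves out clap entirely; the args-struct fields, the `Result<(), String>` error type, and the handler names `Train`/`Generate`/`Serve` are assumptions for illustration.

```rust
/// Hypothetical args structs; in the PR these are clap-derived.
#[derive(Debug, Default)]
pub struct TrainSubcommand {
    pub size: String,
}
#[derive(Debug, Default)]
pub struct GenerateSubcommand {
    pub prompt: String,
}
#[derive(Debug, Default)]
pub struct ServeSubcommand;

/// One variant per subcommand, each carrying its own args struct.
pub enum Command {
    Train(TrainSubcommand),
    Generate(GenerateSubcommand),
    Serve(ServeSubcommand),
}

/// Handlers: constructed from args, then dispatched.
pub struct Train {
    args: TrainSubcommand,
}
impl Train {
    pub fn new(args: TrainSubcommand) -> Self {
        Self { args }
    }
    pub fn dispatch(self) -> Result<(), String> {
        Err(format!("train ({:?}): not implemented yet", self.args))
    }
}

pub struct Generate {
    args: GenerateSubcommand,
}
impl Generate {
    pub fn new(args: GenerateSubcommand) -> Self {
        Self { args }
    }
    pub fn dispatch(self) -> Result<(), String> {
        Err(format!("generate ({:?}): not implemented yet", self.args))
    }
}

pub struct Serve {
    args: ServeSubcommand,
}
impl Serve {
    pub fn new(args: ServeSubcommand) -> Self {
        Self { args }
    }
    pub fn dispatch(self) -> Result<(), String> {
        Err(format!("serve ({:?}): not implemented yet", self.args))
    }
}

impl Command {
    /// Delegate to the matching handler; the enum itself holds no logic.
    pub fn dispatch(self) -> Result<(), String> {
        match self {
            Command::Train(args) => Train::new(args).dispatch(),
            Command::Generate(args) => Generate::new(args).dispatch(),
            Command::Serve(args) => Serve::new(args).dispatch(),
        }
    }
}

fn main() {
    let cmd = Command::Train(TrainSubcommand { size: "debug-tiny".to_string() });
    if let Err(msg) = cmd.dispatch() {
        eprintln!("{msg}");
    }
}
```

Keeping the `match` as the only place that knows about every subcommand means a later ticket can fill in one handler without touching the others.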

…(MULTI-1382)

The prompt is the natural required input to generation, so model it as a required positional rather than an optional flag. A value of `-` is the conventional sentinel for "read the prompt from stdin"; `read_prompt()` resolves it, and any other value is taken as the literal prompt. Covered by parse/resolution tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012VozmaFNnAz5SgEuzdY6dR
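
A minimal sketch of the `-` sentinel resolution, assuming a free function that takes the reader as a parameter so it can be tested without touching real stdin. The signature is hypothetical; the PR's `read_prompt()` may be a method on the args struct and may trim or validate its input.

```rust
use std::io::{self, Read};

/// Resolve the positional prompt: `-` means "read everything from `input`",
/// anything else is returned unchanged.
pub fn read_prompt<R: Read>(arg: &str, mut input: R) -> io::Result<String> {
    if arg == "-" {
        let mut buf = String::new();
        input.read_to_string(&mut buf)?;
        Ok(buf)
    } else {
        Ok(arg.to_string())
    }
}

fn main() -> io::Result<()> {
    // An in-memory reader stands in for stdin here; a real binary would
    // pass `io::stdin().lock()` instead.
    let piped = "once upon a time".as_bytes();
    let prompt = read_prompt("-", piped)?;
    println!("prompt: {prompt}");
    Ok(())
}
```

Passing the reader in, rather than calling `io::stdin()` inside the function, is what makes the parse/resolution behaviour easy to cover with unit tests.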

Bring the house config-loading abstraction into wubbie: a `LayeredConfig` builder over `figment` that merges configuration sources by increasing precedence and extracts a fully-specified struct with no `Option`s.

- config/loader.rs: `LayeredConfig` (defaults → file → env → CLI overrides); `with_file` selects by extension (.toml/.json[c]), requires the explicit file to exist, but tolerates partial files; `extract` resolves into the non-optional target, so a field unset by every layer with no default is a hard error rather than a silent `None`.
- PartialModelConfig: the Option-heavy per-layer shape; only set fields serialize, so an unset value never clobbers a lower-precedence layer.
- ModelConfigArgs: flattened per-field CLI overrides; `load_model_config` wires `--size` (base) → `--config` file → `WUBBIE_MODEL_*` env → flags.
- `wubbie train` now resolves the layered ModelConfig up front (surfacing config errors before the not-yet-implemented training loop).

Tests use `figment::Jail` for hermetic file/env precedence checks, including the missing-field-with-no-default error and explicit-missing-file rejection.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012VozmaFNnAz5SgEuzdY6dR
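
The precedence rule in this commit (a set field in a higher layer wins, an unset field falls through, and a field unset everywhere is an error) can be illustrated without `figment` at all. The sketch below is a plain-Rust analogue, not the PR's `LayeredConfig`: the `merge`, `extract`, and `resolve` functions, the three-field struct, and the `String` error type are all assumptions.

```rust
/// Option-heavy per-layer shape: `None` means "this layer says nothing".
#[derive(Debug, Clone, Default, PartialEq)]
pub struct PartialModelConfig {
    pub d_model: Option<usize>,
    pub num_layers: Option<usize>,
    pub num_heads: Option<usize>,
}

/// Fully specified target with no `Option`s (hypothetical subset of fields).
#[derive(Debug, Clone, PartialEq)]
pub struct ResolvedModelConfig {
    pub d_model: usize,
    pub num_layers: usize,
    pub num_heads: usize,
}

impl PartialModelConfig {
    /// Overlay `higher` on `self`: set fields in `higher` win, unset ones
    /// leave the lower-precedence value untouched.
    pub fn merge(self, higher: PartialModelConfig) -> PartialModelConfig {
        PartialModelConfig {
            d_model: higher.d_model.or(self.d_model),
            num_layers: higher.num_layers.or(self.num_layers),
            num_heads: higher.num_heads.or(self.num_heads),
        }
    }

    /// Fail loudly if any field was left unset by every layer.
    pub fn extract(self) -> Result<ResolvedModelConfig, String> {
        Ok(ResolvedModelConfig {
            d_model: self.d_model.ok_or("missing field: d_model")?,
            num_layers: self.num_layers.ok_or("missing field: num_layers")?,
            num_heads: self.num_heads.ok_or("missing field: num_heads")?,
        })
    }
}

/// Fold layers in increasing precedence (e.g. defaults, file, env, CLI).
pub fn resolve(layers: Vec<PartialModelConfig>) -> Result<ResolvedModelConfig, String> {
    layers
        .into_iter()
        .fold(PartialModelConfig::default(), PartialModelConfig::merge)
        .extract()
}

fn main() {
    let defaults = PartialModelConfig { d_model: Some(768), num_layers: Some(12), num_heads: Some(12) };
    let file = PartialModelConfig { num_layers: Some(6), ..Default::default() };
    let cli = PartialModelConfig { d_model: Some(512), ..Default::default() };
    println!("{:?}", resolve(vec![defaults, file, cli]));
}
```

In the PR itself the same effect comes from only serializing set fields before handing each layer to `figment`; the explicit `Option::or` here just makes the rule visible.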

claude added 4 commits June 29, 2026 14:26

RobbieMcKinstry merged commit 2f02fb9 into trunk Jun 29, 2026
3 checks passed

RobbieMcKinstry deleted the claude/multi-1382-model-config-nah6o3 branch June 29, 2026 15:08
