
Add model config system (MULTI-1382) #3

Merged
RobbieMcKinstry merged 4 commits into trunk from claude/multi-1382-model-config-nah6o3
Jun 29, 2026

@RobbieMcKinstry

Build out config.rs into the single source of truth the model, loss, and
training loop instantiate against:

  • ModelConfig with model dims (vocab_size, context_length, d_model, d_ff,
    num_layers, num_heads) plus the norm_first layer-norm-placement toggle
    (post-norm = paper-faithful, pre-norm = modern/stable; default deferred to
    Phase 3, ships provisional pre-norm).
  • Named sizes: the headline ~100M GPT-2-small-class config (12L/768/12H, 4x
    FFN, 1024 context) and a debug-tiny size, enumerated by ModelSize.
  • TrainingConfig hyperparameters and a RunConfig bundle so a run is fully
    described by, and reproducible from, a single serialized config.
  • vocab_size wired to a single tokenizer::VOCAB_SIZE placeholder, to be
    finalized when the byte-level BPE tokenizer locks (MULTI-1379).
  • approx_parameter_count / head_dim helpers and serde round-trip tests.

The on-disk config-file layout and the CLI that loads it follow the house
convention and are deferred pending sign-off (see MULTI-1382).
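
As a rough illustration of the helpers listed above, the following sketch shows one way a `ModelConfig` with `head_dim` and `approx_parameter_count` could look. Only the field and helper names come from this description. The return types, the `gpt2_small_class` constructor name, and the counting formula (weight matrices only, tied embeddings, no positional embeddings, biases, or layer-norm parameters) are assumptions made for the example, not the code merged here.

```rust
/// Hypothetical mirror of the `ModelConfig` described above. Field names are
/// taken from the PR text; everything else is an assumption for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct ModelConfig {
    pub vocab_size: usize,
    pub context_length: usize,
    pub d_model: usize,
    pub d_ff: usize,
    pub num_layers: usize,
    pub num_heads: usize,
    /// true = pre-norm, false = post-norm.
    pub norm_first: bool,
}

impl ModelConfig {
    /// The GPT-2-small-class dims quoted in the PR (12L/768/12H, 4x FFN,
    /// 1024 context). `vocab_size` is a parameter because the PR leaves it
    /// as a placeholder until the tokenizer is finalized.
    pub fn gpt2_small_class(vocab_size: usize) -> Self {
        let d_model = 768;
        Self {
            vocab_size,
            context_length: 1024,
            d_model,
            d_ff: 4 * d_model,
            num_layers: 12,
            num_heads: 12,
            norm_first: true, // provisional pre-norm, per the PR
        }
    }

    /// Per-head width, or `None` if `d_model` does not split evenly.
    pub fn head_dim(&self) -> Option<usize> {
        if self.num_heads == 0 || self.d_model % self.num_heads != 0 {
            None
        } else {
            Some(self.d_model / self.num_heads)
        }
    }

    /// Assumed approximation: token embedding (tied with the output head)
    /// plus, per layer, four attention projections and two FFN matrices.
    pub fn approx_parameter_count(&self) -> usize {
        let embedding = self.vocab_size * self.d_model;
        let attention = 4 * self.d_model * self.d_model;
        let ffn = 2 * self.d_model * self.d_ff;
        embedding + self.num_layers * (attention + ffn)
    }
}
```

Under these assumptions the twelve transformer layers contribute about 85M weights, and the embedding term depends on whatever `vocab_size` the tokenizer settles on.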

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012VozmaFNnAz5SgEuzdY6dR

claude added 4 commits June 29, 2026 14:26
Build out config.rs into the single source of truth the model, loss, and
training loop instantiate against:

- ModelConfig with model dims (vocab_size, context_length, d_model, d_ff,
  num_layers, num_heads) plus the norm_first layer-norm-placement toggle
  (post-norm = paper-faithful, pre-norm = modern/stable; default deferred to
  Phase 3, ships provisional pre-norm).
- Named sizes: the headline ~100M GPT-2-small-class config (12L/768/12H, 4x
  FFN, 1024 context) and a debug-tiny size, enumerated by ModelSize.
- TrainingConfig hyperparameters and a RunConfig bundle so a run is fully
  described by, and reproducible from, a single serialized config.
- vocab_size wired to a single tokenizer::VOCAB_SIZE placeholder, to be
  finalized when the byte-level BPE tokenizer locks (MULTI-1379).
- approx_parameter_count / head_dim helpers and serde round-trip tests.

The on-disk config-file layout and the CLI that loads it follow the house
convention and are deferred pending sign-off (see MULTI-1382).
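
To make the `norm_first` toggle concrete, here is a minimal sketch of the two residual orderings it selects between. The function names are hypothetical, the layer norm has no learnable scale or shift, and plain `f32` slices stand in for tensors, so this only illustrates the ordering and is not the model code.

```rust
/// Parameter-free layer norm over one vector (assumed epsilon of 1e-5).
fn layer_norm(x: &[f32]) -> Vec<f32> {
    let n = x.len() as f32;
    let mean = x.iter().sum::<f32>() / n;
    let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / n;
    let denom = (var + 1e-5).sqrt();
    x.iter().map(|v| (v - mean) / denom).collect()
}

/// One residual sub-block around `sublayer` (attention or FFN).
/// pre-norm:  x + sublayer(norm(x))
/// post-norm: norm(x + sublayer(x))
fn residual_block<F>(x: &[f32], sublayer: F, norm_first: bool) -> Vec<f32>
where
    F: Fn(&[f32]) -> Vec<f32>,
{
    if norm_first {
        let y = sublayer(&layer_norm(x));
        x.iter().zip(&y).map(|(a, b)| a + b).collect()
    } else {
        let y = sublayer(x);
        let sum: Vec<f32> = x.iter().zip(&y).map(|(a, b)| a + b).collect();
        layer_norm(&sum)
    }
}
```

In the pre-norm branch the input passes through the residual path untouched, which is the usual explanation for its training stability. In the post-norm branch every block output is renormalized.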

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012VozmaFNnAz5SgEuzdY6dR

Reorganize the CLI to mirror the established multitool/aviary structure
(file layout + dispatch pattern), per owner direction:

- src/bin/main.rs: thin entrypoint — `Cli::parse()` then `dispatch_command`,
  which inits logging from the global `--log-level` flag and delegates; an
  empty invocation prints long help.
- src/config/: the clap layer — `cli.rs` (the `Cli` struct, derive-getters +
  global flags), `command.rs` (the `Command` subcommand enum + `dispatch`),
  and one `<Sub>Subcommand` args struct per subcommand (train/generate/serve).
  The model/run config (`ModelConfig`, `ModelSize`, `TrainingConfig`,
  `RunConfig`) moves here too, split into `model.rs`/`run.rs`.
- src/cmd/: one handler per subcommand, each `new(args)` + `dispatch()`,
  delegated to from `Command::dispatch`. Handlers are wired but still return
  "not implemented yet"; later tickets fill them in and inherit this layout.

`ModelSize` gains `clap::ValueEnum` so `--size` selects a named architecture.
Crate root re-exports `Cli`. README and CLAUDE.md updated for the new tree.

Adds derive-getters (locked via Cargo.lock) to match the house getter style.
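
The dispatch pattern described above can be sketched without clap as follows. The enum and struct names echo the commit message, but the fields, the `Result<(), String>` return type, and the message text are assumptions for illustration. The real args structs are clap-derived and the `serve` subcommand is omitted here for brevity.

```rust
/// Hypothetical stand-ins for the clap-derived per-subcommand args structs.
pub struct TrainSubcommand {
    pub size: String,
}
pub struct GenerateSubcommand {
    pub prompt: String,
}

/// The subcommand enum: one variant per subcommand, carrying its args.
pub enum Command {
    Train(TrainSubcommand),
    Generate(GenerateSubcommand),
}

/// One handler per subcommand, each built with `new(args)` and run with
/// `dispatch()`.
pub struct Train {
    args: TrainSubcommand,
}
impl Train {
    pub fn new(args: TrainSubcommand) -> Self {
        Self { args }
    }
    pub fn dispatch(self) -> Result<(), String> {
        Err(format!("train (size={}): not implemented yet", self.args.size))
    }
}

pub struct Generate {
    args: GenerateSubcommand,
}
impl Generate {
    pub fn new(args: GenerateSubcommand) -> Self {
        Self { args }
    }
    pub fn dispatch(self) -> Result<(), String> {
        Err(format!(
            "generate ({} prompt bytes): not implemented yet",
            self.args.prompt.len()
        ))
    }
}

impl Command {
    /// Delegates to the matching handler; the entrypoint only calls this.
    pub fn dispatch(self) -> Result<(), String> {
        match self {
            Command::Train(args) => Train::new(args).dispatch(),
            Command::Generate(args) => Generate::new(args).dispatch(),
        }
    }
}
```

Keeping the `match` in one place means a later ticket can fill in a handler body without touching the entrypoint or the argument definitions.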

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012VozmaFNnAz5SgEuzdY6dR

…(MULTI-1382)

The prompt is the natural required input to generation, so model it as a
required positional rather than an optional flag. A value of `-` is the
conventional sentinel for "read the prompt from stdin"; `read_prompt()`
resolves it (literal otherwise). Covered by parse/resolution tests.
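
A minimal sketch of the `-` sentinel resolution, assuming a signature that takes the reader as a parameter so it can be tested without a real stdin. The actual `read_prompt()` signature is not shown in this PR, and whether it trims a trailing newline is unknown; this version returns the input unchanged.

```rust
use std::io::{self, Read};

/// Resolve the positional prompt argument: `-` means "read everything from
/// the supplied reader" (stdin in the real CLI); any other value is taken
/// literally. The generic reader parameter is an assumption for testability.
pub fn read_prompt<R: Read>(arg: &str, mut input: R) -> io::Result<String> {
    if arg == "-" {
        let mut buf = String::new();
        input.read_to_string(&mut buf)?;
        Ok(buf)
    } else {
        Ok(arg.to_string())
    }
}
```

In a binary this would be called as `read_prompt(&args.prompt, std::io::stdin())`, where `args.prompt` is a hypothetical field name for the positional.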

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012VozmaFNnAz5SgEuzdY6dR

Bring the house config-loading abstraction into wubbie: a `LayeredConfig`
builder over `figment` that merges configuration sources by increasing
precedence and extracts a fully-specified struct with no `Option`s.

- config/loader.rs: `LayeredConfig` (defaults → file → env → CLI overrides);
  `with_file` selects by extension (.toml/.json[c]), requires the explicit
  file to exist, but tolerates partial files; `extract` resolves into the
  non-optional target, so a field unset by every layer with no default is a
  hard error rather than a silent `None`.
- PartialModelConfig: the Option-heavy per-layer shape; only set fields
  serialize, so an unset value never clobbers a lower-precedence layer.
- ModelConfigArgs: flattened per-field CLI overrides; `load_model_config`
  wires `--size` (base) → `--config` file → `WUBBIE_MODEL_*` env → flags.
- `wubbie train` now resolves the layered ModelConfig up front (surfacing
  config errors before the not-yet-implemented training loop).

Tests use `figment::Jail` for hermetic file/env precedence checks, including
the missing-field-with-no-default error and explicit-missing-file rejection.
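
The precedence and extraction rules above can be illustrated with a hand-rolled sketch that does not use `figment`. `PartialModelConfig` is named in the commit, but its three-field shape here, the `merge`, `extract`, and `resolve` functions, and the `ResolvedConfig` type are all hypothetical stand-ins for what `LayeredConfig` does through figment providers.

```rust
/// Option-heavy per-layer shape: `None` means "this layer says nothing".
#[derive(Debug, Clone, Default, PartialEq)]
pub struct PartialModelConfig {
    pub d_model: Option<usize>,
    pub num_layers: Option<usize>,
    pub num_heads: Option<usize>,
}

/// Fully specified target with no `Option`s (trimmed to three fields).
#[derive(Debug, Clone, PartialEq)]
pub struct ResolvedConfig {
    pub d_model: usize,
    pub num_layers: usize,
    pub num_heads: usize,
}

fn require(value: Option<usize>, name: &str) -> Result<usize, String> {
    value.ok_or_else(|| format!("missing field `{}`", name))
}

impl PartialModelConfig {
    /// Overlay a higher-precedence layer: its set fields win, and its unset
    /// fields fall through to `self` instead of clobbering it.
    pub fn merge(self, higher: PartialModelConfig) -> PartialModelConfig {
        PartialModelConfig {
            d_model: higher.d_model.or(self.d_model),
            num_layers: higher.num_layers.or(self.num_layers),
            num_heads: higher.num_heads.or(self.num_heads),
        }
    }

    /// A field left unset by every layer is a hard error, not a silent `None`.
    pub fn extract(self) -> Result<ResolvedConfig, String> {
        Ok(ResolvedConfig {
            d_model: require(self.d_model, "d_model")?,
            num_layers: require(self.num_layers, "num_layers")?,
            num_heads: require(self.num_heads, "num_heads")?,
        })
    }
}

/// Fold layers given in increasing precedence (defaults, file, env, flags).
pub fn resolve(layers: Vec<PartialModelConfig>) -> Result<ResolvedConfig, String> {
    layers
        .into_iter()
        .fold(PartialModelConfig::default(), PartialModelConfig::merge)
        .extract()
}
```

Calling `resolve(vec![base, file, env, flags])` lets a flag override a file value while an empty env layer changes nothing, which matches the ordering given for `load_model_config`.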

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012VozmaFNnAz5SgEuzdY6dR

@RobbieMcKinstry RobbieMcKinstry merged commit 2f02fb9 into trunk Jun 29, 2026
3 checks passed
@RobbieMcKinstry RobbieMcKinstry deleted the claude/multi-1382-model-config-nah6o3 branch June 29, 2026 15:08