Add model config system (MULTI-1382) #3
Merged
Build out `config.rs` into the single source of truth that the model, loss, and training loop instantiate against:

- `ModelConfig` with the model dims (`vocab_size`, `context_length`, `d_model`, `d_ff`, `num_layers`, `num_heads`) plus the `norm_first` layer-norm-placement toggle (post-norm = paper-faithful, pre-norm = modern/stable; the default is deferred to Phase 3 and ships provisionally as pre-norm).
- Named sizes, enumerated by `ModelSize`: the headline ~100M GPT-2-small-class config (12L/768/12H, 4x FFN, 1024 context) and a debug-tiny size.
- `TrainingConfig` hyperparameters and a `RunConfig` bundle, so a run is fully described by, and reproducible from, a single serialized config.
- `vocab_size` wired to a single `tokenizer::VOCAB_SIZE` placeholder, to be finalized when the byte-level BPE tokenizer locks (MULTI-1379).
- `approx_parameter_count` / `head_dim` helpers and serde round-trip tests.

The on-disk config-file layout and the CLI that loads it follow the house convention and are deferred pending sign-off (see MULTI-1382).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012VozmaFNnAz5SgEuzdY6dR
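A minimal, std-only sketch of the config shape described in this commit, assuming `usize`/`bool` field types. Several pieces are hypothetical: the debug-tiny dimensions, the `vocab_size` argument to `ModelSize::config` (standing in for the `tokenizer::VOCAB_SIZE` placeholder, whose value is not given here), the `validate` helper, and the parameter-count formula, which assumes a GPT-2-style block (learned positional embeddings, biases on every linear layer, LM head tied to the token embedding). The serde derives and round-trip tests are left out so the block compiles without dependencies.

```rust
/// Architecture hyperparameters the model is instantiated against.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ModelConfig {
    pub vocab_size: usize,
    pub context_length: usize,
    pub d_model: usize,
    pub d_ff: usize,
    pub num_layers: usize,
    pub num_heads: usize,
    /// true = pre-norm (provisional default), false = post-norm (paper-faithful).
    pub norm_first: bool,
}

/// Named architectures selectable by size.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelSize {
    /// Hypothetical tiny dimensions for fast debugging runs.
    DebugTiny,
    /// GPT-2-small class: 12 layers, d_model 768, 12 heads, 4x FFN, 1024 context.
    Small,
}

impl ModelSize {
    /// Build the named config; `vocab_size` stands in for the tokenizer placeholder.
    pub fn config(self, vocab_size: usize) -> ModelConfig {
        match self {
            ModelSize::DebugTiny => ModelConfig {
                vocab_size,
                context_length: 128,
                d_model: 64,
                d_ff: 4 * 64,
                num_layers: 2,
                num_heads: 4,
                norm_first: true,
            },
            ModelSize::Small => ModelConfig {
                vocab_size,
                context_length: 1024,
                d_model: 768,
                d_ff: 4 * 768,
                num_layers: 12,
                num_heads: 12,
                norm_first: true,
            },
        }
    }
}

impl ModelConfig {
    /// Per-head width; only meaningful once `validate` has passed.
    pub fn head_dim(&self) -> usize {
        self.d_model / self.num_heads
    }

    /// Reject configs whose heads do not evenly split `d_model`.
    pub fn validate(&self) -> Result<(), String> {
        if self.num_heads == 0 || self.d_model % self.num_heads != 0 {
            return Err(format!(
                "d_model {} is not divisible by num_heads {}",
                self.d_model, self.num_heads
            ));
        }
        Ok(())
    }

    /// Approximate trainable parameters under the GPT-2-style assumptions above.
    /// `norm_first` moves the norms but does not change the count.
    pub fn approx_parameter_count(&self) -> usize {
        let (d, f) = (self.d_model, self.d_ff);
        let embeddings = self.vocab_size * d + self.context_length * d;
        let attention = 4 * d * d + 4 * d; // Q, K, V, output projections + biases
        let ffn = 2 * d * f + f + d; // two linear layers + biases
        let norms = 2 * 2 * d; // two layer norms (gain + bias) per block
        let per_layer = attention + ffn + norms;
        embeddings + self.num_layers * per_layer + 2 * d // + final layer norm
    }
}
```

With GPT-2's 50,257-token vocabulary the `Small` preset comes to 124,439,808 parameters under these assumptions; where it lands relative to the headline ~100M depends on the final `tokenizer::VOCAB_SIZE`, since the embedding table is the only vocab-dependent term.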
Reorganize the CLI to mirror the established multitool/aviary structure (file layout + dispatch pattern), per owner direction:

- src/bin/main.rs: a thin entrypoint that calls `Cli::parse()` and then `dispatch_command`, which initializes logging from the global `--log-level` flag and delegates; an empty invocation prints the long help.
- src/config/: the clap layer, comprising `cli.rs` (the `Cli` struct, derive-getters + global flags), `command.rs` (the `Command` subcommand enum + `dispatch`), and one `<Sub>Subcommand` args struct per subcommand (train/generate/serve). The model/run config (`ModelConfig`, `ModelSize`, `TrainingConfig`, `RunConfig`) moves here too, split into `model.rs`/`run.rs`.
- src/cmd/: one handler per subcommand, each with `new(args)` + `dispatch()`, delegated to from `Command::dispatch`.

Handlers are wired but still return "not implemented yet"; later tickets fill them in and inherit this layout. `ModelSize` gains `clap::ValueEnum` so `--size` selects a named architecture. The crate root re-exports `Cli`. README and CLAUDE.md are updated for the new tree. Adds derive-getters (locked via Cargo.lock) to match the house getter style.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012VozmaFNnAz5SgEuzdY6dR
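The layout in this commit boils down to a two-level dispatch: `dispatch_command` handles global concerns, `Command::dispatch` routes to a per-subcommand handler. The std-only sketch below shows that shape with hand-built structs in place of clap derives. The field names (`size`, `prompt`, `log_level`), the one-line help text, and the `init_logging` level check are hypothetical stand-ins; the real `--size` flag would parse into `ModelSize` via `clap::ValueEnum` rather than into a `String`.

```rust
// Hypothetical std-only stand-ins for the clap-derived args structs.
pub struct TrainSubcommand { pub size: String }
pub struct GenerateSubcommand { pub prompt: String }
pub struct ServeSubcommand;

pub enum Command {
    Train(TrainSubcommand),
    Generate(GenerateSubcommand),
    Serve(ServeSubcommand),
}

impl Command {
    /// Delegate each subcommand to its handler in `cmd`.
    pub fn dispatch(self) -> Result<(), String> {
        match self {
            Command::Train(args) => cmd::Train::new(args).dispatch(),
            Command::Generate(args) => cmd::Generate::new(args).dispatch(),
            Command::Serve(args) => cmd::Serve::new(args).dispatch(),
        }
    }
}

/// Top-level CLI: global flags plus an optional subcommand.
pub struct Cli {
    pub log_level: String,
    pub command: Option<Command>,
}

/// Body of the entrypoint: set up logging from the global flag, then delegate.
pub fn dispatch_command(cli: Cli) -> Result<(), String> {
    init_logging(&cli.log_level)?;
    match cli.command {
        Some(command) => command.dispatch(),
        // An empty invocation prints help instead of erroring.
        None => {
            println!("usage: wubbie <train|generate|serve> [options]");
            Ok(())
        }
    }
}

/// Hypothetical stand-in for logger setup: only checks the level name.
fn init_logging(level: &str) -> Result<(), String> {
    match level {
        "error" | "warn" | "info" | "debug" | "trace" => Ok(()),
        other => Err(format!("unknown log level: {other}")),
    }
}

mod cmd {
    use super::{GenerateSubcommand, ServeSubcommand, TrainSubcommand};

    // Each handler follows the same `new(args)` + `dispatch()` shape.
    pub struct Train { args: TrainSubcommand }
    impl Train {
        pub fn new(args: TrainSubcommand) -> Self { Self { args } }
        pub fn dispatch(self) -> Result<(), String> {
            Err(format!("train (size {}): not implemented yet", self.args.size))
        }
    }

    pub struct Generate { args: GenerateSubcommand }
    impl Generate {
        pub fn new(args: GenerateSubcommand) -> Self { Self { args } }
        pub fn dispatch(self) -> Result<(), String> {
            Err(format!("generate ({} prompt bytes): not implemented yet", self.args.prompt.len()))
        }
    }

    pub struct Serve { _args: ServeSubcommand }
    impl Serve {
        pub fn new(args: ServeSubcommand) -> Self { Self { _args: args } }
        pub fn dispatch(self) -> Result<(), String> {
            Err("serve: not implemented yet".to_string())
        }
    }
}
```

Keeping global setup in `dispatch_command` and per-subcommand work in `cmd` means later tickets only replace a handler's `dispatch` body; the routing in `Command::dispatch` stays untouched.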
…(MULTI-1382)

The prompt is the natural required input to generation, so model it as a required positional argument rather than an optional flag. A value of `-` is the conventional sentinel for "read the prompt from stdin"; `read_prompt()` resolves it and otherwise uses the value literally. Covered by parse and resolution tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012VozmaFNnAz5SgEuzdY6dR
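A minimal sketch of the `-` sentinel resolution, assuming a free-function signature (the real `read_prompt()` may well be a method on the generate args). Splitting out a hypothetical `read_prompt_from` that takes any `Read` keeps the stdin path testable without touching the real stdin. Returning stdin contents verbatim, trailing newline included, is also an assumption; the commit does not say whether input is trimmed.

```rust
use std::io::{self, Read};

/// Resolve the positional prompt: `-` means "read the prompt from `input`",
/// anything else is used literally.
pub fn read_prompt_from<R: Read>(prompt: &str, mut input: R) -> io::Result<String> {
    if prompt == "-" {
        let mut buf = String::new();
        input.read_to_string(&mut buf)?; // read all of stdin (or the test reader)
        Ok(buf)
    } else {
        Ok(prompt.to_owned())
    }
}

/// Convenience wrapper wired to the process's stdin.
pub fn read_prompt(prompt: &str) -> io::Result<String> {
    read_prompt_from(prompt, io::stdin())
}
```

So `wubbie generate "Once upon a time"` uses the literal string, while `echo "Once upon a time" | wubbie generate -` reads it from the pipe.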
Bring the house config-loading abstraction into wubbie: a `LayeredConfig` builder over `figment` that merges configuration sources in increasing order of precedence and extracts a fully specified struct with no `Option`s.

- config/loader.rs: `LayeredConfig` (defaults → file → env → CLI overrides). `with_file` selects the format by extension (.toml/.json[c]) and requires an explicitly passed file to exist, but tolerates partial files. `extract` resolves into the non-optional target, so a field left unset by every layer and lacking a default is a hard error rather than a silent `None`.
- `PartialModelConfig`: the Option-heavy per-layer shape; only set fields serialize, so an unset value never clobbers a lower-precedence layer.
- `ModelConfigArgs`: flattened per-field CLI overrides; `load_model_config` wires `--size` (base) → `--config` file → `WUBBIE_MODEL_*` env → flags.
- `wubbie train` now resolves the layered `ModelConfig` up front, surfacing config errors before the not-yet-implemented training loop.

Tests use `figment::Jail` for hermetic file/env precedence checks, including the missing-field-with-no-default error and the rejection of an explicitly passed file that does not exist.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_012VozmaFNnAz5SgEuzdY6dR
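The precedence rule in this commit can be illustrated without figment as a plain fold over Option-valued layers. This is a std-only sketch, not the `LayeredConfig` API: the three-field `ModelConfig`, `merge`, `from_env`, and `resolve` are hypothetical, and the `WUBBIE_MODEL_<FIELD>` spelling (uppercased field name after the prefix) is an assumption. File parsing and the `--size` base are left out; each would contribute one more `PartialModelConfig` layer.

```rust
use std::collections::HashMap;

/// Trimmed stand-in for the resolved, Option-free target.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ModelConfig {
    pub d_model: usize,
    pub num_heads: usize,
    pub num_layers: usize,
}

/// Per-layer shape: every field optional, so a layer only speaks for what it sets.
#[derive(Debug, Clone, Default)]
pub struct PartialModelConfig {
    pub d_model: Option<usize>,
    pub num_heads: Option<usize>,
    pub num_layers: Option<usize>,
}

impl PartialModelConfig {
    /// Overlay `higher` on `self`: a set field in `higher` wins, and an unset
    /// one never clobbers the lower layer.
    pub fn merge(self, higher: PartialModelConfig) -> PartialModelConfig {
        PartialModelConfig {
            d_model: higher.d_model.or(self.d_model),
            num_heads: higher.num_heads.or(self.num_heads),
            num_layers: higher.num_layers.or(self.num_layers),
        }
    }

    /// Read `WUBBIE_MODEL_<FIELD>` entries from any (key, value) source,
    /// e.g. `std::env::vars()`; a present but unparsable value is an error.
    pub fn from_env<I: IntoIterator<Item = (String, String)>>(vars: I) -> Result<Self, String> {
        let vars: HashMap<String, String> = vars.into_iter().collect();
        let get = |field: &str| -> Result<Option<usize>, String> {
            let key = format!("WUBBIE_MODEL_{}", field.to_uppercase());
            match vars.get(&key) {
                None => Ok(None),
                Some(v) => v.parse().map(Some).map_err(|e| format!("{key}={v}: {e}")),
            }
        };
        Ok(PartialModelConfig {
            d_model: get("d_model")?,
            num_heads: get("num_heads")?,
            num_layers: get("num_layers")?,
        })
    }

    /// Resolve into the non-optional target; a field no layer set is a hard error.
    pub fn extract(&self) -> Result<ModelConfig, String> {
        let need = |v: Option<usize>, name: &str| v.ok_or_else(|| format!("missing field `{name}`"));
        Ok(ModelConfig {
            d_model: need(self.d_model, "d_model")?,
            num_heads: need(self.num_heads, "num_heads")?,
            num_layers: need(self.num_layers, "num_layers")?,
        })
    }
}

/// Fold layers from lowest to highest precedence (e.g. defaults, file, env, CLI).
pub fn resolve(layers: Vec<PartialModelConfig>) -> Result<ModelConfig, String> {
    layers
        .into_iter()
        .fold(PartialModelConfig::default(), PartialModelConfig::merge)
        .extract()
}
```

`Option::or` is what makes an unset field transparent: a higher layer that leaves `num_heads` as `None` falls through to whatever a lower layer set, which is the same guarantee the commit gets from serializing only set fields.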