
Fix ContextBuilder checkpoint loading for non-default architectures #13

Open
harens wants to merge 1 commit into Thijsvanede:main from harens:fix/contextbuilder-load-metadata

harens commented Apr 23, 2026

This PR updates ContextBuilder persistence so checkpoints store the architecture metadata needed to reconstruct the model reliably.

Previously, save() wrote only the raw state_dict, and load() inferred the constructor settings from tensor shapes. That worked for models built with default settings but was unreliable for non-default configurations such as a custom num_layers, bidirectional encoders, or LSTM-based models.

Changes:

  • Store ContextBuilder constructor metadata alongside the state_dict.
  • Restore saved input_size, output_size, hidden_size, num_layers, max_length, bidirectional, and LSTM values on load.
  • Preserve backwards compatibility with older raw state_dict checkpoints.
  • Infer num_layers, bidirectional, and LSTM from recurrent tensor keys/shapes where older checkpoints do not include metadata.
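A rough sketch of a checkpoint layout that carries the constructor metadata while still accepting legacy files. The marker key and the helper names (pack_checkpoint, unpack_checkpoint) are hypothetical and not taken from this PR; only the field names come from the list above. It is pure Python so the format logic can be checked without PyTorch; in practice the packed dict would be passed to torch.save and read back with torch.load.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical marker key identifying the new checkpoint layout.
_FORMAT_KEY = "__contextbuilder_checkpoint__"

# Constructor arguments restored on load (as listed in the PR description).
CONFIG_FIELDS = ("input_size", "output_size", "hidden_size", "num_layers",
                 "max_length", "bidirectional", "LSTM")


def pack_checkpoint(state_dict: Dict[str, Any],
                    config: Dict[str, Any]) -> Dict[str, Any]:
    """Bundle weights and constructor metadata into one object for torch.save."""
    missing = [field for field in CONFIG_FIELDS if field not in config]
    if missing:
        raise ValueError(f"missing config fields: {missing}")
    return {
        _FORMAT_KEY: 1,  # layout version
        "config": {field: config[field] for field in CONFIG_FIELDS},
        "state_dict": state_dict,
    }


def unpack_checkpoint(
        obj: Dict[str, Any]) -> Tuple[Dict[str, Any], Optional[Dict[str, Any]]]:
    """Return (state_dict, config); config is None for legacy checkpoints."""
    if isinstance(obj, dict) and _FORMAT_KEY in obj:
        return obj["state_dict"], dict(obj["config"])
    # Legacy checkpoint: the loaded object is the raw state_dict itself.
    return obj, None
```

When unpack_checkpoint returns None for the config, load() would fall back to shape-based inference before constructing the model, which keeps older raw state_dict files loadable.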

This should make checkpoint round-tripping reliable for non-default architectures while keeping existing saved models loadable.

For legacy checkpoints, architecture parameters are inferred from PyTorch's recurrent weight naming conventions (e.g. the layer index in weight_ih_l{k}, the _reverse suffix, and gate dimensionality). This is a best-effort heuristic.
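A minimal sketch of this kind of key/shape inference, assuming the recurrent layers are plain nn.RNN, nn.GRU, or nn.LSTM modules without proj_size. The function name infer_recurrent_config, its prefix argument, and the returned cell field are hypothetical illustrations, not the code in this PR. It reads weight_hh_l{k} rather than weight_ih_l{k} because the hidden-to-hidden weight has shape (gates * hidden_size, hidden_size) in every layer and direction, so the gate count (4 for LSTM, 3 for GRU) falls out of a single division.

```python
import re
from typing import Dict, Tuple

# Matches names such as "weight_hh_l0" or "rnn.weight_hh_l1_reverse";
# group 1 is the layer index, group 2 the optional "_reverse" suffix
# that PyTorch uses for the backward direction.
_HH_KEY = re.compile(r"(?:^|\.)weight_hh_l(\d+)(_reverse)?$")

# Rows of weight_hh_l{k} divided by hidden_size: 1 for RNN, 3 for GRU, 4 for LSTM.
_CELL_BY_GATES = {1: "RNN", 3: "GRU", 4: "LSTM"}


def infer_recurrent_config(shapes: Dict[str, Tuple[int, ...]],
                           prefix: str = "") -> Dict[str, object]:
    """Best-effort guess of recurrent settings from a legacy state_dict.

    `shapes` maps state_dict keys to tensor shapes, e.g.
    {k: tuple(v.shape) for k, v in state_dict.items()}.
    Only keys starting with `prefix` are inspected.
    """
    layers = set()
    bidirectional = False
    hh_shapes = set()
    for key, shape in shapes.items():
        if not key.startswith(prefix):
            continue
        match = _HH_KEY.search(key)
        if match is None:
            continue
        layers.add(int(match.group(1)))
        bidirectional = bidirectional or match.group(2) is not None
        hh_shapes.add(tuple(shape))

    if not layers:
        raise ValueError(f"no weight_hh_l* keys found under prefix {prefix!r}")
    if len(hh_shapes) != 1:
        # Several recurrent modules were mixed together: narrow the prefix.
        raise ValueError(f"inconsistent weight_hh shapes: {sorted(hh_shapes)}")

    rows, hidden_size = hh_shapes.pop()
    gates = rows // hidden_size
    return {
        "hidden_size": hidden_size,
        "num_layers": max(layers) + 1,
        "bidirectional": bidirectional,
        "LSTM": gates == 4,
        "cell": _CELL_BY_GATES.get(gates, "unknown"),
    }
```

For a state_dict containing, say, hypothetical keys encoder.rnn.weight_hh_l0, encoder.rnn.weight_hh_l0_reverse, and encoder.rnn.weight_hh_l1 with shape (512, 128), calling it with prefix="encoder." would report a two-layer bidirectional LSTM with hidden_size 128. It cannot recover settings that leave no trace in parameter names or shapes, which is why storing explicit metadata is the more reliable path.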

Save ContextBuilder architecture settings alongside the state_dict so non-default num_layers, bidirectional, and LSTM configurations can be restored.

Keep loading older raw state_dict checkpoints by inferring constructor settings from stored tensor shapes and recurrent layer keys where possible.