
scrub private information from simpletuner_config.json inside ckpt export #2741

Merged: bghira merged 1 commit into main from bugfix/simpletuner-config-in-checkpoint-having-garbage-in-it on Jun 5, 2026

@bghira (Owner) commented Jun 5, 2026

This pull request sanitizes training configuration data before it is exported, excluding sensitive information and improving serialization. It introduces helper functions for key normalization and sensitive-key detection, updates the serialization logic to handle more data types, and adds a test to ensure sensitive fields are excluded from exported configs.

Improvements to configuration sanitization and serialization:

  • Added sets and tuples (_EXCLUDED_TRAINING_CONFIG_KEYS, _SENSITIVE_TRAINING_CONFIG_KEY_PARTS, _SENSITIVE_TRAINING_CONFIG_KEYS) to define which config keys should be excluded or treated as sensitive, and implemented logic to skip these keys when serializing training configs (simpletuner/helpers/publishing/metadata.py).
  • Introduced _normalise_training_config_key and _should_skip_training_config_key helper functions to consistently identify and filter out sensitive or excluded keys during config export (simpletuner/helpers/publishing/metadata.py).
  • Refactored and expanded _make_training_config_serializable to handle additional data types (such as Enum, Path, torch.dtype, torch.device, np.generic, and np.ndarray) and to recursively sanitize nested structures, ensuring all exported config data is safe and JSON-serializable (simpletuner/helpers/publishing/metadata.py).
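
The filtering and serialization steps described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: the function names, the key lists, and the duck-typed handling of NumPy and torch values are all assumptions, and the real helpers in simpletuner/helpers/publishing/metadata.py may behave differently.

```python
from enum import Enum
from pathlib import Path

# Hypothetical values; the PR's actual key lists are not shown in this thread.
EXCLUDED_KEYS = {"webhook_config"}
SENSITIVE_KEYS = {"hub_token"}
SENSITIVE_KEY_PARTS = ("token", "secret", "password", "api_key")


def normalise_key(key):
    """Lower-case a key and unify separators so spelling variants compare equal."""
    return str(key).strip().lstrip("-").lower().replace("-", "_")


def should_skip_key(key):
    """Return True when a key is excluded outright or looks sensitive."""
    name = normalise_key(key)
    if name in EXCLUDED_KEYS or name in SENSITIVE_KEYS:
        return True
    return any(part in name for part in SENSITIVE_KEY_PARTS)


def make_serializable(value):
    """Recursively convert a config value into JSON-safe data."""
    if isinstance(value, Enum):
        return make_serializable(value.value)
    if value is None or isinstance(value, (bool, int, float, str)):
        return value
    if isinstance(value, Path):
        return str(value)
    if isinstance(value, dict):
        # Drop sensitive keys at every nesting level, not just the top one.
        return {
            str(k): make_serializable(v)
            for k, v in value.items()
            if not should_skip_key(k)
        }
    if isinstance(value, (list, tuple, set, frozenset)):
        return [make_serializable(v) for v in value]
    if hasattr(value, "tolist"):
        # Assumed duck-typed stand-in for NumPy scalars and arrays.
        return make_serializable(value.tolist())
    # Assumed fallback for objects such as torch.dtype or torch.device.
    return str(value)
```

Under these assumptions, calling `make_serializable({"--hub_token": "x", "nested": {"api-key": "y", "lr": 0.1}})` keeps only `nested.lr`, because both spellings normalise to names that match the sensitive lists.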

Testing and validation:

  • Added a new test, test_save_training_config_sanitizes_public_export, to verify that sensitive fields are excluded and allowed fields are correctly serialized when saving training configs (tests/test_model_card.py).
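
A test of this kind might look something like the sketch below. It is not the test added in tests/test_model_card.py: `export_public_config` is a hypothetical stand-in for the exporter, and its token filter is deliberately simplistic so that the example stays self-contained.

```python
import json
import tempfile
from pathlib import Path
from types import SimpleNamespace


def export_public_config(args, output_dir):
    """Hypothetical stand-in for the exporter under test."""
    public = {
        key: (str(value) if isinstance(value, Path) else value)
        for key, value in vars(args).items()
        if "token" not in key.lower()  # simplistic filter, for illustration only
    }
    target = Path(output_dir) / "simpletuner_config.json"
    target.write_text(json.dumps(public, indent=2), encoding="utf-8")
    return target


def check_export_is_sanitized():
    """Write a config to a temp dir and confirm the secret never reaches disk."""
    args = SimpleNamespace(
        hub_token="hf_secret",
        learning_rate=1e-4,
        output_dir=Path("out"),
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = export_public_config(args, tmp)
        exported = json.loads(path.read_text(encoding="utf-8"))
    assert "hub_token" not in exported
    assert exported["learning_rate"] == 1e-4
    assert exported["output_dir"] == "out"
    return exported
```

Reading the file back from disk, rather than inspecting the in-memory dict, checks the artifact that would actually ship inside the checkpoint.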

General code improvements:

  • Updated imports to support the new serialization logic and tests, including Enum, Path, SimpleNamespace, and tempfile (simpletuner/helpers/publishing/metadata.py, tests/test_model_card.py).

@bghira merged commit 5c95311 into main Jun 5, 2026
2 checks passed
@bghira deleted the bugfix/simpletuner-config-in-checkpoint-having-garbage-in-it branch June 5, 2026 14:14