Integrate SmartDiskCache for hash-based persistent caching by BitcrushedHeart · Pull Request #1411 · Nerogar/OneTrainer

BitcrushedHeart · 2026-04-06T07:47:12Z

SmartDiskCache Integration

What This Is

Wires OneTrainer into the new 'SmartDiskCache' module from the companion mgds PR (Nerogar/mgds#49). The cache becomes persistent and content-addressed. It grows over time and only rebuilds what's genuinely stale, rather than wiping and rebuilding every time a file changes.
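A minimal sketch of what "persistent and content-addressed" could mean in practice, assuming a cache entry is named after a hash of the source file's bytes. The function names are hypothetical, and `hashlib.blake2b` stands in for whatever hash SmartDiskCache actually uses (the PR adds xxhash to the requirements).

```python
import hashlib
import os


def content_key(path: str) -> str:
    """Hex digest of the file's bytes (hashlib is a stand-in for the real hash)."""
    h = hashlib.blake2b(digest_size=16)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def get_or_build(path: str, cache_dir: str, build):
    """Return (entry_path, was_built); build only when no entry exists for this content."""
    entry = os.path.join(cache_dir, content_key(path) + ".bin")
    if os.path.exists(entry):
        return entry, False  # same content seen before: reuse the entry
    with open(entry, "wb") as f:
        f.write(build(path))
    return entry, True
```

Under this scheme an edited file gets a new key, so only that entry is rebuilt and every other entry stays valid.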

What Changed

Config

Added a 'sourceless_training' field to TrainConfig, with a migration (migration_10); it defaults to 'False'. The default for 'clear_cache_before_training' changed to 'False', since SmartCache makes forced rebuilds unnecessary in most cases.
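As a rough illustration only, a migration step that adds the new field could look like the following. The function name and the plain-dict config are assumptions; the real migration_10 in TrainConfig may be structured differently.

```python
def migrate_add_sourceless(data: dict) -> dict:
    """Hypothetical migration step: add the new field, keep any existing value."""
    migrated = dict(data)  # do not mutate the caller's config
    migrated.setdefault("sourceless_training", False)
    return migrated
```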

UI

- Sourceless Training toggle in the Data tab: trains from cached .pt files without source images or text.
- Clean Cache button in the Data tab: shows a preview of orphaned cache files (count and MB) before deleting anything, and handles both the text and image cache directories.
- Updated the clear_cache_before_training tooltip to reflect that SmartCache validates incrementally and detects model type changes automatically.
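The Clean Cache preview can be pictured as a read-only walk over the cache directory. This is a sketch under assumptions: `referenced` is a hypothetical set of relative paths that the cache index still points at, and nothing here reflects the actual OneTrainer implementation.

```python
import os


def preview_orphans(cache_dir: str, referenced: set[str]) -> tuple[int, float]:
    """Count .pt files not in `referenced` and their total size in MB; deletes nothing."""
    count, total_bytes = 0, 0
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            if not name.endswith(".pt"):
                continue
            full = os.path.join(root, name)
            if os.path.relpath(full, cache_dir) not in referenced:
                count += 1
                total_bytes += os.path.getsize(full)
    return count, total_bytes / (1024 * 1024)
```

Calling it once on the cache root covers both the image and text subdirectories, and the returned pair is what a confirmation dialog would display before any deletion.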

Dataloaders

All dataloaders that previously used DiskCache now use SmartDiskCache through DataLoaderText2ImageMixin._cache_modules(). The mixin passes modeltype, source_path_in_name, and sourceless to the SmartDiskCache constructor.

When 'sourceless_training' and 'latent_caching' are both enabled, '_create_dataset()' short-circuits to '[cache_modules, output_modules]', skipping file enumeration, loading, augmentation, and preparation modules entirely.
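The short-circuit can be sketched as a simple branch. The function and parameter names below are hypothetical; only the `[cache_modules, output_modules]` result for the sourceless case comes from the description above.

```python
def select_pipeline(sourceless_training, latent_caching,
                    source_modules, cache_modules, output_modules):
    """Return module groups in order; drop source-dependent groups in sourceless mode."""
    if sourceless_training and latent_caching:
        # Cached .pt files already hold everything downstream needs.
        return [cache_modules, output_modules]
    # Otherwise keep enumeration, loading, augmentation and preparation.
    return [*source_modules, cache_modules, output_modules]
```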

Interruptible Caching

Pressing "Stop Training" during caching now finishes the current file, saves the cache index, and stops gracefully. The next run picks up where it left off.
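One way to get this behaviour is to check the stop flag only between files and to save the index in a `finally` block. The sketch below assumes a JSON index and uses a hypothetical `CachingStopped` exception as a stand-in for the CachingStoppedException mentioned in this PR's commits.

```python
import json


class CachingStopped(Exception):
    """Hypothetical stand-in for CachingStoppedException."""


def build_cache(files, index_path, encode, stop_requested):
    """Encode files not yet in the index; always persist the index on exit."""
    try:
        with open(index_path) as f:
            index = json.load(f)
    except FileNotFoundError:
        index = {}
    try:
        for path in files:
            if path in index:
                continue  # finished in an earlier run
            if stop_requested():
                raise CachingStopped()  # checked between files, never mid-file
            index[path] = encode(path)
    finally:
        with open(index_path, "w") as f:
            json.dump(index, f)
    return index
```

Because completed entries are skipped on the next call, a stopped run resumes from the first file that was not yet encoded.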

GenericTrainer

'__clear_cache()' now prints a message explaining that SmartCache makes clearing unnecessary. The wipe logic is preserved (deletes image/, text/, and epoch-* dirs) but the default is off.

Closes #280
Closes #109
Closes #1357

Replaces DiskCache with SmartDiskCache in all dataloaders, adds sourceless_training config field with UI toggle, adds Clean Cache button with preview dialog, updates clear_cache_before_training default to False, and adds xxhash to requirements.

SmartCache validates incrementally and detects model type changes automatically, so the old warning about disabling cache clearing is no longer accurate.

SmartDiskCache import was placed after CollectPaths/DecodeVAE instead of in alphabetical order after SingleAspectCalculation.

Text encoder training requires re-tokenizing prompts from source files, which are not available in sourceless mode. Raise a clear error at dataset creation time rather than failing mid-training.
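A fail-fast guard of this kind might look like the sketch below. The function name, parameter names, and error message are illustrative assumptions, not the code in the PR.

```python
def check_sourceless_compatible(sourceless_training: bool, train_text_encoder: bool) -> None:
    """Raise at dataset creation time instead of failing mid-training."""
    if sourceless_training and train_text_encoder:
        raise ValueError(
            "Sourceless training cannot be combined with text encoder training: "
            "prompts must be re-tokenized from source files."
        )
```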

- Fix source_path_in_name: prompt_path -> image_path for text cache
- Add stop_check_fun to SmartDiskCache for interruptible caching
- Catch CachingStoppedException in trainer epoch loop
- Closes Nerogar#109

… 1088282 Closes Nerogar#109

- Text cache now validates against sample_prompt_path instead of image_path
- Clean button disabled while training is running to prevent concurrent access

Upstream mgds SmartCache added f65c2de 'Add fast validation to skip expensive per-file cache checks', replacing the 20+ min full stat walk with a directory-mtime + sampled spot-check path that returns in under a second on unchanged datasets.
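The idea of a directory-mtime check plus a sampled spot-check can be sketched as follows. This is an assumption-laden illustration, not the mgds code: the snapshot layout, function names, and sample size are all hypothetical.

```python
import os
import random


def snapshot(directory, paths):
    """Record the directory mtime and each file's mtime at cache-build time."""
    return {
        "dir_mtime": os.stat(directory).st_mtime_ns,
        "files": {p: os.stat(p).st_mtime_ns for p in paths},
    }


def fast_validate(directory, snap, sample_size=8, rng=random):
    """Cheap check: True means 'probably unchanged', False means 'run the full check'."""
    if os.stat(directory).st_mtime_ns != snap["dir_mtime"]:
        return False  # files were added, removed or renamed
    paths = list(snap["files"])
    for p in rng.sample(paths, min(sample_size, len(paths))):
        try:
            if os.stat(p).st_mtime_ns != snap["files"][p]:
                return False
        except FileNotFoundError:
            return False
    return True
```

The trade-off is that an in-place edit to a file outside the sample can go unnoticed, which is why a sketch like this would only gate the expensive walk rather than replace it.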

Upstream mgds SmartCache now caches validated source filepaths in a per-process set and short-circuits start-of-epoch validation when every required path is already in that set. Before, even with the fast-validate path available, each epoch still re-stat'd the dataset. After, only the first epoch validates; every epoch after that returns immediately.

Pulls in the SmartDiskCache change that backfills missing split/aggregate names (e.g. 'latent_mask') into existing .pt files when settings like masked_training are toggled, instead of crashing downstream readers with KeyError. Old caches keep working without a full rebuild.
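The backfill idea, reduced to plain dictionaries, is sketched below. A real cache entry is a .pt file holding tensors; `compute` is a hypothetical callback that produces the value for one missing name.

```python
def backfill(entry: dict, required: list[str], compute) -> list[str]:
    """Add any required names missing from a cached entry; return the names added."""
    missing = [name for name in required if name not in entry]
    for name in missing:
        entry[name] = compute(name)  # existing keys are left untouched
    return missing
```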

Replaces the previous 905efb2 augment-in-place with invalidate-and-rebuild. The augment path could write latent_mask at a shape that didn't match the cached latent_image (mask_augmentation modules added by enabling masked_training change crop_resolution), causing collate_fn to fail with 'stack expects each tensor to be equal size' on the first batch. Rebuilding the affected entries fresh produces all keys in one upstream pass so shapes stay consistent. The new mgds also auto-detects caches stamped by the prior augment code (via SCHEMA_METHOD marker) and rebuilds them on the next run.

…augment) Reverts the pin to mgds 51b3f19 (rebuild-on-schema-drift), which was a non-starter on big caches: 100k entries means an unacceptable full VAE re-encode. Switches to mgds bfb3544, which keeps the augment-in-place strategy but fixes the shape-mismatch bug at source: per cached entry, augmented values are forced onto the spatial shape of the already-cached latent_image (bilinear interpolation when upstream returns a divergent crop_resolution). Existing caches whose latent_mask was written shape-divergently by the previous augment get re-augmented automatically via the bumped SCHEMA_METHOD marker.

dxqb · 2026-05-15T12:54:24Z

does it close #1357 ?

BitcrushedHeart · 2026-05-15T21:46:22Z

does it close #1357 ?

Yes. It hashes the file, checks whether that hash already exists in the cache, and skips encoding if it does, so a caption of 'dog' is stored in a single .pt file whether it matches 1 image or 1,000,000.

…hing)

Wires the new mgds SmartCache features into the text caching path:

- content_key_in_name='prompt' on the text SmartDiskCache: identical caption lines are encoded once and reused across variations, files and concepts. Editing one line of a multi-line caption re-encodes only that line; re-ordering lines is free; a bulk edit appending the same line to every caption file encodes it once for the whole dataset.
- Z-Image: trim_padding + batch_collector on EncodeQwenText and a matching text cache build worker pool when latent_caching is enabled. Captions no longer pay for a full 512-token forward, and up to 8 captions share one forward (one weight-stream under layer offload). OT_TEXT_CACHE_BATCH=1 restores serial bs=1 encoding.
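Per-line content keying can be illustrated with a small sketch. It assumes each caption line is hashed on its own and the encoder output is stored under that hash; the function names are hypothetical, and `hashlib.sha1` is a stand-in for the hash mgds actually uses.

```python
import hashlib


def line_key(line: str) -> str:
    """Cache key derived from the caption line's text, not from any file path."""
    return hashlib.sha1(line.strip().encode("utf-8")).hexdigest()


def encode_captions(captions, encode, cache: dict) -> dict:
    """Encode each distinct non-empty caption line once, regardless of file or order."""
    for caption in captions:
        for line in caption.splitlines():
            text = line.strip()
            if not text:
                continue
            key = line_key(text)
            if key not in cache:
                cache[key] = encode(text)
    return cache
```

With captions "dog\ncat", "cat\ndog" and "dog\nbird", the encoder runs three times in total (dog, cat, bird), which matches the claims that re-ordering lines is free and that a shared line is encoded once.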

…am, HunyuanVideo)

mgds side: the encoder batch collector moved to mgds.TextEncoderBatching and EncodeMistralText/EncodeLlamaText gained batch_collector support, with per-item retry when a batched forward fails so one bad caption cannot poison its batchmates.

OneTrainer wiring (text_encode_batch_size shared in the mixin, env knob OT_TEXT_CACHE_BATCH, everything gated on latent_caching):

- Qwen-Image: trim_padding + batch collector + build workers. Same proof as Z-Image: the pipeline prunes masked hidden-state rows before caching, and crop_start composes with trim (head slice vs tail skip).
- Flux2 (24B Mistral dev / Qwen3 klein), HiDream (8B Llama), HunyuanVideo (Llama): batch collector + build workers, no trim. These pipelines cache full padded hidden states, so padded rows must remain encoder outputs.
- HiDream/HunyuanVideo CLIP and T5 encoders get apply_thread_safe_forward so the widened build pool can drive them from multiple threads (transformers#42673); their Llama forwards serialize through the collector.
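The per-item retry behaviour can be sketched in a simplified, single-threaded form. This is not the mgds.TextEncoderBatching API: `encode_batch` is a hypothetical callable that takes a list of captions and returns one result per caption, and the real collector gathers items from multiple worker threads.

```python
def encode_batched(captions, encode_batch, batch_size=8):
    """Encode in batches; if a batch fails, retry its items one by one."""
    results = []
    for start in range(0, len(captions), batch_size):
        batch = captions[start:start + batch_size]
        try:
            results.extend(encode_batch(batch))
        except Exception:
            # One bad caption must not poison its batchmates.
            for item in batch:
                try:
                    results.extend(encode_batch([item]))
                except Exception as exc:
                    results.append(exc)  # record the failure for this item only
    return results
```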

BitcrushedHeart added 5 commits April 6, 2026 07:21

Update clear_cache tooltip for SmartCache

f42347c


Fix import sorting for SmartDiskCache (ruff I001)

da0bead


Block sourceless training with text encoder training

7ae38dd


Pin mgds to SmartCache commit with sourceless concept metadata fix

97fc9ca

BitcrushedHeart force-pushed the SmartCache branch from 81c650c to 97fc9ca Compare April 6, 2026 10:16

BitcrushedHeart added 10 commits April 6, 2026 12:00

Fix text cache key, interruptible caching, pin mgds to 35ee9c5

b13d50a


Interruptible caching, text cache fix, perf improvements, pin mgds to…

97b3982

… 1088282 Closes Nerogar#109

Fix text cache source validation, disable Clean button during training

f6c8ec4


Pin mgds to BitcrushedHeart/mgds SmartCache branch (675fb2f)

15323ae

Bump mgds to f65c2de (fast cache validation)

eac588d


Bump mgds to c71379a (ruff cleanup in SmartDiskCache)

6150d59

dxqb self-requested a review May 10, 2026 17:06

dxqb mentioned this pull request May 15, 2026

[Bug]: Single file text encoding #1357

Open

dxqb linked an issue May 15, 2026 that may be closed by this pull request

[Bug]: Single file text encoding #1357

Open

BitcrushedHeart added 2 commits June 11, 2026 22:47

BitcrushedHeart wants to merge 17 commits into Nerogar:master from BitcrushedHeart:SmartCache
