Summary
When persistent_workers=True (which dlio_benchmark sets by default for num_workers > 0; PyTorch's own default is False), PyTorch DataLoader worker processes are kept alive across epochs. This causes two compounding issues:
- Stale file assignments: Workers load a pickled snapshot of ConfigArguments once, at spawn time, and never receive updates from reconfigure(), so cross-epoch file shuffling and resharding have no effect.
- Skipped I/O: The local filesystem reader's in-memory cache (self._local_cache) persists across epochs, so files read in epoch 1 are never re-read from storage; subsequent epochs measure memory bandwidth, not storage throughput.
Both issues stem from the same root cause: persistent workers retain process-level state that should be reset each epoch.
Root Cause
Stale file assignments
In dlio_benchmark/data_loader/torch_data_loader.py, the TorchDataset class serializes the config at construction:
    class TorchDataset(Dataset):
        def __init__(self, ...):
            args = ConfigArguments.get_instance()
            self.serial_args = pickle.dumps(args)  # snapshot at construction

        def worker_init(self, worker_id):
            pickle.loads(self.serial_args)  # restores stale snapshot
With persistent_workers=True, workers are spawned once and worker_init runs only at first spawn. The serial_args snapshot is never updated, so workers always see epoch 1's file_list_train, train_file_map, train_global_index_map, etc. — even after reconfigure() shuffles or reshards files in the main process.
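The staleness described above can be reproduced without PyTorch. The following is a minimal sketch, assuming a plain dict as a stand-in for ConfigArguments and a list rotation as a stand-in for the reshuffle done by reconfigure(); neither is the dlio_benchmark implementation.

```python
import pickle

# Hypothetical stand-in for ConfigArguments: a plain dict holding the file list.
config = {"file_list_train": [f"file_{i}.npz" for i in range(8)]}

# Snapshot taken once, as in TorchDataset.__init__.
serial_args = pickle.dumps(config)
epoch1_order = list(config["file_list_train"])

# Stand-in for reconfigure(): the main process reorders the files
# (a deterministic rotation here, instead of a seeded shuffle).
files = config["file_list_train"]
config["file_list_train"] = files[1:] + files[:1]

# What a long-lived worker sees: the bytes pickled before the reorder.
worker_view = pickle.loads(serial_args)

print("main process:", config["file_list_train"][:3])
print("worker view: ", worker_view["file_list_train"][:3])
```

The worker's view still starts with file_0.npz while the main process has moved on, which is the same divergence a persistent worker would exhibit after every reconfigure() call.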
persistent_workers=True is set in three places in torch_data_loader.py (lines 558, 592, and 627), conditioned only on the PyTorch version:
    if torch.__version__ != '1.3.1':
        kwargs['persistent_workers'] = True
Skipped I/O (consequence of persistence)
In dlio_benchmark/reader/_local_fs_iterable_mixin.py, the _localfs_ensure_cached method checks whether a file is already in the worker's in-memory cache:
    def _localfs_ensure_cached(self, filename: str) -> None:
        if filename not in self._local_cache:  # <-- skips read if cached
            ...
Without persistent workers, self._local_cache is reset each epoch (new process = empty dict). With persistent workers, the cache from epoch 1 persists, so every file is found in-cache and the storage read is skipped entirely.
Note: this cache issue is only relevant because persistent workers keep the process alive. Without persistent_workers=True, each epoch spawns fresh workers with an empty _local_cache.
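The effect of a cache that outlives the epoch can be shown with a small counter. This is an illustrative sketch only: CachingReader, ensure_cached and storage_reads are hypothetical names that mimic the `if filename not in self._local_cache` check, not the actual reader mixin.

```python
class CachingReader:
    """Hypothetical reader that counts how often storage is actually hit."""

    def __init__(self):
        self._local_cache = {}
        self.storage_reads = 0

    def _read_from_storage(self, filename):
        # Placeholder for a real file read; only the counter matters here.
        self.storage_reads += 1
        return b"payload:" + filename.encode()

    def ensure_cached(self, filename):
        # Same shape as the check in _localfs_ensure_cached.
        if filename not in self._local_cache:
            self._local_cache[filename] = self._read_from_storage(filename)
        return self._local_cache[filename]


files = ["a.npz", "b.npz", "c.npz"]

# Persistent worker: one reader object lives across all epochs.
persistent = CachingReader()
reads_persistent = []
for epoch in range(3):
    before = persistent.storage_reads
    for name in files:
        persistent.ensure_cached(name)
    reads_persistent.append(persistent.storage_reads - before)

# Re-spawned worker: a new reader (empty cache) every epoch.
reads_fresh = []
for epoch in range(3):
    reader = CachingReader()
    for name in files:
        reader.ensure_cached(name)
    reads_fresh.append(reader.storage_reads)

print("persistent:", reads_persistent)
print("fresh:     ", reads_fresh)
```

With the long-lived reader, only the first epoch touches storage; with a fresh reader per epoch, every epoch does.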
Impact
| Epoch | File assignment | I/O behavior | Measurement validity |
| --- | --- | --- | --- |
| 1 | Correct ✓ | Reads from storage ✓ | Valid |
| 2+ | Stale (epoch 1's files) ✗ | Served from process memory ✗ | Invalid (measures memory, not storage) |
- Inflated AU: Epochs 2+ report near-perfect Accelerator Utilization because "reads" are dict lookups, not real I/O.
- No cross-epoch shuffling: Despite file_shuffle: seed in the workload config, workers read the same files in the same order every epoch.
- Averaged metrics mislead: A run with 8 epochs measures storage for 1 epoch and memory for 7, producing an inflated average AU.
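To make the averaging effect concrete, here is a small worked example. The AU values are hypothetical placeholders chosen for illustration, not measurements from any run; only the 1-versus-7 epoch split comes from the description above.

```python
def mean_au(per_epoch_au):
    """Arithmetic mean of per-epoch Accelerator Utilization values."""
    return sum(per_epoch_au) / len(per_epoch_au)


# Hypothetical values, for illustration only (not measured):
storage_bound_au = 0.60  # epoch 1, reads actually hit storage
memory_bound_au = 0.99   # epochs 2-8, reads served from the worker's cache

per_epoch = [storage_bound_au] + [memory_bound_au] * 7

reported = mean_au(per_epoch)          # what an 8-epoch average would show
storage_only = mean_au(per_epoch[:1])  # the only epoch that exercised storage

print(f"reported mean AU: {reported:.3f}")
print(f"storage-only AU:  {storage_only:.3f}")
```

Under these assumed numbers the reported average is dominated by the seven cached epochs, so it sits far closer to the memory-bound figure than to the storage-bound one.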
Suggested Fix
Remove persistent_workers=True so workers are re-spawned each epoch. They will pick up the current ConfigArguments (with reshuffled file lists) via serial_args at construction, and start with a fresh _local_cache:
    # In all three DataLoader construction sites, remove:
    # kwargs['persistent_workers'] = True
The worker spawn overhead (~2–5s for 4–8 workers) is negligible relative to typical epoch durations (minutes to hours for production-scale datasets). If spawn overhead is a concern, an alternative is to add a refresh_args() method that re-pickles the updated config before each epoch and explicitly clears any reader caches.
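If keeping persistence available for users who accept the trade-off is desirable, one possible variation is to make it opt-in rather than removing it outright. The sketch below is an assumption, not part of the proposed change: build_loader_kwargs and its persistent_workers parameter are hypothetical names, and the helper only assembles keyword arguments without constructing a DataLoader.

```python
def build_loader_kwargs(num_workers, persistent_workers=False):
    """Hypothetical helper: assemble DataLoader keyword arguments.

    Persistence defaults to off, so workers are re-spawned each epoch
    unless the caller explicitly opts in.
    """
    kwargs = {"num_workers": num_workers}
    # PyTorch rejects persistent_workers=True when num_workers == 0,
    # so the flag is only forwarded when worker processes exist.
    if num_workers > 0:
        kwargs["persistent_workers"] = persistent_workers
    return kwargs


print(build_loader_kwargs(4))
print(build_loader_kwargs(4, persistent_workers=True))
print(build_loader_kwargs(0, persistent_workers=True))
```

The resulting dict could be merged into the existing kwargs at each of the three construction sites. Opting in would still reintroduce both problems described above unless the config snapshot and reader caches are refreshed per epoch.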