Fix RNNT ONNX decoding memory leaks #73
Thank you for the report! We tried to reproduce the issue in our environment, but so far we have not been able to. Could you please share a bit more detail about your setup?
|
|
Sure, here are the details from our setup.
- Library versions: Jinja2==3.1.6
- ONNXRuntime mode:
- Approximate amount of data processed:
- Typical audio lengths:
- Observed memory leak type / size:
|
Thanks for the details. We ran a similar soak test and saw RSS move from ~4.5 to ~4.7 GB; with .copy() in _split_state the difference was small and hard to separate from restart noise. Also, since training used segments up to ~30 s, long raw audio at inference isn’t really supported. Could you clarify how the 10-20 min files were passed in your backend (single clip vs. segmented)?
What changed
- `max_symbols_per_step` from config with default `10`.
- `SpecScaler` no-mutation behavior.

Why
The RNNT split helpers returned views. Keeping those views in `dec_state` could retain full `[L, B, H]` LSTM buffers for each sample and cause memory growth on long audio or larger batches. The ONNX decoder also had a hard-coded per-frame symbol limit of `3`, while the torch decoder defaults to `10`, which could produce different transcripts on the same weights.
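
The view-retention effect described above can be illustrated with a minimal sketch. It assumes the decoder states are NumPy arrays of shape `[L, B, H]`. The helper names `split_state_views`, `split_state_copies`, and `retained_bytes` are hypothetical and are not taken from this repository.

```python
import numpy as np


def split_state_views(state):
    """Split a [L, B, H] array into per-sample slices (basic slicing returns views)."""
    return [state[:, b:b + 1, :] for b in range(state.shape[1])]


def split_state_copies(state):
    """Same split, but each slice is copied so it owns only its own memory."""
    return [state[:, b:b + 1, :].copy() for b in range(state.shape[1])]


def retained_bytes(arr):
    """Bytes kept alive by holding `arr`: the base buffer for a view, else the array itself."""
    return arr.nbytes if arr.base is None else arr.base.nbytes


# Hypothetical sizes, chosen only for illustration.
L, B, H = 2, 4, 8
state = np.zeros((L, B, H), dtype=np.float32)

view = split_state_views(state)[0]
copy = split_state_copies(state)[0]

# The view references the whole batch buffer; the copy does not.
print(view.base is state, copy.base is None)
print(retained_bytes(view), retained_bytes(copy))
```

In this sketch, storing `view` keeps the full batch buffer reachable even after `state` goes out of scope, while storing `copy` keeps only one sample's worth of memory. Whether that accounts for the growth seen in a given deployment depends on how long the per-sample states are held.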