feat(examples): kws_mfcc SpeechCommands MFCC parity demo (Stage 3 PR-A) #255

Merged
LeoBuron merged 9 commits into develop from examples-kws-mfcc on Jun 26, 2026


@LeoBuron

Member

Adds examples/kws_mfcc/ — SpeechCommands keyword-spotting (MFCC features → 1D-CNN) demonstrating exact PyTorch↔C bit-parity. First of the two Stage-3 KWS demos; establishes the shared examples/_shared/speechcommands_data.py loader (reused by PR-B kws_raw).

Gate: with BIT_PARITY=1, the C int32 predictions are bit-identical to PyTorch (2483/2483, 6-class). This is the first example to show AdaptiveAvgPool1d(1) matching PyTorch exactly.
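
For readers who want to reproduce this kind of gate, the following is a minimal sketch of an exact integer comparison. The function name `exact_match_report` and the in-memory sequence format are hypothetical; the PR does not show how the example's own diff is implemented or how predictions are stored.

```python
from typing import Optional, Sequence, Tuple

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def exact_match_report(
    ref: Sequence[int], got: Sequence[int]
) -> Tuple[int, int, Optional[int]]:
    """Compare two int32 sequences element by element, with no tolerance.

    Returns (matches, total, index of the first mismatch or None).
    Hypothetical helper for illustration only.
    """
    if len(ref) != len(got):
        raise ValueError(f"length mismatch: {len(ref)} vs {len(got)}")
    matches = 0
    first_bad: Optional[int] = None
    for i, (r, g) in enumerate(zip(ref, got)):
        # Reject values that could not have come from an int32 buffer.
        if not (INT32_MIN <= r <= INT32_MAX and INT32_MIN <= g <= INT32_MAX):
            raise ValueError(f"value at index {i} is outside the int32 range")
        if r == g:
            matches += 1
        elif first_bad is None:
            first_bad = i
    return matches, len(ref), first_bad
```

A gate in this style passes only when `matches == total`, which is the form the reported 2483/2483 takes.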

Model: Conv1d(40→32,K3,SAME) → ReLU → MaxPool(2) → Conv1d(32→64,K3,SAME) → ReLU → MaxPool(2) → AdaptiveAvgPool1d(1) → Flatten → Linear(64→C) → Softmax.
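
As a rough check on the architecture above, the sketch below traces tensor shapes and counts parameters in plain Python. It makes assumptions the PR does not state: MaxPool(2) uses stride 2 with floor rounding (the PyTorch defaults), every Conv1d and Linear layer has a bias, and the input length of 101 MFCC frames in the usage note is purely illustrative.

```python
from typing import List, Tuple


def trace_shapes(num_frames: int, num_classes: int = 6) -> List[Tuple[str, int, int]]:
    """Return (stage, channels, length) for each stage of the model."""
    t = num_frames
    shapes = [("input", 40, t)]
    shapes.append(("conv1+relu", 32, t))    # SAME padding keeps the length
    t //= 2                                  # MaxPool(2), assumed stride 2, floor
    shapes.append(("maxpool1", 32, t))
    shapes.append(("conv2+relu", 64, t))    # SAME padding keeps the length
    t //= 2
    shapes.append(("maxpool2", 64, t))
    shapes.append(("adaptive_avgpool", 64, 1))  # mean over the time axis
    shapes.append(("linear+softmax", num_classes, 1))
    return shapes


def param_count(num_classes: int = 6) -> int:
    """Weights plus biases, assuming bias=True on every layer."""
    conv1 = 40 * 32 * 3 + 32
    conv2 = 32 * 64 * 3 + 64
    linear = 64 * num_classes + num_classes
    return conv1 + conv2 + linear
```

Under these assumptions, `trace_shapes(101)` reduces the time axis 101 → 50 → 25 → 1, and `param_count(6)` gives 10470 parameters (12355 for the 35-class variant).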

Class-count knob: KWS_CLASSES (default 6). CI runs 6-class only (yes/no/up/down + synthetic silence + unknown, balanced); 35-class is local-only (compiled by the build-all rot-guard).

Note — torchaudio 2.11 / torchcodec: 2.11 (maintenance mode) routes dataset decode through torchcodec (needs system FFmpeg). The loader uses the spec-blessed fallback: get_metadata + stdlib wave reader (int16/32768 — byte-identical), so no torchcodec/FFmpeg dependency. The loader is path-based to bound peak RAM (~1.4 GB for 6-class, within the runner's 7 GB).

CI: shared raw-download cache plus a per-example processed-.npy cache (the prepare step is skipped on a cache hit); int32-exact diff. 6-class only.

🤖 Generated with Claude Code

LeoBuron added 9 commits June 26, 2026 18:11
torchaudio 2.11 (maintenance mode) routes its dataset decode through
torchcodec (needs a system FFmpeg), so iterating SPEECHCOMMANDS raised
ImportError at prepare time. Switch to ds.get_metadata (no decode) + a
stdlib `wave` reader (int16 PCM / 32768) — the fallback the spec blessed,
byte-identical output, no torchcodec/FFmpeg/scipy dependency.

Also make the loader path-based: collect paths per label, then decode only
the clips a split keeps (all 4 keywords + the sampled "unknown"). Peak RAM
for 6-class drops ~5.8 GB -> ~1.4 GB, fitting the 7 GB CI runner.
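
The path-based approach described above can be sketched as follows. This is an illustration, not the loader from examples/_shared/speechcommands_data.py: the function name `select_paths`, the `(path, label)` input format, the seed, and the rule for sizing the "unknown" class (match the largest keyword class) are all assumptions. Synthetic silence is out of scope here.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

KEYWORDS = ("yes", "no", "up", "down")


def select_paths(
    items: Iterable[Tuple[str, str]], seed: int = 0
) -> Dict[str, List[str]]:
    """Choose which clips to decode using only paths and labels.

    No audio is read here, so memory use scales with the number of
    paths rather than with the number of decoded waveforms.
    """
    by_label: Dict[str, List[str]] = defaultdict(list)
    for path, label in items:
        by_label[label].append(path)

    # Keep every clip of the four keywords.
    kept = {k: sorted(by_label.get(k, [])) for k in KEYWORDS}

    # Pool all other labels and sample a subset as "unknown".
    pool = sorted(
        p for label, paths in by_label.items() if label not in KEYWORDS for p in paths
    )
    # Assumption: size "unknown" to the largest keyword class.
    n_unknown = min(len(pool), max(len(v) for v in kept.values()))
    kept["unknown"] = random.Random(seed).sample(pool, n_unknown)
    return kept
```

Only the paths returned here would then be passed to the decoder, so the clips dropped by sampling are never loaded.
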

The stdlib wave reader interprets frames as int16/32768; a non-mono or non-16-bit
clip would be silently misdecoded. Assert the format (the corpus is uniformly
16 kHz mono 16-bit, so this never trips; it guards a future corpus swap). Flagged
by the PR-A final review as the one silent-decode path.
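
A minimal sketch of such a reader with the format guard, using only the standard library. The function name `read_pcm16_mono` is hypothetical, and the actual loader may differ (for example, it may decode into a NumPy array rather than a Python list). The sketch raises `ValueError` instead of using `assert` so the check survives `python -O`.

```python
import struct
import wave
from typing import List


def read_pcm16_mono(path: str, expected_rate: int = 16000) -> List[float]:
    """Decode a mono 16-bit PCM WAV file to floats in [-1.0, 1.0)."""
    with wave.open(path, "rb") as wf:
        # Guard the one silent-decode path: any other layout would be
        # reinterpreted as int16 mono and yield wrong samples.
        if wf.getnchannels() != 1:
            raise ValueError(f"{path}: expected mono, got {wf.getnchannels()} channels")
        if wf.getsampwidth() != 2:
            raise ValueError(f"{path}: expected 16-bit, got {8 * wf.getsampwidth()}-bit")
        if wf.getframerate() != expected_rate:
            raise ValueError(f"{path}: expected {expected_rate} Hz, got {wf.getframerate()} Hz")
        raw = wf.readframes(wf.getnframes())

    # WAV PCM is little-endian; "<h" is a signed 16-bit integer.
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    return [s / 32768.0 for s in samples]
```

Dividing by 32768 maps the int16 range [-32768, 32767] onto [-1.0, 1.0), matching the int16/32768 scaling named in the commit message.
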
@LeoBuron LeoBuron merged commit 0c51088 into develop Jun 26, 2026
8 checks passed
@LeoBuron LeoBuron deleted the examples-kws-mfcc branch June 26, 2026 17:28