feat(examples): kws_mfcc SpeechCommands MFCC parity demo (Stage 3 PR-A) by LeoBuron · Pull Request #255 · es-ude/OnDeviceTraining

LeoBuron · 2026-06-26T17:21:05Z

Adds examples/kws_mfcc/ — SpeechCommands keyword-spotting (MFCC features → 1D-CNN) demonstrating exact PyTorch↔C bit-parity. First of the two Stage-3 KWS demos; establishes the shared examples/_shared/speechcommands_data.py loader (reused by PR-B kws_raw).

Gate: BIT_PARITY=1 C int32 predictions are bit-identical to PyTorch (2483/2483, 6-class). First example to prove AdaptiveAvgPool1d(1) exact vs PyTorch.
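The gate can be pictured as an exact element-wise comparison with no tolerance. A minimal sketch, assuming both sides dump their predictions as plain integers; `parity_report` and `assert_bit_parity` are hypothetical names for illustration, not the repo's comparison script:

```python
from typing import Sequence, Tuple


def parity_report(c_preds: Sequence[int], torch_preds: Sequence[int]) -> Tuple[int, int]:
    """Return (matches, total) for two equally long prediction lists."""
    if len(c_preds) != len(torch_preds):
        raise ValueError(
            f"length mismatch: {len(c_preds)} C vs {len(torch_preds)} PyTorch predictions"
        )
    matches = sum(1 for a, b in zip(c_preds, torch_preds) if a == b)
    return matches, len(c_preds)


def assert_bit_parity(c_preds: Sequence[int], torch_preds: Sequence[int]) -> str:
    """Fail on the first differing prediction; otherwise return 'matches/total'."""
    matches, total = parity_report(c_preds, torch_preds)
    if matches != total:
        # Report the first mismatch so a failing run is easy to debug.
        first = next(i for i, (a, b) in enumerate(zip(c_preds, torch_preds)) if a != b)
        raise AssertionError(
            f"parity broken: {matches}/{total} identical, first mismatch at index {first}"
        )
    return f"{matches}/{total}"
```

On a passing run this returns a string in the same `matches/total` form the gate line quotes.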

Model: Conv1d(40→32,K3,SAME) → ReLU → MaxPool(2) → Conv1d(32→64,K3,SAME) → ReLU → MaxPool(2) → AdaptiveAvgPool1d(1) → Flatten → Linear(64→C) → Softmax.
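For orientation, the layer chain above could be written in PyTorch roughly as follows. This is a sketch rather than the PR's reference script: it assumes SAME for K3 means `padding=1`, that the 40 input channels are the MFCC coefficients, and that the input is shaped (batch, 40, frames). `build_kws_mfcc_model` and the 101-frame input are hypothetical.

```python
import torch
from torch import nn


def build_kws_mfcc_model(num_classes: int = 6) -> nn.Sequential:
    """Assumed PyTorch rendering of the layer chain in the PR description."""
    return nn.Sequential(
        nn.Conv1d(40, 32, kernel_size=3, padding=1),  # SAME padding for K=3
        nn.ReLU(),
        nn.MaxPool1d(2),
        nn.Conv1d(32, 64, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool1d(2),
        nn.AdaptiveAvgPool1d(1),  # mean over the remaining time axis
        nn.Flatten(),             # (N, 64, 1) -> (N, 64)
        nn.Linear(64, num_classes),
        nn.Softmax(dim=1),
    )


if __name__ == "__main__":
    model = build_kws_mfcc_model()
    x = torch.zeros(2, 40, 101)  # frame count is an arbitrary assumption
    print(model(x).shape)  # torch.Size([2, 6])
```

Because `AdaptiveAvgPool1d(1)` collapses the time axis, the `Linear` input size stays 64 regardless of clip length.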

Class-count knob: KWS_CLASSES (default 6). CI runs 6-class only (yes/no/up/down + synthetic silence + unknown, balanced); 35-class is local-only (compiled by the build-all rot-guard).

Note on torchaudio 2.11 / torchcodec: torchaudio 2.11 (maintenance mode) routes dataset decoding through torchcodec, which needs a system FFmpeg. The loader uses the spec-blessed fallback instead: get_metadata plus a stdlib wave reader (int16 PCM / 32768, byte-identical output), so there is no torchcodec/FFmpeg dependency. The loader is also path-based to bound peak RAM (~1.4 GB for 6-class, within the CI runner's 7 GB).

CI: shared raw-download cache + per-example processed-.npy cache (gates prepare on cache-hit); int32-exact diff. 6-class only.

🤖 Generated with Claude Code

torchaudio 2.11 (maintenance mode) routes its dataset decode through torchcodec (needs a system FFmpeg), so iterating SPEECHCOMMANDS raised ImportError at prepare time. Switch to ds.get_metadata (no decode) + a stdlib `wave` reader (int16 PCM / 32768) — the fallback the spec blessed, byte-identical output, no torchcodec/FFmpeg/scipy dependency. Also make the loader path-based: collect paths per label, then decode only the clips a split keeps (all 4 keywords + the sampled "unknown"). Peak RAM for 6-class drops ~5.8 GB -> ~1.4 GB, fitting the 7 GB CI runner.
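The path-based idea (select first, decode later) might look like the sketch below. It is illustrative only: `select_paths`, `decode_selected`, the `n_unknown` parameter and the seeded sampling are assumptions, not the shared loader's actual API. Synthetic silence is left out because it does not come from files.

```python
import random
from collections import defaultdict
from typing import Callable, Iterable, Iterator, List, Tuple

KEYWORDS = ("yes", "no", "up", "down")  # the four keywords named in the PR


def select_paths(
    items: Iterable[Tuple[str, str]], n_unknown: int, seed: int = 0
) -> List[Tuple[str, str]]:
    """Group (path, word) pairs by label and keep only what will be decoded.

    All keyword clips are kept; every other word is pooled as "unknown"
    and subsampled to n_unknown paths. Nothing is decoded here.
    """
    by_label = defaultdict(list)
    for path, word in items:
        label = word if word in KEYWORDS else "unknown"
        by_label[label].append(path)

    rng = random.Random(seed)  # hypothetical seeding, for reproducible sampling
    unknown = sorted(by_label.pop("unknown", []))
    kept_unknown = rng.sample(unknown, min(n_unknown, len(unknown)))

    selected = [(p, label) for label in KEYWORDS for p in by_label.get(label, [])]
    selected += [(p, "unknown") for p in kept_unknown]
    return selected


def decode_selected(
    selected: Iterable[Tuple[str, str]], decode: Callable[[str], object]
) -> Iterator[Tuple[object, str]]:
    """Lazily decode only the selected clips, one at a time."""
    for path, label in selected:
        yield decode(path), label
```

Holding only path strings until selection is done is what keeps discarded "unknown" clips out of memory in this sketch.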

The stdlib wave reader interprets frames as int16/32768; a non-mono or non-16-bit clip would be silently misdecoded. Assert the format (the corpus is uniformly 16 kHz mono 16-bit, so this never trips — it guards a future corpus swap). Flagged by the PR-A final review as the one silent-decode path.
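A stdlib-only reader with that guard could be sketched as follows. The function name, the explicit `ValueError` (in place of a bare assert), the sample-rate check and the plain list return type are assumptions; the actual loader presumably returns an array or tensor.

```python
import struct
import wave
from typing import List


def read_wav_int16_mono(path: str, expected_rate: int = 16000) -> List[float]:
    """Decode a 16-bit mono PCM WAV to floats in [-1.0, 1.0) using only the stdlib."""
    with wave.open(path, "rb") as wf:
        # Guard the format: anything else would be silently misdecoded below.
        if wf.getnchannels() != 1:
            raise ValueError(f"{path}: expected mono, got {wf.getnchannels()} channels")
        if wf.getsampwidth() != 2:
            raise ValueError(f"{path}: expected 16-bit PCM, got {8 * wf.getsampwidth()}-bit")
        if wf.getframerate() != expected_rate:
            raise ValueError(f"{path}: expected {expected_rate} Hz, got {wf.getframerate()} Hz")
        raw = wf.readframes(wf.getnframes())

    # WAV PCM samples are little-endian signed 16-bit integers.
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    return [s / 32768.0 for s in samples]
```

Dividing by 32768 (a power of two) is exact in binary floating point, which is consistent with the byte-identical claim above.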

LeoBuron added 9 commits June 26, 2026 18:11

feat(examples/_shared): add SpeechCommands loader + torchaudio dep

864f803

feat(examples/kws_mfcc): prepare_data writes MFCC train/val/test npy

eed67e3

feat(examples/kws_mfcc): PyTorch reference MFCC CNN + weight export

75539a3

feat(examples/kws_mfcc): factory-API MFCC CNN trainer + CMake wiring

ec81f62

feat(examples/kws_mfcc): informational compare.py + plots

9065e94

docs(examples/kws_mfcc): add README

9ffd3ee

ci(examples): wire kws_mfcc into the bit-parity job

deadcbe

LeoBuron merged commit 0c51088 into develop Jun 26, 2026
8 checks passed

LeoBuron deleted the examples-kws-mfcc branch June 26, 2026 17:28

LeoBuron mentioned this pull request Jun 26, 2026

feat(examples): kws_raw raw-waveform + in-model downsample parity demo (Stage 3 PR-B) #256

Merged

LeoBuron merged 9 commits into develop from examples-kws-mfcc
