Classify audio clips into categories using time-domain and frequency-domain features. Supports single-label and multi-label classification, segment-level annotation, file-based batch inference, model persistence, and active learning.
```bash
pip install -e ".[dev]"
```

Or just install dependencies directly:

```bash
pip install -r requirements.txt
```

Each audio clip is converted into a 41-dimensional feature vector combining:

| Domain | Features |
|---|---|
| Time | RMS energy (mean + std), zero-crossing rate (mean + std), amplitude max + mean |
| Frequency | Spectral centroid (mean + std), bandwidth, rolloff, contrast |
| Mel / MFCCs | 128-band mel spectrogram summary, 13 MFCCs (mean + std each) |
| Chroma | Pitch-class profile mean + std |
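
To make the "Time" row of the table concrete, here is a minimal pure-Python sketch of those six values. It is not the package's `FeatureExtractor`: the function names, the frame length of 2048 and the hop length of 512 are assumptions chosen for illustration, and the real implementation may frame and normalise differently.

```python
import math


def frame_signal(samples, frame_length=2048, hop_length=512):
    """Split samples into overlapping frames (a trailing partial frame is dropped)."""
    if len(samples) < frame_length:
        return [list(samples)]
    return [samples[i:i + frame_length]
            for i in range(0, len(samples) - frame_length + 1, hop_length)]


def mean_std(values):
    """Population mean and standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


def rms(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / max(len(frame) - 1, 1)


def time_domain_features(samples):
    """Return [rms_mean, rms_std, zcr_mean, zcr_std, amp_max, amp_mean]."""
    frames = frame_signal(samples)
    rms_mean, rms_std = mean_std([rms(f) for f in frames])
    zcr_mean, zcr_std = mean_std([zero_crossing_rate(f) for f in frames])
    amplitudes = [abs(x) for x in samples]
    return [rms_mean, rms_std, zcr_mean, zcr_std,
            max(amplitudes), sum(amplitudes) / len(amplitudes)]


# One second of a 440 Hz sine at 22050 Hz, amplitude 0.5
sr = 22050
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
features = time_domain_features(tone)
```

For a pure tone the RMS mean sits near `0.5 / sqrt(2)` and the zero-crossing rate near `2 * 440 / 22050`, which is a quick sanity check that the features behave as expected.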
A `RandomForestClassifier` is trained on these vectors. For multi-label audio
(e.g. speech overlapping with noise), a `OneVsRestClassifier` wraps the forest, training one binary classifier per category.
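
In scikit-learn, `OneVsRestClassifier` handles multi-label targets when they are given as a binary indicator matrix with one column per category. The sketch below shows that encoding step in plain Python. The helper names `to_multi_hot` and `from_scores` and the 0.5 decision threshold are assumptions for illustration; how this package encodes labels internally is not shown in this README.

```python
def to_multi_hot(label_sets, categories):
    """Encode each list of labels as a 0/1 row with one column per category."""
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for labels in label_sets:
        row = [0] * len(categories)
        for label in labels:
            row[index[label]] = 1  # raises KeyError for an unknown label
        rows.append(row)
    return rows


def from_scores(scores, categories, threshold=0.5):
    """Turn per-category scores back into a label list (threshold is an assumption)."""
    return [c for c, s in zip(categories, scores) if s >= threshold]


categories = ["silence", "speech", "noise", "engine", "alarm"]
targets = to_multi_hot([["speech", "noise"], ["silence"]], categories)
decoded = from_scores([0.1, 0.8, 0.6, 0.2, 0.0], categories)
```

Here `targets` has one row per clip and one column per category, so a clip containing both speech and noise has two ones in its row.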
Annotate a recording and save the annotations:

```python
from audio_classifier import AnnotationStore, AudioAnnotation

store = AnnotationStore(file_path="recording.wav")
store.add(AudioAnnotation(start=0.0, end=2.5, labels=["silence"]))
store.add(AudioAnnotation(start=2.5, end=7.0, labels=["speech"]))
store.add(AudioAnnotation(start=6.0, end=9.0, labels=["speech", "noise"],  # overlap
                          confidence="medium", environment="outdoor"))
store.add(AudioAnnotation(start=9.0, end=12.0, labels=["engine"]))
store.save("recording.json")
```

Train a pipeline from saved annotations and persist the model:

```python
from audio_classifier import AudioClassificationPipeline, AnnotationStore

stores = [AnnotationStore.load(p) for p in ["recording.json", "recording2.json"]]
pipeline = AudioClassificationPipeline(
    categories=["silence", "speech", "noise", "engine", "alarm"],
    segment_duration=2.0,  # classify in 2-second windows
    sr=22050,
    n_jobs=-1,             # use all CPU cores
)
pipeline.fit_from_annotations(stores)
pipeline.save("model.pkl")
```

Load the saved model and classify a new file segment by segment:

```python
from audio_classifier import AudioClassificationPipeline

pipeline = AudioClassificationPipeline.load("model.pkl")
for result in pipeline.classify_file("new_recording.wav"):
    print(f"{result.start:.1f}–{result.end:.1f}s "
          f"{result.labels} "
          f"(confidence: {result.top_score:.2f})")
```

Example output:

```
0.0–2.0s ['silence'] (confidence: 0.97)
2.0–4.0s ['speech'] (confidence: 0.88)
4.0–6.0s ['speech'] (confidence: 0.76)
6.0–8.0s ['noise'] (confidence: 0.65)
8.0–10.0s ['engine'] (confidence: 0.91)
```
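
The fixed 2-second segments in the output suggest a simple windowing step. The sketch below shows one way such windows could be generated and matched against annotations during training. It is an illustration only: `segment_windows`, `labels_for_window`, the 50% overlap rule and the clipping of the final window are all assumptions, not the pipeline's documented behaviour.

```python
def segment_windows(duration, segment_duration=2.0):
    """Return (start, end) windows covering [0, duration]; the last one is clipped."""
    windows = []
    i = 0
    while i * segment_duration < duration:
        start = i * segment_duration
        windows.append((start, min(start + segment_duration, duration)))
        i += 1
    return windows


def labels_for_window(window, annotations, min_overlap=0.5):
    """Collect labels of annotations covering at least min_overlap of the window.

    annotations is a list of (start, end, labels) tuples.
    """
    w_start, w_end = window
    length = w_end - w_start
    found = []
    for start, end, labels in annotations:
        overlap = min(end, w_end) - max(start, w_start)
        if overlap > 0 and overlap / length >= min_overlap:
            for label in labels:
                if label not in found:  # keep first-seen order, no duplicates
                    found.append(label)
    return found


# The four annotations from the example above, as plain tuples
annotations = [
    (0.0, 2.5, ["silence"]),
    (2.5, 7.0, ["speech"]),
    (6.0, 9.0, ["speech", "noise"]),
    (9.0, 12.0, ["engine"]),
]
windows = segment_windows(12.0, 2.0)
segment_labels = [labels_for_window(w, annotations) for w in windows]
```

Under these assumptions, the window 6.0 to 8.0 s picks up both `speech` and `noise`, which is the kind of segment the multi-label mode is meant for.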
Evaluate the trained pipeline on held-out annotations:

```python
from audio_classifier import AnnotationStore

test_stores = [AnnotationStore.load(p) for p in ["test1.json", "test2.json"]]
report = pipeline.evaluate(test_stores)
print(report.summary())
```

Example output:

```
Accuracy : 0.923
Macro F1 : 0.918
Wt'd F1  : 0.921

Label      Prec    Rec     F1      n
------------------------------------------------
silence    1.000   1.000   1.000   10
speech     0.900   0.900   0.900   10
noise      0.909   1.000   0.952   10
engine     0.900   0.900   0.900   10
alarm      0.889   0.800   0.842   10

Confusion matrix (rows=actual, cols=predicted):
           silence  speech  noise  engine  alarm
silence         10       0      0       0      0
speech           0       9      1       0      0
...
```
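
The figures in such a report all follow from the confusion matrix. As a sketch of the standard definitions (not the package's `evaluate.py`), the code below derives per-label precision, recall and F1, plus accuracy, macro F1 and support-weighted F1. The 3x3 matrix and its labels are made up for illustration.

```python
def per_class_metrics(matrix, labels):
    """Precision, recall, F1 and support per label (rows=actual, cols=predicted)."""
    report = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        predicted = sum(row[k] for row in matrix)  # column sum
        actual = sum(matrix[k])                    # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall,
                         "f1": f1, "support": actual}
    return report


def summarize(matrix, labels):
    """Return (accuracy, macro_f1, weighted_f1) for a confusion matrix."""
    report = per_class_metrics(matrix, labels)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[k][k] for k in range(len(labels))) / total
    macro_f1 = sum(m["f1"] for m in report.values()) / len(labels)
    weighted_f1 = sum(m["f1"] * m["support"] for m in report.values()) / total
    return accuracy, macro_f1, weighted_f1


# Hypothetical confusion matrix, 5 samples per class
labels = ["a", "b", "c"]
matrix = [
    [5, 0, 0],
    [1, 3, 1],
    [0, 1, 4],
]
report = per_class_metrics(matrix, labels)
accuracy, macro_f1, weighted_f1 = summarize(matrix, labels)
```

When every label has the same support, as in this toy matrix, macro F1 and weighted F1 coincide; they diverge only when class sizes differ.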
Find low-confidence predictions to prioritize for manual review (active learning):

```python
from audio_classifier import FeatureExtractor

# unlabeled_files: a list of paths to audio files that have no annotations yet
extractor = FeatureExtractor(sr=22050)
unlabeled_features = [extractor.from_file(p) for p in unlabeled_files]
uncertain = pipeline.uncertain_samples(unlabeled_features, threshold=0.6)
for idx, labels, scores in uncertain:
    top = max(scores, key=scores.__getitem__)  # label with the highest score
    print(f"File {idx}: predicted={labels[0]!r} confidence={scores[top]:.3f} → review me")
```

The same workflow is available from the command line:

```bash
# Train from annotation JSON files
python -m audio_classifier train \
    annotations/train_*.json \
    --model model.pkl \
    --sr 22050

# Classify audio files
python -m audio_classifier classify audio/*.wav \
    --model model.pkl

# Show only uncertain predictions
python -m audio_classifier classify audio/*.wav \
    --model model.pkl \
    --uncertain-only

# Output as JSON
python -m audio_classifier classify audio/recording.wav \
    --model model.pkl \
    --json > results.json

# Evaluate on test annotations
python -m audio_classifier evaluate \
    annotations/test_*.json \
    --model model.pkl
```

Annotation files use the following JSON format:

```json
{
  "file_path": "recordings/session_01.wav",
  "sample_rate": 22050,
  "annotations": [
    {
      "start": 0.0,
      "end": 2.5,
      "labels": ["silence"],
      "confidence": "high",
      "environment": "indoor",
      "notes": ""
    },
    {
      "start": 2.5,
      "end": 7.0,
      "labels": ["speech", "noise"],
      "confidence": "medium",
      "environment": "outdoor",
      "notes": "crowd noise in background"
    }
  ]
}
```

Run the end-to-end demo:

```bash
python example.py
```

The script generates synthetic audio (silence, tones, noise, engine, speech-like), then annotates, trains, classifies, evaluates, and identifies uncertain samples. No real audio is needed.
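
For readers curious what "synthetic audio" can look like, here is a small stdlib-only sketch that builds silence, a tone and white noise and writes them to a 16-bit mono WAV file. It does not reproduce `example.py`: the function names, amplitudes and the 440 Hz tone are assumptions made for illustration.

```python
import math
import os
import random
import struct
import tempfile
import wave


def sine_tone(freq, duration, sr=22050, amplitude=0.5):
    """A pure tone as a list of floats in [-1, 1]."""
    return [amplitude * math.sin(2 * math.pi * freq * n / sr)
            for n in range(int(duration * sr))]


def white_noise(duration, sr=22050, amplitude=0.3, seed=0):
    """Uniform white noise; seeded so the signal is reproducible."""
    rng = random.Random(seed)
    return [rng.uniform(-amplitude, amplitude) for _ in range(int(duration * sr))]


def silence(duration, sr=22050):
    """All-zero samples."""
    return [0.0] * int(duration * sr)


def write_wav(path, samples, sr=22050):
    """Write float samples as mono 16-bit PCM, clipping to [-1, 1]."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    pcm = struct.pack(f"<{len(ints)}h", *ints)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav.setframerate(sr)
        wav.writeframes(pcm)


# Half a second each of silence, a 440 Hz tone, and noise
clip = silence(0.5) + sine_tone(440.0, 0.5) + white_noise(0.5)

# Round-trip through a temporary file to confirm the WAV is well-formed
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "synthetic.wav")
    write_wav(path, clip)
    with wave.open(path, "rb") as wav:
        n_frames, frame_rate = wav.getnframes(), wav.getframerate()
```

Because the segment boundaries are known by construction, a clip like this can be annotated programmatically, which is what makes a demo without real recordings possible.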
Run the test suite:

```bash
pytest tests/ -v
```

Project layout:

```
audio_classifier/
├── features.py         # FeatureExtractor: time + frequency → AudioFeatures vector
├── annotator.py        # AudioAnnotation + AnnotationStore: annotation data model
├── classifier.py       # AudioClassifier: RandomForest, single/multi-label, save/load
├── pipeline.py         # AudioClassificationPipeline: fit, classify, evaluate, save/load
├── evaluate.py         # EvaluationReport, evaluate(), evaluate_pipeline()
└── __main__.py         # CLI (train / classify / evaluate)
tests/
├── conftest.py         # Synthetic signal generators (no real audio files)
├── test_features.py    # Feature extraction tests
├── test_annotator.py   # Annotation model tests
├── test_classifier.py  # Classifier training/prediction tests
├── test_pipeline.py    # End-to-end pipeline tests
└── test_evaluate.py    # Evaluation metric tests
```
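
The contents of `annotator.py` are not shown in this README. As a rough picture of what an annotation data model for the JSON format involves, the sketch below parses such a file with a dataclass and two basic validity checks. `SketchAnnotation`, `parse_annotation_file`, the empty-string defaults and the validation rules are all hypothetical and are not the package's `AudioAnnotation` or `AnnotationStore`.

```python
import json
from dataclasses import dataclass


@dataclass
class SketchAnnotation:
    """Hypothetical stand-in for one annotated time span."""
    start: float
    end: float
    labels: list
    confidence: str = ""   # defaults are assumptions, not the package's
    environment: str = ""
    notes: str = ""

    def __post_init__(self):
        if self.end <= self.start:
            raise ValueError(f"end ({self.end}) must be after start ({self.start})")
        if not self.labels:
            raise ValueError("an annotation needs at least one label")

    @property
    def duration(self):
        return self.end - self.start


def parse_annotation_file(text):
    """Parse annotation JSON text into (file_path, sample_rate, annotations)."""
    data = json.loads(text)
    annotations = [SketchAnnotation(**item) for item in data["annotations"]]
    return data["file_path"], data.get("sample_rate"), annotations


sample = """
{
  "file_path": "recordings/session_01.wav",
  "sample_rate": 22050,
  "annotations": [
    {"start": 0.0, "end": 2.5, "labels": ["silence"],
     "confidence": "high", "environment": "indoor", "notes": ""},
    {"start": 2.5, "end": 7.0, "labels": ["speech", "noise"],
     "confidence": "medium", "environment": "outdoor",
     "notes": "crowd noise in background"}
  ]
}
"""
file_path, sample_rate, annotations = parse_annotation_file(sample)
```

Validating spans at load time keeps malformed annotations from silently turning into mislabeled training segments.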