Audio Classification

Classify audio clips into categories using time-domain and frequency-domain features. Supports single-label and multi-label classification, segment-level annotation, file-based batch inference, model persistence, and active learning.

Install

pip install -e ".[dev]"

Or install the dependencies directly:

pip install -r requirements.txt

How it works

Each audio clip is converted into a 41-dimensional feature vector combining:

Domain       Features
-----------  ------------------------------------------------------------
Time         RMS energy (mean + std), zero-crossing rate (mean + std), amplitude max + mean
Frequency    Spectral centroid (mean + std), bandwidth, rolloff, contrast
Mel / MFCCs  128-band mel spectrogram summary, 13 MFCCs (mean + std each)
Chroma       Pitch-class profile mean + std
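
To make the "Time" row of the table concrete, here is a minimal standard-library sketch of how those six values could be computed. It is not the package's FeatureExtractor: the function names, the frame length of 512 and the hop of 256 are assumptions chosen for illustration.

```python
import math
from statistics import mean, pstdev

def frame_signal(samples, frame_len, hop):
    """Split a sequence of samples into overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def rms(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def time_domain_features(samples, frame_len=512, hop=256):
    """Hypothetical helper: RMS mean/std, ZCR mean/std, amplitude max/mean."""
    frames = frame_signal(samples, frame_len, hop)
    rms_vals = [rms(f) for f in frames]
    zcr_vals = [zero_crossing_rate(f) for f in frames]
    abs_vals = [abs(x) for x in samples]
    return [mean(rms_vals), pstdev(rms_vals),
            mean(zcr_vals), pstdev(zcr_vals),
            max(abs_vals), mean(abs_vals)]

# One second of a 440 Hz sine wave sampled at 22050 Hz
sr = 22050
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
features = time_domain_features(tone)
print(len(features))  # six time-domain values
```

For a pure tone the RMS mean should sit near 0.707 and the zero-crossing rate near 2 * 440 / 22050, which is a quick sanity check for any real extractor as well.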

A RandomForestClassifier is trained on these vectors. For multi-label audio (e.g. speech overlapping with noise), a OneVsRestClassifier wraps the forest so that each category gets its own binary decision.
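
The one-vs-rest idea can be illustrated without scikit-learn: a multi-label problem is split into one yes/no target per category. The sketch below only shows that decomposition. The helper name is hypothetical, and it does not train anything.

```python
def one_vs_rest_targets(label_sets, categories):
    """Return {category: [0 or 1 per sample]}, i.e. one binary problem per category."""
    return {cat: [1 if cat in labels else 0 for labels in label_sets]
            for cat in categories}

categories = ["silence", "speech", "noise"]
label_sets = [["silence"], ["speech"], ["speech", "noise"]]

targets = one_vs_rest_targets(label_sets, categories)
for cat in categories:
    print(cat, targets[cat])
```

The third sample counts as a positive for both "speech" and "noise", which is what lets a wrapped classifier predict overlapping labels.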

Quick start (Python API)

1 – Annotate

from audio_classifier import AnnotationStore, AudioAnnotation

store = AnnotationStore(file_path="recording.wav")
store.add(AudioAnnotation(start=0.0,  end=2.5,  labels=["silence"]))
store.add(AudioAnnotation(start=2.5,  end=7.0,  labels=["speech"]))
store.add(AudioAnnotation(start=6.0,  end=9.0,  labels=["speech", "noise"],  # overlap
                          confidence="medium", environment="outdoor"))
store.add(AudioAnnotation(start=9.0,  end=12.0, labels=["engine"]))
store.save("recording.json")
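
Because annotations may overlap, a given instant can carry more than one label. The sketch below shows one way to query that, using a hypothetical Segment dataclass as a stand-in; it is not the package's AudioAnnotation, and the half-open interval convention (start inclusive, end exclusive) is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    """Hypothetical stand-in for an annotation (not the package's class)."""
    start: float
    end: float
    labels: list = field(default_factory=list)

def labels_at(segments, t):
    """Return the sorted labels active at time t (start inclusive, end exclusive)."""
    active = set()
    for seg in segments:
        if seg.start <= t < seg.end:
            active.update(seg.labels)
    return sorted(active)

segments = [
    Segment(0.0, 2.5, ["silence"]),
    Segment(2.5, 7.0, ["speech"]),
    Segment(6.0, 9.0, ["speech", "noise"]),  # overlaps the previous segment
    Segment(9.0, 12.0, ["engine"]),
]

print(labels_at(segments, 1.0))
print(labels_at(segments, 6.5))
```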

2 – Train

from audio_classifier import AudioClassificationPipeline, AnnotationStore

stores = [AnnotationStore.load(p) for p in ["recording.json", "recording2.json"]]

pipeline = AudioClassificationPipeline(
    categories=["silence", "speech", "noise", "engine", "alarm"],
    segment_duration=2.0,   # classify in 2-second windows
    sr=22050,
    n_jobs=-1,              # use all CPU cores
)
pipeline.fit_from_annotations(stores)
pipeline.save("model.pkl")
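
Training on fixed windows requires turning variable-length annotations into per-window labels. How the pipeline does this internally is not documented here, so the following is only a sketch under an assumed rule: a window receives every label whose annotation covers at least half of it. The function name and the min_overlap parameter are hypothetical.

```python
def window_labels(annotations, total_duration, window=2.0, min_overlap=0.5):
    """Assign labels to fixed windows.

    annotations: list of (start, end, labels) tuples.
    A label is kept when its annotation overlaps the window by at least
    min_overlap * window seconds (an assumed rule, not the package's).
    """
    out = []
    n_windows = int(total_duration // window)
    for i in range(n_windows):
        w_start, w_end = i * window, (i + 1) * window
        labels = set()
        for start, end, labs in annotations:
            overlap = min(end, w_end) - max(start, w_start)
            if overlap >= min_overlap * window:
                labels.update(labs)
        out.append((w_start, w_end, sorted(labels)))
    return out

annotations = [
    (0.0, 2.5, ["silence"]),
    (2.5, 7.0, ["speech"]),
    (6.0, 9.0, ["speech", "noise"]),
    (9.0, 12.0, ["engine"]),
]

windows = window_labels(annotations, total_duration=12.0)
for w_start, w_end, labels in windows:
    print(f"{w_start:.1f}-{w_end:.1f}s  {labels}")
```

Under this rule the 6.0-8.0s window becomes multi-label, while the 2.0-4.0s window drops "silence" because only half a second of it falls inside.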

3 – Classify a new file

from audio_classifier import AudioClassificationPipeline

pipeline = AudioClassificationPipeline.load("model.pkl")

for result in pipeline.classify_file("new_recording.wav"):
    print(f"{result.start:.1f}–{result.end:.1f}s  "
          f"{result.labels}  "
          f"(confidence: {result.top_score:.2f})")

Example output:

0.0–2.0s   ['silence']  (confidence: 0.97)
2.0–4.0s   ['speech']   (confidence: 0.88)
4.0–6.0s   ['speech']   (confidence: 0.76)
6.0–8.0s   ['noise']    (confidence: 0.65)
8.0–10.0s  ['engine']   (confidence: 0.91)
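
Per-window results are often easier to read once consecutive windows with the same labels are merged into longer spans. This is an optional post-processing sketch on plain (start, end, labels) tuples; merge_adjacent is a hypothetical helper, not part of the package.

```python
def merge_adjacent(results):
    """Merge consecutive windows that touch and carry identical labels.

    results: list of (start, end, labels) tuples sorted by start time.
    """
    merged = []
    for start, end, labels in results:
        if merged and merged[-1][2] == labels and merged[-1][1] == start:
            # Extend the previous span instead of starting a new one
            merged[-1] = (merged[-1][0], end, labels)
        else:
            merged.append((start, end, labels))
    return merged

windows = [
    (0.0, 2.0, ["silence"]),
    (2.0, 4.0, ["speech"]),
    (4.0, 6.0, ["speech"]),
    (6.0, 8.0, ["noise"]),
    (8.0, 10.0, ["engine"]),
]

merged = merge_adjacent(windows)
for start, end, labels in merged:
    print(f"{start:.1f}-{end:.1f}s  {labels}")
```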

4 – Evaluate

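from audio_classifier import AudioClassificationPipeline, AnnotationStore

pipeline = AudioClassificationPipeline.load("model.pkl")  # model saved in step 2
test_stores = [AnnotationStore.load(p) for p in ["test1.json", "test2.json"]]
report = pipeline.evaluate(test_stores)
print(report.summary())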

Example output:

Accuracy : 0.923
Macro F1 : 0.918
Wt'd  F1 : 0.921

Label                  Prec    Rec     F1      n
------------------------------------------------
silence               1.000  1.000  1.000     10
speech                0.900  0.900  0.900     10
noise                 0.909  1.000  0.952     10
engine                0.900  0.900  0.900     10
alarm                 0.889  0.800  0.842     10

Confusion matrix (rows=actual, cols=predicted):
                     silence  speech   noise  engine   alarm
silence                   10       0       0       0       0
speech                     0       9       1       0       0
...
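
The precision, recall and F1 columns follow directly from the confusion matrix. The sketch below shows the arithmetic on a two-class toy matrix (illustrative numbers, not results from this project); the function names are hypothetical and independent of the package's evaluate module.

```python
def per_class_metrics(labels, matrix):
    """Compute (precision, recall, f1) per label.

    matrix[i][j] = number of samples with actual labels[i] predicted as labels[j].
    """
    metrics = {}
    for k, name in enumerate(labels):
        tp = matrix[k][k]
        predicted = sum(row[k] for row in matrix)  # column sum
        actual = sum(matrix[k])                    # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[name] = (precision, recall, f1)
    return metrics

def accuracy(matrix):
    """Share of samples on the diagonal."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total

labels = ["speech", "noise"]
toy = [[9, 1],    # actual speech: 9 correct, 1 predicted as noise
       [0, 10]]   # actual noise: all 10 correct

metrics = per_class_metrics(labels, toy)
macro_f1 = sum(f1 for _, _, f1 in metrics.values()) / len(metrics)
for name, (p, r, f1) in metrics.items():
    print(f"{name:8s} {p:.3f} {r:.3f} {f1:.3f}")
print(f"accuracy={accuracy(toy):.3f} macro_f1={macro_f1:.3f}")
```

Macro F1 averages the per-class F1 values equally, while weighted F1 weights each class by its sample count n, so the two coincide when every class has the same n.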

5 – Active learning (prioritise uncertain samples)

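from audio_classifier import FeatureExtractor

# Paths to clips that have no annotations yet (placeholder names);
# `pipeline` is the trained pipeline loaded in step 3.
unlabeled_files = ["clip_01.wav", "clip_02.wav"]

extractor = FeatureExtractor(sr=22050)
unlabeled_features = [extractor.from_file(p) for p in unlabeled_files]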

uncertain = pipeline.uncertain_samples(unlabeled_features, threshold=0.6)
for idx, labels, scores in uncertain:
    top = max(scores, key=scores.get)
    print(f"File {idx}: predicted={labels[0]!r}  confidence={scores[top]:.3f}  → review me")
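
The selection step can be sketched on plain score dictionaries. This assumes that `threshold` means "flag a sample when its best score falls below this value" (least-confidence sampling); the README does not spell out the exact rule, so treat the function below as a hypothetical illustration and not as the behaviour of uncertain_samples.

```python
def uncertain_indices(score_dicts, threshold=0.6):
    """Return (index, top_label, top_score) for low-confidence samples.

    A sample is flagged when its highest class score is below threshold
    (assumed rule). Results are sorted with the least confident first.
    """
    flagged = []
    for idx, scores in enumerate(score_dicts):
        top_label = max(scores, key=scores.get)
        if scores[top_label] < threshold:
            flagged.append((idx, top_label, scores[top_label]))
    return sorted(flagged, key=lambda item: item[2])

# Illustrative scores for three unlabeled clips
scores = [
    {"speech": 0.91, "noise": 0.09},
    {"speech": 0.55, "noise": 0.45},
    {"speech": 0.48, "noise": 0.52},
]

to_review = uncertain_indices(scores, threshold=0.6)
for idx, label, score in to_review:
    print(f"File {idx}: predicted={label!r} confidence={score:.2f}")
```

Sorting by score puts the clips most worth a human look at the top of the review queue.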

Command-line interface

# Train from annotation JSON files
python -m audio_classifier train \
    annotations/train_*.json \
    --model model.pkl \
    --sr 22050

# Classify audio files
python -m audio_classifier classify audio/*.wav \
    --model model.pkl

# Show only uncertain predictions
python -m audio_classifier classify audio/*.wav \
    --model model.pkl \
    --uncertain-only

# Output as JSON
python -m audio_classifier classify audio/recording.wav \
    --model model.pkl \
    --json > results.json

# Evaluate on test annotations
python -m audio_classifier evaluate \
    annotations/test_*.json \
    --model model.pkl
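
For readers who want to extend the CLI, the commands above map naturally onto argparse subcommands. The sketch below mirrors only the flags shown in this section; whether --model is mandatory and what --sr defaults to are assumptions, and the real __main__.py may be organised differently.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the train / classify / evaluate commands."""
    parser = argparse.ArgumentParser(prog="audio_classifier")
    sub = parser.add_subparsers(dest="command", required=True)

    train = sub.add_parser("train", help="train from annotation JSON files")
    train.add_argument("annotations", nargs="+")
    train.add_argument("--model", required=True)          # assumed mandatory
    train.add_argument("--sr", type=int, default=22050)   # assumed default

    classify = sub.add_parser("classify", help="classify audio files")
    classify.add_argument("audio", nargs="+")
    classify.add_argument("--model", required=True)
    classify.add_argument("--uncertain-only", action="store_true")
    classify.add_argument("--json", action="store_true")

    evaluate = sub.add_parser("evaluate", help="evaluate on test annotations")
    evaluate.add_argument("annotations", nargs="+")
    evaluate.add_argument("--model", required=True)
    return parser

args = build_parser().parse_args(
    ["classify", "a.wav", "b.wav", "--model", "model.pkl", "--uncertain-only"]
)
print(args.command, args.audio, args.uncertain_only, args.json)
```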

Annotation format (JSON)

{
  "file_path": "recordings/session_01.wav",
  "sample_rate": 22050,
  "annotations": [
    {
      "start": 0.0,
      "end": 2.5,
      "labels": ["silence"],
      "confidence": "high",
      "environment": "indoor",
      "notes": ""
    },
    {
      "start": 2.5,
      "end": 7.0,
      "labels": ["speech", "noise"],
      "confidence": "medium",
      "environment": "outdoor",
      "notes": "crowd noise in background"
    }
  ]
}
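
Since the format is plain JSON, annotation files can be inspected without the package. The sketch below parses a trimmed copy of the example above with the standard json module and totals the annotated seconds per label; label_durations is a hypothetical helper.

```python
import json
from collections import defaultdict

def label_durations(doc):
    """Sum annotated seconds per label, rejecting empty or reversed spans."""
    totals = defaultdict(float)
    for ann in doc["annotations"]:
        if ann["end"] <= ann["start"]:
            raise ValueError(f"annotation has no positive duration: {ann}")
        for label in ann["labels"]:
            totals[label] += ann["end"] - ann["start"]
    return dict(totals)

raw = """
{
  "file_path": "recordings/session_01.wav",
  "sample_rate": 22050,
  "annotations": [
    {"start": 0.0, "end": 2.5, "labels": ["silence"],
     "confidence": "high", "environment": "indoor", "notes": ""},
    {"start": 2.5, "end": 7.0, "labels": ["speech", "noise"],
     "confidence": "medium", "environment": "outdoor",
     "notes": "crowd noise in background"}
  ]
}
"""

doc = json.loads(raw)
durations = label_durations(doc)
print(durations)
```

A multi-label annotation contributes its full duration to each of its labels, so the per-label totals can add up to more than the length of the recording.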

Run the demo

python example.py

The demo generates synthetic audio (silence, tones, noise, engine, speech-like), then annotates, trains, classifies, evaluates, and identifies uncertain samples. No real audio is needed.

Run tests

pytest tests/ -v

Project layout

audio_classifier/
├── features.py     # FeatureExtractor  — time + frequency → AudioFeatures vector
├── annotator.py    # AudioAnnotation + AnnotationStore  — annotation data model
├── classifier.py   # AudioClassifier  — RandomForest, single/multi-label, save/load
├── pipeline.py     # AudioClassificationPipeline  — fit, classify, evaluate, save/load
├── evaluate.py     # EvaluationReport, evaluate(), evaluate_pipeline()
└── __main__.py     # CLI (train / classify / evaluate)

tests/
├── conftest.py          # Synthetic signal generators (no real audio files)
├── test_features.py     # Feature extraction tests
├── test_annotator.py    # Annotation model tests
├── test_classifier.py   # Classifier training/prediction tests
├── test_pipeline.py     # End-to-end pipeline tests
└── test_evaluate.py     # Evaluation metric tests
