anote-ai/audio-classification

Audio Classification

Classify audio clips into categories using time-domain and frequency-domain features. Supports single-label and multi-label classification, segment-level annotation, file-based batch inference, model persistence, and active learning.

Install

pip install -e ".[dev]"

Or install only the dependencies:

pip install -r requirements.txt

How it works

Each audio clip is converted into a 41-dimensional feature vector combining:

Domain        Features
Time          RMS energy (mean + std), zero-crossing rate (mean + std), amplitude max + mean
Frequency     Spectral centroid (mean + std), bandwidth, rolloff, contrast
Mel / MFCCs   128-band mel spectrogram summary, 13 MFCCs (mean + std each)
Chroma        Pitch-class profile mean + std
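
To make the time-domain row concrete, here is a minimal pure-Python sketch of how those six values could be computed. It is not the package's FeatureExtractor: the function names, the frame length (2048), the hop length (512), and the choice to take the amplitude statistics over absolute sample values are all assumptions made for illustration.

```python
import math

def frame_signal(samples, frame_length=2048, hop_length=512):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_length]
            for i in range(0, len(samples) - frame_length + 1, hop_length)]

def mean_std(values):
    """Return the mean and population standard deviation of a list."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)

def rms(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def time_domain_features(samples, frame_length=2048, hop_length=512):
    """Return [rms_mean, rms_std, zcr_mean, zcr_std, amp_max, amp_mean]."""
    frames = frame_signal(samples, frame_length, hop_length)
    rms_mean, rms_std = mean_std([rms(f) for f in frames])
    zcr_mean, zcr_std = mean_std([zero_crossing_rate(f) for f in frames])
    magnitudes = [abs(x) for x in samples]  # assumption: stats over |x|
    return [rms_mean, rms_std, zcr_mean, zcr_std,
            max(magnitudes), sum(magnitudes) / len(magnitudes)]

# One second of a 440 Hz sine at the sample rate used elsewhere in this README
sr = 22050
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
features = time_domain_features(tone)
```

For a pure tone the RMS mean should sit near amplitude / sqrt(2) and the zero-crossing rate near 2 * frequency / sr, which is a quick sanity check when comparing against any real extractor.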

A RandomForestClassifier is trained on these vectors. For multi-label audio (e.g. speech overlapping with noise), a OneVsRestClassifier wraps the forest so that each category gets its own binary decision.
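
A one-vs-rest setup needs the label sets turned into one binary column per category. The sketch below shows that encoding in plain Python; `to_multi_hot` is a hypothetical helper, and the package may build its targets differently (for example with scikit-learn's MultiLabelBinarizer).

```python
def to_multi_hot(label_sets, categories):
    """Encode each label set as a 0/1 row with one column per category."""
    index = {name: i for i, name in enumerate(categories)}
    rows = []
    for labels in label_sets:
        row = [0] * len(categories)
        for label in labels:
            row[index[label]] = 1  # raises KeyError for an unknown label
        rows.append(row)
    return rows

categories = ["silence", "speech", "noise", "engine", "alarm"]
targets = to_multi_hot([["silence"], ["speech", "noise"], ["engine"]], categories)
# Each column of `targets` is the training target for one binary classifier.
```

A segment labelled both "speech" and "noise" therefore counts as a positive example for two of the five binary problems at once.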


Quick start (Python API)

1 – Annotate

from audio_classifier import AnnotationStore, AudioAnnotation

store = AnnotationStore(file_path="recording.wav")
store.add(AudioAnnotation(start=0.0,  end=2.5,  labels=["silence"]))
store.add(AudioAnnotation(start=2.5,  end=7.0,  labels=["speech"]))
store.add(AudioAnnotation(start=6.0,  end=9.0,  labels=["speech", "noise"],  # overlap
                          confidence="medium", environment="outdoor"))
store.add(AudioAnnotation(start=9.0,  end=12.0, labels=["engine"]))
store.save("recording.json")

2 – Train

from audio_classifier import AudioClassificationPipeline, AnnotationStore

stores = [AnnotationStore.load(p) for p in ["recording.json", "recording2.json"]]

pipeline = AudioClassificationPipeline(
    categories=["silence", "speech", "noise", "engine", "alarm"],
    segment_duration=2.0,   # classify in 2-second windows
    sr=22050,
    n_jobs=-1,              # use all CPU cores
)
pipeline.fit_from_annotations(stores)
pipeline.save("model.pkl")
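
Training on fixed 2-second windows means the variable-length annotations from step 1 have to be mapped onto windows somehow. How fit_from_annotations does this is not documented here; the sketch below shows one plausible rule, where a window inherits every label whose annotation covers at least half of it. The function name and the `min_overlap` parameter are hypothetical.

```python
def window_labels(annotations, total_duration, segment_duration=2.0, min_overlap=0.5):
    """Assign labels to fixed windows.

    annotations: list of (start, end, labels) tuples, times in seconds.
    A window takes an annotation's labels when the overlap is at least
    min_overlap * segment_duration seconds (an assumed rule).
    """
    n_windows = int(total_duration // segment_duration)
    windows = []
    for i in range(n_windows):
        win_start = i * segment_duration
        win_end = win_start + segment_duration
        labels = set()
        for a_start, a_end, a_labels in annotations:
            overlap = min(win_end, a_end) - max(win_start, a_start)
            if overlap >= min_overlap * segment_duration:
                labels.update(a_labels)
        windows.append((win_start, win_end, sorted(labels)))
    return windows

# The four annotations from step 1, on a 12-second recording
annotations = [
    (0.0, 2.5, ["silence"]),
    (2.5, 7.0, ["speech"]),
    (6.0, 9.0, ["speech", "noise"]),
    (9.0, 12.0, ["engine"]),
]
windows = window_labels(annotations, total_duration=12.0)
for win_start, win_end, labels in windows:
    print(f"{win_start:.1f}-{win_end:.1f}s  {labels}")
```

Under this rule the 6.0-8.0 s window picks up both "speech" and "noise", which is the kind of multi-label window the one-vs-rest wrapper is meant for.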

3 – Classify a new file

from audio_classifier import AudioClassificationPipeline

pipeline = AudioClassificationPipeline.load("model.pkl")

for result in pipeline.classify_file("new_recording.wav"):
    print(f"{result.start:.1f}–{result.end:.1f}s  "
          f"{result.labels}  "
          f"(confidence: {result.top_score:.2f})")

Example output:
0.0–2.0s   ['silence']  (confidence: 0.97)
2.0–4.0s   ['speech']   (confidence: 0.88)
4.0–6.0s   ['speech']   (confidence: 0.76)
6.0–8.0s   ['noise']    (confidence: 0.65)
8.0–10.0s  ['engine']   (confidence: 0.91)
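
Window-level results like these are often easier to read once adjacent windows with the same label are merged into longer segments. The helper below is a hypothetical post-processing sketch, not part of the package; it works on plain (start, end, label) tuples rather than the pipeline's result objects.

```python
def merge_windows(results):
    """Merge consecutive (start, end, label) windows that share a label."""
    merged = []
    for start, end, label in results:
        if merged and merged[-1][2] == label and merged[-1][1] == start:
            # Same label and contiguous in time: extend the previous segment
            merged[-1] = (merged[-1][0], end, label)
        else:
            merged.append((start, end, label))
    return merged

windows = [
    (0.0, 2.0, "silence"),
    (2.0, 4.0, "speech"),
    (4.0, 6.0, "speech"),
    (6.0, 8.0, "noise"),
    (8.0, 10.0, "engine"),
]
segments = merge_windows(windows)
for start, end, label in segments:
    print(f"{start:.1f}-{end:.1f}s  {label}")
```

Here the two speech windows collapse into a single 2.0-6.0 s segment, leaving four segments in total.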

4 – Evaluate

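from audio_classifier import AnnotationStore, AudioClassificationPipeline

# Load the model saved in step 2 and the held-out test annotations
pipeline = AudioClassificationPipeline.load("model.pkl")
test_stores = [AnnotationStore.load(p) for p in ["test1.json", "test2.json"]]
report = pipeline.evaluate(test_stores)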
print(report.summary())

Example output:
Accuracy : 0.920
Macro F1 : 0.919
Wt'd  F1 : 0.919

Label                  Prec    Rec     F1      n
------------------------------------------------
silence               1.000  1.000  1.000     10
speech                0.900  0.900  0.900     10
noise                 0.909  1.000  0.952     10
engine                0.900  0.900  0.900     10
alarm                 0.889  0.800  0.842     10

Confusion matrix (rows=actual, cols=predicted):
                     silence  speech   noise  engine   alarm
silence                   10       0       0       0       0
speech                     0       9       1       0       0
...
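
For readers who want to check a report by hand, the following sketch derives per-class precision, recall, and F1 plus the headline scores from a confusion matrix with rows as actual and columns as predicted. It is a generic illustration, not the package's evaluate module, and the 3-class matrix is made-up example data rather than output from this project.

```python
def per_class_metrics(matrix, labels):
    """Return {label: (precision, recall, f1, support)} from a confusion matrix."""
    metrics = {}
    for i, label in enumerate(labels):
        true_pos = matrix[i][i]
        support = sum(matrix[i])                   # row sum: actual count
        predicted = sum(row[i] for row in matrix)  # column sum: predicted count
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f1, support)
    return metrics

def summarize(matrix, labels):
    """Return (accuracy, macro_f1, weighted_f1, per_class)."""
    per_class = per_class_metrics(matrix, labels)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[i][i] for i in range(len(labels))) / total
    macro_f1 = sum(m[2] for m in per_class.values()) / len(labels)
    weighted_f1 = sum(m[2] * m[3] for m in per_class.values()) / total
    return accuracy, macro_f1, weighted_f1, per_class

# Hypothetical 3-class example (rows = actual, cols = predicted)
labels = ["silence", "speech", "noise"]
matrix = [
    [10, 0, 0],
    [0, 9, 1],
    [0, 2, 8],
]
accuracy, macro_f1, weighted_f1, per_class = summarize(matrix, labels)
print(f"Accuracy : {accuracy:.3f}")
print(f"Macro F1 : {macro_f1:.3f}")
print(f"Wt'd  F1 : {weighted_f1:.3f}")
```

When every class has the same number of test samples, macro and weighted F1 coincide, and accuracy equals the support-weighted mean of the per-class recalls.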

5 – Active learning (prioritise uncertain samples)

from audio_classifier import FeatureExtractor

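# unlabeled_files: paths to audio clips that have no annotations yet
# pipeline: a trained AudioClassificationPipeline (see steps 2 and 3)
extractor = FeatureExtractor(sr=22050)
unlabeled_features = [extractor.from_file(p) for p in unlabeled_files]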

uncertain = pipeline.uncertain_samples(unlabeled_features, threshold=0.6)
for idx, labels, scores in uncertain:
    top = max(scores, key=scores.__getitem__)
    print(f"File {idx}: predicted={labels[0]!r}  confidence={scores[top]:.3f}  → review me")
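
The idea behind the threshold can be sketched without the package: a prediction counts as uncertain when its highest class score falls below the threshold, and the least confident items are reviewed first. This is an assumed reading of what uncertain_samples does; `select_uncertain` and the score dictionaries below are illustrative only.

```python
def select_uncertain(score_dicts, threshold=0.6):
    """Return (index, top_label, top_score) for low-confidence predictions.

    score_dicts: one {label: probability} dict per sample.
    Results are sorted so the least confident sample comes first.
    """
    uncertain = []
    for idx, scores in enumerate(score_dicts):
        top_label = max(scores, key=scores.get)
        if scores[top_label] < threshold:
            uncertain.append((idx, top_label, scores[top_label]))
    uncertain.sort(key=lambda item: item[2])
    return uncertain

# Made-up scores for three clips
predictions = [
    {"speech": 0.92, "noise": 0.08},
    {"speech": 0.55, "noise": 0.45},
    {"engine": 0.40, "alarm": 0.35, "noise": 0.25},
]
review_queue = select_uncertain(predictions)
for idx, label, score in review_queue:
    print(f"File {idx}: predicted={label!r}  confidence={score:.2f}")
```

Labelling the clips at the front of this queue first tends to give the model the most new information per annotation, which is the point of active learning.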

Command-line interface

# Train from annotation JSON files
python -m audio_classifier train \
    annotations/train_*.json \
    --model model.pkl \
    --sr 22050

# Classify audio files
python -m audio_classifier classify audio/*.wav \
    --model model.pkl

# Show only uncertain predictions
python -m audio_classifier classify audio/*.wav \
    --model model.pkl \
    --uncertain-only

# Output as JSON
python -m audio_classifier classify audio/recording.wav \
    --model model.pkl \
    --json > results.json

# Evaluate on test annotations
python -m audio_classifier evaluate \
    annotations/test_*.json \
    --model model.pkl

Annotation format (JSON)

{
  "file_path": "recordings/session_01.wav",
  "sample_rate": 22050,
  "annotations": [
    {
      "start": 0.0,
      "end": 2.5,
      "labels": ["silence"],
      "confidence": "high",
      "environment": "indoor",
      "notes": ""
    },
    {
      "start": 2.5,
      "end": 7.0,
      "labels": ["speech", "noise"],
      "confidence": "medium",
      "environment": "outdoor",
      "notes": "crowd noise in background"
    }
  ]
}
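
Hand-written annotation files are easy to get subtly wrong, so a small check before training can save a failed run. The validator below is a standalone sketch using only the standard library. Treating "start", "end", and "labels" as the required keys is an assumption based on the example above, and AnnotationStore.load may enforce different rules.

```python
import json

REQUIRED_KEYS = {"start", "end", "labels"}  # assumed minimum per annotation

def validate_annotations(doc):
    """Return a list of human-readable problems found in an annotation document."""
    problems = []
    for i, ann in enumerate(doc.get("annotations", [])):
        missing = REQUIRED_KEYS - ann.keys()
        if missing:
            problems.append(f"annotation {i}: missing {sorted(missing)}")
            continue
        if ann["end"] <= ann["start"]:
            problems.append(f"annotation {i}: end must be greater than start")
        if not ann["labels"]:
            problems.append(f"annotation {i}: labels must not be empty")
    return problems

# A document with one valid and two deliberately broken annotations
raw = """
{
  "file_path": "recordings/session_01.wav",
  "sample_rate": 22050,
  "annotations": [
    {"start": 0.0, "end": 2.5, "labels": ["silence"]},
    {"start": 7.0, "end": 2.5, "labels": ["speech"]},
    {"start": 2.5, "end": 7.0}
  ]
}
"""
doc = json.loads(raw)
problems = validate_annotations(doc)
for problem in problems:
    print(problem)
```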

Run the demo

python example.py

The demo generates synthetic audio (silence, tones, noise, engine, speech-like), then annotates, trains, classifies, evaluates, and identifies uncertain samples. No real audio is needed.


Run tests

pytest tests/ -v

Project layout

audio_classifier/
├── features.py     # FeatureExtractor  — time + frequency → AudioFeatures vector
├── annotator.py    # AudioAnnotation + AnnotationStore  — annotation data model
├── classifier.py   # AudioClassifier  — RandomForest, single/multi-label, save/load
├── pipeline.py     # AudioClassificationPipeline  — fit, classify, evaluate, save/load
├── evaluate.py     # EvaluationReport, evaluate(), evaluate_pipeline()
└── __main__.py     # CLI (train / classify / evaluate)

tests/
├── conftest.py          # Synthetic signal generators (no real audio files)
├── test_features.py     # Feature extraction tests
├── test_annotator.py    # Annotation model tests
├── test_classifier.py   # Classifier training/prediction tests
├── test_pipeline.py     # End-to-end pipeline tests
└── test_evaluate.py     # Evaluation metric tests
