
AshuJoshi/Auralis


Project Auralis

Auralis is a speaker identification system that uses voice biometrics to identify speakers in audio files. It is particularly well-suited for scenarios where speakers are known and recur, such as earnings calls, meetings, or podcasts.

The system works by generating a unique "voiceprint" (a speaker embedding) for each person and storing it in a reference database. When given a new audio clip, Auralis compares the voice in the clip to the database to find a match.

This project is fully containerized using Docker, so it can be set up and run on any system that has Docker installed (see Compatibility for what has actually been tested).

How it Works

The core of Auralis is a pre-trained deep learning model, speechbrain/spkrec-ecapa-voxceleb (an ECAPA-TDNN speaker model trained on VoxCeleb). It extracts the characteristics that distinguish one person's voice from another. The process is as follows:

  1. Audio Processing: Raw audio files are sliced into smaller, labeled clips for each speaker.
  2. Embedding Generation: A speaker embedding (a vector of numbers) is generated for each clip. These embeddings, along with an average embedding for each speaker, are stored in a JSON database.
  3. Speaker Matching: To identify a speaker in a new audio clip, an embedding is generated for the clip and compared against the average embeddings in the database using cosine similarity. The speaker with the highest similarity score is identified as the match.
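Steps 2 and 3 can be sketched in plain Python. This is a minimal illustration and not the code in src/generate_embeddings.py or src/test_matching.py. The toy 3-element vectors stand in for real model embeddings, and the JSON layout ({speaker: {"embeddings": [...], "average": [...]}}) is a hypothetical schema.

```python
import json
import math
from typing import Dict, List, Tuple

Vector = List[float]


def mean_embedding(embeddings: List[Vector]) -> Vector:
    """Element-wise average of one speaker's clip embeddings."""
    if not embeddings:
        raise ValueError("a speaker needs at least one embedding")
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def build_database(clips: Dict[str, List[Vector]]) -> Dict[str, dict]:
    """Hypothetical schema: per-clip embeddings plus their average, per speaker."""
    return {
        speaker: {"embeddings": embs, "average": mean_embedding(embs)}
        for speaker, embs in clips.items()
    }


def identify(query: Vector, db: Dict[str, dict]) -> Tuple[str, float]:
    """Return the speaker whose average embedding is most similar to the query."""
    scores = {spk: cosine_similarity(query, entry["average"]) for spk, entry in db.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real ones come from the speaker model.
    db = build_database({
        "alice": [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0]],
        "bob": [[0.0, 1.0, 0.2], [0.1, 0.9, 0.1]],
    })
    # Round-trip through JSON, as the reference database would be stored on disk.
    db = json.loads(json.dumps(db))
    print(identify([0.95, 0.05, 0.05], db))
```

Because the highest-scoring speaker is always returned, a voice that is not in the database will still be matched to its closest enrolled speaker. The returned score is what indicates how close that match really is.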

Getting Started

Prerequisites

  • Docker installed and running on your system.
  • A Hugging Face account and an access token.

1. Hugging Face Authentication

This project requires downloading a pre-trained model from the Hugging Face Hub. The model used is speechbrain/spkrec-ecapa-voxceleb, which is a gated repository.

  1. Create a Hugging Face Account: If you don't have one, create an account at huggingface.co.
  2. Accept the Model's Terms: Visit the model's page at https://huggingface.co/speechbrain/spkrec-ecapa-voxceleb and accept the license agreement.
  3. Generate an Access Token: In your Hugging Face account settings, create an access token with "read" permissions.

This token will be passed to the Docker container as an environment variable.
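Inside the container, code that needs the token can read it from the environment. The sketch below assumes the variable is named HF_TOKEN, the name the huggingface_hub library checks by default. The variable Auralis actually expects may differ, so check EXAMPLES.md. read_hf_token is a hypothetical helper, not part of the project.

```python
import os
from typing import Mapping, Optional

# Assumption: HF_TOKEN is an illustrative name (it is the variable huggingface_hub
# reads by default); Auralis may expect a different one.
TOKEN_VAR = "HF_TOKEN"


def read_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: fetch the Hugging Face token or fail with a clear hint."""
    source = os.environ if env is None else env
    token = source.get(TOKEN_VAR, "").strip()
    if not token:
        raise RuntimeError(
            f"{TOKEN_VAR} is not set; start the container with "
            f"`docker run -e {TOKEN_VAR}=<your token> ...`"
        )
    return token
```

Failing early with an explicit message is easier to debug than an authentication error raised later, partway through the model download.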

2. Build the Docker Image

You can build the Docker image with or without GPU support.

For CPU:

docker build -t auralis-cpu -f docker/Dockerfile.cpu .

For GPU:

docker build -t auralis-gpu -f docker/Dockerfile.gpu .

3. Usage

For a complete, step-by-step walkthrough on how to process audio, generate embeddings, and test speaker matching, please refer to the EXAMPLES.md file.

This guide will walk you through the entire workflow, from raw audio to speaker identification, with copy-paste-friendly commands.

Compatibility

This project has been tested on an Intel-based Mac (macOS). The auralis-cpu Docker image and all scripts have been confirmed to work in this environment.

The Dockerfile.gpu is provided for users with NVIDIA GPUs, but it has not been tested.

Project Structure

Auralis/
├── docker/               # Dockerfiles for CPU and GPU environments
│   ├── Dockerfile.cpu
│   ├── Dockerfile.gpu
│   └── requirements.txt
├── src/                  # Python source code
│   ├── process_audio.py
│   ├── generate_embeddings.py
│   └── test_matching.py
├── data/                 # Data directory (ignored by git)
│   ├── raw_audio/        # Place your raw audio files here
│   ├── processed_audio/  # Processed clips will be saved here
│   └── test_audio/       # Place audio files for testing here
├── .gitignore
├── README.md             # This file
├── EXAMPLES.md           # Step-by-step usage examples
└── LICENSE               # MIT License
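The first step, turning recordings in data/raw_audio/ into labeled clips in data/processed_audio/, can be approximated with Python's standard-library wave module. This sketch rests on several assumptions. The input must be uncompressed PCM WAV. You supply the speaker labels and start/end times yourself. The output layout <out_dir>/<speaker>/<stem>_<n>.wav is hypothetical, and src/process_audio.py may work differently.

```python
import wave
from pathlib import Path
from typing import Iterable, List, Tuple


def slice_wav(src: Path, dst: Path, start_s: float, end_s: float) -> None:
    """Copy the [start_s, end_s) segment of a PCM WAV file into dst."""
    with wave.open(str(src), "rb") as reader:
        rate = reader.getframerate()
        start = max(0, int(start_s * rate))
        end = min(reader.getnframes(), int(end_s * rate))
        if end <= start:
            raise ValueError(f"empty segment {start_s}-{end_s}s in {src}")
        reader.setpos(start)
        frames = reader.readframes(end - start)
        params = reader.getparams()
    dst.parent.mkdir(parents=True, exist_ok=True)
    with wave.open(str(dst), "wb") as writer:
        writer.setparams(params)  # same channels, sample width, and sample rate
        writer.writeframes(frames)  # header frame count is patched to match what was written


def slice_by_segments(
    src: Path, out_dir: Path, segments: Iterable[Tuple[str, float, float]]
) -> List[Path]:
    """Write one labeled clip per (speaker, start_s, end_s) segment.

    Hypothetical layout: out_dir/<speaker>/<src stem>_<index>.wav
    """
    written = []
    for i, (speaker, start_s, end_s) in enumerate(segments):
        dst = out_dir / speaker / f"{src.stem}_{i:03d}.wav"
        slice_wav(src, dst, start_s, end_s)
        written.append(dst)
    return written
```

For example, slice_by_segments(Path("data/raw_audio/call.wav"), Path("data/processed_audio"), [("alice", 0.0, 12.5), ("bob", 12.5, 30.0)]) would write one clip under alice/ and one under bob/. Grouping clips by speaker folder keeps the label next to the audio for the embedding step.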

Next Steps

This proof-of-value release covers the core workflow: audio slicing, embedding generation, and speaker matching. Future enhancements could include:

  • A more robust database for storing embeddings, in place of the current JSON file.
  • A user interface for easier interaction.
  • Real-time transcription and speaker identification.

License

This project is licensed under the MIT License. See the LICENSE file for details.
