dinalmeecle/Whisper-AI-Model

Whisper AI Model Setup and Transcription Guide

This document provides a comprehensive guide to setting up and using the Whisper AI model for transcribing audio files. Follow these steps to install all necessary components, set up your environment, and transcribe audio files with ease.


Step 1: Setting Up Your Environment

1.1 Install Prerequisites

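Make sure Python (with pip) and FFmpeg are installed on your computer. Python runs every command in Steps 2 to 5, and Whisper relies on FFmpeg to read audio files.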

How to Install FFmpeg

  1. Download FFmpeg from the official website: https://ffmpeg.org/download.html.

  2. Select "Windows builds from gyan.dev" under the Windows section, and download the ffmpeg-git-full.7z file.

  3. Extract the contents of the downloaded .7z file using 7-Zip or WinRAR.

  4. Copy the extracted bin folder (which contains ffmpeg.exe, ffplay.exe, and ffprobe.exe) into C:\ffmpeg, so that the executables end up in C:\ffmpeg\bin.

  5. Add FFmpeg to your system PATH:

    • Search for "environment variables" in the Windows Start menu.
    • Select "Edit the system environment variables" and click "Environment Variables."
    • Find the "Path" variable under "System variables," select it, and click "Edit."
    • Click "New" and add C:\ffmpeg\bin. Click "OK" to apply the changes.
  6. Verify the installation by opening a new Command Prompt (so that it picks up the updated PATH) and running:

    ffmpeg -version
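
The same check can be scripted, which is useful because Whisper calls the ffmpeg executable to decode audio and fails if it cannot be found on PATH. The following is a minimal sketch using only the Python standard library; ffmpeg_version is a hypothetical helper name, not part of FFmpeg or Whisper.

```python
import shutil
import subprocess


def ffmpeg_version(executable="ffmpeg"):
    """Return the first line of `<executable> -version`, or None if it is not on PATH."""
    path = shutil.which(executable)  # looks the name up on PATH, like the shell does
    if path is None:
        return None
    completed = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = completed.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    version = ffmpeg_version()
    if version is None:
        print("ffmpeg was not found on PATH; recheck the PATH entry from step 5.")
    else:
        print(version)
```

If the function returns None after you have edited PATH, close and reopen the terminal before trying again.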
    

1.2 Create a Project Directory

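Open Command Prompt or PowerShell, then create your project folder and navigate to it (replace YourUserName with your Windows user name):

mkdir "C:\Users\YourUserName\Desktop\Whisper AI Model\Whisper"
cd "C:\Users\YourUserName\Desktop\Whisper AI Model\Whisper"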

Step 2: Set Up a Python Virtual Environment

2.1 Create a virtual environment (this isolates your project's dependencies):

python -m venv whisper-env

2.2 Activate the virtual environment:

whisper-env\Scripts\activate

You should now see (whisper-env) in your terminal, indicating that the environment is active.
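
If the prompt prefix is hidden or you are unsure which interpreter is running, the check can also be made from Python itself. This is a small sketch that relies on the standard behaviour of venv, where sys.prefix points at the environment and sys.base_prefix at the base installation; in_virtualenv is a hypothetical helper name.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix is the environment folder while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
    print("Interpreter location:", sys.prefix)
```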

Step 3: Install Whisper and Dependencies

3.1 Upgrade pip to ensure you have the latest version:

python -m pip install --upgrade pip

3.2 Install the Whisper model:

pip install git+https://github.com/openai/whisper.git 

3.3 Install PyTorch, which is required by Whisper:

If you don't have a GPU:

pip install torch

If you have an NVIDIA GPU, visit PyTorch's installation page (https://pytorch.org/get-started/locally/) and follow the instructions to install the build that matches your CUDA version.
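
To see which device PyTorch can actually use after installation, a short check like the one below may help. It is a sketch rather than a required step: pick_device is a hypothetical helper, and it falls back to "cpu" when PyTorch is missing or no CUDA device is visible.

```python
def pick_device():
    """Return "cuda" if PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("Whisper would run on:", pick_device())
```

The returned string can be passed to Whisper when loading a model, for example whisper.load_model("small", device=pick_device()).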

Step 4: Verify Installation

To confirm Whisper is installed correctly, run:

whisper --help

If you see usage instructions, the installation was successful.
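
The installation can also be checked from Python without loading a model. The sketch below only asks whether the packages can be found in the active environment; missing_packages is a hypothetical helper name, and the default package names are the import names used by Whisper and PyTorch.

```python
import importlib.util


def missing_packages(names=("whisper", "torch")):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not installed in this environment:", ", ".join(missing))
    else:
        print("Whisper and PyTorch are both installed.")
```

If a package is reported missing, confirm that the virtual environment is active before reinstalling.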

Step 5: Transcribe Your Audio Files

The setup is now complete, and you can transcribe your audio files to text. There are several ways to do this.

Option 1: Using Whisper Directly from the Command Line

You can run the Whisper AI model directly from the command line.

  1. Activate your virtual environment:

     whisper-env\Scripts\activate
    
  2. Use the following command to transcribe an audio file:

     whisper "path_to_your_audio_file.mp3" --model small
    

Replace "path_to_your_audio_file.mp3" with the path to your actual audio file.

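You can replace small with another model size such as tiny, base, medium, or large. Larger models are generally more accurate but slower and need more memory, so try a few and compare the results.

By default, Whisper writes the transcription to the current directory in five formats: json, srt, tsv, txt, and vtt. Use --output_dir to choose a different folder and --output_format to write a single format.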
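
If you want to launch the command-line tool from a script, for example to reuse the same options for several files, the command can be assembled in Python. This is a minimal sketch: build_whisper_command and run_whisper are hypothetical helper names, while --model, --output_dir, and --output_format are options of the whisper command.

```python
import subprocess


def build_whisper_command(audio_path, model="small", output_dir=".", output_format="all"):
    """Build the argument list for the whisper command-line tool."""
    return [
        "whisper", str(audio_path),
        "--model", model,
        "--output_dir", str(output_dir),
        "--output_format", output_format,
    ]


def run_whisper(audio_path, **options):
    """Run whisper on one file; raises CalledProcessError if the command fails."""
    subprocess.run(build_whisper_command(audio_path, **options), check=True)


if __name__ == "__main__":
    # Prints the command instead of running it, so the sketch is safe to try.
    print(" ".join(build_whisper_command("talk.mp3", model="base", output_format="srt")))
```

Passing the arguments as a list avoids quoting problems with paths that contain spaces.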

Option 2: Using Python Script

You can also transcribe audio files with a Python script.

  1. Open Command Prompt or PowerShell.

  2. Activate your virtual environment:

     whisper-env\Scripts\activate
    
  3. Run your Python script:

     python transcribe.py
    

Follow the prompts to enter the path to your audio file, the output folder, and the output formats.

The Python script I use (transcribe.py):

import json
import os

import whisper
from whisper.utils import get_writer

def transcribe_audio(file_path, output_folder="C:\\transcriptions", output_formats=("txt",)):
    # Load the Whisper model (change "small" to "medium" or "large" if needed)
    model = whisper.load_model("small")

    # Transcribe the audio file
    result = model.transcribe(file_path)

    # Extract the base name of the audio file (without extension)
    base_name = os.path.splitext(os.path.basename(file_path))[0]

    # Create a subfolder named after the base name of the audio file
    subfolder_path = os.path.join(output_folder, base_name)
    os.makedirs(subfolder_path, exist_ok=True)

    # Save the plain-text transcription
    if "txt" in output_formats:
        txt_path = os.path.join(subfolder_path, f"{base_name}.txt")
        with open(txt_path, "w", encoding="utf-8") as file:
            file.write(result["text"])
        print(f"Transcription saved to {txt_path}")

    # Save the full result (text, segments, detected language) as JSON
    if "json" in output_formats:
        json_path = os.path.join(subfolder_path, f"{base_name}.json")
        with open(json_path, "w", encoding="utf-8") as file:
            json.dump(result, file, ensure_ascii=False, indent=4)
        print(f"Transcription saved to {json_path}")

    # Save the subtitle and table formats. whisper.utils has no srt(), tsv(),
    # or vtt() functions; get_writer() returns a writer that saves
    # <base_name>.<format> into the folder it was created with. Calling the
    # writer without extra options works in recent Whisper releases.
    for fmt in ("srt", "tsv", "vtt"):
        if fmt in output_formats:
            writer = get_writer(fmt, subfolder_path)
            writer(result, file_path)
            out_path = os.path.join(subfolder_path, f"{base_name}.{fmt}")
            print(f"Transcription saved to {out_path}")

if __name__ == "__main__":
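    # Ask the user for the path of the audio file (surrounding quotes are removed,
    # since Windows adds them when a path is pasted with "Copy as path")
    audio_file = input("Enter the path to your audio file (e.g., C:\\path\\to\\file.mp3): ").strip().strip('"')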
    
    # Make sure the file exists
    if not os.path.exists(audio_file):
        print(f"Error: The file '{audio_file}' does not exist. Please check the path and try again.")
    else:
        # Ask the user for the output folder if they want to customize it
        output_folder = input("Enter the path to your desired output folder (default is C:\\transcriptions): ").strip()
        if not output_folder:
            output_folder = "C:\\transcriptions"  # Set default output folder if not specified
        
        # Ask the user for the output formats they want
        formats_input = input("Enter the desired output formats separated by commas (e.g., txt,json,srt,tsv,vtt): ").strip().lower()
        if not formats_input:
            output_formats = ["txt"]  # Default format
        else:
            output_formats = [fmt.strip() for fmt in formats_input.split(",") if fmt.strip() in ["txt", "json", "srt", "tsv", "vtt"]]
        
        if not output_formats:
            print("Invalid output formats specified. Using default 'txt' format.")
            output_formats = ["txt"]

        transcribe_audio(audio_file, output_folder, output_formats)
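
For reference, model.transcribe returns a dictionary whose "segments" list holds one entry per segment, with "start" and "end" times in seconds and the segment "text". The sketch below shows how those fields map onto SRT subtitle blocks. format_timestamp and segments_to_srt are hypothetical helpers written for illustration, not Whisper functions, and the sample segment is made up.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    milliseconds = round(seconds * 1000)
    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{milliseconds:03d}"


def segments_to_srt(segments):
    """Turn a list of segments (dicts with start, end, text) into SRT text."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}\n")
    return "\n".join(blocks)  # blank line between consecutive subtitle blocks


if __name__ == "__main__":
    # Illustrative data shaped like result["segments"]
    sample = [{"start": 0.0, "end": 2.5, "text": " Hello and welcome."}]
    print(segments_to_srt(sample))
```

Writing the formatter by hand like this is mainly useful if you want to customise the subtitle layout.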

Option 3: Creating a Batch File for Easier Use

You can create a batch file (transcribe.bat) to run the script without needing to activate the virtual environment manually each time.

  1. Open a text editor and create the following batch script:

    @echo off
    echo Activating virtual environment...
    call whisper-env\Scripts\activate
    
    echo Running the transcription script...
    python transcribe.py
    
    pause

  2. Save this file as transcribe.bat in your Whisper directory, next to the whisper-env folder and transcribe.py.
  3. Double-click transcribe.bat.
  4. It will activate your virtual environment and run the transcribe.py script.
  5. Follow the prompts to transcribe your audio file.
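
An alternative to activating the environment is to call the environment's own Python interpreter directly, which uses the packages installed in that environment. The sketch below assumes the whisper-env folder name from Step 2; venv_python and run_script are hypothetical helper names.

```python
import os
import subprocess
from pathlib import Path


def venv_python(venv_dir="whisper-env", windows=None):
    """Return the path of the Python interpreter inside a virtual environment."""
    if windows is None:
        windows = os.name == "nt"
    venv = Path(venv_dir)
    # venv puts the interpreter in Scripts\ on Windows and bin/ elsewhere
    return venv / "Scripts" / "python.exe" if windows else venv / "bin" / "python"


def run_script(script, venv_dir="whisper-env"):
    """Run a script with the environment's interpreter, without activating it."""
    subprocess.run([str(venv_python(venv_dir)), script], check=True)


if __name__ == "__main__":
    print("Interpreter that would be used:", venv_python())
```

Calling run_script with the name of your transcription script has the same effect as the batch file, provided it is run from the folder that contains whisper-env.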

About

This is a straightforward method for transcribing audio recordings on the desktop.
