Whisper AI Model Setup and Transcription Guide

This document provides a comprehensive guide to setting up and using the Whisper AI model for transcribing audio files. Follow these steps to install all necessary components, set up your environment, and transcribe audio files with ease.

Step 1: Setting Up Your Environment

1.1 Install Prerequisites

Make sure you have the following installed on your computer:

Python 3.8 or later: Download from Python's official website https://www.python.org/downloads/.
Git: Download and install from Git's official website https://git-scm.com/downloads.
FFmpeg: Whisper uses it to read audio files. Follow the installation steps in the next section.

How to Install FFmpeg

Download FFmpeg from the official website: https://ffmpeg.org/download.html.
Select "Windows builds from gyan.dev" under the Windows section, and download the ffmpeg-git-full.7z file.
Extract the contents of the downloaded .7z file using 7-Zip or WinRAR.
Copy the extracted bin folder (which contains ffmpeg.exe, ffplay.exe, and ffprobe.exe) to C:\ffmpeg.
Add FFmpeg to your system PATH:
- Search for "environment variables" in the Windows Start menu.
- Select "Edit the system environment variables," then click "Environment Variables."
- Find the "Path" variable under "System variables," select it, and click "Edit."
- Click "New" and add C:\ffmpeg\bin. Click "OK" to apply the changes.
Verify the installation by opening a new Command Prompt (so that the updated PATH is picked up) and running:
```
ffmpeg -version
```
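
The same check can be made from Python, which is useful because Whisper calls the ffmpeg executable to decode audio. The following is a minimal sketch using only the standard library; the helper name find_tool is illustrative and not part of Whisper.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable found on PATH, or None if it is missing."""
    return shutil.which(name)


if __name__ == "__main__":
    for tool in ("ffmpeg", "ffprobe"):
        location = find_tool(tool)
        if location:
            print(f"{tool} found at {location}")
        else:
            print(f"{tool} not found; check that C:\\ffmpeg\\bin is on your PATH")
```

If ffmpeg is reported as missing right after you edited PATH, close the terminal and open a new one before trying again.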

1.2 Create a Project Directory

Open Command Prompt or PowerShell, then create your project folder and navigate to it. Replace user with your Windows user name:

mkdir "C:\Users\user\Desktop\Whisper AI Model\Whisper"
cd "C:\Users\user\Desktop\Whisper AI Model\Whisper"

Step 2: Set Up a Python Virtual Environment

2.1 Create a virtual environment (this isolates your project's dependencies):

python -m venv whisper-env

2.2 Activate the virtual environment:

whisper-env\Scripts\activate

You should now see (whisper-env) in your terminal, indicating that the environment is active.
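
If the prompt prefix is not visible, for example in some editor terminals, you can ask Python itself whether it is running inside a virtual environment. This is a small sketch based on the standard sys module; the function name in_virtualenv is an illustrative choice.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment folder while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
    print("Interpreter in use:", sys.executable)
```

When the environment is active, the interpreter path printed above should sit inside the whisper-env folder.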

Step 3: Install Whisper and Dependencies

3.1 Upgrade pip to ensure you have the latest version:

python -m pip install --upgrade pip

3.2 Install the Whisper model:

pip install git+https://github.com/openai/whisper.git

3.3 Install PyTorch, which is required by Whisper:

If you don't have a GPU:

pip install torch

If you have a GPU, visit PyTorch's installation page (https://pytorch.org/get-started/locally/) and follow the instructions to install the build that matches your CUDA version.
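
After installing PyTorch, you may want to confirm whether it can actually see your GPU. The sketch below uses torch.cuda.is_available(); the function name pick_device is illustrative, and it returns None instead of failing if PyTorch is not installed yet.

```python
def pick_device():
    """Return "cuda" if PyTorch can use a GPU, "cpu" if it cannot, or None if torch is missing."""
    try:
        import torch
    except ImportError:
        return None
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    device = pick_device()
    if device is None:
        print("PyTorch is not installed in this environment.")
    else:
        print(f"PyTorch device available for Whisper: {device}")
```

A result of "cpu" on a machine with an NVIDIA GPU usually means the CPU-only build of PyTorch was installed.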

Step 4: Verify Installation

To confirm Whisper is installed correctly, run:

whisper --help

If you see usage instructions, the installation was successful.

Step 5: Transcribe Your Audio Files

Setup is now complete, and you can transcribe your audio files to text. There are three ways to do this.

Option 1: Using Whisper Directly from the Command Line

You can run the Whisper AI model directly from the command line.

Activate your virtual environment:
```
 whisper-env\Scripts\activate
```

Use the following command to transcribe an audio file:

 whisper "path_to_your_audio_file.mp3" --model small

Replace "path_to_your_audio_file.mp3" with the path to your actual audio file.

You can replace small with another model size, such as tiny, base, medium, or large. Larger models are generally more accurate but slower and need more memory, so try a few and compare the results.

By default, Whisper saves the transcription in the current directory in five formats: json, srt, tsv, txt, and vtt.
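
To control where the output goes and which format is written, the command accepts the --output_dir and --output_format options. The sketch below only assembles the argument list so you can see how the options fit together; the function name build_whisper_command and the example file and folder names are hypothetical.

```python
def build_whisper_command(audio_path, model="small", output_dir=None, output_format=None):
    """Build the argument list for the whisper command-line tool."""
    command = ["whisper", audio_path, "--model", model]
    if output_dir is not None:
        command += ["--output_dir", output_dir]
    if output_format is not None:
        command += ["--output_format", output_format]
    return command


if __name__ == "__main__":
    # Hypothetical file and folder names; replace them with your own.
    command = build_whisper_command("interview.mp3", output_dir="transcripts", output_format="srt")
    print(command)
    # To actually run it from Python:
    #   import subprocess
    #   subprocess.run(command, check=True)
```

Passing the arguments as a list avoids quoting problems with paths that contain spaces.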

Option 2: Using a Python Script

You can also transcribe audio files with a Python script that loads Whisper and prompts you for the input file, output folder, and output formats.

Open Command Prompt or PowerShell.
Activate your virtual environment:
```
 whisper-env\Scripts\activate
```
Run your Python script:
```
 python transcribe.py
```

Follow the prompts to input the path to your audio file and specify the output folder.

The Python script I use, saved as transcribe.py:

import json
import os

import whisper
from whisper.utils import get_writer  # available in recent Whisper releases

def transcribe_audio(file_path, output_folder="C:\\transcriptions", output_formats=("txt",)):
    # Load the Whisper model (change "small" to "medium" or "large" if needed)
    model = whisper.load_model("small")

    # Transcribe the audio file
    result = model.transcribe(file_path)

    # Extract the base name of the audio file (without extension)
    base_name = os.path.splitext(os.path.basename(file_path))[0]

    # Create a subfolder named after the base name of the audio file
    subfolder_path = os.path.join(output_folder, base_name)
    os.makedirs(subfolder_path, exist_ok=True)

    # Save transcription in the selected formats within the subfolder
    if "txt" in output_formats:
        txt_path = os.path.join(subfolder_path, f"{base_name}.txt")
        with open(txt_path, "w", encoding="utf-8") as file:
            file.write(result["text"])
        print(f"Transcription saved to {txt_path}")

    if "json" in output_formats:
        json_path = os.path.join(subfolder_path, f"{base_name}.json")
        with open(json_path, "w", encoding="utf-8") as file:
            json.dump(result, file, ensure_ascii=False, indent=4)
        print(f"Transcription saved to {json_path}")

    # The timestamped formats are written by Whisper's own writers.
    # get_writer(fmt, folder) returns a callable that saves <base_name>.<fmt> in that folder.
    writer_options = {"max_line_width": None, "max_line_count": None, "highlight_words": False}
    for fmt in ("srt", "tsv", "vtt"):
        if fmt in output_formats:
            writer = get_writer(fmt, subfolder_path)
            writer(result, file_path, writer_options)
            saved_path = os.path.join(subfolder_path, f"{base_name}.{fmt}")
            print(f"Transcription saved to {saved_path}")

if __name__ == "__main__":
    # Ask the user for the path of the audio file. Surrounding double quotes are
    # removed because Windows "Copy as path" adds them.
    audio_file = input("Enter the path to your audio file (e.g., C:\\path\\to\\file.mp3): ").strip().strip('"')
    
    # Make sure the file exists
    if not os.path.exists(audio_file):
        print(f"Error: The file '{audio_file}' does not exist. Please check the path and try again.")
    else:
        # Ask the user for the output folder if they want to customize it
        output_folder = input("Enter the path to your desired output folder (default is C:\\transcriptions): ").strip()
        if not output_folder:
            output_folder = "C:\\transcriptions"  # Set default output folder if not specified
        
        # Ask the user for the output formats they want
        formats_input = input("Enter the desired output formats separated by commas (e.g., txt,json,srt,tsv,vtt): ").strip().lower()
        if not formats_input:
            output_formats = ["txt"]  # Default format
        else:
            output_formats = [fmt.strip() for fmt in formats_input.split(",") if fmt.strip() in ["txt", "json", "srt", "tsv", "vtt"]]
        
        if not output_formats:
            print("Invalid output formats specified. Using default 'txt' format.")
            output_formats = ["txt"]

        transcribe_audio(audio_file, output_folder, output_formats)
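
To make the srt output less of a black box: the result returned by model.transcribe is a dictionary whose "segments" list holds entries with "start" and "end" times in seconds and the spoken "text". The sketch below shows how such segments map onto the SRT layout. It is a hand-written illustration, not Whisper's own writer, and the function names are hypothetical.

```python
def format_timestamp(seconds):
    """Convert a time in seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render a list of {"start", "end", "text"} dictionaries as SRT text."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        text = segment["text"].strip()
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # Subtitle blocks are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    # Made-up segments in the same shape as result["segments"].
    example = [
        {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
        {"start": 2.5, "end": 5.0, "text": " Let's get started."},
    ]
    print(segments_to_srt(example))
```

The vtt and tsv formats carry the same start, end, and text values in different layouts.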

Option 3: Creating a Batch File for Easier Use

You can create a batch file (transcribe.bat) to run the script without needing to activate the virtual environment manually each time.

Open a text editor and create the following batch script:

@echo off
echo Activating virtual environment...
call whisper-env\Scripts\activate

echo Running the transcription script...
python transcribe.py

pause

Save this file as transcribe.bat in your Whisper directory.
Double-click on transcribe.bat.
It will activate your virtual environment and run the transcribe.py script.
Follow the prompts to transcribe your audio file.
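
Activation mainly puts the environment's interpreter first on PATH, so a script can also be launched by calling that interpreter directly. The sketch below builds the interpreter path for an environment folder; the function name venv_python is illustrative, and whisper-env is the folder created in Step 2.

```python
import os
from pathlib import Path


def venv_python(env_dir, windows=None):
    """Return the path of the Python interpreter inside a virtual environment folder."""
    if windows is None:
        windows = os.name == "nt"
    # venv places the interpreter in Scripts\ on Windows and in bin/ elsewhere.
    if windows:
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"


if __name__ == "__main__":
    command = [str(venv_python("whisper-env")), "transcribe.py"]
    print("Command that runs the script without activation:", command)
```

On Windows this corresponds to running whisper-env\Scripts\python.exe transcribe.py from the Whisper directory.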
