This document provides a comprehensive guide to setting up and using the Whisper AI model for transcribing audio files. Follow these steps to install all necessary components, set up your environment, and transcribe audio files with ease.
Make sure you have the following installed on your computer:
- Python 3.8 or later (current Whisper releases require at least Python 3.8): Download from Python's official website https://www.python.org/downloads/.
- Git: Download and install from Git's official website https://git-scm.com/downloads.
- FFmpeg:
  - Download FFmpeg from the official website: https://ffmpeg.org/download.html.
  - Select "Windows builds from gyan.dev" under the Windows section, and download the ffmpeg-git-full.7z file.
  - Extract the contents of the downloaded .7z file using 7-Zip or WinRAR.
  - Copy the extracted bin folder (which contains ffmpeg.exe, ffplay.exe, and ffprobe.exe) to C:\ffmpeg.
  - Add FFmpeg to your system PATH:
    - Open "Environment Variables" by searching for it in the Windows Start menu.
    - Select "Edit the system environment variables" and click "Environment Variables."
    - Find the "Path" variable under "System variables," select it, and click "Edit."
    - Click "New" and add C:\ffmpeg\bin. Click "OK" to apply the changes.
  - Verify the installation by opening a Command Prompt and running:
    ffmpeg -version
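The same check can be run from Python, which is handy once the interpreter is installed. This is a minimal sketch using only the standard library; the function name `ffmpeg_version_line` is hypothetical and chosen for illustration.

```python
import shutil
import subprocess


def ffmpeg_version_line():
    """Return the first line of `ffmpeg -version`, or None if FFmpeg is not on PATH."""
    exe = shutil.which("ffmpeg")  # searches the same PATH the terminal uses
    if exe is None:
        return None
    completed = subprocess.run(
        [exe, "-version"], capture_output=True, text=True, check=False
    )
    lines = completed.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    line = ffmpeg_version_line()
    print(line or "FFmpeg was not found on PATH; re-check the PATH entry and reopen the terminal.")
```

If this prints the "not found" message right after editing PATH, try a freshly opened terminal first, since terminals that were already open keep the old PATH.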
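Open Command Prompt or PowerShell and navigate to your project folder, creating it first if it does not exist. Replace the "user(change this)" part of the path with your own Windows user name:
cd "C:\Users\user(change this)\Desktop\Whisper AI Model\Whisper"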
python -m venv whisper-env
whisper-env\Scripts\activate
You should now see (whisper-env) in your terminal, indicating that the environment is active.
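If the (whisper-env) prefix is hard to spot, the interpreter itself can report whether it is running inside a virtual environment. The sketch below relies on the standard `sys.prefix` and `sys.base_prefix` attributes; the helper name `in_virtual_env` is an assumption made for this example.

```python
import sys


def in_virtual_env():
    """True when the interpreter runs inside a venv (its prefix differs from the base install)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Virtual environment active:", in_virtual_env())
```

When the environment is active, the interpreter path printed here should point inside the whisper-env folder.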
python -m pip install --upgrade pip
pip install git+https://github.com/openai/whisper.git
If you don't have a GPU:
pip install torch
If you have a GPU, visit PyTorch's installation page and follow the instructions to install the appropriate CUDA version.
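After installing PyTorch, it can be useful to confirm whether it actually sees a CUDA GPU. The following is a small sketch, assuming only that `torch.cuda.is_available()` is the check you want; `pick_device` is a hypothetical helper name.

```python
def pick_device():
    """Return "cuda" if PyTorch is installed and sees a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed yet, so fall back to CPU
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("Whisper would run on:", pick_device())
```

The returned string can be passed to Whisper's `load_model` through its `device` argument, for example `whisper.load_model("small", device=pick_device())`.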
To confirm Whisper is installed correctly, run:
whisper --help
If you see usage instructions, the installation was successful.
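If `whisper --help` fails, it helps to know whether the package or only the command-line entry point is missing. This sketch checks both with the standard library; the function name and dictionary keys are illustrative assumptions, not part of Whisper.

```python
import importlib.util
import shutil


def whisper_install_status():
    """Report whether a `whisper` module can be imported and whether the CLI is on PATH."""
    return {
        "package_importable": importlib.util.find_spec("whisper") is not None,
        "cli_on_path": shutil.which("whisper") is not None,
    }


if __name__ == "__main__":
    for check, ok in whisper_install_status().items():
        print(f"{check}: {'yes' if ok else 'no'}")
```

A "no" for both usually means the virtual environment is not active in the current terminal.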
Setup is now complete, and you can transcribe your audio files to text. There are several ways to do this.
You can run the Whisper AI model directly from the command line.
- Activate your virtual environment:
  whisper-env\Scripts\activate
- Use the following command to transcribe an audio file:
  whisper "path_to_your_audio_file.mp3" --model small
Replace "path_to_your_audio_file.mp3" with the path to your actual audio file.
You can change the model from small to another size, such as base, medium, or large, depending on your needs. Larger models are generally more accurate but slower and need more memory, so try a few and compare the results.
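If you prefer to launch the same command from Python, for example to try several model sizes in a row, the call can be assembled as an argument list. This is a sketch under the assumption that the `whisper` command is on PATH; `--model`, `--output_dir`, and `--output_format` are Whisper CLI options, while the function names are hypothetical.

```python
import subprocess


def build_whisper_command(audio_path, model="small", output_dir=".", output_format="all"):
    """Build the argument list for the whisper command-line tool."""
    return [
        "whisper", audio_path,
        "--model", model,
        "--output_dir", output_dir,
        "--output_format", output_format,
    ]


def run_whisper(audio_path, **kwargs):
    """Run the CLI and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_whisper_command(audio_path, **kwargs), check=True)


if __name__ == "__main__":
    # Only prints the command; call run_whisper(...) to actually transcribe.
    print(build_whisper_command("path_to_your_audio_file.mp3", model="base"))
```

Passing the arguments as a list avoids quoting problems with paths that contain spaces.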
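By default, Whisper writes the output files to the directory you run the command from, in json, srt, tsv, txt, and vtt formats. Use --output_dir to choose a different folder and --output_format to write a single format.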
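Of these formats, srt is the least obvious, because each transcript segment needs a counter and a pair of `HH:MM:SS,mmm` timestamps. The sketch below illustrates the layout for segments shaped like Whisper's (dictionaries with `start`, `end`, and `text`). It is a hand-written illustration with hypothetical function names, not Whisper's own writer.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render segments as SRT text: counter, time range, then the segment text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{time_range}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [{"start": 0.0, "end": 2.5, "text": " Hello"}]
    print(segments_to_srt(demo))
```

The vtt format follows the same idea but uses a period instead of a comma before the milliseconds and starts with a WEBVTT header line.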
You can transcribe audio files directly using the Python script.
- Open Command Prompt or PowerShell.
- Activate your virtual environment:
  whisper-env\Scripts\activate
- Run your Python script:
  python transcribe.py
Follow the prompts to input the path to your audio file and specify the output folder.
The Python script I use, saved as transcribe.py:
import json
import os

import whisper
from whisper.utils import get_writer


def transcribe_audio(file_path, output_folder="C:\\transcriptions", output_formats=("txt",)):
    # Load the Whisper model (change "small" to "medium" or "large" if needed)
    model = whisper.load_model("small")

    # Transcribe the audio file
    result = model.transcribe(file_path)

    # Extract the base name of the audio file (without extension)
    base_name = os.path.splitext(os.path.basename(file_path))[0]

    # Create a subfolder named after the base name of the audio file
    subfolder_path = os.path.join(output_folder, base_name)
    os.makedirs(subfolder_path, exist_ok=True)

    # Save transcription in the selected formats within the subfolder
    if "txt" in output_formats:
        txt_path = os.path.join(subfolder_path, f"{base_name}.txt")
        with open(txt_path, "w", encoding="utf-8") as file:
            file.write(result["text"])
        print(f"Transcription saved to {txt_path}")

    if "json" in output_formats:
        json_path = os.path.join(subfolder_path, f"{base_name}.json")
        with open(json_path, "w", encoding="utf-8") as file:
            json.dump(result, file, ensure_ascii=False, indent=4)
        print(f"Transcription saved to {json_path}")

    # whisper.utils has no srt(), tsv(), or vtt() functions. Its get_writer()
    # returns a writer that saves <base_name>.<format> into the given folder.
    # This assumes a recent Whisper release; the options are passed explicitly
    # because some releases require them.
    writer_options = {"max_line_width": None, "max_line_count": None, "highlight_words": False}
    for fmt in ("srt", "tsv", "vtt"):
        if fmt in output_formats:
            writer = get_writer(fmt, subfolder_path)
            writer(result, file_path, writer_options)
            print(f"Transcription saved to {os.path.join(subfolder_path, f'{base_name}.{fmt}')}")


if __name__ == "__main__":
    # Ask the user for the path of the audio file
    audio_file = input("Enter the path to your audio file (e.g., C:\\path\\to\\file.mp3): ").strip()

    # Make sure the file exists
    if not os.path.exists(audio_file):
        print(f"Error: The file '{audio_file}' does not exist. Please check the path and try again.")
    else:
        # Ask the user for the output folder if they want to customize it
        output_folder = input("Enter the path to your desired output folder (default is C:\\transcriptions): ").strip()
        if not output_folder:
            output_folder = "C:\\transcriptions"  # Set default output folder if not specified

        # Ask the user for the output formats they want
        formats_input = input("Enter the desired output formats separated by commas (e.g., txt,json,srt,tsv,vtt): ").strip().lower()
        if not formats_input:
            output_formats = ["txt"]  # Default format
        else:
            output_formats = [fmt.strip() for fmt in formats_input.split(",") if fmt.strip() in ["txt", "json", "srt", "tsv", "vtt"]]
            if not output_formats:
                print("Invalid output formats specified. Using default 'txt' format.")
                output_formats = ["txt"]

        transcribe_audio(audio_file, output_folder, output_formats)

You can create a batch file (transcribe.bat) to run the script without needing to activate the virtual environment manually each time.
- Open a text editor and create the following batch script:
  @echo off
  echo Activating virtual environment...
  call whisper-env\Scripts\activate
  echo Running the transcription script...
  python batch_transcribe.py
  pause
- Save this file as transcribe.bat in your Whisper directory.
- Double-click transcribe.bat.
- It will activate your virtual environment and run the batch_transcribe.py script. If your script is saved under a different name, such as the transcribe.py shown above, use that name in the batch file instead.
- Follow the prompts to transcribe your audio file.
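The batch file refers to batch_transcribe.py, which is not listed in this guide. As a rough idea of what a folder-wide script could look like, here is a hypothetical sketch, not the author's script. The extension list, the function names, and the choice to write a .txt file next to each audio file are all assumptions.

```python
from pathlib import Path

# Assumed set of extensions to treat as audio; adjust to your own files.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}


def find_audio_files(folder):
    """Return the audio files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )


def transcribe_folder(folder, model_name="small"):
    """Transcribe every audio file in `folder` and save a .txt file beside it."""
    import whisper  # imported here so find_audio_files works without Whisper installed

    model = whisper.load_model(model_name)  # load once, reuse for every file
    for audio in find_audio_files(folder):
        result = model.transcribe(str(audio))
        text_path = audio.with_suffix(".txt")
        text_path.write_text(result["text"], encoding="utf-8")
        print(f"Saved {text_path}")


if __name__ == "__main__":
    target = input("Enter the folder that contains your audio files: ").strip()
    transcribe_folder(target)
```

Loading the model once outside the loop matters for speed, because loading is often the slowest step for short recordings.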