Streaming transcriber with whisper. Transcribing in real time requires sufficient machine power.
pip install -U git+https://github.com/shirayu/whispering.git@v0.6.0
# If you use GPU, install proper torch and torchaudio
# Check https://pytorch.org/get-started/locally/
# Example : torch for CUDA 11.6
pip install -U torch torchaudio --extra-index-url https://download.pytorch.org/whl/cu116

If you get "OSError: PortAudio library not found" on Linux, install "PortAudio".
sudo apt -y install portaudio19-dev

# Run in English
whispering --language en --model tiny

- --help shows full options
- --model sets the model name to use. Larger models are more accurate, but may not be able to transcribe in real time.
- --language sets the language to transcribe. The list of languages is shown with whispering -h
- --no-progress disables the progress message
- -t sets temperatures to decode. You can set several, like -t 0.0 -t 0.1 -t 0.5, but too many temperatures increase decoding time
- --debug outputs logs for debugging
- --vad sets the VAD (Voice Activity Detection) threshold. The default is 0.5. 0 disables VAD and forces whisper to analyze periods of non-voice sound
- --output sets the output file (Default: Standard output)
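
When several of the options above are combined, assembling the command from a script can be less error-prone than typing it by hand. The following is a minimal sketch in Python that uses only flags listed above; the helper name build_whispering_command and the file name transcript.txt are hypothetical and not part of whispering.

```python
import shlex


def build_whispering_command(language, model, temperatures=(), vad=None,
                             output=None, no_progress=False):
    """Assemble an argument list from the flags documented above.

    This is a hypothetical helper for illustration; it is not part of whispering.
    """
    argv = ["whispering", "--language", language, "--model", model]
    # -t may be repeated, once per decoding temperature
    for t in temperatures:
        argv += ["-t", str(t)]
    if vad is not None:
        argv += ["--vad", str(vad)]
    if output is not None:
        argv += ["--output", output]
    if no_progress:
        argv.append("--no-progress")
    return argv


argv = build_whispering_command(
    "en", "tiny", temperatures=(0.0, 0.1, 0.5), vad=0.5, output="transcript.txt"
)
# Print the command as it would be typed in a shell
print(shlex.join(argv))
```

The resulting list can be passed to subprocess.run(argv) to start the transcriber, provided whispering is installed and on the PATH.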
By default, whispering performs VAD every 3.75 seconds.
This interval is determined by the value of -n, whose default is 20.
When an interval is predicted as "silence", it will not be passed to whisper.
If you want to disable VAD, set the VAD threshold to 0 by adding --vad 0.
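
To get a feel for how -n affects the VAD interval, the arithmetic can be sketched as follows. The only facts taken from the text are the default pair (-n 20 gives 3.75 seconds); the linear scaling is an assumption made for illustration, and the function name vad_interval_seconds is hypothetical.

```python
DEFAULT_N = 20             # default value of -n, as stated above
DEFAULT_INTERVAL_S = 3.75  # VAD interval at the default -n, as stated above


def vad_interval_seconds(n):
    """Estimate the VAD interval for a given -n.

    Assumption: the interval scales linearly with -n.
    """
    return DEFAULT_INTERVAL_S * n / DEFAULT_N


for n in (10, 20, 40):
    print(f"-n {n}: VAD every {vad_interval_seconds(n)} seconds")
```

Under this assumption, a smaller -n gives shorter intervals and therefore more frequent VAD decisions, while a larger -n gives longer intervals.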
By default, Whisper does not perform analysis until the total length of the segments determined by VAD to have speech exceeds 30 seconds.
However, if silence segments appear 16 times (the default value of --max_nospeech_skip) after speech is detected, the analysis is performed.
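
The two triggering rules above can be illustrated with a small simulation. This is a simplified model written for this document, not whispering's actual implementation: the function name analysis_points is hypothetical, each VAD decision is reduced to a boolean, silence is counted cumulatively since the first buffered speech interval, and "exceeds 30 seconds" is read as a strict comparison.

```python
def analysis_points(is_speech_flags, interval_s=3.75, max_speech_s=30.0,
                    max_nospeech_skip=16):
    """Return the indices of the intervals after which analysis would run.

    Simplified model (assumption): one boolean per VAD interval,
    True meaning the interval was predicted to contain speech.
    """
    triggers = []
    buffered_s = 0.0  # total length of buffered speech intervals
    nospeech = 0      # silence intervals seen since speech was first buffered
    for i, is_speech in enumerate(is_speech_flags):
        if is_speech:
            buffered_s += interval_s
        elif buffered_s > 0:
            nospeech += 1
        # Silence before any speech is simply dropped.

        too_long = buffered_s > max_speech_s
        too_quiet = buffered_s > 0 and nospeech >= max_nospeech_skip
        if too_long or too_quiet:
            triggers.append(i)
            buffered_s = 0.0
            nospeech = 0
    return triggers


# Continuous speech: analysis runs once buffered speech exceeds 30 seconds
print(analysis_points([True] * 9))
# One speech interval followed by 16 silent ones: silence forces the analysis
print(analysis_points([True] + [False] * 16))
```

In this model, pure silence never triggers analysis, because nothing is buffered until the first speech interval arrives.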
⚠ No security mechanism is provided. Securing the connection is your responsibility.
Run with --host and --port.
whispering --language en --model tiny --host 0.0.0.0 --port 8000

whispering --host ADDRESS_OF_HOST --port 8000 --mode client

You can set -n and other options.

- MIT License
- Some code is ported from the original whisper, which is also under the MIT License