A professional VS Code extension that converts your voice to text using Deepgram AI, enhances it with Google Gemini, and provides a standalone text improvement tool.
Installation • Features • Usage • Documentation • Contributing
| Voice Recording | AI Enhancement | Text Enhancer |
|---|---|---|
| Capture audio with native tools or browser APIs | Improve transcription with Gemini AI | Dedicated tool for text improvement |

| Tabbed Interface | Transcription History | Keyboard Shortcuts |
|---|---|---|
| Separate tabs for different features | Auto-save with Edit, Copy, Insert, Delete | `Ctrl+Shift+Space` to toggle panel |
- Voice Recording: Capture audio directly with native tools or browser APIs
- Speech-to-Text: Convert speech to text using Deepgram's Nova-2 model
- Gemini AI Toggle: Opt-in to AI enhancement for speech-to-text (disabled by default)
- AI Enhancement: Improve transcription with Gemini (punctuation, formatting)
- Tabbed Interface: Separate tabs for Speech to Text and Text Enhancer
- Transcription History: Automatically save results with support for Edit, Copy, Insert, and Delete
- AI Text Enhancer: Dedicated tool to improve any text's punctuation, clarity, and tone
- Linux Native: Uses `arecord` or `parecord` for reliable mic access on Linux
- Keyboard Shortcut: Press `Ctrl+Shift+Space` (or `Cmd+Shift+Space`) to toggle the panel
- Fork Compatible: Optimized for VS Code forks like Antigravity, Cursor, and Windsurf
- Editor Integration: Automatically insert text at cursor position
- Clipboard Support: Text is also copied to clipboard
Before using Speechy Go, you need:
- A Deepgram API key (required): sign up at Deepgram Console and create an API key with "Usage" permissions
- A Gemini API key (optional, used for AI enhancement): get one at Google AI Studio
# Download the .vsix file from releases
# In VS Code: Extensions → ... menu → "Install from VSIX..."
# Select the downloaded file

Open VS Code Settings (Ctrl+, or Cmd+,) and search for "Speechy Go":
| Setting | Description | Required |
|---|---|---|
| `speechygo.deepgramApiKey` | Your Deepgram API key | Yes |
| `speechygo.geminiApiKey` | Your Gemini API key | No |
| `speechygo.enableGemini` | Enable AI text enhancement | Default: false |
| `speechygo.geminiPrompt` | Custom prompt for Gemini | Has default |
- Press `Ctrl+Shift+Space` to open the Speechy Go panel
- Configure your API keys in the settings if you haven't already

Or

- Open the Command Palette (`Ctrl+Shift+P` or `Cmd+Shift+P`)
- Run "Speechy Go: Start Recording"
graph LR
A[Start Recording] --> B[Speak]
B --> C[Stop Recording]
C --> D[Processing]
D --> E[AI Enhancement]
E --> F[Insert & Copy]
- Click the "Start Recording" button in the panel (mic access is requested on first use)
- Allow microphone access when prompted
- Speak clearly into your microphone
- Click "Stop Recording" when finished
- Wait for processing...
- Result: Your transcription is auto-inserted at the cursor, copied to clipboard, and saved to history
- AI Enhancement: Toggle the "Enable Gemini AI Enhancement" switch to auto-format results
- Switch to the Text Enhancer tab
- Paste or type any text (e.g., draft emails, code comments)
- Click "Enhance Text" to improve it with Gemini
- Copy or Insert the professional result back into your editor
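
The recording steps above can be sketched as a small pipeline. This is a minimal sketch, not the extension's actual code: `processRecording`, `PipelineDeps`, and the fallback to the raw transcript when enhancement fails are assumptions made for illustration. Only the `enableGemini` toggle and the insert, copy, and history steps come from this README.

```typescript
// Hypothetical dependencies; the extension's real internals are not shown in this README.
interface PipelineDeps {
  transcribe: (audio: Uint8Array) => Promise<string>; // e.g. a Deepgram call
  enhance: (text: string) => Promise<string>;         // e.g. a Gemini call
  deliver: (text: string) => Promise<void>;           // insert at cursor + copy
  saveToHistory: (text: string) => void;
}

interface PipelineOptions {
  enableGemini: boolean; // mirrors the speechygo.enableGemini setting
}

async function processRecording(
  audio: Uint8Array,
  deps: PipelineDeps,
  options: PipelineOptions
): Promise<string> {
  const raw = await deps.transcribe(audio);
  let finalText = raw;

  // Enhancement is opt-in and skipped for empty transcripts.
  if (options.enableGemini && raw.trim().length > 0) {
    try {
      finalText = await deps.enhance(raw);
    } catch {
      // Assumption: keep the raw transcript if enhancement fails.
      finalText = raw;
    }
  }

  await deps.deliver(finalText);
  deps.saveToHistory(finalText);
  return finalText;
}
```

Passing the steps in as functions keeps the flow easy to test without a microphone or network access.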
- Node.js 18+
- npm
- Linux only: `alsa-utils` (provides `arecord`), usually pre-installed
# Clone the repository
git clone https://github.com/simoabid/SpeechyGo.git
cd SpeechyGo
# Install dependencies
npm install
# Compile TypeScript
npm run compile
# Package as VSIX
npm run package

- Open the project in VS Code
- Press `F5` to launch the Extension Development Host
- In the new window, run the command "Speechy Go: Start Recording"

- Run `npm run compile && npm run package`
- Right-click the generated `.vsix` file and select "Install Extension VSIX", or in VS Code go to Extensions → `...` menu → "Install from VSIX..."
- Press `Ctrl+Shift+Space` to run the extension
Extension Host (Node.js)
↓ Creates
Webview Panel (UI)
↓ User clicks Start
[Linux: arecord | Other: getUserMedia]
↓ Audio capture
Deepgram API
↓ Speech-to-Text
Extension Host
↓ Gemini API
Enhanced Text
↓ Insert + Copy
Editor + Clipboard
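
The Deepgram step in the flow above can be sketched as follows. This is an illustration rather than the extension's own code: `buildDeepgramRequest` and `extractTranscript` are hypothetical helper names, and the `audio/wav` content type is an assumption. The sketch only describes the HTTP request for Deepgram's pre-recorded `/v1/listen` endpoint and reads the transcript from a response object; it does not send anything.

```typescript
// Sketch only: describes the request, it does not perform it.
interface DeepgramRequest {
  url: string;
  method: "POST";
  headers: { Authorization: string; "Content-Type": string };
}

function buildDeepgramRequest(
  apiKey: string,
  contentType: string = "audio/wav" // assumption: audio is captured as WAV
): DeepgramRequest {
  if (apiKey.trim() === "") {
    throw new Error("Deepgram API key not configured");
  }
  return {
    url: "https://api.deepgram.com/v1/listen?model=nova-2",
    method: "POST",
    headers: { Authorization: "Token " + apiKey, "Content-Type": contentType },
  };
}

// Only the parts of the JSON response that this sketch reads.
interface DeepgramResponse {
  results?: { channels?: { alternatives?: { transcript?: string }[] }[] };
}

function extractTranscript(response: DeepgramResponse): string {
  const transcript =
    response.results?.channels?.[0]?.alternatives?.[0]?.transcript;
  // An absent transcript is treated as empty text.
  return (transcript ?? "").trim();
}
```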
| Platform | Method | Tool |
|---|---|---|
| Linux | Native system audio | `arecord` (ALSA) or `parecord` (PulseAudio) |
| macOS/Windows | Browser API | `getUserMedia` + `MediaRecorder` |
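
The platform table above could translate into a selection function along these lines. This is a sketch under stated assumptions: `pickCaptureMethod` is a hypothetical name, preferring `arecord` over `parecord` is a guess, and the 16 kHz mono 16-bit WAV settings are illustrative values rather than the extension's documented configuration.

```typescript
type CaptureMethod =
  | { kind: "native"; command: "arecord" | "parecord"; args: string[] }
  | { kind: "browser"; api: "getUserMedia" };

function pickCaptureMethod(
  platform: string,          // e.g. the value of process.platform
  availableTools: string[],  // tools found on PATH
  outFile: string
): CaptureMethod {
  // macOS and Windows use the browser API inside the webview.
  if (platform !== "linux") {
    return { kind: "browser", api: "getUserMedia" };
  }
  if (availableTools.indexOf("arecord") !== -1) {
    return {
      kind: "native",
      command: "arecord",
      // Assumed format: 16-bit little-endian, 16 kHz, mono, WAV container.
      args: ["-f", "S16_LE", "-r", "16000", "-c", "1", "-t", "wav", outFile],
    };
  }
  if (availableTools.indexOf("parecord") !== -1) {
    return {
      kind: "native",
      command: "parecord",
      args: ["--format=s16le", "--rate=16000", "--channels=1", "--file-format=wav", outFile],
    };
  }
  throw new Error("No audio recording tool found");
}
```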
| Setting | Description | Default |
|---|---|---|
| `speechygo.deepgramApiKey` | Your Deepgram API key | `""` |
| `speechygo.geminiApiKey` | Your Gemini API key | `""` |
| `speechygo.enableGemini` | Enable AI for speech recordings | `false` |
| `speechygo.geminiModel` | Gemini model (e.g., `gemini-3-flash-preview`) | `gemini-3-flash-preview` |
| `speechygo.geminiPrompt` | STT post-processing prompt | (Punctuation) |
| `speechygo.enhancePrompt` | Standalone enhancement prompt | (Professional Editor) |
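
As a rough illustration of how the defaults in the table above might be applied, here is a sketch that merges user values over defaults and reports configuration problems. `resolveSettings` and `findSettingsProblems` are hypothetical helpers, and the Gemini warning text is invented for this example. The two prompt settings are left out because their default text is not given here. In a real extension the user values would typically come from `vscode.workspace.getConfiguration("speechygo")`.

```typescript
interface SpeechyGoSettings {
  deepgramApiKey: string;
  geminiApiKey: string;
  enableGemini: boolean;
  geminiModel: string;
}

// Defaults taken from the settings table.
const DEFAULT_SETTINGS: SpeechyGoSettings = {
  deepgramApiKey: "",
  geminiApiKey: "",
  enableGemini: false,
  geminiModel: "gemini-3-flash-preview",
};

function resolveSettings(user: Partial<SpeechyGoSettings>): SpeechyGoSettings {
  // User-provided values override the defaults.
  return { ...DEFAULT_SETTINGS, ...user };
}

function findSettingsProblems(settings: SpeechyGoSettings): string[] {
  const problems: string[] = [];
  if (settings.deepgramApiKey.trim() === "") {
    problems.push("Deepgram API key not configured");
  }
  if (settings.enableGemini && settings.geminiApiKey.trim() === "") {
    // Hypothetical message, not taken from the extension.
    problems.push("Gemini enhancement is enabled but no Gemini API key is set");
  }
  return problems;
}
```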
The extension is fully supported when installed on VS Code forks. It uses Base64 icon embedding and robust messaging to keep the UI stable.
"Microphone permission denied" (macOS/Windows)
- Click the reload button and allow microphone access when prompted
- Check your OS privacy settings for microphone access
Linux: "No audio recording tool found"
- Install `alsa-utils`: `sudo apt install alsa-utils`
- Or install PulseAudio utils: `sudo apt install pulseaudio-utils`
- Linux Mic Issues: Ensure `alsa-utils` or `pulseaudio-utils` is installed
"Deepgram API key not configured"
- Go to VS Code Settings β search "speechygo" β enter your API key
"Transcription failed"
- Check your internet connection
- Verify your Deepgram API key is valid
- Ensure you have API credits remaining
No text inserted
- Make sure you have a file open and the cursor is positioned where you want text
- If no editor is open, text is still copied to clipboard
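
The fallback described above (insert when an editor is open, always copy) can be sketched like this. `EditorLike`, `ClipboardLike`, and `deliverText` are hypothetical stand-ins for illustration. In a VS Code extension they would likely wrap `vscode.window.activeTextEditor` and `vscode.env.clipboard.writeText`, but the extension's actual implementation is not shown in this README.

```typescript
// Minimal stand-ins so the logic can run outside VS Code.
interface EditorLike {
  insertAtCursor(text: string): Promise<boolean>;
}
interface ClipboardLike {
  writeText(text: string): Promise<void>;
}
interface DeliveryResult {
  inserted: boolean;
  copied: boolean;
}

async function deliverText(
  text: string,
  editor: EditorLike | undefined, // undefined when no file is open
  clipboard: ClipboardLike
): Promise<DeliveryResult> {
  // Copy first so the text is available even when no editor is open.
  await clipboard.writeText(text);
  const inserted =
    editor !== undefined ? await editor.insertAtCursor(text) : false;
  return { inserted, copied: true };
}
```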
Other issues
- Packaging Error: Use Node.js v20+ (or run `nvm use v24`) to avoid `ReferenceError: File is not defined`
- Keyboard Shortcut: If `Ctrl+Shift+Space` is taken, you can rebind it in VS Code Keyboard Shortcuts
| Recording | Storage | API |
|---|---|---|
| Never auto-records | Secure VS Code config | Direct to Deepgram |
- Audio is never recorded without clicking "Start Recording" (explicit interaction is required)
- Audio is sent directly to the Deepgram API (not stored locally)
- Audio is streamed to Deepgram for processing and is not stored on any other third-party server
- API keys are stored securely in VS Code's internal configuration (not hardcoded)
Made with ❤️ by ABID.Dev 🇲🇦
If you find this project useful, please consider giving it a ⭐!