🎙️ Speechy Go

A professional VS Code extension that converts your voice to text using Deepgram AI, optionally enhances the result with Google Gemini, and provides a standalone text improvement tool.


✨ Features

🎙️ Voice Recording	🤖 AI Enhancement	📝 Text Enhancer
Capture audio with native tools or browser APIs	Improve transcription with Gemini AI	Dedicated tool for text improvement

🔄 Tabbed Interface	📋 Transcription History	⌨️ Keyboard Shortcuts
Separate tabs for different features	Auto-save with Edit, Copy, Insert, Delete	`Ctrl+Shift+Space` to toggle panel

🚀 Core Capabilities

Voice Recording: Capture audio directly with native tools or browser APIs
Speech-to-Text: Convert speech to text using Deepgram's Nova-2 model
Gemini AI Toggle: Opt-in to AI enhancement for speech-to-text (disabled by default)
AI Enhancement: Improve transcription with Gemini (punctuation, formatting)
Tabbed Interface: Separate tabs for Speech to Text and Text Enhancer
Transcription History: Automatically save results with support for Edit, Copy, Insert, and Delete
AI Text Enhancer: Dedicated tool to improve any text's punctuation, clarity, and tone
Linux Native: Uses arecord or parecord for reliable mic access on Linux
Keyboard Shortcut: Press Ctrl+Shift+Space (Cmd+Shift+Space on macOS) to toggle the panel
Fork Compatible: Optimized for VS Code forks like Antigravity, Cursor, and Windsurf
Editor Integration: Automatically inserts text at the cursor position
Clipboard Support: Text is also copied to the clipboard
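Transcription History keeps each result so it can later be edited, copied, inserted, or deleted. The sketch below shows one plausible in-memory shape for such a history; the names `HistoryEntry` and `TranscriptionHistory` and the 50-entry cap are hypothetical, and the real extension may persist entries differently (for example in VS Code's `globalState`, which is not shown here).

```typescript
// Hypothetical shape of one saved transcription (not the extension's actual type).
interface HistoryEntry {
  id: number;
  text: string;
  createdAt: Date;
}

// Minimal in-memory history supporting Edit and Delete;
// Copy and Insert only need to read an entry's text via get().
class TranscriptionHistory {
  private entries: HistoryEntry[] = [];
  private nextId = 1;
  private readonly maxEntries: number;

  constructor(maxEntries = 50) {
    this.maxEntries = maxEntries;
  }

  // Newest entries go first; the oldest entry is dropped once the cap is exceeded.
  add(text: string): HistoryEntry {
    const entry: HistoryEntry = { id: this.nextId++, text, createdAt: new Date() };
    this.entries.unshift(entry);
    if (this.entries.length > this.maxEntries) {
      this.entries.pop();
    }
    return entry;
  }

  // Replaces the text of an existing entry; returns false if the id is unknown.
  edit(id: number, text: string): boolean {
    const entry = this.entries.find((e) => e.id === id);
    if (!entry) return false;
    entry.text = text;
    return true;
  }

  // Deletes an entry; returns false if nothing was removed.
  remove(id: number): boolean {
    const before = this.entries.length;
    this.entries = this.entries.filter((e) => e.id !== id);
    return this.entries.length < before;
  }

  get(id: number): HistoryEntry | undefined {
    return this.entries.find((e) => e.id === id);
  }

  list(): readonly HistoryEntry[] {
    return this.entries;
  }
}
```

Newest-first ordering matches how a history panel is usually read; the cap simply keeps memory bounded.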

🛠️ Tech Stack

Powered by: TypeScript, Node.js, the VS Code Extension API, Deepgram (Nova-2), and Google Gemini.

📋 Prerequisites

Before using Speechy Go, you need:

1. Deepgram API Key ✅ (Required)

Sign up at the Deepgram Console
Create an API key with "Usage" permissions

2. Google Gemini API Key ⭐ (Optional, but recommended)

Get one at Google AI Studio

📦 Installation

From VSIX Package

Download the .vsix file from the repository's Releases page
In VS Code: Extensions → ... menu → "Install from VSIX..."
Select the downloaded file

⚙️ Configuration

Open VS Code Settings (Ctrl+, or Cmd+,) and search for "Speechy Go":

Setting	Description	Required
`speechygo.deepgramApiKey`	Your Deepgram API key	✅ Yes
`speechygo.geminiApiKey`	Your Gemini API key	❌ No
`speechygo.enableGemini`	Enable Gemini AI enhancement for speech recordings	❌ No (default: `false`)
`speechygo.geminiPrompt`	Custom prompt for Gemini	❌ No (has a default)

🎯 Usage

Getting Started

Press Ctrl+Shift+Space to open the Speechy Go panel
Configure your API keys in the settings if you haven't already

Or

Open Command Palette (Ctrl+Shift+P or Cmd+Shift+P)
Run "Speechy Go: Start Recording"

🎙️ Speech to Text

graph LR
    A[🎤 Start Recording] --> B[🗣️ Speak]
    B --> C[⏹️ Stop Recording]
    C --> D[⚡ Processing]
    D --> E[✨ AI Enhancement]
    E --> F[📝 Insert & Copy]

Click the "Start Recording" button in the panel
Allow microphone access when prompted (usually only on first use)
Speak clearly into your microphone
Click "Stop Recording" when finished
Wait for processing to complete
Result: Your transcription is inserted at the cursor, copied to the clipboard, and saved to history
AI Enhancement: Toggle the "✨ Enable Gemini AI Enhancement" switch to have Gemini format results automatically (requires a Gemini API key)
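The recording flow above (transcribe, then optionally enhance) can be sketched as a small pipeline. This is a minimal illustration, not the extension's code: `processRecording`, `Transcriber`, and `Enhancer` are hypothetical names, and falling back to the raw transcript when Gemini fails is an assumed design choice. The network calls are injected as functions so the control flow stays testable.

```typescript
// Hypothetical dependency signatures; the real extension's functions may differ.
type Transcriber = (audio: Uint8Array) => Promise<string>;
type Enhancer = (text: string) => Promise<string>;

interface PipelineOptions {
  enableGemini: boolean; // mirrors the speechygo.enableGemini setting
  geminiApiKey: string;  // mirrors the speechygo.geminiApiKey setting
}

// Transcribes first, then runs Gemini only when it is both enabled and configured.
// If enhancement throws, the raw transcript is returned instead of being lost.
async function processRecording(
  audio: Uint8Array,
  transcribe: Transcriber,
  enhance: Enhancer,
  options: PipelineOptions,
): Promise<{ text: string; enhanced: boolean }> {
  const transcript = (await transcribe(audio)).trim();
  if (transcript === "") {
    // Nothing was recognized, so there is nothing to enhance.
    return { text: "", enhanced: false };
  }
  if (!options.enableGemini || options.geminiApiKey === "") {
    return { text: transcript, enhanced: false };
  }
  try {
    return { text: await enhance(transcript), enhanced: true };
  } catch {
    return { text: transcript, enhanced: false };
  }
}
```

Keeping the raw transcript on failure means an exhausted Gemini quota degrades the result instead of discarding the recording.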

✨ Text Enhancer

Switch to the Text Enhancer tab
Paste or type any text (e.g., draft emails, code comments)
Click "Enhance Text" to improve it with Gemini
Copy the improved result, or Insert it at the cursor in your editor
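The Text Enhancer sends your text to Gemini together with an instruction prompt (`speechygo.enhancePrompt`). Below is a minimal sketch of building such a request for the Gemini API's `generateContent` REST endpoint and reading the reply. How the extension actually combines prompt and text is not documented here, so the blank-line join and the names `buildGeminiRequest` and `extractGeminiText` are hypothetical.

```typescript
// Plain description of an HTTP request, so the sketch does not depend on fetch typings.
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds a generateContent request (v1beta REST endpoint, key passed in the x-goog-api-key header).
// Joining the prompt and the user's text with a blank line is an assumption.
function buildGeminiRequest(apiKey: string, model: string, prompt: string, text: string): HttpRequest {
  return {
    url: `https://generativelanguage.googleapis.com/v1beta/models/${encodeURIComponent(model)}:generateContent`,
    method: "POST",
    headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
    body: JSON.stringify({ contents: [{ parts: [{ text: `${prompt}\n\n${text}` }] }] }),
  };
}

// Concatenates all text parts of the first candidate; returns "" if the response has none.
function extractGeminiText(response: unknown): string {
  const parts =
    (response as { candidates?: { content?: { parts?: { text?: string }[] } }[] })
      .candidates?.[0]?.content?.parts ?? [];
  return parts.map((p) => p.text ?? "").join("").trim();
}
```

On Node.js 18+ the request could be sent with `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`, and the reply passed to `extractGeminiText(await res.json())`.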

🔧 Building from Source

Requirements

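Node.js 20+ (older versions can fail during `npm run package`; see Troubleshooting)
npm
Linux only: alsa-utils (provides arecord) or pulseaudio-utils (provides parecord), usually pre-installed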

Steps

# Clone the repository
git clone https://github.com/simoabid/SpeechyGo.git
cd SpeechyGo

# Install dependencies
npm install

# Compile TypeScript
npm run compile

# Package as VSIX
npm run package

Development

Option 1: Debugging

Open the project in VS Code
Press F5 to launch Extension Development Host
In the new window, run the command "Speechy Go: Start Recording"

Option 2: Manual VSIX Installation

Run `npm run compile && npm run package`
Install the generated .vsix file: right-click it in the Explorer and select "Install Extension VSIX", or use Extensions → ... menu → "Install from VSIX..."
Press Ctrl+Shift+Space to open the Speechy Go panel

🏗️ Architecture

Extension Host (Node.js)
    ↓ Creates
Webview Panel (UI)
    ↓ User clicks Start
[Linux: arecord or parecord | Other: getUserMedia]
    ↓ Audio capture
Deepgram API
    ↓ Speech-to-Text
Extension Host
    ↓ Gemini API (only if enabled)
Enhanced Text
    ↓ Insert + Copy
Editor + Clipboard
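The "Deepgram API ↓ Speech-to-Text" step is a single HTTP call to Deepgram's pre-recorded transcription endpoint, `POST https://api.deepgram.com/v1/listen`, authenticated with an `Authorization: Token <key>` header. The sketch below builds that request for the Nova-2 model and reads the transcript from the response; the helper names are hypothetical, and whether the extension also sets `smart_format=true` is an assumption.

```typescript
// Plain description of a binary-body HTTP request.
interface DeepgramRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: Uint8Array;
}

// Pre-recorded transcription request using the Nova-2 model.
// smart_format=true asks Deepgram for punctuation and formatting (assumed, not confirmed for this extension).
function buildDeepgramRequest(apiKey: string, audio: Uint8Array, mimeType = "audio/wav"): DeepgramRequest {
  return {
    url: "https://api.deepgram.com/v1/listen?model=nova-2&smart_format=true",
    method: "POST",
    headers: { Authorization: `Token ${apiKey}`, "Content-Type": mimeType },
    body: audio,
  };
}

// Reads the transcript of the first channel's top alternative; returns "" if absent.
function extractTranscript(response: unknown): string {
  const r = response as {
    results?: { channels?: { alternatives?: { transcript?: string }[] }[] };
  };
  return r.results?.channels?.[0]?.alternatives?.[0]?.transcript ?? "";
}
```

The `mimeType` parameter exists because the two recording paths produce different formats: WAV from `arecord`, and typically `audio/webm` from `MediaRecorder` in Electron-based editors.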

Platform-Specific Recording

Platform	Method	Tool
Linux 🐧	Native system audio	`arecord` (ALSA) or `parecord` (PulseAudio)
macOS/Windows 🍎🪟	Browser API	`getUserMedia` + MediaRecorder
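The platform table can be read as a small decision function. In this sketch, preferring `arecord` over `parecord`, the 16 kHz mono 16-bit WAV flags, and the names `Recorder` and `chooseRecorder` are all illustrative assumptions; the `platform` argument uses Node's `process.platform` values (`"linux"`, `"darwin"`, `"win32"`).

```typescript
// Either a native command to spawn, or a signal to record inside the webview.
type Recorder =
  | { kind: "native"; command: string; args: string[] }
  | { kind: "browser" };

// Linux: arecord (ALSA), then parecord (PulseAudio); elsewhere: getUserMedia + MediaRecorder.
// Returns undefined when no Linux tool is available ("No audio recording tool found").
function chooseRecorder(platform: string, available: ReadonlySet<string>, outFile: string): Recorder | undefined {
  if (platform !== "linux") {
    return { kind: "browser" };
  }
  if (available.has("arecord")) {
    // -f sample format, -r rate in Hz, -c channel count, -t container type
    return { kind: "native", command: "arecord", args: ["-f", "S16_LE", "-r", "16000", "-c", "1", "-t", "wav", outFile] };
  }
  if (available.has("parecord")) {
    return { kind: "native", command: "parecord", args: ["--file-format=wav", "--rate=16000", "--channels=1", outFile] };
  }
  return undefined;
}
```

A native choice could then be started with `spawn(rec.command, rec.args)` from Node's `child_process` module.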

Configuration Options

Setting	Description	Default
`speechygo.deepgramApiKey`	Your Deepgram API key	`""`
`speechygo.geminiApiKey`	Your Gemini API key	`""`
`speechygo.enableGemini`	Enable AI for speech recordings	`false`
`speechygo.geminiModel`	Gemini model (e.g., `gemini-3-flash-preview`)	`gemini-3-flash-preview`
`speechygo.geminiPrompt`	STT post-processing prompt	Built-in punctuation prompt
`speechygo.enhancePrompt`	Standalone Text Enhancer prompt	Built-in "professional editor" prompt
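Inside the extension these values would normally be read with `vscode.workspace.getConfiguration("speechygo")`, which applies the defaults declared in `package.json`. The standalone sketch below mirrors the defaults from the table and adds the kind of checks behind the "Deepgram API key not configured" message; `SpeechyGoSettings`, `resolveSettings`, and `validateSettings` are hypothetical names, and the built-in prompt texts are left unset because they are not documented here.

```typescript
// Mirrors the settings table; field names drop the "speechygo." prefix.
interface SpeechyGoSettings {
  deepgramApiKey: string;
  geminiApiKey: string;
  enableGemini: boolean;
  geminiModel: string;
  geminiPrompt?: string;  // undefined means "use the built-in prompt"
  enhancePrompt?: string; // undefined means "use the built-in prompt"
}

const DEFAULTS: SpeechyGoSettings = {
  deepgramApiKey: "",
  geminiApiKey: "",
  enableGemini: false,
  geminiModel: "gemini-3-flash-preview",
};

// Fills unset values with the documented defaults.
function resolveSettings(user: Partial<SpeechyGoSettings>): SpeechyGoSettings {
  return {
    deepgramApiKey: user.deepgramApiKey ?? DEFAULTS.deepgramApiKey,
    geminiApiKey: user.geminiApiKey ?? DEFAULTS.geminiApiKey,
    enableGemini: user.enableGemini ?? DEFAULTS.enableGemini,
    geminiModel: user.geminiModel ?? DEFAULTS.geminiModel,
    geminiPrompt: user.geminiPrompt,
    enhancePrompt: user.enhancePrompt,
  };
}

// Returns human-readable problems; an empty list means recording can proceed.
function validateSettings(s: SpeechyGoSettings): string[] {
  const problems: string[] = [];
  if (s.deepgramApiKey.trim() === "") {
    problems.push("Deepgram API key not configured");
  }
  if (s.enableGemini && s.geminiApiKey.trim() === "") {
    problems.push("Gemini enhancement is enabled but no Gemini API key is set");
  }
  return problems;
}
```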

For Forks (Antigravity/Cursor/Windsurf)

Speechy Go is fully supported when installed in these forks. It embeds its icons as Base64 and uses robust webview messaging to keep the UI stable.

🐛 Troubleshooting

🎤 "Microphone permission denied" (macOS/Windows)

Click the reload button and allow microphone access when prompted
Check your OS privacy settings for microphone access

🐧 Linux: "No audio recording tool found"

Install alsa-utils: `sudo apt install alsa-utils`
Or install PulseAudio utils: `sudo apt install pulseaudio-utils`

🔑 "Deepgram API key not configured"

Go to VS Code Settings → search "speechygo" → enter your API key

❌ "Transcription failed"

Check your internet connection
Verify your Deepgram API key is valid
Ensure you have API credits remaining

📝 No text inserted

Make sure a file is open and the cursor is positioned where you want the text inserted
If no editor is open, text is still copied to clipboard
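The fallback described above (always copy, insert only when an editor is open) can be sketched as follows. The `OutputTargets` adapters and `deliverTranscript` are hypothetical; in the extension they would wrap VS Code APIs such as `vscode.env.clipboard.writeText` and `TextEditor.edit`. Copying before inserting is an assumed ordering chosen so the text survives even when insertion is impossible.

```typescript
// Hypothetical adapters around the editor, clipboard, and history.
interface OutputTargets {
  copyToClipboard(text: string): Promise<void>;
  // Resolves to false when no editor is open, so insertion is skipped.
  insertAtCursor(text: string): Promise<boolean>;
  saveToHistory(text: string): void;
}

// Copies and saves the text first, then tries to insert it at the cursor.
async function deliverTranscript(
  text: string,
  out: OutputTargets,
): Promise<{ copied: boolean; inserted: boolean }> {
  if (text.trim() === "") {
    // Empty transcripts are not copied, saved, or inserted.
    return { copied: false, inserted: false };
  }
  await out.copyToClipboard(text);
  out.saveToHistory(text);
  const inserted = await out.insertAtCursor(text);
  return { copied: true, inserted };
}
```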

⚠️ Other issues

Packaging Error: Use Node.js v20+ (or run `nvm use v24`) when running `npm run package`; older versions fail with `ReferenceError: File is not defined`
Keyboard Shortcut: If Ctrl+Shift+Space is already in use, rebind it in VS Code's Keyboard Shortcuts editor

🔒 Privacy & Security

🎤 Recording	🔐 Storage	🌐 API
Never auto-records	Stored in VS Code settings	Direct to Deepgram

Audio is never recorded until you click "Start Recording" (explicit interaction is required)
Audio is sent directly to the Deepgram API for transcription; it is not stored locally or sent to any other third-party server
When Gemini enhancement or the Text Enhancer is used, the text (not the audio) is sent to the Google Gemini API
API keys are stored in VS Code's settings rather than hardcoded; VS Code saves settings as plain text in settings.json, so avoid committing settings files that contain keys


📄 License

MIT License - Feel free to use this project for personal or commercial purposes.

🤝 Contributing

Contributions are welcome! Please open an issue or PR.

Made with ❤️ by ABID.Dev 🇲🇦

If you find this project useful, please consider giving it a ⭐!
