simoabid/SpeechyGo


🎙️ Speechy Go

A professional VS Code extension that converts your voice to text using Deepgram AI, enhances it with Google Gemini, and provides a standalone text improvement tool.

Installation • Features • Usage • Documentation • Contributing


✨ Features

| Feature | Description |
| --- | --- |
| 🎙️ Voice Recording | Capture audio with native tools or browser APIs |
| 🤖 AI Enhancement | Improve transcription with Gemini AI |
| 📝 Text Enhancer | Dedicated tool for text improvement |
| 🔄 Tabbed Interface | Separate tabs for different features |
| 📋 Transcription History | Auto-save with Edit, Copy, Insert, Delete |
| ⌨️ Keyboard Shortcuts | Ctrl+Shift+Space to toggle panel |

🚀 Core Capabilities

  • Voice Recording: Capture audio directly with native tools or browser APIs
  • Speech-to-Text: Convert speech to text using Deepgram's Nova-2 model
  • Gemini AI Toggle: Opt in to AI enhancement for speech-to-text (disabled by default)
  • AI Enhancement: Improve transcription with Gemini (punctuation, formatting)
  • Tabbed Interface: Separate tabs for Speech to Text and Text Enhancer
  • Transcription History: Automatically save results with support for Edit, Copy, Insert, and Delete
  • AI Text Enhancer: Dedicated tool to improve any text's punctuation, clarity, and tone
  • Linux Native: Uses arecord or parecord for reliable mic access on Linux
  • Keyboard Shortcut: Press Ctrl+Shift+Space (or Cmd+Shift+Space) to toggle the panel
  • Fork Compatible: Optimized for VS Code forks like Antigravity, Cursor, and Windsurf
  • Editor Integration: Automatically insert text at cursor position
  • Clipboard Support: Text is also copied to clipboard
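
The transcription history described above (auto-save plus Edit and Delete) could be modelled roughly as follows. This is only a sketch: the `TranscriptionHistory` class, its method names, and the 50-entry cap are hypothetical and are not taken from the extension's source. Copy and Insert act on the editor and clipboard, so they are left out here.

```typescript
// Hypothetical in-memory model of the transcription history.
interface HistoryEntry {
  id: number;
  text: string;
  createdAt: number; // epoch milliseconds
}

class TranscriptionHistory {
  private entries: HistoryEntry[] = [];
  private nextId = 1;
  private maxEntries: number;

  // The cap of 50 entries is an assumption made for this sketch.
  constructor(maxEntries: number = 50) {
    this.maxEntries = maxEntries;
  }

  // Auto-save a new result; newest entries come first, oldest are dropped past the cap.
  add(text: string, createdAt: number = Date.now()): HistoryEntry {
    const entry: HistoryEntry = { id: this.nextId++, text, createdAt };
    this.entries.unshift(entry);
    this.entries.length = Math.min(this.entries.length, this.maxEntries);
    return entry;
  }

  // Edit: replace the text of an existing entry. Returns false if the id is unknown.
  edit(id: number, text: string): boolean {
    const entry = this.entries.filter((e) => e.id === id)[0];
    if (entry === undefined) return false;
    entry.text = text;
    return true;
  }

  // Delete: remove an entry by id. Returns false if nothing was removed.
  remove(id: number): boolean {
    const before = this.entries.length;
    this.entries = this.entries.filter((e) => e.id !== id);
    return this.entries.length < before;
  }

  // Snapshot for rendering; entries are copied so callers cannot mutate internal state.
  list(): HistoryEntry[] {
    return this.entries.map((e) => ({ ...e }));
  }
}
```

Keeping the store free of editor APIs makes it easy to unit test; a real extension would still need to persist the entries somewhere between sessions.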

🛠️ Tech Stack

Powered by: Deepgram, Google Gemini


📋 Prerequisites

Before using Speechy Go, you need:

1. Deepgram API Key ✅ (Required)

2. Google Gemini API Key ⭐ (Optional, but recommended)


📦 Installation

From VSIX Package

# Download the .vsix file from releases
# In VS Code: Extensions → ... menu → "Install from VSIX..."
# Select the downloaded file


⚙️ Configuration

Open VS Code Settings (Ctrl+, or Cmd+,) and search for "Speechy Go":

| Setting | Description | Required |
| --- | --- | --- |
| speechygo.deepgramApiKey | Your Deepgram API key | ✅ Yes |
| speechygo.geminiApiKey | Your Gemini API key | ❌ No |
| speechygo.enableGemini | Enable AI text enhancement | Default: false |
| speechygo.geminiPrompt | Custom prompt for Gemini | Has default |

🎯 Usage

Getting Started

  1. Press Ctrl+Shift+Space to open the Speechy Go panel
  2. Configure your API keys in the settings if you haven't already

Or

  1. Open Command Palette (Ctrl+Shift+P or Cmd+Shift+P)
  2. Run "Speechy Go: Start Recording"

🎙️ Speech to Text

graph LR
    A[🎤 Start Recording] --> B[🗣️ Speak]
    B --> C[⏹️ Stop Recording]
    C --> D[⚡ Processing]
    D --> E[✨ AI Enhancement]
    E --> F[📝 Insert & Copy]

  1. Click the "Start Recording" button in the panel
  2. Allow microphone access when prompted (first use only)
  3. Speak clearly into your microphone
  4. Click "Stop Recording" when finished
  5. Wait for processing to finish
  6. Result: Your transcription is inserted at the cursor, copied to the clipboard, and saved to history
  7. AI Enhancement (optional): Turn on the "✨ Enable Gemini AI Enhancement" switch to have results formatted automatically
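
The processing step above could look roughly like this. It is a minimal sketch that assumes Deepgram's pre-recorded `/v1/listen` REST endpoint with `Token` authorization and the Nova-2 model named earlier. The helper names `buildDeepgramRequest` and `extractTranscript` are hypothetical, and the extension's actual code may differ.

```typescript
// Minimal description of an HTTP request, kept independent of any HTTP client.
interface DeepgramRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
}

// Build the request for Deepgram's pre-recorded transcription endpoint.
// The recorded audio bytes would be sent as the request body.
// The "audio/wav" default is an assumption; use the type your recorder produces.
function buildDeepgramRequest(apiKey: string, contentType: string = "audio/wav"): DeepgramRequest {
  return {
    url: "https://api.deepgram.com/v1/listen?model=nova-2",
    method: "POST",
    headers: {
      Authorization: `Token ${apiKey}`,
      "Content-Type": contentType,
    },
  };
}

// Only the part of the response that this sketch reads.
interface DeepgramResponse {
  results?: {
    channels?: Array<{
      alternatives?: Array<{ transcript?: string }>;
    }>;
  };
}

// Pull the first transcript out of the response; an empty string means
// nothing was recognised or the response had an unexpected shape.
function extractTranscript(response: DeepgramResponse): string {
  const transcript = response.results?.channels?.[0]?.alternatives?.[0]?.transcript;
  return (transcript ?? "").trim();
}
```

The request object can be handed to any HTTP client (for example `fetch` in Node.js 18+) together with the audio bytes, and the parsed JSON response passed to `extractTranscript`.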

✨ Text Enhancer

  1. Switch to the Text Enhancer tab
  2. Paste or type any text (e.g., draft emails, code comments)
  3. Click "Enhance Text" to improve it with Gemini
  4. Copy or Insert the professional result back into your editor
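
The "Enhance Text" step can be sketched as a single call to the Gemini API. The sketch assumes the public `generateContent` REST endpoint and the `x-goog-api-key` header. The helper names and the example prompt are hypothetical, since the extension's default prompts are not shown in this README.

```typescript
// Minimal description of the HTTP request, independent of any HTTP client.
interface GeminiRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Build a generateContent request that sends the prompt followed by the user's text.
function buildGeminiRequest(apiKey: string, model: string, prompt: string, text: string): GeminiRequest {
  return {
    url: `https://generativelanguage.googleapis.com/v1beta/models/${encodeURIComponent(model)}:generateContent`,
    headers: {
      "Content-Type": "application/json",
      "x-goog-api-key": apiKey,
    },
    body: JSON.stringify({
      contents: [{ parts: [{ text: `${prompt}\n\n${text}` }] }],
    }),
  };
}

// Only the part of the response that this sketch reads.
interface GeminiResponse {
  candidates?: Array<{
    content?: { parts?: Array<{ text?: string }> };
  }>;
}

// Join the text parts of the first candidate. If Gemini returns nothing usable,
// fall back to the original text so the user never loses their input.
function extractEnhancedText(response: GeminiResponse, fallback: string): string {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  const joined = parts.map((p) => p.text ?? "").join("").trim();
  return joined !== "" ? joined : fallback;
}
```

Falling back to the original text is a design choice for this sketch: a failed or empty enhancement then degrades to a no-op instead of inserting an empty string.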

🔧 Building from Source

Requirements

  • Node.js 18+ (Node.js 20+ is needed for packaging; see Troubleshooting)
  • npm
  • Linux only: alsa-utils (provides arecord) - usually pre-installed

Steps

# Clone the repository
git clone https://github.com/simoabid/SpeechyGo.git
cd SpeechyGo

# Install dependencies
npm install

# Compile TypeScript
npm run compile

# Package as VSIX
npm run package

Development

Option 1: Debugging

  1. Open the project in VS Code
  2. Press F5 to launch Extension Development Host
  3. In the new window, run the command "Speechy Go: Start Recording"

Option 2: Manual VSIX Installation

  1. Run npm run compile && npm run package
  2. Right-click the generated .vsix file and select "Install Extension VSIX", or in VS Code open Extensions → ... menu → "Install from VSIX..."
  3. Press Ctrl+Shift+Space to open the Speechy Go panel

🏗️ Architecture

Extension Host (Node.js)
    ↓ Creates
Webview Panel (UI)
    ↓ User clicks Start
[Linux: arecord | Other: getUserMedia]
    ↓ Audio capture
Deepgram API
    ↓ Speech-to-Text
Extension Host
    ↓ Gemini API
Enhanced Text
    ↓ Insert + Copy
Editor + Clipboard

Platform-Specific Recording

| Platform | Method | Tool |
| --- | --- | --- |
| Linux 🐧 | Native system audio | arecord (ALSA) or parecord (PulseAudio) |
| macOS/Windows 🍎🪟 | Browser API | getUserMedia + MediaRecorder |
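
The platform split in the table above can be expressed as a small decision function. This is a sketch under stated assumptions: the `planRecording` helper and the `RecorderPlan` type are hypothetical, trying arecord before parecord is an assumed order, and the 16 kHz mono 16-bit WAV settings are illustrative defaults, not values confirmed by the extension.

```typescript
// Result of deciding how to capture audio on the current platform.
type RecorderPlan =
  | { kind: "native"; command: string; args: string[] }
  | { kind: "browser" } // getUserMedia + MediaRecorder inside the webview
  | { kind: "unavailable"; message: string };

// platform: a value such as Node's process.platform ("linux", "darwin", "win32").
// installedTools: names of recording tools found on PATH (detection is not shown here).
// outFile: where the native tool should write the WAV file.
function planRecording(platform: string, installedTools: string[], outFile: string): RecorderPlan {
  if (platform !== "linux") {
    return { kind: "browser" };
  }
  if (installedTools.indexOf("arecord") !== -1) {
    // ALSA: signed 16-bit little-endian, 16 kHz, mono, WAV container.
    return {
      kind: "native",
      command: "arecord",
      args: ["-f", "S16_LE", "-r", "16000", "-c", "1", "-t", "wav", outFile],
    };
  }
  if (installedTools.indexOf("parecord") !== -1) {
    // PulseAudio: same audio format, expressed with parecord's long options.
    return {
      kind: "native",
      command: "parecord",
      args: ["--format=s16le", "--rate=16000", "--channels=1", "--file-format=wav", outFile],
    };
  }
  return { kind: "unavailable", message: "No audio recording tool found" };
}
```

A native plan would typically be run as a child process and interrupted when the user clicks "Stop Recording", since arecord keeps recording until it is stopped when no duration is given.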

Configuration Options

| Setting | Description | Default |
| --- | --- | --- |
| speechygo.deepgramApiKey | Your Deepgram API key | "" |
| speechygo.geminiApiKey | Your Gemini API key | "" |
| speechygo.enableGemini | Enable AI for speech recordings | false |
| speechygo.geminiModel | Gemini model (e.g., gemini-3-flash-preview) | gemini-3-flash-preview |
| speechygo.geminiPrompt | STT post-processing prompt | (Punctuation) |
| speechygo.enhancePrompt | Standalone enhancement prompt | (Professional Editor) |
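
As an illustration of how these defaults interact, the sketch below merges user-provided values over the defaults from the table and checks the one required key. The `SpeechySettings` interface and both helper functions are hypothetical. The two prompt settings are omitted because their default text is not shown here, and treating Gemini as active only when a key is also present is an assumption of this sketch.

```typescript
// Hypothetical typed view of the settings; names follow the speechygo.* keys above.
interface SpeechySettings {
  deepgramApiKey: string;
  geminiApiKey: string;
  enableGemini: boolean;
  geminiModel: string;
}

// Defaults copied from the table above.
const DEFAULT_SETTINGS: SpeechySettings = {
  deepgramApiKey: "",
  geminiApiKey: "",
  enableGemini: false,
  geminiModel: "gemini-3-flash-preview",
};

// Merge user values over the defaults and work out whether Gemini enhancement
// can actually run: it needs the toggle switched on and a non-empty API key.
function resolveSettings(user: Partial<SpeechySettings>): SpeechySettings & { geminiActive: boolean } {
  const merged: SpeechySettings = { ...DEFAULT_SETTINGS, ...user };
  const geminiActive = merged.enableGemini && merged.geminiApiKey.trim() !== "";
  return { ...merged, geminiActive };
}

// Return a user-facing error when the required key is missing, otherwise null.
function validateSettings(settings: SpeechySettings): string | null {
  return settings.deepgramApiKey.trim() === "" ? "Deepgram API key not configured" : null;
}
```

In an extension, the `user` object would typically be filled from the VS Code workspace configuration; here it is a plain object so the logic can be tested without the editor.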

For Forks (Antigravity/Cursor/Windsurf)

Extensions installed on forks are fully supported. We use Base64 icon embedding and robust messaging to ensure UI stability.


πŸ› Troubleshooting

🎀 "Microphone permission denied" (macOS/Windows)
  • Click the reload button and allow microphone access when prompted
  • Check your OS privacy settings for microphone access
🐧 Linux: "No audio recording tool found"
  • Install alsa-utils: sudo apt install alsa-utils
  • Or install PulseAudio utils: sudo apt install pulseaudio-utils
  • Linux Mic Issues: Ensure alsa-utils or pulseaudio-utils is installed
🔑 "Deepgram API key not configured"
  • Go to VS Code Settings → search "speechygo" → enter your API key
❌ "Transcription failed"
  • Check your internet connection
  • Verify your Deepgram API key is valid
  • Ensure you have API credits remaining
📝 No text inserted
  • Make sure you have a file open and the cursor is positioned where you want text
  • If no editor is open, text is still copied to clipboard
⚠️ Other issues
  • Packaging Error: Use Node.js v20+ (for example, run nvm use v24) to avoid ReferenceError: File is not defined
  • Keyboard Shortcut: If Ctrl+Shift+Space is taken, you can rebind it in VS Code Keyboard Shortcuts

🔒 Privacy & Security

| 🎤 Recording | 🔐 Storage | 🌐 API |
| --- | --- | --- |
| Never auto-records | Secure VS Code config | Direct to Deepgram |

  • Audio is never recorded without clicking "Start Recording" (explicit interaction is required)
  • Audio is sent directly to the Deepgram API for processing; it is not stored locally or on any other third-party server
  • API keys are stored securely in VS Code's internal configuration (not hardcoded)


📄 License

MIT License - Feel free to use this project for personal or commercial purposes.


🀝 Contributing

PRs Welcome

Contributions are welcome! Please open an issue or PR.


Made with ❤️ by ABID.Dev 🇲🇦

If you find this project useful, please consider giving it a ⭐!
