A real-time AI meeting co-pilot powered by the Gemini Live API
"The smartest voice in the room is always yours."
You are in a meeting. Someone asks a question you should know the answer to. You do not know it. You cannot Google it without looking distracted.
MeetMind sits in a browser tab while you attend meetings. Whisper a question, get an instant answer through your earphones. Nobody knows.
- Real-time voice AI via Gemini Live API
- Private audio responses - played only on your own device, never into the meeting (earphones recommended)
- Screen awareness - share your screen and ask what you see
- Text input for when you cannot speak out loud
- Voice Activity Detection - the AI stops talking the moment you speak
- Live conversation transcript
Live demo: https://meetmind-671715875630.us-central1.run.app
1. Landing Page - the homepage before starting a session.

2. Live Session Active - the Connected status and the "Listening..." indicator with animated wave bars.

3. Voice Transcripts Working - user speech and agent responses transcribed in real time.

4. Screen Share in Action - MeetMind describing your screen when asked.

5. Typing a Question - silent text input with instant AI response.

6. Screen Awareness - asking what is visible and getting a grounded answer.

+-------------------------------------------------------------+
|  Browser                                                    |
|                                                             |
|  Microphone   --> Web Audio API --> PCM 16kHz chunks        |
|  Screen       --> ImageCapture  --> JPEG frames             |
|  Text Input   --> WebSocket Client                          |
|                          |                                  |
|  Audio Output <-- AudioContext (24kHz scheduled queue)      |
|  Transcript   <-- WebSocket Messages                        |
+--------------------------|----------------------------------+
                           |  WebSocket
                           |  audio / text / screen
                           |
+--------------------------|----------------------------------+
|  FastAPI Server                                             |
|                                                             |
|  upstream_task    browser messages --> LiveRequestQueue     |
|  downstream_task  Gemini events    --> WebSocket            |
|                                                             |
|  asyncio.gather(upstream, downstream)                       |
+--------------------------|----------------------------------+
                           |  Google ADK / StreamingMode.BIDI
                           |
+--------------------------|----------------------------------+
|  Gemini Live API                                            |
|                                                             |
|  Input:  audio/pcm 16kHz + image/jpeg + text                |
|  Output: audio/pcm 24kHz + transcriptions                   |
+-------------------------------------------------------------+
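Both audio legs of the diagram carry raw PCM, at 16kHz going in and 24kHz coming out. The sketch below assumes mono, 16-bit signed, little-endian samples and shows the conversions a client needs on either side: packing Web Audio floats into PCM16 before sending, unpacking received chunks back into floats for playback, and working out how long a chunk lasts. The function and constant names are hypothetical, not taken from the MeetMind source.

```python
import struct

INPUT_RATE = 16_000   # microphone audio sent to Gemini
OUTPUT_RATE = 24_000  # audio Gemini sends back
BYTES_PER_SAMPLE = 2  # assumed: 16-bit mono


def float_to_pcm16(samples: list[float]) -> bytes:
    """Clamp Web Audio floats to [-1.0, 1.0] and pack them as little-endian int16."""
    clamped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = [int(s * 32767) for s in clamped]
    return struct.pack(f"<{len(ints)}h", *ints)


def pcm16_to_float(data: bytes) -> list[float]:
    """Unpack little-endian int16 bytes into floats in [-1.0, 1.0)."""
    count = len(data) // BYTES_PER_SAMPLE  # ignore a trailing odd byte
    values = struct.unpack(f"<{count}h", data[: count * BYTES_PER_SAMPLE])
    return [v / 32768 for v in values]


def chunk_seconds(data: bytes, sample_rate: int) -> float:
    """Playback duration of a mono PCM16 chunk."""
    return len(data) / BYTES_PER_SAMPLE / sample_rate
```

At 16kHz a 3,200-byte chunk holds 1,600 samples, or 100 ms of audio; at 24kHz the same 100 ms takes 4,800 bytes.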
| Layer | Technology |
|---|---|
| Frontend | Vanilla JS, Web Audio API, Canvas |
| Backend | Python, FastAPI, asyncio WebSockets |
| AI | Google Gemini 2.0 Flash Live |
| Agent SDK | Google ADK, LiveRequestQueue |
| Audio In | PCM 16kHz, ScriptProcessorNode |
| Audio Out | PCM 24kHz, AudioBufferSourceNode |
meetmind/
├── README.md
├── screenshots/
│
└── app/
├── main.py # FastAPI entry point, route definitions
├── requirements.txt
├── Dockerfile
├── pytest.ini
│
├── core/
│ ├── config.py # env vars, audio and screen constants
│ ├── session.py # ADK runner, session creation, RunConfig
│ └── pipeline.py # upstream_task, downstream_task
│
├── meetmind_agent/
│ ├── agent.py # Gemini Live agent, system prompt
│ └── __init__.py
│
├── static/
│ └── index.html # full frontend -- UI, audio, WebSocket
│
└── tests/
└── test_meetmind.py # pytest suite
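core/config.py is described as holding environment variables plus audio and screen constants. A minimal sketch of what such a module could look like, assuming the values quoted elsewhere in this README (16kHz in, 24kHz out, one frame every 5 seconds at 1280x720, JPEG quality 0.4). The Settings class, its field names, and load_settings() are hypothetical. The sketch reads only the process environment; getting the contents of .env into that environment is left to whatever the project actually uses.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object; names and defaults are illustrative."""
    google_api_key: str
    input_sample_rate: int = 16_000   # mic PCM sent to Gemini
    output_sample_rate: int = 24_000  # PCM returned by Gemini
    screen_interval_s: float = 5.0    # one frame every 5 seconds
    screen_size: tuple[int, int] = (1280, 720)
    jpeg_quality: float = 0.4


def load_settings() -> Settings:
    """Build Settings from the process environment, failing fast if the key is missing."""
    key = os.environ.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set in the environment")
    return Settings(google_api_key=key)
```

Failing at startup when the key is missing surfaces a configuration mistake immediately instead of on the first WebSocket connection.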
- Python 3.11 or higher
- Google API key with Gemini Live access
- Chrome browser
git clone https://github.com/areychana/meetmind.git
cd meetmind/app
python -m venv .venv
.venv\Scripts\activate # Windows
# source .venv/bin/activate # macOS / Linux
pip install -r requirements.txt
cp .env.example .env
# add your GOOGLE_API_KEY to .env
uvicorn main:app --reload
Open http://127.0.0.1:8000 in Chrome.
pip install pytest pytest-asyncio
pytest tests/ -v
A ScriptProcessorNode monitors microphone energy (RMS) in real time. When the energy stays above a threshold for 3 consecutive frames, the AI is interrupted immediately, so your voice always takes priority.
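The interruption rule is a small state machine: compute the RMS of each frame, count consecutive frames above a threshold, and fire once the count reaches three. This Python sketch mirrors that logic outside the browser. The threshold of 0.02 and the EnergyVAD name are assumptions; the README specifies only RMS and the three-frame rule.

```python
import math


class EnergyVAD:
    """Hypothetical energy-based VAD: RMS above a threshold for N consecutive frames."""

    def __init__(self, threshold: float = 0.02, frames_required: int = 3):
        self.threshold = threshold          # assumed value, not from the source
        self.frames_required = frames_required
        self._loud_frames = 0

    @staticmethod
    def rms(frame: list[float]) -> float:
        """Root mean square of the samples in one audio frame."""
        if not frame:
            return 0.0
        return math.sqrt(sum(s * s for s in frame) / len(frame))

    def process(self, frame: list[float]) -> bool:
        """Return True only on the frame where an interruption should fire."""
        if self.rms(frame) > self.threshold:
            self._loud_frames += 1
        else:
            self._loud_frames = 0  # any quiet frame breaks the streak
        return self._loud_frames == self.frames_required


vad = EnergyVAD()
quiet, loud = [0.0] * 512, [0.1] * 512
results = [vad.process(f) for f in [loud, loud, quiet, loud, loud, loud, loud]]
```

Resetting the counter on any quiet frame enforces the "consecutive" part of the rule, so short clicks do not cut the AI off; comparing with `==` rather than `>=` fires the interruption once per stretch of speech instead of on every loud frame.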
Incoming 24kHz PCM chunks from Gemini are decoded and queued sequentially using AudioBufferSourceNode.start(when) with a rolling nextStartTime cursor. This prevents gaps and overlaps, delivering seamless playback.
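The cursor logic is easy to check in isolation. In this sketch (class name hypothetical), `now` stands in for AudioContext.currentTime and the returned value is what would be passed to AudioBufferSourceNode.start(when). Clamping with max() is an assumption about how the cursor recovers after the queue drains: without it, the cursor would lag behind currentTime after a pause, and later chunks would all start immediately on top of each other.

```python
class PlaybackScheduler:
    """Hypothetical model of a rolling nextStartTime cursor for gapless playback."""

    def __init__(self, sample_rate: int = 24_000):
        self.sample_rate = sample_rate
        self.next_start_time = 0.0

    def schedule(self, num_samples: int, now: float) -> float:
        """Return the start time for a chunk and advance the cursor past it."""
        # If the queue has drained, start now rather than at a time in the past.
        start = max(now, self.next_start_time)
        # The next chunk begins exactly where this one ends: no gap, no overlap.
        self.next_start_time = start + num_samples / self.sample_rate
        return start

    def reset(self) -> None:
        """Hypothetical hook, e.g. after a VAD interruption: forget queued audio."""
        self.next_start_time = 0.0
```

With 2,400-sample (100 ms) chunks, a chunk arriving at `now=1.0` starts at 1.0, the next one arriving at 1.02 is queued for 1.1, and a chunk arriving after a long pause at 5.0 starts immediately at 5.0.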
The FastAPI backend runs two concurrent asyncio tasks per WebSocket connection:
- upstream_task - Consumes browser messages (audio blobs, text input, screen frames) and pushes them to LiveRequestQueue
- downstream_task - Iterates through runner.run_live() and forwards audio chunks and transcription events back to the browser
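The two-task pattern can be sketched with plain asyncio. Here an asyncio.Queue stands in for the browser WebSocket, lists stand in for LiveRequestQueue and the outgoing socket, and fake_run_live() stands in for runner.run_live(). None of these are the ADK or FastAPI APIs; they are hypothetical placeholders that show how asyncio.gather keeps both directions running at once.

```python
import asyncio
from typing import AsyncIterator


async def upstream_task(browser_in: asyncio.Queue, live_queue: list) -> None:
    """Browser -> model: forward every incoming message until the socket closes."""
    # None is a hypothetical "socket closed" sentinel.
    while (msg := await browser_in.get()) is not None:
        live_queue.append(msg)  # real app: push into LiveRequestQueue


async def fake_run_live() -> AsyncIterator[str]:
    """Hypothetical stand-in for runner.run_live(): an async stream of events."""
    for event in ("audio-chunk-1", "transcript: hello", "audio-chunk-2"):
        await asyncio.sleep(0)  # yield control, as a network stream would
        yield event


async def downstream_task(events: AsyncIterator[str], browser_out: list) -> None:
    """Model -> browser: relay each event as it arrives."""
    async for event in events:
        browser_out.append(event)  # real app: send over the WebSocket


async def handle_connection(messages: list) -> tuple[list, list]:
    browser_in: asyncio.Queue = asyncio.Queue()
    for m in [*messages, None]:
        browser_in.put_nowait(m)
    live_queue: list = []
    browser_out: list = []
    # One pair of tasks per connection, running concurrently.
    await asyncio.gather(
        upstream_task(browser_in, live_queue),
        downstream_task(fake_run_live(), browser_out),
    )
    return live_queue, browser_out


live, out = asyncio.run(handle_connection(["audio", "text", "frame"]))
```

One design point worth knowing: by default asyncio.gather() propagates the first exception but does not cancel the other task, so a real handler usually cancels or closes the remaining side (for example in a finally block) when the socket disconnects.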
Screen frames are captured every 5 seconds at 1280x720 resolution (JPEG quality 0.4) via ImageCapture.grabFrame() and sent as realtime blobs. The agent references screen content only when you explicitly ask.
MIT License
Built for the Gemini Live Agent Hackathon by areychana, 2026.