MeetMind

A real-time AI meeting co-pilot powered by the Gemini Live API

"The smartest voice in the room is always yours."

What is this?

You are in a meeting. Someone asks a question you should know the answer to. You do not know it. You cannot Google it without looking distracted.

MeetMind sits in a browser tab while you attend meetings. Whisper a question, get an instant answer through your earphones. Nobody knows.

Features

Real-time voice AI via Gemini Live API
Private audio responses through your earphones only
Screen awareness - share your screen and ask about what is on it
Text input for when you cannot speak out loud
Voice Activity Detection - the AI stops talking the moment you start speaking
Live conversation transcript

Live demo:

https://meetmind-671715875630.us-central1.run.app

Screenshots

1. Landing Page - the homepage before starting a session.

2. Live Session Active - the "Connected" status and the "Listening..." indicator with animated wave bars.

3. Voice Transcripts Working - user speech and agent response in real time.

4. Screen Share in Action - MeetMind describing your screen when asked.

5. Typing a Question - silent text input with instant AI response.

6. Screen Awareness - asking what is visible and getting a grounded answer.

Architecture

+-------------------------------------------------------------+
|                        Browser                              |
|                                                             |
|   Microphone  -->  Web Audio API  -->  PCM 16kHz chunks     |
|   Screen      -->  ImageCapture   -->  JPEG frames          |
|   Text Input  -->  WebSocket Client                         |
|                          |                                  |
|   Audio Output  <--  AudioContext (24kHz scheduled queue)   |
|   Transcript    <--  WebSocket Messages                     |
+--------------------------|----------------------------------+
                           |  WebSocket
                           |  audio / text / screen
                           |
+--------------------------|----------------------------------+
|                     FastAPI Server                          |
|                                                             |
|   upstream_task    browser messages --> LiveRequestQueue    |
|   downstream_task  Gemini events    --> WebSocket           |
|                                                             |
|   asyncio.gather(upstream, downstream)                      |
+--------------------------|----------------------------------+
                           |  Google ADK / StreamingMode.BIDI
                           |
+--------------------------|----------------------------------+
|                   Gemini Live API                           |
|                                                             |
|   Input:  audio/pcm 16kHz + image/jpeg + text               |
|   Output: audio/pcm 24kHz + transcriptions                  |
+-------------------------------------------------------------+
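The "PCM 16kHz chunks" step in the diagram implies a format conversion: Web Audio processing callbacks deliver Float32 samples in the range [-1.0, 1.0], which must become 16-bit signed integers before they are sent as audio/pcm. Below is a minimal Python sketch of that conversion and of the chunk-duration arithmetic. It assumes mono, little-endian 16-bit samples. The function names are hypothetical, and MeetMind's frontend does this work in JavaScript, not with this code.

```python
import struct
from typing import Sequence

# Input rate taken from the diagram; mono audio is an assumption.
INPUT_SAMPLE_RATE = 16_000


def float_to_pcm16(samples: Sequence[float]) -> bytes:
    """Convert Web Audio style floats in [-1.0, 1.0] to little-endian int16 PCM."""
    out = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip anything outside the valid range
        # Scale asymmetrically so -1.0 -> -32768 and 1.0 -> 32767.
        value = int(s * 32768) if s < 0 else int(s * 32767)
        out += struct.pack("<h", value)
    return bytes(out)


def chunk_duration_ms(pcm: bytes, sample_rate: int = INPUT_SAMPLE_RATE) -> float:
    """Duration of a mono 16-bit PCM chunk in milliseconds (2 bytes per sample)."""
    return len(pcm) * 1000 / (2 * sample_rate)
```

For example, a 4096-sample buffer becomes 8192 bytes of PCM, which is 256 ms of audio at 16kHz.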

Tech Stack

Layer      Technology
Frontend   Vanilla JS, Web Audio API, Canvas
Backend    Python, FastAPI, asyncio WebSockets
AI         Google Gemini 2.0 Flash Live
Agent SDK  Google ADK, LiveRequestQueue
Audio In   PCM 16kHz, ScriptProcessorNode
Audio Out  PCM 24kHz, AudioBufferSourceNode

Project Structure

meetmind/
├── README.md
├── architecture.png
├── deploy.sh
├── screenshots/
│
└── app/
    ├── main.py                  # FastAPI entry point, route definitions
    ├── requirements.txt
    ├── Dockerfile
    ├── pytest.ini
    │
    ├── core/
    │   ├── config.py            # env vars, audio and screen constants
    │   ├── session.py           # ADK runner, session creation, RunConfig
    │   └── pipeline.py          # upstream_task, downstream_task
    │
    ├── meetmind_agent/
    │   ├── agent.py             # Gemini Live agent, system prompt
    │   └── __init__.py
    │
    ├── static/
    │   └── index.html           # full frontend -- UI, audio, WebSocket
    │
    └── tests/
        └── test_meetmind.py     # pytest suite
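In the tree, core/config.py is described only as holding env vars and audio and screen constants. The sketch below shows one possible shape for such a module, using only values stated elsewhere in this README (16kHz input, 24kHz output, one 1280x720 screen frame every 5 seconds at JPEG quality 0.4) and the GOOGLE_API_KEY variable from Setup. Settings and load_settings are hypothetical names. The sketch reads the process environment directly, so it assumes .env has already been loaded, for example with python-dotenv's load_dotenv().

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical shape for core/config.py; values come from this README."""
    google_api_key: str
    input_sample_rate: int = 16_000   # microphone PCM sent to Gemini
    output_sample_rate: int = 24_000  # PCM audio returned by Gemini
    screen_interval_s: float = 5.0    # one screen frame every 5 seconds
    screen_width: int = 1280
    screen_height: int = 720
    jpeg_quality: float = 0.4         # browser-side JPEG quality, 0..1


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from os.environ, or from an explicit mapping in tests."""
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to .env")
    return Settings(google_api_key=key)
```

Failing fast on a missing key surfaces a misconfigured .env at startup rather than at the first WebSocket connection.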

Getting Started

Prerequisites

Python 3.11 or higher
Google API key with Gemini Live access
Chrome browser

Setup

git clone https://github.com/areychana/meetmind.git
cd meetmind/app

python -m venv .venv
.venv\Scripts\activate        # Windows
# source .venv/bin/activate   # macOS / Linux

pip install -r requirements.txt

cp .env.example .env
# add your GOOGLE_API_KEY to .env

uvicorn main:app --reload

Open http://127.0.0.1:8000 in Chrome.

Running Tests

pip install pytest pytest-asyncio
pytest tests/ -v

How It Works

Voice Activity Detection

A ScriptProcessorNode monitors the microphone's audio energy (RMS) in real time. When the energy stays above a threshold for 3 consecutive frames, the AI's response is interrupted immediately, so your voice always takes priority.
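The detection rule can be sketched in Python as follows. This illustrates the rule under stated assumptions and is not MeetMind's browser code. The threshold of 0.02 is a hypothetical placeholder. A "frame" means whatever buffer the ScriptProcessorNode delivers. The detector fires once per run of loud frames, so the interrupt is sent a single time rather than on every frame.

```python
import math
from typing import Iterable


def rms(frame: Iterable[float]) -> float:
    """Root-mean-square energy of one frame of float samples."""
    samples = list(frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class EnergyVAD:
    """Flags speech once RMS stays above a threshold for N consecutive frames.

    threshold=0.02 is a hypothetical placeholder, not MeetMind's value.
    """

    def __init__(self, threshold: float = 0.02, required_frames: int = 3):
        self.threshold = threshold
        self.required_frames = required_frames
        self._run = 0  # consecutive loud frames seen so far

    def process(self, frame: Iterable[float]) -> bool:
        """Return True only on the frame that completes a run of loud frames."""
        if rms(frame) > self.threshold:
            self._run += 1
        else:
            self._run = 0  # any quiet frame resets the run
        return self._run == self.required_frames
```

Requiring several consecutive frames, rather than one, keeps a single click or keyboard tap from cutting the AI off.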

Audio Scheduling

Incoming 24kHz PCM chunks from Gemini are decoded and queued back to back using AudioBufferSourceNode.start(when) with a rolling nextStartTime cursor. Each chunk is scheduled to start exactly where the previous one ends, so playback has no gaps or overlaps.
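The cursor logic reduces to a few lines. The sketch below models it in Python with hypothetical names (PlaybackScheduler, pcm16_to_float) and assumes mono, little-endian 16-bit chunks at 24kHz. In the browser, the returned start would be passed to AudioBufferSourceNode.start(when), and current_time would be AudioContext.currentTime.

```python
import struct
from typing import List, Tuple

OUTPUT_SAMPLE_RATE = 24_000  # Gemini output rate from the diagram


def pcm16_to_float(chunk: bytes) -> List[float]:
    """Decode little-endian int16 mono PCM into floats in [-1.0, 1.0)."""
    count = len(chunk) // 2  # a trailing odd byte, if any, is ignored
    return [v / 32768 for v in struct.unpack(f"<{count}h", chunk[: count * 2])]


class PlaybackScheduler:
    """Models the rolling nextStartTime cursor used with start(when)."""

    def __init__(self, sample_rate: int = OUTPUT_SAMPLE_RATE):
        self.sample_rate = sample_rate
        self.next_start_time = 0.0

    def schedule(self, chunk: bytes, current_time: float) -> Tuple[float, float]:
        """Return (start, duration) in seconds for the given chunk."""
        samples = pcm16_to_float(chunk)
        duration = len(samples) / self.sample_rate
        # If the queue has drained (cursor in the past), start "now"
        # instead of scheduling audio at a time that has already passed.
        start = max(self.next_start_time, current_time)
        self.next_start_time = start + duration
        return start, duration
```

Taking the max with the current time is what lets a chunk that arrives after a pause play immediately, while chunks that arrive early simply line up behind the ones already queued.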

Bidirectional Streaming

The FastAPI backend runs two concurrent asyncio tasks per WebSocket connection:

upstream_task - Reads browser messages (audio blobs, text input, screen frames) and pushes them to the LiveRequestQueue
downstream_task - Iterates over runner.run_live() and forwards audio chunks and transcription events back to the browser
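The upstream/downstream split can be illustrated without the real SDK. In this sketch, FakeRequestQueue and fake_run_live are hypothetical stand-ins for LiveRequestQueue and runner.run_live(). An asyncio.Queue stands in for the WebSocket, and the {"type": ...} message shape is assumed rather than taken from MeetMind. The sketch shows only the concurrency pattern: two coroutines run side by side under asyncio.gather, one feeding the model and one draining its events.

```python
import asyncio
import json
from typing import AsyncIterator, Dict, List, Tuple

FORWARDED_TYPES = ("audio", "image", "text")  # hypothetical message types


class FakeRequestQueue:
    """Stand-in for LiveRequestQueue: records what would be sent upstream."""

    def __init__(self) -> None:
        self.sent: List[Dict] = []
        self.closed = False

    def send(self, item: Dict) -> None:
        self.sent.append(item)

    def close(self) -> None:
        self.closed = True


async def fake_run_live() -> AsyncIterator[Dict]:
    """Stand-in for runner.run_live(): yields two model events."""
    for event in ({"type": "transcript", "text": "Hello"},
                  {"type": "audio", "data": "AAAA"}):
        await asyncio.sleep(0)  # yield control, as a network read would
        yield event


async def upstream_task(incoming: asyncio.Queue, queue: FakeRequestQueue) -> None:
    """Browser messages -> request queue, until a None sentinel (disconnect)."""
    while (raw := await incoming.get()) is not None:
        msg = json.loads(raw)
        if msg.get("type") in FORWARDED_TYPES:
            queue.send(msg)
    queue.close()


async def downstream_task(events: AsyncIterator[Dict], outgoing: List[str]) -> None:
    """Model events -> JSON strings for the browser."""
    async for event in events:
        outgoing.append(json.dumps(event))


async def handle_connection(browser_messages: List[str]) -> Tuple[FakeRequestQueue, List[str]]:
    incoming: asyncio.Queue = asyncio.Queue()
    for message in browser_messages:
        incoming.put_nowait(message)
    incoming.put_nowait(None)  # simulate the browser disconnecting
    queue = FakeRequestQueue()
    outgoing: List[str] = []
    await asyncio.gather(
        upstream_task(incoming, queue),
        downstream_task(fake_run_live(), outgoing),
    )
    return queue, outgoing
```

In a real server the two tasks do not finish together, so a shutdown signal (closing the request queue, or cancelling the other task) is needed when the browser disconnects. The stand-ins here simply end on their own.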

Screen Context

Screen frames are captured every 5 seconds at 1280x720 resolution via ImageCapture.grabFrame(), encoded as JPEG at quality 0.4, and sent to Gemini as realtime blobs. The agent references screen content only when you explicitly ask about it.
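The capture cadence and the frame message can be sketched in Python. Both pieces are hypothetical illustrations. FrameThrottle would be an optional server-side guard that enforces the 5-second interval even if a client sends frames faster. frame_message shows one plausible JSON envelope (base64 JPEG with an image/jpeg MIME type) for sending a frame over the WebSocket. MeetMind's actual message format is not described in this README.

```python
import base64
import time
from typing import Callable, Dict, Optional

SCREEN_INTERVAL_S = 5.0  # from the README: one frame every 5 seconds


class FrameThrottle:
    """Accept at most one frame per interval; extra frames are dropped."""

    def __init__(self, interval_s: float = SCREEN_INTERVAL_S,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval_s = interval_s
        self.clock = clock  # injectable so the logic can be tested
        self._last: Optional[float] = None

    def accept(self) -> bool:
        now = self.clock()
        if self._last is None or now - self._last >= self.interval_s:
            self._last = now
            return True
        return False


def frame_message(jpeg_bytes: bytes) -> Dict[str, str]:
    """Hypothetical JSON envelope for one screen frame."""
    if not jpeg_bytes.startswith(b"\xff\xd8"):  # JPEG start-of-image marker
        raise ValueError("not a JPEG frame")
    return {
        "type": "image",
        "mime_type": "image/jpeg",
        "data": base64.b64encode(jpeg_bytes).decode("ascii"),
    }
```

A long interval and a low JPEG quality keep screen context cheap: the model gets a recent view of the screen without frames crowding out the audio stream.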

License

MIT

Built for the Gemini Live Agent Hackathon by areychana, 2026.
