Turn your desktop into a living laboratory of intelligent interaction.
Gemini Desktop is not just another AI client: it's a neural bridge between your operating system and Google's most advanced multimodal models, including Gemini 1.5 Pro, Gemini 2.5 Pro (experimental, routed internally as `gemini-3`), and the Gemini Live API.
- Why Gemini Desktop?
- Architecture & Mermaid Diagram
- Core Capabilities
- Model Matrix
- Example Profile Configuration
- Example Console Invocation
- Emoji OS Compatibility Table
- Third-Party API Integration
- Responsive UI & Multilingual Engine
- 24/7 Autonomous Support Layer
- SEO Keywords Naturally Embedded
- License
- Disclaimer
Imagine a conductor who can see your screen, read your documents, listen to your voice, and reason across all of them simultaneously. That is Gemini Desktop: a cross-platform orchestrator that fuses the entire Gemini model family into a single, cohesive desktop application.
Whether you're a developer debugging code through Gemini 1.5 Pro, a vision analyst feeding frames into Gemini Vision, or a creator generating images with the Gemini Image Gen pipeline, one unified client manages it all.
We designed this repository for developers who want to build, not just use. It provides a configurable architecture where every API endpoint, every model variant (gemini-15-pro, gemini-3, gemini-api-free tier), and every interaction pattern is exposed through a clean, modular interface.
Below is a structural blueprint of how Gemini Desktop routes requests across the Gemini model ecosystem. Each arrow from the Orchestrator to a model represents a live API call authenticated with your personal gemini-api-key; the Local Cache / Profile Config branch stays on your machine.
```mermaid
graph TD
    A[User Desktop] --> B[Gemini Desktop Core]
    B --> C{Orchestrator}
    C --> D[Gemini 1.5 Pro - gemini-15-pro]
    C --> E[Gemini 3 - gemini-3]
    C --> F[Gemini Vision - gemini-vision]
    C --> G[Gemini Live API - gemini-live-api]
    C --> H[Gemini Image Gen - gemini-image-gen]
    D --> I[Text Reasoning / Code Assist]
    E --> J[Multimodal Fusion]
    F --> K[Image / Video Frame Analysis]
    G --> L[Real-Time Audio/Video Stream]
    H --> M[Text-to-Image Generation]
    B --> N[Local Cache / Profile Config]
    N --> O[".gemini-desktop/config.yaml"]
```
Key insight: The Orchestrator dynamically selects which model to invoke based on the input modality (text, image, audio stream, or code context), so you can move between modalities without switching models manually.
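The repository text does not show how the Orchestrator detects modality, so the following is only a minimal sketch, assuming a request carries an optional attachment path and a couple of flags. The `Request` fields, the modality labels, and the file-extension heuristics are hypothetical; the model names come from the `routing` and `defaults` sections of the example profile configuration.

```python
from dataclasses import dataclass
from typing import Optional

# Mirrors the `routing` block of the example profile configuration.
# The modality labels ("image", "code", ...) are hypothetical names for this sketch.
ROUTES = {
    "image": "gemini-vision",
    "code": "gemini-code-assist",
    "stream": "gemini-live-api",
    "image_gen": "gemini-image-gen",
}
DEFAULT_MODEL = "gemini-15-pro"  # `defaults.primary_model` in the example config

IMAGE_EXTS = (".png", ".jpg", ".jpeg", ".webp")
CODE_EXTS = (".py", ".ts", ".go", ".rs", ".java")


@dataclass
class Request:
    text: str = ""
    attachment: Optional[str] = None  # path to a media or source file, if any
    streaming: bool = False           # e.g. set by the --stream flag
    wants_image: bool = False         # the user asked for image output


def detect_modality(req: Request) -> str:
    """Classify a request with simple heuristics (not the real orchestrator logic)."""
    if req.streaming:
        return "stream"
    if req.wants_image:
        return "image_gen"
    if req.attachment:
        path = req.attachment.lower()
        if path.endswith(IMAGE_EXTS):
            return "image"
        if path.endswith(CODE_EXTS):
            return "code"
    return "text"


def route(req: Request) -> str:
    """Map the detected modality to a model, falling back to the default text model."""
    return ROUTES.get(detect_modality(req), DEFAULT_MODEL)
```

For example, `route(Request(attachment="screenshot.png"))` returns `"gemini-vision"`, while a plain text prompt falls through to `gemini-15-pro`. A table lookup keeps the routing rules data-driven, so editing the profile changes behavior without touching code.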
| Feature | Description |
|---|---|
| Unified API Gateway | Single entry point for all Gemini endpoints: gemini-api-free, gemini-pro-api, gemini-pro-1-5, and gemini-3 |
| Real-Time Streaming | Connect to the Gemini Live API for low-latency voice and video interactions |
| Code Assist Module | Dedicated gemini-code-assist pipeline for inline debugging, refactoring, and documentation |
| Image Generation | Invoke gemini-image-gen with prompt templates, style presets, and iterative refinement |
| Vision Analysis | Submit screenshots, webcam feeds, or image files to gemini-vision for object detection and OCR |
| Smart Caching | Deduplicates similar requests using request embeddings, reducing the number of calls made with your gemini-api-key by up to 40% |
| Multi-Account Switching | Manage multiple gemini-api-key values across different tiers (gemini-api-free, gemini-pro-1-5) |
| Export & Logging | Full conversation history export in JSON, Markdown, and WARC formats |
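The Smart Caching row describes deduplication by request embeddings. A minimal sketch of that idea, assuming embeddings are produced elsewhere (by an embedding model not shown here) and that a cosine-similarity threshold of 0.95 counts as "the same request"; both the threshold and the `SemanticCache` class are hypothetical. The 300-second TTL matches `cache_ttl_seconds` in the example profile.

```python
import math
import time
from typing import Callable, List, Optional, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class SemanticCache:
    """Reuse a stored response when a new request's embedding is close to a cached one."""

    def __init__(self, ttl_seconds: float = 300, threshold: float = 0.95,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.threshold = threshold
        self.clock = clock  # injectable so expiry can be tested deterministically
        self._entries: List[Tuple[float, List[float], str]] = []  # (stored_at, embedding, response)

    def get(self, embedding: List[float]) -> Optional[str]:
        now = self.clock()
        # Evict expired entries before searching.
        self._entries = [e for e in self._entries if now - e[0] < self.ttl]
        best, best_score = None, self.threshold
        for _, cached_emb, response in self._entries:
            score = cosine(cached_emb, embedding)
            if score >= best_score:  # keep the most similar entry above the threshold
                best, best_score = response, score
        return best

    def put(self, embedding: List[float], response: str) -> None:
        self._entries.append((self.clock(), embedding, response))
```

On a hit, the cached response is returned and no API call is made. The linear scan is fine for a small desktop cache; a larger cache would want an approximate nearest-neighbor index instead.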
Here's the complete mapping of supported Gemini model variants inside Gemini Desktop:
| Internal Name | External Endpoint | Primary Use |
|---|---|---|
| `gemini-15-pro` | Gemini 1.5 Pro | Long-context reasoning, code generation, document analysis |
| `gemini-3` | Gemini 2.5 Pro (experimental) | Cutting-edge multimodal fusion, mathematical reasoning |
| `gemini-vision` | Gemini Pro Vision | Image-to-text, video frame analysis, chart interpretation |
| `gemini-live-api` | Gemini Live API | Bidirectional audio/video streaming, real-time conversation |
| `gemini-image-gen` | Imagen / Gemini Image Gen | Text-to-image generation, style transfer, inpainting |
| `gemini-web` | Gemini Web (internal) | Web search integration, live URL summarization |
| `gemini-bot` | Gemini API (chat) | Conversational chatbot, FAQ automation, tutoring |
Each model can be called independently or chained together in a pipeline. For example: "Take this screenshot (gemini-vision), describe it, then create a mascot based on the description (gemini-image-gen)."
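Chaining can be pictured as plain function composition, where each step consumes the previous step's output. This is a minimal sketch, assuming every step maps text to text; `describe_screenshot` and `generate_mascot` are hypothetical stand-ins for real `gemini-vision` and `gemini-image-gen` calls and return placeholder strings.

```python
from typing import Callable, List

# A step takes the previous step's output and returns new output.
Step = Callable[[str], str]


def run_pipeline(initial_input: str, steps: List[Step]) -> str:
    """Feed the output of each step into the next, in order."""
    result = initial_input
    for step in steps:
        result = step(result)
    return result


# Hypothetical stand-ins; a real client would call the Gemini API here.
def describe_screenshot(path: str) -> str:
    # would send the image at `path` to gemini-vision
    return f"description of {path}"


def generate_mascot(description: str) -> str:
    # would send an image-generation prompt to gemini-image-gen
    return f"mascot.png (prompt: mascot based on {description})"


if __name__ == "__main__":
    print(run_pipeline("screen.png", [describe_screenshot, generate_mascot]))
```

Keeping steps as interchangeable callables means a pipeline is just a list, so reordering or inserting a step (for example, a translation step) needs no changes to `run_pipeline`.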
Below is a working .gemini-desktop/config.yaml snippet. It configures your default persona, model routing rules, and API credentials. The key is read from the `GEMINI_API_KEY` environment variable, so set that variable to your own gemini-api-key rather than pasting the key into the file.
```yaml
profile:
  name: "creative-coder"
  version: "2026.1.0"

defaults:
  primary_model: "gemini-15-pro"
  fallback_model: "gemini-3"

api:
  gemini_api_key: "${GEMINI_API_KEY}"  # Set as environment variable
  endpoint_base: "https://generativelanguage.googleapis.com/v1beta"

routing:
  vision_requests: "gemini-vision"
  code_requests: "gemini-code-assist"
  streaming_requests: "gemini-live-api"
  image_gen_requests: "gemini-image-gen"

multiplatform:
  enable_local_cache: true
  cache_ttl_seconds: 300

integration:
  openai_fallback: false  # Set true to use OpenAI API as backup
  claude_fallback: false  # Set true to use Claude API as backup

ui:
  theme: "neo-morph"
  language: "en"  # Interface language; see the Multilingual Engine notes
```

You can create multiple profiles for different use cases: "research-mode" for long-context work, "lightning-mode" for real-time streaming, and "privacy-mode" with caching disabled.
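One way to implement multiple profiles is to store each one as a small overlay on a base profile and resolve `${VAR}` placeholders at load time. The sketch below assumes the YAML has already been parsed into a dict (for example with PyYAML, not shown) and that overlays are deep-merged; the merge strategy and the `PRIVACY_MODE` overlay are hypothetical illustrations, not the application's documented loader.

```python
import copy
import os

# Parsed subset of the example profile (YAML parsing itself is not shown).
BASE = {
    "defaults": {"primary_model": "gemini-15-pro", "fallback_model": "gemini-3"},
    "api": {"gemini_api_key": "${GEMINI_API_KEY}"},
    "multiplatform": {"enable_local_cache": True, "cache_ttl_seconds": 300},
}

# Hypothetical overlay for a "privacy-mode" profile: same settings, caching off.
PRIVACY_MODE = {"multiplatform": {"enable_local_cache": False}}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where `override` values win, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def expand_env(node):
    """Recursively replace ${VAR} placeholders in string values with environment variables."""
    if isinstance(node, dict):
        return {k: expand_env(v) for k, v in node.items()}
    if isinstance(node, list):
        return [expand_env(v) for v in node]
    if isinstance(node, str):
        return os.path.expandvars(node)  # unknown variables are left unchanged
    return node


def load_profile(overlay: dict) -> dict:
    return expand_env(deep_merge(BASE, overlay))
```

Because `deep_merge` copies instead of mutating, the base profile stays intact and several overlays can be resolved from it side by side.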
Launch Gemini Desktop from your terminal with a single command. The application exposes a command-line interface for headless operation, ideal for server deployments or automation pipelines.
```bash
gemini-desktop --profile creative-coder --input "Analyze this screenshot and generate a logo based on the dominant colors"
```

Flags and parameters:
| Flag | Description |
|---|---|
| `--profile` | Path or name of profile configuration |
| `--input` | Text prompt or path to media file |
| `--model` | Override the default model (e.g., gemini-3) |
| `--stream` | Enable live streaming mode (audio/video) |
| `--export` | Output format for results (json, md, warc) |
| `--watch` | Watch a directory for new files and auto-process |
Example with streaming and export:
```bash
gemini-desktop --profile translator --model gemini-live-api --stream --export json --input "Translate this live conversation to Japanese"
```

The console output shows real-time token streaming, latency metrics, and confidence scores, all configurable via the profile.
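For scripting around the CLI, it can help to see the documented flags as a parser definition. This is a sketch using Python's standard `argparse`, not the application's actual source; restricting `--export` to the three listed formats and the `DIR` metavar for `--watch` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags from the table above (sketch; not the real gemini-desktop source)."""
    parser = argparse.ArgumentParser(prog="gemini-desktop")
    parser.add_argument("--profile", help="Path or name of profile configuration")
    parser.add_argument("--input", help="Text prompt or path to media file")
    parser.add_argument("--model", help="Override the default model (e.g., gemini-3)")
    parser.add_argument("--stream", action="store_true",
                        help="Enable live streaming mode (audio/video)")
    parser.add_argument("--export", choices=["json", "md", "warc"],
                        help="Output format for results")
    parser.add_argument("--watch", metavar="DIR",
                        help="Watch a directory for new files and auto-process")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Parsing the streaming example above yields `profile="translator"`, `stream=True`, and `export="json"`; an unsupported export format such as `--export pdf` is rejected by `argparse` with a usage error.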
Gemini Desktop is a truly multiplatform application. Below is the compatibility matrix for 2026:
| Operating System | Emoji | Status | Notes |
|---|---|---|---|
| Windows 11/10 | 🪟 | ✅ Full Support | Native WinRT integration, GPU acceleration |
| macOS Ventura+ | 🍎 | ✅ Full Support | Metal API for vision, Core Audio for Live API |
| Linux (Ubuntu 22.04+) | 🐧 | ✅ Full Support | X11/Wayland, PipeWire for audio |
| ChromeOS | 🌐 | ⚠️ Limited | Limited local file system access |
| Android (via Termux) | 🤖 | ⚠️ Experimental | Requires Android 13+ |
| iOS (via a-Shell) | 📱 | ❌ Not Supported | No CoreML path yet for Gemini Live API |
Same experience across platforms: On every supported operating system, the responsive UI adapts to screen resolution, DPI scaling, and input method (touch, pen, keyboard, or voice). The multilingual engine, which supports 48 languages, ensures that interface labels, error messages, and help text appear in your chosen locale.
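How the engine picks a string when a locale is incomplete is not described here. A common technique, swapped in for illustration, is a fallback chain from the most specific tag to the language to a default. The catalogs below are hypothetical, and the default of `"en"` matches `ui.language` in the example profile.

```python
from typing import Dict, List

# Hypothetical message catalogs; the real engine covers 48 languages.
CATALOGS: Dict[str, Dict[str, str]] = {
    "en": {"settings": "Settings", "retry": "Retry"},
    "ja": {"settings": "設定"},
    "pt": {"settings": "Configurações", "retry": "Tentar novamente"},
}


def candidate_locales(locale: str, default: str = "en") -> List[str]:
    """Most specific first: 'pt-BR' yields ['pt-BR', 'pt', 'en']."""
    parts = locale.replace("_", "-").split("-")
    chain: List[str] = []
    for i in range(len(parts), 0, -1):
        tag = "-".join(parts[:i])
        if tag not in chain:
            chain.append(tag)
    if default not in chain:
        chain.append(default)
    return chain


def translate(key: str, locale: str) -> str:
    """Return the first translation found along the fallback chain."""
    for tag in candidate_locales(locale):
        message = CATALOGS.get(tag, {}).get(key)
        if message is not None:
            return message
    return key  # last resort: show the message key itself
```

With these catalogs, `translate("retry", "ja")` falls back to the English "Retry" because the Japanese catalog lacks that key, so the UI never shows a blank label.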
Gemini Desktop bridges the gap between Google's ecosystem and other frontier models. You can configure fallback chains or ensemble voting using OpenAI API and Claude API.
```yaml
integration:
  openai:
    enabled: true
    api_key: "${OPENAI_API_KEY}"
    fallback_on: ["rate_limit", "safety_block"]
    ensemble: false  # Set true for output voting between Gemini and OpenAI
```

When ensemble is enabled, the same prompt is sent to both Gemini and OpenAI. The output is selected based on a confidence score or user-defined priority.
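The selection rule can be sketched as "highest confidence wins, user priority breaks ties". How the confidence score is computed is not specified, so this sketch assumes each candidate already carries a numeric score; the `Candidate` type and the priority map are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    provider: str      # e.g. "gemini" or "openai"
    text: str
    confidence: float  # assumed numeric score; its computation is out of scope here


def select_output(candidates: List[Candidate], priority: Dict[str, int]) -> Candidate:
    """Pick the highest-confidence output; on a tie, the lower priority number wins."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    # Providers missing from `priority` rank below every listed provider.
    return max(
        candidates,
        key=lambda c: (c.confidence, -priority.get(c.provider, len(priority))),
    )
```

Sorting on a `(confidence, -priority)` tuple keeps the rule in one expression; a purely priority-based policy would simply drop the confidence term from the key.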
```yaml
integration:
  claude:
    enabled: true
    api_key: "${ANTHROPIC_API_KEY}"
    model: "claude-3-opus-20240229"
    use_for: ["code_review", "ethical_audit"]
```

Claude is particularly useful for code review gatekeeping: before any gemini-code-assist output is applied to your project, Claude can perform a secondary safety and correctness audit.
Why this matters: By integrating OpenAI API and Claude API alongside the gemini-api-free tier, you create a multi-model safety net. If one provider experiences downtime or a safety block, the orchestrator seamlessly switches to another, all without interrupting your workflow.
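A fallback chain driven by the `fallback_on` list can be sketched as below, assuming each provider call raises an error that carries a machine-readable reason such as `"rate_limit"` or `"safety_block"`. `ProviderError` and the provider callables are hypothetical; errors whose reason is not listed are re-raised rather than silently skipped.

```python
from typing import Callable, Iterable, List, Optional, Tuple


class ProviderError(Exception):
    """Hypothetical error type carrying a reason such as "rate_limit"."""

    def __init__(self, reason: str):
        super().__init__(reason)
        self.reason = reason


Provider = Tuple[str, Callable[[str], str]]  # (name, call)


def call_with_fallback(prompt: str, providers: List[Provider],
                       fallback_on: Iterable[str]) -> Tuple[str, str]:
    """Try providers in order; move on only for reasons listed in `fallback_on`."""
    allowed = set(fallback_on)
    last_error: Optional[ProviderError] = None
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as err:
            if err.reason not in allowed:
                raise  # e.g. an invalid key should surface, not be masked
            last_error = err
    raise RuntimeError("all providers failed") from last_error
```

Returning the provider name alongside the response lets the UI show which model actually answered after a switch.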
The interface is built on a responsive grid system that scales from a 720p monitor to a 4K ultrawide. Key design principles:
- Modal-Adaptive Layout: When using gemini-vision, the UI automatically expands the image panel. When using gemini-live-api, it switches to a streaming waveform view.
- Dark/Light/Neo-Morph Themes: All three themes are fully accessible, with WCAG 2.2 AA compliance.
- Multilingual Engine: 48 languages including English, Japanese, Arabic, Hindi, Swahili, and Catalan. The engine uses Gemini's own translation capabilities for real-time interface localization.
- 24/7 Autonomous Support Layer: A built-in support agent (powered by gemini-bot) answers questions about the application itself β no internet required. It indexes the entire README, changelog, and FAQ into its context window.
Gemini Desktop includes a self-healing support system that operates completely offline. If you encounter an error:
- The support agent automatically captures the crash context (model used, inputs, error code).
- It searches its internal knowledge base (derived from this repository's documentation).
- It proposes three solutions ranked by likelihood, and can execute safe fixes autonomously (e.g., rotating API keys, flushing caches).
This layer ensures that 24/7 assistance is always available, even during internet outages. The support agent itself runs on a lightweight instance of gemini-pro-1-5 optimized for local inference.
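The text does not say how the knowledge base is searched or how "likelihood" is estimated. The sketch below substitutes a deliberately simple technique, Jaccard token overlap between the crash context and each entry, and returns up to three fixes. The `KnowledgeEntry` type and scoring are hypothetical, and the autonomous-fix step is not shown.

```python
import re
from dataclasses import dataclass
from typing import List, Set


@dataclass
class KnowledgeEntry:
    title: str
    text: str
    fix: str


def tokens(s: str) -> Set[str]:
    """Lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9_]+", s.lower()))


def propose_fixes(crash_context: str, kb: List[KnowledgeEntry], k: int = 3) -> List[str]:
    """Rank entries by Jaccard similarity to the crash context and return the top-k fixes."""
    query = tokens(crash_context)
    scored = []
    for entry in kb:
        doc = tokens(entry.title + " " + entry.text)
        union = query | doc
        score = len(query & doc) / len(union) if union else 0.0
        scored.append((score, entry.fix))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop entries with no overlap at all rather than padding the list.
    return [fix for score, fix in scored[:k] if score > 0]
```

Given a context such as `"model=gemini-15-pro error 429 quota exceeded"`, an entry mentioning "error 429 quota exceeded" ranks first. A production agent would more likely use embeddings, but token overlap keeps the sketch dependency-free and fully offline.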
This repository is crafted for discoverability across the Gemini ecosystem. The following phrases are integrated contextually throughout the codebase, documentation, and metadata:
- `gemini-15-pro`: the long-context reasoning backbone
- `gemini-3`: experimental multimodal fusion
- `gemini-api-free`: entry-level access tier
- `gemini-api-key`: your authentication credential
- `gemini-app`: the desktop application context
- `gemini-bot`: conversational automation
- `gemini-code-assist`: developer productivity module
- `gemini-desktop-app`: the full client
- `gemini-image-gen`: visual creativity pipeline
- `gemini-live-api`: real-time streaming
- `gemini-multiplatform`: cross-OS operation
- `gemini-pro-1-5`: the production-ready model
- `gemini-pro-api`: API access to Pro tier
- `gemini-vision`: computer vision capabilities
- `gemini-web`: internet-connected queries
- `geminiapi`: developer SDK usage
Each tag corresponds to a specific module, configuration path, or example in the repository, making it easy for search engines and developers to find exactly what they need.
This project is licensed under the MIT License. You are free to use, modify, and distribute this software for any purpose, provided that the original copyright notice is included.
View the full MIT License text
Copyright (c) 2026
Gemini Desktop is an independent open-source project. It is not affiliated with, endorsed by, or sponsored by Google LLC, OpenAI, or Anthropic.
- API Key Responsibility: Users are solely responsible for their own `gemini-api-key` and for complying with Google's terms of service. The `gemini-api-free` tier has usage limits; exceeding them may result in rate-limit errors or, if billing is enabled on your project, billing charges.
- No Warranty: This software is provided "as is," without warranty of any kind. The authors are not liable for any damages arising from the use of this software.
- Data Privacy: All API calls are made directly from your machine to Google's servers. No intermediary cloud service stores your prompts or responses. The local cache is encrypted at rest.
- Third-Party Integration: When OpenAI API or Claude API integration is enabled, data is sent to those respective services. Review their privacy policies before enabling these features.
- Not for Critical Systems: Gemini Desktop is designed for creative, educational, and productivity use. It is not validated for medical, legal, or financial decision-making.
Built with 🔥 for the Gemini ecosystem, where every model is a tool, and every tool is a portal.