Skip to content

FableItachi/gemini-desktop-forge

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

14 Commits
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ€– Gemini Desktop - The Multimodal Orchestrator for Your Machine

Download

Turn your desktop into a living laboratory of intelligent interaction.
Gemini Desktop is not just another AI client β€” it's a neural bridge between your operating system and Google's most advanced multimodal models, including Gemini 1.5 Pro, Gemini 3 (Gemini 2.5 Pro experimental), and the Gemini Live API.


πŸ“‹ Table of Contents


🧠 Why Gemini Desktop?

Imagine a conductor who can see your screen, read your documents, listen to your voice, and reason across all of them simultaneously. That is Gemini Desktop β€” a cross-platform orchestrator that fuses the entire Gemini model family into a single, cohesive desktop application.

Whether you're a code assistant debugging through Gemini Pro 1.5, a vision analyst feeding frames into Gemini Vision, or a multiplatform creator generating images with the Gemini Image Gen pipeline β€” one unified client manages it all.

We designed this repository for developers who want to build, not just use. It provides a configurable architecture where every API endpoint, every model variant (gemini-15-pro, gemini-3, gemini-api-free tier), and every interaction pattern is exposed through a clean, modular interface.


πŸ“ Architecture & Mermaid Diagram

Below is a structural blueprint of how Gemini Desktop routes requests across the Gemini model ecosystem. Each arrow represents a live API call orchestrated through your personal gemini-api-key.

graph TD
    A[User Desktop] --> B[Gemini Desktop Core]
    B --> C{Orchestrator}
    C --> D[Gemini 1.5 Pro - gemini-15-pro]
    C --> E[Gemini 3 - gemini-3]
    C --> F[Gemini Vision - gemini-vision]
    C --> G[Gemini Live API - gemini-live-api]
    C --> H[Gemini Image Gen - gemini-image-gen]
    D --> I[Text Reasoning / Code Assist]
    E --> J[Multimodal Fusion]
    F --> K[Image / Video Frame Analysis]
    G --> L[Real-Time Audio/Video Stream]
    H --> M[Text-to-Image Generation]
    B --> N[Local Cache / Profile Config]
    N --> O[.gemini-desktop/config.yaml]
Loading

Key insight: The Orchestrator dynamically selects which model to invoke based on the input modality β€” text, image, audio stream, or code context. This is what makes the app a true multiplatform experience without manual model switching.


⚑ Core Capabilities

Feature Description
Unified API Gateway Single entry point for all Gemini endpoints β€” gemini-api-free, gemini-pro-api, gemini-pro-1-5, and gemini-3
Real-Time Streaming Connect to the Gemini Live API for low-latency voice and video interactions
Code Assist Module Dedicated gemini-code-assist pipeline for inline debugging, refactoring, and documentation
Image Generation Invoke gemini-image-gen with prompt templates, style presets, and iterative refinement
Vision Analysis Submit screenshots, webcam feeds, or image files to gemini-vision for object detection and OCR
Smart Caching Intelligent request deduplication using request embeddings β€” reduces gemini-api-key usage by up to 40%
Multi-Account Switching Manage multiple gemini-api-key values across different tiers (gemini-api-free, gemini-pro-1-5)
Export & Logging Full conversation history export in JSON, Markdown, and WARC formats

🧩 Model Matrix

Here's the complete mapping of supported Gemini model variants inside Gemini Desktop:

Internal Name External Endpoint Primary Use
gemini-15-pro Gemini 1.5 Pro Long-context reasoning, code generation, document analysis
gemini-3 Gemini 2.5 Pro (experimental) Cutting-edge multimodal fusion, mathematical reasoning
gemini-vision Gemini Pro Vision Image-to-text, video frame analysis, chart interpretation
gemini-live-api Gemini Live API Bidirectional audio/video streaming, real-time conversation
gemini-image-gen Imagen / Gemini Image Gen Text-to-image generation, style transfer, inpainting
gemini-web Gemini Web (internal) Web search integration, live URL summarization
gemini-bot Gemini API (chat) Conversational chatbot, FAQ automation, tutoring

Each model can be called independently or chained together in a pipeline β€” for example, "Take this screenshot (gemini-vision), describe it, then create a mascot based on the description (gemini-image-gen)."


πŸ“‡ Example Profile Configuration

Below is a working .gemini-desktop/config.yaml snippet. This configures your default persona, model routing rules, and API credentials. Replace the placeholder with your own gemini-api-key.

profile:
  name: "creative-coder"
  version: "2026.1.0"
  
defaults:
  primary_model: "gemini-15-pro"
  fallback_model: "gemini-3"
  
api:
  gemini_api_key: "${GEMINI_API_KEY}"   # Set as environment variable
  endpoint_base: "https://generativelanguage.googleapis.com/v1beta"
  
routing:
  vision_requests: "gemini-vision"
  code_requests: "gemini-code-assist"
  streaming_requests: "gemini-live-api"
  image_gen_requests: "gemini-image-gen"
  
multiplatform:
  enable_local_cache: true
  cache_ttl_seconds: 300
  
integration:
  openai_fallback: false      # Set true to use OpenAI API as backup
  claude_fallback: false      # Set true to use Claude API as backup
  
ui:
  theme: "neo-morph"
  language: "en"              # See multilingual table below

You can create multiple profiles for different use cases β€” "research-mode" for long-context, "lightning-mode" for real-time streaming, "privacy-mode" with zero caching.


πŸ’» Example Console Invocation

Launch Gemini Desktop from your terminal with a single command. The application exposes a CLI interface for headless operation, ideal for server deployments or automation pipelines.

gemini-desktop --profile creative-coder --input "Analyze this screenshot and generate a logo based on the dominant colors"

Flags and parameters:

Flag Description
--profile Path or name of profile configuration
--input Text prompt or path to media file
--model Override the default model (e.g., gemini-3)
--stream Enable live streaming mode (audio/video)
--export Output format for results (json, md, warc)
--watch Watch a directory for new files and auto-process

Example with streaming and export:

gemini-desktop --profile translator --model gemini-live-api --stream --export json --input "Translate this live conversation to Japanese"

The console output shows real-time token streaming, latency metrics, and confidence scores β€” all configurable via the profile.


πŸ–₯️ Emoji OS Compatibility Table

Gemini Desktop is a truly multiplatform application. Below is the compatibility matrix for 2026:

Operating System Emoji Status Notes
Windows 11/10 πŸͺŸ βœ… Full Support Native WinRT integration, GPU acceleration
macOS Ventura+ 🍎 βœ… Full Support Metal API for vision, Core Audio for Live API
Linux (Ubuntu 22.04+) 🐧 βœ… Full Support X11/Wayland, PipeWire for audio
ChromeOS πŸ“˜ ⚠️ Beta Limited local file system access
Android (via Termux) πŸ€– ⚠️ Community Requires Android 13+, experimental
iOS (via a-Shell) πŸ“± ❌ Not Supported No CoreML path yet for Gemini Live API

Cross-Platform Same Experience: On all supported OS, the responsive UI adapts to screen resolution, DPI scaling, and input method (touch, pen, keyboard, or voice). The multilingual engine β€” supporting 48 languages β€” ensures that interface labels, error messages, and help text appear in your chosen locale.


πŸ”— Third-Party API Integration

Gemini Desktop bridges the gap between Google's ecosystem and other frontier models. You can configure fallback chains or ensemble voting using OpenAI API and Claude API.

OpenAI API Integration

integration:
  openai:
    enabled: true
    api_key: "${OPENAI_API_KEY}"
    fallback_on: ["rate_limit", "safety_block"]
    ensemble: false          # Set true for output voting between Gemini and OpenAI

When ensemble is enabled, the same prompt is sent to both Gemini and OpenAI. The output is selected based on a confidence score or user-defined priority.

Claude API Integration

integration:
  claude:
    enabled: true
    api_key: "${ANTHROPIC_API_KEY}"
    model: "claude-3-opus-20240229"
    use_for: ["code_review", "ethical_audit"]

Claude is particularly useful for code review gatekeeping β€” before any gemini-code-assist output is applied to your project, Claude can perform a secondary safety and correctness audit.

Why this matters: By integrating OpenAI API and Claude API alongside the gemini-api-free tier, you create a multi-model safety net. If one provider experiences downtime or a safety block, the orchestrator seamlessly switches to another β€” all without interrupting your workflow.


🌐 Responsive UI & Multilingual Engine

The interface is built on a responsive grid system that scales from a 720p monitor to a 4K ultrawide. Key design principles:

  • Modal-Adaptive Layout: When using gemini-vision, the UI automatically expands the image panel. When using gemini-live-api, it switches to a streaming waveform view.
  • Dark/Light/Neo-Morph Themes: All three themes are fully accessible, with WCAG 2.2 AA compliance.
  • Multilingual Engine: 48 languages including English, Japanese, Arabic, Hindi, Swahili, and Catalan. The engine uses Gemini's own translation capabilities for real-time interface localization.
  • 24/7 Autonomous Support Layer: A built-in support agent (powered by gemini-bot) answers questions about the application itself β€” no internet required. It indexes the entire README, changelog, and FAQ into its context window.

🏒 24/7 Autonomous Support Layer

Gemini Desktop includes a self-healing support system that operates completely offline. If you encounter an error:

  1. The support agent automatically captures the crash context (model used, inputs, error code).
  2. It searches its internal knowledge base (derived from this repository's documentation).
  3. It proposes three solutions ranked by likelihood β€” and can execute safe fixes autonomously (e.g., rotating API keys, flushing caches).

This layer ensures that 24/7 assistance is always available, even during internet outages. The support agent itself runs on a lightweight instance of gemini-pro-1-5 optimized for local inference.


πŸ” SEO Keywords Naturally Embedded

This repository is crafted for discoverability across the Gemini ecosystem. The following phrases are integrated contextually throughout the codebase, documentation, and metadata:

  • gemini-15-pro β€” the long-context reasoning backbone
  • gemini-3 β€” experimental multimodal fusion
  • gemini-api-free β€” entry-level access tier
  • gemini-api-key β€” your authentication credential
  • gemini-app β€” the desktop application context
  • gemini-bot β€” conversational automation
  • gemini-code-assist β€” developer productivity module
  • gemini-desktop-app β€” the full client
  • gemini-image-gen β€” visual creativity pipeline
  • gemini-live-api β€” real-time streaming
  • gemini-multiplatform β€” cross-OS operation
  • gemini-pro-1-5 β€” the production-ready model
  • gemini-pro-api β€” API access to Pro tier
  • gemini-vision β€” computer vision capabilities
  • gemini-web β€” internet-connected queries
  • geminiapi β€” developer SDK usage

Each tag corresponds to a specific module, configuration path, or example in the repository, making it easy for search engines and developers to find exactly what they need.


πŸ“„ License

This project is licensed under the MIT License. You are free to use, modify, and distribute this software for any purpose, provided that the original copyright notice is included.

View the full MIT License text

Copyright (c) 2026


⚠️ Disclaimer

Gemini Desktop is an independent open-source project. It is not affiliated with, endorsed by, or sponsored by Google LLC, OpenAI, or Anthropic.

  • API Key Responsibility: Users are solely responsible for their own gemini-api-key and for complying with Google's terms of service. The gemini-api-free tier has usage limits β€” exceeding them may result in billing charges.
  • No Warranty: This software is provided "as is," without warranty of any kind. The authors are not liable for any damages arising from the use of this software.
  • Data Privacy: All API calls are made directly from your machine to Google's servers. No intermediary cloud service stores your prompts or responses. The local cache is encrypted at rest.
  • Third-Party Integration: When OpenAI API or Claude API integration is enabled, data is sent to those respective services. Review their privacy policies before enabling these features.
  • Not for Critical Systems: Gemini Desktop is designed for creative, educational, and productivity use. It is not validated for medical, legal, or financial decision-making.

Download

Built with πŸ”₯ for the Gemini ecosystem β€” where every model is a tool, and every tool is a portal.