Ottomate

Self-hosted multi-model AI agent platform.
Describe a goal — Ottomate plans, codes, browses, generates media, builds apps, and orchestrates 190+ services autonomously.

Built with Next.js 15 · Claude · GPT · Gemini · Replicate · Luma · FLUX · HuggingFace

Created by Dan Sheils

Quick Start · Features · Screenshots · Pages · Models · Connectors · Architecture


What is Ottomate?

Ottomate is a self-hosted, multi-model AI agent platform built with Next.js 15.
Describe a goal in plain English — the agent plans multi-step workflows, writes and executes code, searches the web, talks to 190+ external services, generates images, video, and audio, and saves every artifact it produces.

It ships as a single npm install with zero external infrastructure. A SQLite database is created on first launch.

Key capabilities:

  • Autonomous task execution — plans, reasons, and iterates with tool use until the goal is met
  • Computer Control — full desktop automation via Anthropic's native computer use tools: screenshots, mouse, keyboard, bash, and file editing with visual feedback
  • Multi-model orchestration — Claude Opus/Sonnet, GPT-4o/4.1, Gemini 2.0, Perplexity Sonar, OpenRouter, with automatic failover
  • Code execution — runs Python, Node.js, and shell scripts in-process with captured output
  • Web browsing — searches (Brave, Perplexity, Serper, Tavily), scrapes pages, and automates browsers via Playwright
  • 190+ connectors — Gmail, Slack, GitHub, Jira, Stripe, Notion, HubSpot, WhatsApp, and many more
  • Nova AI creative suite — universal SmartBar that auto-detects output type (image, video, audio, 3D, text), searches Replicate + HuggingFace models live, generates media, and offers post-generation actions (animate, upscale, make 3D). Includes image generation (FLUX, DALL-E 3 + 25 styles), video generation (Minimax, Kling, Wan, Seedance), AI soundtracks (MusicGen), speech (12 voices, 2 providers), image editing (7 AI operations with canvas masking), and a unified gallery
  • Dreamscape Video Studio — 17-mode AI creative studio built around Luma Dream Machine (Ray 2, Ray Flash 2, Photon 1, Photon Flash 1) with storyboards, 20 camera presets, continuity library, and an AI Director with command chain orchestration
  • AI media generation — Luma Dream Machine (video/image), Replicate (1000s of models), FLUX, DALL-E 3, ElevenLabs (voice), MusicGen (music)
  • Sub-agents — spawns specialized child agents for parallel work
  • Persistent memory — key-value store the agent reads/writes across tasks
  • Scheduled tasks — cron expressions, intervals, daily/weekly recurrence
  • Skills marketplace — 270+ pre-built skills across 10 categories
  • Voice input — dictate tasks via Whisper or browser speech recognition
  • Slash commands — /image, /research, /code, /email, /video, /scrape, and more
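
As a rough illustration of the slash-command capability listed above, the sketch below splits a prompt into a command and the remaining goal text. The helper name `parseSlashCommand` and the `ParsedPrompt` shape are hypothetical, and only the six commands named in this README are recognized; this is not Ottomate's actual parser.

```typescript
// Hypothetical result shape: a recognized command plus the remaining goal text.
interface ParsedPrompt {
  command: string | null; // e.g. "image", or null for a plain prompt
  text: string;
}

// Only the commands named in this README; the full command list is not reproduced here.
const KNOWN_COMMANDS = new Set(["image", "research", "code", "email", "video", "scrape"]);

function parseSlashCommand(input: string): ParsedPrompt {
  const trimmed = input.trim();
  // A leading "/word", optionally followed by whitespace and the rest of the prompt.
  const match = /^\/([a-z]+)(?:\s+([\s\S]*))?$/.exec(trimmed);
  if (!match || !KNOWN_COMMANDS.has(match[1])) {
    return { command: null, text: trimmed };
  }
  return { command: match[1], text: (match[2] ?? "").trim() };
}
```

Unknown commands fall through as plain prompts, so a typo such as `/imgae` would still reach the agent as ordinary text.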

Quick Start

Prerequisites

| Requirement | Notes |
| --- | --- |
| Node.js 18+ | node -v to check |
| Anthropic API key | console.anthropic.com — this is the only required key |

Install & run

git clone https://github.com/RhythrosaLabs/otto-mate-2.git
cd otto-mate-2
npm install

# Add your API key (minimum required)
echo "ANTHROPIC_API_KEY=sk-ant-..." >> .env.local

# Skip login in local dev (no credentials required)
echo "DISABLE_AUTH=true" >> .env.local

# Start Next.js only
npm run dev

# — or — start all services (bolt-diy App Builder + code-server Coding Companion)
npm run dev:all

Open http://localhost:3000 — the app loads directly when DISABLE_AUTH=true is set.

Optional keys unlock more models and features. See Environment Variables below.

Production auth: remove DISABLE_AUTH and set NEXTAUTH_SECRET (any random string, e.g. openssl rand -hex 32) to enable JWT-based login.


Features

Task Engine

The core loop: you describe a goal → the agent creates a plan → executes steps (tool calls, code, API requests, sub-agents) → streams results back in real time. Tasks support follow-up chat, file attachments, voice input, and slash commands.
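
The architecture section lists task routes with SSE streaming, so a client that consumes the real-time results might buffer and parse Server-Sent Events frames roughly as follows. This is a minimal sketch of the standard `data:` framing only; the function name and the JSON payload shape used in the usage note are assumptions, not Ottomate's wire format.

```typescript
// Parse complete Server-Sent Events frames out of a text buffer.
// Frames are separated by a blank line; anything after the last separator
// is returned as `rest` so the caller can prepend it to the next chunk.
// Note: this sketch assumes "\n" line endings and ignores "event:" and "id:" fields.
function parseSseBuffer(buffer: string): { events: string[]; rest: string } {
  const frames = buffer.split("\n\n");
  const rest = frames.pop() ?? "";
  const events: string[] = [];
  for (const frame of frames) {
    const dataLines = frame
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, "")); // drop "data:" and one optional space
    if (dataLines.length > 0) events.push(dataLines.join("\n"));
  }
  return { events, rest };
}
```

For example, feeding it `data: {"step":1}` followed by a blank line and a half-received second frame yields one event and keeps the partial frame in `rest`.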

Multi-Model Failover

The agent picks the best model automatically or you choose manually. If a provider is down or rate-limited, it fails over through the chain: Anthropic → OpenAI → Google → OpenRouter (DeepSeek) → Perplexity with exponential backoff.
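
The failover behaviour described above can be sketched as a loop over the provider chain with a capped exponential delay between retries. The retry count, the delay values, and the `callModel` and `sleep` parameters are assumptions made for illustration; the real logic lives in `model-fallback.ts` and may differ.

```typescript
type Provider = "anthropic" | "openai" | "google" | "openrouter" | "perplexity";

// Order taken from the chain described above.
const FAILOVER_CHAIN: Provider[] = ["anthropic", "openai", "google", "openrouter", "perplexity"];

// Exponential backoff: 500 ms, 1 s, 2 s, ... capped at 8 s (all values are assumptions).
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

type CallModel = (provider: Provider, prompt: string) => Promise<string>;
type Sleep = (ms: number) => Promise<void>;

async function completeWithFailover(
  prompt: string,
  callModel: CallModel, // hypothetical: performs the actual provider request
  sleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
  retriesPerProvider = 2,
): Promise<{ provider: Provider; text: string }> {
  let lastError: unknown = new Error("no providers configured");
  for (const provider of FAILOVER_CHAIN) {
    for (let attempt = 0; attempt < retriesPerProvider; attempt++) {
      try {
        return { provider, text: await callModel(provider, prompt) };
      } catch (err) {
        lastError = err;
        // Back off before retrying the same provider; move on once retries are exhausted.
        if (attempt < retriesPerProvider - 1) await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}
```

Injecting `sleep` keeps the function testable without real delays; a production version would probably also distinguish rate-limit errors from permanent failures.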

Dreamscape Video Studio

A 17-mode AI creative studio built around Luma Dream Machine (Ray 2, Ray Flash 2, Photon 1, Photon Flash 1) with Replicate model support and MusicGen/Bark audio generation. Organize work into storyboards, artboards, and moodboards — each containing individual shots you can generate, extend, remix, and chain together.

Generation modes: text-to-video, image-to-video, extend, reverse-extend, interpolate, text-to-image, image reference, character reference (persistent identity across shots), style reference, modify video, modify video with keyframes, modify image, reframe (change aspect ratio of existing media), music generation (MusicGen), sound effects (Bark), voiceover, and lip-sync.

Production controls: 20 camera motion presets (pan, zoom, orbit, crane, dolly, tracking, handheld, static, arc, dutch tilt, whip pan — each with directional variants), 9 modify intensity levels (adhere → flex → reimagine, each with 3 sub-levels), 4 resolutions (540p → 4K), 7 aspect ratios (1:1, 3:4, 4:3, 9:16, 16:9, 9:21, 21:9), 5s/9s/10s durations, HDR output (EXR), loop toggle, batch generation (up to 4 variants), and a draft/hi-fi phase workflow with auto-upgrade from Flash to full models.

Auto-model intelligence: Recommends the optimal model per generation mode — e.g., Flash models for fast text-to-video drafts, full Ray 2 / Photon 1 for character consistency, style transfer, and modify operations.

AI Director: A built-in chat agent that interprets natural language into multi-step command chains with dependency ordering, continuity sheets (style anchors, character references, setting references), concept pill word-swapping for rapid prompt iteration, and creative query "Enhance" mode for prompt variations. Supports parallel step execution, smart retry with model downgrade on access errors, and progress tracking per step.

Additional features: Shot tagging, likes, and bookmarks for organization. Annotation overlay system (arrows, rectangles, text labels) that feeds spatial context into prompts. Board export/import as JSON. Per-shot media preview with mute controls. Search and filter across all shots. Film Player for sequential playback of completed shots. Ideas Gallery for browsing all completed work across boards. Continuity library for persisting style, character, and setting references across boards. Handoff system for sending media between studios.

Creative Suite (Nova)

A full-featured AI media generation hub accessible from the sidebar. The home view features a universal SmartBar that auto-detects output type (Image, Video, Audio, 3D, Text) from your prompt, searches Replicate + HuggingFace model libraries in real time, and renders results inline with quick actions (animate to video, upscale 4×, make 3D). Five dedicated creation tools and a gallery are accessible as sub-pages:

Generate Image: Text-to-image with FLUX Schnell (fast), FLUX 1.1 Pro, FLUX 2 Pro, DALL-E 3, plus live model search across Replicate and HuggingFace. Supports 8 aspect ratios, 25 style presets (cinematic, anime, watercolor, pixel art, etc.), 9 lighting presets, 8 camera angles, 4 content types, visual intensity slider, negative prompts, structure and style references with adjustable strength, seed control, and batch generation (1–4 images). Inline quick actions: edit, generative fill, animate to video, upscale, copy to clipboard, save to gallery.

Generate Video: Text-to-video and image-to-video generation via Replicate models (Minimax Video-01-Live default, plus Kling, Wan 2.1, Seedance, and any searched model). Supports 3 aspect ratios (16:9, 9:16, 1:1), 3 durations (4s, 5s, 10s), 6 camera motions (pan, zoom, orbit), motion intensity slider, and optional first-frame image upload. Add Soundtrack button links directly to soundtrack generation.

Generate Soundtrack: AI music generation via Replicate models (MusicGen default, plus searched models). 13 genres, 12 moods, 4 tempo ranges, 3 energy levels, 12 multi-select instruments, duration slider (5–30s), and optional video upload for scoring. Animated waveform visualizer with genre/mood/tempo/instrument tags on results.

Generate Speech: Professional AI voiceovers with 12 voices across 2 providers — OpenAI TTS (Alloy, Echo, Fable, Onyx, Nova, Shimmer) and ElevenLabs (Rachel, Drew, Clyde, Paul, Domi, Bella). Speed slider (0.5×–2.0×), 9 language options, character count with estimated duration, plus live model search for additional TTS models.

Edit Image: Full image editing suite with 7 AI operations — generative fill, remove object, replace background, generative expand, upscale (2×/4×), remove background, and prompt-to-edit. Canvas-based brush masking with adjustable size and line interpolation. Expand supports directional control (all/left/right/up/down) and target ratios. Accept/discard workflow with undo history.

Connectors Marketplace

190+ integrations across 28 categories: communication, storage, development, project management, CRM, data, AI (LLMs, image, video, audio, speech, code, design, search, 3D, vector), analytics, automation, browser, cloud, ecommerce, finance, marketing, music, productivity, security, and social media. OAuth flows for Google/Microsoft/GitHub/Notion/Dropbox; API key entry for everything else. 135+ connectors have a completely free tier.

Skills

Skills are reusable instruction sets (like Custom GPTs). Browse 270+ pre-built skills across 10 categories (code, writing, research, data, automation, architecture, infrastructure, security, testing, custom) in the marketplace or create your own.

Scheduling

Schedule any task to run automatically. Supports one-time (with optional delete-after-run), recurring intervals, daily, weekly, and full cron expressions. Enable/disable individual schedules and see next-run timestamps.
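
To make the schedule types concrete, here is a small sketch of how a next-run timestamp could be computed for the one-time, interval, daily, and weekly modes. The `Schedule` shape and its field names are hypothetical, times are treated as UTC for simplicity, and cron expressions are left out because they need a dedicated parser.

```typescript
// Hypothetical schedule shapes; field names are illustrative, not Ottomate's schema.
type Schedule =
  | { kind: "once"; at: Date }
  | { kind: "interval"; everyMinutes: number }
  | { kind: "daily"; hour: number; minute: number }
  | { kind: "weekly"; weekday: number; hour: number; minute: number }; // 0 = Sunday

// Returns the next run time after `now`, or null if a one-time schedule is already past.
function nextRun(schedule: Schedule, now: Date): Date | null {
  switch (schedule.kind) {
    case "once":
      return schedule.at.getTime() > now.getTime() ? schedule.at : null;
    case "interval":
      return new Date(now.getTime() + schedule.everyMinutes * 60000);
    case "daily": {
      const next = new Date(now.getTime());
      next.setUTCHours(schedule.hour, schedule.minute, 0, 0);
      // If today's slot has passed, run tomorrow.
      if (next.getTime() <= now.getTime()) next.setUTCDate(next.getUTCDate() + 1);
      return next;
    }
    case "weekly": {
      const next = new Date(now.getTime());
      next.setUTCHours(schedule.hour, schedule.minute, 0, 0);
      const daysAhead = (schedule.weekday - next.getUTCDay() + 7) % 7;
      next.setUTCDate(next.getUTCDate() + daysAhead);
      // Same weekday but the slot has passed: run next week.
      if (next.getTime() <= now.getTime()) next.setUTCDate(next.getUTCDate() + 7);
      return next;
    }
  }
}
```

A value like this is what a next-run column in the schedule list would display.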

Memory

A persistent key-value store with tags that the agent reads and writes during task execution. Stored facts, preferences, and context carry over across tasks. You can search, add, tag, or delete entries manually.
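
The shape of such a store can be sketched in a few lines. The version below is in-memory only and uses hypothetical names (`MemoryStore`, `MemoryEntry`); Ottomate persists memory in SQLite, so treat this purely as an illustration of the read, write, tag, and search operations.

```typescript
interface MemoryEntry {
  key: string;
  value: string;
  tags: string[];
  updatedAt: number; // epoch milliseconds
}

// In-memory stand-in for the persistent store (assumption: real storage is a SQLite table).
class MemoryStore {
  private entries = new Map<string, MemoryEntry>();

  set(key: string, value: string, tags: string[] = []): void {
    this.entries.set(key, { key, value, tags, updatedAt: Date.now() });
  }

  get(key: string): string | undefined {
    return this.entries.get(key)?.value;
  }

  delete(key: string): boolean {
    return this.entries.delete(key);
  }

  // Case-insensitive substring search over keys, values, and tags.
  search(query: string): MemoryEntry[] {
    const q = query.toLowerCase();
    return Array.from(this.entries.values()).filter(
      (e) =>
        e.key.toLowerCase().includes(q) ||
        e.value.toLowerCase().includes(q) ||
        e.tags.some((t) => t.toLowerCase().includes(q)),
    );
  }
}
```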

Computer Control

Full desktop automation powered by Anthropic's native computer use tools (computer_20251124 + bash_20250124 + text_editor_20250728). Give it a task in plain English and watch Claude take screenshots, move the mouse, type, click, run shell commands, and edit files — all with live visual feedback streamed to your browser.

Capabilities: Screenshot capture with coordinate scaling, mouse control (click, double-click, right-click, drag, scroll), keyboard input (type, key press, hotkeys), bash shell execution, file viewing and editing, and zoom into screen regions.
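
Coordinate scaling matters because the model usually sees a downscaled screenshot while clicks must land on real screen pixels. The sketch below shows one way to do that mapping; the function names and the maximum screenshot size are assumptions, not values taken from Ottomate.

```typescript
interface Size { width: number; height: number; }
interface Point { x: number; y: number; }

// Scale factor that fits the real screen inside a maximum screenshot size (never upscales).
function screenshotScale(screen: Size, maxShot: Size): number {
  return Math.min(1, maxShot.width / screen.width, maxShot.height / screen.height);
}

// Map a point chosen on the downscaled screenshot back to real screen pixels.
function toScreenPoint(p: Point, scale: number): Point {
  return { x: Math.round(p.x / scale), y: Math.round(p.y / scale) };
}
```

With a 2560×1600 display and a 1280×800 screenshot limit the scale is 0.5, so a click at (640, 400) on the screenshot maps to (1280, 800) on the screen.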

Features: Live screenshot viewer with action dot overlay, real-time activity log, configurable max steps (default 75, up to 200), app permission system (approve/deny access per app), blocked app list, model selection (Sonnet/Opus), prompt caching for cost efficiency, extended thinking for better reasoning, and a Continue button to resume when max steps are reached.

Cross-provider fallback: Non-Anthropic models (GPT-4, Gemini, etc.) can delegate GUI tasks to Computer Control via the delegate_to_computer_control tool, giving every provider access to full desktop automation with image feedback.

Analytics & Audit

Analytics shows KPIs: total tasks, success rate, average duration, top tools (with per-tool success rates), model usage (with average cost per call), daily task volume (last 30 days), and recent errors. Audit Trail is a paginated log of every agent action — tool calls, model invocations, and task events — with duration, metadata, search, and filters (event type, tool, success status).
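
Two of the KPIs above, success rate and average duration, reduce to simple aggregates over task records. The sketch below assumes a hypothetical `TaskRecord` shape (the status values come from the Tasks page filters) and counts only finished tasks toward the success rate; Ottomate's own definitions may differ.

```typescript
interface TaskRecord {
  status: "running" | "completed" | "failed";
  durationMs?: number; // assumed to be set once a task has finished
}

function summarizeTasks(tasks: TaskRecord[]) {
  const finished = tasks.filter((t) => t.status !== "running");
  const completed = finished.filter((t) => t.status === "completed");
  const durations = finished
    .map((t) => t.durationMs)
    .filter((d): d is number => typeof d === "number");
  return {
    totalTasks: tasks.length,
    // Fraction of finished tasks that completed successfully (0 when nothing has finished).
    successRate: finished.length === 0 ? 0 : completed.length / finished.length,
    averageDurationMs:
      durations.length === 0 ? 0 : durations.reduce((a, b) => a + b, 0) / durations.length,
  };
}
```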



Screenshots

Home

The main prompt interface — type a goal, use slash commands, attach files, or pick from the prompt gallery.

Connectors

190+ integrations — connect Gmail, Slack, GitHub, Stripe, Notion, and more with OAuth or API keys.

Dreamscape Video Studio

17-mode AI creative studio with storyboards, 20 camera presets, continuity library, and the AI Director command chain system.

Nova — Generate

AI-powered creative hub with SmartBar (auto-detect output type, dual-provider model search), image/video/audio/speech generation, image editing, and unified gallery.

Skills Marketplace

270+ pre-built skills across 10 categories — or create your own.

More screenshots

Scheduled Tasks

[Screenshot: Scheduled page]


Pages

Sidebar Navigation

The sidebar contains 21 items. Five are optional — hidden by default and toggled on in Settings › Features.

| Page | Icon | Optional | Description |
| --- | --- | --- | --- |
| Ottomate | Monitor | | Centered prompt input with 12 slash commands, voice input (Whisper + browser speech), file attachments, gallery suggestions, and category chips |
| Tasks | CheckSquare | | List all tasks with status filters (running/completed/failed), search, sort, calendar view |
| Files | FolderOpen | | Finder-style file browser with icon/list/gallery views, 50+ format support, folders, preview pane, and source filters |
| Connectors | Plug | | Integration marketplace — connect 190+ services via OAuth or API key |
| Skills | Zap | | Create, edit, and install reusable agent behaviors; 270+ in the marketplace |
| Documents | FileEdit | | Create and manage text documents and spreadsheets with AI writing assistance, search, and relative timestamps |
| App Builder | Package | Yes | Embedded bolt-diy full-stack AI app builder (WebContainers, port 5173) |
| Coding Companion | Terminal | Yes | Embedded code-server (VS Code in browser, port 3100) |
| Video Studio | Clapperboard | | 17-mode AI creative studio — Luma Dream Machine (Ray 2, Ray Flash 2, Photon 1, Photon Flash 1) video/image/audio generation organized into storyboards with 20 camera presets, character identity persistence, 9 modify intensities, draft/hi-fi phases, AI Director with command chains, continuity library, annotations, and Film Player |
| Multimedia Playground | Layers | Yes | Power-user workbench — dual-provider model search (Replicate + HuggingFace), multi-column comparison, quick actions per result type |
| Audio Studio | Music | Yes | Embedded openDAW browser-based DAW (port 8080) |
| Creative Suite | Flame | | AI creative hub with SmartBar (auto-detects output type, dual-provider model search), generate images (FLUX, DALL-E 3 + 25 styles), video (Minimax, Kling, Wan, Seedance), soundtracks (13 genres, 12 instruments), speech (12 voices, 2 providers), edit images (7 operations), and unified gallery. Route: /computer/firefly |
| 3D Studio | Box | Yes | Embedded Blockbench 3D model editor (port 3001) |
| Dispatch | Send | | Messaging channel setup — configure Telegram, Discord, Slack, and WhatsApp webhooks; view connection status and send test messages |
| Memory | Brain | | View, search, add, and delete agent memory entries |
| Scheduled | Clock | | Cron-based task scheduler with interval, daily, weekly, and cron modes |
| Analytics | BarChart3 | | Performance dashboard — KPIs, tool popularity, model costs, error patterns |
| Audit Trail | Shield | | Paginated log of every agent action with filters and metadata |
| Sessions | MessageSquare | | Group related tasks into conversation sessions with shared context |
| Computer Control | MousePointer2 | | Full desktop automation — give Claude a task and watch it control your screen with live screenshots, mouse, keyboard, bash, and file editing. Configurable max steps, app permissions, and Continue button |
| Settings | Settings | | Default model, token/cost budgets, themes, verbose mode, optional feature toggles, health check |

Sub-Pages

These pages are reached from within their parent page:

| Page | Parent | Description |
| --- | --- | --- |
| Task Detail | Tasks | Live agent execution with Steps, Chat, Files, and Preview tabs — streaming output, token tracking, context budget |
| Document Editor | Documents | Rich text editor with AI writing assistance, auto-save, and title sync |
| Generate Image | Creative Suite | Text-to-image with FLUX, DALL-E 3, 25 style presets, aspect ratios, and live model search |
| Generate Video | Creative Suite | Text-to-video and image-to-video via Minimax, Kling, Wan 2.1, Seedance, and more |
| Generate Soundtrack | Creative Suite | AI music generation via MusicGen — 13 genres, 12 moods, 12 instruments, optional video upload |
| Generate Speech | Creative Suite | AI voiceovers — 12 voices across OpenAI TTS and ElevenLabs, speed/language control |
| Edit Image | Creative Suite | 7 AI editing operations (generative fill, remove object, replace background, expand, upscale, remove BG, prompt-to-edit) with canvas brush masking |

Additional Pages

These pages exist but are not shown in the sidebar:

| Page | Description |
| --- | --- |
| Replicate | Replicate-only model explorer with Smart Run (auto-selects best model), quick category buttons, inline or task-based execution |
| Dream Machine | Direct Luma Dream Machine interface for quick video/image generation |
| Channels | Legacy channel configuration page (webhook URLs for Telegram, Discord, Slack, WhatsApp) — superseded by Dispatch in the sidebar |
| Image Studio | Redirects to Creative Suite (/computer/firefly) — kept alive to avoid broken links from agent system prompts |
| WhatsApp | WhatsApp Business API integration dashboard — connection status, send messages, webhook URL display |
| Onboarding | First-run setup wizard — health check, model selection, guided intro |
| Admin | Internal admin tools (not exposed in sidebar) |

Models

Ottomate supports 18 model options across 5 providers, plus a free tier:

| Model | ID | Provider | Best for |
| --- | --- | --- | --- |
| Auto (Recommended) | auto | | Agent picks best model per sub-task automatically |
| Claude Opus 4.6 | claude-opus-4-6 | Anthropic | Complex reasoning, multi-step orchestration |
| Claude Sonnet 4.6 | claude-sonnet-4-6 | Anthropic | Balanced speed/quality, general tasks |
| Claude Haiku 4.5 | claude-haiku-4-5 | Anthropic | Ultra-fast, cheapest Claude |
| GPT-5.4 | gpt-5.4 | OpenAI | Strong reasoning, coding, vision, 1M context |
| GPT-5.4 Mini | gpt-5.4-mini | OpenAI | Fast and balanced, great for general tasks |
| GPT-5.4 Nano | gpt-5.4-nano | OpenAI | Ultra-cheap for simple tasks |
| Gemini 2.5 Pro | gemini-2.5-pro | Google | Deep research, long documents, advanced reasoning |
| Gemini 2.5 Flash | gemini-2.5-flash | Google | Best price-performance, fast and capable |
| Gemini 2.5 Flash-Lite | gemini-2.5-flash-lite | Google | Ultra-cheap for simple queries |
| Gemini 2.5 Nano | gemini-2.5-nano | Google | Smallest model, edge tasks |
| Sonar | sonar | Perplexity | Real-time web-augmented search |
| Sonar Pro | sonar-pro | Perplexity | Deeper web-augmented analysis |
| Sonar Reasoning Pro | sonar-reasoning-pro | Perplexity | Multi-step reasoning + web search |
| Sonar Deep Research | sonar-deep-research | Perplexity | Long-form web research |
| OpenRouter (Any Model) | openrouter | OpenRouter | Route to 200+ models (DeepSeek, Llama, Mistral, Qwen, etc.) |
| Free (OpenRouter) | free | OpenRouter | Zero-cost inference via Nemotron, Qwen, Llama, Gemma & more |

Set auto to let the agent pick the best model per task. Legacy model IDs (e.g. gpt-4o, gemini-1.5-pro) are automatically remapped to their current equivalents.
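
Remapping of legacy IDs can be pictured as a lookup applied before a request is routed. In the sketch below the two legacy IDs come from the sentence above, but the targets they map to are guesses for illustration only, and `resolveModelId` is a hypothetical name.

```typescript
// Hypothetical alias table. The legacy IDs are named in this README;
// the replacement IDs are assumptions, not Ottomate's actual mapping.
const LEGACY_MODEL_ALIASES = new Map<string, string>([
  ["gpt-4o", "gpt-5.4"],
  ["gemini-1.5-pro", "gemini-2.5-pro"],
]);

// Returns the current ID for a legacy one, or the input unchanged.
function resolveModelId(requested: string): string {
  return LEGACY_MODEL_ALIASES.get(requested) ?? requested;
}
```

Current IDs such as `auto` or `claude-sonnet-4-6` pass through untouched, so saved settings that still reference an older ID keep working.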


Environment Variables

Create a .env.local file in the project root:

| Variable | Required | Description |
| --- | --- | --- |
| ANTHROPIC_API_KEY | Yes | Claude models — console.anthropic.com |
| DISABLE_AUTH | No | Set to true to skip login in local dev (no JWT required) |
| NEXTAUTH_SECRET | No (prod) | JWT signing secret for session cookies — required when DISABLE_AUTH is unset. Generate with openssl rand -hex 32 |
| OPENAI_API_KEY | No | GPT-4o, GPT-4.1, DALL-E 3 |
| GOOGLE_AI_API_KEY | No | Gemini 2.5 Pro/Flash/Flash-Lite/Nano |
| GROQ_API_KEY | No | Llama / Mixtral via Groq |
| OPENROUTER_API_KEY | No | Access 200+ models including free tier via OpenRouter |
| PERPLEXITY_API_KEY | No | Real-time web search via Perplexity Sonar |
| BRAVE_SEARCH_API_KEY | No | Web search via Brave |
| SERPER_API_KEY | No | Google search via Serper |
| TAVILY_API_KEY | No | AI-powered web search |
| REPLICATE_API_TOKEN | No | Run 1000s of ML models on Replicate |
| LUMA_API_KEY | No | Luma Dream Machine video/image generation |
| ELEVENLABS_API_KEY | No | Text-to-speech via ElevenLabs |
| DATABASE_PATH | No | SQLite DB path (default: ./perplexity-computer.db) |
| APP_URL | No | Public URL (default: http://localhost:3000) |
| GOOGLE_CLIENT_ID / SECRET | No | OAuth for Gmail, Drive, Sheets, Docs, Calendar |
| MICROSOFT_CLIENT_ID / SECRET | No | OAuth for Outlook, OneDrive, Teams |
| GITHUB_CLIENT_ID / SECRET | No | GitHub OAuth |
| NOTION_CLIENT_ID / SECRET | No | Notion OAuth |
| DROPBOX_CLIENT_ID / SECRET | No | Dropbox OAuth |
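
The two hard requirements in the table (an Anthropic key always, and a signing secret unless auth is disabled) could be checked at startup with something like the sketch below. `checkEnv` is a hypothetical helper that takes a plain object, so it can be fed `process.env` in Node; it is not part of Ottomate.

```typescript
type Env = Record<string, string | undefined>;

// Returns a list of human-readable problems; an empty list means the minimum config is present.
function checkEnv(env: Env): string[] {
  const problems: string[] = [];
  if (!env.ANTHROPIC_API_KEY) {
    problems.push("ANTHROPIC_API_KEY is required");
  }
  const authDisabled = env.DISABLE_AUTH === "true";
  if (!authDisabled && !env.NEXTAUTH_SECRET) {
    problems.push("NEXTAUTH_SECRET is required when DISABLE_AUTH is not set to true");
  }
  return problems;
}
```

Every other variable in the table is optional and only unlocks additional providers or connectors.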

Connectors

Navigate to Connectors in the sidebar. Click Connect on any service to begin setup.

  • OAuth connectors — click "Sign in with [Provider]" and authorize in the popup
  • API key connectors — paste your token and click Connect
  • Free badge = no credit card required

Free-tier connectors (135+)

| Connector | Auth | Notes |
| --- | --- | --- |
| Gmail / Google Calendar / Drive / Sheets / Docs | OAuth | Free with Google account |
| Outlook / OneDrive / Microsoft Calendar | OAuth | Free with Microsoft account |
| Slack | API key | Free workspace available |
| Discord | API key | Free bot token |
| Telegram | API key | Free via BotFather |
| Dropbox | OAuth | 2 GB free |
| Box | API key | 10 GB free |
| GitHub | OAuth | Free public + private repos |
| GitLab | API key | Free on GitLab.com |
| Vercel | API key | Free Hobby plan |
| Sentry | API key | Free Developer plan |
| Linear | API key | Free personal plan |
| Jira / Confluence | API key | Free up to 10 users |
| Asana | API key | Free up to 10 teammates |
| ClickUp | API key | Free Forever plan |
| Monday.com | API key | Free 2 seats |
| HubSpot | API key | Free CRM |
| Notion | OAuth | Free personal plan |
| Airtable | API key | Free unlimited bases |
| Supabase | API key | Free 500 MB |
| PostgreSQL | Conn. string | Self-hosted or cloud free tier |
| Figma | API key | Free Starter plan |
| Calendly | API key | Free Basic plan |
| WordPress / Webflow / Wix | API key | Free tiers available |
| Hugging Face | API key | Free (rate limited) |
| ElevenLabs | API key | 10k chars/month free |
| Stripe | API key | Free test mode |
| Mailchimp / Klaviyo | API key | Free up to 500 contacts |

Communication connector setup

Gmail + Google Calendar

Auth: OAuth (Google) | Free: Yes

  1. Click Sign in with Google in the connector modal
  2. For your own OAuth app: go to Google Cloud Console → Credentials, create an OAuth 2.0 Client ID, add http://localhost:3000/api/auth/callback/google as redirect URI
  3. Enable APIs: Gmail, Calendar, Drive, Sheets, Docs
  4. Add GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET to .env.local

Outlook + Microsoft Calendar

Auth: OAuth (Microsoft) | Free: Yes

  1. Click Sign in with Microsoft in the connector modal
  2. For your own OAuth app: Azure Portal → App registrations, add redirect URI http://localhost:3000/api/auth/callback/microsoft
  3. Add MICROSOFT_CLIENT_ID and MICROSOFT_CLIENT_SECRET to .env.local

Slack

  1. api.slack.com/apps → Create New App → add Bot Token Scopes: chat:write, channels:read, channels:history
  2. Install to Workspace → copy Bot User OAuth Token (xoxb-...)

Discord

  1. discord.com/developers/applications → New Application → Bot → Reset Token
  2. OAuth2 URL Generator: bot scope + Send Messages permission → invite bot

Telegram

  1. Message @BotFather → /newbot → copy the token

Zoom

Zoom Marketplace → Server-to-Server OAuth → generate token

Twilio

twilio.com → Console → copy Account SID + Auth Token

Storage connector setup

Google Drive / Sheets / Docs

Connected automatically when you sign in with Google OAuth.

OneDrive

Connected automatically when you sign in with Microsoft OAuth.

Dropbox

  1. dropbox.com/developers/apps → Create app → Scoped access
  2. Add redirect URI: http://localhost:3000/api/auth/callback/dropbox
  3. Add DROPBOX_CLIENT_ID and DROPBOX_CLIENT_SECRET to .env.local

Box

app.box.com/developers/console → Create New App → generate Developer Token

Development connector setup

GitHub

Option A (OAuth): github.com/settings/developers → OAuth Apps → callback URL http://localhost:3000/api/auth/callback/github → add to .env.local

Option B (PAT): github.com/settings/tokens/new → scopes repo, user → paste in connector modal

Vercel

vercel.com → Account Settings → Tokens → Create Token

GitLab

gitlab.com → User Settings → Access Tokens → scopes api, read_repository, write_repository

Sentry

sentry.io → Settings → Auth Tokens → scopes project:read, event:read, event:write

Datadog

app.datadoghq.com → API Keys + Application Keys

Project management connector setup

Linear

linear.app/settings/api → Create key

Jira

id.atlassian.com/manage-profile/security/api-tokens → Create API token → enter as email:token@domain

Asana

app.asana.com/0/my-apps → Create new token

ClickUp

Avatar → Settings → Apps → Generate API Key

Monday.com

Avatar → Developers → My Access Tokens

Confluence

Uses the same Atlassian API token as Jira.

CRM connector setup

HubSpot

Settings → Integrations → Private Apps → Create → select CRM scopes → copy access token

Salesforce

Developer Edition (free) → Setup → Connected App → copy Access Token

Zendesk

Admin Center → APIs → Zendesk API → enable Token Access → create API token

Data connector setup

Airtable

airtable.com/create/tokens → Create token → scopes data.records:read, data.records:write

Supabase

supabase.com → Project Settings → API → copy service_role key

PostgreSQL

Paste connection string: postgresql://user:pass@host:5432/dbname (works with Neon, Railway, Render, or self-hosted)
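
A connection string in this form can be split into its parts with the standard `URL` class, which is handy for validating what was pasted before saving it. The `PgConfig` shape and `parsePostgresUrl` name are assumptions for this sketch; the default port 5432 is PostgreSQL's standard port.

```typescript
interface PgConfig {
  user: string;
  password: string;
  host: string;
  port: number;
  database: string;
}

function parsePostgresUrl(connectionString: string): PgConfig {
  const url = new URL(connectionString);
  if (url.protocol !== "postgresql:" && url.protocol !== "postgres:") {
    throw new Error("Expected a postgresql:// connection string");
  }
  return {
    // Credentials may be percent-encoded inside a URL, so decode them.
    user: decodeURIComponent(url.username),
    password: decodeURIComponent(url.password),
    host: url.hostname,
    port: url.port ? Number(url.port) : 5432,
    database: decodeURIComponent(url.pathname.replace(/^\//, "")),
  };
}
```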

Snowflake

Enter accountidentifier:username:password

Productivity connector setup

Notion

OAuth: notion.so/my-integrations → New integration → enable Public → redirect URI http://localhost:3000/api/auth/callback/notion

Token: Copy Internal Integration Token (secret_...) → share pages with the integration

Figma

Account Settings → Personal access tokens → create token

Calendly

calendly.com/integrations/api_webhooks → Generate New Token

WordPress.com

Me → Security → enable 2FA → Application Passwords → enter as username:apppassword

Webflow

Project Settings → Integrations → API Access → Generate API Token

Wix

manage.wix.com/account/api-keys → Generate API Key

AI service connector setup

OpenAI

platform.openai.com/api-keys → Create new secret key (sk-...)

Hugging Face

huggingface.co/settings/tokens → Read token (hf_...)

ElevenLabs

elevenlabs.io → Profile → API Key

Replicate

replicate.com/account/api-tokens → copy token (r8_...)

Finance & marketing connector setup

Stripe

dashboard.stripe.com → Developers → API keys → copy Secret key (sk_test_... or sk_live_...)

Shopify

Admin → Settings → Apps → Develop apps → configure scopes → copy Admin API access token

Mailchimp

Account → Extras → API keys → Create A Key (includes datacenter: abc123-us1)
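
The data-center suffix matters because Mailchimp's Marketing API is served from a per-data-center host. A minimal sketch of deriving the base URL from a key shaped like the example above follows; `mailchimpBaseUrl` is a hypothetical helper, not something Ottomate is known to expose.

```typescript
// Derive the API base URL from a Mailchimp key of the form "<key>-<dc>", e.g. "abc123-us1".
function mailchimpBaseUrl(apiKey: string): string {
  const dash = apiKey.lastIndexOf("-");
  const dataCenter = dash >= 0 ? apiKey.slice(dash + 1) : "";
  if (!/^[a-z]+\d+$/.test(dataCenter)) {
    throw new Error("API key is missing its data-center suffix (for example -us1)");
  }
  return `https://${dataCenter}.api.mailchimp.com/3.0`;
}
```

For the placeholder key `abc123-us1` this returns `https://us1.api.mailchimp.com/3.0`.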

Klaviyo

Settings → API Keys → Create Private API Key

OAuth setup (Google, Microsoft, GitHub, Notion, Dropbox)

Google OAuth

Connects Gmail, Drive, Sheets, Docs, and Calendar in one click.

  1. Create a project at console.cloud.google.com
  2. Enable APIs: Gmail, Calendar, Drive, Sheets, Docs
  3. APIs & Services → Credentials → OAuth client ID (Web) → redirect URI http://localhost:3000/api/auth/callback/google
  4. Configure consent screen with test users
  5. Add to .env.local:
    GOOGLE_CLIENT_ID=your-client-id.apps.googleusercontent.com
    GOOGLE_CLIENT_SECRET=GOCSPX-...
    

Microsoft OAuth

Connects Outlook, OneDrive, Teams, and SharePoint.

  1. Azure Portal → App registrations → New registration
  2. Redirect URI: http://localhost:3000/api/auth/callback/microsoft
  3. API permissions → Microsoft Graph: Mail.ReadWrite, Mail.Send, Calendars.ReadWrite, Files.ReadWrite, offline_access
  4. Certificates & secrets → New client secret
  5. Add to .env.local:
    MICROSOFT_CLIENT_ID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    MICROSOFT_CLIENT_SECRET=your-secret-value
    

GitHub OAuth

  1. github.com/settings/developers → OAuth Apps → callback URL http://localhost:3000/api/auth/callback/github
  2. Add to .env.local:
    GITHUB_CLIENT_ID=your-client-id
    GITHUB_CLIENT_SECRET=your-client-secret
    

Notion OAuth

  1. notion.so/my-integrations → New integration → enable Public → redirect URI http://localhost:3000/api/auth/callback/notion
  2. Add to .env.local:
    NOTION_CLIENT_ID=your-client-id
    NOTION_CLIENT_SECRET=your-client-secret
    

Dropbox OAuth

  1. dropbox.com/developers/apps → Create app → redirect URI http://localhost:3000/api/auth/callback/dropbox
  2. Add to .env.local:
    DROPBOX_CLIENT_ID=your-app-key
    DROPBOX_CLIENT_SECRET=your-app-secret
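
All five providers use the same redirect URI pattern, `/api/auth/callback/<provider>`. The sketch below builds that URI from a base URL; using APP_URL as the base for a deployed instance is an assumption here, so check what your provider registration actually needs before relying on it.

```typescript
type OAuthProvider = "google" | "microsoft" | "github" | "notion" | "dropbox";

// Build the redirect URI to register with a provider.
// Assumption: a non-local deployment substitutes its public URL (APP_URL) for localhost.
function oauthCallbackUrl(provider: OAuthProvider, appUrl = "http://localhost:3000"): string {
  const base = appUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return `${base}/api/auth/callback/${provider}`;
}
```

The value registered with the provider generally has to match the URI used at runtime exactly, including scheme and port.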
    

Architecture

src/
├── app/
│   ├── api/                        # API routes
│   │   ├── auth/                   # OAuth initiation + callback
│   │   ├── tasks/                  # Task CRUD + SSE streaming
│   │   ├── connectors/             # Connector config CRUD
│   │   ├── files/                  # File listing + serving
│   │   ├── gallery/                # Gallery items
│   │   ├── memory/                 # Memory CRUD
│   │   ├── skills/                 # Skills CRUD
│   │   ├── scheduled-tasks/        # Scheduler engine
│   │   ├── analytics/              # Usage analytics
│   │   ├── sessions/               # Session grouping
│   │   ├── audit/                  # Audit log
│   │   ├── replicate/              # Replicate model runner
│   │   ├── dreamscape/             # Luma Dream Machine
│   │   ├── huggingface/            # HuggingFace inference
│   │   ├── luma/                   # Luma Dream Machine API
│   │   ├── firefly/                # Nova creative suite APIs (image/video/audio/speech/models via Replicate + HuggingFace)
│   │   ├── generate/               # Generic model generation
│   │   ├── health/                 # Health check + service health (code-server, etc.)
│   │   ├── context/                # Context management
│   │   ├── usage/                  # Usage tracking
│   │   ├── hooks/                  # Webhook handlers
│   │   ├── channels/               # Channel config (Telegram, Discord, etc.)
│   │   ├── settings/               # Global settings CRUD
│   │   ├── social-auth/            # Social media OAuth
│   │   ├── whatsapp/               # WhatsApp Cloud API
│   │   └── voice/                  # Whisper transcription
│   └── computer/                   # All UI pages (25+ routes)
│       ├── firefly/                # Creative Suite (generate image/video/soundtrack/speech, edit, gallery); route `/computer/firefly`
│       ├── dreamscape/             # Dreamscape Video Studio (storyboards, AI Director, 17 modes)
│       ├── dispatch/               # Dispatch — messaging channel setup (Telegram, Discord, Slack, WhatsApp)
│       ├── playground/             # Multimedia Playground (Replicate + HuggingFace model workbench; optional)
│       ├── replicate/              # Replicate model explorer (Smart Run)
│       ├── 3d-studio/              # Embedded Blockbench 3D editor (optional)
│       ├── coding-companion/       # Embedded code-server / VS Code (optional)
│       ├── app-builder/            # Embedded bolt-diy app builder (optional)
│       ├── audio-studio/           # Embedded openDAW browser DAW (optional)
│       ├── image-studio/           # Redirects → /computer/firefly (legacy route)
│       ├── channels/               # Legacy channel config (superseded by Dispatch)
│       ├── whatsapp/               # WhatsApp Business API dashboard
│       └── ...                     # Tasks, Files, Documents, Connectors, Skills, etc.
├── lib/
│   ├── agent.ts                    # Core AI agent (~7,500 lines)
│   ├── db.ts                       # SQLite via better-sqlite3
│   ├── types.ts                    # TypeScript types (~440 lines)
│   ├── connectors-data.ts          # 190+ connector definitions
│   ├── skill-catalog.ts            # 270+ pre-built skills
│   ├── model-fallback.ts           # Multi-provider failover
│   ├── scheduler.ts                # Cron/interval scheduler
│   ├── replicate.ts                # Replicate API client
│   ├── huggingface.ts              # HuggingFace client
│   ├── social-media-browser.ts     # Social media automation
│   ├── personas.ts                 # Agent personality presets
│   ├── models.ts                   # Model configurations & free model list
│   ├── schemas.ts                  # Zod validation schemas
│   ├── constants.ts                # App-wide constants (NAV_ITEMS, API helpers)
│   ├── background-ops.ts           # Background task operations
│   ├── steel-client.ts             # Steel browser client
│   ├── whatsapp.ts                 # WhatsApp Cloud API client
│   ├── running-tasks.ts            # Global AbortController map for live tasks
│   ├── skill-converters.ts         # Skill format converters
│   ├── themes.ts                   # UI theme definitions
│   └── utils.ts                    # Shared utilities
├── components/
│   ├── sidebar.tsx                  # Navigation sidebar
│   ├── persistent-layout.tsx        # LRU keep-alive panel manager (up to 20 pages)
│   ├── bolt-persistent-iframe.tsx   # Persistent bolt.diy iframe (App Builder, port 5173)
│   ├── kilocode-persistent-iframe.tsx # Persistent code-server iframe (Coding Companion, port 3100)
│   ├── blender-persistent-iframe.tsx  # Persistent Blockbench iframe (3D Studio, port 3001)
│   ├── lmms-persistent-iframe.tsx     # Persistent openDAW iframe (port 8080)
│   ├── command-palette.tsx          # ⌘K command palette
│   ├── keyboard-shortcuts.tsx       # Global keyboard shortcuts
│   └── background-status.tsx        # Floating background task status indicator
└── tests/                           # Playwright E2E tests

Database (SQLite via better-sqlite3)

| Table | Purpose |
| --- | --- |
| `tasks` | Task records with status, model, messages, priority |
| `agent_steps` | Tool calls and results per task |
| `messages` | Chat messages per task |
| `task_files` | Files produced by tasks |
| `file_folders` | Folder hierarchy for the file manager |
| `sub_tasks` | Spawned sub-agent tasks |
| `skills` | Saved skill definitions |
| `gallery_items` | Generated media |
| `connector_configs` | Service credentials (API keys, OAuth tokens) |
| `memory` | Agent long-term memory (key-value + tags) |
| `token_usage` | Per-call token and cost tracking |
| `scheduled_tasks` | Cron/interval schedules |
| `agent_learnings` | Patterns the agent learns over time |
| `agent_analytics` | Every agent action logged (audit trail) |
| `settings` | Global configuration |
| `sessions` | Conversation session groupings |
| `documents` | Saved text documents |
| `skill_performance` | Skill usage and performance metrics |

Tech Stack

Core

| Layer | Technology |
| --- | --- |
| Framework | Next.js 15.1.4 (App Router) |
| Runtime | Node.js 20+ |
| Language | TypeScript 5 |
| UI Library | React 19 |
| CSS | Tailwind CSS 3.4 (custom `pplx-*` color palette) |
| Components | Radix UI (Dialog, Dropdown, ScrollArea, Select, Tabs, Tooltip) |
| Animation | Framer Motion 11 |
| Icons | Lucide React |
| Class utilities | clsx + tailwind-merge + class-variance-authority |
| Markdown | react-markdown + remark-gfm |
| Validation | Zod |
| IDs | uuid |
| Database | SQLite via better-sqlite3 (WAL mode) |
| File storage | Local filesystem (`./task-files/{taskId}/`) |
| Testing | Playwright E2E (`@playwright/test`) |
| Linting | ESLint 8 + eslint-config-next |
| Process management | pm2 |

AI Providers & Models

| Provider | Models | SDK |
| --- | --- | --- |
| Anthropic | Claude Opus 4.6, Claude Sonnet 4.6, Claude Haiku 4.5 | `@anthropic-ai/sdk` |
| OpenAI | GPT-5.4, GPT-5.4 Mini, GPT-5.4 Nano | `openai` |
| Google | Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 2.5 Flash-Lite, Gemini 2.5 Nano | `@google/generative-ai` |
| Perplexity | Sonar, Sonar Pro, Sonar Reasoning Pro, Sonar Deep Research | `openai` (baseURL override) |
| OpenRouter | 200+ models incl. DeepSeek, Llama 3.3, Qwen, Gemma, Mistral, free-tier models | `openai` (baseURL override) |

Failover chain: Anthropic → OpenAI → Google → OpenRouter → Perplexity (exponential backoff: 2s, 5s, 15s)
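
The chain above could be structured roughly as follows. This is a minimal sketch, not the code in `model-fallback.ts`: it assumes each backoff delay is applied before retrying the same provider and that the chain advances once the delays are used up. `Provider`, `callWithFailover`, and the injectable `sleep` are hypothetical names introduced for illustration.

```typescript
// Hypothetical provider shape; not taken from model-fallback.ts.
interface Provider {
  name: string;
  complete: (prompt: string) => Promise<string>;
}

// Delays from the failover description, in milliseconds.
const BACKOFF_MS: number[] = [2000, 5000, 15000];

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Try providers in order. Assumption: a failing provider is retried after
// each backoff delay, then the next provider in the chain is tried.
async function callWithFailover(
  providers: Provider[],
  prompt: string,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<{ provider: string; text: string }> {
  let lastError: unknown = new Error("no providers configured");
  for (const provider of providers) {
    for (let attempt = 0; attempt <= BACKOFF_MS.length; attempt++) {
      try {
        const text = await provider.complete(prompt);
        return { provider: provider.name, text };
      } catch (err) {
        lastError = err;
        // Wait before the next retry; skip the wait after the final attempt.
        if (attempt < BACKOFF_MS.length) await sleep(BACKOFF_MS[attempt]);
      }
    }
  }
  throw lastError;
}
```

Passing `sleep` as a parameter keeps the retry logic testable without real waiting; a caller would normally rely on the default.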

Image Generation

| Service | Models / Notes |
| --- | --- |
| Replicate | FLUX Schnell (default), FLUX 1.1 Pro, FLUX 2 Pro, SDXL, Ideogram v2 Turbo, Recraft v3, SD Inpainting, recraft-crisp-upscale, recraft-remove-background, BLIP captioning, face swap |
| OpenAI | DALL-E 3 |
| HuggingFace | SDXL, Stable Diffusion v1.5, and 1000s of community models via live search |

Video Generation

| Service | Models / Notes |
| --- | --- |
| Luma Dream Machine | Ray 2, Ray Flash 2, Photon 1, Photon Flash 1: text-to-video, image-to-video, extend, interpolate, reframe, modify (via Dreamscape) |
| Replicate | Minimax Video-01-Live (default in Nova), Wan 2.1 (T2V + I2V 480p), Seedance 1 Lite, Kling via fofr, Hunyuan Video, Stable Video Diffusion |
| Runway ML | Gen-3 Alpha Turbo (image-to-video, via agent tool; requires `RUNWAY_API_KEY`) |

Audio Generation

| Service | Notes |
| --- | --- |
| Replicate (MusicGen) | Music generation: stereo-melody-large, stereo-large, melody-large, large, medium, small variants |
| OpenAI TTS | 9 voices: alloy, echo, fable, onyx, nova, shimmer, aria, roger, sarah |
| ElevenLabs | `eleven_multilingual_v2`: multilingual TTS with 6 voices (Rachel, Drew, Clyde, Paul, Domi, Bella), voice listing API |
| OpenAI Whisper (`whisper-1`) | Speech-to-text transcription |

Browser Automation & Computer Use

| Tool | Notes |
| --- | --- |
| Steel (steel.dev) | Cloud Chrome sessions, CAPTCHA solving, anti-bot detection. Modes: `STEEL_API_KEY` (cloud) or `STEEL_BASE_URL` (self-hosted) |
| Playwright | Local Chrome automation: web scraping, social media posting, form fill |
| Computer Use (macOS) | `screencapture` + `cliclick` + AppleScript for desktop control |
| Anthropic Computer Use (native) | `computer_20251124` + `bash_20250124` + `text_editor_20250728`: native beta tools with image feedback, coordinate scaling, prompt caching |
| Cheerio | Server-side HTML parsing |
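
"Coordinate scaling" in the computer-use row usually means translating a click position reported against a downscaled screenshot back to real screen pixels. The sketch below shows one way that mapping could work, assuming the screenshot is a plain resize of the full screen; `Size`, `Point`, and `toScreenCoords` are hypothetical names, not identifiers from this repository.

```typescript
interface Size { width: number; height: number; }
interface Point { x: number; y: number; }

// Map a point given in screenshot pixels to physical screen pixels.
// Assumption: the screenshot covers the whole screen and was only resized.
function toScreenCoords(p: Point, screenshot: Size, screen: Size): Point {
  return {
    x: Math.round(p.x * (screen.width / screenshot.width)),
    y: Math.round(p.y * (screen.height / screenshot.height)),
  };
}

// Example: the model saw a 1024x768 screenshot of a 2048x1536 display.
const click = toScreenCoords(
  { x: 512, y: 384 },
  { width: 1024, height: 768 },
  { width: 2048, height: 1536 },
);
console.log(click);
```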

Messaging Integrations

| Service | API |
| --- | --- |
| WhatsApp | Meta Business Cloud API (graph.facebook.com/v21.0) |
| Telegram | Bot API (api.telegram.org) |
| Slack | Web API (slack.com/api) |
| Discord | REST API v10 |
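
As an illustration of what a call to one of these APIs involves, the sketch below builds a Telegram Bot API `sendMessage` request. The endpoint and the `chat_id` / `text` fields come from Telegram's public Bot API. `buildTelegramSendMessage` and `OutgoingRequest` are hypothetical helpers and do not describe how Dispatch is implemented.

```typescript
interface OutgoingRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build (but do not send) a Telegram sendMessage request.
function buildTelegramSendMessage(
  botToken: string,
  chatId: number | string,
  text: string,
): OutgoingRequest {
  return {
    url: `https://api.telegram.org/bot${botToken}/sendMessage`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ chat_id: chatId, text }),
  };
}
```

The result can be handed to `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. Keeping construction separate from sending makes the request easy to inspect in tests.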

Next.js Configuration Highlights

| Setting | Value |
| --- | --- |
| `serverExternalPackages` | `better-sqlite3`, `playwright` (prevents bundling native modules) |
| Proxy rewrite `/bolt/*` | localhost:5173 (bolt-diy App Builder) |
| Proxy rewrite `/kilocode/*` | localhost:3100 (code-server proxy) |
| COOP + COEP headers on `/computer/*` | same-origin + credentialless; enables SharedArrayBuffer for WebContainers |
| Image remote patterns | All HTTPS origins allowed |
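
Put together, the settings above might look like the following `next.config.ts`. This is a sketch reconstructed from the table, not a copy of the repository's file. In particular, the rewrite destinations assume the path after the prefix is forwarded unchanged, which the table does not state.

```typescript
// next.config.ts (illustrative sketch; the repository's actual file may differ)
const nextConfig = {
  // Keep native modules out of the server bundle.
  serverExternalPackages: ["better-sqlite3", "playwright"],

  async rewrites() {
    return [
      // Assumption: sub-paths are passed through to the embedded apps as-is.
      { source: "/bolt/:path*", destination: "http://localhost:5173/:path*" },
      { source: "/kilocode/:path*", destination: "http://localhost:3100/:path*" },
    ];
  },

  async headers() {
    return [
      {
        // Cross-origin isolation, needed for SharedArrayBuffer.
        source: "/computer/:path*",
        headers: [
          { key: "Cross-Origin-Opener-Policy", value: "same-origin" },
          { key: "Cross-Origin-Embedder-Policy", value: "credentialless" },
        ],
      },
    ];
  },

  images: {
    // Allow images from any HTTPS origin.
    remotePatterns: [{ protocol: "https", hostname: "**" }],
  },
};

export default nextConfig;
```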

Architectural Patterns

| Pattern | Description |
| --- | --- |
| SSE Streaming | Tasks, app-builder, and document AI all stream via `text/event-stream` |
| Tool calling | All providers use JSON-schema function calling; tools defined in `agent.ts` |
| Parallel tool execution | Parallelizable tool calls batched with `Promise.all` |
| Persistent iframes | 4 embedded apps (bolt-diy, code-server, Blockbench, openDAW) stay mounted off-screen via `position:fixed; top:-200vh` + `inert` to preserve state across route changes |
| LRU page cache | `PersistentLayout` caches up to 20 React page trees so state survives navigation |
| Multi-provider failover | Exponential backoff across 5 providers with automatic model switching |
| Context compaction | Keeps first 2 + last 8 messages, summarizes the middle to stay within token budget |
| Semantic memory engine | Importance scoring, memory classification, and compression across tasks |
| OAuth 2.0 | Google, Microsoft, GitHub, Notion, Dropbox; tokens stored in SQLite |
| Bearer token middleware | Optional `OTTOMATE_AUTH_TOKEN` on all `/api/*` routes |
| Sub-agents | Spawns tool-scoped child agents for research, coding, creativity, data, etc. |
| Code sandboxing | `child_process.exec` with security validation in `sandbox-executor.ts` |
| `useSyncExternalStore` | Cross-page background operation state (no Redux) |
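
The context compaction pattern can be sketched as follows. The 2 and 8 come from the table; everything else is an assumption for illustration. `ChatMessage`, `compactMessages`, and the synchronous `summarize` callback are hypothetical (a real summarizer would presumably call a model), and placing the summary in a `system` message is only one possible choice.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

const KEEP_HEAD = 2; // first messages kept verbatim
const KEEP_TAIL = 8; // most recent messages kept verbatim

// Replace the middle of a long conversation with a single summary message.
function compactMessages(
  messages: ChatMessage[],
  summarize: (middle: ChatMessage[]) => string,
): ChatMessage[] {
  // Nothing to compact if head and tail already cover the conversation.
  if (messages.length <= KEEP_HEAD + KEEP_TAIL) return messages;

  const head = messages.slice(0, KEEP_HEAD);
  const middle = messages.slice(KEEP_HEAD, messages.length - KEEP_TAIL);
  const tail = messages.slice(messages.length - KEEP_TAIL);

  const summary: ChatMessage = {
    role: "system",
    content: `Summary of ${middle.length} earlier messages: ${summarize(middle)}`,
  };
  return [...head, summary, ...tail];
}
```

With 15 input messages this keeps messages 1 and 2, folds messages 3 through 7 into one summary, and keeps messages 8 through 15, for 11 messages in total.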

Embedded Sub-Applications

| App | Port | Tech | Purpose |
| --- | --- | --- | --- |
| Next.js (main) | 3000 | Next.js 15, React 19 | Main UI + all API routes |
| bolt-diy | 5173 | Remix + Vite + pnpm | Full-stack AI app builder (WebContainers) |
| Blockbench | 3001 | Custom JS/Vite | 3D model editor |
| openDAW | 8080 | npm/Vite | Browser-based DAW / audio studio |
| code-server proxy | 3100→3101 | Node.js MJS | VS Code in browser (Coding Companion) |

Troubleshooting

| Problem | Solution |
| --- | --- |
| Tasks not running / "model not found" | Ensure `ANTHROPIC_API_KEY` is set in `.env.local` and restart the dev server |
| OAuth "redirect_uri_mismatch" | Add the exact URI (including `http://` and port) in the provider's developer console |
| "GOOGLE_CLIENT_ID not configured" | Add `GOOGLE_CLIENT_ID` and `GOOGLE_CLIENT_SECRET` to `.env.local`, restart |
| Google "Access blocked: request is invalid" | Configure the OAuth consent screen and add test users at APIs & Services → OAuth consent screen |
| Code execution fails | Python runs via `python3`; ensure it's installed. macOS timeout is handled automatically |
| Files not showing | Files are stored in `./task-files/<taskId>/`; ensure the directory is writable |
| Connector API calls failing | Re-check the token (no extra spaces). OAuth tokens may need re-authorization |
| Database errors | Delete `perplexity-computer.db` to reset (loses history). Schema is recreated on startup |

Author

Dan Sheils

Topics

ai-agent autonomous-agent multi-model next-js self-hosted anthropic claude gpt-4 gemini openai perplexity openrouter ai-tools task-automation code-generation web-scraping image-generation video-generation audio-generation text-to-speech musicgen replicate luma-dream-machine adobe-firefly sqlite typescript tailwindcss playwright


Changelog

v2.2.0 — Dispatch, Optional Features, Dev Auth Bypass (May 2026)

  • Dispatch — new sidebar page replacing Channels; configure Telegram, Discord, Slack, and WhatsApp webhooks with inline setup guides, connection status, and test-message support
  • Optional features system — App Builder, Coding Companion, Audio Studio, 3D Studio, and Multimedia Playground are now hidden by default and toggled on in Settings › Features
  • Dev auth bypass — set DISABLE_AUTH=true in .env.local to skip JWT login in local development; NEXTAUTH_SECRET required only in production
  • Image Studio route kept alive (/computer/image-studio) as a redirect to Creative Suite to avoid broken agent links
  • Settings › Features — new feature-flags panel for optional embedded apps

v2.1.0 — Computer Control & Cross-Provider Fallback (Apr 2, 2026)

  • Computer Control — full desktop automation page with live screenshot viewer, activity log, app permissions, and configurable max steps (75 default, up to 200)
  • delegate_to_computer_control tool — any model (GPT-4, Gemini, etc.) can now delegate GUI tasks to Anthropic's native computer use system with full image feedback
  • Continue button — when Computer Control hits max steps, one click resumes the session
  • Settings integration — Computer Control max iterations now respects the saved setting from the Settings page
  • Build fixes — excluded subproject type checking, fixed Firefly/Replicate route typing

v2.0.0 — Computer Control, Image Studio, Handoff System (Mar 2026)

  • Computer Control page (initial), Image Studio, Handoff system, Firefly/Replicate upgrades
  • Major agent improvements — tool dedup, context budget management, parallelism, app awareness
  • Comprehensive wiki documentation (24 pages)

License

MIT © 2026 Dan Sheils

About

Ottomate — your universal AI agent workbench. Give it a goal — it plans, codes, browses, and delivers autonomously. 190+ connectors, advanced multimedia creative suite, browser automation, visual pipelines, 200+ skills, cron task scheduler, and more
