Ottomate

Self-hosted multi-model AI agent platform.
Describe a goal — Ottomate plans, codes, browses, generates media, builds apps, and orchestrates 190+ services autonomously.

Built with Next.js 15 · Claude · GPT · Gemini · Replicate · Luma · FLUX · HuggingFace

Created by Dan Sheils

Quick Start • Features • Screenshots • Pages • Models • Connectors • Architecture

What is Ottomate?

Ottomate is a self-hosted, multi-model AI agent platform built with Next.js 15.
Describe a goal in plain English — the agent plans multi-step workflows, writes and executes code, searches the web, talks to 190+ external services, generates images, video, and audio, and saves every artifact it produces.

Installation is a single npm install with no external infrastructure to provision. A SQLite database is created on first launch.

Key capabilities:

Autonomous task execution — plans, reasons, and iterates with tool use until the goal is met
Computer Control — full desktop automation via Anthropic's native computer use tools: screenshots, mouse, keyboard, bash, and file editing with visual feedback
Multi-model orchestration — Claude Opus/Sonnet/Haiku, GPT-5.4, Gemini 2.5, Perplexity Sonar, OpenRouter, with automatic failover (see Models for the full list)
Code execution — runs Python, Node.js, and shell scripts in-process with captured output
Web browsing — searches (Brave, Perplexity, Serper, Tavily), scrapes pages, and automates browsers via Playwright
190+ connectors — Gmail, Slack, GitHub, Jira, Stripe, Notion, HubSpot, WhatsApp, and many more
Nova AI creative suite — universal SmartBar that auto-detects output type (image, video, audio, 3D, text), searches Replicate + HuggingFace models live, generates media, and offers post-generation actions (animate, upscale, make 3D). Includes image generation (FLUX, DALL-E 3 + 25 styles), video generation (Minimax, Kling, Wan, Seedance), AI soundtracks (MusicGen), speech (12 voices, 2 providers), image editing (7 AI operations with canvas masking), and a unified gallery
Dreamscape Video Studio — 17-mode AI creative studio built around Luma Dream Machine (Ray 2, Ray Flash 2, Photon 1, Photon Flash 1) with storyboards, 20 camera presets, continuity library, and an AI Director with command chain orchestration
AI media generation — Luma Dream Machine (video/image), Replicate (1000s of models), FLUX, DALL-E 3, ElevenLabs (voice), MusicGen (music)
Sub-agents — spawns specialized child agents for parallel work
Persistent memory — key-value store the agent reads/writes across tasks
Scheduled tasks — cron expressions, intervals, daily/weekly recurrence
Skills marketplace — 270+ pre-built skills across 10 categories
Voice input — dictate tasks via Whisper or browser speech recognition
Slash commands — /image, /research, /code, /email, /video, /scrape, and more

Quick Start

Prerequisites

Requirement	Notes
Node.js 18+	`node -v` to check
Anthropic API key	console.anthropic.com — this is the only required key

Install & run

git clone https://github.com/RhythrosaLabs/otto-mate-2.git
cd otto-mate-2
npm install

# Add your API key (minimum required)
echo "ANTHROPIC_API_KEY=sk-ant-..." >> .env.local

# Skip login in local dev (no credentials required)
echo "DISABLE_AUTH=true" >> .env.local

# Start Next.js only
npm run dev

# — or — start all services (bolt-diy App Builder + code-server Coding Companion)
npm run dev:all

Open http://localhost:3000 — the app loads directly when DISABLE_AUTH=true is set.

Optional keys unlock more models and features. See Environment Variables below.

Production auth: remove DISABLE_AUTH and set NEXTAUTH_SECRET to a random string (for example, the output of openssl rand -hex 32) to enable JWT-based login.

Features

Task Engine

The core loop: you describe a goal → the agent creates a plan → executes steps (tool calls, code, API requests, sub-agents) → streams results back in real time. Tasks support follow-up chat, file attachments, voice input, and slash commands.

Multi-Model Failover

The agent picks the best model automatically, or you can choose one manually. If a provider is down or rate-limited, the agent fails over through the chain Anthropic → OpenAI → Google → OpenRouter (DeepSeek) → Perplexity, applying exponential backoff between attempts.
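
The failover behaviour can be pictured with a minimal sketch. Only the provider order is taken from the description above; the function names, retry count, and delay values are assumptions made for illustration and are not taken from model-fallback.ts.

```typescript
// Hypothetical sketch of provider failover with exponential backoff.
// Only the provider order comes from the README; the rest is assumed.
type Provider = "anthropic" | "openai" | "google" | "openrouter" | "perplexity";

const FAILOVER_CHAIN: Provider[] = ["anthropic", "openai", "google", "openrouter", "perplexity"];

type CallFn = (provider: Provider, prompt: string) => Promise<string>;
type SleepFn = (ms: number) => Promise<void>;

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs.
function backoffMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

const defaultSleep: SleepFn = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function callWithFailover(
  prompt: string,
  call: CallFn,
  retriesPerProvider = 2,
  sleep: SleepFn = defaultSleep,
): Promise<{ provider: Provider; text: string }> {
  let lastError: unknown;
  for (const provider of FAILOVER_CHAIN) {
    for (let attempt = 0; attempt <= retriesPerProvider; attempt++) {
      try {
        return { provider, text: await call(provider, prompt) };
      } catch (err) {
        lastError = err;
        // Back off before retrying the same provider; once retries run out,
        // fall through to the next provider in the chain.
        if (attempt < retriesPerProvider) await sleep(backoffMs(attempt));
      }
    }
  }
  throw new Error(`All providers failed: ${String(lastError)}`);
}
```

Passing the sleep function as a parameter keeps the sketch easy to exercise without real delays.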

Dreamscape Video Studio

A 17-mode AI creative studio built around Luma Dream Machine (Ray 2, Ray Flash 2, Photon 1, Photon Flash 1) with Replicate model support and MusicGen/Bark audio generation. Organize work into storyboards, artboards, and moodboards — each containing individual shots you can generate, extend, remix, and chain together.

Generation modes: text-to-video, image-to-video, extend, reverse-extend, interpolate, text-to-image, image reference, character reference (persistent identity across shots), style reference, modify video, modify video with keyframes, modify image, reframe (change aspect ratio of existing media), music generation (MusicGen), sound effects (Bark), voiceover, and lip-sync.

Production controls: 20 camera motion presets (pan, zoom, orbit, crane, dolly, tracking, handheld, static, arc, dutch tilt, whip pan — each with directional variants), 9 modify intensity levels (adhere → flex → reimagine, each with 3 sub-levels), 4 resolutions (540p → 4K), 7 aspect ratios (1:1, 3:4, 4:3, 9:16, 16:9, 9:21, 21:9), 5s/9s/10s durations, HDR output (EXR), loop toggle, batch generation (up to 4 variants), and a draft/hi-fi phase workflow with auto-upgrade from Flash to full models.

Auto-model intelligence: Recommends the optimal model per generation mode — e.g., Flash models for fast text-to-video drafts, full Ray 2 / Photon 1 for character consistency, style transfer, and modify operations.

AI Director: A built-in chat agent that interprets natural language into multi-step command chains with dependency ordering, continuity sheets (style anchors, character references, setting references), concept pill word-swapping for rapid prompt iteration, and creative query "Enhance" mode for prompt variations. Supports parallel step execution, smart retry with model downgrade on access errors, and progress tracking per step.
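
Dependency ordering with parallel execution can be illustrated by grouping steps into waves, where every step in a wave depends only on steps from earlier waves. This is a sketch under assumed names (ChainStep, planWaves); the AI Director's actual data model is not documented here.

```typescript
// Hypothetical command-chain step: an id plus the ids it depends on.
interface ChainStep {
  id: string;
  dependsOn: string[];
}

// Groups steps into waves. Steps inside one wave have all dependencies
// satisfied by earlier waves, so they could be executed in parallel.
function planWaves(steps: ChainStep[]): string[][] {
  const done = new Set<string>();
  let pending = [...steps];
  const waves: string[][] = [];
  while (pending.length > 0) {
    const ready = pending.filter((s) => s.dependsOn.every((d) => done.has(d)));
    if (ready.length === 0) {
      throw new Error("Cycle or missing dependency in command chain");
    }
    waves.push(ready.map((s) => s.id));
    ready.forEach((s) => done.add(s.id));
    pending = pending.filter((s) => !done.has(s.id));
  }
  return waves;
}
```

For example, a character reference and a style reference with no dependencies would land in the first wave, a shot that uses both in the second, and an extension of that shot in the third.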

Additional features: Shot tagging, likes, and bookmarks for organization. Annotation overlay system (arrows, rectangles, text labels) that feeds spatial context into prompts. Board export/import as JSON. Per-shot media preview with mute controls. Search and filter across all shots. Film Player for sequential playback of completed shots. Ideas Gallery for browsing all completed work across boards. Continuity library for persisting style, character, and setting references across boards. Handoff system for sending media between studios.

Creative Suite (Nova)

A full-featured AI media generation hub accessible from the sidebar. The home view features a universal SmartBar that auto-detects output type (Image, Video, Audio, 3D, Text) from your prompt, searches Replicate + HuggingFace model libraries in real time, and renders results inline with quick actions (animate to video, upscale 4×, make 3D). Five dedicated creation tools and a gallery are accessible as sub-pages:

Generate Image: Text-to-image with FLUX Schnell (fast), FLUX 1.1 Pro, FLUX 2 Pro, DALL-E 3, plus live model search across Replicate and HuggingFace. Supports 8 aspect ratios, 25 style presets (cinematic, anime, watercolor, pixel art, etc.), 9 lighting presets, 8 camera angles, 4 content types, visual intensity slider, negative prompts, structure and style references with adjustable strength, seed control, and batch generation (1–4 images). Inline quick actions: edit, generative fill, animate to video, upscale, copy to clipboard, save to gallery.

Generate Video: Text-to-video and image-to-video generation via Replicate models (Minimax Video-01-Live default, plus Kling, Wan 2.1, Seedance, and any searched model). Supports 3 aspect ratios (16:9, 9:16, 1:1), 3 durations (4s, 5s, 10s), 6 camera motions (pan, zoom, orbit), motion intensity slider, and optional first-frame image upload. Add Soundtrack button links directly to soundtrack generation.

Generate Soundtrack: AI music generation via Replicate models (MusicGen default, plus searched models). 13 genres, 12 moods, 4 tempo ranges, 3 energy levels, 12 multi-select instruments, duration slider (5–30s), and optional video upload for scoring. Animated waveform visualizer with genre/mood/tempo/instrument tags on results.

Generate Speech: Professional AI voiceovers with 12 voices across 2 providers — OpenAI TTS (Alloy, Echo, Fable, Onyx, Nova, Shimmer) and ElevenLabs (Rachel, Drew, Clyde, Paul, Domi, Bella). Speed slider (0.5×–2.0×), 9 language options, character count with estimated duration, plus live model search for additional TTS models.

Edit Image: Full image editing suite with 7 AI operations — generative fill, remove object, replace background, generative expand, upscale (2×/4×), remove background, and prompt-to-edit. Canvas-based brush masking with adjustable size and line interpolation. Expand supports directional control (all/left/right/up/down) and target ratios. Accept/discard workflow with undo history.

Connectors Marketplace

190+ integrations across 28 categories: communication, storage, development, project management, CRM, data, AI (LLMs, image, video, audio, speech, code, design, search, 3D, vector), analytics, automation, browser, cloud, ecommerce, finance, marketing, music, productivity, security, and social media. OAuth flows for Google/Microsoft/GitHub/Notion/Dropbox; API key entry for everything else. 135+ connectors have a completely free tier.

Skills

Skills are reusable instruction sets (like Custom GPTs). Browse 270+ pre-built skills across 10 categories (code, writing, research, data, automation, architecture, infrastructure, security, testing, custom) in the marketplace or create your own.

Scheduling

Schedule any task to run automatically. Supports one-time (with optional delete-after-run), recurring intervals, daily, weekly, and full cron expressions. Enable/disable individual schedules and see next-run timestamps.
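
As an illustration of how next-run timestamps could be derived for the interval, daily, and weekly modes, here is a small sketch. The Schedule type and nextRun function are hypothetical, all times are treated as UTC for simplicity, and cron expressions are left out.

```typescript
// Hypothetical schedule shapes; field names are assumptions for this sketch.
type Schedule =
  | { kind: "interval"; everyMinutes: number }
  | { kind: "daily"; hour: number; minute: number }
  | { kind: "weekly"; weekday: number; hour: number; minute: number }; // 0 = Sunday

// Returns the first run time strictly after `after` (UTC).
function nextRun(schedule: Schedule, after: Date): Date {
  if (schedule.kind === "interval") {
    return new Date(after.getTime() + schedule.everyMinutes * 60000);
  }
  const next = new Date(after.getTime());
  next.setUTCHours(schedule.hour, schedule.minute, 0, 0);
  if (schedule.kind === "daily") {
    // Today's slot has already passed, so move to tomorrow.
    if (next.getTime() <= after.getTime()) next.setUTCDate(next.getUTCDate() + 1);
    return next;
  }
  // Weekly: days until the target weekday, or a full week if today's slot has passed.
  let daysAhead = (schedule.weekday - next.getUTCDay() + 7) % 7;
  if (daysAhead === 0 && next.getTime() <= after.getTime()) daysAhead = 7;
  next.setUTCDate(next.getUTCDate() + daysAhead);
  return next;
}
```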

Memory

A persistent key-value store with tags that the agent reads and writes during task execution. Stored facts, preferences, and context carry over across tasks. You can search, add, tag, or delete entries manually.
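
The shape of such a store can be sketched in a few lines. This in-memory version is only an illustration: the class and method names are assumptions, and the real store persists entries in the SQLite memory table rather than in a Map.

```typescript
// Hypothetical entry shape: a key, a value, and free-form tags.
interface MemoryEntry {
  key: string;
  value: string;
  tags: string[];
}

class MemoryStore {
  private entries = new Map<string, MemoryEntry>();

  // Adds or overwrites an entry.
  set(key: string, value: string, tags: string[] = []): void {
    this.entries.set(key, { key, value, tags });
  }

  get(key: string): string | undefined {
    return this.entries.get(key)?.value;
  }

  delete(key: string): boolean {
    return this.entries.delete(key);
  }

  // Case-insensitive substring match against key, value, or any tag.
  search(query: string): MemoryEntry[] {
    const q = query.toLowerCase();
    const hits: MemoryEntry[] = [];
    this.entries.forEach((e) => {
      const haystack = [e.key, e.value, ...e.tags].join(" ").toLowerCase();
      if (haystack.includes(q)) hits.push(e);
    });
    return hits;
  }
}
```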

Computer Control

Full desktop automation powered by Anthropic's native computer use tools (computer_20251124 + bash_20250124 + text_editor_20250728). Give it a task in plain English and watch Claude take screenshots, move the mouse, type, click, run shell commands, and edit files — all with live visual feedback streamed to your browser.

Capabilities: Screenshot capture with coordinate scaling, mouse control (click, double-click, right-click, drag, scroll), keyboard input (type, key press, hotkeys), bash shell execution, file viewing and editing, and zoom into screen regions.
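
Coordinate scaling matters because the model sees a downscaled screenshot while clicks must land on real screen pixels. The sketch below shows one way to do the conversion; the function names and the 1280-pixel long-edge limit in the example are assumptions, not values taken from Ottomate.

```typescript
interface Size { width: number; height: number; }
interface Point { x: number; y: number; }

// Factor applied to the real screen so its long edge fits within maxLongEdge.
// Never upscales: a screen that already fits keeps a factor of 1.
function scaleFactor(screen: Size, maxLongEdge: number): number {
  return Math.min(1, maxLongEdge / Math.max(screen.width, screen.height));
}

// Converts a point chosen on the downscaled screenshot back to screen pixels.
function toScreen(p: Point, factor: number): Point {
  return { x: Math.round(p.x / factor), y: Math.round(p.y / factor) };
}

// Converts a real screen position into screenshot coordinates.
function toScreenshot(p: Point, factor: number): Point {
  return { x: Math.round(p.x * factor), y: Math.round(p.y * factor) };
}
```

With a 2560×1440 display and an assumed limit of 1280, the factor is 0.5, so a click at (640, 360) on the screenshot maps to (1280, 720) on the screen.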

Features: Live screenshot viewer with action dot overlay, real-time activity log, configurable max steps (default 75, up to 200), app permission system (approve/deny access per app), blocked app list, model selection (Sonnet/Opus), prompt caching for cost efficiency, extended thinking for better reasoning, and a Continue button to resume when max steps are reached.

Cross-provider fallback: Non-Anthropic models (GPT-4, Gemini, etc.) can delegate GUI tasks to Computer Control via the delegate_to_computer_control tool, giving every provider access to full desktop automation with image feedback.

Analytics & Audit

Analytics shows KPIs: total tasks, success rate, average duration, top tools (with per-tool success rates), model usage (with average cost per call), daily task volume (last 30 days), and recent errors. Audit Trail is a paginated log of every agent action — tool calls, model invocations, and task events — with duration, metadata, search, and filters (event type, tool, success status).
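
Per-tool success rates of the kind shown on the Analytics page can be derived from audit events with a simple aggregation. The event shape and tool names below are hypothetical; the actual agent_analytics schema may differ.

```typescript
// Hypothetical audit event: which tool ran, whether it succeeded, how long it took.
interface AuditEvent {
  tool: string;
  success: boolean;
  durationMs: number;
}

interface ToolStats {
  calls: number;
  successRate: number; // 0..1
  avgDurationMs: number;
}

// Aggregates call count, success rate, and average duration per tool.
function summarizeByTool(events: AuditEvent[]): Record<string, ToolStats> {
  const totals = new Map<string, { calls: number; ok: number; ms: number }>();
  for (const e of events) {
    const t = totals.get(e.tool) ?? { calls: 0, ok: 0, ms: 0 };
    t.calls += 1;
    if (e.success) t.ok += 1;
    t.ms += e.durationMs;
    totals.set(e.tool, t);
  }
  const out: Record<string, ToolStats> = {};
  totals.forEach((t, tool) => {
    out[tool] = { calls: t.calls, successRate: t.ok / t.calls, avgDurationMs: t.ms / t.calls };
  });
  return out;
}
```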

Screenshots

Home

The main prompt interface — type a goal, use slash commands, attach files, or pick from the prompt gallery.

Connectors

190+ integrations — connect Gmail, Slack, GitHub, Stripe, Notion, and more with OAuth or API keys.

Dreamscape Video Studio

17-mode AI creative studio with storyboards, 20 camera presets, continuity library, and the AI Director command chain system.

Nova — Generate

AI-powered creative hub with SmartBar (auto-detect output type, dual-provider model search), image/video/audio/speech generation, image editing, and unified gallery.

Skills Marketplace

270+ pre-built skills across 10 categories — or create your own.

Pages

Sidebar Navigation

The sidebar contains 21 items. Five are optional — hidden by default and toggled on in Settings › Features.

Page	Icon	Optional	Description
Ottomate	Monitor		Centered prompt input with 12 slash commands, voice input (Whisper + browser speech), file attachments, gallery suggestions, and category chips
Tasks	CheckSquare		List all tasks with status filters (running/completed/failed), search, sort, calendar view
Files	FolderOpen		Finder-style file browser with icon/list/gallery views, 50+ format support, folders, preview pane, and source filters
Connectors	Plug		Integration marketplace — connect 190+ services via OAuth or API key
Skills	Zap		Create, edit, and install reusable agent behaviors; 270+ in the marketplace
Documents	FileEdit		Create and manage text documents and spreadsheets with AI writing assistance, search, and relative timestamps
App Builder	Package	✓	Embedded bolt-diy full-stack AI app builder (WebContainers, port 5173)
Coding Companion	Terminal	✓	Embedded code-server (VS Code in browser, port 3100)
Video Studio	Clapperboard		17-mode AI creative studio — Luma Dream Machine (Ray 2, Ray Flash 2, Photon 1, Photon Flash 1) video/image/audio generation organized into storyboards with 20 camera presets, character identity persistence, 9 modify intensities, draft/hi-fi phases, AI Director with command chains, continuity library, annotations, and Film Player
Multimedia Playground	Layers	✓	Power-user workbench — dual-provider model search (Replicate + HuggingFace), multi-column comparison, quick actions per result type
Audio Studio	Music	✓	Embedded openDAW browser-based DAW (port 8080)
Creative Suite	Flame		AI creative hub with SmartBar (auto-detects output type, dual-provider model search), generate images (FLUX, DALL-E 3 + 25 styles), video (Minimax, Kling, Wan, Seedance), soundtracks (13 genres, 12 instruments), speech (12 voices, 2 providers), edit images (7 operations), and unified gallery. Route: `/computer/firefly`
3D Studio	Box	✓	Embedded Blockbench 3D model editor (port 3001)
Dispatch	Send		Messaging channel setup — configure Telegram, Discord, Slack, and WhatsApp webhooks; view connection status and send test messages
Memory	Brain		View, search, add, and delete agent memory entries
Scheduled	Clock		Cron-based task scheduler with interval, daily, weekly, and cron modes
Analytics	BarChart3		Performance dashboard — KPIs, tool popularity, model costs, error patterns
Audit Trail	Shield		Paginated log of every agent action with filters and metadata
Sessions	MessageSquare		Group related tasks into conversation sessions with shared context
Computer Control	MousePointer2		Full desktop automation — give Claude a task and watch it control your screen with live screenshots, mouse, keyboard, bash, and file editing. Configurable max steps, app permissions, and Continue button
Settings	Settings		Default model, token/cost budgets, themes, verbose mode, optional feature toggles, health check

Sub-Pages

These pages are reached from within their parent page:

Page	Parent	Description
Task Detail	Tasks	Live agent execution with Steps, Chat, Files, and Preview tabs — streaming output, token tracking, context budget
Document Editor	Documents	Rich text editor with AI writing assistance, auto-save, and title sync
Generate Image	Creative Suite	Text-to-image with FLUX, DALL-E 3, 25 style presets, aspect ratios, and live model search
Generate Video	Creative Suite	Text-to-video and image-to-video via Minimax, Kling, Wan 2.1, Seedance, and more
Generate Soundtrack	Creative Suite	AI music generation via MusicGen — 13 genres, 12 moods, 12 instruments, optional video upload
Generate Speech	Creative Suite	AI voiceovers — 12 voices across OpenAI TTS and ElevenLabs, speed/language control
Edit Image	Creative Suite	7 AI editing operations (generative fill, remove object, replace background, expand, upscale, remove BG, prompt-to-edit) with canvas brush masking

Additional Pages

These pages exist but are not shown in the sidebar:

Page	Description
Replicate	Replicate-only model explorer with Smart Run (auto-selects best model), quick category buttons, inline or task-based execution
Dream Machine	Direct Luma Dream Machine interface for quick video/image generation
Channels	Legacy channel configuration page (webhook URLs for Telegram, Discord, Slack, WhatsApp) — superseded by Dispatch in the sidebar
Image Studio	Redirects to Creative Suite (`/computer/firefly`) — kept alive to avoid broken links from agent system prompts
WhatsApp	WhatsApp Business API integration dashboard — connection status, send messages, webhook URL display
Onboarding	First-run setup wizard — health check, model selection, guided intro
Admin	Internal admin tools (not exposed in sidebar)

Models

Ottomate supports the 17 model options listed below across 5 providers, including an automatic mode and a free tier:

Model	ID	Provider	Best for
Auto (Recommended)	`auto`	—	Agent picks best model per sub-task automatically
Claude Opus 4.6	`claude-opus-4-6`	Anthropic	Complex reasoning, multi-step orchestration
Claude Sonnet 4.6	`claude-sonnet-4-6`	Anthropic	Balanced speed/quality, general tasks
Claude Haiku 4.5	`claude-haiku-4-5`	Anthropic	Ultra-fast, cheapest Claude
GPT-5.4	`gpt-5.4`	OpenAI	Strong reasoning, coding, vision, 1M context
GPT-5.4 Mini	`gpt-5.4-mini`	OpenAI	Fast and balanced, great for general tasks
GPT-5.4 Nano	`gpt-5.4-nano`	OpenAI	Ultra-cheap for simple tasks
Gemini 2.5 Pro	`gemini-2.5-pro`	Google	Deep research, long documents, advanced reasoning
Gemini 2.5 Flash	`gemini-2.5-flash`	Google	Best price-performance, fast and capable
Gemini 2.5 Flash-Lite	`gemini-2.5-flash-lite`	Google	Ultra-cheap for simple queries
Gemini 2.5 Nano	`gemini-2.5-nano`	Google	Smallest model, edge tasks
Sonar	`sonar`	Perplexity	Real-time web-augmented search
Sonar Pro	`sonar-pro`	Perplexity	Deeper web-augmented analysis
Sonar Reasoning Pro	`sonar-reasoning-pro`	Perplexity	Multi-step reasoning + web search
Sonar Deep Research	`sonar-deep-research`	Perplexity	Long-form web research
OpenRouter (Any Model)	`openrouter`	OpenRouter	Route to 200+ models (DeepSeek, Llama, Mistral, Qwen, etc.)
Free (OpenRouter)	`free`	OpenRouter	Zero-cost inference via Nemotron, Qwen, Llama, Gemma & more

Set auto to let the agent pick the best model per task. Legacy model IDs (e.g. gpt-4o, gemini-1.5-pro) are automatically remapped to their current equivalents.
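
Legacy remapping amounts to a lookup that falls back to the original ID. In the sketch below, the two legacy IDs come from the note above, but the targets they map to are assumptions chosen from the Models table, not a confirmed mapping.

```typescript
// Hypothetical remap table. The target IDs are assumed for illustration.
const LEGACY_MODEL_MAP = new Map<string, string>([
  ["gpt-4o", "gpt-5.4"],
  ["gemini-1.5-pro", "gemini-2.5-pro"],
]);

// Returns the current ID for a legacy model ID, or the ID unchanged if it is not legacy.
function resolveModelId(id: string): string {
  return LEGACY_MODEL_MAP.get(id) ?? id;
}
```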

Environment Variables

Create a .env.local file in the project root:

Variable	Required	Description
`ANTHROPIC_API_KEY`	Yes	Claude models — console.anthropic.com
`DISABLE_AUTH`	No	Set to `true` to skip login in local dev (no JWT required)
`NEXTAUTH_SECRET`	Prod only	JWT signing secret for session cookies; required whenever `DISABLE_AUTH` is not set to `true`. Generate with `openssl rand -hex 32`
`OPENAI_API_KEY`	No	GPT-5.4 family, DALL-E 3
`GOOGLE_AI_API_KEY`	No	Gemini 2.5 Pro/Flash/Flash-Lite/Nano
`GROQ_API_KEY`	No	Llama / Mixtral via Groq
`OPENROUTER_API_KEY`	No	Access 200+ models including free tier via OpenRouter
`PERPLEXITY_API_KEY`	No	Real-time web search via Perplexity Sonar
`BRAVE_SEARCH_API_KEY`	No	Web search via Brave
`SERPER_API_KEY`	No	Google search via Serper
`TAVILY_API_KEY`	No	AI-powered web search
`REPLICATE_API_TOKEN`	No	Run 1000s of ML models on Replicate
`LUMA_API_KEY`	No	Luma Dream Machine video/image generation
`ELEVENLABS_API_KEY`	No	Text-to-speech via ElevenLabs
`DATABASE_PATH`	No	SQLite DB path (default: `./perplexity-computer.db`)
`APP_URL`	No	Public URL (default: `http://localhost:3000`)
`GOOGLE_CLIENT_ID` / `SECRET`	No	OAuth for Gmail, Drive, Sheets, Docs, Calendar
`MICROSOFT_CLIENT_ID` / `SECRET`	No	OAuth for Outlook, OneDrive, Teams
`GITHUB_CLIENT_ID` / `SECRET`	No	GitHub OAuth
`NOTION_CLIENT_ID` / `SECRET`	No	Notion OAuth
`DROPBOX_CLIENT_ID` / `SECRET`	No	Dropbox OAuth
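
The two hard requirements in the table (the Anthropic key, and a signing secret whenever auth is enabled) can be checked at startup. The validateEnv helper below is a hypothetical sketch, not part of Ottomate; it takes the environment as a plain object so it can be tried without touching real variables.

```typescript
type Env = Record<string, string | undefined>;

// Returns a list of human-readable problems; an empty list means the minimum config is present.
function validateEnv(env: Env): string[] {
  const problems: string[] = [];
  if (!env["ANTHROPIC_API_KEY"]) {
    problems.push("ANTHROPIC_API_KEY is required");
  }
  const authDisabled = env["DISABLE_AUTH"] === "true";
  if (!authDisabled && !env["NEXTAUTH_SECRET"]) {
    problems.push("NEXTAUTH_SECRET is required unless DISABLE_AUTH=true");
  }
  return problems;
}
```

In a Node.js process this could be called as validateEnv(process.env) before the server starts.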

Connectors

Navigate to Connectors in the sidebar. Click Connect on any service to begin setup.

OAuth connectors — click "Sign in with [Provider]" and authorize in the popup
API key connectors — paste your token and click Connect
Free badge = no credit card required

Free-tier connectors (135+)

Connector	Auth	Notes
Gmail / Google Calendar / Drive / Sheets / Docs	OAuth	Free with Google account
Outlook / OneDrive / Microsoft Calendar	OAuth	Free with Microsoft account
Slack	API key	Free workspace available
Discord	API key	Free bot token
Telegram	API key	Free via BotFather
Dropbox	OAuth	2 GB free
Box	API key	10 GB free
GitHub	OAuth	Free public + private repos
GitLab	API key	Free on GitLab.com
Vercel	API key	Free Hobby plan
Sentry	API key	Free Developer plan
Linear	API key	Free personal plan
Jira / Confluence	API key	Free up to 10 users
Asana	API key	Free up to 10 teammates
ClickUp	API key	Free Forever plan
Monday.com	API key	Free 2 seats
HubSpot	API key	Free CRM
Notion	OAuth	Free personal plan
Airtable	API key	Free unlimited bases
Supabase	API key	Free 500 MB
PostgreSQL	Conn. string	Self-hosted or cloud free tier
Figma	API key	Free Starter plan
Calendly	API key	Free Basic plan
WordPress / Webflow / Wix	API key	Free tiers available
Hugging Face	API key	Free (rate limited)
ElevenLabs	API key	10k chars/month free
Stripe	API key	Free test mode
Mailchimp / Klaviyo	API key	Free up to 500 contacts

Communication connector setup

Gmail + Google Calendar

Auth: OAuth (Google) | Free: Yes

Click Sign in with Google in the connector modal
For your own OAuth app: go to Google Cloud Console → Credentials, create an OAuth 2.0 Client ID, add http://localhost:3000/api/auth/callback/google as redirect URI
Enable APIs: Gmail, Calendar, Drive, Sheets, Docs
Add GOOGLE_CLIENT_ID and GOOGLE_CLIENT_SECRET to .env.local

Outlook + Microsoft Calendar

Auth: OAuth (Microsoft) | Free: Yes

Click Sign in with Microsoft in the connector modal
For your own OAuth app: Azure Portal → App registrations, add redirect URI http://localhost:3000/api/auth/callback/microsoft
Add MICROSOFT_CLIENT_ID and MICROSOFT_CLIENT_SECRET to .env.local

Slack

api.slack.com/apps → Create New App → add Bot Token Scopes: chat:write, channels:read, channels:history
Install to Workspace → copy Bot User OAuth Token (xoxb-...)

Discord

discord.com/developers/applications → New Application → Bot → Reset Token
OAuth2 URL Generator: bot scope + Send Messages permission → invite bot

Telegram

Message @BotFather → /newbot → copy the token

Zoom

Zoom Marketplace → Server-to-Server OAuth → generate token

Twilio

twilio.com → Console → copy Account SID + Auth Token

Storage connector setup

Google Drive / Sheets / Docs

Connected automatically when you sign in with Google OAuth.

OneDrive

Connected automatically when you sign in with Microsoft OAuth.

Dropbox

dropbox.com/developers/apps → Create app → Scoped access
Add redirect URI: http://localhost:3000/api/auth/callback/dropbox
Add DROPBOX_CLIENT_ID and DROPBOX_CLIENT_SECRET to .env.local

Box

app.box.com/developers/console → Create New App → generate Developer Token

Development connector setup

GitHub

Option A (OAuth): github.com/settings/developers → OAuth Apps → callback URL http://localhost:3000/api/auth/callback/github → add to .env.local

Option B (PAT): github.com/settings/tokens/new → scopes repo, user → paste in connector modal

Vercel

vercel.com → Account Settings → Tokens → Create Token

GitLab

gitlab.com → User Settings → Access Tokens → scopes api, read_repository, write_repository

Sentry

sentry.io → Settings → Auth Tokens → scopes project:read, event:read, event:write

Datadog

app.datadoghq.com → API Keys + Application Keys

Project management connector setup

Linear

linear.app/settings/api → Create key

Jira

id.atlassian.com/manage-profile/security/api-tokens → Create API token → enter as email:token@domain

Asana

app.asana.com/0/my-apps → Create new token

ClickUp

Avatar → Settings → Apps → Generate API Key

Monday.com

Avatar → Developers → My Access Tokens

Confluence

Uses the same Atlassian API token as Jira.

CRM connector setup

HubSpot

Settings → Integrations → Private Apps → Create → select CRM scopes → copy access token

Salesforce

Developer Edition (free) → Setup → Connected App → copy Access Token

Zendesk

Admin Center → APIs → Zendesk API → enable Token Access → create API token

Data connector setup

Airtable

airtable.com/create/tokens → Create token → scopes data.records:read, data.records:write

Supabase

supabase.com → Project Settings → API → copy service_role key

PostgreSQL

Paste connection string: postgresql://user:pass@host:5432/dbname (works with Neon, Railway, Render, or self-hosted)
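
To show what each part of the connection string means, here is a sketch that splits it with the standard URL class. The parsePostgresUrl helper is hypothetical and is not how the connector is known to parse its input; the fallback to port 5432 reflects PostgreSQL's default port.

```typescript
interface PostgresTarget {
  user: string;
  password: string;
  host: string;
  port: number;
  database: string;
}

// Splits postgresql://user:pass@host:5432/dbname into its components.
function parsePostgresUrl(conn: string): PostgresTarget {
  const url = new URL(conn);
  if (url.protocol !== "postgresql:" && url.protocol !== "postgres:") {
    throw new Error(`Unexpected protocol: ${url.protocol}`);
  }
  return {
    // Credentials may be percent-encoded if they contain special characters.
    user: decodeURIComponent(url.username),
    password: decodeURIComponent(url.password),
    host: url.hostname,
    port: url.port ? Number(url.port) : 5432,
    database: url.pathname.replace(/^\//, ""),
  };
}
```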

Snowflake

Enter accountidentifier:username:password

Productivity connector setup

Notion

OAuth: notion.so/my-integrations → New integration → enable Public → redirect URI http://localhost:3000/api/auth/callback/notion

Token: Copy Internal Integration Token (secret_...) → share pages with the integration

Figma

Account Settings → Personal access tokens → create token

Calendly

calendly.com/integrations/api_webhooks → Generate New Token

WordPress.com

Me → Security → enable 2FA → Application Passwords → enter as username:apppassword

Webflow

Project Settings → Integrations → API Access → Generate API Token

Wix

manage.wix.com/account/api-keys → Generate API Key

AI service connector setup

OpenAI

platform.openai.com/api-keys → Create new secret key (sk-...)

Hugging Face

huggingface.co/settings/tokens → Read token (hf_...)

ElevenLabs

elevenlabs.io → Profile → API Key

Replicate

replicate.com/account/api-tokens → copy token (r8_...)

Finance & marketing connector setup

Stripe

dashboard.stripe.com → Developers → API keys → copy Secret key (sk_test_... or sk_live_...)

Shopify

Admin → Settings → Apps → Develop apps → configure scopes → copy Admin API access token

Mailchimp

Account → Extras → API keys → Create A Key (the key ends with its datacenter suffix, e.g. abc123-us1)

Klaviyo

Settings → API Keys → Create Private API Key

OAuth setup (Google, Microsoft, GitHub, Notion, Dropbox)

Google OAuth

Connects Gmail, Drive, Sheets, Docs, and Calendar in one click.

Create a project at console.cloud.google.com
Enable APIs: Gmail, Calendar, Drive, Sheets, Docs
APIs & Services → Credentials → OAuth client ID (Web) → redirect URI http://localhost:3000/api/auth/callback/google
Configure consent screen with test users

Add to .env.local:

GOOGLE_CLIENT_ID=your-client-id.apps.googleusercontent.com
GOOGLE_CLIENT_SECRET=GOCSPX-...

Microsoft OAuth

Connects Outlook, OneDrive, Teams, and SharePoint.

Azure Portal → App registrations → New registration
Redirect URI: http://localhost:3000/api/auth/callback/microsoft
API permissions → Microsoft Graph: Mail.ReadWrite, Mail.Send, Calendars.ReadWrite, Files.ReadWrite, offline_access
Certificates & secrets → New client secret

Add to .env.local:

MICROSOFT_CLIENT_ID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
MICROSOFT_CLIENT_SECRET=your-secret-value

GitHub OAuth

github.com/settings/developers → OAuth Apps → callback URL http://localhost:3000/api/auth/callback/github

Add to .env.local:

GITHUB_CLIENT_ID=your-client-id
GITHUB_CLIENT_SECRET=your-client-secret

Notion OAuth

notion.so/my-integrations → New integration → enable Public → redirect URI http://localhost:3000/api/auth/callback/notion

Add to .env.local:

NOTION_CLIENT_ID=your-client-id
NOTION_CLIENT_SECRET=your-client-secret

Dropbox OAuth

dropbox.com/developers/apps → Create app → redirect URI http://localhost:3000/api/auth/callback/dropbox

Add to .env.local:

DROPBOX_CLIENT_ID=your-app-key
DROPBOX_CLIENT_SECRET=your-app-secret

Architecture

src/
├── app/
│   ├── api/                        # API routes
│   │   ├── auth/                   # OAuth initiation + callback
│   │   ├── tasks/                  # Task CRUD + SSE streaming
│   │   ├── connectors/             # Connector config CRUD
│   │   ├── files/                  # File listing + serving
│   │   ├── gallery/                # Gallery items
│   │   ├── memory/                 # Memory CRUD
│   │   ├── skills/                 # Skills CRUD
│   │   ├── scheduled-tasks/        # Scheduler engine
│   │   ├── analytics/              # Usage analytics
│   │   ├── sessions/               # Session grouping
│   │   ├── audit/                  # Audit log
│   │   ├── replicate/              # Replicate model runner
│   │   ├── dreamscape/             # Luma Dream Machine
│   │   ├── huggingface/            # HuggingFace inference
│   │   ├── luma/                   # Luma Dream Machine API
│   │   ├── firefly/                # Nova creative suite APIs (image/video/audio/speech/models via Replicate + HuggingFace)
│   │   ├── generate/               # Generic model generation
│   │   ├── health/                 # Health check + service health (code-server, etc.)
│   │   ├── context/                # Context management
│   │   ├── usage/                  # Usage tracking
│   │   ├── hooks/                  # Webhook handlers
│   │   ├── channels/               # Channel config (Telegram, Discord, etc.)
│   │   ├── settings/               # Global settings CRUD
│   │   ├── social-auth/            # Social media OAuth
│   │   ├── whatsapp/               # WhatsApp Cloud API
│   │   └── voice/                  # Whisper transcription
│   └── computer/                   # All UI pages (25+ routes)
│       ├── firefly/                # Creative Suite (generate image/video/soundtrack/speech, edit, gallery); route `/computer/firefly`
│       ├── dreamscape/             # Dreamscape Video Studio (storyboards, AI Director, 17 modes)
│       ├── dispatch/               # Dispatch — messaging channel setup (Telegram, Discord, Slack, WhatsApp)
│       ├── playground/             # Multimedia Playground (Replicate + HuggingFace model workbench; optional)
│       ├── replicate/              # Replicate model explorer (Smart Run)
│       ├── 3d-studio/              # Embedded Blockbench 3D editor (optional)
│       ├── coding-companion/       # Embedded code-server / VS Code (optional)
│       ├── app-builder/            # Embedded bolt-diy app builder (optional)
│       ├── audio-studio/           # Embedded openDAW browser DAW (optional)
│       ├── image-studio/           # Redirects → /computer/firefly (legacy route)
│       ├── channels/               # Legacy channel config (superseded by Dispatch)
│       ├── whatsapp/               # WhatsApp Business API dashboard
│       └── ...                     # Tasks, Files, Documents, Connectors, Skills, etc.
├── lib/
│   ├── agent.ts                    # Core AI agent (~7,500 lines)
│   ├── db.ts                       # SQLite via better-sqlite3
│   ├── types.ts                    # TypeScript types (~440 lines)
│   ├── connectors-data.ts          # 190+ connector definitions
│   ├── skill-catalog.ts            # 270+ pre-built skills
│   ├── model-fallback.ts           # Multi-provider failover
│   ├── scheduler.ts                # Cron/interval scheduler
│   ├── replicate.ts                # Replicate API client
│   ├── huggingface.ts              # HuggingFace client
│   ├── social-media-browser.ts     # Social media automation
│   ├── personas.ts                 # Agent personality presets
│   ├── models.ts                   # Model configurations & free model list
│   ├── schemas.ts                  # Zod validation schemas
│   ├── constants.ts                # App-wide constants (NAV_ITEMS, API helpers)
│   ├── background-ops.ts           # Background task operations
│   ├── steel-client.ts             # Steel browser client
│   ├── whatsapp.ts                 # WhatsApp Cloud API client
│   ├── running-tasks.ts            # Global AbortController map for live tasks
│   ├── skill-converters.ts         # Skill format converters
│   ├── themes.ts                   # UI theme definitions
│   └── utils.ts                    # Shared utilities
├── components/
│   ├── sidebar.tsx                  # Navigation sidebar
│   ├── persistent-layout.tsx        # LRU keep-alive panel manager (up to 20 pages)
│   ├── bolt-persistent-iframe.tsx   # Persistent bolt.diy iframe (App Builder, port 5173)
│   ├── kilocode-persistent-iframe.tsx # Persistent code-server iframe (Coding Companion, port 3100)
│   ├── blender-persistent-iframe.tsx  # Persistent Blockbench iframe (3D Studio, port 3001)
│   ├── lmms-persistent-iframe.tsx     # Persistent openDAW iframe (port 8080)
│   ├── command-palette.tsx          # ⌘K command palette
│   ├── keyboard-shortcuts.tsx       # Global keyboard shortcuts
│   └── background-status.tsx        # Floating background task status indicator
└── tests/                           # Playwright E2E tests
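
The `persistent-layout.tsx` entry above describes an LRU keep-alive manager capped at 20 pages. As a rough illustration only, an LRU of that shape can be built on a `Map`, which preserves insertion order. The class and method names below are hypothetical and are not taken from the Ottomate source.

```typescript
// Hypothetical sketch of an LRU keep-alive cache; not Ottomate's actual code.
class LruCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Return the cached value and mark it as most recently used.
  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  // Insert or refresh an entry, evicting the least recently used one when full.
  set(key: K, value: V): void {
    if (this.entries.has(key)) {
      this.entries.delete(key);
    } else if (this.entries.size >= this.capacity) {
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
    this.entries.set(key, value);
  }

  // Keys ordered from least to most recently used.
  keys(): K[] {
    return Array.from(this.entries.keys());
  }
}

// Assumed usage: one entry per route, capped at the 20 pages mentioned above.
const pageCache = new LruCache<string, string>(20);
pageCache.set("/computer/tasks", "tasks page tree");
```

A page that is revisited moves to the "most recent" end, so only routes that have gone unused the longest are dropped once the cap is reached.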

Database (SQLite via better-sqlite3)

Table	Purpose
`tasks`	Task records with status, model, messages, priority
`agent_steps`	Tool calls and results per task
`messages`	Chat messages per task
`task_files`	Files produced by tasks
`file_folders`	Folder hierarchy for the file manager
`sub_tasks`	Spawned sub-agent tasks
`skills`	Saved skill definitions
`gallery_items`	Generated media
`connector_configs`	Service credentials (API keys, OAuth tokens)
`memory`	Agent long-term memory (key-value + tags)
`token_usage`	Per-call token and cost tracking
`scheduled_tasks`	Cron/interval schedules
`agent_learnings`	Patterns the agent learns over time
`agent_analytics`	Every agent action logged (audit trail)
`settings`	Global configuration
`sessions`	Conversation session groupings
`documents`	Saved text documents
`skill_performance`	Skill usage and performance metrics

Tech Stack

Core

Layer	Technology
Framework	Next.js 15.1.4 (App Router)
Runtime	Node.js 20+
Language	TypeScript 5
UI Library	React 19
CSS	Tailwind CSS 3.4 (custom `pplx-*` color palette)
Components	Radix UI (Dialog, Dropdown, ScrollArea, Select, Tabs, Tooltip)
Animation	Framer Motion 11
Icons	Lucide React
Class utilities	clsx + tailwind-merge + class-variance-authority
Markdown	react-markdown + remark-gfm
Validation	Zod
IDs	uuid
Database	SQLite via `better-sqlite3` (WAL mode)
File storage	Local filesystem (`./task-files/{taskId}/`)
Testing	Playwright E2E (`@playwright/test`)
Linting	ESLint 8 + `eslint-config-next`
Process management	pm2

AI Providers & Models

Provider	Models	SDK
Anthropic	Claude Opus 4.6, Claude Sonnet 4.6, Claude Haiku 4.5	`@anthropic-ai/sdk`
OpenAI	GPT-5.4, GPT-5.4 Mini, GPT-5.4 Nano	`openai`
Google	Gemini 2.5 Pro, Gemini 2.5 Flash, Gemini 2.5 Flash-Lite, Gemini 2.5 Nano	`@google/generative-ai`
Perplexity	Sonar, Sonar Pro, Sonar Reasoning Pro, Sonar Deep Research	`openai` (baseURL override)
OpenRouter	200+ models incl. DeepSeek, Llama 3.3, Qwen, Gemma, Mistral, free-tier models	`openai` (baseURL override)

Failover chain: Anthropic → OpenAI → Google → OpenRouter → Perplexity (exponential backoff: 2s, 5s, 15s)
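
The chain above can be sketched as a nested retry loop. This is a minimal sketch under stated assumptions: it assumes each provider gets one initial attempt plus one retry per backoff delay before the next provider is tried, which the README does not spell out. `runWithFailover`, `FAILOVER_CHAIN`, and `BACKOFF_MS` are hypothetical names, not the contents of `model-fallback.ts`.

```typescript
// Hypothetical sketch; the retry policy is an assumption, not Ottomate's actual logic.
type Provider = "anthropic" | "openai" | "google" | "openrouter" | "perplexity";

const FAILOVER_CHAIN: Provider[] = ["anthropic", "openai", "google", "openrouter", "perplexity"];
const BACKOFF_MS = [2000, 5000, 15000];

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runWithFailover<T>(
  call: (provider: Provider) => Promise<T>,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<T> {
  let lastError: unknown;
  for (const provider of FAILOVER_CHAIN) {
    // One initial attempt, then one retry after each backoff delay.
    for (let attempt = 0; attempt <= BACKOFF_MS.length; attempt++) {
      try {
        return await call(provider);
      } catch (error) {
        lastError = error;
        if (attempt < BACKOFF_MS.length) await sleep(BACKOFF_MS[attempt]);
      }
    }
    // All attempts for this provider failed; fall through to the next one.
  }
  throw lastError;
}
```

Passing `sleep` as a parameter keeps the delays out of unit tests: a test can supply a no-op and still observe which delays were requested.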

Image Generation

Service	Models / Notes
Replicate	FLUX Schnell (default), FLUX 1.1 Pro, FLUX 2 Pro, SDXL, Ideogram v2 Turbo, Recraft v3, SD Inpainting, recraft-crisp-upscale, recraft-remove-background, BLIP captioning, face swap
OpenAI	DALL-E 3
HuggingFace	SDXL, Stable Diffusion v1.5, and thousands of community models via live search

Video Generation

Service	Models / Notes
Luma Dream Machine	Ray 2, Ray Flash 2, Photon 1, Photon Flash 1 — text-to-video, image-to-video, extend, interpolate, reframe, modify (via Dreamscape)
Replicate	Minimax Video-01-Live (default in Nova), Wan 2.1 (T2V + I2V 480p), Seedance 1 Lite, Kling via fofr, Hunyuan Video, Stable Video Diffusion
Runway ML	Gen-3 Alpha Turbo (image-to-video, via agent tool — requires RUNWAY_API_KEY)

Audio Generation

Service	Notes
Replicate (MusicGen)	Music generation — stereo-melody-large, stereo-large, melody-large, large, medium, small variants
OpenAI TTS	9 voices: alloy, echo, fable, onyx, nova, shimmer, aria, roger, sarah
ElevenLabs	`eleven_multilingual_v2` — multilingual TTS with 6 voices (Rachel, Drew, Clyde, Paul, Domi, Bella), voice listing API
OpenAI Whisper (`whisper-1`)	Speech-to-text transcription

Browser Automation & Computer Use

Tool	Notes
Steel (steel.dev)	Cloud Chrome sessions, CAPTCHA solving, anti-bot detection. Modes: `STEEL_API_KEY` (cloud) or `STEEL_BASE_URL` (self-hosted)
Playwright	Local Chrome automation — web scraping, social media posting, form fill
Computer Use (macOS)	`screencapture` + `cliclick` + AppleScript for desktop control
Anthropic Computer Use (native)	`computer_20251124` + `bash_20250124` + `text_editor_20250728` — native beta tools with image feedback, coordinate scaling, prompt caching
Cheerio	Server-side HTML parsing
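
The native Computer Use row mentions coordinate scaling. One common reason for it is that screenshots are downscaled before being sent to the model, so the coordinates the model returns have to be mapped back to real screen pixels. The sketch below shows only that mapping. The function name and the example resolutions are assumptions for illustration and do not come from the Ottomate source.

```typescript
// Hypothetical sketch of coordinate scaling; names and sizes are assumptions.
interface Size {
  width: number;
  height: number;
}

interface Point {
  x: number;
  y: number;
}

// Map a point from the downscaled screenshot's space back to real screen pixels.
function toScreenCoordinates(point: Point, screenshot: Size, screen: Size): Point {
  const scaleX = screen.width / screenshot.width;
  const scaleY = screen.height / screenshot.height;
  return {
    x: Math.round(point.x * scaleX),
    y: Math.round(point.y * scaleY),
  };
}

// Example: the model clicks the center of a 1280x800 screenshot of a 2560x1600 display.
const click = toScreenCoordinates(
  { x: 640, y: 400 },
  { width: 1280, height: 800 },
  { width: 2560, height: 1600 },
);
// click is { x: 1280, y: 800 }
```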

Messaging Integrations

Service	API
WhatsApp	Meta Business Cloud API (`graph.facebook.com/v21.0`)
Telegram	Bot API (`api.telegram.org`)
Slack	Web API (`slack.com/api`)
Discord	REST API v10

Next.js Configuration Highlights

Setting	Value
`serverExternalPackages`	`better-sqlite3`, `playwright` (prevents bundling native modules)
Proxy rewrite `/bolt/*`	→ `localhost:5173` (bolt-diy App Builder)
Proxy rewrite `/kilocode/*`	→ `localhost:3100` (code-server proxy)
COOP + COEP headers on `/computer/*`	`same-origin` + `credentialless` — enables SharedArrayBuffer for WebContainers
Image remote patterns	All HTTPS origins allowed
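
For orientation, the settings in this table would look roughly like the fragment below in `next.config.ts`. It is a sketch reconstructed from the table, not a copy of the project's file: the rewrite destination paths and the `**` hostname wildcard are assumptions.

```typescript
// Hypothetical next.config.ts fragment reconstructed from the table above.
// Rewrite destination paths and the remote-pattern wildcard are assumptions.
const nextConfig = {
  // Keep native modules out of the server bundle.
  serverExternalPackages: ["better-sqlite3", "playwright"],

  // Proxy the embedded apps through the main Next.js origin.
  async rewrites() {
    return [
      { source: "/bolt/:path*", destination: "http://localhost:5173/:path*" },
      { source: "/kilocode/:path*", destination: "http://localhost:3100/:path*" },
    ];
  },

  // Cross-origin isolation headers, which SharedArrayBuffer requires.
  async headers() {
    return [
      {
        source: "/computer/:path*",
        headers: [
          { key: "Cross-Origin-Opener-Policy", value: "same-origin" },
          { key: "Cross-Origin-Embedder-Policy", value: "credentialless" },
        ],
      },
    ];
  },

  // Allow remote images from any HTTPS origin.
  images: {
    remotePatterns: [{ protocol: "https", hostname: "**" }],
  },
};

// In a real next.config.ts this object would be the file's default export.
```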

Architectural Patterns

Pattern	Description
SSE Streaming	Tasks, app-builder, and document AI all stream via `text/event-stream`
Tool calling	All providers use JSON-schema function calling; tools defined in `agent.ts`
Parallel tool execution	Parallelizable tool calls batched with `Promise.all`
Persistent iframes	4 embedded apps (bolt-diy, code-server, Blockbench, openDAW) stay mounted off-screen via `position:fixed; top:-200vh` + `inert` to preserve state across route changes
LRU page cache	`PersistentLayout` caches up to 20 React page trees so state survives navigation
Multi-provider failover	Exponential backoff across 5 providers with automatic model switching
Context compaction	Keeps first 2 + last 8 messages, summarizes the middle to stay within token budget
Semantic memory engine	Importance scoring, memory classification, and compression across tasks
OAuth 2.0	Google, Microsoft, GitHub, Notion, Dropbox — tokens stored in SQLite
Bearer token middleware	Optional `OTTOMATE_AUTH_TOKEN` on all `/api/*` routes
Sub-agents	Spawns tool-scoped child agents for research, coding, creativity, data, etc.
Code sandboxing	`child_process.exec` with security validation in `sandbox-executor.ts`
`useSyncExternalStore`	Cross-page background operation state (no Redux)
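
The context compaction row can be illustrated with a small function. This is a minimal sketch, assuming the summary is injected as a single `system` message and that the summarizer is supplied by the caller. `compactContext`, `ChatMessage`, and `Summarize` are hypothetical names rather than identifiers from `agent.ts`.

```typescript
// Hypothetical sketch of "keep first 2 + last 8, summarize the middle".
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

const KEEP_FIRST = 2;
const KEEP_LAST = 8;

// Assumed summarizer signature; a real one would presumably call a model.
type Summarize = (messages: ChatMessage[]) => string;

function compactContext(messages: ChatMessage[], summarize: Summarize): ChatMessage[] {
  // Nothing to compact when every message already falls in the kept head or tail.
  if (messages.length <= KEEP_FIRST + KEEP_LAST) return messages;

  const head = messages.slice(0, KEEP_FIRST);
  const middle = messages.slice(KEEP_FIRST, messages.length - KEEP_LAST);
  const tail = messages.slice(messages.length - KEEP_LAST);

  // Replace the middle of the conversation with a single summary message.
  const summary: ChatMessage = { role: "system", content: summarize(middle) };
  return [...head, summary, ...tail];
}
```

With 15 messages, for example, the first 2 and last 8 are kept verbatim and the 5 in between collapse into one summary, giving 11 messages in total.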

Embedded Sub-Applications

App	Port	Tech	Purpose
Next.js (main)	3000	Next.js 15, React 19	Main UI + all API routes
bolt-diy	5173	Remix + Vite + pnpm	Full-stack AI app builder (WebContainers)
Blockbench	3001	Custom JS/Vite	3D model editor
openDAW	8080	npm/Vite	Browser-based DAW / audio studio
code-server proxy	3100→3101	Node.js MJS	VS Code in browser (Coding Companion)

Troubleshooting

Problem	Solution
Tasks not running / "model not found"	Ensure `ANTHROPIC_API_KEY` is set in `.env.local` and restart the dev server
OAuth "redirect_uri_mismatch"	Add the exact URI (including `http://` and port) in the provider's developer console
"GOOGLE_CLIENT_ID not configured"	Add `GOOGLE_CLIENT_ID` and `GOOGLE_CLIENT_SECRET` to `.env.local`, restart
Google "Access blocked: request is invalid"	Configure the OAuth consent screen and add test users at APIs & Services → OAuth consent screen
Code execution fails	Python runs via `python3`; make sure it is installed. macOS does not ship a `timeout` command, and Ottomate works around that automatically
Files not showing	Files are stored in `./task-files/<taskId>/` — ensure the directory is writable
Connector API calls failing	Re-check the token (no extra spaces). OAuth tokens may need re-authorization
Database errors	Delete `perplexity-computer.db` to reset the database (this erases all history). The schema is recreated on the next startup

Author

GitHub: @RhythrosaLabs
Portfolio: danielsheils.myportfolio.com

Topics

ai-agent autonomous-agent multi-model next-js self-hosted anthropic claude gpt-4 gemini openai perplexity openrouter ai-tools task-automation code-generation web-scraping image-generation video-generation audio-generation text-to-speech musicgen replicate luma-dream-machine adobe-firefly sqlite typescript tailwindcss playwright

Changelog

v2.2.0 — Dispatch, Optional Features, Dev Auth Bypass (May 2026)

Dispatch — new sidebar page replacing Channels; configure Telegram, Discord, Slack, and WhatsApp webhooks with inline setup guides, connection status, and test-message support
Optional features system — App Builder, Coding Companion, Audio Studio, 3D Studio, and Multimedia Playground are now hidden by default and toggled on in Settings › Features
Dev auth bypass — set `DISABLE_AUTH=true` in `.env.local` to skip JWT login in local development; `NEXTAUTH_SECRET` is required only in production
Image Studio route kept alive (`/computer/image-studio`) as a redirect to Creative Suite to avoid broken agent links
Settings › Features — new feature-flags panel for optional embedded apps

v2.1.0 — Computer Control & Cross-Provider Fallback (Apr 2, 2026)

Computer Control — full desktop automation page with live screenshot viewer, activity log, app permissions, and configurable max steps (75 default, up to 200)
`delegate_to_computer_control` tool — any model (GPT-4, Gemini, etc.) can now delegate GUI tasks to Anthropic's native computer use system with full image feedback
Continue button — when Computer Control hits max steps, one click resumes the session
Settings integration — Computer Control max iterations now respects the saved setting from the Settings page
Build fixes — excluded subproject type checking, fixed Firefly/Replicate route typing

v2.0.0 — Computer Control, Image Studio, Handoff System (Mar 2026)

Computer Control page (initial), Image Studio, Handoff system, Firefly/Replicate upgrades
Major agent improvements — tool dedup, context budget management, parallelism, app awareness
Comprehensive wiki documentation (24 pages)
