johnbrodowski/ApexComputerUse

ApexComputerUse

Give AI agents control of any Windows app — no vision model, no screenshots, no cloud.

ApexComputerUse reads the Windows accessibility tree (the same data the OS exposes to screen readers) and serves it over a plain HTTP REST API. Any AI agent — in any language, on any machine — can find, inspect, and control any desktop app or browser by making simple HTTP requests. No screenshots. No pixel coordinates. No cloud dependency.

5–20 tokens per action instead of 1,000–3,500 for a screenshot. A full browser page in onscreen-only mode is ~126 elements of compact JSON — less than the cost of a single screenshot of the same page.

Works on Win32, WPF, UWP, WinForms, and browsers. Controlled via HTTP REST, named pipes, cmd.exe, and Telegram.


Screenshots

Main Desktop UI

main_ui

Interactive Web Console (GET /)

web_console

Scene Editor — WinForms

scene_editor

Scene Editor — Browser (GET /editor)

scene_editor_web

AI-Generated Drawing!

space_scene

UI Map Overlay

image

Quickstart

Requirements: Windows 10/11 · .NET 10 SDK

git clone https://github.com/johnbrodowski/ApexComputerUse
cd ApexComputerUse
dotnet build
dotnet run --project ApexComputerUse
  1. The app opens. The HTTP server starts automatically on port 8080 (HttpAutoStart=true in appsettings.json).
  2. By default it binds to localhost only (HttpBindAll=false), so no first-run UAC network setup is required.
  3. If you enable HttpBindAll=true, the app prompts once (UAC) to configure URL ACL + Windows Firewall for the selected port.
  4. The API key is shown in the Remote Control tab → API Key field — copy it.
  5. Open http://localhost:8080/?apiKey=<key> in a browser — the interactive console appears (the browser console pre-fills the key).
  6. Pick any open window from the Windows panel on the left.
  7. Browse its element tree, click an action button, see the result.

Chat tab: switch to the Chat tab and click Load Chat to open the streaming AI chat UI directly inside the app. Configure provider and API key in the settings group above, then chat away.

Clients tab: use the Clients tab to register other machines running ApexComputerUse. Add each machine's name, IP/host, port, and API key, then click Test to confirm the connection is live. This registry lets you — or an AI agent — track and target multiple Apex endpoints from a single instance.

Or go straight to curl (replace <key> with the API key from the Remote Control tab):

# Confirm the server is up
curl -H "X-Api-Key: <key>" http://localhost:8080/ping

# Find Notepad and read its text editor content (two calls)
curl -H "X-Api-Key: <key>" -X POST http://localhost:8080/find \
     -H "Content-Type: application/json" -d '{"window":"Notepad"}'
curl -H "X-Api-Key: <key>" "http://localhost:8080/exec?action=gettext"

# Or combine both in one call
curl -H "X-Api-Key: <key>" -X POST http://localhost:8080/find-exec \
     -H "Content-Type: application/json" -d '{"window":"Notepad","action":"gettext"}'
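
For agents that prefer Python over curl, the combined call can be sketched with the standard library alone. This is a minimal sketch that relies only on what the curl example shows (the /find-exec route, the X-Api-Key header, and the JSON body). The helper names `build_find_exec_request` and `find_exec` are illustrative, not part of the project. The Accept header asks for JSON because the default response format is HTML.

```python
import json
import urllib.request


def build_find_exec_request(base_url, api_key, window, action):
    """Build (but do not send) a POST /find-exec request mirroring the curl example."""
    body = json.dumps({"window": window, "action": action}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/find-exec",
        data=body,
        method="POST",
        headers={
            "X-Api-Key": api_key,
            "Content-Type": "application/json",
            "Accept": "application/json",  # default format is HTML otherwise
        },
    )


def find_exec(base_url, api_key, window, action, timeout=10):
    """Send the request and return the raw response body as text."""
    request = build_find_exec_request(base_url, api_key, window, action)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, `find_exec("http://localhost:8080", "<key>", "Notepad", "gettext")` should be equivalent to the one-call curl example. The response is returned as raw text because its exact shape is not specified here.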

OCR: requires eng.traineddata — download from github.com/tesseract-ocr/tessdata and place it in tessdata\ next to the executable.

AI Vision: requires a GGUF vision model and projector — see Usage — AI.


Why ApexComputerUse

The problem with screenshot-based automation

Most AI computer-use tools — Claude Computer Use, OpenAI CUA, UI-TARS, OmniParser — work by sending a screenshot to a vision model and guessing pixel coordinates to click. This approach has compounding costs:

  • Screenshot token costs scale with resolution and vary by provider. A 1024×768 image runs ~765 tokens (OpenAI) to ~1,050 tokens (Anthropic). At 1920×1080 that rises to ~1,840 tokens (Anthropic) or ~2,125 tokens (OpenAI). At 2048×2048, OpenAI charges ~2,765 tokens and Anthropic ~2,500–3,500 tokens. Gemini is the exception, typically staying under 1,000 tokens even for ~4K images. And this cost is paid on every single step.
  • Screenshots stack in conversation history — a 20-step task accumulates 20+ images in context.
  • Coordinate grounding is fragile: it breaks on window resize, DPI scaling, and multi-monitor setups.
  • Published benchmarks confirm the accuracy ceiling: even specialist 7B vision models score only 18.9% on real professional UIs (ScreenSpot-Pro, 2025). GPT-4o scores below 2% on unscaled professional screens.

The structured-tree approach

ApexComputerUse reads the accessibility tree the OS already maintains — the same tree used by screen readers and test automation. This gives every element a name, control type, and AutomationId, without rendering a pixel.

Interacting with an element by name costs 5–20 tokens. The element map for a full browser page in onscreen-only mode is typically 100–200 elements of compact JSON — compared to ~1,050 tokens for a single screenshot of the same page, with none of the coordinate fragility.

This is the same direction taken by the most efficient browser-only tools: browser-use claims 50% fewer tokens than screenshot alternatives; Vercel's agent-browser returns 200–400 tokens per page snapshot and uses 82–93% fewer tokens than Playwright MCP. ApexComputerUse brings the same approach to the entire Windows desktop.

How it compares

Tool Coverage HTTP API Stable element IDs Onscreen filter Status
ApexComputerUse Windows desktop + browsers ✅ REST ✅ SHA-256 hash ?onscreen=true Active
UFO2 (Microsoft) Windows desktop + browsers ❌ research agent ❌ bounding-box Partial Research only
UI Automata Windows desktop + browsers MCP only Selector-based Shadow DOM cache Active
Windows-Use Windows desktop ❌ Python lib Partial Active
WinAppDriver Windows desktop WebDriver XPath / selectors Paused by Microsoft
browser-use Browser only ❌ Python lib Element hash Active
Playwright MCP Browser only MCP Session-scoped refs Partial Active
Claude Computer Use Any (screenshot) Cloud API ❌ coordinates Active

No other tool combines: Windows UIA3 coverage, SHA-256 stable element IDs, a language-agnostic HTTP REST API, and an onscreen visibility filter — in a single deployable binary.

Compatible AI Agents

ApexComputerUse exposes a plain HTTP REST API, which means any AI agent that can execute shell commands or fetch a URL can use it. No SDK, no plugin, no special integration required — if the agent can run curl, it can drive any Windows app or browser through this server.

Access paths

There are three ways an agent can interact with ApexComputerUse:

1. Shell / terminal access (curl or any HTTP client) Any agent that can run shell commands can call the API directly with curl, Python requests, or PowerShell Invoke-RestMethod. This covers the widest range of tools and requires no configuration beyond starting the HTTP server.

2. URL fetch / WebFetch tool Some agents have a dedicated tool for fetching URLs rather than running shell commands. ApexComputerUse's HTML responses embed a full <script type="application/json" id="apex-result"> block, so any agent that can fetch a webpage gets structured JSON data back without needing a vision model.

3. MCP server (optional wrapper) Several agents support the Model Context Protocol. If you prefer a tighter integration, the REST API can be wrapped as an MCP server so the agent sees your actions as named tools rather than raw HTTP calls.


Agent compatibility table

Agent Type Shell access URL fetch MCP Notes
Claude Code CLI ✅ Bash tool ✅ WebFetch tool curl is blocked by default but Claude Code automatically falls back to Python requests for the same result
Cline VS Code extension ✅ Terminal ✅ Via shell Full agentic loop; browser control; human-in-the-loop approval for each command
Aider CLI ✅ Shell ✅ Via shell Oldest and most widely deployed open-source coding CLI; works with any model via Ollama or API key
Goose (Block) CLI + Desktop ✅ Shell ✅ Via shell Apache 2.0; model-agnostic; native MCP support
Cursor (Agent Mode) IDE ✅ Terminal ✅ Via shell Agent mode can run terminal commands; MCP support available
Windsurf (Cascade) IDE ✅ Terminal ✅ Via shell Cascade runs commands automatically; MCP support with admin controls
GitHub Copilot (Agent Mode) VS Code extension ✅ Terminal ✅ Via shell VS Code Agent mode handles terminal commands and iteration
OpenHands / Devin Cloud agent ✅ Shell ✅ Via shell Varies Requires network path from the cloud sandbox to your Windows machine
Roo Code / Continue VS Code extension ✅ Terminal ✅ Via shell Open-source; BYOK; shell access via VS Code terminal integration
Autocomplete-only tools Extension Tabnine, Supermaven, etc. generate code only — no agentic shell or HTTP access

Local model users: any agent backed by a local model via Ollama (Qwen Coder, DeepSeek Coder, CodeLlama, etc.) that also has shell access works the same way. The model itself doesn't need internet access — the agent runtime executes the curl commands.


Quickest agent integration (Claude Code example)

Start the HTTP server, then drop this into your Claude Code session:

The ApexComputerUse REST API is running at http://localhost:8080.
Use curl (or Python requests if curl is blocked) to control Windows apps.
Send the API key with every request: -H "X-Api-Key: <key>"
Start with: curl -H "X-Api-Key: <key>" http://localhost:8080/ping
Then: curl -H "X-Api-Key: <key>" http://localhost:8080/windows  (to see what's open)
Then find and interact with any element using /find and /exec (or /find-exec for both in one call).

Claude Code will handle the rest — finding windows, reading the element tree, clicking, typing, and verifying results across turns using its stable element IDs.


Stable element IDs

Every element is assigned a SHA-256 hash-based numeric ID derived from its control type, name, AutomationId, and position in the tree. These IDs are stable across sessions — an agent can reference the same element in turn 1 and turn 20 without re-querying the tree. No other tool in the Windows desktop automation space publishes this property.
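
To illustrate why a hash-based ID stays the same from one turn to the next, here is a minimal sketch of deriving a numeric ID from the four inputs named above. The field order, the separator, and the truncation to 32 bits are assumptions made for illustration. The project's actual recipe is not documented in this README, so IDs produced by this sketch will not match the server's.

```python
import hashlib


def stable_element_id(control_type, name, automation_id, tree_path):
    """Derive a deterministic numeric ID from an element's identifying fields.

    ASSUMPTION: field order, separator, and 32-bit truncation are
    illustrative only; they are not taken from the project's source.
    """
    key = "\x1f".join([control_type, name, automation_id, tree_path])
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Keep the first 4 bytes so the ID stays a compact number.
    return int.from_bytes(digest[:4], "big")
```

Because the ID depends only on the element's own properties and position, the same element hashes to the same number on every scan, while a renamed or moved element gets a new one.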

The onscreen filter

GET /elements?onscreen=true prunes any element where IsOffscreen = true during the tree scan, skipping entire offscreen subtrees. On a live Chewy.com product page this reduces 634 elements to 126 — an 80% reduction — putting token cost per step in the same range as the best browser-only tools while covering all desktop apps too.
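
The server applies this pruning during the scan itself. The same idea can be shown client-side on an already-fetched JSON tree. The sketch below assumes nodes are dictionaries with the `isOffscreen` and `children` fields that appear in the responses described in this README; the function names are illustrative.

```python
def prune_offscreen(node):
    """Return a copy of the tree without offscreen nodes (None if the root is offscreen)."""
    if node.get("isOffscreen"):
        return None  # drop the node together with its whole subtree
    kept = [prune_offscreen(child) for child in node.get("children", [])]
    pruned = {key: value for key, value in node.items() if key != "children"}
    pruned["children"] = [child for child in kept if child is not None]
    return pruned


def count_nodes(node):
    """Count a node plus all of its descendants."""
    return 1 + sum(count_nodes(child) for child in node.get("children", []))
```

Comparing `count_nodes` before and after pruning gives the reduction. For the figures quoted above, 1 − 126/634 is roughly 0.80.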

The filter composes with the type filter and the new depth/expansion params: ?onscreen=true&type=Button.

When ?match= is combined with ?onscreen=true, the match search scans all elements (including offscreen ones) so content that has been scrolled out of view can still be found by text search. Offscreen matches are tagged with "isOffscreen": true in the response. Use exec action=scrollinto on the returned element ID to bring an offscreen match into view before interacting with it.

Progressive tree expansion

For deep pages, fetch a shallow overview first, then drill into only the branches you care about:

# Step 1 — shallow overview (fast, small response)
curl "http://localhost:8080/elements?depth=2&onscreen=true"
# Nodes that have children beyond the depth limit show "childCount": N instead of "children"

# Step 2 — expand a specific node by its ID (IDs are stable between calls)
curl "http://localhost:8080/elements?id=708379645&depth=2&onscreen=true"
# Returns only that subtree, 2 levels deep — existing map entries are preserved

This lets an AI agent navigate to the relevant section of a large page without fetching the whole tree on every step.
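
An agent can automate the second step by scanning the shallow overview for truncated nodes and building one expansion URL per node. This is a sketch under the assumption that truncated nodes carry `childCount` and no `children` key, as described in the comments above. The helper names are hypothetical.

```python
from urllib.parse import urlencode


def truncated_ids(node):
    """Yield IDs of nodes whose children were cut off by the depth limit."""
    if "childCount" in node and "children" not in node:
        yield node["id"]
    for child in node.get("children", []):
        yield from truncated_ids(child)


def expansion_url(base_url, element_id, depth=2, onscreen=True):
    """Build the /elements URL that expands a single subtree."""
    params = {"id": element_id, "depth": depth}
    if onscreen:
        params["onscreen"] = "true"
    return f"{base_url}/elements?{urlencode(params)}"
```

In practice the agent would expand only the branches whose names look relevant, rather than every truncated node.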

Browser-friendly tree filters

Modern web pages often wrap every visible element in several identity-less Pane/Group/Custom nodes and produce deep trees with many one-child chains. Several opt-in /elements parameters cut through that noise:

# RECOMMENDED: global text search, which replaces almost all hierarchical drill-down.
# Searches Name, AutomationId, Value, AND ClassName across the entire window tree
# (including offscreen elements). Returns every match with its ancestor path plus
# `depth` levels of descendants. Combine with includePath=true for breadcrumbs.
# The parameter name is `match=`; there is NO separate `global=true` flag. `match=`
# alone forces a full-tree scan and the tester-friendly behaviour described above.
# When `match=` is set, `depth=` no longer limits the scan (depth pruning would
# otherwise hide deep matches before they could be found); it only controls how
# many levels of descendants are returned with each match.
curl "http://localhost:8080/elements?match=add+to+cart&onscreen=true&depth=1&includePath=true"

# Collapse "1-in-1-in-1" wrapper chains. A wrapper is skipped only when it has
# exactly one child, no name, no AutomationId, and its control type is Pane,
# Group, or Custom. Named containers and anything with an AutomationId survive.
curl "http://localhost:8080/elements?onscreen=true&collapseChains=true"

# Ancestor breadcrumb on every emitted node: "Chrome > Document > Main > Form".
curl "http://localhost:8080/elements?onscreen=true&includePath=true"

# Opt into Value pattern + HelpText on every node — useful for web inputs
# whose Name is empty and whose visible content lives in the Value pattern.
curl "http://localhost:8080/elements?onscreen=true&properties=extra"

# All new filters combine cleanly with existing ones.
curl "http://localhost:8080/elements?onscreen=true&collapseChains=true&match=submit&type=Button&depth=1&properties=extra"

Truncated nodes (ones whose children were cut off by depth) now also emit descendantCount alongside childCount, so an agent can decide whether a subtree is worth expanding without another round trip. Element IDs are computed against the real, unflattened tree — hoisting a descendant through collapseChains does not change its ID, and follow-up /elements?id=<id> and /execute id=<id> calls still resolve.
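
The wrapper rule for collapseChains can be illustrated on a JSON tree. This is a client-side sketch of the rule as stated above (exactly one child, no name, no AutomationId, control type Pane, Group, or Custom), not the server's implementation. The function names are illustrative.

```python
WRAPPER_TYPES = {"Pane", "Group", "Custom"}


def is_wrapper(node):
    """True when the node matches the skip rule quoted in the text above."""
    return (
        len(node.get("children", [])) == 1
        and not node.get("name")
        and not node.get("automationId")
        and node.get("controlType") in WRAPPER_TYPES
    )


def collapse_chains(node):
    """Return a copy of the tree with anonymous one-child wrappers hoisted away."""
    while is_wrapper(node):
        node = node["children"][0]  # hoist the only child; its own "id" is untouched
    collapsed = dict(node)
    if "children" in node:
        collapsed["children"] = [collapse_chains(child) for child in node["children"]]
    return collapsed
```

Hoisting copies the surviving node as-is, which mirrors the point made above: collapsing changes the shape of the tree but not the IDs of the nodes that remain.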

/find now populates the response's structured element object (id, controlType, name, automationId, className, frameworkId, isEnabled, isOffscreen, boundingRectangle, plus value/helpText when properties=extra), in addition to the existing human-readable string in message.


Features

  • Find any window and element by name or AutomationId (exact or fuzzy match)
  • Filter element search by ControlType
  • Persistent, hash-based stable element and window IDs (survive app restarts)
  • Onscreen-only element map (?onscreen=true) — prunes offscreen subtrees at scan time
  • Progressive tree expansion (?depth=N + ?id=<elementId>) — fetch a shallow overview then drill into only the branches you need, without re-scanning the whole window
  • Element nodes include boundingRectangle (x, y, width, height) for spatial context and visual rendering
  • Execute all common UI actions: click, type, select, toggle, scroll, drag & drop, etc.
  • OCR any UI element using Tesseract
  • Multimodal AI: describe UI elements, ask questions about them, analyse image/audio files using a local vision LLM (LLamaSharp MTMD)
  • Remote control via HTTP REST API (curl-friendly JSON)
  • Remote control via named pipe (PowerShell module included)
  • Remote control via cmd.exe batch helper (apex.cmd)
  • Remote control via Telegram bot
  • Screenshot capture of elements, windows, and full screen (returned as base64 PNG)
  • Interactive HTTP test console — served at GET /, includes live windows list, element tree browser, grouped command builder covering every action, inline capture/OCR/AI vision/UI map buttons, format selector (JSON/HTML/Text/PDF), format demo links, and a response log
  • AI Drawing: POST /draw renders any combination of shapes (rect, ellipse, circle, line, arrow, polygon, text) to a base64 PNG; GET /draw/demo renders a built-in multi-colour space scene; ?overlay=true shows the result as a click-through screen overlay
  • Layered Scene Editor — persistent, structured drawing canvas with stable shape IDs so AI can generate a composition and the user can refine it; full REST API at /scenes/*; interactive WinForms editor (Tools → Scene Editor) and browser editor (GET /editor)
  • UI Map Renderer — renders the element tree as a colour-coded overlay drawn directly on screen, and optionally exports a PNG image; accessible via Tools → Render UI Map or GET /uimap
  • Format-adaptive responses — every endpoint serves HTML, plain text, JSON, or PDF via URL extension (.json, .html, .txt, .pdf), ?format= parameter, or Accept header; default is an HTML page with embedded JSON readable by any AI that can fetch a URL
  • System utility routes: /health (unauthenticated), /ping, /metrics, /sysinfo, /env, /ls, /run, /run-tests, /shutdown for AI agents that need OS-level context without a separate tool
  • WindowMonitor — background STA poll thread detects desktop window opens / closes / title changes once per second; fires WindowsChanged / WindowClosed events that auto-prune the CommandProcessor element + window caches when a window goes away (no more stale-handle errors when an app closes mid-session). Optional WatchElements mode adds descendant-level diff tracking, narrowable to the foreground window or to titles matching a substring filter for tractable scan cost. Inspect activity via GET /winmon/log and drain via POST /winmon/clear
  • Live monitoring dashboard — browser-based status page at GET /dashboard; shows health, per-route metrics, system info, registered clients, AI chat session status, and WindowMonitor activity log. Auto-refreshes every 5 seconds. Requires AllowDiagnostics permission.
  • Native HTTPS — opt-in TLS via http.sys (no proxy); Scripts/setup-https.ps1 generates a self-signed cert, binds it via netsh http add sslcert, and adds a Firewall rule in one elevated step. Supports user-supplied PFX. Three remote-access options documented in Scripts/README-remote-access.md: SSH tunnel, native HTTPS, and Caddy reverse proxy.
  • Embedded AI chat in the Chat tab — the Chat tab opens the streaming HTML chat UI (/chat) in your default browser; click Open In Browser to launch it. The HTML page handles streaming, provider/model display, and session reset natively.
  • AI Chat over HTTP — streaming chat UI at GET /chat backed by /chat/send, /chat/status, /chat/reset; same 8 providers as the desktop AI Chat window; also accessible from any browser
  • Agentic tool loop in AI Chat — when the local HTTP server is running, the AI can issue ApexComputerUse API calls inside ```apex code blocks; results are fed back automatically for up to 8 turns until the AI produces a clean answer (AiChatService.SendAsync + SetLocalServer)
  • Auto-start on launch — HTTP server starts automatically (HttpAutoStart=true by default), binds to localhost by default (HttpBindAll=false), and can be switched to all-interfaces mode with one-time netsh setup (URL ACL + Firewall rule)
  • Auto-download setup — Model tab "Download All" button fetches the LFM2.5-VL model, projector, and Tesseract data to fixed local paths on first launch

Setup

1. Build and run

git clone https://github.com/johnbrodowski/ApexComputerUse
cd ApexComputerUse
dotnet run --project ApexComputerUse

2. First-run network setup (only when HttpBindAll=true)

When HttpBindAll=true, ApexComputerUse checks whether the HTTP URL ACL and Windows Firewall inbound rule exist for the configured port. If either is missing, a single elevated cmd window opens (one UAC prompt) and runs:

netsh http add urlacl url=http://+:{port}/ user=Everyone
netsh advfirewall firewall add rule name="ApexComputerUse" dir=in action=allow protocol=TCP localport={port}

This happens once and is tracked in %APPDATA%\ApexComputerUse\settings.json. With the default HttpBindAll=false, this setup is skipped.

3. Models and OCR data (optional — auto-download available)

Open the Model tab and click Download All to automatically fetch:

  • LFM2.5-VL-450M-Q4_0.gguf — vision LLM (450 M parameters, quantized)
  • mmproj-LFM2.5-VL-450m-F16.gguf — multimodal projector
  • eng.traineddata — Tesseract English OCR data

Files are saved to models\ and tessdata\ next to the executable. On first launch the app detects missing files and switches to the Model tab automatically.

To download manually: copy eng.traineddata from github.com/tesseract-ocr/tessdata into tessdata\, and place both .gguf files in models\.

4. Remote access (optional)

Three options — see Scripts/README-remote-access.md for full details:

| Option | When to use | Setup |
| --- | --- | --- |
| SSH tunnel | Ad-hoc, no certificates | .\Scripts\ssh-tunnel.ps1 -Server user@mypc |
| Native HTTPS | Permanent TLS, no proxy | .\Scripts\setup-https.ps1 (run as Admin), then set HttpsEnabled: true in appsettings.json |
| Caddy proxy | Public domain + auto Let's Encrypt | caddy run --config Scripts/Caddyfile with DOMAIN= set |

5. Telegram Bot (optional)

  1. Message @BotFather on Telegram and create a bot with /newbot.
  2. Copy the token (format: 123456789:ABC-DEF...).
  3. Paste it into the Bot Token field in the app and click Start Telegram.
  4. Add your Telegram chat ID to the Allowed Chat IDs field to restrict who can send commands.

Security & Configuration

HTTP API Authentication

Every HTTP request must include the API key. Three equivalent methods:

# Authorization header (recommended)
curl -H "Authorization: Bearer <key>" http://localhost:8080/ping

# X-Api-Key header
curl -H "X-Api-Key: <key>" http://localhost:8080/ping

# Query parameter (use only for browser links / quick tests)
curl "http://localhost:8080/ping?apiKey=<key>"

Requests without a valid key receive HTTP 401. The interactive web console (GET /) pre-fills the key when it is opened with ?apiKey=<key>; otherwise, paste the key from the Remote Control tab on first launch.

To disable authentication (local development only), clear the API Key field in the app.

Named Pipe Security

The named pipe is ACL-restricted to the current Windows user. Other local users and unprivileged processes cannot connect.

Telegram Bot Authorization

Enter one or more Telegram chat IDs in the Allowed Chat IDs field (comma-separated). Any message from an unlisted chat ID receives "Unauthorized." and is logged. Leave the field empty only for local testing.

Client Permission Gating (non-loopback callers)

Requests from localhost / loopback always have full access. Non-loopback callers are matched against entries in the Clients tab and constrained by per-client permissions (allow_automation, allow_capture, allow_ai, allow_scenes, allow_shell_run, allow_clients, allow_diagnostics). Unknown non-loopback callers are denied.
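
The gating rule reads as a three-way decision, sketched below. The `clients` dictionary (registered host mapped to permission flags) is a hypothetical shape chosen for illustration; how the Clients tab actually stores its entries is not shown in this README.

```python
import ipaddress


def is_allowed(remote_ip, permission, clients):
    """Decide whether a caller may use a feature guarded by `permission`.

    `clients` maps a registered IP to its permission flags
    (hypothetical shape, for illustration only).
    """
    if ipaddress.ip_address(remote_ip).is_loopback:
        return True  # localhost / loopback always has full access
    entry = clients.get(remote_ip)
    if entry is None:
        return False  # unknown non-loopback callers are denied
    return bool(entry.get(permission, False))
```

For example, a registered client with only `allow_capture` set could take screenshots but would be refused on /run, which needs `allow_shell_run`.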

Shell Execution (/run)

The POST /run and GET /run endpoints execute arbitrary cmd.exe commands. They are disabled by default. Enable them explicitly:

  • In appsettings.json: "EnableShellRun": true
  • Or via environment variable: APEX_ENABLE_SHELL_RUN=true

Configuration

Settings can be layered from three sources; when the same setting is defined more than once, environment variables override appsettings.json:

appsettings.json (next to the executable — shipped defaults shown):

{
  "HttpPort":             8080,
  "HttpBindAll":          false,
  "HttpAutoStart":        true,
  "PipeName":             "ApexComputerUse",
  "LogLevel":             "Information",
  "EnableShellRun":       false,
  "TelegramToken":        "",
  "TestRunnerExePath":    "",
  "TestRunnerConfigPath": ""
}

Shipped defaults are HttpAutoStart=true and HttpBindAll=false (auto-start on localhost only). Set HttpBindAll=true for LAN access.

Environment variables (prefix APEX_, override appsettings.json):

Variable Description
APEX_HTTP_PORT HTTP listen port (default 8080)
APEX_HTTP_BIND_ALL true to bind all interfaces instead of localhost only
APEX_HTTP_AUTOSTART true to auto-start HTTP server in GUI mode
APEX_PIPE_NAME Named pipe name
APEX_LOG_LEVEL Serilog minimum level: Debug / Information / Warning / Error
APEX_ENABLE_SHELL_RUN true to enable the /run shell-execution endpoint
APEX_API_KEY Override the auto-generated API key
APEX_ALLOWED_CHAT_IDS Comma-separated Telegram chat ID whitelist
APEX_TELEGRAM_TOKEN Telegram bot token
APEX_MODEL_PATH Default LLM .gguf path
APEX_MMPROJ_PATH Default multimodal projector .gguf path
APEX_TEST_RUNNER_EXE_PATH Path to TestApplications/TestRunner executable for /run-tests
APEX_TEST_RUNNER_CONFIG_PATH Optional config file path passed to TestRunner

Network binding: HttpBindAll = false (the default) binds to http://localhost:{port}/ — loopback only, safe for single-machine use. Set APEX_HTTP_BIND_ALL=true to bind all interfaces for network-wide access (ensure firewall rules are in place).
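
The override order can be sketched as a small merge function. The pairing of each APEX_* variable with a JSON key is an assumption based on the matching names in the two lists above, and `effective_settings` is a hypothetical helper, not the application's loader.

```python
import json


def parse_bool(value):
    """Treat "true" (any case) as True, everything else as False."""
    return value.strip().lower() == "true"


# ASSUMPTION: each APEX_* variable maps onto the similarly named JSON key.
ENV_OVERRIDES = {
    "APEX_HTTP_PORT": ("HttpPort", int),
    "APEX_HTTP_BIND_ALL": ("HttpBindAll", parse_bool),
    "APEX_HTTP_AUTOSTART": ("HttpAutoStart", parse_bool),
    "APEX_PIPE_NAME": ("PipeName", str),
    "APEX_LOG_LEVEL": ("LogLevel", str),
    "APEX_ENABLE_SHELL_RUN": ("EnableShellRun", parse_bool),
    "APEX_TELEGRAM_TOKEN": ("TelegramToken", str),
}


def effective_settings(appsettings_text, environ):
    """Start from appsettings.json and let APEX_* variables override it."""
    settings = json.loads(appsettings_text)
    for variable, (key, convert) in ENV_OVERRIDES.items():
        if variable in environ:
            settings[key] = convert(environ[variable])
    return settings
```

Passing `os.environ` as the second argument would show which values a process started from the current shell ends up with.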

Logs are written to %LOCALAPPDATA%\ApexComputerUse\Logs\apex-YYYYMMDD.log (daily rotation, 7-day retention).

Run as a Windows Service

ApexComputerUse can run headlessly as a Windows service (no GUI):

# Install
sc.exe create ApexComputerUse binPath="C:\ApexComputerUse\ApexComputerUse.exe --service" start=auto
sc.exe start ApexComputerUse

# Uninstall
sc.exe stop ApexComputerUse
sc.exe delete ApexComputerUse

Configure via appsettings.json or APEX_* environment variables before starting the service. The APEX_TELEGRAM_TOKEN and APEX_API_KEY variables are the recommended way to inject secrets in a service context.

Command-line overrides

Program.cs supports lightweight startup overrides:

  • --port <n> sets APEX_HTTP_PORT for that process
  • --pipe <name> sets APEX_PIPE_NAME for that process
  • --client marks the instance as a subordinate client instance

Usage — UI

| Field | Description |
| --- | --- |
| Window Name | Partial title of the target window. Fuzzy-matched if no exact match found. |
| AutomationId | The element's AutomationId (checked first). |
| Element Name | The element's Name property (fallback if AutomationId is blank). |
| Search Type | Filter the element search to a specific ControlType. All searches everything. |
| Control Type | Selects the action group (Button, TextBox, etc.). |
| Action | The action to perform on the found element. |
| Value / Index | Input for actions that need it (text to type, index, row,col, x,y, etc.). |

Find Element — locates the window and element, logs what was found. Execute Action — runs the selected action against the last found element.

Tools menu

Item Description
Run AI Computer Use Mode Launches the interactive multimodal AI agent loop (requires model loaded on the Model tab).
Output UI Map Scans the current window's element tree and logs it as nested JSON to the console tab.
Render UI Map Scans the current window's element tree, draws a colour-coded bounding-box overlay on screen for 5 seconds, and offers to save the overlay as a PNG image.
Scene Editor Opens the layered scene editor — create scenes, add shapes to layers, drag to reposition, use AI to generate and refine compositions.
AI Chat Opens a standalone streaming chat window with support for 8 AI providers (OpenAI, Anthropic, DeepSeek, Grok, Groq, Duck, LM Studio, LlamaSharp). Configure API keys in ai-settings.json next to the executable. The Chat tab opens the same chat UI in your default browser — click Open In Browser after the HTTP server starts.

Window and Element ID Mapping

Every window and element is assigned a stable numeric ID (SHA-256 hash-based) that persists across sessions. These IDs can be used in find commands instead of titles or AutomationIds.

# 1. Get windows with their IDs
curl http://localhost:8080/windows
# Returns: [{"id":42,"title":"Notepad"},{"id":107,"title":"Calculator"},...]

# 2. Get elements with their IDs for the current window
curl http://localhost:8080/elements

# Onscreen elements only (prunes offscreen subtrees — 80% fewer elements on browser pages)
curl "http://localhost:8080/elements?onscreen=true"

# Limit tree depth — nodes at the cutoff show "childCount" instead of "children"
curl "http://localhost:8080/elements?depth=2&onscreen=true"

# Expand a specific subtree by numeric ID (IDs are stable; map is preserved between expansion calls)
curl "http://localhost:8080/elements?id=708379645&depth=2&onscreen=true"

# Combine with type filter
curl "http://localhost:8080/elements?onscreen=true&type=Button"

# Returns nested JSON including bounding rectangles:
# {
#   "id": 105,
#   "controlType": "Edit",
#   "name": "Text Editor",
#   "automationId": "15",
#   "boundingRectangle": { "x": 0, "y": 30, "width": 800, "height": 600 },
#   "children": [...]
# }
#
# When a depth limit truncates a node's children, "childCount" appears instead:
# {
#   "id": 708379645,
#   "controlType": "Pane",
#   "name": "",
#   "boundingRectangle": { ... },
#   "childCount": 7    <-- call /elements?id=708379645 to expand
# }

# 3. Find using numeric IDs (no fuzzy matching, direct map lookup)
curl -X POST http://localhost:8080/find \
     -H "Content-Type: application/json" \
     -d '{"window":42,"id":105}'

Using numeric IDs is faster and unambiguous — the element is resolved directly from the in-memory map without any search or fuzzy logic. Every find call also auto-focuses the matched window. When a title/name search is low-confidence or ambiguous, /find now refuses to guess and returns error_data.candidates; choose one of those candidates or use IDs from /windows and /elements.


Token Economics

Map rendering isn't just a debugging convenience — it has compounding implications for token consumption at scale.

The Core Difference

With screenshot-based AI automation, every interaction requires sending a fresh image to the model. At typical desktop resolutions that's 1,000–3,500 tokens per screenshot depending on the provider and resolution — every single step, accumulating in conversation history. With ApexComputerUse's map approach, the UI is rendered once as a structured, text-based representation. After that initial render, each individual interaction references elements by name, costing 5–20 tokens on average.

The ?onscreen=true filter further reduces the element map to only what is visible in the current viewport. On a real browser page this produces 126 elements of compact JSON — well under the cost of a single screenshot of the same page.

Real-world token costs (approximate — varies by provider and resolution)

| Approach | Per step | 20-step task |
| --- | --- | --- |
| Screenshot (1024×768) | ~765–1,050 tokens | ~15,000–21,000 tokens in images alone |
| Screenshot (1920×1080) | ~1,840–2,125 tokens | ~37,000–43,000 tokens in images alone |
| Screenshot (2048×2048) | ~2,765–3,500 tokens | ~55,000–70,000 tokens in images alone |
| ApexComputerUse (full map) | 400–1,800 tokens (one-time) + ~10 per action | ~1,000 tokens total |
| ApexComputerUse (?onscreen=true) | 200–600 tokens (one-time) + ~10 per action | ~400 tokens total |

Provider breakdown: at 1024×768, Anthropic ≈ 1,050 tokens / OpenAI ≈ 765 tokens. At 1920×1080, Anthropic ≈ 1,840 / OpenAI ≈ 2,125. At 2048×2048, OpenAI ≈ 2,765 / Anthropic ≈ 2,500–3,500. Gemini is notably more efficient — typically under 1,000 tokens even for ~4K images. All providers compound costs across steps: every screenshot remains in context for the life of the conversation.


Example 1 — Small App (Calculator, tray utility, simple tool)

Screenshot: 2,500 tokens each · Initial map: 400 tokens · Per-action after map: 8 tokens · Assumes 100 actions per day, with the map rendered once per day (400 + 99 × 8 = 1,192)

By time period — 1 person:

| Timeframe | Screen Capture | Map Approach | Tokens Saved |
| --- | --- | --- | --- |
| 1 day | 250,000 | 1,192 | 248,808 |
| 1 week | 1,750,000 | 8,344 | 1,741,656 |
| 1 year | 91,250,000 | 435,080 | 90,814,920 |

Annual totals — by team size:

| Team Size | Screen Capture | Map Approach | Reduction Factor |
| --- | --- | --- | --- |
| 1 person | 91,250,000 | 435,080 | ~210x |
| 10 people | 912,500,000 | 4,350,800 | ~210x |
| 50 people | 4,562,500,000 | 21,754,000 | ~210x |
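
The figures in both tables can be reproduced with a few lines of arithmetic. The per-screenshot, initial-map, and per-action costs come from Example 1. The 100 actions per day is inferred from the table (250,000 ÷ 2,500), and the daily map total matches one map render plus 99 further actions.

```python
SCREENSHOT_TOKENS = 2_500   # per screenshot (Example 1)
INITIAL_MAP_TOKENS = 400    # one map render
PER_ACTION_TOKENS = 8       # each action after the map
ACTIONS_PER_DAY = 100       # inferred from 250,000 / 2,500 in the table


def daily_tokens():
    """Return (screenshot_tokens, map_tokens) for one person-day."""
    screenshot = SCREENSHOT_TOKENS * ACTIONS_PER_DAY
    # 1,192 in the table equals one map render plus 99 further actions.
    mapped = INITIAL_MAP_TOKENS + PER_ACTION_TOKENS * (ACTIONS_PER_DAY - 1)
    return screenshot, mapped


def totals(days, people=1):
    """Scale the daily figures by number of days and team size."""
    screenshot, mapped = daily_tokens()
    return screenshot * days * people, mapped * days * people
```

`totals(365)` returns `(91250000, 435080)`, a ratio of about 210 to 1. The ratio is independent of team size because both columns scale linearly.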

Usage — HTTP API

Start the HTTP server from the Remote Control group box, then use curl or open http://localhost:8080/?apiKey=<key> in a browser to access the interactive test console.

Authentication reminder: every route except GET /health requires the API key. For curl, add -H "X-Api-Key: <key>". For browser URLs, append ?apiKey=<key>.

Interactive Test Console (GET /)

Opening the root URL in any browser launches a dark-themed console with:

  • Windows panel — live list of all open windows; click to select and auto-load its element tree
  • Elements panel — nested element tree flattened with indentation; onscreen-only toggle; ControlType filter; click any element to select it
  • Command builder — grouped action buttons covering every action: Click, Text, Keys, State, Scroll, Toggle, Select, Window, Range/Slider, Grid/Table, Transform, Wait, Capture, AI Vision; Value input (multiline, Ctrl+Enter to execute) with context-sensitive hints; ▶ Execute button
  • AI Vision buttons: status, describe, ask, file; requires model loaded on the Model tab
  • Format selector — dropdown in the header (JSON / HTML / Text / PDF); all requests use the selected format; format demo links (help, status, windows) open directly in a new tab in the chosen format
  • Scene Editor link — opens the browser-based canvas editor in a new tab
  • Response log — newest result at top; captures rendered as inline images (click to zoom); PDF responses shown as an "Open PDF" link (browser-native rendering)

Format negotiation

Every endpoint adapts its response to whatever format the caller can consume, selected by priority:

  1. URL file extension — append .json, .html, .txt, or .pdf to any path
  2. ?format= query parameter: html, text, json, or pdf
  3. Accept request header: text/html, text/plain, application/json, or application/pdf
  4. Default: html
# URL extension (highest priority — works even if the AI cannot set headers or query params)
curl http://localhost:8080/status.json
curl http://localhost:8080/help.txt
curl http://localhost:8080/windows.html
curl http://localhost:8080/status.pdf --output status.pdf

# ?format= query parameter
curl "http://localhost:8080/ping?format=text"
curl "http://localhost:8080/ping?format=json"

# Accept header
curl -H "Accept: application/json"  http://localhost:8080/ping
curl -H "Accept: application/pdf"   http://localhost:8080/help --output help.pdf

# HTML response (default — works in any browser or AI that can fetch a page)
curl http://localhost:8080/ping
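
The priority order above can be sketched as a small resolver. This is a client-side illustration of the documented rules, not the server's code; the function name and the two lookup tables are hypothetical, and Accept q-values are ignored for brevity.

```python
from urllib.parse import parse_qs, urlsplit

# Mappings taken from the list above.
EXT_TO_FORMAT = {"json": "json", "html": "html", "txt": "text", "pdf": "pdf"}
MIME_TO_FORMAT = {
    "text/html": "html",
    "text/plain": "text",
    "application/json": "json",
    "application/pdf": "pdf",
}


def resolve_format(url, accept=None):
    """Pick a response format: extension, then ?format=, then Accept, then html."""
    parts = urlsplit(url)

    # 1. URL file extension on the last path segment.
    last_segment = parts.path.rsplit("/", 1)[-1]
    if "." in last_segment:
        ext = last_segment.rsplit(".", 1)[-1].lower()
        if ext in EXT_TO_FORMAT:
            return EXT_TO_FORMAT[ext]

    # 2. ?format= query parameter.
    fmt = parse_qs(parts.query).get("format", [None])[0]
    if fmt in MIME_TO_FORMAT.values():
        return fmt

    # 3. Accept header: first recognised media type wins (q-values ignored).
    for item in (accept or "").split(","):
        mime = item.split(";", 1)[0].strip().lower()
        if mime in MIME_TO_FORMAT:
            return MIME_TO_FORMAT[mime]

    # 4. Default.
    return "html"
```

For example, `resolve_format("http://localhost:8080/status.json?format=text", "application/pdf")` yields `"json"` because the extension outranks the other two signals.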

HTML includes a <pre> block for human readability and an embedded <script type="application/json" id="apex-result"> block containing the full result as JSON — allowing any AI that can fetch a webpage to extract structured data without a vision model.
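
A client that only receives the HTML page can recover the structured result from the embedded script block. The sketch below uses only the Python standard library; the class and function names are hypothetical, and the sample page is a made-up stand-in for a real response.

```python
import json
from html.parser import HTMLParser


class ApexResultParser(HTMLParser):
    """Collects the text inside <script id="apex-result">."""

    def __init__(self):
        super().__init__()
        self._capturing = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "apex-result":
            self._capturing = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._capturing = False

    def handle_data(self, data):
        if self._capturing:
            self._chunks.append(data)

    def result(self):
        # Raises json.JSONDecodeError if the block was not found.
        return json.loads("".join(self._chunks))


def extract_apex_result(html_text):
    parser = ApexResultParser()
    parser.feed(html_text)
    parser.close()
    return parser.result()


# Hypothetical page body, shaped like the description above.
sample = """<html><body><pre>pong</pre>
<script type="application/json" id="apex-result">
{"success": true, "action": "ping", "data": {"message": "pong"}, "error": null}
</script></body></html>"""

print(extract_apex_result(sample)["action"])
```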

PDF is a valid A4 document using the built-in Courier font (no external dependencies). Useful for AI systems that can only accept PDF attachments.

GET access to command endpoints

All command endpoints accept both POST (JSON body) and GET (query string parameters), so any command can be expressed as a plain URL — no request body required:

# Find a window via GET
curl "http://localhost:8080/find?window=Notepad"

# Execute an action via GET
curl "http://localhost:8080/exec?action=gettext"

# Combine with URL extension for full URL-only access
curl "http://localhost:8080/find.json?window=Notepad&id=15"
curl "http://localhost:8080/exec.pdf?action=describe" --output result.pdf

GET parameter names match the JSON body field names: window, id / automationId, name / elementName, type / searchType, action, value, onscreen, depth, prompt, model, proj.
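
Since every command can be written as a URL, a client can build requests with nothing more than query-string encoding. A minimal sketch, assuming the default base URL; `command_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def command_url(route, params, fmt=None, base="http://localhost:8080"):
    """Build a GET URL for a command, optionally with a format extension."""
    path = route.strip("/")
    if fmt:
        path = f"{path}.{fmt}"  # e.g. find -> find.json
    query = urlencode(params)   # spaces become '+', as in match=add+to+cart
    return f"{base}/{path}" + (f"?{query}" if query else "")


print(command_url("find", {"window": "Notepad", "id": 15}, fmt="json"))
print(command_url("exec", {"action": "type", "value": "Hello World"}))
```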

/elements-specific: depth=N limits tree depth (truncated nodes show childCount); id=<numericId> expands from a previously-mapped element without clearing the rest of the map.

Response format

All endpoints return the same canonical structure:

{
  "success": true,
  "action": "ping",
  "data":   { "key": "value", ... },
  "error":  null,
  "error_data": null
}

HTTP status: 200 on success, 400 on error.

error_data is an additive object populated on failures (null when there is no error). Its shape is action-specific — for example, action-execution failures may carry failed_pattern, supported_patterns, element_state, and a remediation hint; waitfor timeouts carry timeout_ms, predicate, property, expected, and last_observed; wait-window timeouts carry last_observed_titles. Existing callers that only read success / data / error continue to work unchanged.
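
A client can wrap this envelope once and let failures surface as exceptions that carry error_data. The sketch below is one possible shape; `ApexError` and `unwrap` are hypothetical names, and the sample failure uses made-up values with the waitfor timeout fields listed above.

```python
class ApexError(Exception):
    """Raised when a response has success == false."""

    def __init__(self, action, message, error_data):
        super().__init__(f"{action}: {message}")
        self.action = action
        self.error_data = error_data or {}  # null becomes an empty dict


def unwrap(response):
    """Return response['data'] on success, raise ApexError otherwise."""
    if response.get("success"):
        return response.get("data")
    raise ApexError(
        response.get("action"),
        response.get("error"),
        response.get("error_data"),
    )


# Hypothetical timeout response for illustration only.
failed = {
    "success": False,
    "action": "waitfor",
    "data": None,
    "error": "timed out",
    "error_data": {
        "timeout_ms": 5000,
        "predicate": "contains",
        "property": "value",
        "expected": "OK",
        "last_observed": "Loading",
    },
}

try:
    unwrap(failed)
except ApexError as exc:
    print(exc, "| last observed:", exc.error_data.get("last_observed"))
```

Callers that only read success, data, and error can ignore the exception's error_data attribute entirely.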

gettext and getvalue responses include a source field inside data — one of TextPattern, ValuePattern, LegacyIAccessible, or Name — naming the UIA accessor that produced the text. Inside batch step results this appears as extras.source.

Element nodes returned by /elements and /find include className alongside id, controlType, name, automationId, frameworkId, isEnabled, isOffscreen, and boundingRectangle. match= searches className along with the other text fields.


System / utility routes

# Unauthenticated liveness probe — safe for external monitoring (the only route that doesn't require the API key)
curl http://localhost:8080/health

# Authenticated health check
curl -H "X-Api-Key: <key>" http://localhost:8080/ping

# Per-route request counters
curl -H "X-Api-Key: <key>" http://localhost:8080/metrics

# Recent WindowMonitor activity (window open/close/rename, optional element add/remove). Append .json for raw JSON.
curl -H "X-Api-Key: <key>" http://localhost:8080/winmon/log.json
# Drain the buffer
curl -H "X-Api-Key: <key>" -X POST http://localhost:8080/winmon/clear.json

# System information (OS, machine, user, CPU, CLR)
curl -H "X-Api-Key: <key>" http://localhost:8080/sysinfo

# All environment variables
curl -H "X-Api-Key: <key>" http://localhost:8080/env

# Directory listing (defaults to current working directory)
curl -H "X-Api-Key: <key>" http://localhost:8080/ls
curl -H "X-Api-Key: <key>" "http://localhost:8080/ls?path=C:\Users"

# Trigger the bundled integration test runner (TestApplications/TestRunner)
# Requires TestRunnerExePath (or APEX_TEST_RUNNER_EXE_PATH) to be configured.
curl -H "X-Api-Key: <key>" -X POST http://localhost:8080/run-tests

# Gracefully stop the HTTP server
curl -H "X-Api-Key: <key>" -X POST http://localhost:8080/shutdown

# Run a shell command (cmd.exe /c); 30-second timeout
# Requires EnableShellRun = true in appsettings.json or APEX_ENABLE_SHELL_RUN=true
curl -H "X-Api-Key: <key>" "http://localhost:8080/run?cmd=whoami"
curl -H "X-Api-Key: <key>" "http://localhost:8080/run?command=whoami"
curl -H "X-Api-Key: <key>" -X POST http://localhost:8080/run \
     -H "Content-Type: application/json" \
     -d '{"command":"dir C:\\"}'

/run response data fields: cmd, stdout, stderr, exit_code.

Security note: /run executes arbitrary commands as the process user. It is disabled by default and should only be enabled in trusted, authenticated environments.


UI automation routes

# List all open windows (with stable IDs)
curl http://localhost:8080/windows

# Get current state
curl http://localhost:8080/status

# List all elements in the current window (nested JSON with IDs and bounding rectangles)
curl http://localhost:8080/elements

# Onscreen elements only — prunes offscreen subtrees for maximum token efficiency
curl "http://localhost:8080/elements?onscreen=true"

# Limit depth — truncated nodes show "childCount" so you know where to drill in
curl "http://localhost:8080/elements?depth=2&onscreen=true"

# Expand a specific node by numeric ID (preserves the rest of the map — IDs stay stable)
curl "http://localhost:8080/elements?id=<elementId>&depth=2&onscreen=true"

# Filter by ControlType
curl "http://localhost:8080/elements?type=Button"

# Text search across Name, AutomationId, Value, and ClassName — returns only
# matching branches, each wrapped in its ancestor path, with `depth` levels below.
curl "http://localhost:8080/elements?match=add+to+cart&onscreen=true&depth=1"

# Collapse identity-less single-child Pane/Group/Custom wrapper chains
# (named containers and anything with an AutomationId are preserved).
curl "http://localhost:8080/elements?onscreen=true&collapseChains=true"

# Add an ancestor breadcrumb ("path") to every emitted node.
curl "http://localhost:8080/elements?onscreen=true&includePath=true"

# Opt into Value pattern + HelpText (omitted by default to keep payloads small).
curl "http://localhost:8080/elements?onscreen=true&properties=extra"

# All filters combined
curl "http://localhost:8080/elements?depth=3&onscreen=true&type=Button&collapseChains=true&match=submit&properties=extra"

# Render the current window's UI element tree as a colour-coded PNG (returns base64)
curl http://localhost:8080/uimap

# Help
curl http://localhost:8080/help

# Find a window and element by title/name
curl -X POST http://localhost:8080/find \
     -H "Content-Type: application/json" \
     -d '{"window":"Notepad","id":"15"}'

# Find by element name with ControlType filter
curl -X POST http://localhost:8080/find \
     -H "Content-Type: application/json" \
     -d '{"window":"Notepad","name":"Text Editor","type":"Edit"}'

# Find by numeric window/element IDs (fast, no fuzzy search)
curl -X POST http://localhost:8080/find \
     -H "Content-Type: application/json" \
     -d '{"window":42,"id":105}'

# Visual Studio handoff targets:
# F5/debug: find name="Debug Target" type="SplitButton", then exec keys {F5}
# Ctrl+F5/no-debug: find name="Start Without Debugging" type="Button", then exec keys Ctrl+{F5}

# Type text into the found element
curl -X POST http://localhost:8080/execute \
     -H "Content-Type: application/json" \
     -d '{"action":"type","value":"Hello World"}'

# Click a button
curl -X POST http://localhost:8080/execute \
     -H "Content-Type: application/json" \
     -d '{"action":"click"}'

# Read text from element
curl -X POST http://localhost:8080/execute \
     -H "Content-Type: application/json" \
     -d '{"action":"gettext"}'

# Capture current element (returns base64 PNG in data field)
curl -X POST http://localhost:8080/capture

# Capture full screen
curl -X POST http://localhost:8080/capture \
     -H "Content-Type: application/json" \
     -d '{"action":"screen"}'

# Capture multiple elements stitched into one image
curl -X POST http://localhost:8080/capture \
     -H "Content-Type: application/json" \
     -d '{"action":"elements","value":"42,105,106"}'

# OCR the found element
curl -X POST http://localhost:8080/ocr

# OCR a region (x,y,width,height) within the element
curl -X POST http://localhost:8080/ocr \
     -H "Content-Type: application/json" \
     -d '{"value":"0,0,300,50"}'

# Check AI model status
curl http://localhost:8080/ai/status

# Load a vision/audio LLM (run once; model stays loaded until the server restarts)
curl -X POST http://localhost:8080/ai/init \
     -H "Content-Type: application/json" \
     -d '{"model":"C:\\models\\vision.gguf","proj":"C:\\models\\mmproj.gguf"}'

# Describe the currently selected UI element using the vision model
# Captures the element as an image and sends it to the LLM
curl -X POST http://localhost:8080/ai/describe

# Describe with a custom prompt
curl -X POST http://localhost:8080/ai/describe \
     -H "Content-Type: application/json" \
     -d '{"prompt":"List every button you can see."}'

# Ask a specific question about the current element
curl -X POST http://localhost:8080/ai/ask \
     -H "Content-Type: application/json" \
     -d '{"prompt":"Is there an error message visible?"}'

# Describe an image file on disk
curl -X POST http://localhost:8080/ai/file \
     -H "Content-Type: application/json" \
     -d '{"value":"C:\\screenshots\\app.png","prompt":"What dialog is shown?"}'

Request body fields

| Field | Aliases | Description |
|---|---|---|
| window | | Window title (partial match) or numeric ID from /windows |
| automationId | id | Element AutomationId string or numeric ID from /elements |
| elementName | name | Element Name property (fallback if id not given) |
| searchType | type | ControlType filter (All or e.g. Button) |
| action | | Action name (see list below) |
| value | | Value/input for the action |
| model | modelPath | AI: path to LLM .gguf file |
| proj | mmProjPath | AI: path to multimodal projector .gguf file |
| prompt | | AI: question or instruction text |
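
The alias column can be read as a simple lookup. The sketch below normalises a request body to the canonical field names on the client side; it only illustrates the mapping, the server accepts either spelling, and how the server resolves a field given under both names is not documented here. `canonical_body` is a hypothetical helper.

```python
# Alias -> canonical field name, as listed in the request body fields.
ALIASES = {
    "id": "automationId",
    "name": "elementName",
    "type": "searchType",
    "modelPath": "model",
    "mmProjPath": "proj",
}


def canonical_body(body):
    """Return a copy of body with aliases replaced by canonical names."""
    result = {}
    for key, value in body.items():
        field = ALIASES.get(key, key)
        if field in result:
            # Assumption: treat a field supplied twice as a caller mistake.
            raise ValueError(f"{field!r} given more than once")
        result[field] = value
    return result


print(canonical_body({"window": "Notepad", "name": "Text Editor", "type": "Edit"}))
```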

Usage — AI Drawing

The drawing engine renders GDI+ shapes to a base64 PNG on demand. Every shape type supports colour, opacity, fill/stroke, and dashed lines.

Quick draw

# Draw a filled blue circle with white text
curl -X POST http://localhost:8080/draw \
     -H "Content-Type: application/json" \
     -d '{
       "value": "{\"canvas\":\"blank\",\"width\":400,\"height\":300,\"shapes\":[
         {\"type\":\"circle\",\"x\":200,\"y\":150,\"r\":80,\"color\":\"royalblue\",\"fill\":true},
         {\"type\":\"text\",\"x\":200,\"y\":140,\"text\":\"Hello!\",\"color\":\"white\",\"font_size\":20,\"font_bold\":true,\"align\":\"center\"}
       ]}"
     }'

# Render the built-in space scene
curl http://localhost:8080/draw/demo

# Show it as a full-screen overlay for 6 seconds
curl "http://localhost:8080/draw/demo?overlay=true&ms=6000"

The data.result field contains the base64 PNG. The web console renders it inline.
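
In the quick-draw request, value is itself a JSON document carried as a string, which is why the curl example escapes every inner quote. Building the body programmatically avoids the manual escaping. A minimal sketch; `draw_request_body` is a hypothetical helper and the defaults mirror the example above.

```python
import json


def draw_request_body(shapes, canvas="blank", width=400, height=300):
    """Serialise a drawing into the {"value": "<json string>"} body."""
    drawing = {"canvas": canvas, "width": width, "height": height, "shapes": shapes}
    # Inner dumps produces the string; outer dumps escapes it for transport.
    return json.dumps({"value": json.dumps(drawing)})


body = draw_request_body([
    {"type": "circle", "x": 200, "y": 150, "r": 80, "color": "royalblue", "fill": True},
    {"type": "text", "x": 200, "y": 140, "text": "Hello!", "color": "white",
     "font_size": 20, "font_bold": True, "align": "center"},
])
print(body)
```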

Shape types

| Type | Key fields | Description |
|---|---|---|
| rect | x y w h corner_radius | Rectangle (rounded if corner_radius > 0) |
| ellipse | x y w h | Ellipse inside bounding box |
| circle | x y r | Circle (x,y is the centre) |
| line | x y x2 y2 | Straight line |
| arrow | x y x2 y2 | Line with arrowhead at (x2,y2) |
| polygon | points[] | Closed polygon (flat array of x,y pairs) |
| triangle | x y w h | Triangle (bounding-box anchored, top-centre apex) |
| arc | x y w h start_angle sweep_angle | Open arc (angles in degrees, clockwise from 3 o'clock) |
| text | x y text font_size font_bold align background | Rendered text |

Common fields on all shapes: color, fill (bool), stroke_width, opacity (0–1), dashed (bool), rotation (degrees, centre-origin).

Canvas values: blank (transparent), white, black, screen (live screenshot), window (current window), element (current element).


Usage — Layered Scene Editor

The scene system lets AI agents and users collaborate on persistent, structured drawings. Every shape has a stable ID; coordinates are always accurate; the AI can read them back and refine the composition at any time.

REST API (/scenes/*)

# Create a scene
curl -X POST http://localhost:8080/scenes \
     -H "Content-Type: application/json" \
     -d '{"name":"My Scene","width":800,"height":600,"background":"#1a1a2e"}'
# → data.scene contains the full scene with id

# List scenes
curl http://localhost:8080/scenes

# Get a scene
curl http://localhost:8080/scenes/{id}

# Add a layer
curl -X POST http://localhost:8080/scenes/{id}/layers \
     -H "Content-Type: application/json" \
     -d '{"name":"Background"}'

# Add a shape to a layer
curl -X POST http://localhost:8080/scenes/{id}/layers/{lid}/shapes \
     -H "Content-Type: application/json" \
     -d '{"shape":{"type":"circle","x":400,"y":300,"r":80,"color":"royalblue","fill":true},"name":"Planet"}'

# Render the scene to a PNG
curl http://localhost:8080/scenes/{id}/render
# → data.result is base64 PNG

# Patch shape geometry (after user drags it — never clobbers color/style)
curl -X PATCH http://localhost:8080/scenes/{id}/layers/{lid}/shapes/{sid} \
     -H "Content-Type: application/json" \
     -d '{"x":420,"y":310}'

# Move a shape to a different layer
curl -X POST http://localhost:8080/scenes/{id}/shapes/{sid}/move \
     -H "Content-Type: application/json" \
     -d '{"target_layer_id":"{newLayerId}"}'

# Delete a shape / layer / scene
curl -X DELETE http://localhost:8080/scenes/{id}/layers/{lid}/shapes/{sid}
curl -X DELETE http://localhost:8080/scenes/{id}/layers/{lid}
curl -X DELETE http://localhost:8080/scenes/{id}

Full route reference

| Method | Route | Description |
|---|---|---|
| GET / POST | /scenes | List all scenes / create scene |
| GET / PUT / PATCH / DELETE | /scenes/{id} | Get / update meta / delete scene |
| GET | /scenes/{id}/render | Render scene → base64 PNG |
| GET / POST | /scenes/{id}/layers | List layers / add layer |
| GET / PUT / PATCH / DELETE | /scenes/{id}/layers/{lid} | Get / update / delete layer |
| GET / POST | /scenes/{id}/layers/{lid}/shapes | List shapes / add shape |
| GET / PUT / PATCH / DELETE | /scenes/{id}/layers/{lid}/shapes/{sid} | Get / replace / patch geometry / delete shape |
| POST | /scenes/{id}/shapes/{sid}/move | Move shape to a different layer |
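
The difference between replacing a shape (PUT) and patching its geometry (PATCH) can be pictured as follows. This is only an illustration of the documented behaviour, not the server's implementation; which keys count as geometry is an assumption based on the shape-type fields, and both function names are hypothetical.

```python
# Assumption: geometry means the positional fields from the shape types.
GEOMETRY_KEYS = {"x", "y", "w", "h", "r", "x2", "y2", "points"}


def put_shape(existing, body):
    """PUT: the stored shape is replaced by the request body."""
    return dict(body)


def patch_shape(existing, body):
    """PATCH: only geometry keys are merged; colour and style are kept."""
    updated = dict(existing)
    updated.update({k: v for k, v in body.items() if k in GEOMETRY_KEYS})
    return updated


planet = {"type": "circle", "x": 400, "y": 300, "r": 80,
          "color": "royalblue", "fill": True}
print(patch_shape(planet, {"x": 420, "y": 310}))
```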

Scene Editor — WinForms (Tools → Scene Editor)

The desktop editor opens a standalone window with:

  • Scene list — create, select, or delete scenes
  • Toolbar — arrow (select/move), rect, ellipse, circle, line, text, delete
  • Canvas — double-buffered; drag shapes to reposition; draw new shapes by clicking and dragging; mouse wheel to zoom
  • Layers panel — add/delete layers; click to select the active layer; eye icon to toggle visibility
  • Properties panel — x, y, w, h, r fields for the selected shape; edits commit to the store immediately
  • Keyboard shortcuts — V/R/E/C/L/T for tools, Delete to remove selected shape, Escape to deselect

All changes are persisted to disk (%LOCALAPPDATA%\ApexComputerUse\scenes\{id}.json) and immediately available via the REST API.

Scene Editor — Browser (GET /editor)

Open http://localhost:8080/editor?apiKey=<key> for the same editing experience in a browser:

  • HTML5 Canvas renderer for all 7 shape types
  • Click-and-drag to place shapes; click to select and drag to move
  • Layer panel with add/delete/visibility toggle
  • Properties panel showing live coordinates
  • Keyboard shortcuts (V/R/E/C/L/T, Delete, Escape)
  • All changes sync to the same /scenes/* REST API

Usage — Telegram Bot

After starting the bot, send commands to it in any Telegram chat:

/find window=Notepad id=15
/find window=Calculator name=Equals type=Button
/exec action=type value="Hello from Telegram"
/exec action=click
/exec action=gettext
/ocr
/ocr value=0,0,300,50
/status
/windows
/elements
/elements type=Button
/help

Key=value pairs support quoted values for multi-word strings:

/find window="My Application" name="Save Button"
/exec action=type value="some text with spaces"
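
The key=value syntax with quoted values resembles shell-style tokenising. The sketch below shows one way such a line could be split in Python; it is not the bot's actual parser, and `parse_command` is a hypothetical name. Backslash escaping is turned off so that Windows paths survive intact.

```python
import shlex


def parse_command(text):
    """Split '/cmd key=value key="two words"' into (cmd, {key: value})."""
    lexer = shlex.shlex(text, posix=True)
    lexer.whitespace_split = True  # split on whitespace only
    lexer.escape = ""              # keep backslashes in C:\paths literal
    lexer.commenters = ""          # do not treat '#' as a comment marker
    tokens = list(lexer)

    command = tokens[0].lstrip("/")
    fields = {}
    for token in tokens[1:]:
        key, _, value = token.partition("=")
        fields[key] = value
    return command, fields


print(parse_command('/find window="My Application" name="Save Button"'))
```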

AI commands work the same way:

/ai action=status
/ai action=init model=C:\models\vision.gguf proj=C:\models\mmproj.gguf
/ai action=describe
/ai action=describe prompt="List every button you can see."
/ai action=ask prompt="Is there an error message visible?"
/ai action=file value=C:\screenshots\app.png prompt="What dialog is shown?"

Usage — PowerShell

The app exposes a named pipe server (default name ApexComputerUse). Start it from the Remote Control group box, then use the bundled ApexComputerUse.psm1 module:

# Import the module
Import-Module .\Scripts\ApexComputerUse.psm1

# Connect to the pipe (must be started in the app first)
Connect-FlaUI                        # default pipe name: ApexComputerUse
Connect-FlaUI -PipeName MyPipe -TimeoutMs 10000

# Discovery
Get-FlaUIWindows                     # list all open window titles
Get-FlaUIStatus                      # current window/element state
Get-FlaUIHelp                        # command reference
Get-FlaUIElements                    # list all elements in current window
Get-FlaUIElements -Type Button       # filter by ControlType

# Find
Find-FlaUIElement -Window 'Notepad'
Find-FlaUIElement -Window 'Notepad' -Name 'Text Editor' -Type Edit
Find-FlaUIElement -Window 'Calculator' -Id 'num5Button'

# Execute actions
Invoke-FlaUIAction -Action click
Invoke-FlaUIAction -Action type  -Value 'Hello from PowerShell'
Invoke-FlaUIAction -Action gettext
Invoke-FlaUIAction -Action screenshot

# OCR
Invoke-FlaUIOcr
Invoke-FlaUIOcr -Region '0,0,300,50'

# AI
Invoke-FlaUIAi -SubCommand init     -Model 'C:\models\v.gguf' -Proj 'C:\models\p.gguf'
Invoke-FlaUIAi -SubCommand status
Invoke-FlaUIAi -SubCommand describe -Prompt 'What buttons are visible?'
Invoke-FlaUIAi -SubCommand ask      -Prompt 'Is there an error message?'
Invoke-FlaUIAi -SubCommand file     -Value 'C:\screen.png' -Prompt 'Describe this.'

# Send raw JSON (advanced)
Send-FlaUICommand @{ command='find'; window='Notepad'; elementName='Text Editor' }

# Disconnect
Disconnect-FlaUI

PowerShell cmdlet reference

| Cmdlet | Key Parameters | Description |
|---|---|---|
| Connect-FlaUI | PipeName, TimeoutMs | Connect to the pipe server |
| Disconnect-FlaUI | | Close the connection |
| Send-FlaUICommand | Request (hashtable) | Send a raw JSON command |
| Get-FlaUIWindows | | List open window titles |
| Get-FlaUIStatus | | Show current window/element |
| Get-FlaUIHelp | | Server command reference |
| Get-FlaUIElements | Type | List elements in current window |
| Find-FlaUIElement | Window, Id, Name, Type | Find a window and element |
| Invoke-FlaUIAction | Action, Value | Execute action on current element |
| Invoke-FlaUIOcr | Region | OCR current element or region |
| Invoke-FlaUICapture | Target, Value | Capture screen/window/element(s); returns base64 PNG in data |
| Invoke-FlaUIAi | SubCommand, Model, Proj, Prompt, Value | Multimodal AI sub-commands |

The pipe connection is session-based: window and element state are preserved across calls within a single Connect-FlaUI / Disconnect-FlaUI session. Use Find-FlaUIElement to select a target, then call Invoke-FlaUIAction as many times as needed without re-finding.


Usage — cmd.exe

Use Scripts\apex.cmd, a batch helper that wraps the HTTP API with simpler positional syntax. It requires the HTTP server to be running and curl to be available (curl ships with Windows 10 and later).

:: Optional: override port (default is 8080)
set APEX_HTTP_PORT=8080

:: Discovery
apex windows
apex status
apex elements
apex elements Button
apex help

:: Find a window and element
apex find Notepad
apex find "My App" id=btnOK
apex find Notepad name="Text Editor" type=Edit

:: Execute actions
apex exec click
apex exec type value=Hello
apex exec gettext
apex exec screenshot

:: Capture
apex capture
apex capture action=screen
apex capture action=window
apex capture action=elements value=42,105,106

:: OCR
apex ocr
apex ocr 0,0,300,50

:: AI
apex ai status
apex ai init model=C:\models\v.gguf proj=C:\models\p.gguf
apex ai describe
apex ai describe prompt="What do you see?"
apex ai ask prompt="Is there an error message?"
apex ai file value=C:\screen.png prompt="Describe this."

Add Scripts\ to your PATH (or copy apex.cmd next to your scripts) to use it from any directory.


Usage — AI (Multimodal)

The AI command set is backed by MtmdHelper, which uses LLamaSharp to run a local multimodal (vision + audio) LLM. No cloud API is required.

Setup

Download a vision-capable GGUF model and its multimodal projector (e.g. LFM2.5-VL from LM Studio) and note the paths to both .gguf files, or use Download All on the Model tab. Then call ai init before any inference commands.

AI sub-commands

| Sub-action | Required params | Optional params | Description |
|---|---|---|---|
| init | model=<path> proj=<path> | | Load the LLM and projector into memory |
| status | | | Report whether the model is loaded and which modalities it supports |
| describe | none (uses current element) | prompt=<text> | Capture the current UI element as an image and ask the vision model to describe it |
| ask | prompt=<text> | | Ask a specific question about the current UI element (captures element image) |
| file | value=<file path> | prompt=<text> | Send an image or audio file from disk to the model |

Note: describe, ask, and file require a prior find command to select a window/element. The model must be initialized with init before any inference call. Each inference call starts completely fresh — no chat history is retained between calls.

AI Vision in the test console

The HTTP test console (GET /) has a dedicated AI Vision button group (purple-tinted):

| Button | Endpoint | Value field |
|---|---|---|
| status | GET /ai/status | |
| describe | POST /ai/describe | Optional prompt (e.g. list all buttons) |
| ask | POST /ai/ask | Required question (e.g. what number is shown?) |

Select an element in the Elements panel first, then click describe or ask. The console shows a "Running vision model…" notice immediately and updates with the result when inference completes.


UI Map Renderer

The UI Map Renderer scans the current window's accessibility tree and renders every element's bounding rectangle as a colour-coded overlay. Each control type gets a deterministic, visually distinct colour. Element names are drawn inside the bounding box.

Via HTTP API

# Returns base64-encoded PNG of the current window's element tree
curl http://localhost:8080/uimap

Requires a prior find call to select a window. The response data.result field contains the base64 PNG — identical format to the /capture endpoints. In the interactive test console, the UI map button (in the Capture group) renders the result inline in the response log.

Via the desktop UI

Tools → Render UI Map draws the overlay directly on screen for 5 seconds (press Escape to dismiss early) and offers to save it as a PNG file. The live on-screen overlay is only available from the desktop UI; the HTTP API returns the rendered PNG instead.

Tools → Output UI Map logs the raw nested JSON element tree to the console tab — useful for inspecting the tree structure or copying it for use with an AI agent.

Element JSON includes bounding rectangles:

{
  "id": 105,
  "controlType": "Button",
  "name": "OK",
  "automationId": "btn_ok",
  "boundingRectangle": { "x": 120, "y": 340, "width": 80, "height": 30 },
  "children": []
}
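
Because every node carries its bounding rectangle and children, an agent can walk the tree and derive click targets without a screenshot. A minimal sketch over a hand-written tree; the helper names and the sample window are hypothetical, and only fields shown in the element JSON above are used.

```python
def iter_elements(node):
    """Yield a node and all of its descendants, depth-first."""
    yield node
    for child in node.get("children", []):
        yield from iter_elements(child)


def centre(rect):
    """Centre point of a boundingRectangle, in the same pixel units."""
    return (rect["x"] + rect["width"] // 2, rect["y"] + rect["height"] // 2)


def find_by_type(root, control_type):
    return [n for n in iter_elements(root) if n.get("controlType") == control_type]


# Hypothetical tree for illustration.
window = {
    "id": 42, "controlType": "Window", "name": "Confirm",
    "boundingRectangle": {"x": 100, "y": 300, "width": 300, "height": 100},
    "children": [
        {"id": 105, "controlType": "Button", "name": "OK", "automationId": "btn_ok",
         "boundingRectangle": {"x": 120, "y": 340, "width": 80, "height": 30},
         "children": []},
        {"id": 106, "controlType": "Button", "name": "Cancel", "automationId": "btn_cancel",
         "boundingRectangle": {"x": 220, "y": 340, "width": 80, "height": 30},
         "children": []},
    ],
}

for button in find_by_type(window, "Button"):
    print(button["id"], button["name"], centre(button["boundingRectangle"]))
```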

Available Actions (exec/execute)

General

| Action | Aliases | Value | Description |
|---|---|---|---|
| click | | | Smart click: Invoke → Toggle → SelectionItem → mouse fallback |
| mouse-click | mouseclick | | Force mouse left-click (bypasses smart chain) |
| middle-click | middleclick | | Middle-mouse-button click |
| invoke | | | Invoke pattern directly |
| right-click | rightclick | | Right-click |
| double-click | doubleclick | | Double-click |
| click-at | clickat | x,y | Click at pixel offset from element top-left |
| drag | | x,y | Drag element to screen coordinates |
| hover | | | Move mouse over element |
| highlight | | | Draw orange highlight around element for 1 second |
| focus | | | Set keyboard focus |
| keys | | text | Send keystrokes; supports {CTRL}, {ALT}, {SHIFT}, {F5}, Ctrl+A, Alt+F4, etc. |
| screenshot | capture | | Save element image to Desktop\Apex_Captures |
| describe | | | Return full element property description (UIA properties, not AI vision) |
| patterns | | | List automation patterns supported by the element |
| bounds | | | Return bounding rectangle |
| isenabled | | | Returns True or False |
| isvisible | | | Returns True or False |
| wait | | automationId | Wait for element with given AutomationId to appear |
| wait-page-load | waitpageload | seconds (default 10) | Poll window title until browser page finishes loading; returns page title on success |

Visual Studio run buttons: for a test handoff, target name="Debug Target" with type="SplitButton" for the F5/debug path, and name="Start Without Debugging" with type="Button" for the Ctrl+F5/no-debug path. Prefer numeric element IDs after an /elements scan to avoid fuzzy matching entirely.

Wait

| Action | Aliases | Value | Description |
|---|---|---|---|
| waitfor | | see below | Poll the current element until predicate satisfied or timeout |
| wait-window | | see below | Poll the desktop window list until a window title satisfies predicate |

waitfor parameters:

  • predicate=<equals|contains|not-empty|visible|gone>
  • property=<value|text|name|isvisible|isenabled> (optional)
  • expected=<text> (optional)
  • timeout=<ms> (optional, default 10000)
  • interval=<ms> (optional, default 200, min 50)

visible and gone are element-level: they ignore property and expected. The success response includes elapsed_ms, property, and predicate inside data. On timeout, error_data.last_observed carries the value at the last poll ("offscreen"/"visible" for visible, "present" for gone-while-still-present, otherwise the property string).

wait-window parameters:

  • predicate=<equals|contains|not-empty|gone>
  • expected=<title-substring> (required for all but not-empty)
  • timeout=<ms> (optional, default 10000)
  • interval=<ms> (optional, default 250)

On match, the new window is registered in the window map and set as the current window, so the next /find or /elements call resolves it without needing a window= field. On timeout, error_data.last_observed_titles is the array of titles seen at the last poll, which is useful for debugging.

# Wait for a debug console window to appear after launching an app
curl -X POST http://localhost:8080/exec -H "X-Api-Key: <key>" \
  -H "Content-Type: application/json" \
  -d '{"action":"wait-window","predicate":"contains","expected":"Debug Console","timeout":15000}'

# Wait for the current text element to contain a specific value
curl -X POST http://localhost:8080/exec -H "X-Api-Key: <key>" \
  -H "Content-Type: application/json" \
  -d '{"action":"waitfor","predicate":"contains","property":"value","expected":"OK","timeout":5000}'

# Wait for the current element to become visible
curl -X POST http://localhost:8080/exec -H "X-Api-Key: <key>" \
  -H "Content-Type: application/json" \
  -d '{"action":"waitfor","predicate":"visible","timeout":3000}'
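
The polling behaviour described for waitfor can be pictured with a small loop. This is a client-side illustration of the documented semantics for the equals, contains, and not-empty predicates, not the server's implementation; `wait_for` and its `read_value`, `clock`, and `sleep` parameters are hypothetical, and the element-level visible and gone predicates are not modelled.

```python
import time


def wait_for(read_value, predicate, expected=None, timeout_ms=10_000,
             interval_ms=200, clock=time.monotonic, sleep=time.sleep):
    """Poll read_value() until the predicate holds or the timeout expires."""
    interval_ms = max(interval_ms, 50)  # documented minimum interval
    checks = {
        "equals": lambda v: v == expected,
        "contains": lambda v: expected in v,
        "not-empty": lambda v: bool(v),
    }
    check = checks[predicate]

    start = clock()
    while True:
        last = read_value()
        elapsed_ms = round((clock() - start) * 1000)
        if check(last):
            return {"success": True, "elapsed_ms": elapsed_ms}
        if elapsed_ms >= timeout_ms:
            return {
                "success": False,
                "error_data": {
                    "timeout_ms": timeout_ms,
                    "predicate": predicate,
                    "expected": expected,
                    "last_observed": last,
                },
            }
        sleep(interval_ms / 1000)
```

Passing `clock` and `sleep` as parameters keeps the loop testable with a fake clock, so no real waiting is needed to exercise the timeout path.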

Batch (multiple actions in one /exec call)

Send actions=[...] to /exec to run several commands sequentially in one round trip. Each entry is a full sub-request — cmd defaults to "execute", so simple action lists need only action and (where relevant) value. The optional stop_on_error field defaults to true: the first failing step ends the batch and remaining steps are skipped.

curl -X POST http://localhost:8080/exec -H "X-Api-Key: <key>" \
  -H "Content-Type: application/json" \
  -d '{"actions":[
        {"action":"clear"},
        {"action":"type","value":"hello"},
        {"action":"keys","value":"{CTRL}s"}
      ]}'

The response's data.result contains stop_on_error, total_steps, executed, succeeded, and a results array. Each entry has step, cmd, action, success, data, extras (e.g. source for gettext/getvalue steps), and message.
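
On the client side, a batch is just a list of sub-requests, and the result can be scanned for the first failing step. A minimal sketch, assuming stop_on_error sits at the top level next to actions; the helper names and the sample result values are hypothetical, while the field names come from the description above.

```python
import json


def batch_body(actions, stop_on_error=True):
    """Serialise a batch request for POST /exec."""
    return json.dumps({"actions": actions, "stop_on_error": stop_on_error})


def first_failure(batch_result):
    """Return the first results entry with success == false, or None."""
    for entry in batch_result.get("results", []):
        if not entry.get("success"):
            return entry
    return None


body = batch_body([
    {"action": "clear"},
    {"action": "type", "value": "hello"},
    {"action": "keys", "value": "{CTRL}s"},
])

# Hypothetical data.result where the second step failed and the third was skipped.
sample_result = {
    "stop_on_error": True,
    "total_steps": 3,
    "executed": 2,
    "succeeded": 1,
    "results": [
        {"step": 1, "cmd": "execute", "action": "clear", "success": True},
        {"step": 2, "cmd": "execute", "action": "type", "success": False,
         "message": "element is read-only"},
    ],
}

failed = first_failure(sample_result)
print(failed["step"], failed["action"], failed["message"])
```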

Text / Value

| Action | Aliases | Value | Description |
|---|---|---|---|
| type | enter | text | Enter text (smart: Value pattern → keyboard) |
| insert | | text | Type at current caret position |
| gettext | text | | Smart read: Text pattern → Value → LegacyIAccessible → Name |
| getvalue | value | | Smart read: Value → Text → LegacyIAccessible → Name |
| setvalue | | text | Smart set: Value pattern (if writable) → RangeValue (if numeric) → keyboard |
| clearvalue | | | Set value to empty string via Value pattern |
| appendvalue | | text | Append text to current value |
| getselectedtext | | | Get selected text via Text pattern |
| selectall | | | Ctrl+A |
| copy | | | Ctrl+C |
| cut | | | Ctrl+X |
| paste | | | Ctrl+V |
| undo | | | Ctrl+Z |
| clear | | | Select all and delete |

Range / Slider

| Action | Aliases | Value | Description |
|---|---|---|---|
| setrange | | number | Set RangeValue pattern |
| getrange | | | Read current RangeValue |
| rangeinfo | | | Min / max / smallChange / largeChange |

Toggle / CheckBox

| Action | Aliases | Value | Description |
|---|---|---|---|
| toggle | | | Toggle CheckBox (cycles state) |
| toggle-on | toggleon | | Set toggle to On |
| toggle-off | toggleoff | | Set toggle to Off |
| gettoggle | | | Read current toggle state (On / Off / Indeterminate) |

Expand / Collapse

| Action | Aliases | Value | Description |
|---|---|---|---|
| expand | | | Expand via ExpandCollapse pattern |
| collapse | | | Collapse via ExpandCollapse pattern |
| expandstate | | | Read current ExpandCollapse state |

Selection (SelectionItem / Selection)

| Action | Aliases | Value | Description |
|---|---|---|---|
| select | | item text | Select ComboBox/ListBox item by text |
| select-item | selectitem | | Select current element via SelectionItem pattern |
| addselect | | | Add element to multi-selection |
| removeselect | | | Remove element from selection |
| isselected | | | Returns True or False |
| getselection | | | Get selected items from a Selection container |
| select-index | selectindex | n | Select ComboBox/ListBox item by zero-based index |
| getitems | | | List all items in a ComboBox or ListBox (newline-separated) |
| getselecteditem | | | Get currently selected item text |

Window State

| Action | Aliases | Value | Description |
|---|---|---|---|
| minimize | | | Minimize window |
| maximize | | | Maximize window |
| restore | | | Restore window to normal state |
| windowstate | | | Read current window visual state (Normal / Maximized / Minimized) |

Transform (Move / Resize)

| Action | Aliases | Value | Description |
|---|---|---|---|
| move | | x,y | Move element via Transform pattern |
| resize | | w,h | Resize element via Transform pattern |

Scroll

Mouse scroll actions move the cursor to the element centre before firing the scroll event, so scrolling reliably lands in the browser content area rather than wherever the cursor happens to be.

| Action | Aliases | Value | Description |
|---|---|---|---|
| scroll-up | scrollup | n (optional) | Move cursor to element centre, scroll up n clicks (default 3) |
| scroll-down | scrolldown | n (optional) | Move cursor to element centre, scroll down n clicks (default 3) |
| scroll-left | scrollleft | n (optional) | Move cursor to element centre, horizontal scroll left n clicks (default 3) |
| scroll-right | scrollright | n (optional) | Move cursor to element centre, horizontal scroll right n clicks (default 3) |
| scrollinto | scrollintoview | | Scroll element into view |
| scrollpercent | | h,v | Scroll to h%/v% position via Scroll pattern (0–100) |
| getscrollinfo | | | Scroll position and scrollable flags |

Grid / Table

| Action | Aliases | Value | Description |
|---|---|---|---|
| griditem | | row,col | Get element description at grid cell |
| gridinfo | | | Row and column counts |
| griditeminfo | | | Row / column / span for a GridItem element |

Capture

Returns a screen capture inline as a base64-encoded PNG in the data field. Supports four targets.

| Target | Description |
|---|---|
| element (default) | Current element (requires a prior find) |
| window | Current window (requires a prior find) |
| screen | Full display |
| elements | Multiple elements by ID, stitched vertically into one image |

For elements, provide comma-separated numeric IDs from a prior elements scan in the value field.

# Current element
curl -X POST http://localhost:8080/capture

# Full screen
curl -X POST http://localhost:8080/capture \
     -H "Content-Type: application/json" \
     -d '{"action":"screen"}'

# Current window
curl -X POST http://localhost:8080/capture \
     -H "Content-Type: application/json" \
     -d '{"action":"window"}'

# Multiple elements stitched into one image
curl -X POST http://localhost:8080/capture \
     -H "Content-Type: application/json" \
     -d '{"action":"elements","value":"42,105,106"}'

Response data field contains the base64 PNG. Decode it to get the image:

curl -s -X POST http://localhost:8080/capture -d '{"action":"screen"}' \
  | python -c "import sys,json,base64; d=json.load(sys.stdin)['data']; open('screen.png','wb').write(base64.b64decode(d))"

Telegram: /capture sends the image as a photo message (not text).

/capture
/capture action=screen
/capture action=window
/capture action=elements value=42,105,106

PowerShell:

$r = Send-FlaUICommand @{ command='capture'; action='screen' }
[IO.File]::WriteAllBytes('screen.png', [Convert]::FromBase64String($r.data))

Note: This is distinct from the screenshot exec action, which saves to Desktop\Apex_Captures and returns only the file path.
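
For agents that prefer a script over the one-liner, the decode step can be sketched in Python as below. It relies only on what this section states: the JSON response carries the base64 PNG in its data field. The names decode_capture and save_capture are illustrative, and the PNG signature check is an added sanity check rather than documented API behaviour.

```python
import base64
import json

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def decode_capture(response_text: str) -> bytes:
    """Extract and decode the base64 PNG from a /capture JSON response."""
    payload = json.loads(response_text)
    data = payload.get("data")
    if not data:
        raise ValueError("response has no 'data' field")
    png = base64.b64decode(data)
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG")
    return png


def save_capture(response_text: str, path: str) -> int:
    """Write the decoded image to disk and return the number of bytes written."""
    png = decode_capture(response_text)
    with open(path, "wb") as fh:
        return fh.write(png)
```

Pass the raw response body from POST /capture to save_capture together with a target path such as screen.png.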


OCR

OCR uses Tesseract. Download language files from github.com/tesseract-ocr/tessdata and place them in a tessdata\ folder next to the executable (e.g. tessdata\eng.traineddata). Additional languages work the same way.

Captures saved by OCR Element + Save go to Desktop\Apex_Captures\.


AI (Multimodal)

The AI command set is backed by MtmdHelper using LLamaSharp's multimodal (MTMD) API. It supports vision and audio modalities, depending on the model. Every inference call is fully stateless: no chat history is retained between calls.

Download a vision-capable GGUF model and its multimodal projector (e.g. LFM2.5-VL from LM Studio) and note the paths to both .gguf files, or click Download All on the Model tab. Then call ai init before any inference commands.


Project Structure

ApexComputerUse/
├── Program.cs                            — Entry point (`--service`, `--port`, `--pipe`, `--client`)
├── appsettings.json                      — Deployment defaults (Http/pipe/log/shell/test-runner)
├── ai-settings.json                      — AI provider credentials/settings
├── AI/
│   ├── AiChatService.cs                  — Provider-agnostic chat service (streaming + session state)
│   ├── AIDrawingCommand.cs               — GDI+ drawing engine (`/draw`, overlays, built-in demo scene)
│   ├── MtmdHelper.cs                     — Local multimodal model wrapper (LLamaSharp MTMD)
│   ├── MtmdInteractiveModeExecute.cs     — Interactive AI computer-use mode
│   └── SceneChatAgent.cs                 — Scene-oriented assistant logic
├── Automation/
│   ├── FlaUIHelper*.cs                   — UIA wrappers (find, actions, capture, text, keyboard, scrolling)
│   ├── ElementIdGenerator.cs             — Stable hash-based element/window IDs
│   └── UiMapRenderer.cs                  — Colour-coded tree renderer to PNG/overlay
├── Commands/
│   ├── CommandProcessor*.cs              — Core command handlers (find/exec/ocr/capture/ai/scenes/help)
│   ├── CommandLineParser.cs              — cmd.exe command parsing
│   ├── CommandRequest.cs                 — Normalized command DTO
│   └── CommandRequestJsonMapper.cs       — HTTP JSON/query mapping helpers
├── Servers/
│   ├── HttpCommandServer*.cs             — HTTP API + chat/page/scene/system route handlers
│   ├── FormatAdapter.cs                  — Response negotiation (HTML/JSON/text/PDF; includes `PdfWriter`)
│   ├── PipeCommandServer.cs              — Named-pipe server
│   └── TelegramController.cs             — Telegram command surface
├── Scenes/
│   ├── Scene.cs                          — Scene/layer/shape models with stable IDs
│   └── SceneStore.cs                     — Thread-safe scene store (`%LOCALAPPDATA%\ApexComputerUse\scenes`)
├── Clients/
│   ├── RemoteClient.cs                   — Remote endpoint metadata
│   ├── ClientPermissions.cs              — Per-client endpoint permission gates
│   └── ClientStore.cs                    — Persistent client registry (`%LOCALAPPDATA%\ApexComputerUse\clients`)
├── Infrastructure/
│   ├── AppConfig.cs / AppSettings.cs     — Config layering (`appsettings.json` + `APEX_*` + user prefs)
│   ├── AppLog.cs                         — Serilog bootstrap/log sink wiring
│   ├── OcrHelper.cs                      — Tesseract OCR wrapper
│   ├── DownloadManager.cs                — Model/OCR asset download support
│   └── ApexService.cs                    — Windows Service host
└── UI/
    ├── Form1.cs / Form1.Designer.cs      — Main WinForms host
    ├── ServerTabController.cs            — HTTP/pipe/server lifecycle controls
    ├── ChatTabController.cs              — Embedded `/chat` WebView + provider controls
    ├── ModelTabController.cs             — Model/asset management
    ├── ClientsTabController.cs           — Multi-endpoint registry UI
    ├── SceneEditorForm.cs / .Designer.cs — WinForms scene editor
    └── ClientEditForm.cs / .Designer.cs  — Client create/edit dialog

Scripts/                                  — `ApexComputerUse.psm1` (pipe module) and `apex.cmd` (HTTP helper)
restart-apex.bat / restart-apex.ps1       — Restart helpers for local development
AIClients/                                 — AI messaging libraries and harness projects
TestApplications/                          — WPF/WinForms/Web test apps and TestRunner

OCR: place Tesseract language files in a tessdata\ folder next to the executable. Not included in the repo — download from github.com/tesseract-ocr/tessdata.


Development

Build

# Restore and build (Release)
dotnet build -c Release ApexComputerUse/ApexComputerUse.csproj

# Run from source
dotnet run --project ApexComputerUse/ApexComputerUse.csproj

Requires the .NET 10 SDK and the Windows Desktop workload (dotnet workload install windows).

Unit Tests

dotnet test ApexComputerUse.Tests/ApexComputerUse.Tests.csproj

The test suite covers the pure-logic and data-model layers — everything that can be tested without a live desktop session:

| Test file | Coverage area |
| --- | --- |
| ElementIdGeneratorTests.cs | Hash mode, incremental mode, reset, thread safety |
| SceneStoreTests.cs | CRUD, disk persistence, concurrent creates |
| SceneModelTests.cs | FlattenForRender, ZIndex ordering, opacity, SceneIds |
| AIDrawingCommandTests.cs | JSON parsing, canvas backgrounds, all 8 shape types |
| TelegramParseCommandTests.cs | Command + key-value parser, DictExtensions.Get |
| PipeCommandServerTests.cs | Named-pipe JSON protocol parser |
| LevenshteinTests.cs | Edit-distance boundary and domain cases |
| CommandResponseTests.cs | ToText / ToJson serialisation |
| OcrHelperTests.cs | CropBitmap region logic, OcrResult.ToString |

Components that require an active Windows session (FlaUI UIA, Tesseract, LLamaSharp, WinForms UI) are covered by the existing integration script Scripts/test_controls.py and manual testing.

Integration Test Runner

TestApplications/TestRunner/ is a cycle-based orchestrator that launches the WinForms, WPF, and web test apps, runs the full suite against the live HTTP API, and reports results. Use it whenever changes touch CommandProcessor, FlaUIHelper, or HttpCommandServer.

# Demo mode — human-readable output, 3 cycles
dotnet run --project TestApplications/TestRunner -- --mode demo

# Benchmark mode — JSON-line output, 25 cycles
dotnet run --project TestApplications/TestRunner -- --mode benchmark

Test apps:

  • WinForms (TortureTestForm.cs): textbox, button, checkbox, radio, combo, listbox, slider, menu, grid
  • WPF (TortureTestWindow.xaml): same controls plus Expander, ViewModel-driven state
  • Web (index.html): menu, tabs, form controls, scrollable regions

The runner interacts exclusively through the HTTP API, so a failed assertion is reported as the exact curl call that failed. The same suite can also be triggered remotely via POST /run-tests.


Changelog

All notable changes to ApexComputerUse are documented in this file.

[0.16.0] — 2026-05-10

Added

WindowMonitor — desktop window change detection

  • New Infrastructure/WindowMonitor.cs — owns a dedicated background STA thread (UIA3 is COM apartment-affine; thread-pool timers can't safely call into it) that polls the desktop once per second, diffs the window set against the previous snapshot, and fires events:
    • WindowsChanged(IReadOnlyList<WindowSnapshot>) — fires whenever any window opens, closes, or changes title
    • WindowClosed(IntPtr hwnd) — per-HWND closure event, used for cache invalidation
  • WindowSnapshot record — (Hwnd, ProcessId, Title, ElementId). ElementId is generated with excludeName: true so the title can change without rotating the ID.
  • Auto-starts on Form1.Load and is stopped/disposed on OnFormClosed.
  • Tools menu items: Start/Stop Window Monitoring, Watch Elements (slow), Watch Top Window Only, Set Element Window Filter… (substring match against window titles, settable via the inline dialog or programmatically by AI code).
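
The snapshot diff described above can be sketched in Python as follows. This is an illustration of the idea, not the C# implementation: the WindowSnapshot tuple here omits ElementId, and diff_windows is a hypothetical name. In terms of the events listed, WindowsChanged would fire when any of the three lists is non-empty, and WindowClosed once per closed HWND.

```python
from typing import Dict, List, NamedTuple, Tuple


class WindowSnapshot(NamedTuple):
    hwnd: int
    process_id: int
    title: str


def diff_windows(
    previous: Dict[int, WindowSnapshot], current: Dict[int, WindowSnapshot]
) -> Tuple[List[WindowSnapshot], List[int], List[WindowSnapshot]]:
    """Return (opened, closed_hwnds, renamed) between two polls keyed by HWND."""
    opened = [snap for hwnd, snap in current.items() if hwnd not in previous]
    closed = [hwnd for hwnd in previous if hwnd not in current]
    renamed = [
        snap
        for hwnd, snap in current.items()
        if hwnd in previous and previous[hwnd].title != snap.title
    ]
    return opened, closed, renamed
```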

Optional element-level watching

  • WatchElements property — when on, each poll also scans every monitored window's UIA descendants and fires WindowElementsChanged(window, added, removed) with a per-window add/remove diff. Off-screen elements are skipped. Disabled by default (slow).
  • TopWindowOnly (P/Invoke GetForegroundWindow) and ElementWindowFilter (case-insensitive title contains) narrow the element-scan set so it stays tractable.
  • Per-window state is dropped automatically when a window closes; the first scan of a newly-discovered window establishes a baseline (no event), so opens don't dump every control as "added".

Cache invalidation in CommandProcessor

  • New CommandProcessor.InvalidateClosedWindow(IntPtr hwnd) (in CommandProcessor.Windows.cs) — wired from WindowMonitor.WindowClosed. Takes _stateLock, prunes _windowMap entries with the matching HWND, sweeps _elementMap for now-invalid AutomationElement entries via the existing IsElementValid static, removes from all parallel maps (_elementHashes, _elementReverse, _elementParents, _elementDescriptors), clears _currentElement / CurrentWindow if they went stale, and clears _mappedWindowHandle if it matched. _elementReverse.Remove is wrapped in try/catch + LogSwallowed because Dictionary<AutomationElement,_>.Remove calls UIA's CompareElements and can throw COMException on stale proxies.
  • Verified end-to-end: /find Notepad → close Notepad → wait one poll cycle → /status reports Window: (none) and Notepad is gone from /windows.

Inspectable activity buffer

  • WindowMonitor carries a thread-safe ConcurrentQueue<MonitorLogEntry> (default cap 500, FIFO eviction) of recent activity — opens, closes, renames, element add/remove, and internal poll errors. AppendLog, GetLog, ClearLog, plus an IsRunning lifecycle property.
  • New HTTP routes (named /winmon/... to avoid collision with the existing /monitor/{id} RegionMonitor namespace):
    • GET /winmon/log → { count, running, entries: [...] } (append .json for raw JSON)
    • POST /winmon/clear → { cleared: N }
  • Both routes are gated by AllowDiagnostics; loopback callers always pass.
  • 13 new unit tests (WindowMonitorTests, CommandProcessorInvalidateTests) covering the diff logic, log buffer, FIFO eviction, lifecycle, the new properties, and the safe paths of InvalidateClosedWindow. UIA-dependent paths (live element pruning, descendant scan) verified manually via the running app.
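
A capped buffer with FIFO eviction, as described for the activity log, might look like this in Python. The real buffer is a ConcurrentQueue in C#; this sketch swaps in a deque guarded by a lock, and the class and method names are illustrative only.

```python
import threading
from collections import deque
from typing import Deque, List


class ActivityLog:
    """Thread-safe bounded log; the oldest entry is evicted once the cap is hit."""

    def __init__(self, capacity: int = 500) -> None:
        self._entries: Deque[str] = deque(maxlen=capacity)
        self._lock = threading.Lock()

    def append(self, entry: str) -> None:
        with self._lock:
            self._entries.append(entry)  # deque drops the oldest item when full

    def get(self) -> List[str]:
        with self._lock:
            return list(self._entries)

    def clear(self) -> int:
        """Remove all entries and return how many were cleared."""
        with self._lock:
            cleared = len(self._entries)
            self._entries.clear()
            return cleared
```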

[0.15.0] — 2026-05-06

Added

Element Annotations & Filtering

  • New ElementAnnotation model and ElementAnnotationStore — per-element notes and exclusion flags keyed by stable element hash, persisted at %LOCALAPPDATA%\ApexComputerUse\annotations\elements.json. Empty records auto-GC'd.
  • New verbs in CommandProcessor.Annotations.cs: annotate, unannotate, exclude, unexclude, annotations, excluded
  • New HTTP routes: POST /annotate, POST /unannotate, POST /exclude, POST /unexclude, GET /annotations, GET /excluded
  • Notes appear as a note field on /elements output; excluded subtrees are skipped during scan (root never excluded — depth > 0 guard)
  • New query param ?unfiltered=true on /elements bypasses the exclusion filter
  • 7 new unit tests in ElementAnnotationStoreTests.cs

Region Maps

  • New RegionMap model and RegionMapStore — persistent named pixel-coordinate grids tied to a window or stable element hash. One file per map under <exe>/regionmaps/{id}.json
  • Built for AI self-calibration loops on canvas-rendered content (board games, emulators, video timelines) where individual cells are not UIA elements
  • Static helpers: CellToPixel(map, row, col) returns cell center; BuildGridDrawRequest(...) produces a re-usable draw request for both overlay and render paths
  • New verbs in CommandProcessor.RegionMaps.cs: regionmap umbrella with sub-actions list|get|delete|overlay|render|cell
  • New HTTP routes in HttpCommandServer.AnnotationRoutes.cs:
    • GET|POST /regionmap — list/create
    • GET|PUT|PATCH|DELETE /regionmap/{id} — per-map ops
    • POST /regionmap/{id}/overlay — click-through screen overlay
    • POST /regionmap/{id}/render — base64 PNG of screen (or current window) with grid drawn over it; supports {"canvas":"screen"} (default) or {"canvas":"window"} (auto-translates grid coords to window-local)
    • POST /regionmap/{id}/cell: {row, col} → {x, y} for click-at
  • 10 new unit tests in RegionMapStoreTests.cs (incl. corner-case cell-coord math)
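
The cell-centre calculation behind CellToPixel can be sketched in Python as below. The RegionMap schema is not shown in this document, so every field on GridMap is hypothetical, as is the assumption that rows and columns are zero-based.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GridMap:
    # Hypothetical fields; the real RegionMap schema is not shown in this document.
    origin_x: int
    origin_y: int
    cell_width: float
    cell_height: float
    rows: int
    cols: int


def cell_to_pixel(grid: GridMap, row: int, col: int) -> Tuple[int, int]:
    """Return the screen pixel at the centre of a zero-based (row, col) cell."""
    if not (0 <= row < grid.rows and 0 <= col < grid.cols):
        raise IndexError(f"cell ({row}, {col}) is outside the {grid.rows}x{grid.cols} grid")
    x = grid.origin_x + (col + 0.5) * grid.cell_width
    y = grid.origin_y + (row + 0.5) * grid.cell_height
    return round(x), round(y)
```

The returned pair is the kind of value an agent would feed to a click-at call after looking up a board square or timeline cell.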

Region Monitors

  • New RegionMonitor model and RegionMonitorStore — persistent per-region screen-change watchers, one file per monitor under %LOCALAPPDATA%\ApexComputerUse\monitors\{id}.json. Each monitor holds an array of MonitorRegion so one logical "watch" can cover multiple indicators (LEDs, status icons, etc.) with independent diffs.
  • New RegionMonitorRunner — background dispatcher; one Task per enabled monitor; per-region capture → diff vs previous → fire SSE event when over threshold. First tick is the baseline (no fire). Disabled monitors are not polled. Region-count changes handled at runtime. Diff via LockBits + Marshal.Copy — per-pixel max-channel-difference > tolerance counts as "changed".
  • New verbs in CommandProcessor.Monitors.cs: monitor umbrella with sub-actions list|get|delete|start|stop|check.
  • New HTTP routes in HttpCommandServer.MonitorRoutes.cs:
    • GET|POST /monitor — list/create
    • GET|PUT|DELETE /monitor/{id} — per-monitor CRUD
    • POST /monitor/{id}/start / /stop — toggle enabled
    • POST /monitor/{id}/check — manual one-shot diff vs current baselines
    • POST /monitor/{id}/snapshot?index=N — base64 PNG of region N right now
  • Notifications via the existing /events SSE stream as monitor.fired events: {monitorId, name, regionIndex, label, x, y, width, height, percentDiff, threshold, seq, time}.
  • Defaults: intervalMs=1000 (floor 100ms), thresholdPct=5.0, tolerance=8, enabled=false.
  • Last-fire telemetry persisted on the monitor: lastFiredUtc, lastPercentDiff, lastRegionIndex, hitCount.
  • 11 new unit tests in RegionMonitorStoreTests.cs covering CRUD, telemetry, persistence, and diff math.
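
The diff rule above (a pixel counts as changed when its largest per-channel difference exceeds tolerance, and the monitor fires when the changed percentage is over the threshold) can be sketched in Python. The runner itself works on bitmaps via LockBits; this illustration uses plain RGB tuples, and the function names are assumptions.

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def percent_changed(before: Sequence[Pixel], after: Sequence[Pixel], tolerance: int = 8) -> float:
    """Percentage of pixels whose largest per-channel difference exceeds tolerance."""
    if len(before) != len(after):
        raise ValueError("captures must have the same pixel count")
    if not before:
        return 0.0
    changed = sum(
        1
        for a, b in zip(before, after)
        if max(abs(ca - cb) for ca, cb in zip(a, b)) > tolerance
    )
    return 100.0 * changed / len(before)


def should_fire(before: Sequence[Pixel], after: Sequence[Pixel],
                threshold_pct: float = 5.0, tolerance: int = 8) -> bool:
    """True when the changed-pixel percentage is over the threshold."""
    return percent_changed(before, after, tolerance) > threshold_pct
```

The defaults mirror the documented thresholdPct=5.0 and tolerance=8; whether the real comparison is strict or inclusive at the threshold is not stated, so the strict form here is a guess.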

EventBroker generalization

  • EventEnvelope reshaped: int? WindowId plus IReadOnlyDictionary<string, object?> Data replace the fixed WindowId/Title fields. Window events still carry id/title inside Data; non-window subsystems attach arbitrary payloads.
  • New public EventBroker.Emit(string type, IDictionary<string, object?> data, int? windowId = null) for non-window emitters (region monitors today, anything else later).
  • SSE serializer in HttpCommandServer.Events.cs now flattens Data into the frame payload alongside seq/time — both event families render uniformly.
  • JsonElementExtensions.Dbl(name) helper added for parsing thresholdPct.
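
Flattening Data into the frame payload alongside seq/time might look like this in Python. The frame layout follows the standard Server-Sent Events wire format (an event line, a data line, a blank line); whether the server emits an event line, and the windowId key name, are assumptions.

```python
import json
from typing import Any, Dict, Optional


def build_sse_frame(event_type: str, seq: int, time_iso: str,
                    data: Dict[str, Any], window_id: Optional[int] = None) -> str:
    """Serialise one event as an SSE frame with data keys merged next to seq/time."""
    payload = dict(data)  # copy so the caller's dictionary is not mutated
    payload["seq"] = seq
    payload["time"] = time_iso
    if window_id is not None:
        payload["windowId"] = window_id  # hypothetical key name
    return f"event: {event_type}\ndata: {json.dumps(payload)}\n\n"
```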

Public /help Page

  • New optional setting PublicHelpPage (default false): when on, GET /help is reachable without an API key
  • New setting PublicHelpRateLimit (default 30 req/min/IP): sliding 60-second per-IP window protects the unauthenticated route
  • Returns HTTP 429 with Retry-After: 60 when limit exceeded
  • Loopback callers and API-keyed callers always have full access (never rate-limited)
  • New RuntimeFlags static — mutable mirror of AppConfig values seeded at startup, allows GUI changes to take effect without restart
  • GUI controls in Remote Control tab: chkPublicHelp checkbox + numHelpRateLimit numeric input. Persisted in %APPDATA%\ApexComputerUse\settings.json alongside other user prefs.
  • appsettings.json keys: PublicHelpPage, PublicHelpRateLimit. Env: APEX_PUBLIC_HELP_PAGE, APEX_PUBLIC_HELP_RATE_LIMIT.
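
A sliding 60-second per-IP window of the kind described can be sketched in Python. This is an illustration, not the server's code: the class name and the injectable clock are assumptions, the sketch is not thread-safe, and it does not model the bypass for loopback or API-keyed callers.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per IP within any `window` seconds."""

    def __init__(self, limit: int = 30, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str) -> bool:
        now = self._clock()
        hits = self._hits[ip]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # caller would answer HTTP 429 with Retry-After: 60
        hits.append(now)
        return True
```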

Changed

UI

  • Remote Control tab cleaned up: lblTelegramStatus moved from (8, 168) — was overlapping the new public-help checkbox — to (465, 104) on the bot-token row. btnStartTelegram shrunk from 120 to 100 wide to make room.
  • Added tooltips to all interactive controls in Remote Control tab (HTTP port/start, API key, Copy, bot token, Start Telegram, allowed chat IDs, public help, rate limit, pipe name, Start Pipe, status labels).

Documentation

  • New LICENSE: PolyForm Noncommercial 1.0.0. Source-available; commercial use requires a separate license.
  • New THIRD_PARTY_NOTICES.md: license attributions for all 9 NuGet dependencies (FlaUI, Serilog, LLamaSharp, Telegram.Bot, Tesseract, etc.). MIT and Apache 2.0 obligations met inline.
  • Merged ACU_AI_CONTROL_GUIDE.md into ACU_CONTROL_GUIDE.md (deleted the former) — single comprehensive guide. Added "Rules of Thumb" section + Annotations + Region Maps coverage.
  • Slimmed ACU_SYSTEM_PROMPT.md from 9 KB → 3.8 KB. Now points at the auto-generated /help page for endpoint reference instead of duplicating tables that drift. Retains auth, mental model, 10 critical rules, minimal control loop.
  • Updated ACU_OPERATIONAL_REFERENCE.md for staleness: added /winrun, annotations, region maps, note field, ?unfiltered, PublicHelpPage/PublicHelpRateLimit config keys.

Fixed

  • CommandProcessor.ScanElementsIntoMap now consults ElementAnnotationStore to skip excluded subtrees and attach notes during scan; existing /elements callers see no behavioral change unless annotations exist.
  • RegionMap.canvas:"window" mode correctly translates screen-absolute grid coords into window-local space before drawing, so the grid lines up with the captured window image.

[0.14.0] — 2026-04-27

Added

Multiple Instance Support

  • Added --port command-line argument to override HTTP listen port for running multiple instances
  • Added --pipe command-line argument to override named-pipe name
  • Added --client command-line argument to mark an instance as a subordinate client (disables Launch Instance button)
  • Port auto-increment in HttpCommandServer.Start() — automatically tries next available port if preferred port is taken
  • New buttons in Clients tab: "Open Web UI" (launches /chat page in default browser) and "Launch Instance" (spawns new instance with incremented port)
  • ClientsTabController.LaunchInstance() auto-registers spawned instance in client list
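
The port auto-increment idea can be sketched in Python as follows. The server itself uses HttpListener, so the socket bind probe here is a stand-in, and max_tries is an assumed limit that the changelog does not specify.

```python
import socket
from typing import Callable


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Try to bind a TCP socket; success means nothing else holds the port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
            return True
        except OSError:
            return False


def first_free_port(preferred: int, is_free: Callable[[int], bool] = port_is_free,
                    max_tries: int = 20) -> int:
    """Return the preferred port, or the next higher one that is free."""
    for port in range(preferred, preferred + max_tries):
        if is_free(port):
            return port
    raise RuntimeError(f"no free port in {preferred}-{preferred + max_tries - 1}")
```

Passing the probe as a parameter keeps the search logic testable without opening real sockets.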

Client Permissions System

  • New ClientPermissions class with per-client flags: AllowAutomation, AllowCapture, AllowAi, AllowScenes, AllowShellRun, AllowClients
  • Permissions stored in JSON alongside each RemoteClient and loaded on reconnect
  • Permission enforcement in HttpCommandServer: loopback (127.0.0.1) always gets full access; registered clients get their stored permissions; unknown IPs get full access
  • All endpoints are gated by the appropriate permission: /run requires AllowShellRun; /capture and /ocr require AllowCapture; /ai and /chat require AllowAi; /scenes and /editor require AllowScenes; /clients requires AllowClients; everything else requires AllowAutomation
  • ClientEditForm redesigned with two tabs: "Connection" (existing fields) and "Permissions" (6 checkboxes with ShellRun/Clients highlighted in orange)
  • ClientStore.FindByHost(string host) — case-insensitive lookup by hostname
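
The gating rules above can be sketched as a small lookup in Python. The route-to-flag table follows the changelog entry as read here (each slash-joined pair treated as two separate routes), the default flag values are invented for the example, and is_allowed is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ClientPermissions:
    # Default values are assumptions for this sketch only.
    allow_automation: bool = True
    allow_capture: bool = True
    allow_ai: bool = True
    allow_scenes: bool = True
    allow_shell_run: bool = False
    allow_clients: bool = False


# First path segment -> permission flag; anything else needs allow_automation.
ROUTE_FLAGS: Dict[str, str] = {
    "run": "allow_shell_run",
    "capture": "allow_capture",
    "ocr": "allow_capture",
    "ai": "allow_ai",
    "chat": "allow_ai",
    "scenes": "allow_scenes",
    "editor": "allow_scenes",
    "clients": "allow_clients",
}


def is_allowed(path: str, remote_ip: str, registered: Dict[str, ClientPermissions]) -> bool:
    """Apply the gating rules described above to one request."""
    if remote_ip == "127.0.0.1":
        return True  # loopback always gets full access
    perms: Optional[ClientPermissions] = registered.get(remote_ip)
    if perms is None:
        return True  # unknown IPs get full access, per the changelog entry
    segment = path.split("?", 1)[0].strip("/").split("/", 1)[0].lower()
    return getattr(perms, ROUTE_FLAGS.get(segment, "allow_automation"))
```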

AI Chat with API Tools

  • AiChatService.SetLocalServer(int port, string? apiKey) and ClearLocalServer() — configure local HTTP server context for AI chat
  • Agentic tool loop in AiChatService.SendAsync() — AI can issue ApexComputerUse API calls via apex code blocks
  • System prompt auto-extended with API reference when server context is set, including endpoint list and example calls
  • The loop runs for up to 8 turns, executing calls and feeding results back until the AI produces a clean answer
  • ServerTabController.ToggleHttp() calls SetLocalServer() on start and ClearLocalServer() on stop
  • Parsing and system prompt generation exposed as internal for testing
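
The agentic loop can be sketched in Python as below. This is not the AiChatService code: the exact apex block syntax, the shape of the results message, and the ask_model / execute_call callbacks are all assumptions made for illustration. Only the 8-turn cap and the stop-when-no-calls rule come from the entries above.

```python
import re
from typing import Callable, List

FENCE = "`" * 3  # built this way so the sketch can sit inside a fenced block
APEX_BLOCK = re.compile(FENCE + r"apex[ \t]*\r?\n(.*?)" + FENCE, re.DOTALL)
MAX_TURNS = 8


def parse_apex_calls(reply: str) -> List[str]:
    """Return the body of every apex code block in a model reply."""
    return [body.strip() for body in APEX_BLOCK.findall(reply)]


def run_tool_loop(prompt: str, ask_model: Callable[[str], str],
                  execute_call: Callable[[str], str]) -> str:
    """Feed API results back to the model until it answers without apex blocks."""
    message = prompt
    reply = ""
    for _ in range(MAX_TURNS):
        reply = ask_model(message)
        calls = parse_apex_calls(reply)
        if not calls:
            return reply  # clean answer: no further tool calls requested
        results = [execute_call(call) for call in calls]
        message = "Results:\n" + "\n".join(results)
    return reply  # turn budget exhausted; return the last reply as-is
```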

Security Hardening

  • Timing-safe API key comparison using CryptographicOperations.FixedTimeEquals() (replaced three separate == comparisons)
  • Shell command execution in /run now uses ProcessStartInfo.ArgumentList instead of string concatenation to prevent injection
  • HttpCommandServer.Stop() now explicitly closes HttpListener to immediately release port handles
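
For readers unfamiliar with timing-safe comparison, the Python analogue of CryptographicOperations.FixedTimeEquals() is hmac.compare_digest. The helper name below is illustrative.

```python
import hmac


def api_key_matches(supplied: str, expected: str) -> bool:
    """Compare keys in time that does not depend on where they first differ."""
    return hmac.compare_digest(supplied.encode("utf-8"), expected.encode("utf-8"))
```

A plain == comparison can return as soon as the first differing character is found, which leaks how much of a guessed key was correct; the constant-time form avoids that.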

Bug Fixes

  • Fixed MtmdInteractiveModeExecute infinite loop with hardcoded test path — replaced with proper Console.ReadLine() loop
  • Fixed CommandProcessor element ID lookup to use Equals() instead of ReferenceEquals() (FlaUI uses IUIAutomation.CompareElements)
  • Added 50k-entry cap on CommandProcessor._elementMap to prevent unbounded growth during long sessions
  • Fixed Form1.SetupNetshIfNeeded() blocking UI thread — made async with proper timeout
  • Fixed Form1.AutoLoadModelIfConfigured() fire-and-forget — now logs async exceptions via .ContinueWith()
  • SceneEditorForm canvas paint optimization — eliminated per-paint full-scene bitmap allocation during drag

Changed

Program Structure

  • Program.IsClientInstance — public static property detecting --client flag for UI gating
  • Command-line arg parsing restructured to support flag-only arguments alongside key-value pairs

API & Configuration

  • HttpCommandServer constructor now accepts optional ClientStore? clientStore parameter
  • HttpCommandServer.Port { get; private set; } — made settable internally by Start() for auto-increment
  • RemoteClient.Permissions — new property with ClientPermissions value

UI

  • Form1.Designer.cs — added "Open Web UI" and "Launch Instance" buttons to Clients tab
  • ClientEditForm.Designer.cs — complete redesign with TabControl (Connection / Permissions tabs)
  • ClientsTabController constructor signature expanded with button references and port getter

Testing

  • New test file ApexComputerUse.Tests/AiChatServiceTests.cs with 22 tests covering apex call parsing and system prompt generation
  • ParseApexCalls and BuildApexSystemPrompt exposed as internal via existing InternalsVisibleTo attribute
  • All 171 tests passing (149 existing + 22 new)

Known Limitations

  • AI tool-use loop is non-streaming (full response assembled before delivery)
  • IP-spoofing could bypass permission sandboxing on local network

[0.13.0] — 2026-04-26

Added

  • Clients tab — remote machine registry — a new "Clients" tab (sixth tab in the main UI) lets users and AI maintain a persistent directory of other Apex-enabled machines. Each entry stores a friendly name, host/IP, port, API key, OS version, and description. Entries are listed in a six-column ListView and persisted to <exe>/clients/{id}.json using the same thread-safe JSON store pattern as scenes.
  • ClientStore (Clients/ClientStore.cs) — thread-safe store that loads all client records from disk on startup and writes individual JSON files on every create, update, or delete.
  • RemoteClient (Clients/RemoteClient.cs) — data model with [JsonPropertyName] attributes matching the project's snake_case serialization convention.
  • ClientsTabController (UI/ClientsTabController.cs) — tab logic wired to Add, Edit, Remove, and Test buttons. Test Connection fires an async GET /ping against the selected client's host:port (with its API key if set) and updates a live Status column green/red in-place, with no UI blocking.
  • ClientEditForm (UI/ClientEditForm.cs / ClientEditForm.Designer.cs) — fixed-size dialog for creating and editing client entries, with name/host required-field validation and port range validation.

[0.12.0] — 2026-04-26

Added

  • Embedded HTML chat in the Chat tab — the Chat tab's RichTextBox, input field, and Send button have been replaced by an embedded Microsoft.Web.WebView2 control hosting the existing /chat streaming page directly inside the app. Click Load Chat to navigate the WebView2 to http://localhost:{port}/chat?apiKey=.... The HTML page handles streaming, the "New chat" reset, and provider/model status display natively.
  • HTTP server auto-start on launch: HttpAutoStart and HttpBindAll are now true by default in appsettings.json. The HTTP server starts and binds to all interfaces automatically when the app opens; no manual click on the Remote Control tab is required.
  • Model auto-load on launch — if model and projector paths are saved in settings.json, the local vision model is loaded automatically at startup without opening the Model tab.
  • First-run netsh setup — on the very first launch, the app checks whether the HTTP URL ACL (http://+:8081/) and the Windows Firewall inbound rule (ApexComputerUse) exist. If either is missing, a single elevated cmd session (one UAC prompt) runs both netsh commands. The result is persisted to settings.json (NetshConfigured = true) so the check never repeats.
  • Restart scripts: restart-apex.bat and restart-apex.ps1 at the repo root kill all running instances (taskkill /F /IM ApexComputerUse.exe) and relaunch the app. Both prefer the Release build, fall back to Debug, and fall back to dotnet run if no built exe is found.

Changed

  • ChatTabController — removed _rtbChatHistory, _txtChatInput, _btnChatSend, AppendToChat, AppendColoredText, SendOrCancelAsync, ExecuteCommandsFromResponse, and CurlRx. Constructor now accepts a WebView2 instead. OpenChat() navigates the embedded WebView2; ResetChat() calls Reload().
  • AppSettings — added NetshConfigured bool field (persisted to %APPDATA%\ApexComputerUse\settings.json) for first-run netsh tracking.

[0.11.0] — 2026-04-16

Added

  • /elements?match=<text> — case-insensitive substring search across Name, AutomationId, and Value pattern. Returns only branches containing matches, each wrapped in its ancestor path (non-matching siblings pruned). depth now controls how deep to render under each match, so one call replaces the repeated drill-down pattern of "fetch tree → spot candidate → fetch subtree". Composes with type= and onscreen=true.
  • /elements?collapseChains=true — folds "1-in-1-in-1" wrapper chains that dominate web accessibility trees. A node is skipped only when it has exactly one child, no Name, no AutomationId, and its control type is Pane, Group, or Custom. Named containers and anything with an AutomationId are preserved. IDs of hoisted descendants are unchanged — follow-up /elements?id=<id> and /execute id=<id> calls continue to work against the real (unflattened) tree.
  • /elements?includePath=true — every emitted node gains a path breadcrumb string (e.g. "Chrome > Document > Main > Form") so an agent can orient itself without climbing back up the tree.
  • /elements?properties=extra — opt-in per-node value (via Value pattern, when the element supports it) and helpText properties. Off by default so token budgets don't change silently; needed for web inputs whose Name is empty and whose visible content lives in the Value pattern.
  • descendantCount on truncated nodes — nodes cut off by depth now emit descendantCount: N alongside the existing childCount, so an agent can decide whether a subtree is worth expanding without another round trip.
  • Structured /find response: /find now populates a JSON element object on the response (id, controlType, name, automationId, className, frameworkId, isEnabled, isOffscreen, boundingRectangle, plus value/helpText when properties=extra) alongside the existing human-readable string in message. The element's numeric ID is recovered from the most recent /elements scan when available.
  • Tree-shape unit tests (ApexComputerUse.Tests/CommandProcessorTreeTests.cs) — covers FilterTreeByMatch (case-insensitive, AutomationId + Value lookup, sibling pruning), CollapseSingleChildChains (identity-less-only collapse, multi-child preservation, ID stability), and ElementNode JSON round-trip for the new opt-in fields.
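
The two tree post-processors described above (match= filtering and collapseChains) can be sketched in Python over plain dictionaries. These are illustrations of the stated rules, not the C# FilterTreeByMatch / CollapseSingleChildChains code: the node shape is assumed from the documented JSON fields, and the depth limit under each match is not modelled.

```python
from typing import Any, Dict, Optional

Node = Dict[str, Any]
WRAPPER_TYPES = {"Pane", "Group", "Custom"}


def collapse_chains(node: Node) -> Node:
    """Skip anonymous single-child wrappers, hoisting the child in their place."""
    children = [collapse_chains(c) for c in node.get("children", [])]
    is_wrapper = (
        len(children) == 1
        and not node.get("name")
        and not node.get("automationId")
        and node.get("controlType") in WRAPPER_TYPES
    )
    if is_wrapper:
        return children[0]  # the hoisted node keeps its own id
    return {**node, "children": children}


def filter_by_match(node: Node, text: str) -> Optional[Node]:
    """Keep only branches containing a case-insensitive substring match."""
    needle = text.lower()
    fields = (node.get("name"), node.get("automationId"), node.get("value"))
    if any(needle in f.lower() for f in fields if f):
        return node  # matched: keep the node with its subtree
    kept = [m for m in (filter_by_match(c, text) for c in node.get("children", [])) if m]
    return {**node, "children": kept} if kept else None
```

Because hoisted nodes keep their original id, follow-up calls that address an element by id remain valid after collapsing.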

Changed

  • CommandProcessor.ElementNode / BoundingRect promoted from private to internal sealed class so the new in-process post-processors (FilterTreeByMatch, CollapseSingleChildChains) and the test project (InternalsVisibleTo) can exercise them directly.
  • ScanElementsIntoMap now accepts a ScanOptions struct (IncludePath + IncludeExtra + depth) and threads the parent breadcrumb through recursion without changing call-site signatures for existing endpoints.

[0.10.0] — 2026-04-16

Added

  • AI Chat window — Tools → AI Chat opens a standalone chat interface powered by the AiMessagingCore library. Supports 8 providers: OpenAI, Anthropic, DeepSeek, Grok, Groq, Duck, LM Studio, and LlamaSharp (local GGUF). Streams tokens in real-time; shows timing metrics (total tokens, tokens/second, time-to-first-token). Provider, model, system prompt, and sample query are persisted to ai-settings.json next to the executable.
  • AIClients solution integrated — both AiMessagingCore (class library) and AIClients (standalone WinForms harness) are now included in ApexComputerUse.sln for single-solution editing. AIClients.sln and AIClients.exe remain fully independent and buildable on their own.
  • ai-settings.json — starter settings file (copied to output on build) with placeholder API keys for all 8 providers. Replace placeholders with real keys to activate each provider.

Fixed

  • ProviderSettings.ApiKey and AiLibrarySettings.DefaultProvider changed from init-only to set so runtime configuration updates (provider switch, API key override) can be applied without reconstructing the settings objects.
  • HandleChatStatus in HttpCommandServer now returns Dictionary<string, string> matching the ApexResult.Data contract; sessionActive is serialized as "True" / "False".

[0.9.0] — 2026-04-07

Added

  • capture command — returns screen captures inline as base64 PNG in the data response field. No file is written to disk. Four targets via action=:
    • screen — full display
    • window — current window (requires prior find)
    • element (default) — current element (requires prior find)
    • elements value=id1,id2,... — multiple elements by numeric ID, stitched vertically into one image
  • HTTP: POST /capture
  • Named pipe / PowerShell: command=capture; new Invoke-FlaUICapture cmdlet in ApexComputerUse.psm1
  • cmd.exe: apex capture [action=...] [value=...] in apex.cmd
  • Telegram: /capture — response delivered as a photo message, not text

[0.8.0] — 2026-04-07

Added

  • Persistent element ID map: the elements command now recursively scans the UI tree using ElementIdGenerator (SHA-256 hash-based, deterministic across sessions). Each element receives a stable numeric ID that survives app restarts.
  • Nested JSON element map output: elements returns the full window tree as indented, nested JSON (id, controlType, name, automationId, children), replacing the flat string list.
  • Window map with persistent IDs: the windows command now returns a JSON array of {id, title} pairs. IDs are hash-based and stable for the same window across sessions.
  • Map-based lookup in find — pass a numeric ID from either windows or elements as the window= or id= parameter; the element is resolved directly from the in-memory map without a fuzzy search.
  • Auto-focus on every find — the matched window is brought into foreground focus automatically; no separate focus action required.
  • "Output UI Map" menu item — Tools menu item captures the UI tree of the currently selected window and prints the nested JSON to the log.
  • Full ElementOperations parity — all UIA patterns now covered by both ApexHelper and CommandProcessor:

New exec actions

| Action | Description |
| --- | --- |
| mouse-click | Force mouse left-click (bypasses Invoke/Toggle/SelectionItem) |
| middle-click | Middle-mouse-button click |
| click-at value=x,y | Click at pixel offset from element top-left |
| drag value=x,y | Drag element to screen coordinates |
| highlight | Draw orange highlight around element for 1 second |
| isenabled | Returns True/False |
| isvisible | Returns True/False |
| clearvalue | Set value to empty string (Value pattern) |
| appendvalue | Append text to current value |
| getselectedtext | Selected text via Text pattern |
| setrange value=n | Set RangeValue pattern |
| getrange | Read current RangeValue |
| rangeinfo | Min / max / smallChange / largeChange |
| toggle-on / toggle-off | Set toggle to a specific state |
| gettoggle | Read current toggle state (On / Off / Indeterminate) |
| expandstate | Read ExpandCollapse state |
| select-item | Select via SelectionItem pattern |
| addselect | Add element to multi-selection |
| removeselect | Remove element from selection |
| isselected | Check SelectionItem selected state |
| getselection | Get selected items from a Selection container |
| select-index value=n | Select ComboBox / ListBox item by zero-based index |
| getitems | List all items in a ComboBox or ListBox |
| getselecteditem | Get currently selected item text |
| minimize / maximize / restore | Window visual state |
| windowstate | Read current window visual state |
| move value=x,y | Move element via Transform pattern |
| resize value=w,h | Resize element via Transform pattern |
| scroll-left / scroll-right value=n | Horizontal mouse scroll |
| scrollpercent value=h,v | Scroll to h%/v% via Scroll pattern |
| getscrollinfo | Scroll position and scrollable flags |
| griditem value=row,col | Get element at grid cell |
| gridinfo | Row and column counts |
| griditeminfo | Row / column / span for a GridItem element |

Upgraded exec actions

| Action | Change |
| --- | --- |
| click | Now smart: Invoke → Toggle → SelectionItem → mouse fallback |
| gettext | Smart chain: Text pattern → Value → LegacyIAccessible → Name |
| getvalue | Smart chain: Value → Text → LegacyIAccessible → Name |
| setvalue | Smart chain: Value (if writable) → RangeValue (if numeric) → keyboard |
| select | Tries SelectionItem on list child first, then FlaUI wrappers |
| keys | Full {KEY} token notation ({CTRL}, {F5}, …) and Ctrl+A / Alt+F4 combo syntax |
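
The "smart chain" behaviour of the upgraded actions amounts to trying each strategy in a fixed order and stopping at the first one that applies. The sketch below shows that control flow only; it is not the project's C# implementation, and the lambdas are hypothetical stand-ins for UIA pattern calls on an element that supports Toggle but not Invoke.

```python
from typing import Callable, List, Optional, Tuple

# Each strategy returns a result string on success, or None when it does not apply.
Strategy = Callable[[], Optional[str]]


def first_successful(strategies: List[Tuple[str, Strategy]]) -> Tuple[str, str]:
    """Try strategies in order; return (name, result) for the first that works."""
    for name, attempt in strategies:
        result = attempt()
        if result is not None:
            return name, result
    raise RuntimeError("no strategy could handle the element")


# Order taken from the click row above; the handlers themselves are placeholders.
click_chain = [
    ("Invoke", lambda: None),          # pattern not supported by this element
    ("Toggle", lambda: "toggled"),     # first pattern that applies
    ("SelectionItem", lambda: "selected"),
    ("mouse", lambda: "clicked"),      # last-resort fallback
]
print(first_successful(click_chain))  # ('Toggle', 'toggled')
```

The same helper would cover gettext, getvalue, and setvalue by swapping in their respective orderings.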

[0.7.0] — 2026-04-06

Added

  • windows command returns a JSON array of {id, title} for all open windows, enabling the AI to select precisely without relying on fuzzy matching.
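
With the {id, title} array in hand, an agent can resolve a window on its own side and then refer to it by ID. A minimal sketch, assuming the array arrives as a JSON string; the sample IDs and titles are made up, and `find_window_id` is an illustrative helper name.

```python
import json


def find_window_id(windows_json: str, title_fragment: str):
    """Return the id of the first window whose title contains the fragment."""
    needle = title_fragment.casefold()
    for window in json.loads(windows_json):
        if needle in window["title"].casefold():
            return window["id"]
    return None


# Hypothetical sample; real IDs and titles come from the windows command.
sample = '[{"id": 101, "title": "Untitled - Notepad"}, {"id": 202, "title": "Calculator"}]'
print(find_window_id(sample, "notepad"))  # 101
```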

[0.6.0] — 2026-04-06

Added

  • Named-pipe server (PipeCommandServer) — exposes the full command set over a Windows named pipe (default name ApexComputerUse). Each client connection is session-based (state is preserved across commands on the same connection). Accepts and returns newline-delimited JSON.
  • Pipe server UI — new row in the Remote Control group box: configurable pipe name, Start/Stop button, and live status label.
  • Scripts\ApexComputerUse.psm1 — PowerShell module providing idiomatic cmdlets over the named pipe: Connect-FlaUI, Disconnect-FlaUI, Send-FlaUICommand, Get-FlaUIWindows, Get-FlaUIStatus, Get-FlaUIHelp, Get-FlaUIElements, Find-FlaUIElement, Invoke-FlaUIAction, Invoke-FlaUIOcr, Invoke-FlaUIAi.
  • Scripts\apex.cmd: cmd.exe batch helper wrapping the HTTP server with simpler positional syntax (e.g. apex find Notepad, apex exec click, apex ai describe). Requires curl (built into Windows 10 and later).
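
Newline-delimited JSON, as used by the pipe server, means one JSON document per line, so a client needs to frame outgoing commands and split incoming bytes on newlines. The sketch below covers only that framing. The `command` field name is an assumption based on the command=... syntax used elsewhere in this changelog, and opening the pipe itself is left out.

```python
import json
from typing import List, Tuple


def encode_command(command: dict) -> bytes:
    """Serialize one command as a single JSON line terminated by a newline."""
    return (json.dumps(command, separators=(",", ":")) + "\n").encode("utf-8")


def decode_lines(buffer: bytes) -> Tuple[List[dict], bytes]:
    """Split a receive buffer into complete JSON messages plus any trailing partial line."""
    *complete, remainder = buffer.split(b"\n")
    messages = [json.loads(line) for line in complete if line.strip()]
    return messages, remainder


print(encode_command({"command": "windows"}))
messages, rest = decode_lines(b'{"ok":true}\n{"par')
print(messages, rest)
```

Keeping the unparsed remainder lets the caller append the next chunk read from the pipe and call `decode_lines` again.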

[0.5.0] — 2026-04-06

Added

  • AI multimodal command set (MtmdHelper integration): exposes the existing MtmdHelper class through all remote interfaces.
  • CommandRequest extended with ModelPath, MmProjPath, and Prompt fields.
  • ai command in CommandProcessor with five sub-actions:
    • init — load the LLM and multimodal projector from disk (model= + proj= paths).
    • status — report whether the model is loaded and which modalities it supports.
    • describe — capture the current UI element and ask the vision model to describe it (optional prompt=).
    • file — send an image or audio file from disk to the model (value=<path>, optional prompt=).
    • ask — ask an arbitrary question about the current UI element (prompt= required).
  • HTTP endpoints for AI commands: GET /ai/status; POST /ai/init, /ai/describe, /ai/file, /ai/ask.
  • Telegram /ai command — same sub-action set via action=<sub> key-value syntax.
  • Updated help command output to list all ai sub-actions.

[0.4.0] — 2026-04-06

Added

  • HTTP REST server (HttpCommandServer) — control the application via curl on a configurable port (default 8080). Endpoints: GET /status, /windows, /elements, /help; POST /find, /execute, /ocr.
  • Telegram bot (TelegramController) — same command set over Telegram. Supports /find, /exec, /ocr, /status, /windows, /elements, /help. Key=value argument syntax with quoted multi-word values.
  • CommandProcessor — shared command engine used by both remote interfaces. Auto-accepts fuzzy window/element matches (no UI prompts in remote mode). Fires OnLog events forwarded to the form's status box.
  • Remote Control group box in the UI — start/stop HTTP server and Telegram bot with live status indicators.
  • FlaUIHelper.ListWindowTitles() — returns titles of all open windows.
  • FlaUIHelper.ListElements(Window, ControlType?) — lists all elements in a window with optional ControlType filter.
  • README.md — full usage documentation including curl examples and Telegram command reference.
  • CHANGELOG.md — this file.
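
The key=value argument syntax with quoted multi-word values, used by the Telegram commands in this release, can be parsed with ordinary shell-style tokenization. The sketch below uses Python's `shlex` for illustration and does not reflect the project's own parser. Note that `shlex` in its default POSIX mode treats backslashes as escapes, so Windows paths would need separate handling.

```python
import shlex


def parse_args(text: str) -> dict:
    """Parse key=value tokens, honouring double-quoted multi-word values."""
    args = {}
    for token in shlex.split(text):
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {token!r}")
        args[key] = value
    return args


print(parse_args('window="Untitled - Notepad" id=42'))
# {'window': 'Untitled - Notepad', 'id': '42'}
```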

[0.3.0] — 2026-04-06

Added

  • OCR (OcrHelper) — captures any UI element and runs Tesseract OCR on it.
    • OcrElement — capture and recognise.
    • OcrElementAndSave — capture, save image to disk, then recognise (useful for debugging).
    • OcrElementRegion — OCR a sub-rectangle of the element.
    • OcrFile — OCR an existing image file.
  • tessdata\eng.traineddata bundled in project and copied to output on build.
  • OCR actions available in the Any Element action group in the UI.

[0.2.0] — 2026-04-06

Added

  • Fuzzy window matching — tries exact match, then contains, then Levenshtein closest. Prompts for approval on non-exact matches.
  • Fuzzy element matching — same three-tier logic, applied to AutomationId or Name.
  • Search Type combo: filters element search by ControlType. The All option searches every control type without restriction and is never passed to FlaUI as a ControlType value.
  • Levenshtein distance implementation in FlaUIHelper.
  • FlaUIHelper.FindWindowFuzzy and FlaUIHelper.FindElementFuzzy returning match metadata (exact vs fuzzy, matched value).
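
The three-tier matching described above (exact, then contains, then closest Levenshtein distance) can be sketched as follows. This is an illustration in Python, not the FlaUIHelper code; case-insensitive comparison and the function names are assumptions.

```python
from typing import List, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fuzzy_match(query: str, candidates: List[str]) -> Optional[Tuple[str, bool]]:
    """Return (matched_value, is_exact), or None when there are no candidates."""
    if not candidates:
        return None
    q = query.casefold()
    for candidate in candidates:            # tier 1: exact match
        if candidate.casefold() == q:
            return candidate, True
    for candidate in candidates:            # tier 2: substring match
        if q in candidate.casefold():
            return candidate, False
    # tier 3: smallest edit distance wins
    best = min(candidates, key=lambda c: levenshtein(q, c.casefold()))
    return best, False


print(fuzzy_match("Calculater", ["Calculator", "Paint"]))  # ('Calculator', False)
```

Returning the exact/fuzzy flag alongside the match is what lets a caller prompt for approval only on non-exact results.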

Changed

  • Form height extended to accommodate the new Search Type row.

[0.1.0] — 2026-04-06

Added

  • Initial AI computer use application (WinForms) targeting .NET 10.
  • FlaUIHelper class wrapping FlaUI UIA3 for all common WPF/WinForms control interactions:
    • Button, TextBox, PasswordBox, Label, ComboBox, CheckBox, RadioButton, ListBox, ListView, DataGrid, TreeView, Menu/MenuItem, TabControl, Slider, ProgressBar, Hyperlink.
    • Mouse operations: click, right-click, double-click, hover, drag & drop, scroll.
    • Keyboard: type, send key, shortcuts (Ctrl+A/C/X/V/Z).
    • Text: select all, copy, cut, paste, undo, clear, insert at caret.
    • Value/RangeValue patterns, ExpandCollapse, ScrollItem, Transform.
    • Screenshots via FlaUI.Core.Capturing.
    • Retry.WhileNull for waiting on dynamic elements.
    • Window operations: move, resize, minimize, maximize, restore, close.
    • Focus: SetFocus, GetFocusedElement.
  • Form UI with:
    • Window Name, AutomationId, Element Name fields.
    • Control Type picker (action groups) and Action picker.
    • Value/Index field for parameterised actions.
    • Find Element, Execute Action, Clear Log buttons.
    • Timestamped output log.
  • Designer-compatible Form1.Designer.cs (standard generated format, no lambdas or helpers inside InitializeComponent).
