luSkyl/winctl

WinCtl

WinCtl is a Windows desktop control CLI for AI agents.

It gives terminal-based agents a small, auditable tool layer for observing and controlling a Windows desktop: screenshots, active windows, mouse, keyboard, clipboard, launch/focus, optional OCR, and optional UI Automation inspection.

WinCtl is not an autonomous agent. It does not plan tasks. It only exposes desktop control primitives through JSON-first CLI commands.
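
Because the commands are JSON-first, an agent can drive WinCtl with a thin subprocess wrapper. The following is a minimal sketch, not part of WinCtl itself: the helper names (build_command, parse_json_output, run_winctl) are hypothetical, and it assumes each command prints a single JSON document to stdout.

```python
import json
import subprocess
import sys
from typing import Any, Callable


def build_command(subcommand: str, *args: str) -> list:
    """Build the argv for `python -m winctl <subcommand> ...`."""
    return [sys.executable, "-m", "winctl", subcommand, *args]


def parse_json_output(stdout: str) -> Any:
    """Parse stdout as one JSON document (an assumption about the output)."""
    return json.loads(stdout)


def run_winctl(
    subcommand: str,
    *args: str,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> Any:
    """Run one WinCtl command and return its parsed JSON result.

    `runner` is injectable so the wrapper can be exercised without a desktop.
    """
    completed = runner(
        build_command(subcommand, *args),
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_json_output(completed.stdout)
```

With WinCtl installed, `run_winctl("active-window")` would return whatever JSON that command emits; the wrapper makes no assumption about the keys inside it.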

Features

  • observe, screenshot, screen
  • active-window, windows, focus, launch
  • click, move, type, paste, hotkey, key, scroll, wait
  • ocr, find-text, click-text with optional Tesseract/Pillow dependencies
  • inspect with optional pywinauto
  • tokenize, tokens, resolve-token, click-token, wait-window, wait-text, wait-token
  • Stable token key, signature, and score fields; deduplication; query match modes; background capability fields
  • Background-first commands: observe-window, inspect-window, compat-window, invoke-token, set-token-text, select-token, scroll-token
  • Token queries through tokens, resolve-token, and click-token, filtered with --text, --source, and --type
  • Trace commands for reviewing, exporting, and diagnosing action logs
  • Optional MCP server for agent integrations with safe dry-run defaults
  • JSON output and action logging to runs/actions.jsonl
  • --dry-run, action delay, and optional --after screenshots

Install

From PowerShell (E:\javaProject\winctl is the author's checkout path; substitute your own):

cd E:\javaProject\winctl
python -m pip install -e .

The package also works without installation from the project directory:

cd E:\javaProject\winctl
python -m winctl observe

Optional OCR support:

python -m pip install -e .[ocr]

OCR also requires the Tesseract executable on PATH.

Optional UI Automation inspection:

python -m pip install -e .[inspect]

Quick Start

cd E:\javaProject\winctl
python -m winctl observe
python -m winctl active-window
python -m winctl launch notepad --wait-title Notepad --focus
python -m winctl tokenize --compact --active-window-only
python -m winctl resolve-token --text "Untitled"
python -m winctl click-token 1 --dry-run --expect-title Notepad
python -m winctl paste "hello from agent" --expect-title Notepad --allow-foreground --after
python -m winctl observe

Core Commands

python -m winctl observe
python -m winctl screenshot runs\manual.bmp
python -m winctl screen
python -m winctl active-window
python -m winctl windows
python -m winctl focus --title "Notepad" --allow-foreground
python -m winctl launch notepad --wait-title "Notepad" --focus

Desktop Actions

python -m winctl click 100 100 --dry-run
python -m winctl click 100 100 --allow-foreground --after
python -m winctl move 800 500 --dry-run
python -m winctl type "hello" --dry-run
python -m winctl paste "safe text" --expect-title "Notepad" --dry-run
python -m winctl hotkey ctrl l --dry-run
python -m winctl key enter --dry-run
python -m winctl scroll -5 --dry-run

Every action outputs JSON. Real foreground actions are blocked unless --allow-foreground is passed explicitly. Each executed action appends one line to runs/actions.jsonl.
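
One way for an agent to respect these rules is to rehearse by default and opt in to real input only when asked. The sketch below is a hypothetical helper (action_args is not a WinCtl function) that composes only flags shown in this README; it deliberately adds --after only to real runs, since combining it with --dry-run is not documented here.

```python
from typing import List, Optional


def action_args(
    command: str,
    *positional: object,
    allow_foreground: bool = False,
    expect_title: Optional[str] = None,
    after: bool = False,
) -> List[str]:
    """Build CLI arguments for a desktop action with safe defaults.

    Without allow_foreground the action is only rehearsed via --dry-run.
    """
    args = [command, *(str(p) for p in positional)]
    if expect_title is not None:
        # Guard the action against the wrong target window.
        args += ["--expect-title", expect_title]
    if allow_foreground:
        args.append("--allow-foreground")
        if after:
            # Request a screenshot after the real action.
            args.append("--after")
    else:
        args.append("--dry-run")
    return args
```

For example, `action_args("click", 100, 100)` yields the arguments of the dry-run click shown above, and passing `allow_foreground=True, after=True` yields the real click.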

Background Mode

WinCtl v0.7 is background-first: it prefers UI Automation and window-scoped observation over global mouse, keyboard, focus, or z-order changes.

python -m winctl observe-window --title "Notepad" --background --capture-backend auto
python -m winctl compat-window --title "Notepad"
python -m winctl inspect-window --title "Notepad"
python -m winctl tokenize --title "Notepad" --background --compact --capture-backend auto
python -m winctl invoke-token --text "OK" --dry-run
python -m winctl invoke-token --text "OK" --after
python -m winctl trace doctor

If a token cannot be controlled through UIA patterns, WinCtl returns requires_foreground: true instead of silently moving the mouse. Real foreground actions such as focus, click, move, type, paste, hotkey, key, scroll, click-token, and click-text require an explicit --allow-foreground.
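
An agent can treat requires_foreground as a decision point rather than retrying blindly. A minimal sketch, assuming the flag appears as a top-level key of the parsed result (the exact result layout is not specified here) and using a hypothetical next_step helper:

```python
def next_step(result: dict, foreground_acceptable: bool) -> str:
    """Decide what to do after a background token action.

    Returns "background" when no foreground input is needed, "foreground"
    when escalation is needed and permitted, and "stop" otherwise.
    """
    if not result.get("requires_foreground", False):
        return "background"
    # Escalating means mouse/keyboard/focus interference, so it must be
    # an explicit choice by the caller, never a silent fallback.
    return "foreground" if foreground_acceptable else "stop"
```

Only the "foreground" outcome should lead to a command that carries --allow-foreground.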

Background token actions with --expect-title or --expect-process guard the token's target window instead of the active foreground window.

--capture-backend auto tries PrintWindow first, then falls back to safe screen-region BitBlt with diagnostics. BitBlt never focuses a window, but it can include occlusion if another window covers the target. Use compat-window to inspect capture and UIA readiness before automating a new app.
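
The "try one backend, fall back to the next, keep diagnostics" pattern can be sketched generically. This is an illustration of the idea only, not WinCtl's implementation: the backends are passed in as plain callables, and the diagnostic keys are made up for the example.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A backend is a (name, capture function) pair; capture returns image bytes
# or None/empty when it could not produce a usable image.
Backend = Tuple[str, Callable[[], Optional[bytes]]]


def capture_with_fallback(
    backends: Sequence[Backend],
) -> Tuple[Optional[bytes], List[dict]]:
    """Try each backend in order and record why earlier ones were skipped."""
    diagnostics: List[dict] = []
    for name, capture in backends:
        try:
            image = capture()
        except OSError as exc:
            diagnostics.append({"backend": name, "ok": False, "error": str(exc)})
            continue
        if image:
            diagnostics.append({"backend": name, "ok": True})
            return image, diagnostics
        diagnostics.append({"backend": name, "ok": False, "error": "empty capture"})
    return None, diagnostics
```

Returning the diagnostics alongside the image lets the caller see that a fallback was used, which matters here because a screen-region capture may include occluding windows.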

Token Workflow

tokenize combines the active window, optional UI Automation data, and optional OCR data into a stable list of tokens. The latest result is saved to runs\tokens.json.

python -m winctl compat-window --title "Notepad"
python -m winctl observe-window --title "Notepad" --background --capture-backend auto
python -m winctl tokenize --title "Notepad" --background --compact --capture-backend auto
python -m winctl resolve-token --text "OK" --match exact
python -m winctl invoke-token --text "OK" --match exact --dry-run
python -m winctl invoke-token --text "OK" --match exact --after
python -m winctl trace doctor
python -m winctl observe

Tokens include id, key, signature, type, text, name, source, rect, center, confidence, clickable, enabled, visible, bounds_valid, score, background_supported, requires_foreground, actions, patterns, and backend. When OCR or UIA dependencies are missing, tokenize still returns active-window tokens and includes diagnostics.
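
An agent can read the saved tokens and narrow them down using the fields listed above. The sketch below is hedged on two points: the layout of runs\tokens.json is assumed to be either a bare list of token objects or an object with a "tokens" list, and a higher score is assumed to be better. Both helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import List, Union


def load_tokens(path: Union[str, Path] = Path("runs") / "tokens.json") -> List[dict]:
    """Load the latest tokenize result (file layout is an assumption)."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        return data.get("tokens", [])
    return data


def background_candidates(tokens: List[dict]) -> List[dict]:
    """Keep tokens that look safe to act on without foreground input."""
    usable = [
        t
        for t in tokens
        if t.get("clickable")
        and t.get("enabled")
        and t.get("visible")
        and t.get("background_supported")
    ]
    # Assumes a higher score means a better candidate.
    return sorted(usable, key=lambda t: t.get("score", 0), reverse=True)
```

Tokens with missing fields are simply filtered out, which keeps the helper usable when OCR or UIA data is unavailable.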

Waiting commands are non-destructive and output JSON:

python -m winctl wait-window "Notepad" --timeout 10
python -m winctl wait-token "OK" --timeout 10
python -m winctl wait-text "Done" --timeout 10
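
Conceptually, a waiting command is a poll loop with a deadline. The following sketch shows that pattern in isolation; wait_until and its result shape are hypothetical and are not WinCtl's actual output.

```python
import time
from typing import Any, Callable


def wait_until(
    probe: Callable[[], Any],
    timeout: float,
    interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call probe() until it returns a truthy value or the timeout elapses.

    The probe only observes state, so the loop itself is non-destructive.
    """
    deadline = clock() + timeout
    while True:
        value = probe()
        if value:
            return {"ok": True, "value": value}
        if clock() >= deadline:
            return {"ok": False, "error": "timeout"}
        sleep(interval)
```

Using a monotonic clock avoids surprises if the system time changes while waiting; clock and sleep are injectable so the loop can be tested without real delays.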

Token Query Workflow

Token queries are the recommended way for agents to operate UI elements without depending on unstable IDs or raw coordinates.

python -m winctl observe
python -m winctl tokenize --compact --active-window-only
python -m winctl tokens --compact --clickable --sort score
python -m winctl resolve-token --text "OK" --match exact --source uia --type button
python -m winctl click-token --text "OK" --match exact --source uia --type button --dry-run
python -m winctl click-token --source uia --type button --text "OK" --match exact --expect-title "Target App" --allow-foreground --after
python -m winctl trace doctor
python -m winctl observe

resolve-token is the recommended no-action preflight for ambiguous UI. click-token returns a JSON error with candidates when a query matches multiple clickable tokens. Add filters or pass --first to explicitly select the first match.

python -m winctl resolve-token --text "OK" --first
python -m winctl click-token --text "OK" --first --dry-run
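
The ambiguity rule can be expressed in a few lines. This is a sketch of the behavior described above, not WinCtl's code; select_token and the error labels "no_match" and "ambiguous" are hypothetical.

```python
from typing import List


def select_token(candidates: List[dict], first: bool = False) -> dict:
    """Pick a token from query matches, refusing to guess when ambiguous."""
    if not candidates:
        return {"ok": False, "error": "no_match", "candidates": []}
    if len(candidates) > 1 and not first:
        # Return the candidates so the caller can add filters and retry.
        return {"ok": False, "error": "ambiguous", "candidates": candidates}
    return {"ok": True, "token": candidates[0]}
```

Failing on multiple matches by default keeps an agent from clicking the wrong "OK" button; `first=True` mirrors the explicit --first opt-in.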

OCR and Text Clicking

python -m winctl ocr
python -m winctl find-text "OK"
python -m winctl click-text "OK" --allow-foreground --after

If OCR dependencies are unavailable, the command fails with an installation hint while the rest of WinCtl continues to work.
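
A caller can check for the optional pieces up front instead of waiting for a failure. The sketch below is an assumption-laden illustration: it only checks for Pillow (import name PIL) and a tesseract executable on PATH, and the helper name and hint wording are made up rather than taken from WinCtl.

```python
import importlib.util
import shutil
from typing import Callable, List


def missing_ocr_dependencies(
    find_module: Callable[[str], object] = importlib.util.find_spec,
    which: Callable[[str], object] = shutil.which,
) -> List[str]:
    """Return human-readable hints for OCR prerequisites that are absent."""
    hints: List[str] = []
    if find_module("PIL") is None:
        hints.append("Pillow is not installed: python -m pip install -e .[ocr]")
    if which("tesseract") is None:
        hints.append("tesseract executable not found on PATH")
    return hints
```

An empty list means both checks passed; it does not guarantee that OCR will succeed on a given screenshot.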

UI Automation Inspect

python -m winctl inspect --max-depth 3

This uses optional pywinauto. Some apps do not expose useful UI Automation trees; WinCtl reports that gracefully.

Trace

Trace commands read runs\actions.jsonl and never perform desktop actions. Action results and logs include trace_id, session_id, guard details, and target-window context.

python -m winctl trace list
python -m winctl trace show --limit 5
python -m winctl trace doctor
python -m winctl trace export --output runs\trace.json
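
Since the log is JSON Lines, it can also be inspected directly. A minimal sketch, assuming each line is one JSON object that carries the trace_id and session_id fields mentioned above; the helper names are hypothetical and other fields are left untouched.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def traces_by_session(records: Iterable[dict]) -> Dict[object, List[object]]:
    """Group trace ids by session id, preserving log order."""
    sessions: Dict[object, List[object]] = defaultdict(list)
    for record in records:
        sessions[record.get("session_id")].append(record.get("trace_id"))
    return dict(sessions)
```

To use it on a real log, pass an open file handle for runs\actions.jsonl to `read_jsonl`. Like the trace commands, this only reads the file.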

MCP Server

Install optional MCP support:

python -m pip install -e .[mcp]

Run the server:

python -m winctl.mcp

If the MCP SDK is missing, WinCtl prints a concise installation hint instead of a stack trace.

Safety Protocol

  • Observe before acting.
  • Prefer observe-window, tokenize --background, resolve-token, and UIA background actions over raw coordinates.
  • Prefer text/control selectors over coordinates when reliable.
  • Verify after each action.
  • Stop before destructive actions, payments, account changes, sending messages, deleting files, installing software, auth prompts, password entry, or 2FA.
  • Use --dry-run for risky commands; pass --allow-foreground only when mouse/keyboard/focus interference is acceptable.
  • Use --expect-title or --expect-process before typing, pasting, clicking, or hotkeys in a specific app.
  • Review the trace_id and guard audit in action results and runs/actions.jsonl, and check trace doctor output.
  • Use a dedicated browser profile and low-privilege account for autonomous workflows.
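
The idea behind a title guard from the protocol above can be sketched as a pure check that runs before any input is sent. The matching rule here (case-insensitive substring) and the shape of the returned record are assumptions for illustration, not WinCtl's documented behavior.

```python
from typing import Optional


def title_guard(active_title: Optional[str], expected: str) -> dict:
    """Check that the target window title contains the expected text."""
    actual = active_title or ""
    passed = expected.casefold() in actual.casefold()
    return {
        "guard": "expect-title",
        "expected": expected,
        "actual": active_title,
        "passed": passed,
    }
```

An agent would act only when `passed` is true and keep the whole record with the action result so the decision can be reviewed later.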

Why This Exists

Most browser automation tools stop at the web. Most GUI agents are full autonomous systems. WinCtl sits in the middle: a minimal, inspectable Windows control layer that an agent can call repeatedly while keeping planning and safety in the agent itself.

Notepad Smoke Test

cd E:\javaProject\winctl
python -m winctl launch notepad --wait-title "Notepad" --focus
python -m winctl paste "WinCtl smoke test" --expect-title "Notepad" --allow-foreground
python -m winctl observe

Then inspect runs\latest.bmp.

Troubleshooting

  • If screenshots are blank, check Windows privacy/screen capture restrictions and try running from a normal interactive desktop session.
  • If clicks land in the wrong place, check display scaling and multi-monitor coordinates with python -m winctl screen.
  • If typing is unreliable in a target app, use paste instead of type.
  • If focus does not bring a window forward, click the window once manually or run WinCtl from the active desktop session.
  • If OCR fails, install Tesseract and ensure tesseract.exe is on PATH.

Project Layout

src/winctl/       CLI, Windows API wrappers, input, screen, OCR, safety
tests/            Parser and dry-run tests
examples/         Smoke-test workflows
docs/             Token schema and agent policy
runs/             Local screenshots and action logs (ignored)
