WinCtl is a Windows desktop control CLI for AI agents.
It gives terminal-based agents a small, auditable tool layer for observing and controlling a Windows desktop: screenshots, active windows, mouse, keyboard, clipboard, launch/focus, optional OCR, and optional UI Automation inspection.
WinCtl is not an autonomous agent. It does not plan tasks. It only exposes desktop control primitives through JSON-first CLI commands.
- observe, screenshot, screen
- active-window, windows, focus, launch
- click, move, type, paste, hotkey, key, scroll, wait
- ocr, find-text, click-text with optional Tesseract/Pillow dependencies
- inspect with optional pywinauto
- tokenize, tokens, resolve-token, click-token, wait-window, wait-text, wait-token
- Stable token key, signature, score, dedupe, query match modes, and background capability fields
- Background-first commands: observe-window, inspect-window, compat-window, invoke-token, set-token-text, select-token, scroll-token
- Token queries with tokens/resolve-token/click-token --text/--source/--type
- Trace commands for reviewing, exporting, and doctoring action logs
- Optional MCP server for agent integrations with safe dry-run defaults
- JSON output and action logging to runs/actions.jsonl
- --dry-run, action delay, and optional --after screenshots
From PowerShell:
cd E:\javaProject\winctl
python -m pip install -e .
The package also works without installation from the project directory:
cd E:\javaProject\winctl
python -m winctl observe
Optional OCR support:
python -m pip install -e .[ocr]
OCR also requires the Tesseract executable on PATH.
Optional UI Automation inspection:
python -m pip install -e .[inspect]
cd E:\javaProject\winctl
python -m winctl observe
python -m winctl active-window
python -m winctl launch notepad --wait-title Notepad --focus
python -m winctl tokenize --compact --active-window-only
python -m winctl resolve-token --text "Untitled"
python -m winctl click-token 1 --dry-run --expect-title Notepad
python -m winctl paste "hello from agent" --expect-title Notepad --allow-foreground --after
python -m winctl observe
python -m winctl observe
python -m winctl screenshot runs\manual.bmp
python -m winctl screen
python -m winctl active-window
python -m winctl windows
python -m winctl focus --title "Notepad"
python -m winctl launch notepad --wait-title "Notepad" --focus
python -m winctl click 100 100 --dry-run
python -m winctl click 100 100 --allow-foreground --after
python -m winctl move 800 500 --dry-run
python -m winctl type "hello" --dry-run
python -m winctl paste "safe text" --expect-title "Notepad" --dry-run
python -m winctl hotkey ctrl l --dry-run
python -m winctl key enter --dry-run
python -m winctl scroll -5 --dry-run
Every action outputs JSON. Real foreground actions are blocked unless --allow-foreground is explicit. Executed actions append a line to runs/actions.jsonl.
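The flag rules above (dry-run for previews, explicit --allow-foreground for real input, optional --after screenshots) can be sketched as a small argv builder for an agent. This is a minimal illustration, not part of WinCtl: the helper names build_action_argv and parse_result are hypothetical, and the shape of the JSON that WinCtl prints is deliberately not assumed.

```python
import json
import sys


def build_action_argv(command, *params, execute=False,
                      allow_foreground=False, after=False):
    """Build an argv list for one WinCtl action (hypothetical helper).

    The action is a dry run unless execute=True is passed, so a caller
    has to opt in to real desktop input.
    """
    argv = [sys.executable, "-m", "winctl", command, *map(str, params)]
    if not execute:
        # Preview only: nothing is sent to the desktop.
        argv.append("--dry-run")
    else:
        if allow_foreground:
            # Needed for real mouse/keyboard/focus actions per the text above.
            argv.append("--allow-foreground")
        if after:
            # Ask for a screenshot after the action.
            argv.append("--after")
    return argv


def parse_result(stdout_text):
    """Parse the JSON a command printed; no particular schema is assumed."""
    return json.loads(stdout_text)
```

The resulting list can be passed to subprocess.run(argv, capture_output=True, text=True) and the captured stdout handed to parse_result.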
WinCtl v0.7 is background-first: it prefers UI Automation and window-scoped observation over global mouse, keyboard, focus, or z-order changes.
python -m winctl observe-window --title "Notepad" --background --capture-backend auto
python -m winctl compat-window --title "Notepad"
python -m winctl inspect-window --title "Notepad"
python -m winctl tokenize --title "Notepad" --background --compact --capture-backend auto
python -m winctl invoke-token --text "OK" --dry-run
python -m winctl invoke-token --text "OK" --after
python -m winctl trace doctor
If a token cannot be controlled through UIA patterns, WinCtl returns requires_foreground: true instead of secretly moving the mouse. Real foreground actions such as focus, click, move, type, paste, hotkey, key, scroll, click-token, and click-text require explicit --allow-foreground.
Background token actions with --expect-title or --expect-process guard the token's target window instead of the active foreground window.
--capture-backend auto tries PrintWindow first, then falls back to safe screen-region BitBlt with diagnostics. BitBlt never focuses a window, but it can include occlusion if another window covers the target. Use compat-window to inspect capture and UIA readiness before automating a new app.
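On the agent side, the background-first rule can be sketched as a decision over a single token record. The field names requires_foreground, background_supported, enabled, and visible come from this README; how WinCtl actually populates them, and the conservative defaults used for missing fields, are assumptions of this sketch. The function plan_token_action and its return labels are hypothetical.

```python
def plan_token_action(token):
    """Decide how an agent might act on one token dict (illustrative only).

    Returns one of:
      "skip"              - token is disabled or not visible
      "ask-foreground"    - would need explicit --allow-foreground approval
      "invoke-background" - looks safe for a background action such as invoke-token
    """
    # Assumption: a missing enabled/visible field counts as true.
    if not token.get("enabled", True) or not token.get("visible", True):
        return "skip"
    # Assumption: a missing background_supported field counts as unsupported,
    # so incomplete data never leads to a silent action.
    if token.get("requires_foreground", False):
        return "ask-foreground"
    if not token.get("background_supported", False):
        return "ask-foreground"
    return "invoke-background"
```

Treating missing capability fields as "not supported" keeps the sketch aligned with the stated policy of never taking over the mouse or keyboard implicitly.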
tokenize combines the active window, optional UI Automation data, and optional OCR data into a stable list of tokens. The latest result is saved to runs\tokens.json.
python -m winctl compat-window --title "Notepad"
python -m winctl observe-window --title "Notepad" --background --capture-backend auto
python -m winctl tokenize --title "Notepad" --background --compact --capture-backend auto
python -m winctl resolve-token --text "OK" --match exact
python -m winctl invoke-token --text "OK" --match exact --dry-run
python -m winctl invoke-token --text "OK" --match exact --after
python -m winctl trace doctor
python -m winctl observe
Tokens include id, key, signature, type, text, name, source, rect, center, confidence, clickable, enabled, visible, bounds_valid, score, background_supported, requires_foreground, actions, patterns, and backend. When OCR or UIA dependencies are missing, tokenize still returns active-window tokens and includes diagnostics.
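As an illustration of how an agent might consume these fields, the sketch below filters a token list down to usable click targets and ranks them. The exact layout of runs\tokens.json is not documented here, so load_tokens accepts either a bare list or an object with a "tokens" key; both layouts are assumptions, as is the idea that a higher score is better.

```python
import json


def load_tokens(text):
    """Parse token JSON text into a list of dicts (file layout is assumed)."""
    data = json.loads(text)
    if isinstance(data, dict):
        # Hypothetical wrapper object: {"tokens": [...]}
        data = data.get("tokens", [])
    if not isinstance(data, list):
        return []
    return [t for t in data if isinstance(t, dict)]


def usable_tokens(tokens):
    """Keep tokens that are clickable, enabled, visible, and have valid bounds,
    ordered by score from highest to lowest (ordering is an assumption)."""
    required = ("clickable", "enabled", "visible", "bounds_valid")
    keep = [t for t in tokens if all(t.get(field) for field in required)]
    return sorted(keep, key=lambda t: t.get("score", 0), reverse=True)
```

For example, usable_tokens(load_tokens(open(r"runs\tokens.json", encoding="utf-8").read())) would give a ranked candidate list, assuming the file matches one of the two layouts above.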
Waiting commands are non-destructive and output JSON:
python -m winctl wait-window "Notepad" --timeout 10
python -m winctl wait-token "OK" --timeout 10
python -m winctl wait-text "Done" --timeout 10
Token queries are the recommended way for agents to operate UI elements without depending on unstable IDs or raw coordinates.
python -m winctl observe
python -m winctl tokenize --compact --active-window-only
python -m winctl tokens --compact --clickable --sort score
python -m winctl resolve-token --text "OK" --match exact --source uia --type button
python -m winctl click-token --text "OK" --match exact --source uia --type button --dry-run
python -m winctl click-token --source uia --type button --text "OK" --match exact --expect-title "Target App" --allow-foreground --after
python -m winctl trace doctor
python -m winctl observe
resolve-token is the recommended no-action preflight for ambiguous UI. click-token returns a JSON error with candidates when a query matches multiple clickable tokens. Add filters or pass --first to explicitly select the first match.
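The ambiguity rule described above can be mirrored locally when an agent already holds a token list. The sketch below is not WinCtl's resolver: the names resolve and AmbiguousMatch are hypothetical, only the "exact" mode is taken from this README, the "contains" mode is an illustrative assumption, and matching is assumed to be case-sensitive against text with name as a fallback.

```python
class AmbiguousMatch(Exception):
    """Raised when several tokens match and the caller did not ask for the first."""

    def __init__(self, candidates):
        super().__init__(f"{len(candidates)} tokens matched")
        self.candidates = candidates


def resolve(tokens, text, match="exact", source=None, type_=None, first=False):
    """Return the single matching token, None if nothing matches,
    or raise AmbiguousMatch carrying every candidate."""
    if match not in ("exact", "contains"):
        raise ValueError(f"unsupported match mode: {match}")

    def text_ok(token):
        value = str(token.get("text") or token.get("name") or "")
        return value == text if match == "exact" else text in value

    hits = [
        t for t in tokens
        if text_ok(t)
        and (source is None or t.get("source") == source)
        and (type_ is None or t.get("type") == type_)
    ]
    if not hits:
        return None
    if len(hits) > 1 and not first:
        # Mirror the documented behavior: report candidates instead of guessing.
        raise AmbiguousMatch(hits)
    return hits[0]
```

Narrowing with source and type_ first, and reaching for first=True only as a deliberate choice, follows the same order of preference as the filters and --first flag.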
python -m winctl resolve-token --text "OK" --first
python -m winctl click-token --text "OK" --first --dry-run
python -m winctl ocr
python -m winctl find-text "OK"
python -m winctl click-text "OK" --allow-foreground --after
If OCR dependencies are unavailable, the command fails with an installation hint while the rest of WinCtl continues to work.
python -m winctl inspect --max-depth 3
This uses optional pywinauto. Some apps do not expose useful UI Automation trees; WinCtl reports that gracefully.
Trace commands read runs\actions.jsonl and never perform desktop actions. Action results and logs include trace_id, session_id, guard details, and target-window context.
python -m winctl trace list
python -m winctl trace show --limit 5
python -m winctl trace doctor
python -m winctl trace export --output runs\trace.json
Install optional MCP support:
python -m pip install -e .[mcp]
Run the server:
python -m winctl.mcp
If the MCP SDK is missing, WinCtl prints a concise installation hint instead of a stack trace.
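The "hint instead of a stack trace" behavior is a common pattern for optional dependencies, and it can be sketched generically as below. This is not WinCtl's actual code: load_optional is a hypothetical helper, and the mapping from a module name to an install extra is supplied by the caller rather than taken from the project.

```python
import importlib


def load_optional(module_name, extra):
    """Try to import an optional module.

    Returns (module, None) on success, or (None, hint) when the module is
    missing, so the caller can print a short message instead of a traceback.
    """
    try:
        return importlib.import_module(module_name), None
    except ImportError:
        hint = (
            f"Optional dependency '{module_name}' is missing. "
            f"Install it with: python -m pip install -e .[{extra}]"
        )
        return None, hint
```

A caller would print the hint and exit with a non-zero status when the first element is None, leaving the rest of the tool usable.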
- Observe before acting.
- Prefer observe-window, tokenize --background, resolve-token, and UIA background actions over raw coordinates.
- Prefer text/control selectors over coordinates when reliable.
- Verify after each action.
- Stop before destructive actions, payments, account changes, sending messages, deleting files, installing software, auth prompts, password entry, or 2FA.
- Use --dry-run for risky commands; pass --allow-foreground only when mouse/keyboard/focus interference is acceptable.
- Use --expect-title or --expect-process before typing, pasting, clicking, or hotkeys in a specific app.
- Review trace_id, guard audit, and trace doctor output in action results and runs/actions.jsonl.
- Use a dedicated browser profile and low-privilege account for autonomous workflows.
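The review step in the list above can also be scripted directly against the action log. The sketch below reads JSON Lines text and groups trace IDs by session. Only the trace_id and session_id field names come from this README; the rest of each record is left untouched, and the helper names read_actions and group_by_session are hypothetical.

```python
import json
from collections import defaultdict


def read_actions(jsonl_text):
    """Parse JSON Lines text into a list of dict records.

    Blank lines and lines that are not valid JSON objects are skipped
    rather than raising, since a log may end with a partial write.
    """
    records = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict):
            records.append(record)
    return records


def group_by_session(records):
    """Map each session_id to the trace_id values logged under it, in order."""
    groups = defaultdict(list)
    for record in records:
        groups[record.get("session_id", "unknown")].append(record.get("trace_id"))
    return dict(groups)
```

Reading the file with open("runs/actions.jsonl", encoding="utf-8").read() and passing the text to read_actions gives an agent a quick way to confirm that every action it issued left a trace, alongside the built-in trace commands.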
Most browser automation tools stop at the web. Most GUI agents are full autonomous systems. WinCtl sits in the middle: a minimal, inspectable Windows control layer that an agent can call repeatedly while keeping planning and safety in the agent itself.
cd E:\javaProject\winctl
python -m winctl launch notepad --wait-title "Notepad" --focus
python -m winctl paste "WinCtl smoke test" --expect-title "Notepad" --allow-foreground
python -m winctl observe
Then inspect runs\latest.bmp.
- If screenshots are blank, check Windows privacy/screen capture restrictions and try running from a normal interactive desktop session.
- If clicks land in the wrong place, check display scaling and multi-monitor coordinates with python -m winctl screen.
- If typing is unreliable in a target app, use paste instead of type.
- If focus does not bring a window forward, click the window once manually or run WinCtl from the active desktop session.
- If OCR fails, install Tesseract and ensure tesseract.exe is on PATH.
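The last item can be checked from Python before running any OCR command. This is a small sketch using only the standard library; find_tesseract and describe_tesseract are hypothetical helper names, and the check says nothing about whether the optional Python OCR packages are installed.

```python
import shutil


def find_tesseract():
    """Return the full path of the Tesseract executable found on PATH, or None.

    On Windows, shutil.which consults PATHEXT, so "tesseract" also matches
    tesseract.exe.
    """
    return shutil.which("tesseract")


def describe_tesseract():
    """Produce a one-line status message suitable for a troubleshooting log."""
    path = find_tesseract()
    if path is None:
        return "tesseract not found on PATH"
    return f"tesseract found at {path}"
```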
src/winctl/ CLI, Windows API wrappers, input, screen, OCR, safety
tests/ Parser and dry-run tests
examples/ Smoke-test workflows
docs/ Token schema and agent policy
runs/ Local screenshots and action logs (ignored)