WinCtl

WinCtl is a Windows desktop control CLI for AI agents.

It gives terminal-based agents a small, auditable tool layer for observing and controlling a Windows desktop: screenshots, active windows, mouse, keyboard, clipboard, launch/focus, optional OCR, and optional UI Automation inspection.

WinCtl is not an autonomous agent. It does not plan tasks. It only exposes desktop control primitives through JSON-first CLI commands.

Features

observe, screenshot, screen
active-window, windows, focus, launch
click, move, type, paste, hotkey, key, scroll, wait
ocr, find-text, click-text with optional Tesseract/Pillow dependencies
inspect with optional pywinauto
tokenize, tokens, resolve-token, click-token, wait-window, wait-text, wait-token
Stable token key, signature, and score fields, deduplication, query match modes, and background capability fields
Background-first commands: observe-window, inspect-window, compat-window, invoke-token, set-token-text, select-token, scroll-token
Token queries through tokens, resolve-token, and click-token, filtered with --text, --source, and --type
Trace commands for reviewing, exporting, and diagnosing action logs (trace doctor)
Optional MCP server for agent integrations with safe dry-run defaults
JSON output and action logging to runs/actions.jsonl
--dry-run, action delay, and optional --after screenshots

Install

From PowerShell:

cd E:\javaProject\winctl
python -m pip install -e .

The package also works without installation from the project directory:

cd E:\javaProject\winctl
python -m winctl observe

Optional OCR support:

python -m pip install -e .[ocr]

OCR also requires the Tesseract executable on PATH.

Optional UI Automation inspection:

python -m pip install -e .[inspect]

Quick Start

cd E:\javaProject\winctl
python -m winctl observe
python -m winctl active-window
python -m winctl launch notepad --wait-title Notepad --focus
python -m winctl tokenize --compact --active-window-only
python -m winctl resolve-token --text "Untitled"
python -m winctl click-token 1 --dry-run --expect-title Notepad
python -m winctl paste "hello from agent" --expect-title Notepad --allow-foreground --after
python -m winctl observe

Core Commands

python -m winctl observe
python -m winctl screenshot runs\manual.bmp
python -m winctl screen
python -m winctl active-window
python -m winctl windows
python -m winctl focus --title "Notepad"
python -m winctl launch notepad --wait-title "Notepad" --focus

Desktop Actions

python -m winctl click 100 100 --dry-run
python -m winctl click 100 100 --allow-foreground --after
python -m winctl move 800 500 --dry-run
python -m winctl type "hello" --dry-run
python -m winctl paste "safe text" --expect-title "Notepad" --dry-run
python -m winctl hotkey ctrl l --dry-run
python -m winctl key enter --dry-run
python -m winctl scroll -5 --dry-run

Every action outputs JSON. Real foreground actions are blocked unless --allow-foreground is explicit. Executed actions append a line to runs/actions.jsonl.
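Because every action prints JSON, an agent can wrap the CLI in a thin helper. The following is a minimal sketch, not part of WinCtl: the names build_command and run_winctl are hypothetical, and it assumes each command prints a single JSON document on stdout. It defaults to --dry-run so that a real action has to be requested explicitly.

```python
import json
import subprocess
import sys
from typing import Any, Callable, Sequence


def build_command(action: str, *args: str, dry_run: bool = True,
                  allow_foreground: bool = False) -> list:
    """Build an argv list for `python -m winctl <action> ...` (hypothetical helper)."""
    cmd = [sys.executable, "-m", "winctl", action, *args]
    if dry_run:
        cmd.append("--dry-run")
    if allow_foreground:
        cmd.append("--allow-foreground")
    return cmd


def run_winctl(cmd: Sequence[str],
               runner: Callable[..., Any] = subprocess.run) -> dict:
    """Run the command and parse stdout as JSON.

    Assumption: the command writes exactly one JSON document to stdout.
    The runner is injectable so the helper can be exercised without a desktop.
    """
    proc = runner(list(cmd), capture_output=True, text=True)
    return json.loads(proc.stdout)
```

For example, run_winctl(build_command("click", "100", "100")) would issue the dry-run click shown above and return its parsed result.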

Background Mode

WinCtl v0.7 is background-first: it prefers UI Automation and window-scoped observation over global mouse, keyboard, focus, or z-order changes.

python -m winctl observe-window --title "Notepad" --background --capture-backend auto
python -m winctl compat-window --title "Notepad"
python -m winctl inspect-window --title "Notepad"
python -m winctl tokenize --title "Notepad" --background --compact --capture-backend auto
python -m winctl invoke-token --text "OK" --dry-run
python -m winctl invoke-token --text "OK" --after
python -m winctl trace doctor

If a token cannot be controlled through UIA patterns, WinCtl returns requires_foreground: true instead of silently moving the mouse. Real foreground actions such as focus, click, move, type, paste, hotkey, key, scroll, click-token, and click-text require explicit --allow-foreground.
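An agent can apply the same rule on its side before choosing a command. This is a rough sketch under stated assumptions: plan_token_action is a hypothetical name, and the token is assumed to be a dict carrying the background_supported and requires_foreground fields that tokenize reports.

```python
def plan_token_action(token: dict, allow_foreground: bool = False) -> str:
    """Pick a command name for a token record (hypothetical helper).

    Returns "invoke-token" when the token can be driven in the background,
    "click-token" when foreground input is needed and has been allowed,
    and "blocked" otherwise.
    """
    if token.get("background_supported") and not token.get("requires_foreground"):
        return "invoke-token"
    if allow_foreground:
        return "click-token"
    return "blocked"
```

Returning "blocked" rather than falling through to a click keeps the decision to interfere with the user's mouse and keyboard explicit.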

Background token actions with --expect-title or --expect-process guard the token's target window instead of the active foreground window.

--capture-backend auto tries PrintWindow first, then falls back to safe screen-region BitBlt with diagnostics. BitBlt never focuses a window, but the capture can include whatever overlaps the target if another window covers it. Use compat-window to inspect capture and UIA readiness before automating a new app.

Token Workflow

tokenize combines the active window, optional UI Automation data, and optional OCR data into a stable list of tokens. The latest result is saved to runs\tokens.json.

python -m winctl compat-window --title "Notepad"
python -m winctl observe-window --title "Notepad" --background --capture-backend auto
python -m winctl tokenize --title "Notepad" --background --compact --capture-backend auto
python -m winctl resolve-token --text "OK" --match exact
python -m winctl invoke-token --text "OK" --match exact --dry-run
python -m winctl invoke-token --text "OK" --match exact --after
python -m winctl trace doctor
python -m winctl observe

Tokens include id, key, signature, type, text, name, source, rect, center, confidence, clickable, enabled, visible, bounds_valid, score, background_supported, requires_foreground, actions, patterns, and backend. When OCR or UIA dependencies are missing, tokenize still returns active-window tokens and includes diagnostics.
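The fields above are enough for an agent to shortlist tokens on its own. The sketch below is illustrative only: usable_tokens is a hypothetical helper, and it assumes the tokens have already been loaded into a list of dicts (the exact layout of runs\tokens.json is not described here, so loading is left out).

```python
from typing import List, Optional


def usable_tokens(tokens: List[dict], text: Optional[str] = None) -> List[dict]:
    """Keep tokens that look safe to act on, best score first (hypothetical helper)."""
    keep = [
        t for t in tokens
        if t.get("clickable") and t.get("enabled")
        and t.get("visible") and t.get("bounds_valid")
    ]
    if text is not None:
        # Case-insensitive substring match on the token text.
        needle = text.lower()
        keep = [t for t in keep if needle in (t.get("text") or "").lower()]
    # Tokens without a score sort last.
    return sorted(keep, key=lambda t: t.get("score", 0), reverse=True)
```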

Waiting commands are non-destructive and output JSON:

python -m winctl wait-window "Notepad" --timeout 10
python -m winctl wait-token "OK" --timeout 10
python -m winctl wait-text "Done" --timeout 10

Token Query Workflow

Token queries are the recommended way for agents to operate UI elements without depending on unstable IDs or raw coordinates.

python -m winctl observe
python -m winctl tokenize --compact --active-window-only
python -m winctl tokens --compact --clickable --sort score
python -m winctl resolve-token --text "OK" --match exact --source uia --type button
python -m winctl click-token --text "OK" --match exact --source uia --type button --dry-run
python -m winctl click-token --source uia --type button --text "OK" --match exact --expect-title "Target App" --allow-foreground --after
python -m winctl trace doctor
python -m winctl observe

resolve-token is the recommended no-action preflight for ambiguous UI. click-token returns a JSON error with candidates when a query matches multiple clickable tokens. Add filters or pass --first to explicitly select the first match.

python -m winctl resolve-token --text "OK" --first
python -m winctl click-token --text "OK" --first --dry-run
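When a query is ambiguous, an agent may prefer to narrow the candidates itself rather than pass --first. The following sketch assumes the candidates are available as a list of dicts with source and type fields; the exact shape of the JSON error is not specified here, and narrow_candidates and require_unique are hypothetical names.

```python
from typing import List, Optional


def narrow_candidates(candidates: List[dict], source: Optional[str] = None,
                      type_: Optional[str] = None) -> List[dict]:
    """Filter candidate tokens by source and type, mirroring --source / --type."""
    out = []
    for c in candidates:
        if source is not None and c.get("source") != source:
            continue
        if type_ is not None and c.get("type") != type_:
            continue
        out.append(c)
    return out


def require_unique(candidates: List[dict], source: Optional[str] = None,
                   type_: Optional[str] = None) -> dict:
    """Return the single remaining candidate, or raise if zero or several remain."""
    matches = narrow_candidates(candidates, source=source, type_=type_)
    if len(matches) != 1:
        raise LookupError(f"expected exactly one match, got {len(matches)}")
    return matches[0]
```

Raising on anything other than one match keeps the agent from clicking an arbitrary element when the filters are still too loose.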

OCR and Text Clicking

python -m winctl ocr
python -m winctl find-text "OK"
python -m winctl click-text "OK" --allow-foreground --after

If OCR dependencies are unavailable, the command fails with an installation hint while the rest of WinCtl continues to work.

UI Automation Inspect

python -m winctl inspect --max-depth 3

This uses optional pywinauto. Some apps do not expose useful UI Automation trees; WinCtl reports that gracefully.

Trace

Trace commands read runs\actions.jsonl and never perform desktop actions. Action results and logs include trace_id, session_id, guard details, and target-window context.

python -m winctl trace list
python -m winctl trace show --limit 5
python -m winctl trace doctor
python -m winctl trace export --output runs\trace.json
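The log can also be read directly. As a minimal sketch, assuming each line of runs\actions.jsonl is one JSON object carrying the trace_id and session_id fields mentioned above, the hypothetical helpers below group trace IDs by session:

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def load_actions(lines: Iterable[str]) -> List[dict]:
    """Parse JSONL text lines into records, skipping blank lines."""
    records = []
    for line in lines:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


def group_by_session(records: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each session_id to its trace_ids, in log order."""
    sessions = defaultdict(list)
    for record in records:
        sessions[record.get("session_id", "unknown")].append(record.get("trace_id"))
    return dict(sessions)
```

With a real log, load_actions can be given an open file object, for example the result of open(r"runs\actions.jsonl", encoding="utf-8").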

MCP Server

Install optional MCP support:

python -m pip install -e .[mcp]

Run the server:

python -m winctl.mcp

If the MCP SDK is missing, WinCtl prints a concise installation hint instead of a stack trace.

Safety Protocol

Observe before acting.
Prefer observe-window, tokenize --background, resolve-token, and UIA background actions over raw coordinates.
Prefer text/control selectors over coordinates when reliable.
Verify after each action.
Stop before destructive actions, payments, account changes, sending messages, deleting files, installing software, auth prompts, password entry, or 2FA.
Use --dry-run for risky commands; pass --allow-foreground only when mouse/keyboard/focus interference is acceptable.
Use --expect-title or --expect-process before typing, pasting, clicking, or hotkeys in a specific app.
Review trace_id and guard audit details in action results and runs/actions.jsonl, and run trace doctor to check the log.
Use a dedicated browser profile and low-privilege account for autonomous workflows.
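The observe, dry-run, act, verify order above can be written down as a fixed command plan. This is only a sketch of one way an agent might sequence the calls: guarded_plan is a hypothetical helper, it only builds argument lists and does not run anything, and it assumes the chosen action accepts the --expect-title, --dry-run, --allow-foreground, and --after flags together.

```python
from typing import List, Sequence


def guarded_plan(action: str, args: Sequence[str], expect_title: str) -> List[List[str]]:
    """Build an observe / dry-run / act / observe sequence (hypothetical helper)."""
    base = ["python", "-m", "winctl"]
    guarded = base + [action, *args, "--expect-title", expect_title]
    return [
        base + ["observe"],                           # observe before acting
        guarded + ["--dry-run"],                      # rehearse with the title guard
        guarded + ["--allow-foreground", "--after"],  # real action plus screenshot
        base + ["observe"],                           # verify after the action
    ]
```

An agent would inspect the JSON from each step and stop at the first failure rather than running the whole plan unconditionally.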

Why This Exists

Most browser automation tools stop at the web. Most GUI agents are full autonomous systems. WinCtl sits in the middle: a minimal, inspectable Windows control layer that an agent can call repeatedly while keeping planning and safety in the agent itself.

Notepad Smoke Test

cd E:\javaProject\winctl
python -m winctl launch notepad --wait-title "Notepad" --focus
python -m winctl paste "WinCtl smoke test" --expect-title "Notepad" --allow-foreground
python -m winctl observe

Then inspect runs\latest.bmp.

Troubleshooting

If screenshots are blank, check Windows privacy/screen capture restrictions and try running from a normal interactive desktop session.
If clicks land in the wrong place, check display scaling and multi-monitor coordinates with python -m winctl screen.
If typing is unreliable in a target app, use paste instead of type.
If focus does not bring a window forward, click the window once manually or run WinCtl from the active desktop session.
If OCR fails, install Tesseract and ensure tesseract.exe is on PATH.

Project Layout

src/winctl/       CLI, Windows API wrappers, input, screen, OCR, safety
tests/            Parser and dry-run tests
examples/         Smoke-test workflows
docs/             Token schema and agent policy
runs/             Local screenshots and action logs (ignored)
