feat: add agent-friendly Modly CLI #144

Open
jokrasno wants to merge 1 commit into lightningpixel:dev from jokrasno:cli-tool-dev-pr

@jokrasno

Adds a simple CLI tool so AI agents can interact with Modly more easily. Also includes a SKILL.md file.

@DrHepa
Contributor

DrHepa commented May 22, 2026

I think the idea is useful: Modly does need an agent-friendly, JSON-first way to automate common operations without driving the UI manually.

After reviewing this against the current dev branch and the existing external MCP/CLI direction, my main concern is not the usefulness of the tool, but the contract it creates.

Right now this PR adds a new Python CLI surface that talks mostly to existing REST endpoints such as:

  • /health
  • /model/*
  • /generate/from-image
  • /generate/status/{job_id}
  • /generate/cancel/{job_id}
  • /export/{format}

That is practical, but it also risks becoming a third parallel automation path next to:

  1. the in-app Agent Chat (/agent/chat + Ollama tool-calling),
  2. the existing MCP surfaces / modly-cli-mcp integration shown in Settings,
  3. the newer workflow-runs / process/capability direction.

I would suggest keeping the useful CLI ergonomics, but aligning the contract before treating this as an official automation interface.

Concretely, I think the CLI should be shaped around canonical primitives:

health
model
workflow-run
process-run
capability

And keep the current /generate/* flow as legacy compatibility rather than the primary path.

1. Prefer workflow runs for generation

Instead of making generate primarily wrap:

POST /generate/from-image
GET /generate/status/{job_id}

prefer:

POST /workflow-runs/from-image
GET /workflow-runs/{run_id}
POST /workflow-runs/{run_id}/cancel

The JSON output should include recovery metadata so agents can resume/poll safely:

{
  "ok": true,
  "run": {
    "kind": "workflowRun",
    "id": "..."
  },
  "meta": {
    "status_command": "workflow-run status ...",
    "legacy": false
  }
}

generate can still exist as a friendly alias, but it should be clearly documented as canonical when it goes through workflow-run, or as legacy when it falls back to /generate/*.
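To make the envelope shape concrete, here is a minimal Python sketch of a helper that builds it. The "workflowRun" kind and the "workflow-run status" command come from the example above; the function name, the "legacyJob" kind, and the "generate status" command are assumptions for illustration only, not part of any existing Modly contract.

```python
def run_envelope(run_id: str, legacy: bool = False) -> dict:
    """Build the JSON envelope for a started run, including recovery metadata."""
    # "legacyJob" and "generate status" are hypothetical names for the fallback path.
    kind = "legacyJob" if legacy else "workflowRun"
    command = "generate status" if legacy else "workflow-run status"
    return {
        "ok": True,
        "run": {"kind": kind, "id": run_id},
        "meta": {
            # An agent can re-issue this command to resume polling after a restart.
            "status_command": f"{command} {run_id}",
            "legacy": legacy,
        },
    }
```

With something like this, the generate alias could emit the same envelope on both paths and only flip the legacy flag when it falls back to /generate/*.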

2. Use process/capability discovery instead of heuristics

The current approach appears to infer some behavior from model names / labels, especially around texture/refine-style flows.

For agents, that is fragile. The CLI should avoid guessing hidden capabilities from strings and instead use explicit discovery when available, e.g.:

GET /automation/capabilities
POST /process-runs
GET /process-runs/{run_id}

If a process is not available through a canonical contract, it should fail closed:

{
  "ok": false,
  "code": "UNSUPPORTED_PROCESS",
  "message": "This process is not available through the canonical process-run contract."
}
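As a rough sketch of what failing closed could look like in the Python CLI: the check below only accepts a process that is explicitly advertised. The response shape of GET /automation/capabilities is an assumption here (a "processes" list of objects with an "id" field), as is the function name.

```python
def resolve_process(process_id: str, capabilities: dict) -> dict:
    """Accept a process only if the capabilities payload explicitly advertises it."""
    # Assumed payload shape: {"processes": [{"id": "..."}, ...]}
    advertised = {p.get("id") for p in capabilities.get("processes", [])}
    if process_id not in advertised:
        # Fail closed: no guessing from model names or labels.
        return {
            "ok": False,
            "code": "UNSUPPORTED_PROCESS",
            "message": (
                "This process is not available through the "
                "canonical process-run contract."
            ),
        }
    return {"ok": True, "process": {"id": process_id}}
```

A missing or empty capabilities payload also ends in UNSUPPORTED_PROCESS, so an older backend without discovery is rejected rather than guessed at.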

3. Validate model IDs canonically

Any model_id should be validated against:

GET /model/all

If --model auto remains, it should use verified defaults or capability metadata, not string matching or first-available-model behavior.
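A minimal sketch of that validation, assuming GET /model/all returns a list of objects with an "id" field (I have not confirmed the actual response shape). The function name and the UNKNOWN_MODEL code are hypothetical.

```python
def validate_model_id(model_id: str, models: list[dict]) -> dict:
    """Check model_id against the list returned by GET /model/all (exact match only)."""
    # Assumed entry shape: {"id": "..."}; entries without an id are ignored.
    known = {m["id"] for m in models if "id" in m}
    if model_id in known:
        return {"ok": True, "model_id": model_id}
    return {
        "ok": False,
        "code": "UNKNOWN_MODEL",  # hypothetical error code
        "message": f"Model '{model_id}' is not listed by /model/all.",
    }
```

The comparison is deliberately exact: a prefix or substring of a real ID is rejected instead of being resolved to the nearest match.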

4. Clarify readiness boundaries

GET /health should happen before business operations.
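One way this could be enforced in the CLI is a small gate that every business command goes through. This is only a sketch: the helper names, the base URL parameter, and the BACKEND_UNAVAILABLE code are assumptions, and only the /health path is taken from the existing API.

```python
import json
import urllib.request
from typing import Callable


def fetch_health(base_url: str, timeout: float = 2.0) -> dict:
    """GET {base_url}/health and return the parsed JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)


def run_if_healthy(check: Callable[[], dict], operation: Callable[[], dict]) -> dict:
    """Run `operation` only after the health check succeeds."""
    try:
        check()
    except (OSError, ValueError) as exc:
        # URLError and connection failures are OSError; invalid JSON is ValueError.
        return {"ok": False, "code": "BACKEND_UNAVAILABLE", "message": str(exc)}
    return operation()
```

A command would then call something like `run_if_healthy(lambda: fetch_health(base_url), do_work)`, where `base_url` and `do_work` are placeholders, so the business operation is never attempted against a backend that did not answer /health.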

Also, starting FastAPI through serve / ensure-server should be clearly marked as developer-only. A FastAPI-only process does not imply Electron/Desktop bridge readiness, scene operations, extension process execution, or full workflow support.

I would suggest moving these commands under something like:

dev serve-api
dev ensure-server

or documenting them explicitly as non-canonical dev helpers.

5. Keep ComfyUI out of the canonical Modly agent contract

The ComfyUI integration may be useful, but it feels like an external orchestration helper rather than part of the core Modly automation contract.

I would suggest moving it under:

experimental comfy-image

or splitting it from the main supported CLI surface.

6. Align with existing MCP direction

Settings > Agent already points users toward modly-cli-mcp / modly-mcp for external agents. To avoid confusing users and future maintainers, this CLI should either:

  • explicitly stay as a repo-local/dev helper, or
  • align its naming and JSON semantics with the existing MCP concepts: workflowRun, processRun, capability, recovery metadata, and fail-closed unsupported operations.

The important part is avoiding a second canonical-looking automation contract with different semantics.

7. PR hygiene

Since the feature is Python-only, the package-lock.json changes look unrelated unless there is a specific reason for them. I would recommend removing that noise from this PR.

Overall: I like the goal and I think the CLI can be valuable. I would just prefer to land it with a clearer contract boundary:

  • canonical: workflow-run, process-run, capability
  • legacy: /generate/* and old job polling
  • dev-only: server startup helpers
  • experimental: ComfyUI orchestration

That would preserve the useful agent ergonomics while keeping Modly automation aligned with Agent Chat, MCP, and future workflow/process execution.
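To illustrate the boundary, the command tree could be laid out roughly like this with argparse. The program name "modly" and the status/cancel sub-actions are assumptions for the sketch; the group names follow the suggestions above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command tree split into canonical, dev-only, and experimental groups."""
    parser = argparse.ArgumentParser(prog="modly")  # program name is hypothetical
    commands = parser.add_subparsers(dest="command", required=True)

    # Canonical primitives.
    commands.add_parser("health", help="check backend readiness")
    commands.add_parser("model", help="list and validate models")
    commands.add_parser("capability", help="discover supported processes")
    for name in ("workflow-run", "process-run"):
        run = commands.add_parser(name, help=f"manage a {name}")
        actions = run.add_subparsers(dest="action", required=True)
        for action in ("status", "cancel"):
            actions.add_parser(action).add_argument("run_id")

    # Developer-only helpers: FastAPI alone does not imply full readiness.
    dev = commands.add_parser("dev", help="non-canonical developer helpers")
    dev_actions = dev.add_subparsers(dest="action", required=True)
    dev_actions.add_parser("serve-api")
    dev_actions.add_parser("ensure-server")

    # Experimental orchestration, kept outside the supported contract.
    experimental = commands.add_parser("experimental", help="unsupported helpers")
    exp_actions = experimental.add_subparsers(dest="action", required=True)
    exp_actions.add_parser("comfy-image")
    return parser
```

For example, `build_parser().parse_args(["workflow-run", "status", "run-123"])` yields command "workflow-run", action "status", and run_id "run-123", while the server startup helpers are only reachable under the dev group.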

