Field report: Affinity MCP worked, but was not reliable for complex existing layout edits via Codex #2

@ardjo-s

Context

This is a field report from trying to use Affinity's native MCP server from Codex on macOS for a real production layout pass.

I am filing it here because this repository is the most relevant public GitHub resource I found for Affinity's native MCP server and Codex-style workflows. This may ultimately be an upstream Affinity MCP limitation rather than an issue in this repository, so feel free to close or redirect if this is out of scope.

No private document content, meeting notes, screenshots, or client text are included here.

Setup

  • macOS
  • Affinity app running with the native MCP connector enabled
  • MCP endpoint available at http://localhost:6767/sse
  • Codex using the MCP directly / through the usual SSE-compatible path
  • Existing multi-page Affinity publication, not a blank canvas
  • Task type: production layout cleanup across many pages
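
For anyone reproducing the setup, a reachability check could look roughly like the sketch below. It assumes only that the endpoint emits a standard `text/event-stream`; the helper names `first_sse_event` and `probe` are made up for illustration, and nothing here is specific to Affinity's API.

```python
import urllib.request


def first_sse_event(lines):
    """Return (event_type, data) for the first complete event in an SSE stream.

    `lines` is any iterable of raw byte lines, e.g. an open HTTP response.
    Returns None if the stream ends before a full event arrives.
    """
    event_type, data = "message", []
    for raw in lines:
        line = raw.decode("utf-8").rstrip("\r\n")
        if line == "":
            if data:
                return event_type, "\n".join(data)
            event_type = "message"  # blank line without data resets the event
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event_type = value
        elif field == "data":
            data.append(value)
    return None


def probe(url="http://localhost:6767/sse", timeout=5.0):
    """Open the SSE endpoint and return its first event (needs the app running)."""
    request = urllib.request.Request(url, headers={"Accept": "text/event-stream"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return first_sse_event(response)
```

Calling `probe()` with the Affinity app running should return the first event the server sends; a connection error or timeout means the problem is in the transport, not the document layer.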

What worked

These parts worked and were repeatable:

  • The MCP server was reachable.
  • The open Affinity document could be detected.
  • execute_script worked.
  • Basic SDK-driven insertions worked.
  • Targeted insertion on a spread worked for simple text-frame markers.
  • Rendering/screenshot proof of spreads worked.
  • Saving the document worked.

So this was not a connection failure.

What did not work well enough for the job

The MCP was not practically useful for a complex existing publication where the required work was mostly page/layout/editorial changes, for example:

  • edit existing text without losing or disturbing typography
  • preserve paragraph/character styles while inserting or replacing text
  • identify the exact object/frame/artefact on a specific page
  • delete a small unwanted visual artefact confidently
  • move text from one page to another while preserving flow and layout
  • replace an existing placed illustration/image in its current layout slot
  • modify a diagram without rebuilding the page by hand
  • inspect linked text frames and understand what content was hidden, overflowed, or shifted
  • work safely with grouped/stacked objects and complex z-order
  • make a change and know programmatically whether it was visually correct

The result was that the agent could technically write to the document, but the edits were not reliable enough to trust as production edits. In practice, the safest output became visible review markers plus proof screenshots, which is useful for handoff but not the same as actually completing layout work.

The core gap

For simple canvas creation, proofing, and scripted inserts, the MCP is promising.

For an existing editorial/layout document, the agent needs a much stronger document model exposed through MCP:

  • stable node IDs
  • page/spread-scoped object listing
  • object type, bounds, z-order, lock/visibility state
  • selected object inspection
  • linked text-frame inspection
  • text range editing that preserves styles
  • style lookup/application APIs
  • placed asset replacement APIs
  • safe delete APIs for exact objects
  • transaction/undo grouping
  • structured errors that identify the failing node/path
  • before/after render helpers or visual diff support

Without these, the agent has to infer too much from scripts and screenshots.
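
To make the list above concrete, here is a rough sketch of the kind of document model I mean. Every name in it (`Node`, `Bounds`, `NodeError`, `nodes_on_spread`, `safe_delete`) is hypothetical and only illustrates the shape of the data; none of it is the existing Affinity MCP or SDK API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bounds:
    x: float
    y: float
    width: float
    height: float


@dataclass(frozen=True)
class Node:
    node_id: str                         # stable across sessions and edits
    kind: str                            # e.g. "text_frame", "image", "group"
    spread: int
    bounds: Bounds
    z_index: int
    locked: bool = False
    visible: bool = True
    parent_id: Optional[str] = None      # enclosing group, if any
    next_frame_id: Optional[str] = None  # next frame in a linked text flow


class NodeError(Exception):
    """Structured error that names the node and the operation that failed."""

    def __init__(self, node_id, operation, reason):
        super().__init__(f"{operation} failed for {node_id}: {reason}")
        self.node_id = node_id
        self.operation = operation
        self.reason = reason


def nodes_on_spread(nodes, spread):
    """List the nodes on one spread, bottom of the z-order first."""
    return sorted((n for n in nodes if n.spread == spread), key=lambda n: n.z_index)


def safe_delete(nodes, node_id):
    """Return a new node list without node_id, refusing risky deletions."""
    by_id = {n.node_id: n for n in nodes}
    if node_id not in by_id:
        raise NodeError(node_id, "delete", "no such node")
    if by_id[node_id].locked:
        raise NodeError(node_id, "delete", "node is locked")
    if any(n.parent_id == node_id for n in nodes):
        raise NodeError(node_id, "delete", "node has children")
    return [n for n in nodes if n.node_id != node_id]
```

With something like this, "delete the small artefact on spread 3" becomes a lookup by ID plus a delete that either succeeds or fails with a named node, instead of a guess based on a screenshot.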

Reproduction shape

A minimal public repro would be:

  1. Open an existing multi-page Affinity document with several text frames, placed images, grouped objects, and linked flows.
  2. Connect an MCP client through http://localhost:6767/sse.
  3. Ask the agent to perform these generic layout tasks:
    • remove one small visual artefact from a specific page
    • add one phrase to an existing styled text frame without changing the surrounding typography
    • move one text block from the next page back to the previous page
    • replace one existing image/diagram while preserving its layout area
    • render the page and save
  4. Compare the result visually.

In my run, the transport and scripting layer worked, but these document-level edits were too brittle to trust.
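
Step 4 can be partly automated. The sketch below is one possible check, assuming the before/after renders have already been decoded into equal-sized grids of grayscale values; how the pixels are obtained from a render is left out, and both function names are made up for illustration.

```python
def changed_bbox(before, after, tolerance=0):
    """Return (x0, y0, x1, y1) of the pixels that differ, or None if none do.

    `before` and `after` are lists of rows of grayscale values (0-255).
    """
    if len(before) != len(after) or any(
        len(row_b) != len(row_a) for row_b, row_a in zip(before, after)
    ):
        raise ValueError("renders must have identical dimensions")
    xs, ys = [], []
    for y, (row_b, row_a) in enumerate(zip(before, after)):
        for x, (pixel_b, pixel_a) in enumerate(zip(row_b, row_a)):
            if abs(pixel_b - pixel_a) > tolerance:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


def edit_stayed_inside(before, after, allowed, tolerance=0):
    """True if something changed and every change lies inside `allowed`.

    `allowed` is an inclusive (x0, y0, x1, y1) box, e.g. the frame being edited.
    """
    box = changed_bbox(before, after, tolerance)
    if box is None:
        return False  # nothing changed, so the edit did not land at all
    x0, y0, x1, y1 = box
    ax0, ay0, ax1, ay1 = allowed
    return ax0 <= x0 and ay0 <= y0 and x1 <= ax1 and y1 <= ay1
```

A check like this does not say whether the edit looks right, only whether it stayed where it was supposed to. That is still enough to catch the common failure where a text change reflows content on a neighbouring frame or page.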

Suggested improvements / docs that would help

It would help a lot to have canonical examples for agent workflows like:

  • list all nodes on the current spread with IDs, type, bounds, visibility, lock state, parent/group path
  • find a text frame by page and approximate bounds
  • replace a text range while preserving local styles
  • detect text overflow / hidden text
  • replace a placed image inside an existing frame
  • delete a specific artefact by stable ID
  • move a node between pages/spreads safely
  • create a grouped diagram with editable labels
  • run an edit as one transaction and render proof afterward
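
As an illustration of what "replace a text range while preserving local styles" could mean, here is a small sketch over a simplified text model. It assumes a story can be read as a list of `(text, style_name)` runs; that model and the `replace_range` helper are assumptions for the example, not something the current MCP exposes as far as I could tell.

```python
def replace_range(runs, start, end, new_text):
    """Replace characters [start, end) across styled runs.

    `runs` is a list of (text, style_name) tuples. The inserted text takes the
    style of the run that contains `start`; all other runs keep their style.
    """
    if not runs:
        raise ValueError("no runs to edit")
    total = sum(len(text) for text, _ in runs)
    if not 0 <= start <= end <= total:
        raise ValueError("range out of bounds")
    out, pos = [], 0
    last = len(runs) - 1
    for i, (text, style) in enumerate(runs):
        run_start, run_end = pos, pos + len(text)
        pos = run_end
        head = text[: max(0, start - run_start)]  # part of this run before the range
        tail = text[max(0, end - run_start):]     # part of this run after the range
        owns_start = run_start <= start < run_end or (i == last and start == run_end)
        merged = head + (new_text if owns_start else "") + tail
        if merged:  # drop runs that the replacement emptied out
            out.append((merged, style))
    return out


runs = [("Hello ", "Body"), ("world", "Emphasis"), ("!", "Body")]
edited = replace_range(runs, 6, 11, "there")
# edited == [("Hello ", "Body"), ("there", "Emphasis"), ("!", "Body")]
```

The point of the example is the contract, not the implementation: the caller names a character range, and the surrounding runs come back with their styles untouched.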

It would also help to document the boundary clearly:

  • good fit: simple scripted generation, inserting markers, exports, proofs, scripted shapes
  • not yet a good fit: complex production edits in an existing publication with linked text, precise typography, and page-to-page layout dependencies

Why this matters

From the user's point of view, the MCP appeared connected and capable, but the actual workflow still fell back to manual design work. That gap is important: a connected MCP is not enough if agents cannot safely reason over and modify existing document structure.
