bug(server): GET/PUT/DELETE /api/workflows/{name} miss home-scoped (~/.archon/workflows/) — workflow detail page hangs #1524

@blankse

Bug Description

The single-workflow REST endpoints (`GET`, `PUT`, `DELETE /api/workflows/{name}`) skip the home-scoped layer entirely. Discovery (`GET /api/workflows`) reads home-scoped workflows correctly via `discoverWorkflowsWithConfig`, so the workflow is listed in the dashboard — but `GET /api/workflows/{name}` returns 404, which makes the workflow execution detail page hang on "Loading graph..." indefinitely.

CLAUDE.md documents the intended priority as `bundled < global < project`. The list endpoint walks all three; the single-workflow endpoint walks only project + bundled.
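To make the documented priority concrete, here is a minimal sketch of a lowest-priority-first merge, which is how `bundled < global < project` reads as an ordering. The names `LayerSource`, `mergeLayers`, and the workflow name `assist` are hypothetical and are not taken from the Archon codebase; the real discovery logic lives in `discoverWorkflowsWithConfig` and may differ.

```ts
// Hypothetical label type for this sketch; not the project's WorkflowSource.
type LayerSource = 'bundled' | 'global' | 'project';

interface ListedWorkflow {
  name: string;
  source: LayerSource;
}

// Merge layers lowest-priority first so that a later layer overwrites
// an earlier one when both define a workflow with the same name.
function mergeLayers(
  bundled: string[],
  global: string[],
  project: string[],
): Map<string, ListedWorkflow> {
  const merged = new Map<string, ListedWorkflow>();
  const layers: Array<[LayerSource, string[]]> = [
    ['bundled', bundled],
    ['global', global],
    ['project', project],
  ];
  for (const [source, names] of layers) {
    for (const name of names) {
      merged.set(name, { name, source });
    }
  }
  return merged;
}

// 'assist' exists in all three layers; 'my-workflow' exists only in the home layer.
const listed = mergeLayers(['assist'], ['assist', 'my-workflow'], ['assist']);
```

In this sketch `my-workflow` is listed with source `global`, which matches what the list endpoint reports. A single-workflow lookup that never consults the middle layer cannot find it.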

This is not the same as #958 (closed, fixed) — that bug was an `enabled` gate on the React Query call. The query now fires correctly; the server just answers 404 for any home-scoped (`~/.archon/workflows/...`) workflow.

Steps to Reproduce

  1. Drop a workflow file at `~/.archon/workflows/my-workflow.yaml` (or symlink one there).
  2. Confirm it appears in `GET /api/workflows` (list) and shows in the dashboard.
  3. Run it (`archon workflow run my-workflow ...`) so a run row exists.
  4. Open the run detail page in the Web UI: `/workflows/runs/{runId}`.
  5. Expected: the DAG graph renders.
  6. Actual: the page is stuck on "Loading graph..." forever. Behind the scenes, `GET /api/workflows/my-workflow` returns 404 with `{"error":"Workflow not found: my-workflow"}`.

Root Cause

`packages/server/src/routes/api.ts:2264` — the `getWorkflowRoute` handler walks:

```ts
// 1. Try user-defined workflow in cwd
if (workingDir) {
  const [workflowFolder] = getWorkflowFolderSearchPaths();
  const filePath = join(workingDir, workflowFolder, filename);
  // ... readFile, return on success
}

// 2. Fall back to bundled defaults (binary: embedded map; dev: also check filesystem)
if (Object.hasOwn(BUNDLED_WORKFLOWS, name)) { /* ... */ }

if (!isBinaryBuild()) {
  const defaultFilePath = join(getDefaultWorkflowsPath(), filename);
  // ... readFile, return on success
}

return apiError(c, 404, `Workflow not found: ${name}`);
```

Step 1 only looks in `<cwd>/.archon/workflows/`. The remaining steps are bundled-defaults paths (the embedded map, then the dev-only filesystem check). Nothing checks `getHomeWorkflowsPath()` (`~/.archon/workflows/`). `@archon/paths` exports `getHomeWorkflowsPath()`, and `@archon/workflows/workflow-discovery.ts:255` uses it correctly for the list endpoint; the single-workflow handler simply omits the layer.

The `PUT` and `DELETE` handlers (lines 2348, 2404) have a related but distinct bug: when no codebase is registered they fall back to `workingDir = getArchonHome()` and then `join(workingDir, workflowFolder, ...)` ⇒ they read/write `~/.archon/.archon/workflows/...`. That is the legacy path the migration warning explicitly tells users to move away from (`getLegacyHomeWorkflowsPath()` is intentionally kept only as a deprecation signal). So a user creating a workflow from the dashboard with no project selected silently lands their YAML in the deprecated location, and the same `PUT`/`DELETE` happily round-trips it there forever.
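The doubled `.archon` segment follows directly from the join. A minimal sketch, assuming `getArchonHome()` resolves to `~/.archon` and that the first entry of `getWorkflowFolderSearchPaths()` is `.archon/workflows` (both inferred from the paths quoted above, not verified against the code); `/home/user` is a placeholder home directory:

```ts
import { posix } from 'node:path';

// Stand-in values for this sketch; the real helpers live in @archon/paths.
const archonHome = posix.join('/home/user', '.archon'); // assumed getArchonHome() result
const workflowFolder = posix.join('.archon', 'workflows'); // assumed first search path

// What the PUT/DELETE fallback effectively computes when workingDir = getArchonHome():
const buggyPath = posix.join(archonHome, workflowFolder, 'my-workflow.yaml');

// What the home-scoped layer actually reads:
const intendedPath = posix.join(archonHome, 'workflows', 'my-workflow.yaml');

console.log(buggyPath);    // /home/user/.archon/.archon/workflows/my-workflow.yaml
console.log(intendedPath); // /home/user/.archon/workflows/my-workflow.yaml
```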

Concrete Evidence

Real local reproduction:

```bash
$ ls -la ~/.archon/workflows/
... df-implement-with-preview-fast.yaml -> /home/.../df-archon-workflows/df-implement-with-preview-fast.yaml

$ curl -s -o /dev/null -w "HTTP %{http_code}\n" \
'http://localhost:3000/api/workflows/df-implement-with-preview-fast'
HTTP 404
```

The workflow exists, is symlinked into `~/.archon/workflows/`, appears in the list endpoint and the dashboard, but the detail endpoint says it doesn't exist. Run-detail page hangs.

Why This Matters

Home-scoped workflows are the documented mechanism for sharing workflows across all projects on a machine (CLAUDE.md, the migration warning, the path helpers). The list endpoint advertises them; the dashboard shows them; runs against them succeed. Only the single-workflow endpoints are blind to them, which makes home-scoped workflows feel half-supported and breaks the run detail page that depends on the GET endpoint to render the DAG graph.

Proposed Fix

  1. Insert a home-scoped lookup in `getWorkflowRoute` between the project and bundled steps. Reads should resolve `project > home > bundled`, which is the reverse of the documented merge order `bundled < global < project`:

    ```ts
    // 1. Project-scoped (existing)
    // 2. Home-scoped (NEW)
    const homeFilePath = join(getHomeWorkflowsPath(), filename);
    try {
      const content = await readFile(homeFilePath, 'utf-8');
      const result = parseWorkflow(content, filename);
      if (result.error) return apiError(c, 500, `Workflow file is invalid: ...`);
      return c.json({ workflow: result.workflow, filename, source: 'global' as WorkflowSource });
    } catch (err) {
      // A missing file falls through to the bundled layer; anything else is a real failure.
      if ((err as NodeJS.ErrnoException).code !== 'ENOENT') { /* log + 500 */ }
    }
    // 3. Bundled (existing)
    ```

  2. `PUT` and `DELETE` handlers: when no codebase is registered and no `cwd` is provided, write to `getHomeWorkflowsPath()` directly (`~/.archon/workflows/`), not `join(getArchonHome(), '.archon', 'workflows')` (`~/.archon/.archon/workflows/`, the legacy path).

  3. `WorkflowSource` schema may need a `'global'` variant to round-trip the new source label cleanly. (Discovery already uses `source: 'global'`.)

  4. Tests:

    • `GET` returns 200 with `source: 'global'` for a home-scoped file.
    • `GET` precedence: project shadows home shadows bundled (regression cover for the documented priority).
    • `PUT` with no codebase writes under `~/.archon/workflows/`, not under the legacy path.
    • `DELETE` for a home-scoped file removes it from `~/.archon/workflows/`.
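The precedence test in item 4 can be sketched against a small layered resolver with an injected reader. This is an illustration only: `resolveWorkflow`, `memoryReader`, the `Layer` type, and the example paths are hypothetical, and the reader is synchronous for brevity even though the real handler uses an async `readFile`.

```ts
type LookupSource = 'project' | 'global' | 'bundled';
type ReadText = (path: string) => string;

interface Layer {
  source: LookupSource;
  path: string;
}

// Try each layer in priority order and return the first file that exists.
function resolveWorkflow(
  layers: Layer[],
  read: ReadText,
): { source: LookupSource; content: string } | null {
  for (const layer of layers) {
    try {
      return { source: layer.source, content: read(layer.path) };
    } catch (err) {
      // Only a missing file falls through to the next layer.
      if ((err as { code?: string }).code !== 'ENOENT') throw err;
    }
  }
  return null;
}

// In-memory reader that throws an ENOENT-style error for unknown paths.
function memoryReader(files: Map<string, string>): ReadText {
  return (path) => {
    const content = files.get(path);
    if (content === undefined) {
      throw Object.assign(new Error(`ENOENT: ${path}`), { code: 'ENOENT' });
    }
    return content;
  };
}

const projectPath = '/repo/.archon/workflows/my-workflow.yaml';
const homePath = '/home/user/.archon/workflows/my-workflow.yaml';
const bundledPath = '/defaults/my-workflow.yaml';

const layers: Layer[] = [
  { source: 'project', path: projectPath },
  { source: 'global', path: homePath },
  { source: 'bundled', path: bundledPath },
];

const homeOnly = resolveWorkflow(
  layers,
  memoryReader(new Map([[homePath, 'name: home']])),
);
const shadowed = resolveWorkflow(
  layers,
  memoryReader(new Map([[projectPath, 'name: project'], [homePath, 'name: home']])),
);
const missing = resolveWorkflow(layers, memoryReader(new Map<string, string>()));
```

Here `homeOnly` resolves from the `global` layer, `shadowed` resolves from `project`, and `missing` is `null`, which is the case that should map to a 404.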

Optional Follow-up (Web UI — out of this PR)

Even with the API fix, `WorkflowExecution.tsx:546-560` renders "Loading graph..." with no timeout and no error path. If the request ever fails for any reason (a 404 for a legitimately missing workflow, a network failure, a transient server error), the page hangs forever. Suggest:

  • Read the React Query error state and render it.
  • Add a fallback that uses `initialData?.dagNodes` to render the live status as a graph even when the static definition can't be fetched.
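One possible shape for the two suggestions above is a pure function that maps the query state to a view. Everything in this sketch is hypothetical: `GraphQueryState`, `DagNode`, and `selectGraphView` are illustrative names, and `liveNodes` stands in for `initialData?.dagNodes`, whose actual type is not shown in this issue.

```ts
interface DagNode {
  id: string;
  status: string;
}

// Minimal stand-in for the parts of a query result this sketch needs.
interface GraphQueryState {
  isLoading: boolean;
  error: Error | null;
  nodes?: DagNode[];
}

type GraphView =
  | { kind: 'loading' }
  | { kind: 'graph'; nodes: DagNode[]; degraded: boolean }
  | { kind: 'error'; message: string };

function selectGraphView(query: GraphQueryState, liveNodes?: DagNode[]): GraphView {
  // Static definition fetched: render the full graph.
  if (query.nodes) return { kind: 'graph', nodes: query.nodes, degraded: false };
  // Still in flight: the only case where a spinner is appropriate.
  if (query.isLoading) return { kind: 'loading' };
  // Fetch failed but live run data exists: render a degraded graph from it.
  if (liveNodes && liveNodes.length > 0) {
    return { kind: 'graph', nodes: liveNodes, degraded: true };
  }
  // Nothing to draw: surface the error instead of spinning forever.
  return { kind: 'error', message: query.error?.message ?? 'Workflow definition unavailable' };
}

const failed: GraphQueryState = {
  isLoading: false,
  error: new Error('Workflow not found: my-workflow'),
};
const withFallback = selectGraphView(failed, [{ id: 'step-1', status: 'running' }]);
const withoutFallback = selectGraphView(failed);
const pending = selectGraphView({ isLoading: true, error: null });
```

With this shape the "Loading graph..." state is reachable only while the request is actually pending.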

Happy to bundle this with the API fix if reviewers prefer; otherwise it's a clean separate issue.

Environment

  • Archon v0.3.10
  • Platform: WSL2 / Ubuntu, SQLite database
  • Reproduces deterministically with any workflow placed under `~/.archon/workflows/` (file or symlink).

Metadata

    Labels

    P2 (Medium priority - Backlog, when time permits), area: services (Background services), bug (Something is broken)
