Add an end-to-end agent diagnostics command #485

@Edison-A-N

Summary

OpenAgents currently exposes several useful health and status signals, but there does not appear to be a single diagnostic command or API that confirms an agent is truly usable end to end.

A dedicated diagnostics flow would help users tell apart the following conditions, any of which can hold while a later one fails:

  • the daemon process is running
  • an agent adapter is marked as running
  • the workspace sees the agent as online via fresh heartbeats
  • the configured LLM/provider is reachable
  • the specific agent instance can receive a workspace task and successfully send a response

Current behavior

From reading the codebase, the existing checks are spread across multiple layers:

  • agn status reports daemon/adapter process state from daemon.status.json. This is useful liveness information, but it does not prove the agent can process a task.
  • /v1/health confirms the workspace backend is reachable.
  • /v1/join, /v1/heartbeat, and /v1/discover confirm workspace presence and can mark stale agents offline after heartbeat timeout.
  • session rotation / session_revoked helps detect duplicate or ghost adapters.
  • /status can make an agent post uptime/version/network information back into a channel, which is closer to a runtime response check but is not exposed as a unified diagnostic flow.
  • agn test-llm <type> validates a generic LLM provider configuration, but it is not tied to a specific agent instance or workspace round trip.
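
To illustrate the heartbeat-based presence check mentioned above, here is a minimal sketch of how staleness could be classified. The function name, the 60-second timeout, and the shape of the inputs are assumptions for illustration only; the actual workspace logic and timeout value are not shown in this issue.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical timeout; the real workspace value may differ.
HEARTBEAT_TIMEOUT = timedelta(seconds=60)


def presence(last_heartbeat: datetime, now: datetime,
             timeout: timedelta = HEARTBEAT_TIMEOUT) -> tuple[str, float]:
    """Return ("online" | "offline", heartbeat age in seconds)."""
    age = (now - last_heartbeat).total_seconds()
    state = "online" if age <= timeout.total_seconds() else "offline"
    return state, age


now = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
fresh = presence(now - timedelta(seconds=12), now)
stale = presence(now - timedelta(minutes=5), now)
print(fresh, stale)
```

A fresh heartbeat only shows that the adapter is still reporting in. It says nothing about whether the agent can complete a task, which is the gap this issue describes.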

Expected behavior

It would be helpful to have a command and/or API such as:

agn diagnose <agent-name>

or a workspace-level diagnostic endpoint/control action that reports a structured verdict, for example:

{
  "daemon": "ok",
  "adapter_state": "running",
  "workspace_health": "ok",
  "workspace_presence": "online",
  "heartbeat_age_seconds": 12,
  "session": "valid",
  "llm": "ok",
  "workspace_round_trip": "ok",
  "agent_response": "ok"
}
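
As a rough sketch of how a client might consume such a verdict, the snippet below models it as a dataclass and derives a single usable / not usable answer. The class name, the helper methods, and the set of "healthy" values are assumptions made for this example, not an existing OpenAgents schema.

```python
from dataclasses import dataclass, asdict

# Value each field must have for the agent to count as usable.
# These expected values are assumptions based on the example verdict.
HEALTHY = {
    "daemon": "ok",
    "adapter_state": "running",
    "workspace_health": "ok",
    "workspace_presence": "online",
    "session": "valid",
    "llm": "ok",
    "workspace_round_trip": "ok",
    "agent_response": "ok",
}


@dataclass
class DiagnosticVerdict:
    daemon: str
    adapter_state: str
    workspace_health: str
    workspace_presence: str
    heartbeat_age_seconds: float  # informational; presence covers staleness
    session: str
    llm: str
    workspace_round_trip: str
    agent_response: str

    def failing_checks(self) -> list[str]:
        """Names of the fields whose value differs from the healthy one."""
        report = asdict(self)
        return [name for name, want in HEALTHY.items() if report[name] != want]

    def usable(self) -> bool:
        return not self.failing_checks()


example = {
    "daemon": "ok",
    "adapter_state": "running",
    "workspace_health": "ok",
    "workspace_presence": "online",
    "heartbeat_age_seconds": 12,
    "session": "valid",
    "llm": "ok",
    "workspace_round_trip": "ok",
    "agent_response": "ok",
}
verdict = DiagnosticVerdict(**example)
print(verdict.usable(), verdict.failing_checks())
```

Keeping one field per layer means a failing verdict names the layer that broke, rather than collapsing everything into a single boolean.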

Suggested checks

A first version could combine existing building blocks:

  1. Check daemon liveness and local adapter state.
  2. Check runtime readiness using the registry's check_ready / health check metadata.
  3. Check workspace backend /v1/health.
  4. Check /v1/discover for the target agent and verify fresh heartbeat.
  5. Check session validity / detect session_revoked where possible.
  6. Optionally run the existing LLM smoke test for that agent type.
  7. Send a lightweight diagnostic control action to the target agent and verify it posts a response within a timeout.
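
The steps above could be orchestrated roughly as follows. This is a sketch under stated assumptions: the function names, the status strings, and the choice to skip later checks after the first failure are all hypothetical, and the stub lambdas stand in for the real probes (daemon status, /v1/health, /v1/discover, and so on).

```python
import time
from typing import Callable

# Status strings treated as passing; assumed values for this sketch.
PASSING = {"ok", "running", "online", "valid"}

Check = Callable[[], str]


def wait_for_response(poll: Callable[[], bool],
                      timeout_s: float = 10.0,
                      interval_s: float = 0.5) -> str:
    """Poll until the agent's reply is seen or the timeout elapses (step 7)."""
    deadline = time.monotonic() + timeout_s
    while True:
        if poll():
            return "ok"
        if time.monotonic() >= deadline:
            return "timeout"
        time.sleep(interval_s)


def run_diagnostics(checks: list[tuple[str, Check]]) -> dict[str, str]:
    """Run checks in order; after the first failure, mark the rest skipped."""
    report: dict[str, str] = {}
    blocked_by = None
    for name, check in checks:
        if blocked_by is not None:
            report[name] = f"skipped (after {blocked_by})"
            continue
        try:
            status = check()
        except Exception as exc:  # a crashing check is itself a finding
            status = f"error: {exc}"
        report[name] = status
        if status not in PASSING:
            blocked_by = name
    return report


# Stub checks standing in for the real probes (all hypothetical).
report = run_diagnostics([
    ("daemon", lambda: "ok"),
    ("workspace_health", lambda: "ok"),
    ("workspace_presence", lambda: "online"),
    ("agent_response", lambda: wait_for_response(lambda: True, timeout_s=1.0)),
])
print(report)
```

Stopping at the first failure makes the report point at the earliest broken layer. An implementation could just as well run every check and report all results, which may be more useful when failures are independent.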

Benefits

  • Makes "agent is running but not responding" problems easier to debug.
  • Gives users a single command to confirm whether an agent is actually usable.
  • Helps separate local daemon issues from workspace connectivity, session conflicts, credential problems, and runtime/tool failures.
  • Provides a better support artifact for bug reports than screenshots of multiple status surfaces.

Notes

This is a feature request rather than a bug report. The existing status surfaces are useful, but they currently need to be manually combined to answer the question: "Can this agent actually receive and complete work right now?"
