feat(tools): boot-diagnosis pack v1 + dmesg fix + filter args#3

Merged: gjolly merged 2 commits into main from feat/fleet-mode on May 13, 2026

Conversation

@tejender-upadhyay

Collaborator

Closes the headline gaps from a live boot-triage exercise against the fleet. The user reported that "why is boot slow?" was unanswerable because read_dmesg was broken, the journal couldn't reach across boots, and there was no systemd-analyze / systemctl surface at all.

read_dmesg / read_journal

  • Drop the broken --read-clear=false argv from read_dmesg (the flag has no boolean form; this was always erroring out).
  • read_journal gains boot int (mapped to journalctl -b N, validated -10..0) so "show me the previous boot" is a one-field call instead of arithmetic on uptime.
  • read_journal gains match string (mapped to journalctl --grep, validated as printable-ASCII, length-capped, must compile in Go).
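The two read_journal validations described above can be sketched roughly as follows. This is a minimal illustration, not the code in logs.go: `journalArgs` and `maxMatchLen` are hypothetical names, the 200-character cap is taken from the schema text quoted in the review thread, and the sketch assumes `-b` is always emitted (the real tool may omit it when `boot` is unset).

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strconv"
)

// maxMatchLen is an assumed cap, taken from the "max 200 chars" schema text.
const maxMatchLen = 200

// journalArgs is a hypothetical helper. It validates boot and match and
// returns the extra journalctl arguments for them.
func journalArgs(boot int, match string) ([]string, error) {
	// Only the current boot (0) and up to ten earlier boots are accepted.
	if boot < -10 || boot > 0 {
		return nil, fmt.Errorf("boot %d out of range -10..0", boot)
	}
	args := []string{"-b", strconv.Itoa(boot)}
	if match == "" {
		return args, nil
	}
	if len(match) > maxMatchLen {
		return nil, errors.New("match too long")
	}
	// Reject control characters and anything outside 7-bit printable ASCII.
	for i := 0; i < len(match); i++ {
		if c := match[i]; c < 0x20 || c > 0x7e {
			return nil, fmt.Errorf("match has a non-printable or non-ASCII byte at offset %d", i)
		}
	}
	// Compiling with Go's regexp package catches malformed patterns before
	// they ever reach journalctl.
	if _, err := regexp.Compile(match); err != nil {
		return nil, fmt.Errorf("match does not compile: %w", err)
	}
	return append(args, "--grep", match), nil
}

func main() {
	args, err := journalArgs(-1, "usb.*error")
	fmt.Println(args, err)
}
```

One caveat with this approach: Go's regexp package implements RE2 syntax, so a pattern that compiles in Go is not guaranteed to mean the same thing to the matcher behind `journalctl --grep`, and vice versa.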

systemd-analyze suite (new boot.go)

  • boot_time parses systemd-analyze time into typed seconds per phase (firmware/loader/kernel/initrd/userspace/total) plus the target-reached line. Phases that don't apply on a given host are reported as 0 rather than failing the call.
  • boot_blame parses systemd-analyze blame and sorts descending; supports top:N (default 50, max 500).
  • boot_critical_chain parses the unit tree, returning a flat list of {unit, active_at_seconds, startup_seconds} alongside the raw text.
  • Composite durations like "1min 23.456s" parse correctly; the duration regex is shared via a durRE constant.
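As an illustration of the composite-duration handling, here is one way such a parser could look. It is a sketch under assumptions, not the contents of boot.go: the pattern behind `durRE` and the `unitSeconds` table are guesses, and only the us/ms/s/min/h suffixes are recognised. Unknown suffixes contribute 0, matching the behaviour described in the review thread on boot.go.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// durRE matches one "<number><unit>" component such as "1min" or "23.456s".
// The exact pattern used in the PR is not shown; this one is an assumption.
var durRE = regexp.MustCompile(`(\d+(?:\.\d+)?)\s*([a-z]+)`)

// unitSeconds maps the suffixes this sketch recognises to seconds.
var unitSeconds = map[string]float64{
	"us":  1e-6,
	"ms":  1e-3,
	"s":   1,
	"min": 60,
	"h":   3600,
}

// parseDurationSecs sums every component found in s. A suffix missing from
// unitSeconds looks up as 0, so that component adds nothing to the total.
func parseDurationSecs(s string) float64 {
	var total float64
	for _, m := range durRE.FindAllStringSubmatch(s, -1) {
		n, err := strconv.ParseFloat(m[1], 64)
		if err != nil {
			continue
		}
		total += n * unitSeconds[m[2]]
	}
	return total
}

func main() {
	for _, in := range []string{"1min 23.456s", "512ms", "3y"} {
		fmt.Printf("%q -> %.3f s\n", in, parseDurationSecs(in))
	}
}
```

Summing components rather than matching the whole string keeps the parser tolerant of however many units appear in a given duration.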

systemctl surface (new systemd.go)

  • list_systemd_units calls systemctl list-units --output=json with validated state/type filters and a configurable limit (default 500, cap 2000).
  • unit_status combines systemctl show (parsed as key=value, embedded '=' values preserved) with a tail of the unit's journal.
  • list_timers parses systemctl list-timers --output=json. The wire format emits next/left/last/passed as epoch microseconds (numbers, not strings); the fields are typed int64 and renamed with a _micros suffix for clarity.
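The "embedded '=' values preserved" point for unit_status comes down to splitting each line of `systemctl show` output at the first `=` only. A minimal sketch of that idea follows; the function name mirrors the `parseSystemctlShow` mentioned under Tests, but the body is an assumption, not the PR's code.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSystemctlShow turns "Key=Value" lines into a map. strings.Cut splits
// at the first '=' only, so any later '=' stays inside the value.
func parseSystemctlShow(out string) map[string]string {
	props := make(map[string]string)
	for _, line := range strings.Split(out, "\n") {
		key, value, ok := strings.Cut(line, "=")
		if !ok || key == "" {
			continue // blank line or no separator: nothing to record
		}
		props[key] = value
	}
	return props
}

func main() {
	sample := "Id=ssh.service\n" +
		"ExecStart={ path=/usr/sbin/sshd ; argv[]=/usr/sbin/sshd -D }\n" +
		"Description=\n"
	props := parseSystemctlShow(sample)
	fmt.Println(props["ExecStart"])
}
```

Properties with an empty value (such as `Description=` in the sample) are kept as empty strings rather than dropped, so callers can tell "unset" from "not published".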

Filter ergonomics (Sig-9)

  • list_processes: top:N (alias for limit, smaller wins), name_regex (comm match), state (R/S/D/Z/T/I).
  • list_mounts: fstype, mount_point_regex.
  • list_block_devices: name_regex (top-level), fields []string (project device entries to a subset of keys, applied recursively to children).
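The recursive `fields` projection for list_block_devices could be sketched as below. This assumes the lsblk JSON has been decoded into `map[string]any` values with nested devices under a `children` key, and that `children` is always retained so the projection has something to recurse into. `projectDevice` is a hypothetical name, and the real block.go may differ on both points.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// projectDevice keeps only the requested keys of one device entry and applies
// the same projection to every child. An empty fields list returns the entry
// unchanged.
func projectDevice(dev map[string]any, fields []string) map[string]any {
	if len(fields) == 0 {
		return dev
	}
	out := make(map[string]any, len(fields)+1)
	for _, f := range fields {
		if v, ok := dev[f]; ok {
			out[f] = v
		}
	}
	// Assumption: children survive the projection even when not listed.
	if kids, ok := dev["children"].([]any); ok {
		projected := make([]any, 0, len(kids))
		for _, k := range kids {
			if child, ok := k.(map[string]any); ok {
				projected = append(projected, projectDevice(child, fields))
			}
		}
		out["children"] = projected
	}
	return out
}

func main() {
	raw := `{"name":"sda","size":"10G","type":"disk",
	         "children":[{"name":"sda1","size":"1G","type":"part"}]}`
	var dev map[string]any
	if err := json.Unmarshal([]byte(raw), &dev); err != nil {
		panic(err)
	}
	b, _ := json.Marshal(projectDevice(dev, []string{"name", "size"}))
	fmt.Println(string(b))
}
```

Working on generic maps rather than a typed struct means the projection does not need updating when lsblk is asked for additional columns.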

Tools registry / catalogue

  • AllToolNames grows 19 → 25 so fleet peers publish the new surface.
  • README catalogue table extended; the six DBus-using tools are flagged with a snap-confinement caveat: inside the strictly-confined snap they return isError with "Failed to connect to bus: Permission denied" because the current plug list (system-observe etc.) does not grant DBus to systemd's system bus. They work normally when the daemon is run as a plain binary outside the snap. Adding a narrow DBus interface is deferred per the prior plug-strictness call.

Tests

  • New boot_test.go covers parseDurationSecs (composite + unknown suffix), parseBootTime (typical, no-firmware-VM, with-initrd), parseBlame (sort + ms/s/min mix), parseCriticalChain (depth + startup time + zero-startup leaf).
  • New systemd_test.go covers listUnitsArgs (filters + validation) and parseSystemctlShow (value containing '=').
  • Extended logs_test.go for the new boot/match validations.

Verification: go build / go vet / gofmt / go test -race ./... clean. Smoke-tested every new tool end-to-end via MCP tools/call against a local binary and against the live fleetmind-a/b LXD VMs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

This PR expands FleetMind’s boot/triage tool surface by fixing read_dmesg, enhancing read_journal for cross-boot and server-side filtering, and adding new systemd-facing tools (systemd-analyze + systemctl) plus additional filtering/projection options on existing inventory tools.

Changes:

  • Fix read_dmesg invocation and add read_journal support for boot offsets and match grep filtering.
  • Add new boot diagnosis tools (boot_time, boot_blame, boot_critical_chain) and new systemd inspection tools (list_systemd_units, unit_status, list_timers), including unit-tested parsers/argv builders.
  • Add filter ergonomics to existing tools (list_processes, list_mounts, list_block_devices) and update the tool catalogue/registry.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| README.md | Extends the tool catalogue and documents snap confinement caveats for systemd-related tools. |
| internal/tools/tools.go | Registers new boot/systemd tools and expands the published tool name list. |
| internal/tools/systemd.go | Implements systemctl-based tools: list units, unit status + journal tail, list timers. |
| internal/tools/systemd_test.go | Adds tests for systemctl argv building and systemctl show parsing. |
| internal/tools/process.go | Adds top alias, regex/name filtering, and state filtering for list_processes. |
| internal/tools/mount.go | Adds fstype and mount-point regex filtering for list_mounts. |
| internal/tools/logs.go | Fixes read_dmesg args; adds boot and match filtering to read_journal. |
| internal/tools/logs_test.go | Extends journal arg tests to cover boot and match validation. |
| internal/tools/boot.go | Adds systemd-analyze tools and parsers for time/blame/critical-chain. |
| internal/tools/boot_test.go | Adds unit tests covering duration parsing and systemd-analyze output parsing. |
| internal/tools/block.go | Adds top-level device name regex filtering and recursive field projection for lsblk output. |

Comment thread README.md (Outdated)
Comment on lines +39 to +40
    plug to expose them in the snap is tracked as a follow-up — see the
    deferred-items section of the plan in `.claude/plans/`.

Comment thread internal/tools/logs.go
    Priority string `json:"priority,omitempty" jsonschema:"minimum priority: emerg|alert|crit|err|warning|notice|info|debug"`
    Since string `json:"since,omitempty" jsonschema:"timestamp accepted by --since (e.g. '1h ago', '2026-05-13 09:00')"`
    Boot int `json:"boot,omitempty" jsonschema:"boot offset for journalctl -b (0 = current boot, -1 = previous boot, …). Valid range -10..0."`
    Match string `json:"match,omitempty" jsonschema:"PCRE regex passed to journalctl --grep (max 200 chars)"`

Comment thread internal/tools/boot.go (Outdated)
Comment on lines +219 to +221
    // Unknown suffixes (e.g. "y", "month") are tolerated but contribute 0 — boot
    // timings should never reach those scales, and if they do the user can read
    // the Raw field directly.

Comment thread internal/tools/systemd.go (Outdated)

    type listUnitsIn struct {
        State string `json:"state,omitempty" jsonschema:"filter by active state: active|inactive|failed|activating|deactivating|reloading"`
        Type string `json:"type,omitempty" jsonschema:"filter by unit type: service|timer|socket|mount|target|path|swap|slice|scope|device"`

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 6 comments.

Comment thread go.mod
    require github.com/modelcontextprotocol/go-sdk v1.6.0

    require (
        github.com/godbus/dbus/v5 v5.2.2 // indirect

Comment thread internal/tools/logs.go
    Priority string `json:"priority,omitempty" jsonschema:"minimum priority: emerg|alert|crit|err|warning|notice|info|debug"`
    Since string `json:"since,omitempty" jsonschema:"timestamp accepted by --since (e.g. '1h ago', '2026-05-13 09:00')"`
    Boot int `json:"boot,omitempty" jsonschema:"boot offset for journalctl -b (0 = current boot, -1 = previous boot, …). Valid range -10..0."`
    Match string `json:"match,omitempty" jsonschema:"PCRE regex passed to journalctl --grep (max 200 chars)"`

Comment thread internal/sysd/manager.go
        FinishMonotonicUsec uint64 // microseconds AFTER kernel boot, when default target became active
    }

    // BootTimes reads the relevant Manager properties in one trip.

Comment thread internal/sysd/manager.go
Comment on lines +155 to +156
    // UnitAfter returns the After= dependencies of the named unit. Returns an
    // empty slice if the unit doesn't exist or has no After= deps.

Comment thread internal/tools/systemd.go
Comment on lines +138 to +143
    mcp.AddTool(s, &mcp.Tool{
        Name: "unit_status",
        Description: "Detail view of a single systemd unit: every property the unit " +
            "publishes on org.freedesktop.systemd1.Unit, plus a tail of the unit's recent " +
            "journal (best-effort; requires log-observe).",
    }, func(ctx context.Context, _ *mcp.CallToolRequest, in unitStatusIn) (*mcp.CallToolResult, unitStatusOut, error) {

Comment thread internal/tools/block.go
Comment on lines +12 to +15
    type listBlockIn struct {
        NameRegex string `json:"name_regex,omitempty" jsonschema:"keep only top-level devices whose name matches this regex (max 200 chars). Children are kept unfiltered."`
        Fields []string `json:"fields,omitempty" jsonschema:"project each device entry to only these keys (max 32 entries). Empty = return every key."`
    }

@gjolly force-pushed the feat/fleet-mode branch from 4eab0fe to fded476 on May 13, 2026 13:52
@gjolly merged commit 3e7e644 into main on May 13, 2026
2 checks passed