feat(tools): boot-diagnosis pack v1 + dmesg fix + filter args #3
Merged
Closes the headline gaps from a live boot-triage exercise against the
fleet. The user reported that "why is boot slow?" was unanswerable
because read_dmesg was broken, the journal couldn't reach across boots,
and there was no systemd-analyze / systemctl surface at all.
read_dmesg / read_journal
- Drop the broken `--read-clear=false` argv from read_dmesg (the flag
has no boolean form; this was always erroring out).
- read_journal gains `boot int` (mapped to journalctl -b N, validated
-10..0) so "show me the previous boot" is a one-field call instead
of arithmetic on uptime.
- read_journal gains `match string` (mapped to journalctl --grep,
validated as printable-ASCII, length-capped, must compile in Go).
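
The two read_journal validations above could look roughly like the sketch below. It is not the code in logs.go: `validateMatch`, `journalArgs`, and `maxMatchLen` are hypothetical names, the 200-byte cap is an assumption, and treating `boot == 0` as "no flag" is an assumption too. Go's `regexp` package uses RE2 syntax, so a successful compile is only an approximation of what `journalctl --grep` will accept.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strconv"
)

// maxMatchLen is an assumed cap; the real limit lives in logs.go.
const maxMatchLen = 200

// validateMatch is a hypothetical helper mirroring the three checks named in
// the description: length cap, printable ASCII only, compiles in Go.
func validateMatch(m string) error {
	if len(m) > maxMatchLen {
		return fmt.Errorf("match longer than %d bytes", maxMatchLen)
	}
	for i := 0; i < len(m); i++ {
		if m[i] < 0x20 || m[i] > 0x7e {
			return errors.New("match must be printable ASCII")
		}
	}
	if _, err := regexp.Compile(m); err != nil {
		return fmt.Errorf("match does not compile: %w", err)
	}
	return nil
}

// journalArgs is a hypothetical argv builder. Assumption: a zero boot means
// "field not set" and adds no flag; the real tool may treat 0 differently.
func journalArgs(boot int, match string) ([]string, error) {
	if boot < -10 || boot > 0 {
		return nil, fmt.Errorf("boot %d outside -10..0", boot)
	}
	var args []string
	if boot != 0 {
		args = append(args, "--boot="+strconv.Itoa(boot))
	}
	if match != "" {
		if err := validateMatch(match); err != nil {
			return nil, err
		}
		args = append(args, "--grep="+match)
	}
	return args, nil
}

func main() {
	// "Show me the previous boot, filtered to OOM or panic lines."
	args, err := journalArgs(-1, "oom|panic")
	fmt.Println(args, err)
}
```

Rejecting bad input before the exec keeps a malformed pattern from surfacing as an opaque journalctl error.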
systemd-analyze suite (new boot.go)
- boot_time parses `systemd-analyze time` into typed seconds per phase
(firmware/loader/kernel/initrd/userspace/total) plus the
target-reached line. Phases that don't apply on a given host are
reported as 0 rather than failing the call.
- boot_blame parses `systemd-analyze blame` and sorts descending;
supports `top:N` (default 50, max 500).
- boot_critical_chain parses the unit tree, returning a flat list of
{unit, active_at_seconds, startup_seconds} alongside the raw text.
- Composite durations like "1min 23.456s" parse correctly; the
duration regex is shared via a `durRE` constant.
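
As an illustration of the composite-duration handling, here is a minimal sketch of a parser in the spirit of `parseDurationSecs`. The regex and the unit table are assumptions, not the contents of boot.go. Unknown suffixes contribute 0, as the description of the tests suggests.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// durRE matches one "<number><unit>" token such as "1min" or "23.456s".
// The exact pattern used in boot.go is not shown in the PR; this is a guess.
var durRE = regexp.MustCompile(`([0-9]+(?:\.[0-9]+)?)([a-zµ]+)`)

// parseDurationSecs sums every token in s and returns seconds.
// Suffixes outside the switch (e.g. "y", "month") add nothing.
func parseDurationSecs(s string) float64 {
	var total float64
	for _, m := range durRE.FindAllStringSubmatch(s, -1) {
		v, err := strconv.ParseFloat(m[1], 64)
		if err != nil {
			continue
		}
		switch m[2] {
		case "us", "µs":
			total += v / 1e6
		case "ms":
			total += v / 1e3
		case "s":
			total += v
		case "min":
			total += v * 60
		case "h":
			total += v * 3600
		}
	}
	return total
}

func main() {
	for _, in := range []string{"1min 23.456s", "542ms", "1h 2min 3s", "2y"} {
		fmt.Printf("%q -> %.3f s\n", in, parseDurationSecs(in))
	}
}
```

Summing tokens rather than matching one fixed layout lets the same function serve the time, blame, and critical-chain parsers.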
systemctl surface (new systemd.go)
- list_systemd_units calls `systemctl list-units --output=json` with
validated state/type filters and a configurable limit (default 500,
cap 2000).
- unit_status combines `systemctl show` (parsed as key=value, embedded
'=' values preserved) with a tail of the unit's journal.
- list_timers parses `systemctl list-timers --output=json`. The wire
format emits next/left/last/passed as epoch microseconds (numbers,
not strings); fields are typed int64 and renamed _micros for clarity.
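
The key=value handling for `systemctl show` output can be sketched as below, assuming one property per line. The function body is a guess at the approach, not the code in systemd.go. Splitting on the first '=' only is what keeps values such as `ExecStart={ path=... }` intact.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSystemctlShow turns "Key=Value" lines into a map. Only the first '='
// separates key from value, so any later '=' stays part of the value.
// Lines without '=' or with an empty key are skipped.
func parseSystemctlShow(out string) map[string]string {
	props := make(map[string]string)
	for _, line := range strings.Split(out, "\n") {
		key, val, ok := strings.Cut(line, "=")
		if !ok || key == "" {
			continue
		}
		props[key] = val
	}
	return props
}

func main() {
	// Sample input written for this example, not captured from a real host.
	out := "Id=ssh.service\n" +
		"ExecStart={ path=/usr/sbin/sshd ; argv[]=/usr/sbin/sshd -D }\n" +
		"ActiveState=active\n"
	props := parseSystemctlShow(out)
	fmt.Println(props["ExecStart"])
}
```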
Filter ergonomics (Sig-9)
- list_processes: top:N (alias for limit, smaller wins), name_regex
(comm match), state (R/S/D/Z/T/I).
- list_mounts: fstype, mount_point_regex.
- list_block_devices: name_regex (top-level), fields []string (project
device entries to a subset of keys, applied recursively to children).
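
Two of the filter rules above lend themselves to a short sketch: the "smaller wins" resolution between `top` and `limit`, and the recursive `fields` projection. Both helpers are hypothetical. The sketch assumes devices are decoded as `map[string]any` with nested devices under a `children` key (as in `lsblk --json` output), and that `children` is always kept so the tree shape survives the projection.

```go
package main

import "fmt"

// effectiveLimit resolves top vs. limit: values <= 0 mean "not set",
// and when both are set the smaller one wins.
func effectiveLimit(limit, top int) int {
	switch {
	case limit <= 0:
		return top
	case top <= 0:
		return limit
	case top < limit:
		return top
	default:
		return limit
	}
}

// projectFields keeps only the requested keys on dev and applies the same
// projection to every entry under "children". An empty fields slice returns
// dev unchanged. Assumption: "children" is retained even if not requested.
func projectFields(dev map[string]any, fields []string) map[string]any {
	if len(fields) == 0 {
		return dev
	}
	out := make(map[string]any, len(fields)+1)
	for _, f := range fields {
		if v, ok := dev[f]; ok {
			out[f] = v
		}
	}
	if kids, ok := dev["children"].([]any); ok {
		projected := make([]any, 0, len(kids))
		for _, k := range kids {
			if child, ok := k.(map[string]any); ok {
				projected = append(projected, projectFields(child, fields))
			}
		}
		out["children"] = projected
	}
	return out
}

func main() {
	// Sample device tree written for this example.
	dev := map[string]any{
		"name": "sda", "size": "100G", "type": "disk",
		"children": []any{
			map[string]any{"name": "sda1", "size": "99G", "type": "part"},
		},
	}
	fmt.Println(projectFields(dev, []string{"name", "size"}))
	fmt.Println(effectiveLimit(100, 10))
}
```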
Tools registry / catalogue
- AllToolNames grows 19 → 25 so fleet peers publish the new surface.
- README catalogue table extended; the six DBus-using tools are
flagged with a snap-confinement caveat: inside the strictly-confined
snap they return isError with "Failed to connect to bus: Permission
denied" because the current plug list (system-observe etc.) does
not grant DBus to systemd's system bus. They work normally when
the daemon is run as a plain binary outside the snap. Adding a
narrow DBus interface is deferred per the prior plug-strictness call.
Tests
- New boot_test.go covers parseDurationSecs (composite + unknown
suffix), parseBootTime (typical, no-firmware-VM, with-initrd),
parseBlame (sort + ms/s/min mix), parseCriticalChain (depth +
startup time + zero-startup leaf).
- New systemd_test.go covers listUnitsArgs (filters + validation) and
parseSystemctlShow (value containing '=').
- Extended logs_test.go for the new boot/match validations.
Verification: go build / go vet / gofmt / go test -race ./... clean.
Smoke-tested every new tool end-to-end via MCP tools/call against a
local binary and against the live fleetmind-a/b LXD VMs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
This PR expands FleetMind’s boot/triage tool surface by fixing read_dmesg, enhancing read_journal for cross-boot and server-side filtering, and adding new systemd-facing tools (systemd-analyze + systemctl) plus additional filtering/projection options on existing inventory tools.
Changes:
- Fix `read_dmesg` invocation and add `read_journal` support for `boot` offsets and `match` grep filtering.
- Add new boot diagnosis tools (`boot_time`, `boot_blame`, `boot_critical_chain`) and new systemd inspection tools (`list_systemd_units`, `unit_status`, `list_timers`), including unit-tested parsers/argv builders.
- Add filter ergonomics to existing tools (`list_processes`, `list_mounts`, `list_block_devices`) and update the tool catalogue/registry.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| README.md | Extends the tool catalogue and documents snap confinement caveats for systemd-related tools. |
| internal/tools/tools.go | Registers new boot/systemd tools and expands the published tool name list. |
| internal/tools/systemd.go | Implements systemctl-based tools: list units, unit status + journal tail, list timers. |
| internal/tools/systemd_test.go | Adds tests for systemctl argv building and systemctl show parsing. |
| internal/tools/process.go | Adds top alias, regex/name filtering, and state filtering for list_processes. |
| internal/tools/mount.go | Adds fstype and mount-point regex filtering for list_mounts. |
| internal/tools/logs.go | Fixes read_dmesg args; adds boot and match filtering to read_journal. |
| internal/tools/logs_test.go | Extends journal arg tests to cover boot and match validation. |
| internal/tools/boot.go | Adds systemd-analyze tools and parsers for time/blame/critical-chain. |
| internal/tools/boot_test.go | Adds unit tests covering duration parsing and systemd-analyze output parsing. |
| internal/tools/block.go | Adds top-level device name regex filtering and recursive field projection for lsblk output. |
Comment on lines +39 to +40

    plug to expose them in the snap is tracked as a follow-up — see the
    deferred-items section of the plan in `.claude/plans/`.

    Priority string `json:"priority,omitempty" jsonschema:"minimum priority: emerg|alert|crit|err|warning|notice|info|debug"`
    Since string `json:"since,omitempty" jsonschema:"timestamp accepted by --since (e.g. '1h ago', '2026-05-13 09:00')"`
    Boot int `json:"boot,omitempty" jsonschema:"boot offset for journalctl -b (0 = current boot, -1 = previous boot, …). Valid range -10..0."`
    Match string `json:"match,omitempty" jsonschema:"PCRE regex passed to journalctl --grep (max 200 chars)"`

Comment on lines +219 to +221

    // Unknown suffixes (e.g. "y", "month") are tolerated but contribute 0 — boot
    // timings should never reach those scales, and if they do the user can read
    // the Raw field directly.

    type listUnitsIn struct {
        State string `json:"state,omitempty" jsonschema:"filter by active state: active|inactive|failed|activating|deactivating|reloading"`
        Type string `json:"type,omitempty" jsonschema:"filter by unit type: service|timer|socket|mount|target|path|swap|slice|scope|device"`


    require github.com/modelcontextprotocol/go-sdk v1.6.0

    require (
        github.com/godbus/dbus/v5 v5.2.2 // indirect


        FinishMonotonicUsec uint64 // microseconds AFTER kernel boot, when default target became active
    }

    // BootTimes reads the relevant Manager properties in one trip.

Comment on lines +155 to +156

    // UnitAfter returns the After= dependencies of the named unit. Returns an
    // empty slice if the unit doesn't exist or has no After= deps.
Comment on lines +138 to +143

    mcp.AddTool(s, &mcp.Tool{
        Name: "unit_status",
        Description: "Detail view of a single systemd unit: every property the unit " +
            "publishes on org.freedesktop.systemd1.Unit, plus a tail of the unit's recent " +
            "journal (best-effort; requires log-observe).",
    }, func(ctx context.Context, _ *mcp.CallToolRequest, in unitStatusIn) (*mcp.CallToolResult, unitStatusOut, error) {
Comment on lines +12 to +15

    type listBlockIn struct {
        NameRegex string `json:"name_regex,omitempty" jsonschema:"keep only top-level devices whose name matches this regex (max 200 chars). Children are kept unfiltered."`
        Fields []string `json:"fields,omitempty" jsonschema:"project each device entry to only these keys (max 32 entries). Empty = return every key."`
    }

gjolly approved these changes on May 13, 2026