From 7421d92b99df426cff6a730fcc5d86eb7220a8a9 Mon Sep 17 00:00:00 2001 From: Shubham Malik Date: Thu, 18 Jun 2026 10:25:48 +0530 Subject: [PATCH 1/3] docs: add macOS launchd troubleshooting guide --- docs/launchd-troubleshooting.md | 141 ++++++++++++++++++++++++++++++++ 1 file changed, 141 insertions(+) create mode 100644 docs/launchd-troubleshooting.md diff --git a/docs/launchd-troubleshooting.md b/docs/launchd-troubleshooting.md new file mode 100644 index 0000000..b5b45a0 --- /dev/null +++ b/docs/launchd-troubleshooting.md @@ -0,0 +1,141 @@ +# launchd Troubleshooting (macOS) + +Command reference for the Dev Machine Guard launchd job (label +`com.stepsecurity.agent`). + +**How periodic runs work:** the MDM loader installs a launchd plist with a +`StartInterval` (default 4h). Each tick launchd re-runs the **loader script**, +which auto-updates the binary, then runs `send-telemetry`. `RunAtLoad` is +`false`, so loading the plist (login / boot / install) never triggers a scan — +only the interval does. The one-off initial scan runs explicitly at install +time. To force an out-of-cycle run, `kickstart` it (or run the loader by hand). + +## Variants + +Almost always a per-user **LaunchAgent** running as the console user — that's +what the MDM loader installs. A root **LaunchDaemon** exists only if someone ran +`sudo install` directly; check for it, but don't expect it. 
+ +| | Per-user **LaunchAgent** (expected) | Root **LaunchDaemon** (rare) | +| ------- | ----------------------------------------------------- | ----------------------------------------------------- | +| Plist | `~/Library/LaunchAgents/com.stepsecurity.agent.plist` | `/Library/LaunchDaemons/com.stepsecurity.agent.plist` | +| Domain | `gui/$(id -u)` | `system` | +| Runs as | console user | root | +| Logs | `~/.stepsecurity/agent.log`, `agent.error.log` | `/var/log/stepsecurity/agent.log`, `agent.error.log` | +| `sudo` | no | yes (use the `system` domain) | + +Loader-managed (MDM, auto-updates) vs binary-managed (manual `install`) — tell +them apart by what the plist runs: + +```bash +plutil -p "$PLIST" | grep -A4 ProgramArguments +# /bin/bash …/stepsecurity-loader.sh send-telemetry -> loader-managed (auto-updates each tick) +# …/stepsecurity-dev-machine-guard send-telemetry -> binary-managed (no auto-update) +``` + +## Setup + +```bash +LABEL=com.stepsecurity.agent + +# Expected: per-user LaunchAgent +DOMAIN="gui/$(id -u)" +PLIST="$HOME/Library/LaunchAgents/$LABEL.plist" +LOGDIR="$HOME/.stepsecurity" + +# Check whether a root LaunchDaemon is also present (rare). If it is, redo with +# sudo and: DOMAIN=system PLIST=/Library/LaunchDaemons/$LABEL.plist LOGDIR=/var/log/stepsecurity +ls -la "$HOME/Library/LaunchAgents/$LABEL.plist" 2>&1 +ls -la "/Library/LaunchDaemons/$LABEL.plist" 2>&1 +``` + +## Status + +```bash +launchctl list | grep stepsec # loaded? PID + last exit +launchctl list "$LABEL" # one-job summary +launchctl print "$DOMAIN/$LABEL" # full state, schedule, last exit +launchctl print-disabled "$DOMAIN" | grep stepsec # disabled override? 
(loads but never runs) +launchctl enable "$DOMAIN/$LABEL" # clear a disable override +``` + +## Inspect plist + +```bash +plutil -p "$PLIST" # readable dump +plutil -lint "$PLIST" # validate XML +plutil -p "$PLIST" | grep -A4 ProgramArguments # loader script vs binary (see Variants) +/usr/libexec/PlistBuddy -c "Print :StartInterval" "$PLIST" # seconds (14400 = 4h) +/usr/libexec/PlistBuddy -c "Print :EnvironmentVariables" "$PLIST" # baked HOME / STEPSECURITY_HOME +``` + +## Config & version + +```bash +cat "$HOME/.stepsecurity/config.json" # effective config (contains api_key) +cat "$HOME/.stepsecurity/.current_version" # version the loader last installed +"$HOME/.stepsecurity/bin/stepsecurity-dev-machine-guard" --version # running binary version +ls -la "$HOME/.stepsecurity" "$HOME/.stepsecurity/bin" # owner should be the console user, not root +``` + +## Logs + +```bash +tail -n 100 "$LOGDIR/agent.log" # scheduled-run stdout +tail -n 100 "$LOGDIR/agent.error.log" # scheduled-run stderr (rotates to .prev at 5 MiB) +tail -f "$LOGDIR"/agent.log "$LOGDIR"/agent.error.log # watch live +tail -n 50 "$HOME/.stepsecurity/ai-agent-hook-errors.jsonl" # AI-agent hook errors +stat -f '%Sm' "$LOGDIR/agent.log" # last scheduled-run time +log show --predicate 'process == "launchd"' --last 2h | grep -i stepsec # launchd's own view +``` + +## Force a run + +```bash +launchctl kickstart -k "$DOMAIN/$LABEL" # run now (-k restarts if in-flight) +/bin/bash "$HOME/.stepsecurity/bin/stepsecurity-loader.sh" send-telemetry # loader by hand (update + scan) +``` + +## Reload (after editing the plist) + +```bash +launchctl bootout "$DOMAIN/$LABEL" 2>/dev/null +launchctl bootstrap "$DOMAIN" "$PLIST" +launchctl print "$DOMAIN/$LABEL" | head -20 +``` + +`config.json` changes need no reload — they're read at run time; just `kickstart`. +(The loader logs `launchctl load`/`unload`; the modern verbs above work regardless.) 
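The reload sequence above never triggers a scan by itself, so a reload is usually followed by a `kickstart`. The following is a minimal sketch of combining the two, not something the loader ships: the function name `reload_and_kick` and the `LAUNCHCTL` override (there only so the sequence can be dry-run) are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the loader): reload the job, then
# kickstart it, because RunAtLoad=false means a reload alone runs nothing.
# LAUNCHCTL is an assumed override so the sequence can be dry-run with `echo`.
reload_and_kick() {
  local domain="$1" label="$2" plist="$3"
  local lc="${LAUNCHCTL:-launchctl}"

  # bootout fails harmlessly when the job is not loaded; ignore that case.
  "$lc" bootout "$domain/$label" 2>/dev/null || true
  # Load the edited plist into the target domain.
  "$lc" bootstrap "$domain" "$plist" || return 1
  # Trigger one run now instead of waiting for the next StartInterval tick.
  "$lc" kickstart -k "$domain/$label"
}
```

With the variables from Setup, this would be called as `reload_and_kick "$DOMAIN" "$LABEL" "$PLIST"`. Prefixing the call with `LAUNCHCTL=echo` prints the three commands instead of running them.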
+ +## Uninstall + +```bash +/bin/bash "$HOME/.stepsecurity/bin/stepsecurity-loader.sh" uninstall # loader-managed (MDM) +"$HOME/.stepsecurity/bin/stepsecurity-dev-machine-guard" uninstall # binary-managed + +# Manual fallback: +launchctl bootout "$DOMAIN/$LABEL" 2>/dev/null || launchctl unload "$PLIST" 2>/dev/null +rm -f "$PLIST" + +# Verify +launchctl list | grep stepsec # expect no output +ls -la "$PLIST" 2>&1 # expect not found +rm -rf "$HOME/.stepsecurity" # wipe local state (optional) +``` + +## Reinstall + +```bash +/bin/bash "$HOME/.stepsecurity/bin/stepsecurity-loader.sh" install # or re-push loader via MDM +launchctl print "$DOMAIN/$LABEL" | grep -iE 'state|last exit' +launchctl kickstart -k "$DOMAIN/$LABEL" && tail -n 20 "$LOGDIR/agent.log" +``` + +## Gotchas + +- **config.json is rewritten every tick.** The loader's `write_config()` keeps only a fixed set (customer_id, api_endpoint, api_key, scan_frequency_hours + optional install_dir / max_execution_duration / scan toggles); any other hand-edited or profile-pushed field (e.g. `include_tcc_protected`) is wiped within one interval. Make it stick by editing the loader heredoc before deploy. +- **Runs only in a live GUI session.** No console user (login window, headless, SSH) → not loaded, won't fire; the loader's initial run errors `no_user`, and `launchctl … gui/` over SSH can return `Bootstrap failed: 5`. +- **TCC prompts are real.** It runs in the user's GUI session, so scanning Documents/Downloads/etc. pops permission dialogs; skipped by default. Grant Full Disk Access (PPPC profile), then set `include_tcc_protected`. +- **A wedged run blocks every tick.** The binary's lock file makes overlapping runs exit; a hung run holds the lock until the loader SIGKILLs processes older than `MAX_PROCESS_AGE_HOURS` on a later tick. Self-heals, but loses up to that window. 
+- **`StartInterval` quirks.** Missed fires during sleep coalesce into one run on wake; the timer also restarts on each load/login, so short sessions on a long interval can starve it. +- **`Bootstrap failed: 5`** most often means already loaded — `bootout` first, then `bootstrap`. From e21c0ef4f5bc306975d5143159645297cf5f4ea8 Mon Sep 17 00:00:00 2001 From: Shubham Malik Date: Thu, 18 Jun 2026 13:50:33 +0530 Subject: [PATCH 2/3] docs: enhance launchd troubleshooting guide with detailed scheduling information --- docs/launchd-troubleshooting.md | 39 +++++++++++++++++++++++++++++++-- 1 file changed, 37 insertions(+), 2 deletions(-) diff --git a/docs/launchd-troubleshooting.md b/docs/launchd-troubleshooting.md index b5b45a0..4ff6f20 100644 --- a/docs/launchd-troubleshooting.md +++ b/docs/launchd-troubleshooting.md @@ -10,11 +10,46 @@ which auto-updates the binary, then runs `send-telemetry`. `RunAtLoad` is only the interval does. The one-off initial scan runs explicitly at install time. To force an out-of-cycle run, `kickstart` it (or run the loader by hand). +## Scheduling: `RunAtLoad` and `StartInterval` + +`RunAtLoad` controls one thing — whether launchd runs the job **once, +immediately, the moment the plist loads**: + +- `<true/>` — runs as soon as the job loads. "Load" means boot (LaunchDaemon) / + login (LaunchAgent), **and** every manual `launchctl bootstrap` / `load`. So a + LaunchAgent would re-run on every login and every reload. +- `<false/>` (our setting, and the default) — does **not** run at load. The job + sits idle until another trigger fires it: here that's `StartInterval`, or a + manual `launchctl kickstart`. + +```xml +<key>StartInterval</key> +<integer>14400</integer> +<key>RunAtLoad</key> +<false/> +``` + +The cadence therefore comes entirely from `StartInterval`. `RunAtLoad=false` +avoids a redundant scan on every login/reboot (and a fleet-wide boot-time +stampede); the installer instead runs one explicit `send-telemetry` at install, +then lets the interval pace the rest. 
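`StartInterval` is expressed in seconds, which is why the 4-hour default appears as 14400. As a small worked example, the conversion in both directions can be sketched in shell. The function names are hypothetical and not part of the loader.

```bash
# Hypothetical helpers for reading or writing StartInterval values.
hours_to_start_interval() {
  # StartInterval is in seconds: hours * 3600.
  echo $(( $1 * 3600 ))
}

start_interval_to_hours() {
  # Integer division; values that are not whole hours are truncated.
  echo $(( $1 / 3600 ))
}
```

For instance, `hours_to_start_interval 4` yields the value in the default plist, and `start_interval_to_hours` can be applied to the number printed by `PlistBuddy -c "Print :StartInterval"`.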
+ +**Consequence:** after a `bootstrap` / `load` / reload, **nothing runs on its +own** — use `kickstart` (see Force a run) to trigger a scan immediately. +(`RunAtLoad` is a one-shot-at-load trigger, unrelated to `KeepAlive`, which +continuously restarts a long-running daemon — a short-lived scan uses neither.) + ## Variants Almost always a per-user **LaunchAgent** running as the console user — that's -what the MDM loader installs. A root **LaunchDaemon** exists only if someone ran -`sudo install` directly; check for it, but don't expect it. +what the loader installs. The loader (and every version-specific loader script) +**never** creates a root daemon: even when MDM runs it as root it resolves the +console user and installs a per-user LaunchAgent, and aborts (`no_user`) if no +one is logged in rather than falling back to root. A root **LaunchDaemon** under +`/Library/LaunchDaemons/` only appears from a **legacy (≤1.8.x) agent script** +installed as root (pre-loader), or a manual `sudo install` (the Go +binary's installer has a root path the loader never invokes). Check for one — to +clean up a leftover — but current tooling won't create it. | | Per-user **LaunchAgent** (expected) | Root **LaunchDaemon** (rare) | | ------- | ----------------------------------------------------- | ----------------------------------------------------- | From 4abd891091696bc439bed5ae5131f7aa9a33b1ac Mon Sep 17 00:00:00 2001 From: Shubham Malik Date: Wed, 1 Jul 2026 07:23:07 +0530 Subject: [PATCH 3/3] docs(tcc): configure Full Disk Access for all devices only --- docs/macos-tcc-permissions.md | 54 ++++++++++++++--------------------- 1 file changed, 22 insertions(+), 32 deletions(-) diff --git a/docs/macos-tcc-permissions.md b/docs/macos-tcc-permissions.md index bc075c3..ba6b4aa 100644 --- a/docs/macos-tcc-permissions.md +++ b/docs/macos-tcc-permissions.md @@ -107,7 +107,7 @@ self-censor**. 
macOS still enforces TCC: without a grant, reads in protected dirs will silently fail with `EACCES`. For the agent to actually see the contents, it needs Full Disk Access (FDA). -Two paths to grant FDA: +There are two ways to grant FDA. ### Option A — MDM-pushed PPPC profile (recommended for fleets) @@ -120,19 +120,15 @@ This is the only way to grant FDA at scale without per-user clicks. #### Inputs you need -- **The install path of the binary.** The loader installs at - `~/.stepsecurity/bin/stepsecurity-dev-machine-guard` — that's - per-user (`/Users//.stepsecurity/bin/...`). PPPC's - `Identifier` field always takes an absolute filesystem path when - `IdentifierType` is `path` (it has no `$HOME`/variable expansion), - so you either: - - scope a per-user profile that substitutes each user's home path, - using your MDM's per-user variables (Jamf's `$HOME`-substituting - profile payload variables, Kandji's user-context blueprints, - Intune's per-user assignment, etc.), or - - have the operator install the binary at a fixed system-wide path - (for example `/usr/local/bin/stepsecurity-dev-machine-guard`) so - the same profile applies to every user on the device. +- **The install path of the binary.** By default the loader installs at + `~/.stepsecurity/bin/stepsecurity-dev-machine-guard`, which is + per-user. Because PPPC's `Identifier` field takes an absolute + filesystem path when `IdentifierType` is `path` (it has no + `$HOME`/variable expansion), set a **fixed system-wide install + directory** (under the loader's Advanced Configuration) so one profile + applies to every user on the device — for example + `/usr/local/stepsecurity`, which installs the binary at + `/usr/local/stepsecurity/bin/stepsecurity-dev-machine-guard`. - **The code requirement string** derived from the binary's signature. PPPC pairs the install path with this requirement so an impostor @@ -145,7 +141,7 @@ This is the only way to grant FDA at scale without per-user clicks. 
You'll get a line like: ``` - identifier "stepsecurity-dev-machine-guard" and anchor apple generic and certificate 1[field.1.2.840.113635.100.6.2.6] /* exists */ and certificate leaf[field.1.2.840.113635.100.6.1.13] /* exists */ and certificate leaf[subject.OU] = "" + identifier "stepsecurity-dev-machine-guard" and anchor apple generic and certificate 1[field.1.2.840.113635.100.6.2.6] /* exists */ and certificate leaf[field.1.2.840.113635.100.6.1.13] /* exists */ and certificate leaf[subject.OU] = "D63S9HLM4L" ``` #### PPPC profile XML @@ -190,11 +186,11 @@ granting **SystemPolicyAllFiles** (Full Disk Access) to the agent: <key>Identifier</key> - <string>/Users/REPLACE_USERNAME/.stepsecurity/bin/stepsecurity-dev-machine-guard</string> + <string>REPLACE_INSTALL_DIR/bin/stepsecurity-dev-machine-guard</string> <key>IdentifierType</key> <string>path</string> <key>CodeRequirement</key> - <string>identifier "stepsecurity-dev-machine-guard" and anchor apple generic and certificate 1[field.1.2.840.113635.100.6.2.6] /* exists */ and certificate leaf[field.1.2.840.113635.100.6.1.13] /* exists */ and certificate leaf[subject.OU] = "REPLACE_TEAM_ID"</string> + <string>anchor apple generic and certificate 1[field.1.2.840.113635.100.6.2.6] /* exists */ and certificate leaf[field.1.2.840.113635.100.6.1.13] /* exists */ and certificate leaf[subject.OU] = "D63S9HLM4L"</string> <key>Allowed</key> <true/> <key>Comment</key> @@ -211,16 +207,12 @@ granting **SystemPolicyAllFiles** (Full Disk Access) to the agent: Replace: - Both `REPLACE-WITH-UUIDGEN-OUTPUT` values with fresh UUIDs (`uuidgen` on macOS). -- `REPLACE_USERNAME` with the target user's short username so the - `Identifier` resolves to the actual on-disk binary path. For - per-user MDM scoping, use your MDM's per-user variable instead of a - literal username (e.g., Jamf's `$USERNAME`, Kandji's user-context - variable). For a fixed system-wide install, replace the whole - `Identifier` value with the absolute path you chose - (e.g., `/usr/local/bin/stepsecurity-dev-machine-guard`). 
-- `REPLACE_TEAM_ID` with the Apple Developer Team ID embedded in - the binary's code requirement (the trailing `subject.OU` field - from the `codesign -d -r-` output above). +- `REPLACE_INSTALL_DIR` with the fixed system-wide install directory you + configured (for example `/usr/local/stepsecurity`), so the `Identifier` + resolves to `/bin/stepsecurity-dev-machine-guard`. + +The `CodeRequirement` is already pinned to StepSecurity's Apple Developer +Team ID (`D63S9HLM4L`) — leave it as-is. #### Push the profile @@ -288,11 +280,9 @@ If a popup appears after deploying the PPPC profile and setting string must match the binary's actual signing. Re-run `codesign -d -r-` against the deployed binary and update the profile. - **Binary path mismatch.** If `IdentifierType=path` is used, the - `Identifier` must match the absolute path of the binary on disk. - Different per-user install dirs can require deploying the profile - with a wildcard-friendly identifier (use the code requirement - alone, with `IdentifierType=bundleID`-style matching, or push the - profile per user). + `Identifier` must match the absolute path of the binary on disk. Set a + fixed system-wide install directory so a single path applies to every + device. - **TCC.db cache.** TCC caches decisions; after changing a profile, reset the relevant service: