step-security · shubham-stepsecurity · Jun 18, 2026 · Jun 18, 2026 · Jul 1, 2026
diff --git a/docs/launchd-troubleshooting.md b/docs/launchd-troubleshooting.md
@@ -0,0 +1,176 @@
+# launchd Troubleshooting (macOS)
+
+Command reference for the Dev Machine Guard launchd job (label
+`com.stepsecurity.agent`).
+
+**How periodic runs work:** the MDM loader installs a launchd plist with a
+`StartInterval` (default 4h). Each tick launchd re-runs the **loader script**,
+which auto-updates the binary, then runs `send-telemetry`. `RunAtLoad` is
+`false`, so loading the plist (login / boot / install) never triggers a scan —
+only the interval does. The one-off initial scan runs explicitly at install
+time. To force an out-of-cycle run, `kickstart` it (or run the loader by hand).
+
+## Scheduling: `RunAtLoad` and `StartInterval`
+
+`RunAtLoad` controls one thing — whether launchd runs the job **once,
+immediately, the moment the plist loads**:
+
+- `<true/>` — runs as soon as the job loads. "Load" means boot (LaunchDaemon) /
+  login (LaunchAgent), **and** every manual `launchctl bootstrap` / `load`. So a
+  LaunchAgent would re-run on every login and every reload.
+- `<false/>` (our setting, and the default) — does **not** run at load. The job
+  sits idle until another trigger fires it: here that's `StartInterval`, or a
+  manual `launchctl kickstart`.
+
+```xml
+<key>StartInterval</key>
+<integer>14400</integer>   <!-- fire every 4h -->
+<key>RunAtLoad</key>
+<false/>                    <!-- but NOT at load -->
+```
+
+The cadence therefore comes entirely from `StartInterval`. `RunAtLoad=false`
+avoids a redundant scan on every login/reboot (and a fleet-wide boot-time
+stampede); the installer instead runs one explicit `send-telemetry` at install,
+then lets the interval pace the rest.
+
+**Consequence:** after a `bootstrap` / `load` / reload, **nothing runs on its
+own** — use `kickstart` (see Force a run) to trigger a scan immediately.
+(`RunAtLoad` is a one-shot-at-load trigger, unrelated to `KeepAlive`, which
+continuously restarts a long-running daemon — a short-lived scan uses neither.)
+
+## Variants
+
+Almost always a per-user **LaunchAgent** running as the console user — that's
+what the loader installs. The loader (and every version-specific loader script)
+**never** creates a root daemon: even when MDM runs it as root it resolves the
+console user and installs a per-user LaunchAgent, and aborts (`no_user`) if no
+one is logged in rather than falling back to root. A root **LaunchDaemon** under
+`/Library/LaunchDaemons/` only appears from a **legacy (≤1.8.x) agent script**
+installed as root (pre-loader), or a manual `sudo <binary> install` (the Go
+binary's installer has a root path the loader never invokes). Check for one so
+you can clean up a leftover, but current tooling won't create it.
+
+|         | Per-user **LaunchAgent** (expected)                   | Root **LaunchDaemon** (rare)                          |
+| ------- | ----------------------------------------------------- | ----------------------------------------------------- |
+| Plist   | `~/Library/LaunchAgents/com.stepsecurity.agent.plist` | `/Library/LaunchDaemons/com.stepsecurity.agent.plist` |
+| Domain  | `gui/$(id -u)`                                        | `system`                                              |
+| Runs as | console user                                          | root                                                  |
+| Logs    | `~/.stepsecurity/agent.log`, `agent.error.log`        | `/var/log/stepsecurity/agent.log`, `agent.error.log`  |
+| `sudo`  | no                                                    | yes (use the `system` domain)                         |
+
+Loader-managed (MDM, auto-updates) vs binary-managed (manual `install`) — tell
+them apart by what the plist runs:
+
+```bash
+plutil -p "$PLIST" | grep -A4 ProgramArguments
+# /bin/bash …/stepsecurity-loader.sh send-telemetry   -> loader-managed (auto-updates each tick)
+# …/stepsecurity-dev-machine-guard send-telemetry      -> binary-managed (no auto-update)
+```
+
+## Setup
+
+```bash
+LABEL=com.stepsecurity.agent
+
+# Expected: per-user LaunchAgent
+DOMAIN="gui/$(id -u)"
+PLIST="$HOME/Library/LaunchAgents/$LABEL.plist"
+LOGDIR="$HOME/.stepsecurity"
+
+# Check whether a root LaunchDaemon is also present (rare). If it is, redo with
+# sudo and: DOMAIN=system  PLIST=/Library/LaunchDaemons/$LABEL.plist  LOGDIR=/var/log/stepsecurity
+ls -la "$HOME/Library/LaunchAgents/$LABEL.plist" 2>&1
+ls -la "/Library/LaunchDaemons/$LABEL.plist" 2>&1
+```
+
+## Status
+
+```bash
+launchctl list | grep stepsec                          # loaded? PID + last exit
+launchctl list "$LABEL"                                 # one-job summary
+launchctl print "$DOMAIN/$LABEL"                        # full state, schedule, last exit
+launchctl print-disabled "$DOMAIN" | grep stepsec       # disabled override? (loads but never runs)
+launchctl enable "$DOMAIN/$LABEL"                        # clear a disable override
+```
+
+## Inspect plist
+
+```bash
+plutil -p "$PLIST"                                      # readable dump
+plutil -lint "$PLIST"                                   # validate XML
+plutil -p "$PLIST" | grep -A4 ProgramArguments          # loader script vs binary (see Variants)
+/usr/libexec/PlistBuddy -c "Print :StartInterval" "$PLIST"          # seconds (14400 = 4h)
+/usr/libexec/PlistBuddy -c "Print :EnvironmentVariables" "$PLIST"   # baked HOME / STEPSECURITY_HOME
+```
+
+## Config & version
+
+```bash
+cat "$HOME/.stepsecurity/config.json"                   # effective config (contains api_key)
+cat "$HOME/.stepsecurity/.current_version"              # version the loader last installed
+"$HOME/.stepsecurity/bin/stepsecurity-dev-machine-guard" --version   # running binary version
+ls -la "$HOME/.stepsecurity" "$HOME/.stepsecurity/bin"  # owner should be the console user, not root
+```
+
+## Logs
+
+```bash
+tail -n 100 "$LOGDIR/agent.log"                         # scheduled-run stdout
+tail -n 100 "$LOGDIR/agent.error.log"                   # scheduled-run stderr (rotates to .prev at 5 MiB)
+tail -f "$LOGDIR"/agent.log "$LOGDIR"/agent.error.log   # watch live
+tail -n 50 "$HOME/.stepsecurity/ai-agent-hook-errors.jsonl"   # AI-agent hook errors
+stat -f '%Sm' "$LOGDIR/agent.log"                       # last scheduled-run time
+log show --predicate 'process == "launchd"' --last 2h | grep -i stepsec   # launchd's own view
+```
+
+## Force a run
+
+```bash
+launchctl kickstart -k "$DOMAIN/$LABEL"                 # run now (-k restarts if in-flight)
+/bin/bash "$HOME/.stepsecurity/bin/stepsecurity-loader.sh" send-telemetry   # loader by hand (update + scan)
+```
+
+## Reload (after editing the plist)
+
+```bash
+launchctl bootout   "$DOMAIN/$LABEL" 2>/dev/null
+launchctl bootstrap "$DOMAIN" "$PLIST"
+launchctl print     "$DOMAIN/$LABEL" | head -20
+```
+
+`config.json` changes need no reload — they're read at run time; just `kickstart`.
+(The loader's own log lines mention the legacy `launchctl load`/`unload` verbs; the modern verbs above work regardless.)
+
+## Uninstall
+
+```bash
+/bin/bash "$HOME/.stepsecurity/bin/stepsecurity-loader.sh" uninstall   # loader-managed (MDM)
+"$HOME/.stepsecurity/bin/stepsecurity-dev-machine-guard" uninstall     # binary-managed
+
+# Manual fallback:
+launchctl bootout "$DOMAIN/$LABEL" 2>/dev/null || launchctl unload "$PLIST" 2>/dev/null
+rm -f "$PLIST"
+
+# Verify
+launchctl list | grep stepsec                          # expect no output
+ls -la "$PLIST" 2>&1                                    # expect not found
+rm -rf "$HOME/.stepsecurity"                            # optional: wipe local state (also removes the loader and binary)
+```
+
+## Reinstall
+
+```bash
+/bin/bash "$HOME/.stepsecurity/bin/stepsecurity-loader.sh" install   # or re-push loader via MDM
+launchctl print "$DOMAIN/$LABEL" | grep -iE 'state|last exit'
+launchctl kickstart -k "$DOMAIN/$LABEL" && tail -n 20 "$LOGDIR/agent.log"
+```
+
+## Gotchas
+
+- **config.json is rewritten every tick.** The loader's `write_config()` keeps only a fixed set (customer_id, api_endpoint, api_key, scan_frequency_hours + optional install_dir / max_execution_duration / scan toggles); any other hand-edited or profile-pushed field (e.g. `include_tcc_protected`) is wiped within one interval. Make it stick by editing the loader heredoc before deploy.
+- **Runs only in a live GUI session.** No console user (login window, headless, SSH) → not loaded, won't fire; the loader's initial run errors `no_user`, and `launchctl … gui/<uid>` over SSH can return `Bootstrap failed: 5`.
+- **TCC prompts are real.** The job runs in the user's GUI session, so scanning Documents/Downloads/etc. pops permission dialogs; those directories are skipped by default. To scan them, grant Full Disk Access (PPPC profile), then set `include_tcc_protected`.
+- **A wedged run blocks every tick.** The binary's lock file makes overlapping runs exit; a hung run holds the lock until the loader SIGKILLs processes older than `MAX_PROCESS_AGE_HOURS` on a later tick. Self-heals, but loses up to that window.
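+- **`StartInterval` quirks.** Per `launchd.plist(5)`, a fire that comes due while the machine is asleep is missed, not queued (coalescing into one run on wake is `StartCalendarInterval` behaviour); the timer also restarts on each load/login, so short sessions on a long interval can starve it.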
+- **`Bootstrap failed: 5`** most often means already loaded — `bootout` first, then `bootstrap`.
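The loader-managed vs binary-managed check from the Variants section of `docs/launchd-troubleshooting.md` can be scripted. This is a minimal sketch, assuming the `ProgramArguments` text looks like the two commented examples in that section. `classify_job` is a hypothetical helper name, not something the loader or the binary ships.

```bash
#!/usr/bin/env bash
# classify_job is a hypothetical helper, not shipped by the loader or the binary.
# Input: the ProgramArguments text, e.g. from
#   plutil -p "$PLIST" | grep -A4 ProgramArguments
classify_job() {
  local args="$1"
  case "$args" in
    # Check the loader first, in case its arguments also mention the binary name.
    *stepsecurity-loader.sh*)         echo "loader-managed" ;;
    *stepsecurity-dev-machine-guard*) echo "binary-managed" ;;
    *)                                echo "unknown" ;;
  esac
}
```

On a Mac, `classify_job "$(plutil -p "$PLIST" | grep -A4 ProgramArguments)"` prints one of the three labels. An `unknown` result suggests the plist runs something other than the two documented commands and is worth inspecting by hand.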
diff --git a/docs/macos-tcc-permissions.md b/docs/macos-tcc-permissions.md
@@ -107,7 +107,7 @@ self-censor**. macOS still enforces TCC: without a grant, reads in
 protected dirs will silently fail with `EACCES`. For the agent to
 actually see the contents, it needs Full Disk Access (FDA).
 
-Two paths to grant FDA:
+There are two ways to grant FDA.
 
 ### Option A — MDM-pushed PPPC profile (recommended for fleets)
 
@@ -120,19 +120,15 @@ This is the only way to grant FDA at scale without per-user clicks.
 
 #### Inputs you need
 
-- **The install path of the binary.** The loader installs at
-  `~/.stepsecurity/bin/stepsecurity-dev-machine-guard` — that's
-  per-user (`/Users/<username>/.stepsecurity/bin/...`). PPPC's
-  `Identifier` field always takes an absolute filesystem path when
-  `IdentifierType` is `path` (it has no `$HOME`/variable expansion),
-  so you either:
-  - scope a per-user profile that substitutes each user's home path,
-    using your MDM's per-user variables (Jamf's `$HOME`-substituting
-    profile payload variables, Kandji's user-context blueprints,
-    Intune's per-user assignment, etc.), or
-  - have the operator install the binary at a fixed system-wide path
-    (for example `/usr/local/bin/stepsecurity-dev-machine-guard`) so
-    the same profile applies to every user on the device.
+- **The install path of the binary.** By default the loader installs at
+  `~/.stepsecurity/bin/stepsecurity-dev-machine-guard`, which is
+  per-user. Because PPPC's `Identifier` field takes an absolute
+  filesystem path when `IdentifierType` is `path` (it has no
+  `$HOME`/variable expansion), set a **fixed system-wide install
+  directory** (under the loader's Advanced Configuration) so one profile
+  applies to every user on the device — for example
+  `/usr/local/stepsecurity`, which installs the binary at
+  `/usr/local/stepsecurity/bin/stepsecurity-dev-machine-guard`.
 
 - **The code requirement string** derived from the binary's signature.
   PPPC pairs the install path with this requirement so an impostor
@@ -145,7 +141,7 @@ This is the only way to grant FDA at scale without per-user clicks.
   You'll get a line like:
 
   ```
-  identifier "stepsecurity-dev-machine-guard" and anchor apple generic and certificate 1[field.1.2.840.113635.100.6.2.6] /* exists */ and certificate leaf[field.1.2.840.113635.100.6.1.13] /* exists */ and certificate leaf[subject.OU] = "<TEAM_ID>"
+  identifier "stepsecurity-dev-machine-guard" and anchor apple generic and certificate 1[field.1.2.840.113635.100.6.2.6] /* exists */ and certificate leaf[field.1.2.840.113635.100.6.1.13] /* exists */ and certificate leaf[subject.OU] = "D63S9HLM4L"
   ```
 
 #### PPPC profile XML
@@ -190,11 +186,11 @@ granting **SystemPolicyAllFiles** (Full Disk Access) to the agent:
                 <array>
                     <dict>
                         <key>Identifier</key>
-                        <string>/Users/REPLACE_USERNAME/.stepsecurity/bin/stepsecurity-dev-machine-guard</string>
+                        <string>REPLACE_INSTALL_DIR/bin/stepsecurity-dev-machine-guard</string>
                         <key>IdentifierType</key>
                         <string>path</string>
                         <key>CodeRequirement</key>
-                        <string>identifier "stepsecurity-dev-machine-guard" and anchor apple generic and certificate 1[field.1.2.840.113635.100.6.2.6] /* exists */ and certificate leaf[field.1.2.840.113635.100.6.1.13] /* exists */ and certificate leaf[subject.OU] = "REPLACE_TEAM_ID"</string>
+                        <string>anchor apple generic and certificate 1[field.1.2.840.113635.100.6.2.6] /* exists */ and certificate leaf[field.1.2.840.113635.100.6.1.13] /* exists */ and certificate leaf[subject.OU] = "D63S9HLM4L"</string>
                         <key>Allowed</key>
                         <true/>
                         <key>Comment</key>
@@ -211,16 +207,12 @@ granting **SystemPolicyAllFiles** (Full Disk Access) to the agent:
 Replace:
 - Both `REPLACE-WITH-UUIDGEN-OUTPUT` values with fresh UUIDs
   (`uuidgen` on macOS).
-- `REPLACE_USERNAME` with the target user's short username so the
-  `Identifier` resolves to the actual on-disk binary path. For
-  per-user MDM scoping, use your MDM's per-user variable instead of a
-  literal username (e.g., Jamf's `$USERNAME`, Kandji's user-context
-  variable). For a fixed system-wide install, replace the whole
-  `Identifier` value with the absolute path you chose
-  (e.g., `/usr/local/bin/stepsecurity-dev-machine-guard`).
-- `REPLACE_TEAM_ID` with the Apple Developer Team ID embedded in
-  the binary's code requirement (the trailing `subject.OU` field
-  from the `codesign -d -r-` output above).
+- `REPLACE_INSTALL_DIR` with the fixed system-wide install directory you
+  configured (for example `/usr/local/stepsecurity`), so the `Identifier`
+  resolves to `<install-dir>/bin/stepsecurity-dev-machine-guard`.
+
+The `CodeRequirement` is already pinned to StepSecurity's Apple Developer
+Team ID (`D63S9HLM4L`) — leave it as-is.
 
 #### Push the profile
 
@@ -288,11 +280,9 @@ If a popup appears after deploying the PPPC profile and setting
   string must match the binary's actual signing. Re-run `codesign -d
   -r-` against the deployed binary and update the profile.
 - **Binary path mismatch.** If `IdentifierType=path` is used, the
-  `Identifier` must match the absolute path of the binary on disk.
-  Different per-user install dirs can require deploying the profile
-  with a wildcard-friendly identifier (use the code requirement
-  alone, with `IdentifierType=bundleID`-style matching, or push the
-  profile per user).
+  `Identifier` must match the absolute path of the binary on disk. Set a
+  fixed system-wide install directory so a single path applies to every
+  device.
 - **TCC.db cache.** TCC caches decisions; after changing a profile,
   reset the relevant service: