Overview
Make the existing safety architecture visible to users and close remaining security gaps. The goal: every defense layer should be something users can see, verify, and trust.
Tasks
1. obk safety status command
Add a CLI command that shows the user their current security posture in one view:
- Status of all defense layers (prompt hardening, content boundaries, injection scanning, approval gates, auto-approve rules, tool tiers, bash filtering, sandboxing, scheduled restrictions, audit logging)
- Recent activity summary (last approval, injections blocked, approvals today)
- Recommendations for improvements (e.g. encryption not enabled)
This is the single highest-impact UX feature for differentiation. No other AI assistant has anything like it.
2. Approval receipts
After every approved action, show a receipt to the user:
✓ Email sent to sarah@example.com
Subject: "Meeting tomorrow"
Approved: 14:32 UTC | Audit: obk audit show 847
Every action gets a paper trail the user can see and verify.
3. Injection attempt alerts (user-visible)
When ScanForInjection() detects a pattern, alert the user with context instead of silently filtering:
⚠ Prompt injection detected in web_fetch output
Pattern: "ignore previous instructions"
Source: https://example.com/page
Action: Content quarantined
Most agents silently filter injections. Showing the user builds trust and educates them about threats.
4. Audit log integrity (hash chain)
Make audit logs tamper-evident:
- Add a
prev_hash field to audit entries
- Each entry includes SHA-256 hash of the previous entry
obk audit verify command walks the chain and reports breaks
obk audit list shows recent tool calls with approval status
obk audit show <id> shows full details of a specific action
obk audit stats shows summary counts
5. Path traversal guards on file tools
The learnings tool already has path traversal protection (b5b49ca). Apply the same pattern to file_read and dir_explore:
filepath.EvalSymlinks() before any file operation
- Reject paths containing
.. after cleaning
- Optionally enforce a workspace boundary (configurable root)
6. Error message scrubbing
Audit all fmt.Errorf calls in internal/server/, provider/, oauth/ for information leakage:
- Remove env var names from error messages
- Remove file paths from user-facing HTTP responses
- Keep detailed errors in logs, return generic messages to clients
7. SECURITY.md
Create a security policy file:
- Responsible disclosure process with security contact email
- Scope (all code in the repo)
- 90-day disclosure timeline
- Credit given to reporters with consent
8. Autonomy dial (control modes)
Let users calibrate agent independence with four modes:
| Mode |
Behavior |
| Observe |
Agent suggests actions but never executes. User confirms everything. |
| Supervised |
Reads are free. Writes need approval. (Current default.) |
| Trusted |
Auto-approve low and medium risk. Only high-risk needs approval. |
| Autonomous |
Everything auto-approved. Post-execution notifications only. |
- Mode is session-scoped (not persistent), matching existing auto-approve philosophy
- Changeable mid-session via command
- Default is
supervised (no breaking change)
autonomous requires explicit opt-in with one-time risk warning
- Per-tool overrides in config for granular control
Priority Order
P1: obk safety status (highest user impact)
P2: Approval receipts + injection alerts (quick wins, high trust signal)
P3: Audit integrity + path traversal guards (hardening)
P4: Error scrubbing + SECURITY.md (production readiness)
P5: Autonomy dial (requires more design work)
Overview
Make the existing safety architecture visible to users and close remaining security gaps. The goal: every defense layer should be something users can see, verify, and trust.
Tasks
1.
obk safety statuscommandAdd a CLI command that shows the user their current security posture in one view:
This is the single highest-impact UX feature for differentiation. No other AI assistant has anything like it.
2. Approval receipts
After every approved action, show a receipt to the user:
Every action gets a paper trail the user can see and verify.
3. Injection attempt alerts (user-visible)
When
ScanForInjection()detects a pattern, alert the user with context instead of silently filtering:Most agents silently filter injections. Showing the user builds trust and educates them about threats.
4. Audit log integrity (hash chain)
Make audit logs tamper-evident:
prev_hashfield to audit entriesobk audit verifycommand walks the chain and reports breaksobk audit listshows recent tool calls with approval statusobk audit show <id>shows full details of a specific actionobk audit statsshows summary counts5. Path traversal guards on file tools
The learnings tool already has path traversal protection (
b5b49ca). Apply the same pattern tofile_readanddir_explore:filepath.EvalSymlinks()before any file operation..after cleaning6. Error message scrubbing
Audit all
fmt.Errorfcalls ininternal/server/,provider/,oauth/for information leakage:7. SECURITY.md
Create a security policy file:
8. Autonomy dial (control modes)
Let users calibrate agent independence with four modes:
supervised(no breaking change)autonomousrequires explicit opt-in with one-time risk warningPriority Order
P1:
obk safety status(highest user impact)P2: Approval receipts + injection alerts (quick wins, high trust signal)
P3: Audit integrity + path traversal guards (hardening)
P4: Error scrubbing + SECURITY.md (production readiness)
P5: Autonomy dial (requires more design work)