
Safety and control improvements: visibility, hardening, and trust UX #124

Description

@priyanshujain

Overview

Make the existing safety architecture visible to users and close remaining security gaps. The goal: every defense layer should be something users can see, verify, and trust.

Tasks

1. obk safety status command

Add a CLI command that shows the user their current security posture in one view:

  • Status of all defense layers (prompt hardening, content boundaries, injection scanning, approval gates, auto-approve rules, tool tiers, bash filtering, sandboxing, scheduled restrictions, audit logging)
  • Recent activity summary (last approval, injections blocked, approvals today)
  • Recommendations for improvements (e.g. encryption not enabled)

This is likely the single highest-impact UX feature for differentiation: we are not aware of another AI assistant that offers a comparable view.

2. Approval receipts

After every approved action, show a receipt to the user:

✓ Email sent to sarah@example.com
  Subject: "Meeting tomorrow"
  Approved: 14:32 UTC | Audit: obk audit show 847

Every action gets a paper trail the user can see and verify.

3. Injection attempt alerts (user-visible)

When ScanForInjection() detects a pattern, alert the user with context instead of silently filtering:

⚠ Prompt injection detected in web_fetch output
  Pattern: "ignore previous instructions"
  Source: https://example.com/page
  Action: Content quarantined

Most agents silently filter injections. Showing the user builds trust and educates them about threats.
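A sketch of turning a detection into the alert above. The Finding type is hypothetical, and scanStandIn is a deliberately naive substring matcher standing in for ScanForInjection(), whose real signature and pattern set are not shown in this issue:

```go
package main

import (
	"fmt"
	"strings"
)

// Finding is a hypothetical result type; the real ScanForInjection()
// return value may look different.
type Finding struct {
	Tool    string // tool whose output was scanned, e.g. "web_fetch"
	Pattern string // the phrase that matched
	Source  string // where the content came from (URL, file path, ...)
}

// patterns is illustrative and holds only the example from this issue.
// Entries must be lowercase because content is lowercased before matching.
var patterns = []string{"ignore previous instructions"}

// scanStandIn is a naive case-insensitive substring matcher used in place
// of ScanForInjection(); it returns the first pattern that matches.
func scanStandIn(tool, source, content string) (Finding, bool) {
	lower := strings.ToLower(content)
	for _, p := range patterns {
		if strings.Contains(lower, p) {
			return Finding{Tool: tool, Pattern: p, Source: source}, true
		}
	}
	return Finding{}, false
}

// FormatAlert renders the user-visible warning instead of filtering silently.
func FormatAlert(f Finding, action string) string {
	return fmt.Sprintf("⚠ Prompt injection detected in %s output\n  Pattern: %q\n  Source: %s\n  Action: %s\n",
		f.Tool, f.Pattern, f.Source, action)
}
```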

4. Audit log integrity (hash chain)

Make audit logs tamper-evident:

  • Add a prev_hash field to audit entries
  • Each entry stores the SHA-256 hash of the previous entry in prev_hash
  • obk audit verify command walks the chain and reports breaks
  • obk audit list shows recent tool calls with approval status
  • obk audit show <id> shows full details of a specific action
  • obk audit stats shows summary counts

5. Path traversal guards on file tools

The learnings tool already has path traversal protection (b5b49ca). Apply the same pattern to file_read and dir_explore:

  • Resolve symlinks with filepath.EvalSymlinks() before any file operation
  • Reject paths that still contain .. after filepath.Clean()
  • Optionally enforce a workspace boundary (configurable root)

6. Error message scrubbing

Audit all fmt.Errorf calls in internal/server/, provider/, oauth/ for information leakage:

  • Remove env var names from error messages
  • Remove file paths from user-facing HTTP responses
  • Keep detailed errors in logs, return generic messages to clients

7. SECURITY.md

Create a security policy file:

  • Responsible disclosure process with security contact email
  • Scope (all code in the repo)
  • 90-day disclosure timeline
  • Credit given to reporters with consent

8. Autonomy dial (control modes)

Let users calibrate agent independence with four modes:

Mode         Behavior
Observe      Agent suggests actions but never executes. User confirms everything.
Supervised   Reads are free. Writes need approval. (Current default.)
Trusted      Auto-approve low and medium risk. Only high-risk needs approval.
Autonomous   Everything auto-approved. Post-execution notifications only.

  • Mode is session-scoped (not persistent), matching the existing auto-approve philosophy
  • Changeable mid-session via a command
  • Default is Supervised (no breaking change)
  • Autonomous mode requires explicit opt-in and shows a one-time risk warning
  • Per-tool overrides in config for granular control
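A sketch of the mode table as a decision function. Mode names come from the table; Risk, Action, Decision, and the override map are hypothetical, and letting Observe take precedence over per-tool overrides is a design assumption:

```go
package main

// Mode is the session-scoped autonomy setting.
type Mode int

const (
	Supervised Mode = iota // zero value, so an unset Mode means the default
	Observe
	Trusted
	Autonomous
)

// Risk is a hypothetical per-action risk level.
type Risk int

const (
	Low Risk = iota
	Medium
	High
)

// Decision tells the executor what to do with a proposed action.
type Decision int

const (
	Suggest Decision = iota // show the action, never execute it
	Ask                     // execute only after user approval
	Auto                    // execute without asking
)

// Action is a hypothetical description of a proposed tool call.
type Action struct {
	Tool    string
	IsWrite bool
	Risk    Risk
}

// Decide applies Observe first, then per-tool overrides, then the
// mode's default policy.
func Decide(m Mode, a Action, overrides map[string]Decision) Decision {
	// Observe never executes, regardless of overrides.
	if m == Observe {
		return Suggest
	}
	if d, ok := overrides[a.Tool]; ok {
		return d
	}
	switch m {
	case Supervised:
		if a.IsWrite {
			return Ask
		}
		return Auto
	case Trusted:
		if a.Risk == High {
			return Ask
		}
		return Auto
	case Autonomous:
		return Auto
	}
	// Unknown mode: fail closed.
	return Ask
}
```

Failing closed (Ask) for an unrecognized mode keeps a misconfigured session from silently gaining autonomy, and making Supervised the zero value matches the "no breaking change" default.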

Priority Order

P1: obk safety status (highest user impact)
P2: Approval receipts + injection alerts (quick wins, high trust signal)
P3: Audit integrity + path traversal guards (hardening)
P4: Error scrubbing + SECURITY.md (production readiness)
P5: Autonomy dial (requires more design work)
