Failsafe guards: approval gating, capability, token budget, hardened scheduling by oxyc · Pull Request #25 · generoi/gds-assistant

oxyc · 2026-05-26T15:05:32Z

Summary

Production-safety failsafes for the AI assistant. The principle: content edits are safe because they are recoverable (revisions + trash). This PR adds comparable guardrails for operations that are not recoverable, plus containment and observability. Companion to generoi/gds-mcp#18 (the data-layer guards these rely on).

Human-approval gating for irreversible writes (`DestructiveGuard`)

Previously, approval was required only for tools classified as dangerous, or for moderate tools annotated as destructive. Edits that are irreversible in practice but flagged non-destructive slipped through and executed silently. Now gated:

forms-update when it rewrites the field structure
terms-update, feeds-update, redirects-manage (create)
translations-link, strings-update, translations-machine
memory-save (it persists into every future system prompt — anti-poisoning)

On approval it injects confirm_destructive, which the matching gds-mcp guards consume, so an approved change passes the data layer while non-chat callers stay protected.
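The gating rule above can be sketched roughly as follows. This is an illustrative Python sketch, not the plugin's implementation: the tool names come from the list above, while `needs_approval`, `approve`, the `fields` and `action` argument names, and the choice to carry `confirm_destructive` inside the tool arguments are all assumptions.

```python
# Tools from the PR description that are always gated.
ALWAYS_GATED = {
    "terms-update",
    "feeds-update",
    "translations-link",
    "strings-update",
    "translations-machine",
    "memory-save",
}


def needs_approval(tool: str, args: dict) -> bool:
    """Return True when a write should wait for a human decision."""
    if tool in ALWAYS_GATED:
        return True
    # forms-update is gated only when it rewrites the field structure.
    # "fields" is a hypothetical argument name.
    if tool == "forms-update":
        return "fields" in args
    # redirects-manage is gated only for creation.
    # "action" is a hypothetical argument name.
    if tool == "redirects-manage":
        return args.get("action") == "create"
    return False


def approve(args: dict) -> dict:
    """Return a copy of the arguments carrying the confirmation flag.

    Only the approval path calls this, so a caller that never went
    through approval never carries confirm_destructive.
    """
    return {**args, "confirm_destructive": True}
```

Because the flag is added only after a human approves, a data-layer guard that requires it would still reject the same call from any other caller.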

Other layers

Scope & escalation in the system prompt: content-only scope; recognise site bugs (colours/layout/logic) and offer /report-bug instead of papering over them with content edits.
Hardened scheduled skills: these previously ran via WP-Cron with no user and no approver. They now run as the skill author, are skipped unless the author is an admin, require manage_options to schedule, and surface tools left awaiting approval instead of silently no-op-ing.
Per-user daily token budget (TokenBudget): runaway-cost backstop alongside the per-request rate limiter. Filterable / env-configurable; 0 disables.
Audit logging of approvals & denials: approved (gated, destructive) actions execute in the approval path, not MessageLoop, so they were never being logged. Denials now leave a trace too.
Memory cap + ToolRestrictor fix (terms-delete now classifies as dangerous).
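As a rough illustration of the per-user daily token budget listed above, the Python sketch below tracks usage per user per day and treats a limit of 0 as disabled. It is an assumption-laden sketch, not the `TokenBudget` class from the PR: the method names, the in-memory storage, and the `ASSISTANT_DAILY_TOKEN_LIMIT` environment variable are hypothetical.

```python
import os
from datetime import date


class TokenBudget:
    """Per-user daily token cap; a limit of 0 disables the check."""

    def __init__(self, daily_limit=None):
        if daily_limit is None:
            # Hypothetical environment variable name.
            daily_limit = int(os.environ.get("ASSISTANT_DAILY_TOKEN_LIMIT", "0"))
        self.daily_limit = daily_limit
        self._used = {}  # (user_id, day) -> tokens spent

    def record(self, user_id, tokens, day=None):
        """Add tokens spent by a user on the given day (today by default)."""
        key = (user_id, day or date.today())
        self._used[key] = self._used.get(key, 0) + tokens

    def remaining(self, user_id, day=None):
        """Return tokens left for the day, or None when the budget is disabled."""
        if self.daily_limit == 0:
            return None
        used = self._used.get((user_id, day or date.today()), 0)
        return max(0, self.daily_limit - used)

    def allows(self, user_id, day=None):
        """Return True when the user may start another request."""
        left = self.remaining(user_id, day)
        return left is None or left > 0
```

Keying usage by calendar day means the budget resets without a cleanup job, which suits a backstop that sits alongside a per-request rate limiter.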

Tests

DestructiveGuardTest, TokenBudgetTest (new); extended ToolRestrictorTest, MemoryToolProviderTest, SkillSchedulerTest, ChatEndpointApprovalTest.

🤖 Generated with Claude Code

…ogging

DestructiveGuard forces human approval for writes that are irreversible in practice but not flagged destructive: Gravity Forms field changes, taxonomy term edits, feed edits, redirect creation, memory saves, and Polylang translation relinks. On approval it injects confirm_destructive so the matching data-layer guards in gds-mcp let the change through.

ToolRestrictor: fix a shadowed branch so terms-delete classifies as dangerous.

System prompt: content-only scope + bug-report escalation, plus warnings that forms/terms/translation edits are unrevisioned.

TokenBudget: per-user daily token cap (filterable) as a runaway-cost backstop.

Audit-log approvals and denials in the approval path: approved (gated, destructive) actions execute there, not in MessageLoop, so they were previously never logged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
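To illustrate what a shadowed branch means for the ToolRestrictor fix mentioned above, here is a hypothetical Python sketch. The actual conditions in ToolRestrictor are not shown in this PR description, so the prefix and suffix checks and the `safe`/`moderate` labels are invented for illustration. Only the outcome (terms-delete classifying as dangerous) comes from the source.

```python
def classify_shadowed(tool: str) -> str:
    """Buggy ordering: the broad prefix check runs first."""
    if tool.startswith("terms-"):
        return "moderate"
    if tool.endswith("-delete"):
        # Never reached for terms-delete; the branch above shadows it.
        return "dangerous"
    return "safe"


def classify(tool: str) -> str:
    """Fixed ordering: the most severe check runs first."""
    if tool.endswith("-delete"):
        return "dangerous"
    if tool.startswith("terms-"):
        return "moderate"
    return "safe"
```

With the severe check first, `classify("terms-delete")` returns `"dangerous"`, whereas `classify_shadowed("terms-delete")` returns `"moderate"`.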

Scheduled skills ran via WP-Cron with no current user and no approver, so (combined with __return_true abilities) they executed with no capability enforcement. They now run as the skill author, are skipped unless the author is an administrator, require manage_options to set a schedule, and surface tools left awaiting approval instead of silently doing nothing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
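The scheduling rules in this commit message can be sketched in Python as below. This is a simplified model under stated assumptions, not the plugin's code: the `User` type, `can_schedule`, `run_scheduled_skill`, the `run_as` callback, and the status strings are all hypothetical. Only the `manage_options` capability name and the described behaviour come from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    id: int
    is_admin: bool
    capabilities: frozenset


def can_schedule(user: User) -> bool:
    """Setting a schedule requires the manage_options capability."""
    return "manage_options" in user.capabilities


def run_scheduled_skill(author, run_as):
    """Run a skill as its author, or skip it.

    author: the skill author, or None if the account no longer exists.
    run_as: callable that executes the skill as the given user and
            returns the names of tools left awaiting approval.
    """
    if author is None or not author.is_admin:
        return {"status": "skipped", "pending": []}
    pending = list(run_as(author))
    # Surface gated tools instead of silently doing nothing.
    status = "awaiting_approval" if pending else "completed"
    return {"status": status, "pending": pending}
```

Running as the author gives capability checks a real user to evaluate, and returning the pending list makes a run that stopped at an approval gate visible to whoever reviews it.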

assistant__memory-save content is injected into every future system prompt. Cap the total entry count (filterable, matching the load limit) to bound prompt bloat and the prompt-injection surface; per-entry length limits already existed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
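A minimal Python sketch of an entry-count cap like the one described here. The default of 50, the per-entry length of 500, the function name, and the decision to reject a save at the cap (rather than evict an older entry) are assumptions; the commit message does not state which behaviour or values the plugin uses.

```python
MAX_ENTRIES = 50        # hypothetical default; the real limit is filterable
MAX_ENTRY_LENGTH = 500  # hypothetical; per-entry limits already existed


def save_memory(entries, text, max_entries=MAX_ENTRIES):
    """Return a new entry list with text appended, enforcing both limits."""
    if len(text) > MAX_ENTRY_LENGTH:
        raise ValueError("memory entry too long")
    if len(entries) >= max_entries:
        # Assumed behaviour: reject rather than evict an older entry.
        raise ValueError("memory is full; delete an entry first")
    return entries + [text]
```

Bounding both dimensions keeps the worst-case system prompt contribution at roughly `max_entries` times the per-entry length.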

oxyc and others added 3 commits May 26, 2026 12:04

oxyc mentioned this pull request May 26, 2026

Failsafe write guards for AI-driven abilities generoi/gds-mcp#18

Merged

oxyc merged commit 9ca15d6 into main May 26, 2026
3 checks passed

oxyc merged 3 commits into main from failsafe-write-guards
