Failsafe guards: approval gating, capability, token budget, hardened scheduling #25
Merged
…ogging
DestructiveGuard forces human approval for writes that are irreversible in practice but not flagged destructive: Gravity Forms field changes, taxonomy term edits, feed edits, redirect creation, memory saves, and Polylang translation relinks. On approval it injects confirm_destructive so the matching data-layer guards in gds-mcp let the change through.
ToolRestrictor: fix a shadowed branch so terms-delete classifies as dangerous.
System prompt: content-only scope + bug-report escalation, plus warnings that forms/terms/translation edits are unrevisioned.
TokenBudget: per-user daily token cap (filterable) as a runaway-cost backstop.
Audit-log approvals and denials in the approval path: approved (gated, destructive) actions execute there, not in MessageLoop, so they were previously never logged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
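The gate-then-inject flow in this commit can be illustrated with a small, language-neutral sketch. It is written in Python purely for illustration and is not the plugin's code: the function names, the `fields` and `action` argument keys, and the stand-in data layer are all assumptions. Only the tool names and the `confirm_destructive` flag come from this PR.

```python
# Hypothetical sketch of approval gating; not the plugin's implementation.

# Tools the PR description lists as always gated.
ALWAYS_GATED = {
    "terms-update", "feeds-update", "translations-link",
    "strings-update", "translations-machine", "memory-save",
}


def needs_approval(tool, args):
    """Return True when a call should wait for a human decision."""
    if tool in ALWAYS_GATED:
        return True
    if tool == "forms-update":
        # Assumption: a "fields" key signals a field-structure rewrite.
        return "fields" in args
    if tool == "redirects-manage":
        # Assumption: the create operation is selected by an "action" key.
        return args.get("action") == "create"
    return False


def guarded_data_layer(tool, args):
    """Stand-in for the gds-mcp guard: refuses gated writes without the flag."""
    if needs_approval(tool, args) and not args.get("confirm_destructive"):
        raise PermissionError(tool + " requires confirm_destructive")
    return {"status": "ok", "tool": tool}


def execute(tool, args, approved, data_layer):
    """Gate the call; inject confirm_destructive only after approval."""
    if needs_approval(tool, args):
        if not approved:
            return {"status": "awaiting_approval", "tool": tool}
        args = {**args, "confirm_destructive": True}
    return data_layer(tool, args)
```

Because the flag is injected only on the approved path, a caller that reaches the data layer directly (without going through `execute`) is still refused, which is the property the commit message describes for non-chat callers.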
Scheduled skills ran via WP-Cron with no current user and no approver, so (combined with __return_true abilities) they executed with no capability enforcement. They now run as the skill author, are skipped unless the author is an administrator, require manage_options to set a schedule, and surface tools left awaiting approval instead of silently doing nothing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
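The scheduling rules in this commit can be sketched as follows. This is an illustrative Python model rather than the plugin's code: the `User` and `Skill` classes, the `runner` callback, and the returned status strings are hypothetical. Only the rules themselves (run as the author, skip non-administrators, require `manage_options` to schedule, surface pending approvals) are taken from the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    id: int
    roles: set = field(default_factory=set)
    capabilities: set = field(default_factory=set)


@dataclass
class Skill:
    name: str
    author_id: int


def can_set_schedule(user):
    """Only users holding manage_options may attach a schedule."""
    return "manage_options" in user.capabilities


def run_scheduled(skill, users, runner):
    """Run a skill as its author and report tools left awaiting approval.

    `users` maps user id -> User. `runner(skill, author)` is a hypothetical
    callback returning the names of tools that stopped for approval.
    """
    author = users.get(skill.author_id)
    if author is None or "administrator" not in author.roles:
        # No implicit privileges: skip instead of running without a user.
        return {"status": "skipped", "pending": []}
    pending = list(runner(skill, author))
    if pending:
        # Surface the pending tools rather than silently doing nothing.
        return {"status": "awaiting_approval", "pending": pending}
    return {"status": "completed", "pending": []}
```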
assistant__memory-save content is injected into every future system prompt. Cap the total entry count (filterable, matching the load limit) to bound prompt bloat and the prompt-injection surface; per-entry length limits already existed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
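A minimal sketch of the two limits mentioned here, assuming that a save past the cap is rejected (the commit message does not say whether the real code rejects or evicts old entries). The limit values and the function name are hypothetical placeholders, written in Python for illustration only.

```python
MAX_ENTRIES = 50         # hypothetical value; the PR only says the cap is filterable
MAX_ENTRY_LENGTH = 500   # hypothetical value for the pre-existing per-entry limit


def save_memory(entries, content, max_entries=MAX_ENTRIES,
                max_length=MAX_ENTRY_LENGTH):
    """Return a new entry list with `content` appended, or raise ValueError.

    Assumption: saves beyond either limit are rejected rather than trimmed.
    """
    if len(content) > max_length:
        raise ValueError("entry exceeds the per-entry length limit")
    if len(entries) >= max_entries:
        raise ValueError("memory entry cap reached")
    return entries + [content]
```

Bounding both the count and the per-entry length puts an upper limit on how much saved text can reach the system prompt.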
Summary
Production-safety failsafes for the AI assistant. The principle: content is safe because it's recoverable (revisions + trash); this PR brings comparable guardrails to operations that aren't recoverable, and adds containment + observability. Companion to generoi/gds-mcp#18 (the data-layer guards these rely on).
Human-approval gating for irreversible writes (DestructiveGuard)
Approval previously fired only for dangerous tools or moderate + destructive-annotated ones. Edits that are irreversible in practice but flagged non-destructive slipped through and executed silently. Now gated:
- forms-update when it rewrites the field structure
- terms-update, feeds-update, redirects-manage (create)
- translations-link, strings-update, translations-machine
- memory-save (it persists into every future system prompt; anti-poisoning)
On approval it injects confirm_destructive, which the matching gds-mcp guards consume, so an approved change passes the data layer while non-chat callers stay protected.
Other layers
- System prompt: content-only scope; bugs are escalated via /report-bug instead of papering over them with content edits.
- Scheduled skills: run as the skill author, are skipped unless the author is an administrator, require manage_options to schedule, and surface tools left awaiting approval instead of silently no-op-ing.
- Per-user daily token budget (TokenBudget): runaway-cost backstop alongside the per-request rate limiter. Filterable / env-configurable; 0 disables.
- Audit logging: approved actions execute in the approval path, not in MessageLoop, so they were never being logged. Denials now leave a trace too.
- ToolRestrictor: fixed a shadowed branch so terms-delete classifies correctly (dangerous).
Tests
DestructiveGuardTest, TokenBudgetTest (new); extended ToolRestrictorTest, MemoryToolProviderTest, SkillSchedulerTest, ChatEndpointApprovalTest.
🤖 Generated with Claude Code
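The per-user daily token budget listed under "Other layers" can be modelled with a short sketch. This Python class is a hypothetical illustration, not the plugin's TokenBudget: the in-memory store, method names, and explicit `today` parameter are assumptions. It reflects only what the description states: a per-user daily cap, with 0 disabling the check.

```python
from collections import defaultdict
from datetime import date


class TokenBudget:
    """Hypothetical in-memory model of a per-user daily token cap."""

    def __init__(self, daily_limit):
        self.daily_limit = daily_limit    # 0 disables the budget entirely
        self._used = defaultdict(int)     # (user_id, day) -> tokens spent

    def record(self, user_id, tokens, today):
        """Add tokens spent by a user on the given day."""
        self._used[(user_id, today)] += tokens

    def allows(self, user_id, today):
        """Return True while the user is under today's cap (or it is disabled)."""
        if self.daily_limit == 0:
            return True
        return self._used[(user_id, today)] < self.daily_limit


budget = TokenBudget(daily_limit=1000)
day = date(2025, 1, 1)
budget.record(user_id=1, tokens=1200, today=day)
print(budget.allows(1, day))  # False: user 1 has spent more than the cap
```

Keying usage by (user, day) means the budget resets on its own when the date changes and one user's spend never affects another's.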