feat(expert): human-in-the-loop tool permissions (#421) #7639

Open
andypalmi wants to merge 34 commits into feat/408-expert-plan-mode from feat/421-expert-tool-permissions

@andypalmi andypalmi commented Jun 30, 2026

Contributor

Human-in-the-loop tool permissions for the Expert

Implements per-tool human-in-the-loop permissions for the Expert's flow-building tools, in the immersive editor, as described in FlowFuse/product#421. The builder (and their team role) controls which flow-building actions the Expert may run, which need approval, and which are off limits, so it never makes a change they would not have allowed.

Stacked on #7635 (feat/408-expert-plan-mode), which is the base of this PR and should merge first.

What it does

  • Inline approval card in chat when a tool's policy is Ask: friendly tool name, action type (Read / Write / Delete) and the concrete call parameters as prettified JSON, with Allow / Always allow / Deny / Always deny. "Always allow" and "Always deny" apply for the rest of the current chat and reset on Start Over and on refresh; after a choice the card shows exactly what was picked and collapses the payload. The agent pauses on the round-trip with no session timeout, however long the user takes; the chat stop button cancels it (treated as denied).
  • Per-team settings (in the Expert settings dialog): permissions are saved per team. Each tool group (flow-building, platform) has its own always-visible default permissions — a per-action-type default (Always allow / Ask / Always deny) for read, write and delete — and collapses its individual-tool overrides behind an accordion. The policy control is a fast three-button toggle rather than a dropdown.
  • Per-tool overrides that stay put: a permission set on an individual tool overrides its type default and keeps that setting until reset. Each type default shows an "N set individually" count of the tools in that scope carrying their own saved permission, and a Reset action beside it returns those tools to the default. Session-only grants are excluded from the count — they appear per-tool and reset on their own.
  • Make permanent: a tool granted only for the current chat shows a "Make permanent" action to save that choice for the team.
  • Section ordering by context: flow-building tools lead in the immersive editor; platform tools lead in the app.
  • Role inheritance, fail-closed: read-only team members cannot enable or trigger write/delete tools and see why; the agent also fails closed server-side.
  • Version gating: each tool carries a min/max nr-assistant version. Tools render as available, "update required", or deprecated against the instance's nr-assistant version. Versioned variants (e.g. Manage Groups v1/v2) collapse into one row and resolve to the in-range variant, nudging an update to the newest variant's min version when behind.
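The version gating in the last bullet could be sketched roughly as below. This is an illustration under stated assumptions, not the PR's code: `compareVersions`, `availabilityFor`, `resolveVariant`, the `minVersion` / `maxVersion` field names, the tool names and the version numbers are all hypothetical (the store's real helper is `toolAvailabilityFor`, whose signature is not shown here). It assumes plain `major.minor.patch` versions without pre-release tags, and that a version above the max window is what "deprecated" means.

```javascript
// Hypothetical sketch of min/max version gating; all names are assumptions.

// Compare two "major.minor.patch" strings; returns -1, 0 or 1.
function compareVersions (a, b) {
    const pa = a.split('.').map(Number)
    const pb = b.split('.').map(Number)
    for (let i = 0; i < 3; i++) {
        const diff = (pa[i] || 0) - (pb[i] || 0)
        if (diff !== 0) return diff < 0 ? -1 : 1
    }
    return 0
}

// Classify one tool against the instance's nr-assistant version.
function availabilityFor (tool, installed) {
    if (tool.minVersion && compareVersions(installed, tool.minVersion) < 0) return 'update-required'
    if (tool.maxVersion && compareVersions(installed, tool.maxVersion) > 0) return 'deprecated'
    return 'available'
}

// Collapse the versioned variants of one tool family into a single row.
function resolveVariant (variants, installed) {
    const minOf = v => v.minVersion || '0.0.0'
    const inRange = variants.find(v => availabilityFor(v, installed) === 'available')
    // "Newest" is taken to be the variant with the highest min version.
    const newest = variants.reduce((a, b) => compareVersions(minOf(a), minOf(b)) >= 0 ? a : b)
    const behind = compareVersions(installed, minOf(newest)) < 0
    return {
        active: inRange || null,
        updateTo: behind ? minOf(newest) : null
    }
}

const manageGroups = [
    { name: 'manage-groups-v1', minVersion: '1.0.0', maxVersion: '1.9.9' },
    { name: 'manage-groups-v2', minVersion: '2.0.0' }
]
const row = resolveVariant(manageGroups, '1.4.0')
console.log(row.active.name, row.updateTo)
```

With an instance on `1.4.0`, the row resolves to the v1 variant and carries a hint to update to `2.0.0`, the newest variant's min version.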

Architecture

  • The agent decides policy at the toolsNode seam (sibling of the plan-mode gate): role check first, then per-tool policy. allow runs, deny feeds the denial back to the model so it adapts and explains, ask publishes expert:tool-approval and awaits the browser's decision.
  • The flow-building tool catalog is served over the agent's GET /mcp/flow-tools endpoint (friendly name + scope + version window only). Forge exposes GET /api/v1/expert/mcp/tools, which proxies that endpoint and returns the merged catalog: FlowFuse platform tools are curated into the same array (tagged as a platform group) — wired and commented out until the platform-tool work is merged, at which point it's a one-line switch. Every chat response carries a hash of the flow-building catalog; the browser refetches only when the hash diverges, so it stays correct across rolling deploys where instances can be on different versions.
  • Saved per-team choices and per-chat session grants live in the existing product-assistant / product-expert Pinia stores.
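The gate order in the first bullet (role check first, then per-tool policy, then allow / deny / ask) might look something like this minimal sketch. None of these names come from the PR: `decide`, `gateToolCall`, `canWrite`, `policyFor`, `requestApproval` and `run` are hypothetical, with `requestApproval` standing in for publishing `expert:tool-approval` and awaiting the browser's decision. Treating an unrecognised policy as a denial is an assumption about what "fail closed" means here.

```javascript
// Hypothetical sketch of the gate decision; names are assumptions, not the agent's API.
const MUTATING_CLASSES = new Set(['write', 'delete'])

// Returns 'allow', 'deny' or 'ask' for one tool call.
function decide ({ toolClass, canWrite, policy }) {
    // 1. Role check first: read-only members can never run write/delete tools.
    if (MUTATING_CLASSES.has(toolClass) && !canWrite) return 'deny'
    // 2. Then the per-tool policy. Anything unrecognised fails closed.
    if (policy === 'allow' || policy === 'ask') return policy
    return 'deny'
}

async function gateToolCall (call, ctx) {
    const decision = decide({
        toolClass: call.toolClass,
        canWrite: ctx.canWrite,
        policy: ctx.policyFor(call)
    })
    if (decision === 'allow') return ctx.run(call)
    if (decision === 'ask') {
        // Stand-in for the approval round-trip to the browser.
        const approved = await ctx.requestApproval(call)
        if (approved) return ctx.run(call)
    }
    // Feed the denial back to the model so it can adapt and explain.
    return { denied: true, message: `The user has not permitted "${call.name}".` }
}
```

Keeping `decide` pure makes the fail-closed rule easy to unit test without any transport in place.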

UI

The settings panel follows existing FlowFuse patterns: FormHeading for section titles, ff-data-table for the defaults and tool lists, ff-accordion to collapse the per-tool detail, and the shared three-button toggle for each policy control, so each tool name lines up with its own control across the row border.

Out of scope (follow-ups)

  • Admin-configurable team-wide default policy (DB migration + admin UI + server enforcement).
  • Enabling platform (non-flow-building) tools in the catalog, once the platform-tool work is merged into the agent. The forge endpoint has the curation wired and commented, ready to switch on.

Testing

  • Build + color/eslint lint green.
  • Automated unit tests:
    • forge/ee/routes/expert/index_spec.js (new MCP tools Endpoint block): GET /mcp/tools auth (401 for instance/device tokens), team-access (404 for non-members), missing teamId (400 from the querystring schema), the flow-tools catalog + hash proxy (asserting the upstream /mcp/flow-tools URL and service token), the empty-response defaults (catalog: [], hash: null), and upstream error-status propagation.
    • frontend/src/stores/product-assistant.spec.js (new tool-permissions block): the permission-resolution engine, i.e. the classOf/groupOf helpers, per-team class defaults, saved vs. session policy resolution, resolvedToolPermissions, version gating (toolAvailabilityFor), catalog/preference/override mutations, resetGroupClassPreferences, promoteSessionOverride, and the pending-approval registry.
    • frontend/src/stores/product-expert-tool-permissions.spec.js (new): catalog fetch (success / no-team / error) and the approval round-trip (session short-circuit, resolve, always-allow, always-deny, cancel).
  • Manual: catalog populates on opening the immersive editor; Ask shows the card and pauses with no timeout (Allow applies to canvas, Deny explains gracefully); "Always allow"/"Always deny" apply for the chat and reset on Start Over / refresh; "Make permanent" saves the choice per team; a per-tool override holds against changing the type default, the "N set individually" count reflects it, and Reset returns those tools to the default; per-team settings stay separate across team switches and survive reload; read-only role sees write/delete disabled with a reason and cannot trigger them; leaving immersive hides the panel and stops sending permissions; chat stop while a card is open recovers cleanly.

Requires matching agent-side changes.

Refs FlowFuse/product#421

Screenshots

Expert Settings with Tools permissions

Screenshot 2026-07-01 at 17 24 10

Counter of how many tools have different permissions than their scope's permission

Screenshot 2026-07-01 at 17 24 32

Permissions reset and change behaviour per scope

Screen.Recording.2026-07-01.at.17.24.46.mov

Option to save a permission set for the current session

Screenshot 2026-07-01 at 17 32 07
Screen.Recording.2026-07-01.at.17.32.09.mov

Approval Cards

Screen.Recording.2026-07-01.at.17.30.45.mov
Screenshot 2026-07-01 at 17 31 43

cstns and others added 6 commits June 22, 2026 19:28
Add per-tool approval for the Expert's flow-building tools in the immersive
editor. The agent gates each tool call at the toolsNode seam by class
(read/write/delete) and per-tool preference; write/delete default to Ask and
surface an inline approval card (Allow / Always allow / Never) that holds the
call open with no session timeout, while read defaults to allow.

- Catalog delivered over HTTP (GET /api/v1/expert/mcp/tools), curated to
  friendly names so raw tool identifiers never reach the browser; a per-response
  hash triggers a background refetch when the catalog drifts.
- HITL state consolidated into the product-assistant store (defaults,
  per-tool preferences, pending-approval map) with SemVer version gating.
- Settings panel groups versioned tool variants into one family and points
  update hints at the newest variant's required version.
- Role inheritance is fail-closed: read-only members cannot enable or trigger
  write/delete tools and are shown why.
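
The hash-triggered background refetch mentioned in the first bullet of this commit message could be sketched as follows. This is only an illustration: `createCatalogCache`, `onResponseHash` and the `{ catalog, hash }` response shape passed through `fetchCatalog` are assumptions, not the store's actual API.

```javascript
// Hypothetical client-side cache; names and response shape are assumptions.
function createCatalogCache (fetchCatalog) {
    let cachedHash = null
    let catalog = []
    let inFlight = null

    async function refresh () {
        const res = await fetchCatalog()
        catalog = Array.isArray(res.catalog) ? res.catalog : []
        cachedHash = res.hash ?? null
    }

    return {
        get catalog () { return catalog },
        // Call with the hash carried on each chat response.
        onResponseHash (hash) {
            // No hash, or the same hash as the cached catalog: nothing to do.
            if (!hash || hash === cachedHash) return null
            // Share one request if several responses arrive while refetching.
            if (!inFlight) {
                inFlight = refresh().finally(() => { inFlight = null })
            }
            return inFlight
        }
    }
}
```

Because the cached hash is taken from the refetched catalog rather than from the chat response, a response served by an instance on a different version simply triggers another refetch.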
Use FormHeading for the section titles and ff-data-table for both the
action-type defaults and the flow-building tool list, replacing the
bespoke section/group styling and the non-standard uppercase scope
headers. Bordered table rows pair each tool with its permission control
across the row rather than leaving them to float across whitespace; tool
scope moves into a Type column.

The approval card no longer sends or renders a tool summary; the tool
name, scope and call parameters describe the action.
@andypalmi andypalmi force-pushed the feat/421-expert-tool-permissions branch from 8a97b8e to bdb36fb Compare June 30, 2026 13:46
@codecov

codecov Bot commented Jun 30, 2026


Codecov Report

❌ Patch coverage is 91.66667% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 76.59%. Comparing base (e857547) to head (50de406).

Files with missing lines Patch % Lines
forge/ee/routes/expert/index.js 91.66% 1 Missing ⚠️
Additional details and impacted files
@@                    Coverage Diff                     @@
##           feat/408-expert-plan-mode    #7639   +/-   ##
==========================================================
  Coverage                      76.58%   76.59%           
==========================================================
  Files                            413      413           
  Lines                          21849    21856    +7     
  Branches                        5760     5763    +3     
==========================================================
+ Hits                           16733    16740    +7     
  Misses                          5116     5116           
Flag Coverage Δ
backend 76.59% <91.66%> (+<0.01%) ⬆️


…nd platform tools

Fetch the tool catalog when the Expert panel mounts (not only in the
editor) so the permissions settings render wherever the Expert is.

Split the settings into a Flow Building Tools section, with its own
per-action-type default permissions, and a separate FlowFuse Platform
Tools section (a placeholder until those tools ship, with TODOs marking
where they get mapped in). Flow-building tools are listed everywhere but
noted as usable only from an instance editor.

- Show plain Read / Write / Delete scope instead of phrases like Read only
- Stop the Setup Guide badge rendering above the approval card
- Disable the action buttons as soon as a choice is made
andypalmi added 2 commits June 30, 2026 16:35
…rmissions

# Conflicts:
#	frontend/src/components/expert/components/ExpertChatInput.vue
Raise the conversation-history expiry from 28 to 30 minutes (warning at
27), so the human-in-the-loop tool-approval wait, which is bounded by the
session lifetime, has the full 30-minute window the agent now allows.

Revert the 30-minute expiry back to 28 (warning at 25). The agent clears
old transactions/context at 30 minutes, so the chat must expire a moment
earlier to avoid referencing backend history that has already been purged.
The tool-approval wait is bounded by this 28-minute session lifetime.

Replace the flat key/value list on the approval card with a prettified
JSON view of the call payload. Adds a small single-value JsonViewer that
reuses the prettify + word-wrap + horizontal-scroll presentation of the
snapshot comparison diff panel, without its two-sided diff machinery. The
payload is prettified by default; an ff-button Wrap toggle appears for
long lines and reflects its on/off state rather than changing its label.
andypalmi added 3 commits July 1, 2026 00:31
Harden the JSON payload viewer against malformed input and collapse the
payload once a decision is made.

- JsonViewer stringify can no longer throw: circular refs, BigInt and any
  other non-serialisable value fall back to a circular-safe pass, then to a
  plain coercion, so a bad payload never breaks the approval card.
- Add a live collapse toggle to JsonViewer (collapsible + defaultCollapsed).
  The header caret expands/collapses at any time; the parent can seed the
  initial state.
- ToolApprovalCard collapses the payload once the call is allowed, always
  allowed or denied (local decision or round-tripped status), while leaving
  the toggle live so the user can re-expand it.

Drop the circular-safe/BigInt fallback machinery from the payload viewer.
Tool-call params are plain JSON; if they somehow can't be serialised, show
a simple 'Could not display the payload.' message rather than placeholder
markers.
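
The simplified behaviour described in this commit (prettify the payload, or show a plain message when it cannot be serialised) might be sketched like this. `prettifyPayload` and its return shape are hypothetical; only the fallback message text is taken from the commit message.

```javascript
// Hypothetical helper, not the JsonViewer component's actual code.
const FALLBACK_MESSAGE = 'Could not display the payload.'

function prettifyPayload (payload) {
    try {
        const text = JSON.stringify(payload, null, 2)
        // JSON.stringify returns undefined for values such as undefined or a bare function.
        if (typeof text !== 'string') return { ok: false, text: FALLBACK_MESSAGE }
        return { ok: true, text }
    } catch (err) {
        // Circular references and BigInt values make JSON.stringify throw a TypeError.
        return { ok: false, text: FALLBACK_MESSAGE }
    }
}

const circular = { name: 'loop' }
circular.self = circular

console.log(prettifyPayload({ nodeId: 'abc', wires: [[]] }).text)
console.log(prettifyPayload(circular).text)
```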
- Replace the unicode caret on the JSON payload collapse toggle with the
  standard rotating ChevronRightIcon (matches ToolCallItem section headers).
- Add a 'bare' prop to MessageBubble that strips the bubble background and
  padding, and use it for tool-approval answers so the approval card renders
  as a standalone card instead of a card nested inside an AI bubble.
Co-authored-by: ppawlowski <piotr@flowfuse.com>
…rmissions

# Conflicts:
#	frontend/src/components/expert/components/ExpertChatInput.vue
#	frontend/src/components/expert/components/messages/components/AnswerWrapper.vue
…ttings redesign (#421)

- Scope saved tool permissions per team (defaults and per-tool overrides),
  replacing the single global store.
- Make approval-card 'Always allow' / 'Always deny' apply to the current chat
  only: they reset on Start Over and refresh, and the card shows exactly what
  was chosen. Add an 'Always deny' button. A session grant can be made permanent
  from the settings dialog.
- Redesign the settings panel: always-visible read/write/delete defaults, the
  per-tool list collapsed in an accordion, and a three-button toggle in place of
  the dropdown ('Always allow' / 'Ask' / 'Always deny'). Order flow-building
  first in the editor and platform first in the app. Surface any session grant
  with a 'Make permanent' action.
- Add a size prop to ToggleButtonGroup so dense contexts set button size instead
  of reaching into it with :deep.
- Rename the agent flow-tools endpoint to /mcp/flow-tools; the forge route
  returns the merged catalog and is prepared to fold in platform tools (wired
  and disabled).
- Match the approval card width to the tool-calls summary strip.
…e sizing

- Split read/write/delete defaults per tool group (flow-building and platform)
- Honour a session Always allow/deny without re-prompting later in the same loop
- Show the pressed decision on the approval card immediately
- Fix policy toggle label overflow and widen the settings dialog
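
One way to picture how a tool's effective permission could be resolved from the pieces in this list is the sketch below. The precedence shown (session grant, then saved per-tool override, then the group's per-class default) and the final `'ask'` fallback are assumptions, as are `resolvePolicy`, the state shape and the tool names; the store's real resolver is not shown in this conversation.

```javascript
// Hypothetical sketch; state shape, names and precedence are assumptions.
function resolvePolicy (tool, state) {
    const { sessionOverrides = {}, savedOverrides = {}, classDefaults = {} } = state
    // 1. A grant made from an approval card, for the current chat only.
    if (Object.hasOwn(sessionOverrides, tool.name)) {
        return { policy: sessionOverrides[tool.name], source: 'session' }
    }
    // 2. A per-tool permission saved for the team.
    if (Object.hasOwn(savedOverrides, tool.name)) {
        return { policy: savedOverrides[tool.name], source: 'tool' }
    }
    // 3. The tool group's default for this action type.
    const groupDefaults = classDefaults[tool.group] || {}
    return { policy: groupDefaults[tool.toolClass] || 'ask', source: 'default' }
}

const state = {
    sessionOverrides: {},
    savedOverrides: { 'add-nodes': 'allow' },
    classDefaults: {
        'flow-building': { read: 'allow', write: 'ask', delete: 'deny' },
        platform: { read: 'allow', write: 'ask', delete: 'ask' }
    }
}
const addNodes = { name: 'add-nodes', group: 'flow-building', toolClass: 'write' }
console.log(resolvePolicy(addNodes, state))
```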
…ly-set tools

Add a Reset action beside each read/write/delete class default that clears
the saved per-tool preferences for that scope, so tools detached from the
default can be returned to it. A "N set individually" counter shows how many
tools in the scope carry their own saved permission (session-only grants are
excluded — they already appear per-tool and reset on their own). The intro now
documents that a per-tool setting overrides its type default until reset.
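
As a rough sketch of the counter and Reset behaviour described in this commit, assuming saved per-tool permissions are held in a plain object keyed by tool name: `countSetIndividually`, `resetScope`, the catalog fields and the tool names are hypothetical. Session-only grants are assumed to live in a separate map, which is why they never enter the count.

```javascript
// Hypothetical helpers; the store's real getters and actions may differ.
function inScope (tool, group, toolClass) {
    return tool.group === group && tool.toolClass === toolClass
}

// "N set individually": tools in the scope carrying their own saved permission.
function countSetIndividually (catalog, savedOverrides, group, toolClass) {
    return catalog.filter(t => inScope(t, group, toolClass) && Object.hasOwn(savedOverrides, t.name)).length
}

// Reset: drop the saved per-tool permissions for one scope, leaving others untouched.
function resetScope (catalog, savedOverrides, group, toolClass) {
    const next = { ...savedOverrides }
    for (const t of catalog) {
        if (inScope(t, group, toolClass)) delete next[t.name]
    }
    return next
}

const catalog = [
    { name: 'add-nodes', group: 'flow-building', toolClass: 'write' },
    { name: 'update-node', group: 'flow-building', toolClass: 'write' },
    { name: 'remove-nodes', group: 'flow-building', toolClass: 'delete' }
]
const saved = { 'add-nodes': 'allow', 'remove-nodes': 'deny' }

console.log(countSetIndividually(catalog, saved, 'flow-building', 'write'))
console.log(resetScope(catalog, saved, 'flow-building', 'write'))
```

Returning a new object from `resetScope` rather than mutating in place keeps the change easy to assign back into a store in one step.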
@andypalmi andypalmi self-assigned this Jul 1, 2026
Comment thread frontend/src/stores/product-expert.js Outdated
for (const m of this._agentStore.messages) {
    if (!Array.isArray(m.answer)) continue
    for (const a of m.answer) {
        if (a.kind === 'tool-approval' && a.status === 'pending') a.status = 'denied'

Contributor

Couldn't test this without the agent side, but reading the code: on Stop, cancelPendingToolApprovals sets status='denied' on the store answer, but the card renders a shallow copy of it (useStreamingList({ shallow: true })), so its status prop never updates. Worth confirming, but it looks like Stop won't resolve an open approval card.

Contributor Author

Good catch, confirmed. The card renders a detached streaming copy of the answer (AiMessage uses useStreamingList with shallow: true), so writing the status onto the store message never reached it. On Stop the buttons stayed live.

Fixed by recording the outcome in a reactive per-id map (toolApprovalStatuses) on the product-assistant store. AnswerWrapper now feeds the card its status from that map, so an external resolution (Stop / Start Over) updates a card the user never pressed. localStatus stays for instant feedback on the user's own press. Added store and product-expert tests covering the denied-on-cancel path.


// A new chat drops the per-session tool grants ("Always allow/deny for this chat").
useProductAssistantStore().clearSessionToolOverrides()

Contributor

Do we want a cancelPendingToolApprovals() here too?

Contributor Author

Yes. startOver now calls cancelPendingToolApprovals() first, so any approval still awaiting a decision resolves (as denied) and the agent's paused tool call unblocks instead of hanging on a message we are about to drop. It also clears toolApprovalStatuses alongside the session overrides.

@n-lark n-lark left a comment

Contributor

Hey, I can't test the approve/deny part of this in the chat because the staging env doesn't have PostHog synced up. The permissions page under the settings UI looks fine to me, but I don't feel comfortable approving this since I can't test it and am unfamiliar with the feature. I'd recommend @cstns or @Steve-Mcl take a look.

…421)

Add automated coverage for the human-in-the-loop tool-permission work:

- forge GET /mcp/tools: auth (401 instance/device), team-access (404),
  missing teamId (400), catalog+hash proxy, empty-response defaults and
  upstream error propagation.
- product-assistant store: permission-resolution engine (class/group
  helpers, per-team defaults, per-tool and session overrides, resolved
  permissions, version gating, and the pending-approval registry).
- product-expert store: catalog fetch and the approval round-trip
  (session short-circuit, resolve/always-allow/always-deny, cancel).

The approval card renders a detached streaming copy of its answer, so
writing a resolved status onto the store message never reached it. On
chat stop the card stayed on its Allow/Deny buttons even though the
pending call had been denied.

Record approval outcomes in a reactive per-id map on the product-assistant
store and have AnswerWrapper feed the card its status from that map, so an
external resolution (chat stop / Start Over) updates a card the user never
pressed. Start Over now also cancels open approvals and clears the map.
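
The stale-copy problem and the per-id map fix described in this commit can be illustrated in plain JavaScript. This is a simplified stand-in, not the PR's code: a spread copy substitutes for the detached streaming copy, a plain `Map` substitutes for the reactive store map, and `displayedStatus`, the `id` field and the order in which the three status sources are consulted are assumptions. Only `toolApprovalStatuses` and `cancelPendingToolApprovals` are names from the conversation.

```javascript
// Plain-JS stand-in: the real map lives in a Pinia store and is reactive.
const toolApprovalStatuses = new Map()

// Mark every still-pending approval as denied (Stop / Start Over).
function cancelPendingToolApprovals (answers) {
    for (const a of answers) {
        if (a.kind === 'tool-approval' && a.status === 'pending') {
            a.status = 'denied'
            toolApprovalStatuses.set(a.id, 'denied')
        }
    }
}

// What the card displays: the user's own press, then the shared map,
// then whatever status the (possibly stale) copy carries.
function displayedStatus (cardAnswer, localStatus) {
    return localStatus || toolApprovalStatuses.get(cardAnswer.id) || cardAnswer.status
}

const storeAnswer = { id: 'approval-1', kind: 'tool-approval', status: 'pending' }
const cardCopy = { ...storeAnswer } // detached copy, as the card receives it

cancelPendingToolApprovals([storeAnswer])
console.log(cardCopy.status) // 'pending': the copy never saw the write
console.log(displayedStatus(cardCopy, null)) // 'denied': resolved through the map
```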
@andypalmi

Contributor Author

Thanks for the review. Pushed a fix for the Stop issue you spotted.

Root cause: the approval card renders a detached streaming copy of its answer (AiMessage uses useStreamingList with shallow: true), so a status written onto the store message never reached the card. Clicking Allow/Deny worked only because the card tracks its own localStatus; the external Stop path had no way in, so the buttons stayed live.

Fix: approval outcomes are now recorded in a reactive per-id map (toolApprovalStatuses) on the product-assistant store, and AnswerWrapper feeds the card its status from that map. That covers external resolutions, Stop and Start Over, on a card the user never pressed. Start Over also cancels open approvals first so the paused tool call unblocks. This is in-memory session state only, not persisted, same lifecycle as the session overrides. Added store and product-expert tests for the denied-on-cancel path.

On testing: understood you cannot exercise approve/deny on staging without the agent side synced. @cstns or @Steve-Mcl, a second look would be welcome given the reactivity change.

cstns and others added 4 commits July 1, 2026 22:53
#7598)

Co-authored-by: Steve-Mcl <sdmclaughlin@gmail.com>
Co-authored-by: Stephen McLaughlin <44235289+Steve-Mcl@users.noreply.github.com>
Co-authored-by: Andrea Palmieri <76187074+andypalmi@users.noreply.github.com>
Co-authored-by: andypalmi <andrea@flowfuse.com>
…omationsHandler integration

# Conflicts:
#	frontend/src/stores/context.js
…421)

Curate the FlowFuse platform automation tools from the handler singleton
(app.comms.platformAutomation) into the /mcp/tools catalog alongside the
flow-building tools, tagged group:'platform' so the UI routes them to their
own section with their own read/write/delete defaults. Read/write/delete
class is derived from each tool's MCP annotations; platform tools carry no
nr-assistant version window.
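
Deriving the read/write/delete class from MCP tool annotations might look like the sketch below. `readOnlyHint`, `destructiveHint` and `title` are annotation fields defined by the MCP specification; the mapping itself, the function names and the output shape are assumptions. In particular, the specification treats an absent `destructiveHint` as true, whereas this sketch only maps an explicit `true` to delete, which may not match what the PR does.

```javascript
// Hypothetical mapping from MCP tool annotations to a permission class.
function classFromAnnotations (annotations) {
    const a = annotations || {}
    if (a.readOnlyHint === true) return 'read'
    // Assumption: only an explicit destructive hint is treated as delete.
    if (a.destructiveHint === true) return 'delete'
    return 'write'
}

// Shape a platform tool for the merged catalog (field names are assumptions).
function curatePlatformTool (tool) {
    const annotations = tool.annotations || {}
    return {
        name: annotations.title || tool.name,
        group: 'platform',
        toolClass: classFromAnnotations(annotations),
        // Platform tools carry no nr-assistant version window.
        minVersion: null,
        maxVersion: null
    }
}

console.log(curatePlatformTool({
    name: 'list_instances',
    annotations: { title: 'List Instances', readOnlyHint: true }
}))
```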
Replace the mid-turn approval round-trip with a stateless defer/resume flow so
the agent never stays resident waiting on a human. When a turn needs approval it
returns the approval card(s) and ends; the browser collects the decisions and
sends them back in one resume message that continues the turn.

- product-expert: track the open approval batch, resume once every card is
  answered, transport-agnostic (MQTT push or awaited HTTP reply).
- product-assistant: drop the promise-based pending-approval registry; the store
  now only records per-card outcome statuses and session grants.
- tests updated for the batch model.
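
The batch model in this commit (collect a decision for every open card, then send one resume message) could be sketched as follows. `createApprovalBatch`, `decide`, the decision strings and the resume message shape are hypothetical; `sendResume` stands in for whichever transport is in use. Ignoring a second press on an already-answered card is an assumption, in line with the earlier change that disables the buttons once a choice is made.

```javascript
// Hypothetical sketch of the open-approval batch; not the store's actual code.
function createApprovalBatch (cardIds, sendResume) {
    const open = new Set(cardIds)
    const decisions = new Map()
    let resumed = false

    return {
        get resumed () { return resumed },
        // Record one card's decision; resume once every card is answered.
        decide (cardId, decision) {
            if (resumed || !open.has(cardId) || decisions.has(cardId)) return false
            decisions.set(cardId, decision)
            if (decisions.size === open.size) {
                resumed = true
                sendResume({ decisions: Object.fromEntries(decisions) })
            }
            return true
        }
    }
}

const sent = []
const batch = createApprovalBatch(['card-1', 'card-2'], msg => sent.push(msg))
batch.decide('card-1', 'allowed') // one card still open, nothing sent yet
batch.decide('card-2', 'denied') // last card answered, one resume message goes out
console.log(sent)
```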
5 participants