feat: add get_documentation tool to global mode ai chat by centdix · Pull Request #9489 · windmill-labs/windmill

centdix · 2026-06-09T14:33:06Z

Summary

The global-mode AI chat could not look up Windmill product documentation, while navigator (and ask/api) modes have a get_documentation tool for exactly that. This PR registers the same shared tool in global mode so the global assistant can answer "how does X work in Windmill" questions from authoritative platform docs instead of hallucinating.

Changes

Import the existing exported getDocumentationTool from navigator/core (the same tool already shared by ask and api modes — no duplication) and register it in the globalTools array, alongside the other informational tools.
Add a system-prompt rule describing when to use get_documentation, explicitly distinguishing it from the existing get_instructions tool (product docs vs. authoring guidance for a specific item type) so the model doesn't confuse the two.
Extend the existing globalTools presence test to assert get_documentation is registered.
Add three ai_evals boundary cases to the global suite guarding the get_documentation / get_instructions split (tool-use validation, skipJudge where no draft is produced):
- global-test27 — vocabulary trap: "difference between a resource and a variable?" must use get_documentation, must not call get_instructions(subject: resource).
- global-test28 — pure platform-mechanics question ("how does Windmill decide which worker runs a job?"): must use get_documentation, must not call get_instructions.
- global-test29 — authoring (create a script): must use get_instructions + write_script, must not call get_documentation.
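The registration described above amounts to reusing one shared tool object in a second array. The sketch below is self-contained and makes several assumptions. The `Tool` shape (a `def` with a `name`, plus an async `fn`) is a simplified stand-in for the real `Tool<{}>` type. The tool bodies are placeholders. `toolNames` is a hypothetical helper that mirrors what the presence test checks.

```typescript
// Hypothetical, simplified stand-in for the chat Tool<{}> type.
type Tool = {
  def: { name: string; description: string };
  fn: (args: Record<string, unknown>) => Promise<string>;
};

// Placeholder for the shared tool exported from navigator/core (body is illustrative only).
const getDocumentationTool: Tool = {
  def: { name: "get_documentation", description: "Look up Windmill product documentation" },
  fn: async (args) => `docs for: ${String(args.request ?? "")}`,
};

// Placeholder for an informational tool that global mode already had.
const getInstructionsTool: Tool = {
  def: { name: "get_instructions", description: "Authoring guidance for a specific item type" },
  fn: async (args) => `instructions for: ${String(args.subject ?? "")}`,
};

// Global mode appends the same object instead of redefining it, so there is no duplication.
const globalTools: Tool[] = [getInstructionsTool, getDocumentationTool];

// Hypothetical helper mirroring the presence test: collect the registered tool names.
function toolNames(tools: Tool[]): string[] {
  return tools.map((t) => t.def.name);
}
```

Because the array holds a reference to the shared object, any later fix to the navigator implementation also applies to global mode.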

AI eval verification

Ran the three new cases across four providers via ai_evals (bun run cli -- run global <ids> --models sonnet,gpt-5.5,gemini-3.1-pro-preview,deepseek-v4-pro). 12/12 pass.

Case	sonnet	gpt-5.5	gemini-3.1-pro	deepseek-v4-pro
test27 — resource vs variable (vocab trap)	✅	✅	✅	✅
test28 — worker dispatch	✅	✅	✅	✅
test29 — authoring	✅	✅	✅	✅

The multi-model run caught a false-negative in an earlier draft of test28: its original prompt ("how do retries/error handling work for a step") is authoring-adjacent, so gemini and deepseek correctly consulted docs but also pulled get_instructions — single-model (sonnet) testing had hidden this. test28 was reworked to a pure platform-mechanics question where forbidding get_instructions is fair, and the boundary now holds across all four providers.

Caveat: evals are single-run per model, so this is not a flakiness guarantee.

Test plan

npm run check passes (frontend type-check) — not yet run locally
globalTools unit test asserting get_documentation is present passes — added but the frontend vitest suite was not executed locally (validated by reasoning)
get_documentation / get_instructions boundary verified via ai_evals across sonnet / gpt-5.5 / gemini-3.1-pro / deepseek-v4-pro (12/12)
Manual: in the global-mode AI chat, ask a "how does X work in Windmill" question → assistant invokes get_documentation and answers from the returned docs (requires a configured AI provider + the /api/inkeep docs backend)

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cloudflare-workers-and-pages · 2026-06-09T14:36:24Z

Deploying windmill with Cloudflare Pages

Latest commit:	`588575d`
Status:	✅ Deploy successful!
Preview URL:	https://b269e74e.windmill.pages.dev
Branch Preview URL:	https://add-get-documentation-global.windmill.pages.dev



claude · 2026-06-09T15:13:07Z

Claude finished @centdix's task in 2m 31s.

PR Review

Good to merge — clean, minimal, well-tested addition. No blocking issues.

What I verified

Tool reuse & typing — getDocumentationTool is imported from navigator/core.ts (where it's already shared by ask/api modes) and is typed Tool<{}>, matching globalTools: Tool<{}>[]. No duplication, no type mismatch. ✅ (global/core.ts:75, navigator/core.ts:321)
No circular import — navigator/core.ts does not import from global/core.ts, so the new global → navigator import introduces no cycle. ✅
Context independence — the underlying getDocumentation() only does a fetch('/api/inkeep', …) with the request arg; it carries no navigator-specific page context, so it behaves identically in global mode. ✅ (navigator/core.ts:237)
Tool name consistency — def name get_documentation matches the unit-test assertion and all three eval cases. ✅
Eval schema — requiredToolsUsed / forbiddenToolsUsed / draftCountExactly / skipJudge are all fields already used in global.yaml, flow.yaml, and script.yaml, so the new cases are schema-valid. ✅
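To show what these fields guard, here is a minimal sketch of tool-use validation. The field names follow those listed above. The checking logic, the `RunResult` shape, and the example case are assumptions and do not reproduce the ai_evals runner. The example case is only in the spirit of global-test28: docs are required, instructions are forbidden, and no draft is expected.

```typescript
// Tool-use expectations; field names mirror global.yaml, semantics are assumed.
interface ToolUseExpectations {
  requiredToolsUsed?: string[];
  forbiddenToolsUsed?: string[];
  draftCountExactly?: number;
}

// Hypothetical summary of one model run.
interface RunResult {
  toolCalls: string[]; // names of tools the model invoked
  draftCount: number; // drafts produced during the run
}

// Returns a list of failure messages; an empty list means the case passes.
function checkToolUse(exp: ToolUseExpectations, run: RunResult): string[] {
  const failures: string[] = [];
  const used = new Set(run.toolCalls);
  for (const tool of exp.requiredToolsUsed ?? []) {
    if (!used.has(tool)) failures.push(`required tool not used: ${tool}`);
  }
  for (const tool of exp.forbiddenToolsUsed ?? []) {
    if (used.has(tool)) failures.push(`forbidden tool used: ${tool}`);
  }
  if (exp.draftCountExactly !== undefined && run.draftCount !== exp.draftCountExactly) {
    failures.push(`expected ${exp.draftCountExactly} drafts, got ${run.draftCount}`);
  }
  return failures;
}

// Illustrative docs-only case (not copied from global.yaml).
const docsOnlyCase: ToolUseExpectations = {
  requiredToolsUsed: ["get_documentation"],
  forbiddenToolsUsed: ["get_instructions"],
  draftCountExactly: 0,
};
```

A run that calls both `get_documentation` and `get_instructions` fails this case on the forbidden-tool check. That is why the original, authoring-adjacent test28 prompt turned out to be too strict for gemini and deepseek.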

Notes (non-blocking)

The system-prompt rule does a good job disambiguating get_documentation (product docs) from get_instructions (authoring guidance) — that distinction is the main risk with overlapping informational tools, and the three boundary evals (vocab trap, pure-mechanics, authoring) cover both directions of confusion. The write-up's note about reworking test28 after the multi-model run caught a false-negative is exactly the right instinct.
Error handling in getDocumentationTool already degrades gracefully (it returns a "pursuing with the user request…" string rather than throwing), so a flaky /api/inkeep won't break a global-mode turn. Good that this is inherited rather than reimplemented.
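The degrade-instead-of-throw pattern described above can be sketched as follows. Several details are assumptions. The `Fetcher` type is injected so the sketch runs without a browser or a real /api/inkeep backend. The POST method, the request body shape, and the `FALLBACK` wording are hypothetical, because the review only partially quotes the real fallback message.

```typescript
// Minimal fetch-like signature so the sketch can run without a real backend.
type Fetcher = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

// Hypothetical fallback text; the real message is only partially quoted in the review.
const FALLBACK = "Documentation lookup failed; pursuing with the user request without docs.";

async function getDocumentation(request: string, fetcher: Fetcher): Promise<string> {
  try {
    const res = await fetcher("/api/inkeep", {
      method: "POST", // assumed method
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ request }), // assumed body shape
    });
    if (!res.ok) return FALLBACK; // non-2xx response: degrade instead of throwing
    return await res.text();
  } catch {
    return FALLBACK; // network failure: degrade instead of throwing
  }
}
```

Returning a string in every branch means the chat loop always receives a tool result, so the model can continue answering without docs.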

Test coverage

Frontend (TS) — appropriate. The pure-logic surface (globalTools registration) gets a unit-test assertion in the existing core.test.ts presence test; this is the right kind of test for this file (it already has a sibling .test.ts). No Svelte component tests expected here.
AI evals — three boundary cases added across both directions; PR reports 12/12 across four providers. Reasonable; the single-run-per-model flakiness caveat is acknowledged.
Manual verification still worth doing before merge: run npm run check (the PR checklist marks it not-yet-run locally) and the core.test.ts vitest suite to confirm the type-check and presence assertion pass in CI. Optionally, in a global-mode chat with a configured AI provider + /api/inkeep backend, ask a "how does X work in Windmill" question and confirm the assistant invokes get_documentation and answers from the returned docs.

github-actions · 2026-06-09T15:15:16Z

Pi Review

Good to merge

What this PR does

Registers the existing getDocumentationTool (already used by navigator/ask/api chat modes) in the global-mode AI chat, so the global assistant can look up Windmill product documentation. Adds a system-prompt rule clearly distinguishing get_documentation (product docs) from get_instructions (authoring guidance), extends the existing unit test assertion, and adds three ai_evals boundary cases verifying the distinction holds across four LLM providers.

Analysis

Import and registration — getDocumentationTool is a named export from navigator/core (line 321), used there and in navigatorTools. Importing it into global/core.ts and appending it to globalTools is correct and avoids duplication. The tool def name (get_documentation) matches the system-prompt reference.

System prompt rule (line 605) — placed right after the existing get_instructions rule, naturally grouped with the authoring-guidance rules. The wording "This is distinct from get_instructions, which returns authoring guidance for writing a specific item type" directly addresses the confusion the eval cases guard against.

No new public surfaces — the only user-facing change is tool availability in global chat. No new APIs, no auth changes, no Svelte components.

No AGENTS.md violations: no Svelte components were modified, no banned patterns, no stale generated files.

Test coverage

Frontend — one line added to an existing unit test (expect(names).toContain('get_documentation')), which is appropriate for this kind of registration change. No new pure-logic utilities were introduced.

AI evals — three well-designed boundary cases (global-test27, global-test28, global-test29) guard the get_documentation / get_instructions split across four providers (12/12 pass). The cases use skipJudge with tool-use validation since no draft is produced, which is the correct pattern for informational-only interactions.

Manual verification — should be performed before merge: open the global-mode AI chat with a configured AI provider and docs backend (/api/inkeep), ask "how does Windmill decide which worker runs a job?", and confirm the assistant invokes get_documentation and answers from the returned docs.

github-actions · 2026-06-09T15:19:31Z

Codex Review

Good to merge

No issues found. Checked for bugs, security, public-surface concerns, and AGENTS.md compliance.

Test Coverage

Frontend coverage is appropriate: the existing global tool test now asserts get_documentation is exposed, and the AI evals cover the intended get_documentation / get_instructions boundary.

Manual verification before merge: ask global mode a Windmill concept question and confirm it calls get_documentation without creating drafts.

cubic-dev-ai

No issues found across 3 files



github-actions · 2026-06-09T22:02:00Z

Pi Review

Good to merge

What this PR does

Registers the existing getDocumentationTool (already shared across navigator, ask, and api chat modes) in the global-mode AI chat, adds a system-prompt rule distinguishing get_documentation (product docs) from get_instructions (authoring guidance), extends the existing unit test, and adds three ai_evals boundary cases covering the disambiguation in both directions across four LLM providers.

Analysis

Import and registration — getDocumentationTool is a named export from navigator/core.ts (line 321), already used there and in navigator/ask/api tool arrays. Importing it into global/core.ts and appending it to globalTools avoids duplication and is type-correct (Tool<{}>[]). No circular dependency exists — navigator/core.ts has no import from global/core.ts.

System-prompt rule (global/core.ts:634) — placed immediately after the existing get_instructions rule, naturally grouped with the authoring-guidance instructions. The wording "This is distinct from get_instructions, which returns authoring guidance for writing a specific item type" directly addresses the confusion the eval cases guard against.

Underlying behavior — getDocumentationTool.fn calls getDocumentation(args) which does a standalone fetch('/api/inkeep', …) with no navigator-specific page context. It degrades gracefully on error (returns a string rather than throwing), inherited from the shared implementation.

No new public surfaces: no pub fn, no exported Svelte components, no API changes. No AGENTS.md violations: no Svelte files were modified, no $bindable(default_value) patterns, no backend changes.

Eval schema — the new cases use requiredToolsUsed, forbiddenToolsUsed, draftCountExactly, and skipJudge — all already established in global.yaml and other eval suites. The eval ids use a consistent naming convention (global-test29-doc-resource-vs-variable-concept, global-test30-doc-feature-explanation, global-test31-authoring-uses-instructions-not-doc).

Test coverage

Frontend (TS) — one line added to the existing core.test.ts presence test asserting get_documentation is in the global tools array. Appropriate for a registration change — no new pure-logic utilities were introduced. The file already has a sibling .test.ts.
AI evals — three well-designed boundary cases verifying the get_documentation / get_instructions split across four providers (12/12 pass). The cases correctly use skipJudge with tool-use validation since these interactions produce no draft. The multi-model run (not just sonnet) caught a false-negative in an earlier draft of test28, which the PR description acknowledges.
Manual verification still worth doing before merge: run npm run check (the PR checklist marks it not-yet-run locally) and the core.test.ts vitest suite to confirm the type-check and presence assertion pass. Optionally, in a global-mode chat with a configured AI provider and /api/inkeep backend, ask "how does Windmill decide which worker runs a job?" and confirm the assistant invokes get_documentation and answers from the returned docs.

github-actions · 2026-06-09T22:02:53Z

Codex Review

Good to merge

No issues found. Checked for bugs, security, public-surface concerns, AGENTS.md compliance, and prior PR discussion.

Test coverage

Frontend coverage is appropriate: the existing global tool presence test now asserts get_documentation, and the added AI evals cover the get_documentation / get_instructions boundary. No Svelte component tests are expected for this change.

Manual verification still worth doing before merge: in global-mode AI chat with an AI provider and /api/inkeep available, ask a Windmill documentation question and confirm the assistant calls get_documentation without creating drafts.

feat: add get_documentation tool to global mode ai chat

2a05424

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

centdix force-pushed the add-get-documentation-global-chat branch from 324434f to 35ab6ad on June 9, 2026 14:51

test(ai_evals): add get_documentation vs get_instructions boundary cases

8391dba

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

centdix force-pushed the add-get-documentation-global-chat branch from 35ab6ad to 8391dba on June 9, 2026 15:10

centdix marked this pull request as ready for review June 9, 2026 15:12

centdix requested review from alpetric, hugocasa and rubenfiszel as code owners June 9, 2026 15:12

cubic-dev-ai Bot reviewed Jun 9, 2026


Merge remote-tracking branch 'origin/main' into add-get-documentation…

588575d

…-global-chat. Conflicts: ai_evals/cases/global.yaml, frontend/src/lib/components/copilot/chat/global/core.test.ts


centdix wants to merge 3 commits into main from add-get-documentation-global-chat
