
feat: add get_documentation tool to global mode ai chat#9489

Open
centdix wants to merge 3 commits into main from add-get-documentation-global-chat

@centdix commented Jun 9, 2026

Collaborator

Summary

The global-mode AI chat could not look up Windmill product documentation, while navigator (and ask/api) modes have a get_documentation tool for exactly that. This PR registers the same shared tool in global mode so the global assistant can answer "how does X work in Windmill" questions from authoritative platform docs instead of hallucinating.

Changes

  • Import the existing exported getDocumentationTool from navigator/core (the same tool already shared by ask and api modes — no duplication) and register it in the globalTools array, alongside the other informational tools.
  • Add a system-prompt rule describing when to use get_documentation, explicitly distinguishing it from the existing get_instructions tool (product docs vs. authoring guidance for a specific item type) so the model doesn't confuse the two.
  • Extend the existing globalTools presence test to assert get_documentation is registered.
  • Add three ai_evals boundary cases to the global suite guarding the get_documentation / get_instructions split (tool-use validation, skipJudge where no draft is produced):
    • global-test27 — vocabulary trap: "difference between a resource and a variable?" must use get_documentation, must not call get_instructions(subject: resource).
    • global-test28 — pure platform-mechanics question ("how does Windmill decide which worker runs a job?"): must use get_documentation, must not call get_instructions.
    • global-test29 — authoring (create a script): must use get_instructions + write_script, must not call get_documentation.
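The registration itself is small. The sketch below is a simplified, self-contained illustration, not the Windmill source: the `Tool` shape (a `def` with a `name`, plus an async `fn`) is an assumption modelled on the names the reviews mention (`getDocumentationTool`, `globalTools`, `Tool<{}>`, the `get_documentation` def name), and both tool bodies are placeholders.

```typescript
// Hypothetical, simplified stand-in for the chat Tool type (the real type differs).
interface ToolDef {
  name: string
  description: string
}

interface Tool<Helpers> {
  def: ToolDef
  // Receives parsed tool-call arguments plus mode-specific helpers.
  fn: (input: { args: Record<string, unknown>; helpers: Helpers }) => Promise<string>
}

// Stand-in for the shared tool exported from navigator/core (placeholder body).
const getDocumentationTool: Tool<{}> = {
  def: {
    name: 'get_documentation',
    description: 'Look up Windmill product documentation for a question.'
  },
  fn: async ({ args }) => `docs for: ${String(args.request ?? '')}`
}

// Placeholder for an informational tool that global mode already had.
const getInstructionsTool: Tool<{}> = {
  def: {
    name: 'get_instructions',
    description: 'Return authoring guidance for a specific item type.'
  },
  fn: async ({ args }) => `instructions for: ${String(args.subject ?? '')}`
}

// Global mode appends the same shared object instead of redefining it.
const globalTools: Tool<{}>[] = [getInstructionsTool, getDocumentationTool]

// What a presence test inspects: the tool names exposed to the model.
const names = globalTools.map((t) => t.def.name)
```

Reusing the object rather than copying its definition keeps one source of truth, so a fix to the docs tool in navigator mode reaches global mode automatically.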

AI eval verification

Ran the three new cases across four providers via ai_evals (bun run cli -- run global <ids> --models sonnet,gpt-5.5,gemini-3.1-pro-preview,deepseek-v4-pro). 12/12 pass.

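| Case | sonnet | gpt-5.5 | gemini-3.1-pro | deepseek-v4-pro |
| --- | --- | --- | --- | --- |
| test27: resource vs variable (vocab trap) | ✅ | ✅ | ✅ | ✅ |
| test28: worker dispatch | ✅ | ✅ | ✅ | ✅ |
| test29: authoring | ✅ | ✅ | ✅ | ✅ |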

The multi-model run caught a false-negative in an earlier draft of test28: its original prompt ("how do retries/error handling work for a step") is authoring-adjacent, so gemini and deepseek correctly consulted docs but also pulled get_instructions — single-model (sonnet) testing had hidden this. test28 was reworked to a pure platform-mechanics question where forbidding get_instructions is fair, and the boundary now holds across all four providers.

Caveat: evals are single-run per model, so this is not a flakiness guarantee.
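To make "tool-use validation" concrete, here is a rough TypeScript model of how a boundary case could be checked. The field names `requiredToolsUsed`, `forbiddenToolsUsed`, `draftCountExactly` and `skipJudge` are the ones referenced for the eval schema in this PR; the `EvalCase`/`RunResult` shapes, the `checkCase` function and the `draftCountExactly: 0` value are hypothetical, not the ai_evals implementation.

```typescript
// Hypothetical shape of a boundary case; field names mirror the eval YAML.
interface EvalCase {
  id: string
  prompt: string
  requiredToolsUsed?: string[]
  forbiddenToolsUsed?: string[]
  draftCountExactly?: number
  skipJudge?: boolean
}

// Hypothetical summary of one model run.
interface RunResult {
  toolCalls: string[] // names of the tools the assistant invoked
  draftCount: number
}

// Returns the list of violations; an empty list means the case passes.
function checkCase(c: EvalCase, run: RunResult): string[] {
  const used = new Set(run.toolCalls)
  const errors: string[] = []
  for (const tool of c.requiredToolsUsed ?? []) {
    if (!used.has(tool)) errors.push(`missing required tool: ${tool}`)
  }
  for (const tool of c.forbiddenToolsUsed ?? []) {
    if (used.has(tool)) errors.push(`used forbidden tool: ${tool}`)
  }
  if (c.draftCountExactly !== undefined && run.draftCount !== c.draftCountExactly) {
    errors.push(`expected ${c.draftCountExactly} drafts, got ${run.draftCount}`)
  }
  return errors
}

// Modelled on global-test28: a pure platform-mechanics question.
const workerDispatch: EvalCase = {
  id: 'global-test28',
  prompt: 'How does Windmill decide which worker runs a job?',
  requiredToolsUsed: ['get_documentation'],
  forbiddenToolsUsed: ['get_instructions'],
  draftCountExactly: 0, // assumed: an informational answer produces no draft
  skipJudge: true
}
```

Under this model, the original test28 draft failed for gemini and deepseek because `get_instructions` showed up in `toolCalls`; since consulting authoring guidance was reasonable for that prompt, the prompt was changed rather than the check.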

Test plan

  • npm run check passes (frontend type-check) — not yet run locally
  • globalTools unit test asserting get_documentation is present passes — added but the frontend vitest suite was not executed locally (validated by reasoning)
  • get_documentation / get_instructions boundary verified via ai_evals across sonnet / gpt-5.5 / gemini-3.1-pro / deepseek-v4-pro (12/12)
  • Manual: in the global-mode AI chat, ask a "how does X work in Windmill" question → assistant invokes get_documentation and answers from the returned docs (requires a configured AI provider + the /api/inkeep docs backend)

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cloudflare-workers-and-pages Bot commented Jun 9, 2026


Deploying windmill with Cloudflare Pages

Latest commit: 588575d
Status: ✅ Deploy successful!
Preview URL: https://b269e74e.windmill.pages.dev
Branch Preview URL: https://add-get-documentation-global.windmill.pages.dev


@centdix force-pushed the add-get-documentation-global-chat branch from 324434f to 35ab6ad on June 9, 2026 14:51
@centdix force-pushed the add-get-documentation-global-chat branch from 35ab6ad to 8391dba on June 9, 2026 15:10
@centdix marked this pull request as ready for review on June 9, 2026 15:12

claude Bot commented Jun 9, 2026

Contributor

Claude finished @centdix's task in 2m 31s.


PR Review

Good to merge — clean, minimal, well-tested addition. No blocking issues.

What I verified

  • Tool reuse & typing: getDocumentationTool is imported from navigator/core.ts (where it's already shared by ask/api modes) and is typed Tool<{}>, matching globalTools: Tool<{}>[]. No duplication, no type mismatch. ✅ (global/core.ts:75, navigator/core.ts:321)
  • No circular import: navigator/core.ts does not import from global/core.ts, so the new global → navigator import introduces no cycle. ✅
  • Context independence: the underlying getDocumentation() only does a fetch('/api/inkeep', …) with the request arg; it carries no navigator-specific page context, so it behaves identically in global mode. ✅ (navigator/core.ts:237)
  • Tool name consistency: the def name get_documentation matches the unit-test assertion and all three eval cases. ✅
  • Eval schema: requiredToolsUsed / forbiddenToolsUsed / draftCountExactly / skipJudge are all fields already used in global.yaml, flow.yaml, and script.yaml, so the new cases are schema-valid. ✅

Notes (non-blocking)

  • The system-prompt rule does a good job disambiguating get_documentation (product docs) from get_instructions (authoring guidance) — that distinction is the main risk with overlapping informational tools, and the three boundary evals (vocab trap, pure-mechanics, authoring) cover both directions of confusion. The write-up's note about reworking test28 after the multi-model run caught a false-negative is exactly the right instinct.
  • Error handling in getDocumentationTool already degrades gracefully (it returns a "pursuing with the user request…" string rather than throwing), so a flaky /api/inkeep won't break a global-mode turn. Good that this is inherited rather than reimplemented.
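The graceful-degradation pattern can be sketched as follows. This is an illustration under assumptions, not the code in navigator/core.ts: the request payload, the response handling and the fallback wording are hypothetical, and fetch is injected through a minimal `FetchLike` type so the sketch runs without DOM typings. Only the `/api/inkeep` endpoint and the "return a string instead of throwing" behaviour come from the reviews.

```typescript
// Minimal fetch-like signature (hypothetical) so the sketch is self-contained.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>

// Hypothetical fallback text; the real string differs.
const FALLBACK = 'Documentation lookup failed, proceeding with the user request without it.'

// Queries the docs backend and never throws, so a flaky backend cannot abort the chat turn.
async function getDocumentation(request: string, fetchImpl: FetchLike): Promise<string> {
  try {
    const res = await fetchImpl('/api/inkeep', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ query: request }) // assumed payload shape
    })
    if (!res.ok) return `${FALLBACK} (HTTP ${res.status})`
    return await res.text()
  } catch {
    // Network errors and body-read failures both become a plain string for the model.
    return FALLBACK
  }
}
```

Returning a string instead of throwing lets the model read the failure as ordinary tool output and carry on answering, which is why inheriting this behaviour matters for global mode.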

Test coverage

  • Frontend (TS) — appropriate. The pure-logic surface (globalTools registration) gets a unit-test assertion in the existing core.test.ts presence test; this is the right kind of test for this file (it already has a sibling .test.ts). No Svelte component tests expected here.
  • AI evals — three boundary cases added across both directions; PR reports 12/12 across four providers. Reasonable; the single-run-per-model flakiness caveat is acknowledged.
  • Manual verification still worth doing before merge: run npm run check (the PR checklist marks it not-yet-run locally) and the core.test.ts vitest suite to confirm the type-check and presence assertion pass in CI. Optionally, in a global-mode chat with a configured AI provider + /api/inkeep backend, ask a "how does X work in Windmill" question and confirm the assistant invokes get_documentation and answers from the returned docs.


github-actions Bot commented Jun 9, 2026

Contributor

Pi Review

Good to merge

What this PR does

Registers the existing getDocumentationTool (already used by navigator/ask/api chat modes) in the global-mode AI chat, so the global assistant can look up Windmill product documentation. Adds a system-prompt rule clearly distinguishing get_documentation (product docs) from get_instructions (authoring guidance), extends the existing unit test assertion, and adds three ai_evals boundary cases verifying the distinction holds across four LLM providers.

Analysis

Import and registration: getDocumentationTool is a named export from navigator/core (line 321), used there and in navigatorTools. Importing it into global/core.ts and appending it to globalTools is correct and avoids duplication. The tool def name (get_documentation) matches the system-prompt reference.

System prompt rule (line 605) — placed right after the existing get_instructions rule, naturally grouped with the authoring-guidance rules. The wording "This is distinct from get_instructions, which returns authoring guidance for writing a specific item type" directly addresses the confusion the eval cases guard against.
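One way to picture the placement is a prompt assembled from an ordered list of rules, with the new rule inserted immediately after the `get_instructions` rule. The sketch is hypothetical: the rule texts are abbreviated paraphrases (apart from the quoted "distinct from get_instructions" sentence), and `insertAfter` is an illustrative helper, not part of global/core.ts.

```typescript
// Abbreviated, hypothetical rule texts; only their ordering matters here.
const baseRules: string[] = [
  'Use get_instructions before writing a specific item type.',
  'Ask the user before making destructive changes.'
]

const docRule =
  'Use get_documentation to answer questions about how Windmill works. This is distinct from ' +
  'get_instructions, which returns authoring guidance for writing a specific item type.'

// Inserts `rule` right after the first rule mentioning `anchor`; appends it if none matches.
function insertAfter(rules: string[], anchor: string, rule: string): string[] {
  const i = rules.findIndex((r) => r.includes(anchor))
  if (i === -1) return [...rules, rule]
  return [...rules.slice(0, i + 1), rule, ...rules.slice(i + 1)]
}

const rules = insertAfter(baseRules, 'get_instructions', docRule)
const systemPrompt = rules.map((r) => `- ${r}`).join('\n')
```

Keeping the two tool rules adjacent lets the model see the contrast in one place, which is exactly the confusion the boundary evals probe.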

No new public surfaces — the only user-facing change is tool availability in global chat. No new APIs, no auth changes, no Svelte components.

No AGENTS.md violations — no svelte components were modified, no banned patterns, no stale generated files.

Test coverage

Frontend — one line added to an existing unit test (expect(names).toContain('get_documentation')), which is appropriate for this kind of registration change. No new pure-logic utilities were introduced.

AI evals — three well-designed boundary cases (global-test27, global-test28, global-test29) guard the get_documentation / get_instructions split across four providers (12/12 pass). The cases use skipJudge with tool-use validation since no draft is produced, which is the correct pattern for informational-only interactions.

Manual verification — should be performed before merge: open the global-mode AI chat with a configured AI provider and docs backend (/api/inkeep), ask "how does Windmill decide which worker runs a job?", and confirm the assistant invokes get_documentation and answers from the returned docs.


github-actions Bot commented Jun 9, 2026

Contributor

Codex Review

Good to merge

No issues found. Checked for bugs, security, public-surface concerns, and AGENTS.md compliance.

Test Coverage

Frontend coverage is appropriate: the existing global tool test now asserts get_documentation is exposed, and the AI evals cover the intended get_documentation / get_instructions boundary.

Manual verification before merge: ask global mode a Windmill concept question and confirm it calls get_documentation without creating drafts.

@cubic-dev-ai Bot left a comment

Contributor


No issues found across 3 files


…-global-chat

# Conflicts:
#	ai_evals/cases/global.yaml
#	frontend/src/lib/components/copilot/chat/global/core.test.ts

github-actions Bot commented Jun 9, 2026

Contributor

Pi Review

Good to merge

What this PR does

Registers the existing getDocumentationTool (already shared across navigator, ask, and api chat modes) in the global-mode AI chat, adds a system-prompt rule distinguishing get_documentation (product docs) from get_instructions (authoring guidance), extends the existing unit test, and adds three ai_evals boundary cases covering the disambiguation in both directions across four LLM providers.

Analysis

Import and registration: getDocumentationTool is a named export from navigator/core.ts (line 321), already used there and in the navigator/ask/api tool arrays. Importing it into global/core.ts and appending it to globalTools avoids duplication and is type-correct (Tool<{}>[]). No circular dependency exists: navigator/core.ts has no import from global/core.ts.

System-prompt rule (global/core.ts:634) — placed immediately after the existing get_instructions rule, naturally grouped with the authoring-guidance instructions. The wording "This is distinct from get_instructions, which returns authoring guidance for writing a specific item type" directly addresses the confusion the eval cases guard against.

Underlying behavior: getDocumentationTool.fn calls getDocumentation(args), which does a standalone fetch('/api/inkeep', …) with no navigator-specific page context. It degrades gracefully on error (returns a string rather than throwing), a behavior inherited from the shared implementation.

No new public surfaces — no pub fn, no exported Svelte components, no API changes. No AGENTS.md violations — no svelte files were modified, no $bindable(default_value) patterns, no backend changes.

Eval schema — the new cases use requiredToolsUsed, forbiddenToolsUsed, draftCountExactly, and skipJudge — all already established in global.yaml and other eval suites. The eval ids use a consistent naming convention (global-test29-doc-resource-vs-variable-concept, global-test30-doc-feature-explanation, global-test31-authoring-uses-instructions-not-doc).

Test coverage

  • Frontend (TS) — one line added to the existing core.test.ts presence test asserting get_documentation is in the global tools array. Appropriate for a registration change — no new pure-logic utilities were introduced. The file already has a sibling .test.ts.
  • AI evals — three well-designed boundary cases verifying the get_documentation / get_instructions split across four providers (12/12 pass). The cases correctly use skipJudge with tool-use validation since these interactions produce no draft. The multi-model run (not just sonnet) caught a false-negative in an earlier draft of test28, which the PR description acknowledges.
  • Manual verification still worth doing before merge: run npm run check (the PR checklist marks it not-yet-run locally) and the core.test.ts vitest suite to confirm the type-check and presence assertion pass. Optionally, in a global-mode chat with a configured AI provider and /api/inkeep backend, ask "how does Windmill decide which worker runs a job?" and confirm the assistant invokes get_documentation and answers from the returned docs.


github-actions Bot commented Jun 9, 2026

Contributor

Codex Review

Good to merge

No issues found. Checked for bugs, security, public-surface concerns, AGENTS.md compliance, and prior PR discussion.

Test coverage

Frontend coverage is appropriate: the existing global tool presence test now asserts get_documentation, and the added AI evals cover the get_documentation / get_instructions boundary. No Svelte component tests are expected for this change.

Manual verification still worth doing before merge: in global-mode AI chat with an AI provider and /api/inkeep available, ask a Windmill documentation question and confirm the assistant calls get_documentation without creating drafts.
