WorkIQ MCP auth: re-consent UX and failure-mode polish (#441) #446
Merged
Phase 5 of the WorkIQ integration. Closes three failure-mode gaps left after Phases 1-4:

- Surface MsalUiRequiredException as an actionable auth_required tool error so the LLM stops retrying and returns a clear "reconnect M365" message. New FindReauthRequired walker + BuildReauthRequiredMessage helper in McpBridgeService, with a catch clause placed before the existing FindAuthChallenge catch.
- Add WorkIqHealthTracker singleton that flips healthy/unhealthy on cache writes and silent-refresh failures, emits a structured log on every transition, and raises HealthChanged so the bridge can hide workiq-* tools from the published tool list while auth is broken. Patrols and chat sessions started during the unhealthy window never see those tools and stop generating execution_failed noise.
- Double-click guards on the Blazor Connect and Reconnect buttons so a fast second click can't kick off a parallel device-code flow that would clobber the first one's success state.

Adds deploy/workiq-setup.md with the re-consent flow and scheduled-tasks-during-expiry sections (stub for Phase 4 to extend).

Tests: 9 WorkIqHealthTrackerTests + 7 MsalToolErrorMappingTests, all passing. Blazor button guards live purely in the UI layer per the plan (no bUnit in the project; component-level guard is acknowledged-fine).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
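For readers unfamiliar with the pattern, the exception walk behind the auth_required mapping might look roughly like the sketch below. This is not the code from the PR. `UiRequiredException` is a stand-in for MsalUiRequiredException so the sample compiles without the MSAL package, the `ToolError` record only mirrors the two fields the description names (Code, IsRetryable), and `ReauthMappingSketch`, `TryMap`, and the message text are hypothetical.

```csharp
#nullable enable
using System;

// Stand-in for MsalUiRequiredException (assumption: keeps the sketch dependency-free).
public sealed class UiRequiredException : Exception
{
    public UiRequiredException(string message) : base(message) { }
}

// Assumed shape; only Code and IsRetryable are named in the PR description.
public sealed record ToolError(string Code, string Message, bool IsRetryable);

public static class ReauthMappingSketch
{
    // Depth-first search through InnerException chains and AggregateException children.
    public static UiRequiredException? FindReauthRequired(Exception? ex)
    {
        if (ex is null) return null;
        if (ex is UiRequiredException hit) return hit;

        if (ex is AggregateException aggregate)
        {
            foreach (var inner in aggregate.InnerExceptions)
            {
                var found = FindReauthRequired(inner);
                if (found is not null) return found;
            }
            return null;
        }

        return FindReauthRequired(ex.InnerException);
    }

    // Returns a non-retryable auth_required error, or null when the failure
    // is unrelated to re-consent and should fall through to other handlers.
    public static ToolError? TryMap(Exception ex) =>
        FindReauthRequired(ex) is null
            ? null
            : new ToolError(
                Code: "auth_required",
                Message: "Microsoft 365 sign-in has expired. Use the Reconnect button to sign in again.",
                IsRetryable: false);
}
```

Returning null for unrelated failures lets a caller try this mapping first and still reach its existing catch logic, which matches the ordering the commit message describes (the new catch sits before the FindAuthChallenge one).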
Summary
Phase 5 of the WorkIQ integration (design/workiq-phase5-plan.md). Builds on #442/#443/#444. Closes three failure-mode gaps:
- auth_required tool error. MsalUiRequiredException and friends used to fall into the generic "execution failed, retryable" bucket, so the LLM blindly retried. New FindReauthRequired walker + catch clause in McpBridgeService produces ToolError { Code=auth_required, IsRetryable=false } with a message that names the Blazor Reconnect/Connect button.
- WorkIqHealthTracker singleton flips on cache writes / silent-refresh failures, emits a structured log on every transition, and raises HealthChanged. The bridge subscribes and republishes the tool list with workiq-* servers removed (or re-added) so patrols and chat sessions started during the unhealthy window never call them. Direct calls still get the auth_required error from above.
- WorkIqConnect.razor and WorkIqReconnectBanner.razor gain _starting/_reconnecting bools + disabled attributes so a fast second click can't kick off a parallel device-code flow that clobbers the first one's success state.

Also adds deploy/workiq-setup.md with the re-consent flow and scheduled-tasks-during-expiry sections (a stub Phase 4 can extend with its own initial-setup material).

Test plan
- dotnet build RockBot.slnx clean (only pre-existing CA1416 / NU190x warnings)
- dotnet test RockBot.slnx: all new tests pass (9 WorkIqHealthTrackerTests + 7 MsalToolErrorMappingTests)
- RepairTicketApplyPassTests.TimeoutBackoff_PassesEscalatingBudgets_ToVerifier that passes in isolation (unrelated to this PR)
- auth_required text, confirm patrol logs stop trying workiq tools, reconnect, confirm tools come back within ~1 second

Note: the Blazor button guards live purely in the .razor (no bUnit in the project); the plan explicitly permitted the UI-only guard.
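As a rough illustration of the health-tracking behavior in the summary, the sketch below shows a transition-only tracker and a tool-list filter. It is not the PR's implementation. The class and method names (`HealthTrackerSketch`, `ReportCacheWrite`, `ReportSilentRefreshFailure`, `ToolListFilterSketch`), the initial healthy state, and matching on a `workiq-` name prefix are all assumptions, and the structured logging is left out.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class HealthTrackerSketch
{
    private readonly object _gate = new();
    private bool _isHealthy = true; // assumption: start out healthy

    // Raised only when the state actually changes; the argument is the new state.
    public event Action<bool>? HealthChanged;

    public bool IsHealthy
    {
        get { lock (_gate) { return _isHealthy; } }
    }

    public void ReportCacheWrite() => Set(true);            // token cache written: auth works
    public void ReportSilentRefreshFailure() => Set(false); // silent refresh failed: auth broken

    private void Set(bool healthy)
    {
        lock (_gate)
        {
            if (_isHealthy == healthy) return; // repeated reports are not transitions
            _isHealthy = healthy;
        }
        // Invoke outside the lock so subscribers cannot deadlock against the tracker.
        HealthChanged?.Invoke(healthy);
    }
}

public static class ToolListFilterSketch
{
    // Hides tools whose name starts with "workiq-" while auth is unhealthy.
    public static IReadOnlyList<string> Visible(IEnumerable<string> tools, bool workIqHealthy) =>
        tools.Where(t => workIqHealthy || !t.StartsWith("workiq-", StringComparison.Ordinal))
             .ToList();
}
```

A subscriber would recompute the published list with `ToolListFilterSketch.Visible(allTools, tracker.IsHealthy)` each time `HealthChanged` fires. Raising the event only on real transitions keeps repeated refresh failures from triggering redundant republishes.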
🤖 Generated with Claude Code