feat(llm,github): tenant-scoped runtime overrides + GitHub polish#19
Merged
Conversation
Extract SignupValidation static helper consumed by both Blazor Signup page and /tenants/register endpoint so form-side and server-side rules cannot drift. Slug regex per plan: lowercase, internal hyphens only, 1-32 chars. Password: ≥12 chars + upper/lower/digit/symbol. Email via MailAddress round-trip. 16 new test cases. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wire MailHog into Aspire AppHost (SMTP 1025, UI 8025) and point the realm smtpServer block at it. Set realm verifyEmail:true so unverified users cannot log in. KeycloakAdminClient: when sendVerifyEmail flips on, leave emailVerified=false and trigger Keycloak's execute-actions- email; for self-signup (password supplied) the action list drops UPDATE_PASSWORD and keeps only VERIFY_EMAIL. Signup page now passes sendVerifyEmail:true and lands on /signup/verify-email with a hint pointing dev users at MailHog. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ITenantSignupService dispatches invite / slug / auto-create by request
shape. Keycloak-first ordering so a registry-row failure can roll back
the Keycloak user (new DeleteUserAsync on the admin client).
Invitations are stateless DataProtection time-limited tokens — no DB
table — issued by /tenants/{id}/invitations (admin) and decoded by
/tenants/invitations/preview (public). Trade-off: tokens cannot be
revoked before TTL; keep TTL short for high-trust environments.
Signup page now accepts ?invite=<token>, pre-fills the email from the
invitation, and the Workspace ID field becomes optional (blank →
auto-generated slug). 7 new service tests + saga rollback verified
via NSubstitute.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the four red flags called out in the Epic D plan: 1. agentic-web client: directAccessGrantsEnabled flipped to false so Keycloak no longer accepts ROPC on the code-flow client. 2. RequireHttpsMetadata defaults to true; only flipped off in appsettings.Development.json. Same change in JwtAuthExtensions. 3. Cookie SecurePolicy = Always in non-Development, SameAsRequest in Development (Aspire pins http://localhost:5180 there). 4. agentic-web-dev-secret and admin/admin no longer hardcoded in any .cs file. AppHost surfaces them as Aspire Parameters with dev defaults in appsettings.json; Web Program throws at startup if ClientSecret is missing outside Development. Standalone `dotnet run --project src/AgentOs.Web` now expects user-secrets to supply the dev secret (Aspire AppHost injects it automatically). grep agentic-web-dev-secret\|admin/admin --include=*.cs → no matches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds GET /tenants/{id}/members guarded by Admin policy AND a runtime
ITenantContext match (so an Admin in tenant A cannot list tenant B).
KeycloakAdminClient gains ListUsersByTenantAsync — paged GET + client-
side filter on the tenant attribute, plus a follow-up role-mappings
fetch per user. Bounded by max=200 for v1; larger realms need a
server-side q-search rewrite.
Two new Razor pages under /admin: Members.razor lists the tenant
roster and mints DataProtection invitation URLs (admin pastes them
into chat/email); Settings.razor shows tenant id + name + created-at
read-only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New IAuditLog (EF + Null impls) writes append-only rows to
tenants.audit_events for signup-completed, tenant-created, member-
invited, and invitation-minted actions. Writes are best-effort —
audit failure logs a warning but never breaks the surrounding flow.
Reads are tenant-scoped at the repository, then re-checked at the
endpoint against ITenantContext (Admin in tenant A cannot peek at
tenant B's trail).
GET /tenants/{id}/audit + /admin/audit Razor page show newest-first
rows. login.failed action is reserved for a follow-up — wiring it
through cookie / JWT events lands separately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
KeycloakAdminClient: 4 tests for DeleteUserAsync (no-content, 404 swallowed, 500 throws) and ListUsersByTenantAsync (filter + role-mapping fetch). HttpTenantContext: 5 tests for the claims projection — missing-claim default, normal user, role flattening, IsAdmin off-by-role, and the multi-value tenant claim edge case. Epic D delta now sits at ~32 new tests (validation 16 + signup service 7 + admin REST 4 + tenant context 5). Full unit suite stays green (218 passed, 5 skipped live-smoke). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Microsoft.Extensions.* 10.0.0 → 10.0.8 - Microsoft.Extensions.Http.Resilience 10.0.0 → 10.6.0 - Microsoft.Extensions.Logging.Abstractions 10.0.6 → 10.0.8 - Microsoft.AspNetCore.* (OpenIdConnect / JwtBearer / OpenApi / SignalR Client / DataProtection.Abstractions) 10.0.0 → 10.0.8 - Microsoft.EntityFrameworkCore (+ .Design / .InMemory) 10.0.0 → 10.0.8 - Npgsql.EntityFrameworkCore.PostgreSQL 10.0.0 → 10.0.2 - Microsoft.Agents.AI* 1.7.0 → 1.8.0 - Aspire.Hosting* 13.3.3 → 13.3.5 (Keycloak preview stays on 13.3.3) - Microsoft.Playwright 1.49.0 → 1.60.0 Three csproj files gain explicit Microsoft.EntityFrameworkCore + .Relational PackageReference at 10.0.8 — without the pin, MAF's transitive 10.0.4 wins the conflict resolution, ModuleLoader ReflectionTypeLoadException blows up loading Relational 10.0.8 from Npgsql 10.0.2. Deferred (major bumps, separate PR): - xunit.v3 1.1 → 3.2 - Microsoft.NET.Test.Sdk 17.13 → 18.6 - coverlet.collector 6.0.4 → 10.0.1 - JsonSchema.Net 7.x → 9.2.1 Build: 0 warnings, 0 errors. Tests: 218 passed, 5 skipped (live-smoke). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the in-memory singleton RuntimeOverrides with a thin read / write-through adapter over IAppConfigStore. Every property access now resolves the current ITenantContext (via a per-call DI scope) and reads/writes the tenant's row in app_config — Tenant A's Anthropic key, Azure endpoint, and GitHub PAT can no longer leak to Tenant B on the same process. The pre-existing in-memory model leaked: HydrateRuntimeOverridesAsync ran once at startup with whatever tenant the bootstrap scope happened to resolve, and from then on every tenant saw the SAME values. That codepath is gone — Hydrate / Save extensions deleted, LlmModule no longer implements IInitializableModule. The IRuntimeOverrides interface keeps its sync getter/setter shape so LlmModule's key-pool factories (singleton) and GitHubPrService (transient) don't need to become async. Sync-over-async is bridged in the impl; EfAppConfigStore's 15s per-(tenant,key) cache means the bridge is hit at most once per cache window per property. GitHub polish: - New GitHubBaseUrl override → Octokit picks up the GHE endpoint when set, falls back to public github.com otherwise. - Clear "missing X for this tenant" message lists every absent field instead of throwing a generic exception. - NotFoundException / AuthorizationException from the base-branch GET are rewrapped with actionable hints (branch missing vs PAT bad). - Branch prefix renamed agentic-sdlc → agentos. 8 new tests: 5 tenant-isolation + 3 GitHub config branches. Full unit suite: 226 passed, 5 skipped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Change description
Closes a multi-tenant correctness bug in the LLM gateway and the GitHub integration, then polishes the GitHub PR opener.
The bug
RuntimeOverridesused to be an in-memory singleton populated once at startup byHydrateRuntimeOverridesAsync— running inside whatever DI scope the AppHost happened to spin up first. Whatever tenant that scope resolved to became the source of truth for the entire process. Every subsequent request — regardless of which tenant signed in — read the SAME_overrides.AnthropicApiKey,_overrides.GitHubPat, etc.In multi-tenant deployments this means tenant A's Anthropic key bills tenant B's pipeline runs, and tenant B's PR opener pushes to tenant A's GitHub repo. Not a hypothetical leak — it's the only behavior the old singleton could produce.
The fix
RuntimeOverridesbecomes a thin read / write-through adapter overIAppConfigStore. Every property access creates a per-call DI scope, resolves the currentITenantContext, and reads / writes the matchingapp_configrow.IAppConfigStore's existing 15s per-(tenant, key)cache absorbs the read pressure, so a key-pool factory closure that reads_overrides.AnthropicApiKeyon every LLM call only hits the DB once per cache window per tenant.The
IRuntimeOverridesinterface keeps its sync getter/setter shape —LlmModule's key-pool factories (registered asSingleton) andGitHubPrService(registered asTransient) don't need to become async. Sync-over-async is bridged inside the impl; ASP.NET Core has noSynchronizationContext, so the bridge is deadlock-safe.GitHub integration polish
GitHubBaseUrloverride per tenant → Octokit hits a GHE host instead of public github.com when set. Blank / null → public host.missing X, Y, Zmessage lists every absent config field instead of "PAT and target repo must be set".NotFoundExceptionon the base-branch fetch → wrapped as "branch X not found, set GitHubBaseBranch or create it".AuthorizationException→ wrapped as "PAT rejected, check 'repo' scope and expiry".agentic-sdlc/<ts>→agentos/<ts>so operators cangit push origin --delete agentos/*to clean up.Removals
HydrateRuntimeOverridesAsync(LlmModule no longer implementsIInitializableModule)SaveRuntimeOverridesAsync(Settings UI writes through the setters)RuntimeOverridesExtensions.csdeleted entirelyType of change
Test plan
dotnet buildRelease: 0 warnings, 0 errorsdotnet test: 226 passed, 5 skipped (live-smoke)RuntimeOverrides(Tenant A writes, Tenant B reads null; per-tenant GitHub PAT; key-pool round-trips; blank value deletes; no-store falls back to silent no-op)Decisions
IRuntimeOverridesto async would force a cascade throughPooledChatLlmClient.SendAsync(the key-pool delegate is sync) andGitHubPrService. Trade-off: a.GetAwaiter().GetResult()lives in the impl. ASP.NET Core has noSynchronizationContext, so this is deadlock-safe; the per-(tenant, key)15s cache caps DB pressure.Checklist
dotnet buildpasses locally in Release modedotnet testpasses