feat(auth): Epic D — OIDC web signup + multi-tenant Keycloak by hoangsnowy · Pull Request #17 · hoangsnowy/AgentOs

hoangsnowy · 2026-05-30T04:40:55Z

Change description

Epic D — public self-service signup against multi-tenant Keycloak, with email verification, audit trail, and the five security red flags from the plan closed off.

8 commits on the branch since main (oldest first):

5226f57 Web OIDC client + public sign-up + drop operator JWT (already on the branch at plan time)
6f68c08 Tighten signup validation rules (Task A)
e4e8cbb MailHog + email verify on signup (Task B)
a139921 3-mode signup + saga rollback (Task C)
e41f6fa Kill ROPC, secrets out of code, tighten cookies (Task D)
04e0339 Tenant admin members + settings pages (Task E)
9b4b8df Tenant audit trail + /admin/audit page (Task F)
05f8da4 Cover Epic D code paths (Task G)

Type of change

New feature (new agent / endpoint / capability)
Infrastructure / CI configuration

Security checklist

Five red flags from the plan, all closed:

| # | Red flag | Closed by |
|---|----------|-----------|
| 1 | Signups skip email verification | `e4e8cbb` — realm `verifyEmail:true` + MailHog + Kc client triggers `VERIFY_EMAIL` action |
| 2 | `directAccessGrantsEnabled: true` on the code-flow client allows ROPC | `e41f6fa` — flipped to `false` on `agentic-web` |
| 3 | `RequireHttpsMetadata` defaults to `false` when the setting is missing | `e41f6fa` — default `true`, dev overrides via `appsettings.Development.json` |
| 4 | Cookie missing `Secure` policy | `e41f6fa` — `Always` outside Dev, `SameAsRequest` in Dev |
| 5 | `agentic-web-dev-secret` / `admin/admin` literals in `.cs` source | `e41f6fa` — Aspire `AddParameter` + AppHost `appsettings.json`; `grep --include=*.cs` returns nothing |

Decisions (autopilot defaults)

Invitations are stateless DataProtection tokens, not a DB table. Trade-off: cannot be revoked before TTL. Acceptable for v1; keep TTLs short. Future work: add an invitations table if revocation becomes a real need.
Saga ordering: Keycloak first, then registry. A registry write failure rolls back the Keycloak user; a Keycloak failure throws before any registry row is touched. Inverse of the pre-Epic-D order, where a Kc failure could orphan a tenant row.
registrationAllowed: true stays in the realm. The plan suggested flipping it once the custom signup proved out, but we never expose the Keycloak registration page — our /signup covers it. Leaving it true keeps Keycloak's own forgot-password / verify-email pages functional.
login.failed audit action is reserved, not wired. Hooking cookie / JWT failure events lands in a follow-up — the table column + action constant are in place so the writer is a one-liner later.
Members listing is client-side filtered, capped at 200 users. Sufficient for dev realms; larger realms need a server-side q=tenant:<id> rewrite.
/admin/* pages use page-level [Authorize(Roles="admin")] and the API endpoints add a runtime ITenantContext.TenantId == route tenantId check so an Admin in tenant A cannot list tenant B.
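
The saga ordering decided above (Keycloak first, then registry, with a compensating delete) can be sketched roughly as follows. This is an illustrative Python sketch, not the project's C# code: `FakeKeycloak`, `FakeRegistry`, and `sign_up` are hypothetical names, and both stores are in-memory stand-ins.

```python
class FakeKeycloak:
    """In-memory stand-in for the Keycloak admin client (hypothetical)."""

    def __init__(self, fail_on_create=False):
        self.users = {}
        self.fail_on_create = fail_on_create

    def create_user(self, email):
        if self.fail_on_create:
            raise RuntimeError("keycloak unavailable")
        user_id = f"kc-{len(self.users) + 1}"
        self.users[user_id] = email
        return user_id

    def delete_user(self, user_id):
        # Compensating action; a missing user is treated as already deleted.
        self.users.pop(user_id, None)


class FakeRegistry:
    """In-memory stand-in for the tenant registry (hypothetical)."""

    def __init__(self, fail_on_write=False):
        self.rows = []
        self.fail_on_write = fail_on_write

    def add_tenant(self, slug, owner_id):
        if self.fail_on_write:
            raise RuntimeError("registry write failed")
        self.rows.append((slug, owner_id))


def sign_up(keycloak, registry, email, slug):
    # Step 1: Keycloak first. A failure here raises before any registry row exists.
    user_id = keycloak.create_user(email)
    try:
        # Step 2: registry write.
        registry.add_tenant(slug, user_id)
    except Exception:
        # Roll back step 1 so no orphaned Keycloak user is left behind.
        keycloak.delete_user(user_id)
        raise
    return user_id
```

With this ordering, the only state that can need cleanup is the Keycloak user, which is why a single compensating delete is enough.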

Checklist

dotnet build passes locally in Release mode
dotnet test passes — 218 unit tests green, 5 live-smoke skipped (need API keys)
No secrets committed (dev placeholders are in gitignored appsettings.Development.json or Aspire Parameters with dev defaults in the AppHost's own appsettings.json)
README / docs updated — Task I (docs) was the optional stretch and isn't in this PR

Test plan

F5 the AppHost: Postgres + Keycloak + MailHog + API + Web come up; http://localhost:5180 lands on the dashboard
/signup with no slug → auto-create mode picks a slug like alice-3ab2f1, user lands on /signup/verify-email
/signup with slug acme → slug mode creates the workspace, user becomes admin
MailHog (http://localhost:8025) catches the verify email; clicking the link activates the user and login succeeds
Sign in as the admin → /admin/members lists the user, /admin/settings shows the workspace, /admin/audit shows signup.completed + tenant.created rows
/admin/members → mint invite URL → open in incognito → join existing tenant; audit shows invitation.minted

Extract SignupValidation static helper consumed by both Blazor Signup page and /tenants/register endpoint so form-side and server-side rules cannot drift. Slug regex per plan: lowercase, internal hyphens only, 1-32 chars. Password: ≥12 chars + upper/lower/digit/symbol. Email via MailAddress round-trip. 16 new test cases. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
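
A minimal sketch of the slug and password rules described in this commit, written in Python rather than the project's C#. The exact regex is an assumption (the message only says lowercase, internal hyphens, 1-32 chars; digits are assumed allowed because auto-generated slugs like alice-3ab2f1 contain them), and "symbol" is assumed to mean any non-alphanumeric character. The function names are hypothetical.

```python
import re

# Assumed pattern: lowercase letters and digits, hyphens only between
# characters, 1-32 characters in total.
SLUG_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,30}[a-z0-9])?")


def is_valid_slug(slug: str) -> bool:
    # fullmatch anchors the pattern at both ends of the string.
    return SLUG_RE.fullmatch(slug) is not None


def is_valid_password(password: str) -> bool:
    # At least 12 chars with one upper, one lower, one digit, one symbol.
    return (
        len(password) >= 12
        and any(c.isupper() for c in password)
        and any(c.islower() for c in password)
        and any(c.isdigit() for c in password)
        and any(not c.isalnum() for c in password)
    )
```

Keeping both rules in one shared helper is what lets the form and the endpoint call the same checks instead of maintaining two copies.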

Wire MailHog into Aspire AppHost (SMTP 1025, UI 8025) and point the realm smtpServer block at it. Set realm verifyEmail:true so unverified users cannot log in. KeycloakAdminClient: when sendVerifyEmail flips on, leave emailVerified=false and trigger Keycloak's execute-actions-email; for self-signup (password supplied) the action list drops UPDATE_PASSWORD and keeps only VERIFY_EMAIL. Signup page now passes sendVerifyEmail:true and lands on /signup/verify-email with a hint pointing dev users at MailHog. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
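
The action-list rule in this commit can be sketched as a small pure function. This is a Python illustration, not the project's C#; `plan_user_creation` is a hypothetical name, and the behaviour when `send_verify_email` is off (user created as already verified, no actions) is an assumption that the commit message does not spell out.

```python
def plan_user_creation(send_verify_email: bool, password_supplied: bool):
    """Return (email_verified, required_actions) for a new Keycloak user."""
    if not send_verify_email:
        # Assumption: without the verify flow the user is created as verified.
        return True, []
    actions = ["VERIFY_EMAIL"]
    if not password_supplied:
        # No password was supplied, so the user also has to set one.
        actions.insert(0, "UPDATE_PASSWORD")
    # emailVerified stays false until the user clicks the emailed link.
    return False, actions
```

The returned list is what would be sent as the body of the execute-actions-email request.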

ITenantSignupService dispatches invite / slug / auto-create by request shape. Keycloak-first ordering so a registry-row failure can roll back the Keycloak user (new DeleteUserAsync on the admin client). Invitations are stateless DataProtection time-limited tokens — no DB table — issued by /tenants/{id}/invitations (admin) and decoded by /tenants/invitations/preview (public). Trade-off: tokens cannot be revoked before TTL; keep TTL short for high-trust environments. Signup page now accepts ?invite=<token>, pre-fills the email from the invitation, and the Workspace ID field becomes optional (blank → auto-generated slug). 7 new service tests + saga rollback verified via NSubstitute. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
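
To illustrate the stateless invitation trade-off, here is a rough Python sketch of a time-limited token. It is not the ASP.NET Core DataProtection implementation: DataProtection also encrypts the payload, whereas this sketch only signs it with HMAC-SHA256, so the claims stay readable. `SECRET`, `mint_invitation`, and `read_invitation` are hypothetical names.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"dev-only-secret"  # hypothetical key; never hardcode a real one


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii")


def mint_invitation(tenant_id, email, ttl_seconds, now=None):
    now = time.time() if now is None else now
    claims = {"tenant": tenant_id, "email": email, "exp": now + ttl_seconds}
    payload = json.dumps(claims, sort_keys=True).encode("utf-8")
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return f"{_b64(payload)}.{_b64(sig)}"


def read_invitation(token, now=None):
    """Return the claims, or None if the token is malformed, forged, or expired."""
    now = time.time() if now is None else now
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:
        return None
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return None
    claims = json.loads(payload)
    if claims["exp"] < now:
        return None
    return claims
```

Nothing is stored server-side, which is exactly why a minted token cannot be revoked before `exp`: the only levers are a short TTL or rotating the key.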

Closes the four red flags called out in the Epic D plan:

1. agentic-web client: directAccessGrantsEnabled flipped to false so Keycloak no longer accepts ROPC on the code-flow client.
2. RequireHttpsMetadata defaults to true; only flipped off in appsettings.Development.json. Same change in JwtAuthExtensions.
3. Cookie SecurePolicy = Always in non-Development, SameAsRequest in Development (Aspire pins http://localhost:5180 there).
4. agentic-web-dev-secret and admin/admin no longer hardcoded in any .cs file. AppHost surfaces them as Aspire Parameters with dev defaults in appsettings.json; Web Program throws at startup if ClientSecret is missing outside Development.

Standalone `dotnet run --project src/AgentOs.Web` now expects user-secrets to supply the dev secret (Aspire AppHost injects it automatically). grep agentic-web-dev-secret\|admin/admin --include=*.cs → no matches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
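
The secure-by-default behaviour in this commit can be sketched as three small functions. This is a Python illustration of the decision logic only, not the ASP.NET Core configuration code; the function names and the plain-dict settings are assumptions.

```python
from typing import Optional


def require_https_metadata(settings: dict) -> bool:
    # A missing setting falls back to the secure default.
    return bool(settings.get("RequireHttpsMetadata", True))


def cookie_secure_policy(environment: str) -> str:
    # Development runs on plain http://localhost, so it follows the request.
    return "SameAsRequest" if environment == "Development" else "Always"


def resolve_client_secret(settings: dict, environment: str) -> Optional[str]:
    secret = settings.get("ClientSecret")
    if not secret and environment != "Development":
        # Fail fast at startup instead of running with an empty secret.
        raise RuntimeError("ClientSecret is required outside Development")
    return secret
```

The common thread is that the unsafe value must be opted into explicitly, so a forgotten setting in production fails closed.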

Adds GET /tenants/{id}/members guarded by Admin policy AND a runtime ITenantContext match (so an Admin in tenant A cannot list tenant B). KeycloakAdminClient gains ListUsersByTenantAsync — paged GET + client-side filter on the tenant attribute, plus a follow-up role-mappings fetch per user. Bounded by max=200 for v1; larger realms need a server-side q-search rewrite. Two new Razor pages under /admin: Members.razor lists the tenant roster and mints DataProtection invitation URLs (admin pastes them into chat/email); Settings.razor shows tenant id + name + created-at read-only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
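
A rough sketch of the two guards plus the bounded client-side filter described here, in Python rather than the project's C#. `list_members`, `Forbidden`, and the dict-shaped user records are hypothetical; the real endpoint reads the caller's tenant from ITenantContext and the users from Keycloak.

```python
MAX_USERS = 200  # v1 bound from the commit message


class Forbidden(Exception):
    pass


def list_members(caller_tenant_id, caller_is_admin, route_tenant_id, all_users):
    # Guard 1: role check (the Admin policy).
    if not caller_is_admin:
        raise Forbidden("admin role required")
    # Guard 2: the caller's own tenant must match the tenant in the route.
    if caller_tenant_id != route_tenant_id:
        raise Forbidden("tenant mismatch")
    # One bounded page, then filter on the tenant attribute client-side.
    page = all_users[:MAX_USERS]
    return [u for u in page if u.get("tenant") == route_tenant_id]
```

Because the filter runs after the page is cut, members beyond the first 200 realm users are silently missed, which is the limitation the commit flags for larger realms.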

New IAuditLog (EF + Null impls) writes append-only rows to tenants.audit_events for signup-completed, tenant-created, member-invited, and invitation-minted actions. Writes are best-effort — audit failure logs a warning but never breaks the surrounding flow. Reads are tenant-scoped at the repository, then re-checked at the endpoint against ITenantContext (Admin in tenant A cannot peek at tenant B's trail). GET /tenants/{id}/audit + /admin/audit Razor page show newest-first rows. login.failed action is reserved for a follow-up — wiring it through cookie / JWT events lands separately. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
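
The best-effort write semantics can be sketched as a thin wrapper. This is a Python illustration, not the project's IAuditLog; `write_audit_best_effort` and the callable `sink` are hypothetical.

```python
import logging

logger = logging.getLogger("audit")


def write_audit_best_effort(sink, tenant_id, action):
    """Try to record an audit row; never let a sink failure escape."""
    try:
        sink(tenant_id, action)
        return True
    except Exception as exc:
        # Log a warning and carry on so the surrounding flow still completes.
        logger.warning("audit write failed for %s/%s: %s", tenant_id, action, exc)
        return False
```

The trade-off is that a gap in the trail is possible during a sink outage, so the warning log is the only signal that a row was dropped.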

KeycloakAdminClient: 4 tests for DeleteUserAsync (no-content, 404 swallowed, 500 throws) and ListUsersByTenantAsync (filter + role-mapping fetch). HttpTenantContext: 5 tests for the claims projection — missing-claim default, normal user, role flattening, IsAdmin off-by-role, and the multi-value tenant claim edge case. Epic D delta now sits at ~32 new tests (validation 16 + signup service 7 + admin REST 4 + tenant context 5). Full unit suite stays green (218 passed, 5 skipped live-smoke). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The doc was a one-shot plan for the overnight Epic D autopilot run. It's not project documentation, it doesn't reflect current direction after Epic D landed (PR #17), and it shouldn't have made it into the E3 commit — slipped past the staging filter that excluded it on E1 and E2. Removing per the product-focus rule: drop thesis / phase / roadmap docs, keep only AI-essential docs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…idence (#21)

* feat(tools): add ITool/IToolRegistry contracts + InMemoryToolRegistry (E1)

Epic E1 lays the contract surface every later step composes on. Domain gets six new types: ITool (callable capability), IToolRegistry (register/resolve/list/unregister), ToolDefinition (name, description, JSON input schema), ToolInvocationRequest/Result (carry CallId so the orchestrator can match tool_use -> tool_result blocks), and ToolException (distinct from LlmException so the orchestrator can tell "model misused a tool" apart from "model call itself failed").

New AgentOs.Modules.Tools module ships the default InMemoryToolRegistry (ConcurrentDictionary-backed so MCP probes and the orchestrator can read/mutate concurrently) and a ToolsModule that discovers ITool DI registrations at startup and pumps them into the registry. Slnx + Tests csproj reference the new project.

22 new tests cover registry register/resolve/list/unregister, duplicate detection, validation, and result factories. Full suite 248 pass, 5 skipped (live-LLM smoke).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(llm,tools): wire ITool into the LLM gateway + BuildVerifierTool (E2)

LlmRequest grows an optional `Tools` field — a flat list of tool names the agent is allowed to invoke for this call. PooledChatLlmClient (the prod path for Claude + AzureOpenAI key pools) now resolves each name through IToolRegistry, wraps the chat client with FunctionInvokingChatClient, and threads the resolved tools into ChatOptions.Tools. The whole tool-call loop runs inside the gateway so ILlmClient.SendAsync still returns one LlmResponse (the final text turn) — agents don't need to learn a new contract.

AIToolFunction (Modules.Llm, internal) adapts a Domain.Tools.ITool into a Microsoft.Extensions.AI.AIFunction: the schema string round-trips through JsonElement so it's exposed verbatim to the model, and the InvokeCoreAsync override serializes the model-emitted arguments back into ITool's stringly-typed Input, runs ITool.InvokeAsync, and returns the textual Output (or the error message) for the next LLM turn.

BuildVerifier gains a primitive VerifyFilesAsync(IEnumerable<...>, CancellationToken) so callers without a full PipelineResult — like the new build_verifier tool — can pass a flat file list. The legacy VerifyAsync(PipelineResult, ...) delegates to it. BuildVerifierTool (Modules.Integration.Tools) wraps that primitive with a tight JSON contract — input `{files:[{path,content}]}`, output `{success, exit_code, output, elapsed_ms}` — and IntegrationModule registers it via AddTool<>() so ToolsModule.InitializeAsync auto-discovers it at startup.

14 new tests: AIToolFunction surface + invocation + error path, PooledChatLlmClient registers/resolves/drops tools across {no tools, tool found, tool missing, no registry}, BuildVerifierTool JSON validation + delegation + failure surfacing. Full suite 262 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(mcp): add MCP client module that registers remote tools (E3)

New AgentOs.Modules.Mcp consumes external MCP servers (GitHub MCP, filesystem MCP, custom servers) and surfaces their tools into the existing IToolRegistry under the prefixed name "{server}.{tool}". Once registered, an LLM agent can invoke a remote MCP tool through the same LlmRequest.Tools path E2 wired for local ITools — no agent-side change.

McpOptions binds the per-server config (stdio command+args+env or HTTP/SSE URL, enabled flag, call timeout). McpClientHost holds the live McpClient connections, ListToolsAsync's each one at startup, and wraps every returned McpClientTool into McpToolAdapter (ITool) with the remote schema and description carried over verbatim. A failed server connection is logged and skipped — the rest of the host still boots. DisposeAsync unregisters every name and closes every client.

McpToolAdapter takes a McpToolInvoker delegate instead of an McpClientTool directly so tests can stub MCP without a live server. McpClientHost owns the delegate wiring and applies the per-tool-call timeout (default 60s, configurable via Mcp:CallTimeoutSeconds).

10 new tests: McpToolAdapter input parsing + delegation + error surfacing + cancellation, McpOptions config binding (defaults, two servers, args/env round-trip, enabled flag). McpClientHost itself is not yet exercised by tests — a live-MCP smoke test will land alongside the sample MCP-GitHub integration in E6. Full suite 272 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: drop Epic-D plan doc from the repo

The doc was a one-shot plan for the overnight Epic D autopilot run. It's not project documentation, it doesn't reflect current direction after Epic D landed (PR #17), and it shouldn't have made it into the E3 commit — slipped past the staging filter that excluded it on E1 and E2. Removing per the product-focus rule: drop thesis / phase / roadmap docs, keep only AI-essential docs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(api,mcp): expose AgentOs pipeline as MCP server (E4)

The API host now serves MCP at /mcp via the official ModelContextProtocol.AspNetCore Streamable HTTP transport, with three tools surfaced to remote MCP clients (Claude Desktop, VS Code, custom orchestrators):

- run_pipeline(description, max_iterations?, locale?) -> PipelineResult full 5-agent run on a single user story, collected to the final result
- list_runs(limit?) -> PipelineRunSummary[] (paged, capped 1..100)
- get_run(runId) -> PipelineRunRecord (artifacts + per-call metrics)

PipelineMcpTools is a [McpServerToolType] static class; the IPipelineClient / IPipelineRunRepository dependencies are injected per call by the MCP SDK's tool factory, so existing ITenantContext / auth still applies (JWT bearer middleware runs before /mcp routes).

The Api host now also loads ToolsModule + McpModule alongside the others — Integration's BuildVerifierTool needs IToolRegistry to be registered, and McpModule's startup hook wires upstream MCP servers into the same registry. Together with E1-E3 this closes the tools-mesh loop: AgentOs is both an MCP client (consumes external tools) and an MCP server (exposes its own pipeline tools).

No new tests this step — a meaningful MCP smoke test wants a live TestServer + an in-process McpClient round-trip and belongs with the sample app in E6. Build clean, full suite 272 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(tools): add IToolPolicy gate + IToolInvocationLog evidence sink (E5)

Every tool call the LLM gateway routes through AIToolFunction now passes through two new seams:

- IToolPolicy.EvaluateAsync(request) — pre-invocation gate. Denied calls short-circuit; the policy's reason string is fed back to the LLM as the tool_result so the model can react (retry with different arguments, give up, ask the user). Default impl PermissiveToolPolicy allows everything — production wires a tenant-aware impl that reads the allowlist + cost cap from AppConfig.
- IToolInvocationLog.AppendAsync(evidence) — best-effort evidence sink. One entry per invocation (allowed-and-succeeded, allowed-and-errored, or denied) capturing the call id, tenant, run id, tool name, input JSON, output, error flag, and start/finish timestamps. The audit trail covers refusals as well as successful runs. Log failures are swallowed — a downstream sink outage must never break a tool call. Default impl InMemoryToolInvocationLog is a per-tenant ring buffer bounded at 500 entries so a runaway loop can't OOM the host.

PooledChatLlmClient resolves both interfaces optionally from DI when constructing AIToolFunctions, so existing tests + hosts that don't register the new services keep working with null behaviour.

EF-backed persistence (tool_invocations table in the tools schema) + the policy that loads tenant allowlists from AppConfig are deferred to E5.next — this commit ships the abstractions and the in-memory defaults so the rest of the platform can already start handing evidence data structures around.

9 new tests: InMemoryToolInvocationLog (recency order, per-tenant isolation, cap enforcement, limit), AIToolFunction integration (policy deny short-circuit, allow path, no-policy no-log fallback, log-failure absorption). Full suite 281 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(readme): document the Tools + MCP modules and the /mcp endpoint (E6)

Adds two new module entries (AgentOs.Modules.Tools, AgentOs.Modules.Mcp), updates the Integration line so the BuildVerifierTool registration is visible, extends the cross-module dep note with Integration -> Tools, and adds a Tools & MCP subsection explaining the LlmRequest.Tools loop, the policy + evidence seams, and the fact that AgentOs is now both an MCP client and an MCP server.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
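
The E5 policy gate and bounded evidence log can be sketched together as follows. This is a Python illustration under stated assumptions, not the project's C# types: `InMemoryInvocationLog`, `invoke_with_policy`, the tuple-returning `policy` callable, and the entry shape are all hypothetical simplifications of IToolPolicy and IToolInvocationLog.

```python
from collections import defaultdict, deque


class InMemoryInvocationLog:
    """Per-tenant ring buffer: the oldest entries fall off once the cap is hit."""

    def __init__(self, cap=500):
        self._by_tenant = defaultdict(lambda: deque(maxlen=cap))

    def append(self, tenant_id, entry):
        self._by_tenant[tenant_id].append(entry)

    def recent(self, tenant_id, limit=50):
        # Newest first, limited, and only for the requested tenant.
        entries = self._by_tenant.get(tenant_id, ())
        return list(entries)[::-1][:limit]


def invoke_with_policy(policy, log, tenant_id, tool_name, tool, tool_input):
    allowed, reason = policy(tenant_id, tool_name)
    if not allowed:
        # Denied calls short-circuit; the reason goes back to the model.
        output, status = reason, "denied"
    else:
        try:
            output, status = tool(tool_input), "ok"
        except Exception as exc:
            output, status = str(exc), "error"
    try:
        log.append(tenant_id, {"tool": tool_name, "status": status, "output": output})
    except Exception:
        pass  # a failing evidence sink must not break the tool call
    return output
```

Using `deque(maxlen=...)` gives the ring-buffer behaviour for free: appends stay O(1) and memory per tenant is bounded regardless of how many calls a runaway loop makes.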

hoangsnowy and others added 7 commits May 30, 2026 11:20

hoangsnowy marked this pull request as ready for review May 30, 2026 04:42

hoangsnowy merged commit 4fd171f into main May 30, 2026
1 check passed

hoangsnowy deleted the feat/web-oidc-signup branch May 30, 2026 05:08
