[AIG-649] Multi-tenancy + workspace isolation (M3-21)#37
Conversation
|
Warning Review limit reached
More reviews will be available in 15 minutes and 26 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughAdds a full multi-tenancy implementation: DB migration and TenantService, request-context resolution with AsyncLocalStorage, tenant-aware memory namespacing and audit stamping, REST routes and CLI tooling, agent/dispatcher tenant propagation, frontend tenant switcher, migration tooling, tests, and documentation. ChangesMulti-Tenancy Foundation
Sequence DiagramsequenceDiagram
participant Client
participant Route
participant withTenantContext
participant TenantService
participant AsyncLocalStorage
Client->>Route: HTTP request (headers / JWT)
Route->>withTenantContext: withTenantContext(req)
withTenantContext->>TenantService: verify tenant/workspace/membership
TenantService-->>withTenantContext: tenant/workspace/context
withTenantContext->>AsyncLocalStorage: store context
AsyncLocalStorage-->>Route: getActiveTenantContext() used by handlers
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
blackms
left a comment
There was a problem hiding this comment.
Verdict: REQUEST CHANGES
The PR lands a clean schema + service + CLI + UI skeleton, with good tests for the isolated TenantService. However it falls substantially short of the AIG-649 acceptance criteria for tenancy isolation: most of the surface area is scaffolding that is never wired up, several routes are unauthenticated, and the migration violates the repo's own rollback convention. None of the critical isolation invariants in the review brief are actually enforced by code today.
I'm fine with the "base-layer + retrofit-deferred" framing documented in docs/MULTITENANT.md, but a few of the findings below are not retrofit work — they are gaps in what this PR itself promises to ship.
Blocking findings
1. Tenant REST routes have no authentication and no RBAC — anyone can list/create/delete tenants.
src/web/routes/tenants.ts registers POST /api/v1/tenants, DELETE /api/v1/tenants/:id, POST /api/v1/tenants/:id/workspaces, etc. and never calls createAuthMiddleware. Compare with src/web/routes/auth.ts:214/234/268, where admin-only routes explicitly wrap with createAuthMiddleware({ required: true, roles: ['admin'] }). As written, an unauthenticated attacker can:
- enumerate every tenant (
GET /api/v1/tenants) - create rogue tenants and workspaces
- delete any tenant by ID via
DELETE /api/v1/tenants/:id, which cascades to all workspaces + memberships
The PR description claims "RBAC default-deny"; current behavior is "no auth at all". At minimum every mutation route needs createAuthMiddleware({ required: true, roles: ['admin'] }), list routes need required: true, and the handlers must call TenantService.assertAccess(...) for tenant-scoped reads. Worth adding a regression test in tests/unit/web/routes/ that asserts 401 without a token and 403 for non-admins.
2. withTenantContext and workspaceNamespace are exported but never imported anywhere.
Grep over src/ confirms zero callers. There is no middleware that resolves a MultitenancyContext from the request, no place where AuthContext.tenantId/workspaceId/tenantRole is ever populated (despite the new fields in src/auth/types.ts), and no domain service that namespaces memory keys by workspaceNamespace(ctx). This means the "feature flag default OFF preserves single-tenant behavior" claim is trivially true — because the flag has no consumers. When a user flips multitenancy.enabled: true, nothing changes: there is no cross-tenant data leakage prevention because there is also no scoping at all.
That's defensible for memory-store / agents / tasks (called out as retrofit work). It is not defensible for the new HTTP and CLI surface this PR ships — those should wire withTenantContext end-to-end so the contract proven by service.test.ts ("tenant isolation invariant") actually applies to a real request path. Right now, even the new /api/v1/tenants/:id/workspaces route lets any caller list workspaces of any tenant regardless of membership.
3. Migration 008 is not reversible — violates migrations/README.md.
The repo README mandates a -- ==================== DOWN ==================== section with DROP statements, and migrations 001/002/003 all follow this. migrations/008_multitenant.sql only has UP. The review brief explicitly calls out "retrofit migration is reversible" as a hard criterion. Please append:
-- ==================== DOWN ====================
DROP INDEX IF EXISTS idx_tenant_users_workspace;
DROP INDEX IF EXISTS idx_tenant_users_tenant;
DROP INDEX IF EXISTS idx_tenant_users_user;
DROP TABLE IF EXISTS tenant_users;
DROP INDEX IF EXISTS idx_workspaces_tenant;
DROP TABLE IF EXISTS workspaces;
DROP INDEX IF EXISTS idx_tenants_slug;
DROP TABLE IF EXISTS tenants;4. tenant_users PRIMARY KEY does not protect against duplicate tenant-wide memberships.
PRIMARY KEY (tenant_id, user_id, workspace_id) — in SQLite, two rows with workspace_id IS NULL for the same (tenant_id, user_id) do not collide because NULL is not equal to NULL inside a PK. Combined with the INSERT OR REPLACE in TenantService.addMembership, a tenant-wide grant for the same user can silently accumulate duplicates, and resolveRole will then double-count. Reproducer would be straightforward. Fix options:
- replace the PK with a composite that uses
COALESCE(workspace_id, ''), expressed via a UNIQUE INDEX on(tenant_id, user_id, COALESCE(workspace_id, '')), or - use two tables (tenant-wide vs workspace-scoped) and union at read time.
A test for "calling addMembership twice with the same (tenantId, userId, undefined) results in exactly one row" would lock this down.
Significant concerns
5. cachedDb singleton in src/web/routes/tenants.ts re-opens the DB outside the existing MemoryManager lifecycle. WebServer.constructor already does memoryManager.getStore().getDatabase(). Opening a second better-sqlite3 handle to the same file from getDb(config) risks lock contention and bypasses any pragmas the memory manager sets. Pass the DB handle into registerTenantRoutes(router, config, db) like the other routes do, and drop the cachedDb / _resetTenantRoutesDb plumbing.
6. ensureSchema() in TenantService silently runs DDL on every instantiation. The routes file constructs new TenantService(getDb(config)) on every request, which means we issue 9 CREATE TABLE IF NOT EXISTS / CREATE INDEX IF NOT EXISTS statements per call. Functionally fine, but wasteful and surprising under load. Either construct the service once at server startup, or gate ensureSchema behind a Database symbol marker so it runs once per handle.
7. Migration numbering jump to 008 is fragile. Documented as intentional, but please file a follow-up Linear ticket pinned to merge-time renumbering so it isn't forgotten. The migrate CLI command also doesn't look at the migrations/ SQL file at all — it just calls migrateSingleToMulti(db). Worth verifying the file gets applied through whatever the project's existing migration runner is (I couldn't find one referenced from WebServer.constructor).
8. TenantContextRequest.jwtClaims is read but never produced. src/web/middleware/auth.ts issues JWTs via AuthService whose payload (src/auth/types.ts JWTPayload) has no tenant_slug / workspace_slug fields. Either extend JWTPayload and the issuance path, or drop the jwtClaims branch from withTenantContext to avoid pretending it works.
Minor / nits
assertValidSlugshould be applied to workspace slugs inmigrateSingleToMulticallers' inputs too — currentlymigrateSingleToMulti({ tenantSlug: 'BadSlug!' })surfaces the same error fromcreateTenant, which is fine, but invalid slugs supplied via--tenant-slugon the CLI surface today as a genericMigration failed: Invalid tenant slugrather than a usage error.TenantSwitcher.tsxpersists selection tolocalStoragebut the inlinefetchJsonhelper never reads it back into request headers. So the dropdown today is purely cosmetic — no request actually carriesX-Tenant-Slug. Either wire the active selection into a globalfetchwrapper, or call this out as TODO.docs/MULTITENANT.md"Application-layer guardrail" snippet usesauthContext.tenantId— please add an inline comment referencing the new field added insrc/auth/types.tsso reviewers can trace it.idx_tenant_users_workspacewill index a lot of NULLs given the tenant-wide membership pattern; not a correctness issue, just noting.
Test coverage gaps
- Unit tests cover
TenantServicein isolation thoroughly — good. - Integration test covers the migration tool — good.
- Missing: any test of
withTenantContext(header to context resolution, JWT fallback, default fallback, denied access when user lacks membership). - Missing: any HTTP-level test of the new routes (auth, RBAC, cross-tenant access denial).
- Missing: a regression asserting the "feature flag OFF = behavior identical to main" invariant. A simple integration test that boots the server with
multitenancy.enabled: falseand confirms a baseline request shape is unchanged would catch silent regressions when the retrofit lands.
Suggested path to merge
- Add
DOWNsection to migration 008 (blocking). - Wrap all tenant routes in
createAuthMiddleware({ required: true })and admin role gating for mutations (blocking). - Fix the
tenant_usersNULL-in-PK issue (blocking). - Land the simplest possible
withTenantContextHTTP middleware that populatesAuthContext.tenantId/workspaceId/tenantRolefor the new routes only, and have those routes callTenantService.assertAccess(blocking — without this, the PR doesn't materially deliver "workspace isolation"). - Share the DB handle with the rest of the web server (significant).
- Add the missing HTTP-level / context tests (significant).
- Retrofit checklist in
docs/MULTITENANT.mdis fine as a follow-up.
Happy to re-review once the blocking items are addressed.
…s, migration DOWN, fix PK NULL bug
… safety Replace module-scope ref with AsyncLocalStorage so the active tenant context survives await boundaries WITHOUT leaking across interleaved concurrent async handlers in the same process. The prior implementation had a latent cross-tenant data-leak: while request A was awaiting I/O inside runWithTenantContext(ctxA, ...), request B could enter runWithTenantContext(ctxB, ...) and overwrite the module-scope ref; on resumption, A's downstream calls (MemoryManager.search, SmartDispatcher.dispatch, audit logs) would observe ctxB. - runWithTenantContext now accepts () => T | Promise<T> and delegates to AsyncLocalStorage.run. Undefined ctx remains a pass-through to preserve single-tenant no-op semantics. - getActiveTenantContext reads from the async store. - _resetActiveTenantContext is now a documented no-op (storage scopes itself to the run() callback; nothing to reset). - Add concurrency regression test: two interleaved async runs with awaits assert each only ever observes its own tenant id.
…i-tenancy # Conflicts: # src/agents/spawner.ts # src/cli/commands/index.ts # src/cli/index.ts # src/types.ts # src/utils/config.ts # src/web/server.ts
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
docs/MULTITENANT.md (1)
89-98: ⚡ Quick winClarify the
ctxparameter and memory migration implications.The code example shows
workspaceNamespace(ctx)but doesn't explain whatctxis, its type (AuthContext? MultitenancyContext?), or how to obtain it in different contexts (e.g., request handlers, background jobs, CLI commands). Developers retrofitting existing code may be uncertain how to properly construct or pass this context.Additionally, the note that "the migration tool does not rewrite them" (line 97) has important data isolation implications that should be stated explicitly:
- Do pre-migration memory entries remain accessible after migration?
- Are they shared across tenants or isolated in some way?
- Is manual migration or cleanup required?
Without clarity here, implementations may inadvertently create data isolation vulnerabilities.
📝 Suggested improvements
After line 93, add:
The `ctx` parameter should be a `MultitenancyContext` object (typically extracted from `AuthContext`): ```ts import { getActiveTenantContext } from '../multitenancy/index.js'; // In request handlers const ctx = getActiveTenantContext(); // reads from AsyncLocalStorage // In background jobs or CLI, set context explicitly runWithTenantContext({ tenantId, workspaceId }, async () => { const ctx = getActiveTenantContext(); const ns = workspaceNamespace(ctx); // ... memory operations });After line 98, add: ```markdown **Pre-migration data handling:** Memory entries created before migration remain in their original (unscoped) namespace and are NOT automatically migrated. After enabling multi-tenancy: - These entries become inaccessible via tenant-scoped queries - Manual migration or cleanup is required if you need to preserve this data - See "Retrofit checklist" below for the data backfill plan🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/MULTITENANT.md` around lines 89 - 98, The example using workspaceNamespace(ctx) omits what ctx is and leaves data-migration implications ambiguous; update the docs to state that ctx is a MultitenancyContext (often derived from AuthContext via getActiveTenantContext) and show how to obtain it in different runtimes (e.g., read from AsyncLocalStorage in request handlers or wrap background/CLI work with runWithTenantContext to set tenantId/workspaceId), and explicitly document that pre-migration memory entries remain in their original unscoped namespace, become inaccessible to tenant-scoped queries after migration, and therefore require manual migration or cleanup (refer to the Retrofit checklist for backfill steps).src/multitenancy/index.ts (1)
107-109: 💤 Low valueMissing workspace resolution silently falls back to tenant-only context.
When
workspaceSlugis provided (via header, JWT, or config default) but the workspace doesn't exist in the database, the function continues withworkspaceId: undefinedrather than throwing. This could lead to unexpected behavior where a request targeting a specific workspace silently operates at tenant scope instead.Consider throwing when an explicitly requested workspace is not found to surface configuration or migration issues early.
Suggested change
const workspace = workspaceSlug ? service.getWorkspaceBySlug(tenant.id, workspaceSlug) : undefined; + + if (workspaceSlug && !workspace) { + throw new Error( + `Workspace "${workspaceSlug}" not found in tenant "${tenantSlug}"`, + ); + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/multitenancy/index.ts` around lines 107 - 109, When workspaceSlug is present but service.getWorkspaceBySlug(tenant.id, workspaceSlug) returns undefined, do not silently continue with workspaceId undefined; instead detect that case and throw a clear error. Update the code around the workspace variable (where workspaceSlug and service.getWorkspaceBySlug are used in src/multitenancy/index.ts) to check if workspaceSlug && !workspace and throw a descriptive error (e.g., "Workspace not found for slug: {workspaceSlug}") or a suitable NotFound error so callers cannot silently fall back to tenant-only context.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/web/routes/tenants.ts`:
- Around line 281-303: withTenantContext may produce a tenant context from
headers/JWT that doesn't match the route tenant id (id); before calling
runWithTenantContext you must enforce consistency by checking the resolved
context's tenant identifier against the path tenant (id) and if they differ,
discard or override the resolved context. Update the block around
withTenantContext/runWithTenantContext to compare resolved.tenantId (or
resolved.id/the appropriate tenant identifier on the resolved object) to id and
set resolved = undefined (or construct a context for the path-tenant) when they
don't match so downstream calls like service.listWorkspaces(id) always run under
the path tenant.
In `@web/src/pages/TenantSwitcher.tsx`:
- Around line 39-45: The fetchJson<T> helper currently returns the raw JSON
which in our tenant routes is wrapped as { success, data }, so callers (e.g.,
the TenantSwitcher code that expects tenants/workspaces) receive the envelope
instead of the actual payload; update fetchJson to detect and unwrap that
envelope: after parsing res.json(), if the object has a "data" property (and
optionally "success"), return obj.data as T, otherwise return the parsed object
as T so existing non-wrapped endpoints keep working; ensure the generic
signature of fetchJson<T> remains and that callers (the code that reads
tenants/workspaces) will now receive the unwrapped payload.
- Around line 73-76: Persisted tenantSlug may point to a removed tenant and
causes early return; after fetching tenants (data.tenants) reconcile the
current/persisted tenantSlug against the fetched list and if it is missing and
data.tenants.length > 0 call setTenantSlug(data.tenants[0].slug) and
persistActive(data.tenants[0].slug) to fall back to the first available tenant;
update the early-return logic that checks tenantSlug (in TenantSwitcher) so it
validates presence in data.tenants rather than only checking for a truthy slug.
---
Nitpick comments:
In `@docs/MULTITENANT.md`:
- Around line 89-98: The example using workspaceNamespace(ctx) omits what ctx is
and leaves data-migration implications ambiguous; update the docs to state that
ctx is a MultitenancyContext (often derived from AuthContext via
getActiveTenantContext) and show how to obtain it in different runtimes (e.g.,
read from AsyncLocalStorage in request handlers or wrap background/CLI work with
runWithTenantContext to set tenantId/workspaceId), and explicitly document that
pre-migration memory entries remain in their original unscoped namespace, become
inaccessible to tenant-scoped queries after migration, and therefore require
manual migration or cleanup (refer to the Retrofit checklist for backfill
steps).
In `@src/multitenancy/index.ts`:
- Around line 107-109: When workspaceSlug is present but
service.getWorkspaceBySlug(tenant.id, workspaceSlug) returns undefined, do not
silently continue with workspaceId undefined; instead detect that case and throw
a clear error. Update the code around the workspace variable (where
workspaceSlug and service.getWorkspaceBySlug are used in
src/multitenancy/index.ts) to check if workspaceSlug && !workspace and throw a
descriptive error (e.g., "Workspace not found for slug: {workspaceSlug}") or a
suitable NotFound error so callers cannot silently fall back to tenant-only
context.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 018f2ae3-e421-4ba2-85b2-3d10e4c5487f
📒 Files selected for processing (22)
docs/MULTITENANT.mdmigrations/008_multitenant.sqlsrc/agents/spawner.tssrc/auth/types.tssrc/cli/commands/index.tssrc/cli/commands/tenant.tssrc/cli/index.tssrc/memory/index.tssrc/memory/sqlite-store.tssrc/multitenancy/index.tssrc/multitenancy/service.tssrc/tasks/smart-dispatcher.tssrc/types.tssrc/utils/config.tssrc/web/routes/index.tssrc/web/routes/tenants.tssrc/web/server.tstests/integration/multitenancy-migration.test.tstests/unit/memory-agent-scoping.test.tstests/unit/multitenancy/service.test.tstests/unit/multitenancy/tenant-routes.test.tsweb/src/pages/TenantSwitcher.tsx
blackms
left a comment
There was a problem hiding this comment.
Reviewed current head after addressing the CodeRabbit findings and merging main. Thread-aware review check shows 0 unresolved active threads; CI is green (build, lint, typecheck, tests) and CodeRabbit is green.
Implements AIG-649.
Milestone: M3 - Enterprise
Estimate: 8 pts
Summary
See commit message for full implementation details (schema, API, CLI, tests, docs).
Notes from PM agent
MIGRATION: 008. FEATURE FLAG: implemented as default OFF (decision deferred to human: v2.0 breaking vs v1.x feature-flag). Retrofit strategy in docs/MULTITENANT.md.
Auto-opened by aistack PM agent on 2026-05-28 10:22. Review with /review or human dispatch.
Summary by CodeRabbit