feat(reporting): TAM-6862: Tamanu manages reporting/raw db roles#10082
feat(reporting): TAM-6862: Tamanu manages reporting/raw db roles#10082dannash100 wants to merge 30 commits into
Conversation
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Want higher recall? High effort reviews run extra passes and find more bugs. A team admin can switch effort levels in the Cursor dashboard.
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit 35ca9f0. Configure here.
|
🦸 Review Hero (could not post inline comments — showing here instead)
[BES Requirements] ALTER DEFAULT PRIVILEGES IN SCHEMA sets default grants for tables created by CURRENT_USER in future. This runs on every startup and is idempotent, but it only covers tables created by the exact same database user that ran this statement. If FHIR materialisation or other code creates reporting-schema tables while connected as a different role (e.g. after SET ROLE), those tables won't inherit these defaults and will be invisible to tamanu_reporting until the next GRANT SELECT ON ALL TABLES call. The existing GRANT SELECT ON ALL TABLES covers tables that already exist at startup time, so there's a window between table creation and the next restart. Since ALTER DEFAULT PRIVILEGES is per-user, consider explicitly re-running GRANT SELECT ON ALL TABLES on reporting schema tables after materialisation completes, or document this limitation clearly.
[Design & Architecture]
[Integration tests] The existing
[Performance] Every report execution now costs 5 sequential DB round-trips (BEGIN + SET TRANSACTION READ ONLY + SELECT 1 + actual query + COMMIT) instead of 1. The same security guarantee can be achieved at connection time instead of per-query: add |
|
🦸 Review Hero Summary Below consensus threshold (14 unique issues not confirmed by majority)
Local fix prompt (copy to your coding agent)Fix these issues identified on the pull request. One commit per issue fixed.
|
35ca9f0 to
de077a9
Compare
|
🦸 Review Hero Summary Below consensus threshold (9 unique issues not confirmed by majority)
Local fix prompt (copy to your coding agent)Fix these issues identified on the pull request. One commit per issue fixed.
|
de077a9 to
d2645b1
Compare
0350820 to
b9c7168
Compare
b9c7168 to
3e408d5
Compare
|
🦸 Review Hero Summary Below consensus threshold (4 unique issues not confirmed by majority)
Local fix prompt (copy to your coding agent)Fix these issues identified on the pull request. One commit per issue fixed.
|
The reporting and raw database connections no longer need their own login credentials in config. Tamanu creates and manages the tamanu_reporting and tamanu_raw roles itself: on startup it creates each role, grants it read-only access to its schema (plus ALTER DEFAULT PRIVILEGES so tables materialised later stay readable), and connects as it. Database credentials collapse to the single core connection; the core db user needs CREATEROLE to provision the roles. The reporting connections authenticate AS the unprivileged role rather than assuming it from the core user via SET ROLE. SET ROLE is reversible — the authenticated user stays the privileged core user — so report SQL could RESET ROLE, or end the surrounding transaction with COMMIT and continue in autocommit, to write as core. Logging in as the role itself removes that escape: the session has no write grants and is a member of no other role, so report SQL has nothing to escalate to. The role password is derived (HMAC) from the core db password already in config, so there is no separate secret to manage and every replica derives the same value (no rotation race). Removes the per-connection username/password config and the now-redundant createMockReportingSchemaAndRoles test helper. Tests assert the sandbox holds: a COMMIT + RESET ROLE write attempt through the report runner, and a trailing write through the central verifyQuery path, are both rejected with no data written.
3e408d5 to
421f953
Compare
|
🦸 Review Hero Summary Below consensus threshold (9 unique issues not confirmed by majority)
Local fix prompt (copy to your coding agent)Fix these issues identified on the pull request. One commit per issue fixed.
|
…g roles The reporting role password is derived (HMAC) from the core db password. If that is empty the key silently falls back to an empty string, so the derived passwords are no longer unique to the instance. Log a warning to surface this (expected under trust auth, otherwise a misconfiguration).
|
🦸 Review Hero Summary Below consensus threshold (12 unique issues not confirmed by majority)
Local fix prompt (copy to your coding agent)Fix these issues identified on the pull request. One commit per issue fixed.
|
…ation caveat Spell out the security implication in the empty-db.password warning (under password auth the reporting role password derives from an empty key and is inferable from source) and document that rotating db.password needs a coordinated restart so live reporting connections don't fail.
| expect(response.body.error.message).toEqual('permission denied for table reporting_test_table'); | ||
| }); | ||
|
|
||
| it('cannot escape the sandbox by ending the transaction and switching roles', async () => { |
There was a problem hiding this comment.
[Integration tests] suggestion
The sandbox-escape test (COMMIT; RESET ROLE; INSERT) only exercises the reporting connection (reportingDefinition). The raw connection accesses the public schema where all production tables (including patients) live, making it the higher-risk path. A parallel test using rawDefinition and a write targeting raw_test_table would confirm the unprivileged-role guarantee holds for that connection too. The existing rawDefinition fixture is already set up in beforeAll, so adding the case is straightforward.
|
🦸 Review Hero Summary Below consensus threshold (11 unique issues not confirmed by majority)
Local fix prompt (copy to your coding agent)Fix these issues identified on the pull request. One commit per issue fixed.
|
… connection Drop the per-connection reportSchemas.connections.*.pool config — reporting connections inherit the main db connection's pool. initReporting now iterates the known connection roles instead of reading connections from config, so the reportSchemas config shrinks to just `enabled` (and the now-unreachable unknown-connection guard is removed).
…r raw-connection escape
The ALTER ROLE ... PASSWORD statement embeds the derived password as a literal,
which Sequelize would otherwise log — and the reporting roles can read patient
data, so a logged password is a read-access leak. Pass { logging: false } for
that one statement.
Also adds a raw-connection (public schema) variant of the sandbox-escape test,
and removes a couple of redundant test-setup comments.
…ection string The core (tamanu) DB connection can now be given as one connection string via the DATABASE_URL env var or a db.url config field, instead of separate host/name/username/password fields. Sequelize pool settings can ride along as query params (?max=10&min=2&idle=10000). resolveDbConfig() parses it and merges over the structured config; when no URL is set the structured fields are used unchanged, so existing deployments keep working. The reporting raw/reporting connections reuse the resolved host/name but keep their own credentials.
The role passwords were an HMAC of config.db.password, which is empty on trust/ peer servers — so the "secret" was a fixed value derivable from source. Generate a random per-server secret instead, stored in local_system_facts (not synced, the same way the device key is), and derive the role passwords from that. Works the same regardless of the core db auth method. Reading the secret needs a migrated db, so facility opens its reporting connections after migrate via a new initReportingStores() (called from the start subcommands) rather than in init(). Central already inits after migrate.
…ine tooling and pool-headroom check Apply resolveDbConfig() in the remaining spots that build a DB connection or read the pool size directly from config.db: the dbt model generator, the migration-baseline generator, and the startApi connection-pool headroom warning. No-op when no URL is set, so structured configs are unaffected.
…SE_URL env only The connection string is supplied solely via the DATABASE_URL env var; removes the db.url config key, its commented example, and the backwards-compatible wording. Structured config remains the default when DATABASE_URL is unset.
|
Android builds 📱
|
The raw reporting role gets SELECT on all of public, which includes tables holding cleartext credentials that report SQL has no business reading: local_system_facts (device private key + reporting secret), one_time_logins, portal_one_time_tokens, refresh_tokens, and the certificate signers tables. Revoke SELECT on them from the public-scoped (raw) role after granting the schema; to_regclass skips any not present on a given server.
7793b35 to
fec44ae
Compare
🍹
|
…_secrets The raw reporting role is granted SELECT on all of public, so any secret stored in local_system_facts (the device private key, the reporting-role secret) was readable by report SQL. Move those into a new local_system_secrets table that the raw role is never granted, and let it read local_system_facts normally again. - new LocalSystemSecret model + table (mirrors local_system_facts; DO_NOT_SYNC) - excluded from sync-tick + changelog triggers so values never reach logs.changes - DDL + DML migrations (create table, then move the two rows out of facts) - getDeviceKey + the reporting secret now live on LocalSystemSecret; callers (SendStatusToMetaServer, the initDeviceKey upgrade step) updated - raw role revoke list now targets local_system_secrets instead of local_system_facts Server-only: mobile has neither the device key in LSF nor reporting, so no mobile migration is needed.
ebcf522 to
c5af4a4
Compare
…emSecret Adds getSecret/setSecret (AES via the crypto.keyFile) to LocalSystemSecret, for external credentials that shouldn't be plaintext even behind the table grant — e.g. a sync_host password. Moved here from LocalSystemFact (where they were unused). Encryption is opt-in per value: callers that store such a secret pull in the key-file dependency, while the self-generated device key and reporting secret stay on the plain get/set path so the always-on startup path doesn't require crypto.keyFile.
🍹
|
| // behind the table grant. Encryption uses the server key file | ||
| // (config `crypto.keyFile`), so only callers that store such a secret pull in | ||
| // that dependency; the self-generated device key / reporting secret stay on | ||
| // the plain get/set path. |
There was a problem hiding this comment.
Hmm, it might be more elegant to only have encrypted secrets in the table?
There was a problem hiding this comment.
Yeah I agree, ai pushed back on this because it would break deployments because of deviceKey if they dont have crypto keyFile, but if thats sortable thats all good.
I will QA this card and feat(database): TAM-6862: support a single DATABASE_URL connection string
today
There was a problem hiding this comment.
deviceKey in particular is essentially dead
There was a problem hiding this comment.
Ok cool! Should i remove that logic even?
…tional Every local_system_secrets value is now encrypted at rest via the server key file (config crypto.keyFile). get/set/setIfAbsent encrypt and decrypt, and the plaintext getSecret/setSecret accessors are removed so a secret can't be stored in the clear by accident. A legacy plaintext value (e.g. one moved out of local_system_facts before this change) self-heals: it is re-encrypted in place the first time it is read. Encryption now requires crypto.keyFile wherever the server boots, so: - deploys already mount a key and set CRYPTO_KEY_FILE; add the env mapping so node-config resolves crypto.keyFile from it - dev/test default to a committed key file (overridden in deploys)
…//github.com/beyondessential/tamanu into feat/TAM-6862/tamanu-manages-reporting-roles
…gration scripts tsx's CJS require hook doesn't complete the extensionless `@tamanu/database/services/connectionConfig` export, so dbt-generate-model and generate-migration-baseline crashed. Use `await import()` like the adjacent services/database call already does.
…URL validation - Re-throw the reporting-role password set without the original DB error, whose sql field carries the password inline and could leak via a generic error log. - Reject a non-postgres DATABASE_URL up front with a clear message instead of a vague connection error later (prefix check keeps the unix-socket form working). - Correct the stale "cleartext" comment: the reporting secret is stored encrypted.
9c94dbf to
5553f3e
Compare

Changes
Tamanu now creates and manages the reporting/raw database roles itself instead of relying on manually-provisioned login users with credentials in config. Database credentials collapse to the single core ("tamanu") connection, and reporting is always on (the
reportSchemas.enabledconfig flag is removed).ops:
Includes the DATABASE_URL change: feat(database): TAM-6862: support a single DATABASE_URL connection string (#10106) was merged into this branch rather than landing on its own, so this PR also carries the collapse to a single
DATABASE_URLcore connection string (the matching infra change is ops#237).How it works
tamanu_reporting/tamanu_rawroles —CREATE ROLE, password, schema,GRANT SELECT, andALTER DEFAULT PRIVILEGESso tables materialised later stay readable. Idempotent and self-healing.SET ROLE).local_system_secrets(see Secret storage & read isolation below) — not synced. So they're real, instance-unique secrets regardless of the core db auth method (trust, peer or password).username/passwordconfig and the redundantcreateMockReportingSchemaAndRolestest helper.Why log in as the role, not
SET ROLESET ROLEis reversible (the authenticated user stays the privileged core user), so report SQL couldRESET ROLE— or end the surrounding transaction withCOMMITand continue in autocommit — to write as core. Authenticating as the unprivileged role removes the escape: the session has no write grants and is a member of no other role. Covered by tests that runCOMMIT; RESET ROLE; INSERT …through the report runner and a trailing write through the centralverifyQuerypath, both rejected with no data written.Secret storage & read isolation
The
rawrole is grantedSELECTon all ofpublic, so any secret stored there is readable by report SQL. To prevent that:local_system_secretstable (mirrorslocal_system_facts;DO_NOT_SYNC; excluded from the sync-tick + changelog triggers so values never reachlogs.changes). A DDL migration creates it and a DML migration moves the two rows out oflocal_system_facts; dbt source model added withnilmasking onvalue.rawrole is revoked fromlocal_system_secretsand from the credential/token tables (one_time_logins,portal_one_time_tokens,refresh_tokens,signers/signers_historical), so reports can't read the device private key, auth tokens or signing keys. It still readslocal_system_facts(sync ticks etc.) normally.LocalSystemSecretalso offers opt-in encrypted accessors (setSecret/getSecret, AES viaconfig.crypto.keyFile) for external credentials that shouldn't be plaintext even behind the grant (e.g. a future sync_host password). Self-generated internal values (reporting secret, device key) stay on the plain path, so the always-on startup path keeps no key-file dependency.Concurrency
startAllbrings up the api, fhir workers and tasks runner concurrently, each provisioning/opening the reporting connections, so two safeguards keep them from racing on the cluster-global roles:CREATE/ALTER ROLEwould otherwise hit "tuple concurrently updated");openDatabasecaches the in-flight connection promise, so concurrent opens of the shared reporting connection key share one connection instead of throwing.Startup ordering
Reading the per-server secret needs a migrated db, so facility opens its reporting connections after migrate via a new
initReportingStores()(called from the start subcommands) rather than ininit(). Central already inits after migrate.CREATEROLEto provision the roles.tamanu_reporting/tamanu_rawroles needpg_hbaentries permitting them to authenticate (they log in with the derived password). Both are handled by the ops PR for ansible single-servers; k8s getsCREATEROLEvia the cluster initSql.local_system_secrets, so this works on trust/peer/minimal servers too. (The opt-in encrypted accessors needconfig.crypto.keyFile, but nothing in this PR uses them; on k8s the key file is already provisioned.)reportSchemas.connections.*.username/passwordandreportSchemas.enabledfrom config (now removed/ignored); startup re-aligns any pre-existing roles' password/grants.Deferred / considered (kept simple on purpose)
GRANT/CREATEROLEcost): reverted — roles are cluster-global, so a migration running in every parallel test/replica DB raced on the shared role catalogue. Startup provisioning + the advisory lock avoids the race. Revisit if the per-bootGRANT ON ALL TABLESis measured to hurt at scale.PULL_FROM_CENTRAL(central-owned), which fights a self-bootstrapping per-server credential, and secret settings need a PSK that minimal/trust/peer servers don't have.local_system_secrets(self-generated per server, like the device key) sidesteps both with no extra config.Testing:
ReportSchemaRoles(facility) andverifyQueryreporting-path (central) suites pass, including the sandbox-escape cases. Also smoke-tested a real local central boot (offline upgrade → provision → serve): HTTP 200, both roles log in, secret persisted. Thelocal_system_secretsmove + raw revoke were verified live on a k8s deploy (rawcan readlocal_system_factsbut notlocal_system_secrets/token tables); database build + lint clean,dbt-check-todospasses.Auto-Deploy
Options
Tests
Review Hero
.github/review-hero/suppressions.yml. Also runs automatically at the end of any auto-fix run.Remember to...