feat(sync): TAM-6863: facility sync-host setup workflow #10124
dannash100 wants to merge 31 commits into
Cursor Bugbot has reviewed your changes using default effort and found 2 potential issues.
Reviewed by Cursor Bugbot for commit c3e6f9d. Configure here.
const cancelTasks = startScheduledTasks(context);
const cancelSyncTask = startScheduledTasks(context, syncTaskClass);
// SyncTask needs a sync manager, which only exists when the server is configured.
const cancelSyncTask = isConfigured ? startScheduledTasks(context, [SyncTask]) : () => {};
Sync stays off after wizard
Medium Severity
When startAll boots unconfigured, it skips creating syncManager and SyncTask based on a one-time isConfigured flag. The setup wizard only refreshes in-memory serverConfig; it does not create those components. After wizard + page reload, the process still has no outbound sync until a full restart.
const configuredFacilities = getDeclaredFacilityIds();
// A wizard-configured server has no external (env/config) declaration to
// drift-check against; the recorded fact is the source of truth.
if (!configuredFacilities?.length) return;
Integrity check without sync host
Medium Severity
If SYNC_FACILITY_IDS or legacy config declares facility IDs but no sync host/credentials are available, ensureFacilityMatches still calls CentralServerConnection and connect(). With an empty default sync.host, startup throws before the server can serve the setup wizard.
🦸 Review Hero Summary Below consensus threshold (6 unique issues not confirmed by majority)
import { useQuery } from '@tanstack/react-query';
import { useApi } from '../useApi';

// Unauthenticated check used pre-login to decide whether to show the first-run
Instead of having a dedicated endpoint and a new request, we could have the existing status endpoint return a special value if setup is required.
… resolver
Foundation for moving the facility sync target out of static config into local system facts:
- FACT_SYNC_EMAIL / FACT_SYNC_PASSWORD constants (syncHost + facilityIds facts already existed)
- parseSyncUrl: parse a SYNC_URL connection string (creds embedded, à la DATABASE_URL) into host/email/password
- initSyncConfig: resolve sync host, creds and facility ids once at boot with precedence env (SYNC_URL/SYNC_FACILITY_IDS) -> facts -> legacy config, seeding facts from legacy config for back-compat
Not yet wired into the boot sequence; consumers still read config.
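A minimal sketch of what parsing a credential-bearing connection string can look like, using only the standard `URL` parser. The function name `parseSyncUrlSketch` and the example address are hypothetical; this is not the PR's `parseSyncUrl`, only an illustration of the DATABASE_URL-style shape described above.

```javascript
// Hypothetical sketch; the PR's actual parseSyncUrl may differ.
function parseSyncUrlSketch(syncUrl) {
  const url = new URL(syncUrl); // throws TypeError on malformed input
  if (!url.username || !url.password) {
    throw new Error('SYNC_URL must embed credentials');
  }
  // The URL parser keeps credentials percent-encoded, so decode them.
  const email = decodeURIComponent(url.username);
  const password = decodeURIComponent(url.password);
  // Rebuild the host without credentials and without trailing slashes.
  const host = `${url.origin}${url.pathname}`.replace(/\/+$/, '');
  return { host, email, password };
}

// Usage: the "@" in the email must be written as %40 inside the URL.
const parsed = parseSyncUrlSketch('https://sync%40example.org:s3cr%2Ft@central.example.org/');
console.log(parsed.host, parsed.email);
```

Percent-encoding matters here because an unescaped `@` or `/` inside the credentials would otherwise be read as a URL delimiter.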
…t boot
Add serverConfig.js: a boot-time resolver (mirroring initDeviceId) that resolves the sync connection (host/email/password) and facility identity (facilityIds) once, with precedence env (SYNC_URL / SYNC_FACILITY_IDS) > local system fact > legacy config. It writes nothing: facts are written out of band by the setup wizard (and FACT_FACILITY_IDS by the existing facility-match integrity check).
Runtime now reads getSyncConfig() / getServerFacilityIds() / isServerConfigured() instead of config.sync.* / selectFacilityIds(config): CentralServerConnection, ApplicationContext settings, auth login, integrity drift-check, mSupply tasks, shell, and the startApi/startSync/startAll/startTasks boot paths. The boot paths now skip timesync/central/sync setup when unconfigured so a fresh server still boots to serve the setup wizard.
The default sync.host config is emptied (kept as a deprecated one-version fallback) so a fresh install is unconfigured and shows the wizard.
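The precedence rule (env > local system fact > legacy config) can be sketched as a first-match lookup. The names `firstSet` and `resolveServerConfigSketch` are hypothetical, and treating an empty string as "not set" is an assumption made here so that an emptied default host falls through; the PR's resolver may handle that case differently.

```javascript
// Hypothetical sketch of "env > fact > legacy config" resolution.
function firstSet(...candidates) {
  // Assumption: undefined, null and '' all count as "not set".
  const found = candidates.find(v => v !== undefined && v !== null && v !== '');
  return found ?? null;
}

function resolveServerConfigSketch({ env = {}, facts = {}, legacyConfig = {} }) {
  const host = firstSet(env.host, facts.host, legacyConfig.host);
  const facilityIds =
    firstSet(env.facilityIds, facts.facilityIds, legacyConfig.facilityIds) ?? [];
  // Configured only when both a host and at least one facility id resolved.
  return { host, facilityIds, configured: host !== null && facilityIds.length > 0 };
}

// Usage: the env host wins over the recorded fact; facility ids come from facts.
console.log(
  resolveServerConfigSketch({
    env: { host: 'https://env.example.org' },
    facts: { host: 'https://fact.example.org', facilityIds: ['facility-1'] },
  }),
);
```

Resolving each field independently means a deployment can override only the host through the environment while still relying on recorded facility ids.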
Public (unauthenticated, pre-sync there are no users) facility endpoints:
- GET /api/public/setup/status -> { configured } — gates the web setup wizard
- POST /api/public/setup/sync — validates the submitted SYNC_URL + facilityIds by
doing an outbound login probe against the host, then records them to local
system facts and refreshes the in-memory holder
Security: the sync endpoint only works while UNCONFIGURED (409 once configured,
so a live server can't be repointed by an unauthenticated call); https-only
(http allowed in non-production); rate-limited; generic validation errors and no
syncUrl/password in logs.
TODO: endpoint integration tests (need a central login mock + controlled
resolver state).
When the facility web app loads against an unconfigured server, show a setup wizard instead of the login screen:
- useSetupStatusQuery: unauthenticated GET /public/setup/status; treats 404 (central) / errors as configured so the wizard never blocks login
- SetupWizardView: single SYNC_URL field + a single/multiple (omniserver) facility-id selector (multiple uses add/remove inputs), client-side trim + dedupe; POSTs to /public/setup/sync and reloads into login on success
- App.jsx renders the wizard after the server-alive check, before login
…uperadmin gate
Two design corrections from review:
- The wizard (and POST /public/setup/sync) now take separate host + username +
password fields rather than a single combined SYNC_URL (that connection-string
form stays only for the SYNC_URL env var, for k8 automation).
- The setup endpoint is now gated on central super-admin credentials: it logs in
with the supplied creds and requires ability.can('manage', 'all') before
writing facts. This both authorises setup (a fresh server has no local user to
authenticate, so we prove super-admin on central — which is the same identity
as the facility super admin) and resolves the bootstrap problem.
Next pass: have central provision a dedicated sync user and return its
credentials once the admin has authenticated, instead of storing the admin's.
…al labelling
- equal-width facility-mode radios with a gap (fullWidth)
- remove button is now a minus icon button; facility-id inputs stay equal width via a fixed remove-slot
- smaller heading so it clears the Tamanu logo
- relabel the credential fields as central Administrator username/password (they authorise setup as a central super-admin; see manage:all gate)
- note: sync password should move to the encrypted local_system_secrets table once that lands, and is superseded by central-provisioned sync creds later
Add POST /api/admin/syncCredentials (gated manage:all): find-or-rotates a
dedicated per-server sync user ("System: <facilities> sync", role admin,
generated password) and returns its credentials. Mirrors the sync users the
provision subcommand creates.
The facility setup endpoint now, after authenticating the admin and checking
manage:all, calls this to provision sync credentials and stores those — rather
than persisting the admin's own credentials.
Clears the sync host/credentials/facilities facts so the server returns to the unconfigured state and can be set up again. Host-level access only; restart to take effect.
- ensureHostMatches now compares the externally declared host (env/config) via getDeclaredHost and skips when there's none, instead of constructing a CentralServerConnection (which throws when unconfigured), so a fresh server boots cleanly to serve the setup wizard.
- resetSyncConfig hard-deletes the facts (force: true): local_system_facts is paranoid, and a soft-deleted row keeps occupying the unique key constraint, so a later set() of the same key would collide.
Now that local_system_secrets exists, hold the sync password as an encrypted secret there (LocalSystemSecret.setSecret/getSecret) rather than plaintext in local_system_facts. Host, username and facility ids stay as plain facts. The resolver reads the secret separately and tolerantly, so a missing/unconfigured crypto key file doesn't blank out the non-secret facts. resetSyncConfig clears the secret too.
…wport
AuthFlowView vertically centres its content, so a tall wizard (omniserver / many facilities) clipped off the top and bottom. Bound the form to the viewport height and scroll internally instead.
Pad the scroll area's top so a full-height wizard doesn't render its heading under the fixed top-left Tamanu logo.
Use margin-top (not padding) so the scroll viewport starts below the fixed logo; content then scrolls within that frame instead of sliding up behind it.
Stop reusing AuthFlowView for the wizard: its vertically-centred body and fixed top-left logo don't suit a tall scrolling form (content scrolled behind the logo, uneven top/bottom gaps). The wizard now has its own layout — logo in normal flow atop a scrollable form column, splash image alongside — so the form scrolls cleanly beneath the logo regardless of height.
…ty list
A server with one facility id filled in is a single-facility server; no need for a mode toggle. Always render the facility-id list with add/remove (the first row can't be removed).
…L field
yup's .url() rejects hosts without a TLD (e.g. localhost); validate with the URL parser instead, matching the server-side check.
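As an illustration of validating with the URL parser rather than a TLD-sensitive pattern, a check along these lines accepts `localhost` while still enforcing the https-only rule (http allowed outside production) stated for the setup endpoint. The name `isValidSyncHostSketch` and its `production` option are hypothetical; the PR's actual validator is not shown in this conversation.

```javascript
// Hypothetical sketch: parser-based host validation.
function isValidSyncHostSketch(value, { production = false } = {}) {
  let url;
  try {
    url = new URL(value); // throws on anything that is not an absolute URL
  } catch {
    return false;
  }
  if (url.protocol === 'https:') return true;
  // Plain http is tolerated only outside production.
  return url.protocol === 'http:' && !production;
}

// Usage
console.log(isValidSyncHostSketch('http://localhost:3000'));
console.log(isValidSyncHostSketch('http://localhost:3000', { production: true }));
```

Checking the protocol explicitly also rejects inputs such as `localhost:3000`, which the parser would otherwise read as a URL with the scheme `localhost:`.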
- syncCredentials: await async setPassword (was persisting unhashed); derive the sync user email from a hash of the facility ids (fixed length, collision-free) and cap facility id count
- integrity: skip the facility-match central verify when the server has no host (was constructing CentralServerConnection and crashing unconfigured boot)
- serverConfig: guard configHost against malformed URLs; use ?? not ||
- setup: gate POST /public/setup/sync to trusted source networks (loopback / RFC1918 / link-local / Tailscale, v4+v6 via ipaddr.js) per review, restricting the request source rather than the target host; wrap fact/secret writes in a transaction so a mid-write failure can't half-configure the server
- fold setup status into the public ping response (setupRequired) instead of a dedicated endpoint + extra request; drop useSetupStatusQuery
- fix serverConfig test wiring (mock LocalSystemSecret; password under the secret)
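The hash-derived email and generated password from the first item can be sketched with Node's `crypto` module. The `crypto.randomBytes(24).toString('base64url')` call matches the snippet quoted in the review; everything else (the name `syncUserEmailSketch`, SHA-256, the 32-character truncation, and the `sync.invalid` domain) is an assumption for illustration only.

```javascript
const crypto = require('node:crypto');

// Hypothetical sketch: a fixed-length email derived from the facility ids.
function syncUserEmailSketch(facilityIds) {
  // Dedupe and sort so the same set of ids always yields the same email.
  const canonical = JSON.stringify([...new Set(facilityIds)].sort());
  const digest = crypto.createHash('sha256').update(canonical).digest('hex');
  return `sync-${digest.slice(0, 32)}@sync.invalid`; // domain is a placeholder
}

// 24 random bytes encode to 32 URL-safe characters with no padding.
function generateSyncPasswordSketch() {
  return crypto.randomBytes(24).toString('base64url');
}

// Usage: order and duplicates do not change the derived address.
console.log(syncUserEmailSketch(['facility-b', 'facility-a', 'facility-a']));
console.log(generateSyncPasswordSketch().length);
```

Deriving the address from a digest keeps its length constant however many facility ids a server declares, which is what makes a find-or-rotate lookup by email practical.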
…d facts
Emptying the default sync.host broke two consumers still reading config directly, failing every facility test at createTestContext:
- websocketClientService did new URL(config.sync.host); it now reads the resolved host and no-ops when the server is unconfigured
- FacilitySyncManager read sync email/password from config; it now reads them from getSyncConfig
…rdening
(continuation: these files were missed by an earlier mis-staged commit)
- syncCredentials: await setPassword, hash-derived email, cap facility ids
- integrity: skip facility-match central verify when unconfigured
- serverConfig: guard configHost, ?? over ||
- setup: trusted-source-IP gate, atomic writes
- ping carries setupRequired; serverConfig test wiring fixed
- central POST /admin/syncCredentials: unauthenticated rejected, non-admin forbidden, empty ids rejected, admin provisions a sync user (hashed password), repeat calls rotate the same account
- facility isTrustedSetupSource: trusts loopback/RFC1918/link-local/Tailscale (v4+v6), rejects public + malformed
- facility setup: GET /public/ping reports setupRequired; POST /public/setup/sync rejects missing host / empty facilities / non-https
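For readers unfamiliar with the trusted-source gate being tested here, the following sketch shows the general idea using Node's built-in `net.BlockList` as an allow list. This is a swapped-in technique: the PR states that it uses ipaddr.js, and the name `isTrustedSourceSketch` is hypothetical. The Tailscale ranges listed (100.64.0.0/10 and fd7a:115c:a1e0::/48) are the commonly documented ones and are an assumption about what the PR trusts.

```javascript
const net = require('node:net');

// BlockList is only a set of ranges; here it is used as an allow list.
const trusted = new net.BlockList();
trusted.addSubnet('127.0.0.0', 8, 'ipv4'); // loopback
trusted.addAddress('::1', 'ipv6'); // loopback (v6)
trusted.addSubnet('10.0.0.0', 8, 'ipv4'); // RFC1918
trusted.addSubnet('172.16.0.0', 12, 'ipv4'); // RFC1918
trusted.addSubnet('192.168.0.0', 16, 'ipv4'); // RFC1918
trusted.addSubnet('169.254.0.0', 16, 'ipv4'); // link-local
trusted.addSubnet('fe80::', 10, 'ipv6'); // link-local (v6)
trusted.addSubnet('100.64.0.0', 10, 'ipv4'); // CGNAT range used by Tailscale
trusted.addSubnet('fd7a:115c:a1e0::', 48, 'ipv6'); // Tailscale v6 prefix

// Hypothetical sketch: true only for well-formed addresses in a trusted range.
function isTrustedSourceSketch(ip) {
  const family = net.isIP(ip); // 4, 6, or 0 when the input is not an IP
  if (family === 0) return false;
  return trusted.check(ip, family === 4 ? 'ipv4' : 'ipv6');
}

// Usage
console.log(isTrustedSourceSketch('192.168.1.20'), isTrustedSourceSketch('8.8.8.8'));
```

Rejecting malformed input up front keeps the check fail-closed, in line with the "rejects public + malformed" behaviour listed above.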
Emptying the default sync.host left the test env with no host, so every test constructing CentralServerConnection threw. The test config supplies sync host/email/password + facilities (a configured env, as before), so CSC builds. Setup/serverConfig tests adjusted for the now-configured test server.
The base's lock was stale under the node 26 bump's npm (dev->devOptional flips, nested duplicate dedup), failing the CI lockfile check on any PR off this base. Regenerated with npm 11 so `npm install` is a no-op.
134a7a0 to fae9f64
The mSupply task tests mocked selectFacilityIds to control the server's facility ids per-test, but the tasks now read getServerFacilityIds. Mock that instead (partial serverConfig mock, keeping the real initServerConfig so createTestContext still works).
…/TAM-6863/sync-host-setup
- LocalSystemSecret.getSecret/setSecret were renamed to get/set by the TAM-6862 crypto merge; serverConfig and the setup endpoint still called the old names, so the wizard 500'd and the boot password read silently failed.
- Extract the triplicated configured/unconfigured sync boot block (timesync + central connection + sync manager + timezone checks, plus the warn) into setupSyncRuntime(); use it from startAll/startTasks and the sync path of startApp.
- mSupply task tests mocked ../../dist/serverConfig, but the tasks load ../serverConfig from source under jest; point the mock at ../../app/serverConfig.
…stemSecret get/set
…echeck
The determinism harness builds its baseline DB at the pre-migration commit, which can pre-date the build-less switch and so records `.js` migration names. At HEAD, areMigrationsAvailable() consulted migrations directly, tripping createMigrationInterface's `.js`-vs-`.ts` guard before migrateAndHash() could run the upgrade that normalises them. Normalise the storage in the precheck too (exporting normaliseMigrationStorageExtensions from @tamanu/upgrade), matching what upgrade() does before any migration state is read.
import { log } from '@tamanu/shared/services/logging';

// { sync: { host, email, password }, facilityIds }, resolved once at boot.
let resolved = null;
[BES Requirements] suggestion
Module-level mutable singleton let resolved = null means getSyncConfig() / getServerFacilityIds() return stale data if initServerConfig is called again (e.g. after the setup wizard writes new facts). The setup handler does call initServerConfig after saving, but other callers that captured the old return value (e.g. CentralServerConnection constructor stored host in its constructor) won't pick up the change until the process restarts. The code comment in setup.js says "the sync process picks it up on its next (re)start" which is correct, but isServerConfigured() in the same process will flip mid-flight — worth documenting that the sync runtime is NOT hot-reloaded.
if (isServerConfigured()) {
  context.timesync = await initTimesync({
    models: context.models,
    url: `${getSyncConfig().host.replace(/\/*$/, '')}/api/timesync`,
[BES Requirements] critical
performTimeZoneChecks is called with remote: context.centralServer, but context.centralServer is never set in the API branch (the comment on line 48 even says "doesn't open a central connection"). While performTimeZoneChecks handles undefined remote gracefully today, this is fragile — the intent is clearly wrong and will break if the timezone check is ever tightened. Either skip the remote check entirely when there's no central connection, or instantiate one.
const password = crypto.randomBytes(24).toString('base64url');

const existing = await User.findOne({ where: { email } });
if (existing) {
[BES Requirements] suggestion
The displayName is built as System: ${uniqueFacilityIds.join(', ')} sync. With up to 100 facility IDs (the schema max), this could produce a very long string. Consider truncating or summarising when the list exceeds a reasonable length, or lowering the .max(100) limit to match realistic use.
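One way to act on this suggestion is to list only the first few ids and summarise the rest. The sketch below assumes a hypothetical helper name (`syncDisplayNameSketch`), a cut-off of three, and a "(+N more)" suffix; none of these come from the PR, which only says the display name is summarised for servers with many facilities.

```javascript
// Hypothetical sketch: keep the display name short for many facilities.
function syncDisplayNameSketch(facilityIds, maxListed = 3) {
  const unique = [...new Set(facilityIds)];
  if (unique.length <= maxListed) {
    return `System: ${unique.join(', ')} sync`;
  }
  const listed = unique.slice(0, maxListed).join(', ');
  return `System: ${listed} (+${unique.length - maxListed} more) sync`;
}

// Usage
console.log(syncDisplayNameSketch(['a', 'b']));
console.log(syncDisplayNameSketch(['a', 'b', 'c', 'd', 'e']));
```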
  email: z.string().trim().min(1),
  password: z.string().min(1),
  facilityIds: z
    .array(z.string().trim().min(1))
[BES Requirements] suggestion
isProduction is evaluated once at module load time (const isProduction = process.env.NODE_ENV === 'production'). This is fine for the running server but will bind to whatever NODE_ENV is set at import time, which can cause surprises in tests where NODE_ENV may change. Consider inlining the check inside the refine callback: process.env.NODE_ENV === 'production'.
    .max(100),
});

export const setupSyncHandler = asyncHandler(async (req, res) => {
[Integration tests] suggestion
POST /public/setup/sync is a security-sensitive unauthenticated endpoint but the integration test in __tests__/apiv1/Setup.test.js only covers the 409 (already-configured) path. The trusted-source IP gate (line 78) is only unit-tested via isTrustedSetupSource, and there is no integration test for request validation (missing/invalid fields returning 4xx). Consider adding at least a validation-error integration test (e.g. missing host or facilityIds) to exercise the Zod schema through the HTTP layer — the unit test for isTrustedSetupSource doesn't prove the middleware wires req.ip correctly. The comment in Setup.test.js acknowledges the gap ('happy path is verified manually'), which is reasonable given the test harness always boots a configured server, but the validation path can be tested even against a configured server (it runs before the isServerConfigured check) — actually no, the trusted-source check runs first. Still, a test confirming the 403 from a non-trusted IP (if the test runner's loopback counts as trusted, test that it's accepted) would strengthen confidence in the wiring.
}

// Reconfiguring a live server goes through env / direct DB, not this endpoint.
if (isServerConfigured()) {
[Security] critical
TOCTOU race condition: isServerConfigured() is checked outside the transaction (line 85), but the config is written inside a transaction (line 135). Two concurrent requests from trusted sources can both pass the check and race to configure the server. The second overwrites the first's credentials silently — an attacker on the local network could re-point a server that was just legitimately configured. Move the configured check inside the transaction (or use a DB-level advisory lock / unique constraint) so only one request can win.
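The shape of the suggested fix (take a lock, then re-check inside the critical section) can be shown with an in-process stand-in. This sketch uses a promise-chain mutex rather than a database advisory lock, and the names `createLock`, `createSetupHandlerSketch`, and the `store` object are all hypothetical; a multi-process deployment would still need the DB-level lock the review describes.

```javascript
// In-process stand-in for an advisory lock: tasks run strictly one at a time.
function createLock() {
  let tail = Promise.resolve();
  return function withLock(task) {
    const run = tail.then(task);
    tail = run.catch(() => {}); // keep the queue usable after a failed task
    return run;
  };
}

// Hypothetical sketch: the "already configured" check happens under the lock.
function createSetupHandlerSketch(store) {
  const withLock = createLock();
  return function configure(settings) {
    return withLock(async () => {
      if (store.configured) return { status: 409 }; // re-check inside the lock
      await store.write(settings); // stands in for the transactional write
      store.configured = true;
      return { status: 200 };
    });
  };
}

// Usage: two concurrent requests; only the first one writes.
const demoStore = {
  configured: false,
  value: null,
  async write(settings) {
    await new Promise(resolve => setTimeout(resolve, 10));
    this.value = settings;
  },
};
const configureDemo = createSetupHandlerSketch(demoStore);
Promise.all([configureDemo({ host: 'a' }), configureDemo({ host: 'b' })]).then(results =>
  console.log(results.map(r => r.status)),
);
```

If the check ran before acquiring the lock, both requests would observe an unconfigured server and the second write would silently replace the first, which is the race described above.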
export const setupSyncHandler = asyncHandler(async (req, res) => {
  req.flagPermissionChecked();

  if (!isTrustedSetupSource(req.ip)) {
[Security] critical
IP-based trust is bypassable when a reverse proxy is configured. req.ip honours Express's trust proxy setting (set from config.proxy.trusted in addFacilityMiddleware.js:22). If the proxy list is misconfigured or overly broad, an external attacker can spoof X-Forwarded-For to inject a private-range IP and bypass isTrustedSetupSource. Since this is the primary gate on an unauthenticated endpoint that makes outbound requests (SSRF) and can claim a fresh server, consider a defence-in-depth measure: e.g. a one-time setup token generated at install time that must accompany the request, so network position alone is not sufficient.
const email = syncUserEmail(uniqueFacilityIds);
const displayName = `System: ${uniqueFacilityIds.join(', ')} sync`;
const password = crypto.randomBytes(24).toString('base64url');
[Security] suggestion
The plaintext password is returned in the HTTP response body (line 46). If central is behind a TLS-terminating proxy and the downstream leg is plain HTTP, or if response logging/caching is enabled at any layer, the password is exposed. Consider whether the credential can be exchanged via a short-lived token or at least ensure the response carries Cache-Control: no-store to prevent intermediary caching.
🦸 Review Hero Summary Below consensus threshold (4 unique issues not confirmed by majority)
- setup.js: close the TOCTOU between the isServerConfigured() check and the write with an advisory lock + re-check inside the transaction, so concurrent trusted requests can't both configure the server. Inline the NODE_ENV check so tests that set it are respected.
- syncCredentials.js: summarise the display name for servers with many facilities; send the credential response with Cache-Control: no-store.
- startApp.js: don't pass a non-existent central connection as the timezone-check remote on the API server.
- serverConfig.js: document that the sync runtime is not hot-reloaded.
- test: assert the provisioned sync password actually authenticates.


Changes
Adds a deliberate first-run setup workflow for pointing a facility server at its sync host, held in local system facts rather than originating in config (TAM-6863).
Boot resolution. A new `serverConfig` resolver (mirroring `initDeviceId`) resolves the sync host/credentials and facility ids once at boot, precedence `SYNC_URL`/`SYNC_FACILITY_IDS` env → local system fact → legacy config. The runtime reads `getSyncConfig()`/`getServerFacilityIds()`/`isServerConfigured()` instead of `config.sync.*`/`selectFacilityIds(config)`, and the boot paths skip timesync/central/sync setup when unconfigured so a fresh server still boots to serve the wizard.
First-run wizard. When the facility server is unconfigured the web app shows a setup wizard instead of login (`GET /api/public/setup/status` gate): sync server URL + central administrator credentials + facility id(s). On submit the facility authenticates the admin against central, requires `manage:all`, then has central provision a dedicated sync user and stores the returned credentials.
Endpoints. Public `GET /api/public/setup/status` + `POST /api/public/setup/sync` (facility); `POST /api/admin/syncCredentials` (central, `manage:all`) find-or-rotates a `System: <facilities> sync` user.
Secrets. The sync password is stored encrypted in `local_system_secrets`; host/username/facility ids are plain facts.
Also: `resetSyncConfig` facility subcommand to clear config and re-run setup; the default `sync.host` config is emptied (kept as a deprecated one-version fallback). The `SYNC_URL` env var keeps the single connection-string form for headless/k8; the wizard uses separate fields.
Auto-Deploy
Options
Tests
Review Hero
.github/review-hero/suppressions.yml. Also runs automatically at the end of any auto-fix run.
Remember to...