
feat: migrate doc-api to Cloud SQL IAM auth #273

Open
panish16 wants to merge 11 commits into bcgov:main from panish16:feat/doc-api-iam-auth

panish16 (Contributor) commented Jun 4, 2026

Summary

Migrates doc-api to Cloud SQL IAM auth, matching the pattern established in `person-search` (PR #9) and `legal-api`.

Changes:

  • `config.py`: When `CLOUDSQL_INSTANCE_CONNECTION_NAME` is set, uses `DBConfig` from `sbc-connect-common/cloud-sql-connector` (IAM auth, `pool_pre_ping`, `pool_recycle`, `pool_use_lifo`). Falls back to Unix socket or TCP — no behaviour change without the env var. Adds pool size tuning via `DATABASE_MIN/MAX_POOL_SIZE` etc. Adds `MigrationConfig` for migration jobs.
  • `migrations/env.py`: Adds `DATABASE_OWNER_ROLE` support — `SET ROLE` before running migrations so DDL is executed under the owner role, ensuring correct table ownership when the IAM service account runs Alembic.
  • `init.py`: Registers `setup_pg8000_close_event_listener` only when IAM auth is active.
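
The fallback order described for `config.py` can be sketched roughly as below. This is an illustration only, not the code in this PR: `select_connection_mode` is a hypothetical helper, `DATABASE_UNIX_SOCKET` is an assumed variable name, and the actual `DBConfig` wiring is left out. Only `CLOUDSQL_INSTANCE_CONNECTION_NAME` comes from the description above.

```python
import os
from typing import Mapping, Optional


def select_connection_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Return which connection path the app would take.

    Hypothetical helper: only CLOUDSQL_INSTANCE_CONNECTION_NAME is taken
    from the PR description; DATABASE_UNIX_SOCKET is an assumed name.
    """
    env = os.environ if env is None else env
    if env.get("CLOUDSQL_INSTANCE_CONNECTION_NAME"):
        # IAM auth through the Cloud SQL connector.
        return "iam"
    if env.get("DATABASE_UNIX_SOCKET"):
        # Existing behaviour: connect over a Unix socket.
        return "unix_socket"
    # Existing behaviour: plain TCP host/port.
    return "tcp"
```

With the env var unset, the function never returns `"iam"`, which mirrors the "no behaviour change" guarantee stated in the bullet.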

Why: `doc-api-prod` (`c4hnrd-prod`) is actively hitting `pg8000.exceptions.InterfaceError` both on connection pool teardown (scale-to-zero noise) and on live requests that reuse stale pooled connections. The IAM auth path through the Cloud SQL Python Connector sets `pool_pre_ping=True`, which prevents stale connection reuse at the pool level rather than just suppressing the error noise.
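
For readers unfamiliar with the pool settings named above: `pool_pre_ping`, `pool_recycle`, and `pool_use_lifo` are keyword arguments of SQLAlchemy's `create_engine`. A minimal sketch of the option set follows. `pool_engine_options` is a hypothetical helper, and the 300-second recycle value is an assumption chosen for illustration, not a value taken from this PR.

```python
from typing import Any, Dict


def pool_engine_options(recycle_seconds: int = 300) -> Dict[str, Any]:
    """Build pool-related keyword arguments for sqlalchemy.create_engine.

    Hypothetical helper; the default recycle interval is illustrative.
    """
    return {
        # Test each connection on checkout and replace it if it is dead.
        "pool_pre_ping": True,
        # Discard connections older than this many seconds.
        "pool_recycle": recycle_seconds,
        # Reuse the most recently returned connection first, so idle
        # connections age out instead of being rotated back into use.
        "pool_use_lifo": True,
    }


# Intended use (requires SQLAlchemy, not imported here):
#   engine = create_engine(url, **pool_engine_options())
```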

Reference: person-search PR #9


GCP infra required

  • Cloud SQL instance: enable IAM database authentication
  • Cloud Run service account: grant `roles/cloudsql.instanceUser` on the Cloud SQL instance
  • Cloud SQL: create IAM database user matching the Cloud Run service account email
  • Cloud Run (`doc-api-dev`): set the new env vars above

Test plan

  • CI passes (linting, unit tests, Docker build)
  • Deploy to `c4hnrd-dev` with `CLOUDSQL_INSTANCE_CONNECTION_NAME` set — app starts and connects
  • Run migrations via `flask db upgrade` — verify tables owned by correct role (`DATABASE_OWNER_ROLE`)
  • No `InterfaceError` in Cloud Run logs at overnight scale-to-zero
  • `Suppressed pg8000 InterfaceError on connection close during teardown.` appears in DEBUG logs
  • Key API endpoints respond correctly after scale-up from zero
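
The ownership check in the migration step could be approached along these lines. This is a sketch under stated assumptions: `set_role_statement` and `misowned_tables` are hypothetical helpers, and the `public` schema in the query is assumed. `pg_tables` and its `tableowner` column are standard PostgreSQL.

```python
from typing import Iterable, List, Tuple


def set_role_statement(role: str) -> str:
    """Build a SET ROLE statement with the role quoted as an identifier."""
    quoted = '"' + role.replace('"', '""') + '"'
    return f"SET ROLE {quoted}"


# Lists each table and its owner; the 'public' schema is an assumption.
OWNERSHIP_QUERY = (
    "SELECT tablename, tableowner FROM pg_tables WHERE schemaname = 'public'"
)


def misowned_tables(rows: Iterable[Tuple[str, str]], owner_role: str) -> List[str]:
    """Return the names of tables whose owner is not the expected role.

    `rows` is the result of OWNERSHIP_QUERY as (tablename, tableowner) pairs.
    """
    return sorted(name for name, owner in rows if owner != owner_role)
```

An empty list from `misowned_tables` after `flask db upgrade` would indicate that every table in the schema is owned by the role named in `DATABASE_OWNER_ROLE`.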

panish16 added 11 commits May 26, 2026 16:42
Calls setup_pg8000_close_event_listener(engine) after db init in notify-api
and notify-delivery so pg8000 InterfaceError is suppressed instead of logged
as an error when SQLAlchemy closes pooled connections during teardown.

Resolves #33564

Covers all branches of _setup_pg8000_graceful_shutdown:
non-pg8000 early return, listener registration, normal close,
InterfaceError suppression, re-raise of other errors,
and ImportError fallback. Also renames _InterfaceError to
_interface_error to satisfy naming linter rules.

…n doc-api

Replace local _setup_pg8000_graceful_shutdown copy with the shared
setup_pg8000_close_event_listener from sbc-connect-common/cloud-sql-connector
(main branch, v0.2.3). Matches the approach used by notify-api,
notify-delivery, and the lear queue services.
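
The close-time behaviour these commits describe (normal close, `InterfaceError` suppression, re-raise of other errors) can be illustrated with a small stand-alone sketch. It is not the shared `setup_pg8000_close_event_listener` implementation: `close_quietly` is a hypothetical function, and the local `InterfaceError` class is a stand-in for `pg8000.exceptions.InterfaceError` so the example runs without pg8000 installed.

```python
import logging

logger = logging.getLogger(__name__)


class InterfaceError(Exception):
    """Stand-in for pg8000.exceptions.InterfaceError (assumption for this sketch)."""


def close_quietly(dbapi_connection) -> bool:
    """Close a connection, suppressing only InterfaceError.

    Returns True on a clean close and False when an InterfaceError was
    suppressed. Any other exception propagates to the caller.
    """
    try:
        dbapi_connection.close()
        return True
    except InterfaceError:
        logger.debug(
            "Suppressed pg8000 InterfaceError on connection close during teardown."
        )
        return False
```

Logging at DEBUG rather than swallowing silently keeps the event visible for the test-plan check while removing it from the error stream.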