
fix(storage): unblock Postgres boot — disable FK creation during AutoMigrate (RAN-49)#32

Merged
aksOps merged 1 commit into main from ran49-postgres-fk-fix
Apr 25, 2026


aksOps commented Apr 25, 2026


Summary

  • Set DisableForeignKeyConstraintWhenMigrating=true on the global GORM config so AutoMigrate never emits the spans.trace_id → traces.trace_id foreign key (or the equivalent on logs). After RAN-21 made trace identity tenant-scoped (composite (tenant_id, trace_id) unique), Postgres rejected the auto-emitted FK at CREATE TABLE time with SQLSTATE 42830, and the binary aborted boot on every fresh Postgres deployment.
  • The model already declared constraint:false, but gorm v2's relationship parser does not honor that tag — which is exactly why the MySQL migrate path was already dropping fk_traces_spans / fk_traces_logs post-AutoMigrate. This change replaces the cross-driver hack with the documented config flag.
  • Async ingestion intent is preserved: spans/logs may legitimately land before their parent trace; cross-tenant isolation is enforced at the application layer (every read scopes by tenant_id) and at the composite unique on traces.
  • Adds TestPG_AutoMigrate_FromEmpty_NoSpansTracesFK (integration tag) — boots a fresh Postgres 16 container, runs AutoMigrateModels, and asserts no FK exists from spans / logs back to traces. Locks in the regression class so SQLite can no longer hide a Postgres DDL failure (SQLite does not validate FK targets at CREATE TABLE time, which is why the regression slipped past CI).
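
As a rough illustration of the first bullet, the flag is a field on `gorm.Config` and is passed when the connection is opened. This is a minimal sketch, not the project's actual factory.go: the `openPostgres` helper and the `PG_DSN` environment variable are hypothetical names, and it assumes the `gorm.io/gorm` and `gorm.io/driver/postgres` modules are available.

```go
package main

import (
	"log"
	"os"

	"gorm.io/driver/postgres"
	"gorm.io/gorm"
)

// openPostgres is a hypothetical helper (not the project's factory code).
// It opens a GORM session that will not create FOREIGN KEY constraints
// when AutoMigrate runs, regardless of relationship tags on the models.
func openPostgres(dsn string) (*gorm.DB, error) {
	return gorm.Open(postgres.Open(dsn), &gorm.Config{
		DisableForeignKeyConstraintWhenMigrating: true,
	})
}

func main() {
	// PG_DSN is an assumed variable name used only for this sketch.
	db, err := openPostgres(os.Getenv("PG_DSN"))
	if err != nil {
		log.Fatalf("open postgres: %v", err)
	}
	// The project's AutoMigrateModels would run against db here.
	_ = db
}
```

Because the flag lives on the session config rather than on individual models, it applies uniformly to every driver the same config is used with, which is the property the PR relies on.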

Verification

  • go vet ./... — clean
  • go build ./... — clean
  • go test -race ./... — all default suites pass
  • go test -race -tags=integration -timeout=10m ./internal/storage/... against Docker-backed Postgres 16:
    • TestPG_AutoMigrate_FromEmpty_NoSpansTracesFK — PASS
    • TestPG_ILIKE_CaseInsensitiveSearch / TestPG_Bytea_CompressedTextRoundTrip / TestPG_PgTrgm_IndexCreated / TestPG_VacuumAnalyze_OutsideTx / TestPG_PurgeLogsBatched_LargeVolume / TestPG_PurgeTracesBatched_OrphanSpanSweep_NOT_IN / TestPG_AutoMigrate_BlobTypesBecomeBytea — all PASS
    • TestPG_PgTrgm_SubstringQueryUsesGIN — failing both before and after this change (planner picks Seq Scan on tiny seed; pre-existing flake, unrelated to RAN-49)
  • Reproduced RAN-49 on the unpatched tree: stashed the fix, re-ran an integration test, and observed the exact failure mode, `failed to migrate database: ERROR: there is no unique constraint matching given keys for referenced table "traces" (SQLSTATE 42830)`, with the offending DDL `CONSTRAINT "fk_traces_spans" FOREIGN KEY ("trace_id") REFERENCES "traces"("trace_id")`. With the fix applied, the migrator completes cleanly.
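
The SQLSTATE 42830 failure above can be reasoned about with a small model: Postgres accepts a foreign key only when the referenced columns are exactly the column set of some unique (or primary key) constraint on the referenced table. The sketch below is a simplified stand-in for that rule, not Postgres's actual implementation; `fkTargetOK` and `sameColumns` are hypothetical names, and it assumes the primary key of traces is not trace_id.

```go
package main

import (
	"fmt"
	"sort"
)

// sameColumns reports whether a and b contain the same column names,
// ignoring order.
func sameColumns(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	x := append([]string(nil), a...)
	y := append([]string(nil), b...)
	sort.Strings(x)
	sort.Strings(y)
	for i := range x {
		if x[i] != y[i] {
			return false
		}
	}
	return true
}

// fkTargetOK reports whether refCols matches one of the unique
// constraints on the referenced table (simplified model).
func fkTargetOK(refCols []string, uniques [][]string) bool {
	for _, u := range uniques {
		if sameColumns(refCols, u) {
			return true
		}
	}
	return false
}

func main() {
	fk := []string{"trace_id"} // spans.trace_id -> traces.trace_id

	beforeRAN21 := [][]string{{"trace_id"}}              // single-column unique
	afterRAN21 := [][]string{{"tenant_id", "trace_id"}} // composite unique

	fmt.Println(fkTargetOK(fk, beforeRAN21)) // true
	fmt.Println(fkTargetOK(fk, afterRAN21))  // false: the 42830 case
}
```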

Acceptance criteria mapping

  • ./otelcontext boots cleanly against an empty Postgres 16 DB — verified via integration test (which exercises AutoMigrateModels against a fresh Postgres 16 container).
  • spans.trace_id FK references match an actual unique constraint after migrate — by removing the FK entirely, the schema is internally consistent on every driver. Async ingest semantics already required no FK.
  • Postgres-backed test in internal/storage/... exercises a full migrate from empty — TestPG_AutoMigrate_FromEmpty_NoSpansTracesFK.

Test plan

  • Reviewer runs go test -race -tags=integration -run TestPG_AutoMigrate_FromEmpty -timeout=10m ./internal/storage/... against a Docker-equipped host and confirms PASS.
  • Reviewer optionally points a fresh Postgres 16 DSN at the binary and confirms boot completes (no SQLSTATE 42830).
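
For reviewers who want to check the schema by hand, one possible shape of the "no FK from spans / logs back to traces" assertion is sketched below. It is an assumption, not the code in pg_integration_test.go: the catalog query, the `ForeignKey` type, and `fksInto` are hypothetical, and the sample rows are invented to show the filter. The query text is printed rather than executed so the sketch runs without a database.

```go
package main

import "fmt"

// listFKsQuery is one way to list foreign keys from the Postgres catalog.
// Assumption: the real test may query differently. Note that
// regclass::text can be schema-qualified outside the search_path.
const listFKsQuery = `
SELECT c.conname, c.conrelid::regclass::text, c.confrelid::regclass::text
FROM pg_constraint c
WHERE c.contype = 'f'`

// ForeignKey is one row of the query above.
type ForeignKey struct {
	Name     string // constraint name
	Table    string // table that owns the constraint
	RefTable string // referenced table
}

// fksInto returns the foreign keys that point from any of children to parent.
func fksInto(fks []ForeignKey, parent string, children ...string) []ForeignKey {
	isChild := make(map[string]bool, len(children))
	for _, c := range children {
		isChild[c] = true
	}
	var out []ForeignKey
	for _, fk := range fks {
		if fk.RefTable == parent && isChild[fk.Table] {
			out = append(out, fk)
		}
	}
	return out
}

func main() {
	fmt.Println(listFKsQuery)

	// Hypothetical rows from a legacy schema that still has the FKs.
	legacy := []ForeignKey{
		{Name: "fk_traces_spans", Table: "spans", RefTable: "traces"},
		{Name: "fk_traces_logs", Table: "logs", RefTable: "traces"},
	}
	fmt.Println(len(fksInto(legacy, "traces", "spans", "logs"))) // 2

	// After a migrate with the flag set, no such rows are expected.
	fmt.Println(len(fksInto(nil, "traces", "spans", "logs"))) // 0
}
```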

Resolves RAN-49.

…res boot (RAN-49)

After RAN-21 made trace identity tenant-scoped — replacing the single-column
unique on traces.trace_id with a composite (tenant_id, trace_id) — Postgres
rejected the auto-generated spans.trace_id FK at CREATE TABLE time with
SQLSTATE 42830 ("there is no unique constraint matching given keys for
referenced table"), aborting boot on every fresh Postgres deployment.

The model already declared `constraint:false`, but gorm v2's relationship
parser does not honor that tag — which is why the MySQL migrate path was
explicitly dropping fk_traces_spans / fk_traces_logs post-AutoMigrate. The
reliable cross-driver suppression is the gorm.Config flag, so set
DisableForeignKeyConstraintWhenMigrating=true on the global session.

This matches the existing async-ingest design (spans/logs may land before
their parent trace) and removes the silent SQLite-hiding-Postgres regression
class: SQLite did not validate FK targets at CREATE TABLE time, so the
SQLite-only migrate tests passed while Postgres failed.

- factory.go: set DisableForeignKeyConstraintWhenMigrating; reword the MySQL
  drop comments to reflect they are now legacy-only.
- pg_integration_test.go: add TestPG_AutoMigrate_FromEmpty_NoSpansTracesFK
  to lock in the regression class — boots an empty Postgres 16 container,
  asserts AutoMigrateModels succeeds, then verifies no FK exists from
  spans/logs back to traces.

Verified: full default test suite green; 8 PG integration tests pass
against a real Postgres 16 container. The one remaining failure, the
unrelated pg_trgm planner-stats flake, is pre-existing and reproduces on
the prior tree.

Co-Authored-By: Paperclip <noreply@paperclip.ing>

aksOps merged commit eb87231 into main Apr 25, 2026
11 checks passed
aksOps deleted the ran49-postgres-fk-fix branch April 25, 2026 18:29