Advisory lock for migrations acquires and releases on different pool connections, causing E_UNABLE_RELEASE_LOCK #1153

@simoneNEMO

Description

Package version

@adonisjs/lucid: 22.x

Describe the bug

The migration runner's advisory lock mechanism (PG_TRY_ADVISORY_LOCK / PG_ADVISORY_UNLOCK) is fundamentally broken when the connection pool has more than one connection. The lock and unlock calls execute on different PostgreSQL backend connections, causing the unlock to fail with E_UNABLE_RELEASE_LOCK because PostgreSQL session-level advisory locks are bound to the session that acquired them.

This leaves the advisory lock permanently held on an orphaned pool connection, blocking all subsequent migration runs until the connection is destroyed or the lock is manually released.

Environment

  • @adonisjs/lucid: 22.x
  • Database: PostgreSQL (but MySQL is affected by the same pattern — GET_LOCK / RELEASE_LOCK are also session-scoped)
  • Pool config: min > 1 (or any pool size where connections may differ between calls)

Steps to Reproduce

  1. Configure a PostgreSQL database with pool min: 2 or higher
  2. Run node ace migration:run with at least one pending migration
  3. Observe that releaseLock() throws E_UNABLE_RELEASE_LOCK

The issue is intermittent — it depends on whether the pool returns the same connection for both calls. With higher pool sizes and concurrent activity, the probability increases significantly. In our production Kubernetes deployment (2 API replicas + 1 admin replica, pool max: 20), it reproduces reliably during rolling deployments.

Root Cause Analysis

In src/dialects/pg.ts, both getAdvisoryLock and releaseAdvisoryLock use this.client.rawQuery():

async getAdvisoryLock(key: string): Promise<boolean> {
  const response = await this.client.rawQuery(
    `SELECT PG_TRY_ADVISORY_LOCK('${key}') as lock_status;`
  )
  return response.rows[0] && response.rows[0].lock_status === true
}

async releaseAdvisoryLock(key: string): Promise<boolean> {
  const response = await this.client.rawQuery(
    `SELECT PG_ADVISORY_UNLOCK('${key}') as lock_status;`
  )
  return response.rows[0] && response.rows[0].lock_status === true
}

rawQuery() creates a RawQueryBuilder, which is executed by QueryRunner.executeUsingManagedConnection(). This method acquires a random connection from tarn's pool, executes the query, and immediately releases the connection back:

// src/query_runner/index.ts
private async executeUsingManagedConnection(query: Knex.QueryBuilder | Knex.Raw) {
  const queryClient = this.getQueryClient(query)
  const connection = await queryClient.client.acquireConnection()  // random connection
  query.connection(connection)
  const [error, result] = await this.executeQuery(query)
  queryClient.client.releaseConnection(connection)  // returned to pool immediately
  // ...
}

Since PG_TRY_ADVISORY_LOCK is a session-level lock in PostgreSQL, it is bound to the specific backend connection that executed it. When releaseAdvisoryLock later calls PG_ADVISORY_UNLOCK on a different pool connection, PostgreSQL returns false (that session never held the lock), and Lucid throws E_UNABLE_RELEASE_LOCK.

Failure sequence in production

1. acquireLock() → rawQuery("PG_TRY_ADVISORY_LOCK(1)")
   → Pool checks out Connection A → lock acquired → Connection A returned to pool

2. Migrations run (using various pool connections)

3. releaseLock() → rawQuery("PG_ADVISORY_UNLOCK(1)")
   → Pool checks out Connection B → PG_ADVISORY_UNLOCK returns false
   → E_UNABLE_RELEASE_LOCK thrown

4. Lock remains held on Connection A (orphaned in the pool)

5. All subsequent migration:run attempts fail with E_UNABLE_ACQUIRE_LOCK
   until Connection A is destroyed by the pool or the process exits
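The failure sequence above can be modeled with a minimal in-memory sketch. The `Connection` and `Pool` classes below are hypothetical stand-ins (not Lucid's or tarn's actual code); each connection tracks the advisory locks its own session holds, mirroring PostgreSQL's session-scoped `PG_TRY_ADVISORY_LOCK` / `PG_ADVISORY_UNLOCK` semantics:

```typescript
// Minimal model of session-scoped advisory locks over a connection pool.
class Connection {
  private locks = new Set<number>()

  tryAdvisoryLock(key: number): boolean {
    this.locks.add(key)
    return true
  }

  advisoryUnlock(key: number): boolean {
    // PostgreSQL returns false when this session never acquired the lock.
    return this.locks.delete(key)
  }
}

class Pool {
  private next = 0
  constructor(private connections: Connection[]) {}

  // Round-robin checkout: consecutive rawQuery() calls can land on
  // different connections, just like a real pool under load.
  acquire(): Connection {
    const conn = this.connections[this.next]
    this.next = (this.next + 1) % this.connections.length
    return conn
  }
}

const pool = new Pool([new Connection(), new Connection()])

const lockConn = pool.acquire()               // Connection A
const locked = lockConn.tryAdvisoryLock(1)    // true

const unlockConn = pool.acquire()             // Connection B
const released = unlockConn.advisoryUnlock(1) // false → E_UNABLE_RELEASE_LOCK

console.log(locked, released) // true false — and Connection A still holds the lock
```

The model shows why the bug is intermittent: whenever the pool happens to hand back the same connection for both calls, the unlock succeeds and the bug stays hidden.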

Suggested Fix

Pin a dedicated connection for the advisory lock lifecycle

The dialect should acquire a raw connection from the pool, hold it for both lock and unlock, and only release it after unlocking. The Knex .connection() method on query builders supports pinning a specific connection — this is the same mechanism already used by QueryRunner.executeUsingManagedConnection() itself.

// src/dialects/pg.ts

export class PgDialect implements DialectContract {
  // ... existing fields ...

  /**
   * Dedicated connection held between advisory lock acquire and release.
   * Session-level advisory locks are bound to the PostgreSQL backend
   * connection, so we must use the same connection for both operations.
   */
  private lockConnection: any = null

  async getAdvisoryLock(key: string): Promise<boolean> {
    const knexClient = this.client.getWriteClient()
    this.lockConnection = await knexClient.client.acquireConnection()

    try {
      const response = await knexClient
        .raw(`SELECT PG_TRY_ADVISORY_LOCK('${key}') as lock_status;`)
        .connection(this.lockConnection)

      const acquired = response.rows[0] && response.rows[0].lock_status === true

      if (!acquired) {
        // Lock not obtained: release the pinned connection immediately,
        // otherwise it would leak (releaseAdvisoryLock is never called
        // when acquisition fails)
        knexClient.client.releaseConnection(this.lockConnection)
        this.lockConnection = null
      }

      return acquired
    } catch (error) {
      knexClient.client.releaseConnection(this.lockConnection)
      this.lockConnection = null
      throw error
    }
  }

  async releaseAdvisoryLock(key: string): Promise<boolean> {
    const knexClient = this.client.getWriteClient()

    if (!this.lockConnection) {
      // Fallback: no pinned connection (lock was never acquired or already released)
      const response = await this.client.rawQuery(
        `SELECT PG_ADVISORY_UNLOCK('${key}') as lock_status;`
      )
      return response.rows[0] && response.rows[0].lock_status === true
    }

    try {
      const response = await knexClient
        .raw(`SELECT PG_ADVISORY_UNLOCK('${key}') as lock_status;`)
        .connection(this.lockConnection)

      return response.rows[0] && response.rows[0].lock_status === true
    } finally {
      knexClient.client.releaseConnection(this.lockConnection)
      this.lockConnection = null
    }
  }
}

The same fix applies to src/dialects/mysql.ts, which has the identical problem with GET_LOCK / RELEASE_LOCK.

Additional robustness improvement

run() currently calls shutdown() after the catch block, which is functionally similar to finally for normal control flow.
However, if shutdown() throws, that error can still escape and may obscure the original migration failure.

It may help to wrap shutdown/release in dedicated error handling so both the primary migration error and shutdown error are preserved/reported.
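A hedged sketch of that pattern (the `runWithShutdown` name and shape are illustrative, not the runner's actual API): run the primary task, always attempt cleanup, and if cleanup also fails, attach its error to the primary one rather than letting it mask the original failure.

```typescript
// Run a task, then always attempt cleanup; never let a cleanup error
// replace the primary error. Illustrative only, not Lucid's API.
async function runWithShutdown<T>(
  task: () => Promise<T>,
  shutdown: () => Promise<void>
): Promise<T> {
  let primaryError: unknown = null
  let result: T | undefined

  try {
    result = await task()
  } catch (error) {
    primaryError = error
  }

  try {
    await shutdown()
  } catch (shutdownError) {
    if (primaryError) {
      // Preserve the migration failure; report the shutdown failure alongside it
      (primaryError as any).shutdownError = shutdownError
    } else {
      primaryError = shutdownError
    }
  }

  if (primaryError) throw primaryError
  return result as T
}
```

With this shape, a failed `releaseAdvisoryLock()` during shutdown no longer hides the migration error that actually caused the run to abort.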

Reproduction repo

No response

Labels

Priority: Critical (the issue needs urgent attention) · Type: Bug (the issue has identified a bug)
