
Remove broken JDBC connection from pool on local-transaction connection errors#26063

Open
renatsaf wants to merge 3 commits into eclipse-ee4j:main from renatsaf:fix-25930-jdbc-broken-connection-validation

renatsaf commented Jun 7, 2026

Contributor

Problem

Fixes #25930.

A pooled JDBC connection that breaks while in use (e.g. an HTTP request-timeout-seconds interrupts a statement and leaves the Oracle connection closed) is never removed from the pool when validate-atmost-once-period-in-seconds is greater than 0, so the application keeps getting the dead connection forever:

RAR5031:System Exception
jakarta.resource.spi.LocalTransactionException: ORA-17008: Closed connection

Setting the period to 0 "fixes" it only because that forces validation on every checkout.

Root cause

LocalTransactionImpl.begin()/commit()/rollback() caught the SQLException from the physical connection and rethrew it as a LocalTransactionException without raising a CONNECTION_ERROR_OCCURRED event. As a result, ResourceHandle.setConnectionErrorOccurred() was never called and the pool never discarded the connection.

The only remaining safety net was validation-on-checkout (ConnectionPool.isConnectionValid), and validate-atmost-once-period-in-seconds > 0 deliberately skips that while the period has not elapsed — so a connection that broke shortly after its last validation stays broken indefinitely.
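To make the interaction concrete, here is a minimal Java model of the checkout decision as described above. It is a sketch under stated assumptions, not the ConnectionPool code: CheckoutOrderSketch, PooledResource, isUsable and the physicallyAlive field are hypothetical names, and the only behaviour taken from this PR is that the error flag is checked before the validate-atmost-once window.

```java
import java.util.function.LongSupplier;

// Hypothetical model of the checkout decision; names are illustrative, not the ConnectionPool API.
public class CheckoutOrderSketch {

    static final long NEVER = -1L;

    static final class PooledResource {
        boolean connectionErrorOccurred;   // models hasConnectionErrorOccurred()
        long lastValidatedMillis = NEVER;  // when validation last ran
        boolean physicallyAlive = true;    // what a real validation query would find
    }

    private final long periodMillis;       // validate-atmost-once-period-in-seconds * 1000
    private final LongSupplier clock;

    CheckoutOrderSketch(long periodSeconds, LongSupplier clock) {
        this.periodMillis = periodSeconds * 1000L;
        this.clock = clock;
    }

    /** Returns true if the resource may be handed out, false if it must be discarded. */
    boolean isUsable(PooledResource r) {
        // 1. The error flag is checked first, whatever the validation period says.
        if (r.connectionErrorOccurred) {
            return false;
        }
        long now = clock.getAsLong();
        // 2. With a period > 0, validation is skipped until the period has elapsed.
        boolean withinPeriod = periodMillis > 0
                && r.lastValidatedMillis != NEVER
                && now - r.lastValidatedMillis < periodMillis;
        if (withinPeriod) {
            return true; // a connection that died since its last validation slips through here
        }
        // 3. Otherwise validate (modelled by the physicallyAlive field).
        r.lastValidatedMillis = now;
        return r.physicallyAlive;
    }

    public static void main(String[] args) {
        long[] now = {0L};
        CheckoutOrderSketch pool = new CheckoutOrderSketch(60, () -> now[0]);
        PooledResource r = new PooledResource();

        System.out.println(pool.isUsable(r)); // true: validated at t=0, connection alive
        now[0] = 5_000;
        r.physicallyAlive = false;            // statement interrupted, connection closed
        System.out.println(pool.isUsable(r)); // true: inside the 60 s window (the reported bug)
        r.connectionErrorOccurred = true;     // what the fix adds
        System.out.println(pool.isUsable(r)); // false: discarded regardless of the window
    }
}
```

With a period of 0 the window check is always false, so every checkout validates, which is why that setting masks the problem.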

Fix

1. Detect the error (RA, LocalTransactionImpl). Raise ManagedConnectionImpl.connectionErrorOccurred() from the local-transaction operations when the failure is a genuine connection error, so the resource gets flagged with hasConnectionErrorOccurred(). The pool checks that flag before validation on every checkout, so the dead connection is discarded regardless of the validate-atmost-once setting.

Detection is driver-agnostic and conservative:

  • SQLRecoverableException / SQLNonTransientConnectionException and their subclasses, or
  • SQLState class 08 (connection exception),

checked on every exception along both the getNextException() and getCause() chains.

Data/transient failures (e.g. constraint violations, SQLState 22/23) do not discard the connection. Oracle reports ORA-17008 with SQLState 08003 and the interrupted read as SQLRecoverableException, so both symptoms are covered.
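A minimal sketch of the detection rule above in plain JDK Java, assuming the structure described in this PR rather than reproducing the actual jdbc-ra change. isConnectionError, runLocalTxOperation, SqlAction and LocalTxException are hypothetical names; LocalTxException stands in for jakarta.resource.spi.LocalTransactionException so the example compiles without the Jakarta Connectors API, and the onConnectionError callback stands in for ManagedConnectionImpl.connectionErrorOccurred().

```java
import java.sql.SQLException;
import java.sql.SQLNonTransientConnectionException;
import java.sql.SQLRecoverableException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical sketch of the detection rule; names are illustrative, not the jdbc-ra code.
public class ConnectionErrorDetection {

    /** Stand-in for jakarta.resource.spi.LocalTransactionException (not part of the JDK). */
    static class LocalTxException extends Exception {
        LocalTxException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** A local-transaction operation such as setAutoCommit(false), commit() or rollback(). */
    interface SqlAction {
        void run() throws SQLException;
    }

    /** True if t, or anything on its getNextException()/getCause() chains, is a connection error. */
    static boolean isConnectionError(Throwable t) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
        return isConnectionError(t, seen);
    }

    private static boolean isConnectionError(Throwable t, Set<Throwable> seen) {
        if (t == null || !seen.add(t)) {
            return false; // end of a chain, or an exception already visited (cycle guard)
        }
        if (t instanceof SQLRecoverableException || t instanceof SQLNonTransientConnectionException) {
            return true;
        }
        if (t instanceof SQLException) {
            SQLException sql = (SQLException) t;
            String state = sql.getSQLState();
            if (state != null && state.startsWith("08")) {
                return true; // SQLState class 08: connection exception
            }
            if (isConnectionError(sql.getNextException(), seen)) {
                return true;
            }
        }
        return isConnectionError(t.getCause(), seen);
    }

    /** Notify the pool only for genuine connection errors, then rethrow as before. */
    static void runLocalTxOperation(SqlAction action, Runnable onConnectionError) throws LocalTxException {
        try {
            action.run();
        } catch (SQLException e) {
            if (isConnectionError(e)) {
                onConnectionError.run(); // stands in for ManagedConnectionImpl.connectionErrorOccurred()
            }
            throw new LocalTxException(e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isConnectionError(new SQLException("Closed connection", "08003"))); // true
        System.out.println(isConnectionError(new SQLException("constraint violated", "23000"))); // false

        SQLException wrapper = new SQLException("commit failed");
        wrapper.setNextException(new SQLRecoverableException("read interrupted"));
        System.out.println(isConnectionError(wrapper)); // true, found via getNextException()

        int[] events = {0};
        try {
            runLocalTxOperation(() -> { throw new SQLException("Closed connection", "08003"); },
                    () -> events[0]++);
        } catch (LocalTxException expected) {
            // still thrown to the caller, exactly as before the fix
        }
        System.out.println(events[0]); // 1
    }
}
```

The identity-based visited set is a defensive choice so that a cyclic getNextException()/getCause() chain cannot recurse forever. The exception is rethrown in every case, so callers see the same failure as before; only the pool notification is new.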

2. Remove it cleanly (LocalTxConnectionEventListener). A connection error during begin/commit/rollback happens while the resource is still enlisted in the transaction. Removing it from the pool immediately would make the JTA transaction-completion path (ConnectionPool.transactionCompleted, driven by the connector's own resource set in PoolTxHelper) re-process an already-removed handle, causing redundant cleanup and monitoring-counter drift. So when the resource is still enlisted, the listener sets the error flag but defers pool removal to the normal connection-close / transaction-completion path; the connection is then removed exactly once at the next checkout.

This second change is scoped to local-tx, non-XA resources only — XA resources use a different listener (ConnectorAllocator.ConnectionListenerImpl), so XA behavior is unchanged.
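The deferred-removal rule can be modelled roughly as follows. This is a hypothetical sketch, not LocalTxConnectionEventListener or ConnectionPool: ResourceHandle, Pool, LocalTxListener and their fields are illustrative, and the immediate removal for a resource that is not enlisted is an assumption about the existing listener behaviour, not something this PR states.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of the deferred-removal rule; not the GlassFish classes.
public class DeferredRemovalSketch {

    static final class ResourceHandle {
        boolean enlisted;                 // still part of an active transaction
        boolean connectionErrorOccurred;  // models hasConnectionErrorOccurred()
        boolean listenerAttached = true;  // models the event listener registration
    }

    static final class Pool {
        final Deque<ResourceHandle> free = new ArrayDeque<>();
        final List<ResourceHandle> removed = new ArrayList<>();

        void remove(ResourceHandle h) {
            if (removed.contains(h)) {
                throw new IllegalStateException("handle removed twice");
            }
            removed.add(h);
        }

        /** Normal connection-close / transaction-completion path. */
        void resourceReturned(ResourceHandle h) {
            h.enlisted = false;
            free.add(h);
        }

        /** Checks the error flag first; returns null if no usable pooled resource remains. */
        ResourceHandle checkout() {
            ResourceHandle h;
            while ((h = free.poll()) != null) {
                if (h.connectionErrorOccurred) {
                    remove(h); // the single removal point for a flagged resource
                    continue;
                }
                return h;
            }
            return null; // a real pool would create a new connection here
        }
    }

    static final class LocalTxListener {
        private final Pool pool;

        LocalTxListener(Pool pool) {
            this.pool = pool;
        }

        void connectionErrorOccurred(ResourceHandle h) {
            h.connectionErrorOccurred = true;
            if (h.enlisted) {
                return; // defer: keep the listener attached and let completion return it
            }
            // Assumed pre-existing behaviour for a non-enlisted resource: remove now.
            h.listenerAttached = false;
            pool.remove(h);
        }
    }

    public static void main(String[] args) {
        Pool pool = new Pool();
        LocalTxListener listener = new LocalTxListener(pool);
        ResourceHandle h = new ResourceHandle();
        h.enlisted = true;

        listener.connectionErrorOccurred(h);     // commit() fails while still enlisted
        System.out.println(pool.removed.size()); // 0: removal deferred
        System.out.println(h.listenerAttached);  // true

        pool.resourceReturned(h);                // transaction completes
        System.out.println(pool.checkout());     // null: flagged handle discarded
        System.out.println(pool.removed.size()); // 1: removed exactly once
    }
}
```

The IllegalStateException in remove() makes double-processing visible: removing the handle inside connectionErrorOccurred() while it is still enlisted, and then again when it comes back through the completion path and checkout, would throw.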

Tests

LocalTransactionImplTest covers the connection-error detection logic (recoverable, non-transient-connection, SQLState 08003, next-exception/cause chains, and negative cases for data errors / null SQLState).

Note: I could not run a full Maven build locally (this environment's Maven 3.8.6 is incompatible with the repo's glassfishbuild-maven-plugin:4.1.0, which needs Maven 3.9+); relying on CI to validate the build and the connector/transaction integration paths.

🤖 Generated with Claude Code

…on errors

When a pooled JDBC connection dies while in use (for example after an HTTP
request-timeout interrupts a statement and leaves the connection closed),
LocalTransactionImpl caught the SQLException from begin/commit/rollback and
rethrew it as a LocalTransactionException without notifying the pool. The
broken connection was therefore never flagged with a CONNECTION_ERROR_OCCURRED
event, so it was never removed from the pool. The only thing that could discard
it was validation-on-checkout, which "validate-atmost-once-period-in-seconds"
suppresses while the period has not elapsed - leaving the connection broken
forever.

Fire ManagedConnectionImpl.connectionErrorOccurred() when the failure is a
genuine connection error, mirroring the existing XA path
(XAStartOccurred/XAEndOccurred). Detection is driver agnostic: SQLState class
"08" (connection exception) and the SQLRecoverableException /
SQLNonTransientConnectionException subclasses, walking both the next-exception
and cause chains. Data/transient errors (e.g. constraint violations) do not
discard the connection.

Fixes eclipse-ee4j#25930

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…mpletes

Following review of the previous commit: firing connectionErrorOccurred from
LocalTransactionImpl removed the resource from the pool immediately, while the
resource was still enlisted in the active transaction. The JTA
transaction-completion path (ConnectionPool.transactionCompleted, driven by the
connector's own resource set in PoolTxHelper) would then re-process the
already-removed handle, causing redundant cleanup and monitoring-counter drift.

Fix it in LocalTxConnectionEventListener (used only by local-tx, non-XA
resources - XA uses a separate listener, so this is scoped to the bug): when a
connection error is signalled while the resource is still enlisted, set the
hasConnectionErrorOccurred() flag but defer the actual pool removal. The flag
makes ConnectionPool discard the connection on the next checkout (the check runs
before validation, so validate-atmost-once cannot keep it alive), and keeping
the listener attached lets the normal connection-close / transaction-completion
path return the resource to the pool, where it is removed exactly once.

Updated the LocalTransactionImpl javadoc to describe the deferred removal.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

renatsaf commented Jun 7, 2026

Contributor Author

Built and unit-tested both touched modules locally with Maven 3.9.16 / JDK 21:

  • appserver/jdbc/jdbc-ra/jdbc-core — BUILD SUCCESS, 35 tests, 0 failures (includes the new LocalTransactionImplTest).
  • appserver/connectors/connectors-runtime — BUILD SUCCESS, 124 tests, 0 failures, 1 pre-existing skip (includes the existing ConnectionPoolTest / PoolManagerImplTest).

Covers the listener change for eclipse-ee4j#25930: a CONNECTION_ERROR_OCCURRED while the
resource is still enlisted in a transaction flags the resource
(hasConnectionErrorOccurred) but keeps the listener attached and defers pool
removal, so the transaction-completion path does not re-process an already
removed handle.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

renatsaf commented Jun 7, 2026

Contributor Author

Added unit-test coverage for the deferred-removal behavior and re-verified locally with Maven 3.9.16 / JDK 21.

New test LocalTxConnectionEventListenerTest.connectionErrorWhileEnlistedDefersPoolRemoval asserts that a CONNECTION_ERROR_OCCURRED raised while the resource is still enlisted in a transaction flags the resource (hasConnectionErrorOccurred()) but keeps the listener attached and defers pool removal (a strict mock verifies that removeConnectionEventListener is not called). This guards the fix against a regression back to immediate mid-transaction removal.

Local results:

  • appserver/jdbc/jdbc-ra/jdbc-core — BUILD SUCCESS, 35 tests, 0 failures (incl. LocalTransactionImplTest).
  • appserver/connectors/connectors-runtime — BUILD SUCCESS, 124 tests, 0 failures, 1 pre-existing skip (incl. LocalTxConnectionEventListenerTest, ConnectionPoolTest, PoolManagerImplTest).

The full Oracle + request-timeout end-to-end scenario from the issue still isn't reproducible in my environment, so that path relies on CI / a maintainer with a real Oracle setup.


dmatej commented Jun 7, 2026

Contributor

Just a small warning here - the code around JDBC pools and transactions is pretty horrible and I would not recommend using AI for it, because it is very easy to get confused by these nonstandard, heavily "spaghettized" sources. To make it even more complicated, it is multithreaded, so any change in line order might have some effect.

Basically, once I see you are adding lines, it smells to me like it makes things worse, not better. As for refactoring that big ball of mud, we do that in iterations, each of which opens some small door to improvements like yours, but ... I have doubts about those notifications before throwing the LocalTransactionException. It is just a quick look; I have only limited time right now to go deep into it.


renatsaf commented Jun 8, 2026

Contributor Author


Thank you for the reply. I have the same problem described in the issue in my project in production. Could you please take on the issue without using AI?
This is exactly the problem I have:
A pooled JDBC connection that breaks while in use (e.g. an HTTP request-timeout-seconds interrupts a statement and leaves the Oracle connection closed) is never removed from the pool when validate-atmost-once-period-in-seconds is greater than 0, so the application keeps getting the dead connection forever.

The only difference is that I use a FirebirdSQL connection instead of Oracle.


Successfully merging this pull request may close these issues.

JDBC Connection Pool validation doesn't work if "Validate At Most Once" is set to >0
