Remove broken JDBC connection from pool on local-transaction connection errors#26063
Remove broken JDBC connection from pool on local-transaction connection errors#26063renatsaf wants to merge 3 commits into
Conversation
…on errors When a pooled JDBC connection dies while in use (for example after an HTTP request-timeout interrupts a statement and leaves the connection closed), LocalTransactionImpl caught the SQLException from begin/commit/rollback and rethrew it as a LocalTransactionException without notifying the pool. The broken connection was therefore never flagged with a CONNECTION_ERROR_OCCURRED event, so it was never removed from the pool. The only thing that could discard it was validation-on-checkout, which "validate-atmost-once-period-in-seconds" suppresses while the period has not elapsed - leaving the connection broken forever. Fire ManagedConnectionImpl.connectionErrorOccurred() when the failure is a genuine connection error, mirroring the existing XA path (XAStartOccurred/XAEndOccurred). Detection is driver agnostic: SQLState class "08" (connection exception) and the SQLRecoverableException / SQLNonTransientConnectionException subclasses, walking both the next-exception and cause chains. Data/transient errors (e.g. constraint violations) do not discard the connection. Fixes eclipse-ee4j#25930 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…mpletes Following review of the previous commit: firing connectionErrorOccurred from LocalTransactionImpl removed the resource from the pool immediately, while the resource was still enlisted in the active transaction. The JTA transaction-completion path (ConnectionPool.transactionCompleted, driven by the connector's own resource set in PoolTxHelper) would then re-process the already-removed handle, causing redundant cleanup and monitoring-counter drift. Fix it in LocalTxConnectionEventListener (used only by local-tx, non-XA resources - XA uses a separate listener, so this is scoped to the bug): when a connection error is signalled while the resource is still enlisted, set the hasConnectionErrorOccurred() flag but defer the actual pool removal. The flag makes ConnectionPool discard the connection on the next checkout (the check runs before validation, so validate-atmost-once cannot keep it alive), and keeping the listener attached lets the normal connection-close / transaction-completion path return the resource to the pool, where it is removed exactly once. Updated the LocalTransactionImpl javadoc to describe the deferred removal. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Built and unit-tested both touched modules locally with Maven 3.9.16 / JDK 21:
|
Covers the issue eclipse-ee4j#25930 listener change: a CONNECTION_ERROR_OCCURRED while the resource is still enlisted in a transaction flags the resource (hasConnectionErrorOccurred) but keeps the listener attached and defers pool removal, so the transaction-completion path does not re-process an already removed handle. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Added unit-test coverage for the deferred-removal behavior and re-verified locally with Maven 3.9.16 / JDK 21. New test — Local results:
The full Oracle + request-timeout end-to-end scenario from the issue still isn't reproducible in my environment, so that path relies on CI / a maintainer with a real Oracle setup. |
|
Just a small warning here - the code around JDBC pools and transactions is pretty horrible and I would not recommend using AI for that, because it is very easy to get confused by these nonstandard heavily "spagettized" sources. To make it even more complicated, it is multithreaded, so any change in line order has might have some effect. Basically once I see you are adding lines, it smells to me that it makes things worse not better. And refactoring that big ball of mud - we do that in iterations, each opens some small door to improvements like yours, but ... I have doubts about those notifications before throwing the |
Thank you for the reply. I have same problem in the issue in my project in prod. Could ypu please take the issue in work without using AI? The only thing that I use FirebirdSQL connection instead of Oracle. |
Problem
Fixes #25930.
A pooled JDBC connection that breaks while in use (e.g. an HTTP
request-timeout-secondsinterrupts a statement and leaves the Oracle connection closed) is never removed from the pool whenvalidate-atmost-once-period-in-secondsis greater than0, so the application keeps getting the dead connection forever:Setting the period to
0"fixes" it only because that forces validation on every checkout.Root cause
LocalTransactionImpl.begin()/commit()/rollback()caught theSQLExceptionfrom the physical connection and rethrew it as aLocalTransactionExceptionwithout raising aCONNECTION_ERROR_OCCURREDevent. As a resultResourceHandle.setConnectionErrorOccurred()was never set and the pool never discarded the connection.The only remaining safety net was validation-on-checkout (
ConnectionPool.isConnectionValid), andvalidate-atmost-once-period-in-seconds > 0deliberately skips that while the period has not elapsed — so a connection that broke shortly after its last validation stays broken indefinitely.Fix
1. Detect the error (RA,
LocalTransactionImpl). RaiseManagedConnectionImpl.connectionErrorOccurred()from the local-transaction operations when the failure is a genuine connection error, so the resource gets flagged withhasConnectionErrorOccurred(). The pool checks that flag before validation on every checkout, so the dead connection is discarded regardless of the validate-atmost-once setting.Detection is driver-agnostic and conservative:
SQLRecoverableException/SQLNonTransientConnectionExceptionsubclasses, and08(connection exception),getNextException()andgetCause()chains.Data/transient failures (e.g. constraint violations, SQLState
22/23) do not discard the connection. Oracle reportsORA-17008with SQLState08003and the interrupted read asSQLRecoverableException, so both symptoms are covered.2. Remove it cleanly (
LocalTxConnectionEventListener). A connection error during begin/commit/rollback happens while the resource is still enlisted in the transaction. Removing it from the pool immediately would make the JTA transaction-completion path (ConnectionPool.transactionCompleted, driven by the connector's own resource set inPoolTxHelper) re-process an already-removed handle, causing redundant cleanup and monitoring-counter drift. So when the resource is still enlisted, the listener sets the error flag but defers pool removal to the normal connection-close / transaction-completion path; the connection is then removed exactly once at the next checkout.This second change is scoped to local-tx, non-XA resources only — XA resources use a different listener (
ConnectorAllocator.ConnectionListenerImpl), so XA behavior is unchanged.Tests
LocalTransactionImplTestcovers the connection-error detection logic (recoverable, non-transient-connection, SQLState08003, next-exception/cause chains, and negative cases for data errors / null SQLState).🤖 Generated with Claude Code