jdbc-v2: swallow chunked-stream drain errors on ResultSet.close (fixes #2361)#2857
jdbc-v2: swallow chunked-stream drain errors on ResultSet.close (fixes #2361)#2857fm4v wants to merge 3 commits into
Conversation
Fixes ClickHouse#2361. When the server tears down the response connection mid-stream (canonically because send_timeout fires on the writer side and the terminating zero-length chunk is never written), Apache HC's ChunkedInputStream.close() drains remaining bytes and trips `ConnectionClosedException: Premature end of chunk coded message body: closing chunk expected`. With `compress=true` the close path goes through `FramedLZ4CompressorInputStream.close()` → `BoundedInputStream.close()` → `ChunkedInputStream.close()` and the same drain failure surfaces. `ResultSetImpl.close()` previously rethrew this as a `SQLException`, which breaks well-behaved try-with-resources callers: they have already finished iterating, they cannot affect the server-side socket race, and the bug is surfacing in *cleanup* — they have no recourse other than wrapping every `rs.close()` (often implicit) in a try/catch. The reporter in ClickHouse#2361 ended up implementing client-side retry for the symptom; sqlancer / SQL fuzzers hit the same. Downgrade chunked-drain failures to log-and-swallow. If iteration genuinely failed (server closed mid-`next()`), that exception still surfaces — through `next()`, not through `close()`. The bare-drain-fail case is informational only. Covers both bare `Premature end of chunk` messages and `ConnectionClosedException` class-name match for forward-compat with Apache HC rephrasings.
Augments the previous commit with: - Unit tests for `ResultSetImpl.isStreamDrainException` covering the six exception shapes the classifier needs to recognise (bare message, `closing chunk expected` variant, nested cause chain, class-name match without message, unrelated exceptions, null input). All six pass. - `examples/jdbc/.../Issue2361Repro.java` — a self-contained main that reproduces the bug against a real ClickHouse server. Recipe is documented in the class Javadoc: compression on, server `send_timeout=1`, buffered chunked response, slow client read. On a freshly-started server it trips at ~100% rate. After the fix in `ResultSetImpl.close()` the iteration completes cleanly with the drain-close failure downgraded to a debug log. `isStreamDrainException` is package-private so it can be unit-tested without reflection.
|
Repository collaborators can run the JMH benchmark suite against this PR by commenting: Optional regression threshold override (Δ% on Time or Alloc/op; defaults to 10%): Only one benchmark run per PR is active at a time — issuing a new |
Local end-to-end validation against ClickHouse 26.5.1.854Reproduced #2361 against the stock 0.9.8 jar and confirmed the fix on this branch using the Bug is realStock Stack origin matches the patch site exactly: Fix worksPatched jar (this branch) against a freshly-restarted server, 2 iterations of the reproducer: Zero No row-content regressionI extended the reproducer to also validate each row delivered before the response is torn down:
Side-by-side, both jars deliver byte-identical rows:
The 3 "bad" rows per iteration are not caused by the patch — they are ClickHouse's error trailer ( This artifact also surfaces correctly as a Environment
|
When --log-each-select=true (default), the per-thread reproducer file should be self-sufficient: schema setup + bulk load + failing query in order. Three oracles were silently running DDL/data statements via SQLQueryAdapter(..., useLogger=false), so when one of them tripped an AssertionError the saved logs/clickhouse/database*.log was missing prerequisite statements and could not be replayed standalone. - CERTOracle: ANALYZE TABLE - PartitionMirrorOracle: DROP/CREATE of the mirror table plus INSERT INTO mirror SELECT * FROM source - SchemaRoundtripOracle: the round-trip CREATE pair plus cleanup DROPs Each new write is gated on state.getOptions().logEachSelect() to match the existing SQLQueryAdapter logging convention. Also document the locally-patched clickhouse-jdbc 0.9.8 jar in .claude/CLAUDE.md (upstream PR ClickHouse/clickhouse-java#2857) so a fresh checkout knows the jar shipped in target/lib/ is not stock and records the rebuild recipe + verification command.
| * failure to a debug log instead of a thrown SQLException — close() should | ||
| * never punish callers for a server-side teardown race after iteration is done. | ||
| */ | ||
| public class Issue2361Repro { |
There was a problem hiding this comment.
This should be a test. Examples are only for documentation purpose.
There was a problem hiding this comment.
Removed. The Issue2361Repro example has been deleted in the latest commit. The trigger recipe (compress=true, server send_timeout=1, buffered chunked response, slow client read) is now documented inline at the detection site in AbstractBinaryFormatReader.close().
| * the application has finished iterating. Propagating it punishes well-behaved | ||
| * try-with-resources callers for a server-side socket race they cannot affect. | ||
| */ | ||
| public class ResultSetImplCloseTest { |
There was a problem hiding this comment.
Please squash into a fewer tests and move to ResultSetImplTest.
We plan to do another grouping.
Please mark tests included in integration group because for historical and practical reasons we do most tests against ClickHouse instance.
There was a problem hiding this comment.
Squashed and moved. Since the close-time handling now lives in client-v2 (AbstractBinaryFormatReader.close) rather than jdbc-v2, the tests followed: 6 unit tests collapsed into a single data-driven test (AbstractBinaryFormatReaderCloseTest) in client-v2. Marked the unit group, matching the convention of other client-v2 unit tests like SerializerUtilsTest. The new test uses the real org.apache.hc.core5.http.ConnectionClosedException (already a client-v2 dep) instead of a stand-in.
| * decides whether a close-time exception is a benign chunked-stream drain | ||
| * failure (to swallow) or a real error (to propagate). | ||
| * | ||
| * Background: see issue #2361. When the server's send_timeout fires mid-write, |
There was a problem hiding this comment.
This should document scenario without reference to issue in external system (issue may gone).
This documentation make more sense in close method of result set impl because we start researching from production code not tests.
There was a problem hiding this comment.
Done — the test file is gone, replaced by AbstractBinaryFormatReaderCloseTest in client-v2 whose javadoc describes the scenario (server tearing down the connection before the terminating zero-length chunk is written) with no reference to an external issue tracker. The production-side WHY now lives in AbstractBinaryFormatReader.close(), which is the starting point when investigating.
| // already torn down the connection (e.g. SOCKET_TIMEOUT on the writer side hits | ||
| // `send_timeout` before the terminating chunk is written). The most common | ||
| // surface is `ConnectionClosedException: Premature end of chunk coded message | ||
| // body: closing chunk expected` from Apache HC's `ChunkedInputStream` drain, |
There was a problem hiding this comment.
please explain in 3 lines comment.
The idea I think that there are cases where we may hide real exception what cause problematic investigation. One of such cases is premature end of chunk in combination with compression.
There was a problem hiding this comment.
ResultSetImpl.close() is reverted to its original shape — the long comment is gone entirely. The drain handling moved into client-v2 (AbstractBinaryFormatReader.close), and the 6-line WHY now lives there, at the actual catch site. ResultSetImpl just propagates whatever the reader gives it, like before.
| static boolean isStreamDrainException(Throwable t) { | ||
| while (t != null) { | ||
| String msg = t.getMessage(); | ||
| if (msg != null && (msg.contains("Premature end of chunk") |
There was a problem hiding this comment.
we should detect that it is org.apache.hc.core5.http.ConnectionClosedException and log it.
we avoid building logic on error messages.
There was a problem hiding this comment.
Done. The detection is now class-based — walks the cause chain and matches getClass().getName().endsWith(".ConnectionClosedException"). No message-text matching. The .endsWith (with leading dot) tolerates both the unshaded class name (org.apache.hc.core5.http.ConnectionClosedException) and the shaded one (com.clickhouse.shaded.org.apache.hc.core5.http.ConnectionClosedException) without taking a compile-time dependency on HC from the reader layer.
| } catch (Exception re) { | ||
| log.debug("Error closing reader", re); | ||
| e = re; | ||
| if (!isStreamDrainException(re)) { |
There was a problem hiding this comment.
this should be handled in client-v2.
It could be handled in reader but reader knows nothing about org.apache.hc.core5.http.ConnectionClosedException. There are two options
- just log all exception while closing reader and report them as error or warn
- add
closeResponseStreamtocom.clickhouse.client.api.internal.HttpAPIClientHelper
There was a problem hiding this comment.
Moved to client-v2. Drain handling now lives in AbstractBinaryFormatReader.close() — that is where input.close() actually triggers ChunkedInputStream.close(), so it is the natural catch site. ResultSetImpl.close() is back to its original shape with no special-case logic. The reader detects ConnectionClosedException by class-name suffix on the cause chain, so no compile-time HC dependency is added to the reader. Went with your first option (handle in reader) rather than adding closeResponseStream to HttpAPIClientHelper since the failure originates inside the reader's stream chain, not at the response level.
|
Thank you for the contribution! It looks solid. Thanks you! PS: I will work on contributors guide this week. It should make some part clear for future contributions. |
Per chernser's review on ClickHouse#2857: - Move close-time ConnectionClosedException handling from jdbc-v2 ResultSetImpl.close() into AbstractBinaryFormatReader.close() in client-v2. The drain failure originates at input.close() inside the reader (FramedLZ4CompressorInputStream -> BoundedInputStream -> ChunkedInputStream), so that is where it should be caught. - Detect the HC ConnectionClosedException by class-name suffix (cause chain) instead of message text. Works against both the unshaded and the shaded copy of the class without taking a compile-time dependency on HC from the reader layer. - Drop the long WHY comment from ResultSetImpl.close(); the rationale now lives at the actual detection site in the reader. - Squash the six jdbc-v2 unit tests into a single data-driven test in client-v2 (AbstractBinaryFormatReaderCloseTest) using the real org.apache.hc.core5.http.ConnectionClosedException. Marked unit group, matching the convention of other client-v2 unit tests (SerializerUtilsTest, etc.). - Remove the Issue2361Repro example. The reproducer recipe is captured in the commit history and the production code comment.
|
@chernser thanks for the careful review. Pushed
Branch is already on top of latest Replies on each individual thread point to the specific change. |
| * directly-referenced and the shaded copy of the HC class. | ||
| */ | ||
| @Test(groups = {"unit"}) | ||
| public class AbstractBinaryFormatReaderCloseTest { |
There was a problem hiding this comment.
there was example code that reproduces the issue. Would you please add test to com.clickhouse.client.HttpTransportTests.
Regarding this tests - as I've mentioned, we do not make test for specific functionality or problem. It should be inside reader test suite.
| */ | ||
| static boolean isConnectionClosedException(Throwable t) { | ||
| while (t != null) { | ||
| if (t.getClass().getName().endsWith(".ConnectionClosedException")) { |
There was a problem hiding this comment.
here we should check for class not a name.
And because class belongs to http client implementation this method should be in com.clickhouse.client.api.internal.HttpAPIClientHelper (not a great solution but we will fix it)
Summary
Fixes #2361.
ResultSetImpl.close()no longer rethrowsConnectionClosedException: Premature end of chunk coded message body: closing chunk expectedfrom the chunked-response drain path. Iteration-time failures still surface throughnext(); this only fixes the close-cleanup noise.Why
When the server tears down the response connection mid-stream (canonically because
send_timeoutfires on the writer side and the terminating zero-length chunk is never written), Apache HC'sChunkedInputStream.close()drains remaining bytes and tripsPremature end of chunk coded message body. Withcompress=truethe close path goes throughFramedLZ4CompressorInputStream.close()→BoundedInputStream.close()→ChunkedInputStream.close()and the same drain failure surfaces.ResultSetImpl.close()previously rethrew this as aSQLException, which breaks well-behaved try-with-resources callers: they have already finished iterating, they cannot affect the server-side socket race, and the bug is surfacing in cleanup — they have no recourse other than wrapping everyrs.close()(often implicit) in a try/catch. The reporter in #2361 ended up implementing client-side retry for the symptom; sqlancer / SQL fuzzers hit the same pattern.If iteration genuinely fails (server closed mid-
next()), that exception still surfaces — throughnext(), not throughclose(). The bare-drain-on-close case is informational only.What changed
jdbc-v2/src/main/java/com/clickhouse/jdbc/ResultSetImpl.java: gate the existinge = re;assignments inclose()behind a newisStreamDrainExceptionclassifier. Drain-class exceptions are logged at debug and swallowed; all other close-time failures still propagate as before. Logic shape preserved.jdbc-v2/src/test/java/com/clickhouse/jdbc/ResultSetImplCloseTest.java: six unit tests covering the classifier (bare message match,closing chunk expectedvariant, nested cause chain, class-name match without message, unrelated exceptions, null input). All pass.examples/jdbc/src/main/java/com/clickhouse/examples/jdbc/Issue2361Repro.java: a self-containedmainthat reproduces the bug against a real ClickHouse server. Trigger recipe is in the class Javadoc: compression on, serversend_timeout=1, buffered chunked response, slow client read. On a freshly-started server it trips at ~100% rate.Reproducer
java -DchUrl=jdbc:ch://localhost:8123 \ com.clickhouse.examples.jdbc.Issue2361Repro 3Before this PR (jdbc-v2 0.9.8 main):
After this PR:
Test plan
mvn -pl jdbc-v2 test -Dtest=ResultSetImplCloseTest— 6/6 passisStreamDrainExceptionto other transient-cleanup exception classes (e.g. SSLSSLExceptionon close,SocketException: Connection reseton close)Correctness — is the data the application consumed complete?
Yes, in the failure scenario this PR addresses. The relevant invariants are in upstream code already, not added here:
AbstractBinaryFormatReader.readRecord(...)catchesEOFExceptiononly whenfirstColumn=true(between rows) and treats it as end-of-stream. EOF mid-row is rethrown asIOException(recordReadExceptionMsg(...), e), which surfaces throughnext()— not throughclose().ChunkedInputStream.read(...)in Apache HC returns-1only when it has parsed the terminator zero-length chunk. If the TCP connection drops mid-chunk-header it throwsConnectionClosedException: Premature end of chunk coded message body: closing chunk expected; if the chunk header is corrupt it throwsMalformedChunkCodingException. Truncation never quietly looks like clean EOF to the upper layer.Therefore the failure path covered by this PR corresponds to: the lz4-framed stream's end-of-stream marker was seen, the reader fully consumed the payload,
next()legitimately returnedfalse, and only then did the chunked-stream drain inclose()hit the missing terminator chunk. The application has the correct row set; the close-time error is an HTTP-framing artefact, not a data-integrity one.Other truncation cases (mid-row, mid-frame-header, malformed chunk) all surface through
next()beforeclose()is reached — those paths are unchanged by this PR. TheisStreamDrainExceptionclassifier is intentionally narrow (Premature end of chunk/closing chunk expectedmessage, orConnectionClosedExceptionclass match) so unrelated close-time failures still propagate as before.