persist/pubsub: fix connection leak on gRPC stream error#35938
Merged
teskje merged 1 commit intoMaterializeInc:mainfrom Apr 13, 2026
Merged
persist/pubsub: fix connection leak on gRPC stream error#35938teskje merged 1 commit intoMaterializeInc:mainfrom
teskje merged 1 commit intoMaterializeInc:mainfrom
Conversation
When the client's gRPC response stream errors, the reconnect loop drops the tonic `Channel` but hyper's background task keeps running: It's blocked polling the `broadcast_messages` async stream, which holds a live `BroadcastStream` receiver. This keeps the HTTP2 connection open, leaking one connection per reconnect. Fix this by giving the stream a cancellation token. When the reconnect loop drops `cancel_tx`, the stream observes it via `select!` and terminates, allowing hyper to close the HTTP2 connection.
Contributor
|
Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone. PR title guidelines
Pre-merge checklist
|
antiguru
approved these changes
Apr 13, 2026
Member
antiguru
left a comment
There was a problem hiding this comment.
Checks out! Maybe wait for someone else to review, too.
DAlperin
approved these changes
Apr 13, 2026
Member
DAlperin
left a comment
There was a problem hiding this comment.
LGTM. Slightly bummed not to have the test in this PR but it would be kind of a pain, I agree. Thanks for fixing!
Contributor
Author
|
TFTRs! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
When the client's gRPC response stream errors, the reconnect loop drops the tonic
Channelbut hyper's background task keeps running: It's blocked polling thebroadcast_messagesasync stream, which holds a liveBroadcastStreamreceiver. This keeps the HTTP2 connection open, leaking one connection per reconnect.Fix this by giving the stream a cancellation token. When the reconnect loop drops
cancel_tx, the stream observes it viaselect!and terminates, allowing hyper to close the HTTP2 connection.Motivation
Fixes https://github.com/MaterializeInc/database-issues/issues/11276
Verification
There is a repro test in the issue comments and I verified that it passes with this fix, i.e. it doesn't reproduce the leak anymore. It'd be great to include the test in this PR, but it requires patching the
max_decoding_message_sizeto allow injecting messages that are too large and produce decoding errors. Is there another way to provoke receive errors? Alternatively, we could make themax_decoding_message_sizea dyncfg (currently it's hardcoded tousize::MAX) but it would be a bit of plumbing that only exists for this one test.