PS-11137 fix: binlog-server fails to reconnect in position-based replication mode when network timeout interrupts a transaction #123
Merged
https://perconadev.atlassian.net/browse/PS-11137
Always trim the in-memory event buffer to the last completed transaction boundary on connection termination, regardless of the replication mode (GTID-based or position-based).
Previously, 'storage::discard_incomplete_transaction_events()' was invoked only when 'storage::is_in_gtid_replication_mode()' returned true, on the assumption that in position-based mode it is safe to resume streaming from an arbitrary mid-transaction byte offset. This assumption turned out to be incorrect: on reconnect, 'reader_context' always expects the first logical event delivered after the pseudo-preamble (FDE + optional rotate) to be one of 'anonymous_gtid_log' / 'gtid_log' / 'gtid_tagged_log' (see 'reader_context::process_event_in_gtid_log_expected_state').
If a 'mysql_binlog_fetch()' timeout or network error fires after the source has delivered 'anonymous_gtid_log' but before it delivers the corresponding 'BEGIN' query event, the persisted stream offset points at 'BEGIN'. The next 'mysql_binlog_open()' attempt then resumes the stream mid-transaction, immediately triggering the "expected gtid_log_event-like event" assertion in 'reader_context' and terminating PBS.
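The failure mode can be illustrated with a minimal sketch of the check the reader performs on reconnect. This is not the actual 'reader_context' code: the 'event_type' enum, 'is_gtid_like()' and 'resume_is_valid()' are hypothetical names introduced here, and the preamble handling is simplified to "skip any leading FDE / rotate events".

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, reduced set of event types; the real binlog has many more.
enum class event_type {
  format_description,  // FDE
  rotate,
  anonymous_gtid_log,
  gtid_log,
  gtid_tagged_log,
  query,  // e.g. BEGIN
  rows,
  xid
};

// True for the three GTID-style events named in the description.
bool is_gtid_like(event_type type) {
  return type == event_type::anonymous_gtid_log ||
         type == event_type::gtid_log ||
         type == event_type::gtid_tagged_log;
}

// Assumed simplification of the reader expectation: after the pseudo-preamble
// (FDE + optional rotate) the first logical event must be GTID-style.
// Returns false where the real reader would raise its assertion.
bool resume_is_valid(const std::vector<event_type> &stream) {
  std::size_t index = 0;
  while (index < stream.size() &&
         (stream[index] == event_type::format_description ||
          stream[index] == event_type::rotate)) {
    ++index;  // skip the pseudo-preamble
  }
  if (index == stream.size()) {
    return true;  // nothing after the preamble yet, nothing to violate
  }
  return is_gtid_like(stream[index]);
}
```

In this model, a stream resumed at a transaction boundary (FDE, rotate, 'anonymous_gtid_log', 'BEGIN', ...) is accepted, while one resumed at the persisted mid-transaction offset (FDE, 'BEGIN', ...) is rejected.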
Calling 'discard_incomplete_transaction_events()' unconditionally rewinds the in-memory buffer (and therefore the offset persisted to the metadata file) back to the previous transaction boundary, so reconnection in position-based mode always restarts the stream from a position where the next event is a GTID-style event, matching the reader state machine expectations.
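The effect of the unconditional call can be sketched as follows. This is an illustrative model under stated assumptions, not the project's 'storage' class: 'event_buffer', 'buffered_event', 'event_kind' and 'on_connection_terminated()' are hypothetical, and a transaction is assumed to end only at an XID commit event (real streams have other terminators, for example for DDL).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified event kinds.
enum class event_kind { gtid_like, query_begin, rows, xid_commit };

struct buffered_event {
  event_kind kind;
  std::uint32_t size;  // bytes the event occupies in the binlog stream
};

// Hypothetical stand-in for the in-memory buffer kept by the storage layer.
class event_buffer {
public:
  explicit event_buffer(std::uint64_t start_offset)
      : offset_{start_offset}, boundary_offset_{start_offset} {}

  void append(const buffered_event &event) {
    events_.push_back(event);
    offset_ += event.size;
    // Assumption: only an XID commit closes a transaction in this model.
    if (event.kind == event_kind::xid_commit) {
      boundary_count_ = events_.size();
      boundary_offset_ = offset_;
    }
  }

  // Drop everything received after the last completed transaction and
  // rewind the offset that would be persisted to the metadata file.
  void discard_incomplete_transaction_events() {
    events_.resize(boundary_count_);
    offset_ = boundary_offset_;
  }

  std::uint64_t offset() const { return offset_; }
  std::size_t size() const { return events_.size(); }

private:
  std::vector<buffered_event> events_;
  std::uint64_t offset_;
  std::size_t boundary_count_{0};
  std::uint64_t boundary_offset_;
};

// After the fix: trim on every connection termination, with no check of the
// replication mode. Returns the offset from which streaming would resume.
std::uint64_t on_connection_terminated(event_buffer &buffer) {
  buffer.discard_incomplete_transaction_events();
  return buffer.offset();
}
```

If the connection drops right after a GTID-style event has been buffered, the resume offset returned here is the end of the previous commit, so the next event delivered after reconnect is that same GTID-style event again.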