PS-11137 fix: binlog-server fails to reconnect in position-based replication mode when network timeout interrupts a transaction#123

Merged: percona-ysorokin merged 1 commit into Percona-Lab:0.2 from kamil-holubicki:PS-111137 on May 15, 2026

@kamil-holubicki
Collaborator

https://perconadev.atlassian.net/browse/PS-11137

Always trim the in-memory event buffer to the last completed transaction boundary on connection termination, regardless of the replication mode (GTID-based or position-based).

Previously, 'storage::discard_incomplete_transaction_events()' was invoked only when 'storage::is_in_gtid_replication_mode()' returned true, on the assumption that in position-based mode it is safe to resume streaming from an arbitrary mid-transaction byte offset. This assumption turned out to be incorrect: on reconnect, 'reader_context' always expects the first logical event delivered after the pseudo-preamble (FDE + optional rotate) to be one of 'anonymous_gtid_log' / 'gtid_log' / 'gtid_tagged_log' (see 'reader_context::process_event_in_gtid_log_expected_state').

If a 'mysql_binlog_fetch()' timeout or network error fired after the source had delivered 'anonymous_gtid_log' but before it delivered the corresponding 'BEGIN' query event, the persisted stream offset pointed at 'BEGIN'. The next 'mysql_binlog_open()' attempt then resumed the stream mid-transaction, immediately triggering the "expected gtid_log_event-like event" assertion in 'reader_context' and terminating PBS.
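For reviewers who want to see the failure mode in isolation, here is a minimal sketch of a reader state machine that rejects a stream resumed at 'BEGIN'. It is not the project's 'reader_context': the names 'event_kind', 'reader_state', 'reader_sketch' and 'replay_ok' are hypothetical, the event set is heavily reduced, and a thrown exception stands in for the real assertion.

```cpp
#include <initializer_list>
#include <stdexcept>

// Hypothetical, heavily reduced set of binlog event kinds.
enum class event_kind {
  format_description, // FDE
  rotate,
  gtid_like,          // stands for anonymous_gtid_log / gtid_log / gtid_tagged_log
  query_begin,
  rows,
  xid_commit
};

// Hypothetical reader states, loosely modelled on the description above.
enum class reader_state { preamble, gtid_expected, in_transaction };

class reader_sketch {
public:
  void process(event_kind ev) {
    if (state_ == reader_state::preamble) {
      // The pseudo-preamble (FDE + optional rotate) is consumed silently.
      if (ev == event_kind::format_description || ev == event_kind::rotate) {
        return;
      }
      state_ = reader_state::gtid_expected;
    }
    if (state_ == reader_state::gtid_expected) {
      // The first logical event of every transaction must be GTID-like.
      if (ev != event_kind::gtid_like) {
        throw std::logic_error("expected gtid_log_event-like event");
      }
      state_ = reader_state::in_transaction;
      return;
    }
    // Inside a transaction: a commit returns the reader to the boundary state.
    if (ev == event_kind::xid_commit) {
      state_ = reader_state::gtid_expected;
    }
  }

private:
  reader_state state_{reader_state::preamble};
};

// Feeds a whole stream to a fresh reader; false means the reader rejected it.
inline bool replay_ok(std::initializer_list<event_kind> stream) {
  reader_sketch reader;
  try {
    for (event_kind ev : stream) {
      reader.process(ev);
    }
  } catch (const std::logic_error &) {
    return false;
  }
  return true;
}
```

In this sketch a stream that resumes at a transaction boundary (FDE, GTID-like event, 'BEGIN', commit) is accepted, while one that resumes at 'BEGIN' right after the preamble is rejected, which mirrors the reconnect failure reported in PS-11137.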

Calling 'discard_incomplete_transaction_events()' unconditionally rewinds the in-memory buffer (and therefore the offset persisted to the metadata file) back to the previous transaction boundary, so reconnection in position-based mode always restarts the stream from a position where the next event is a GTID-style event, matching the reader state machine expectations.
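The effect of the unconditional trim can be illustrated with a small sketch. This is not the actual 'storage' class: 'event_buffer', 'buffered_event', 'event_kind' and 'resume_offset()' are hypothetical names, and the assumption that a transaction ends at a commit-style event is a simplification made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, reduced set of event kinds.
enum class event_kind { gtid_like, query_begin, rows, xid_commit };

struct buffered_event {
  event_kind kind;
  std::uint32_t size_bytes;
};

// Hypothetical stand-in for the in-memory event buffer kept by 'storage'.
class event_buffer {
public:
  explicit event_buffer(std::uint64_t start_offset)
      : end_offset_{start_offset}, boundary_offset_{start_offset} {}

  void push(const buffered_event &ev) {
    events_.push_back(ev);
    end_offset_ += ev.size_bytes;
    if (ev.kind == event_kind::xid_commit) {
      // A transaction just completed: remember where it ends.
      boundary_count_ = events_.size();
      boundary_offset_ = end_offset_;
    }
  }

  // Drops everything after the last completed transaction. Called on every
  // connection termination, with no check of the replication mode.
  void discard_incomplete_transaction_events() {
    assert(boundary_count_ <= events_.size());
    events_.resize(boundary_count_);
    end_offset_ = boundary_offset_;
  }

  // Offset that would be persisted and used for the next reconnect.
  std::uint64_t resume_offset() const { return end_offset_; }
  std::size_t size() const { return events_.size(); }

private:
  std::vector<buffered_event> events_;
  std::size_t boundary_count_{0};
  std::uint64_t end_offset_;
  std::uint64_t boundary_offset_;
};
```

If the connection drops after a GTID-like event of a new transaction has been buffered, calling 'discard_incomplete_transaction_events()' moves 'resume_offset()' back to the end of the previous commit, so the next event requested from the source is again a GTID-like event.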

Collaborator

@percona-ysorokin left a comment

LGTM

@percona-ysorokin percona-ysorokin merged commit 1ffc5c3 into Percona-Lab:0.2 May 15, 2026
7 checks passed