Skip to content

PHOENIX-7854 Use Abortable interface to trigger RS aborts#2474

Open
tkhurana wants to merge 3 commits into
apache:PHOENIX-7562-feature-newfrom
tkhurana:PHOENIX-7854
Open

PHOENIX-7854 Use Abortable interface to trigger RS aborts#2474
tkhurana wants to merge 3 commits into
apache:PHOENIX-7562-feature-newfrom
tkhurana:PHOENIX-7854

Conversation

@tkhurana
Copy link
Copy Markdown
Contributor

@tkhurana tkhurana commented May 13, 2026

Summary

  • Replace RuntimeException-based abort with the HBase Abortable interface for triggering real RS shutdowns on fatal replication errors
  • Annotate PhoenixRegionServerEndpoint and IndexRegionObserver with @CoreCoprocessor to access HasRegionServerServices and obtain an Abortable (RegionServerServices), threaded into ReplicationLogGroup
  • On unrecoverable errors the Disruptor event handler sets a fatalException field and fails pending sync futures, but never calls abort() or close() itself — the
    producer thread (RPC handler) that receives the failed future initiates the abort
  • Wrap onEvent() in a top-level catch (Throwable) so exceptions never escape to LogExceptionHandler, which becomes a safety net
  • Simplify close(boolean graceful) into a single close() that always drains with a timeout — when the handler has a fatal exception the drain completes instantly since each event hits the short-circuit path;

Test plan

  • Unit tests in ReplicationLogGroupTest updated to expect IOException / PhoenixWALSyncTimeoutException instead of RuntimeException
  • All 18 HABaseIT integration tests pass
  • Verify graceful close drains the Disruptor with a bounded timeout before halting

tkhurana added 3 commits May 13, 2026 13:16
Use @CoreCoprocessor + HasRegionServerServices to obtain an Abortable
(RegionServerServices) in IndexRegionObserver and PhoenixRegionServerEndpoint,
and thread it into ReplicationLogGroup so that abort() triggers a real RS
shutdown instead of only throwing a RuntimeException.
…r lifecycle

The event handler thread no longer calls abort() or close() on itself.
On fatal errors it sets a fatalException field, fails pending sync futures,
and keeps consuming events (so the ring buffer doesn't block producers).
The producer thread that receives the failed future calls abort(), which
closes resources and invokes Abortable on its own thread.

This eliminates the close(boolean graceful) split — a single close()
always drains with a timeout and falls back to halt. When the handler
has a fatal exception the drain completes instantly. The onShutdown
callback derives graceful vs non-graceful from fatalException == null,
removing the gracefulShutdownEventHandlerFlag field entirely.
@tkhurana tkhurana requested review from apurtell and kadirozde May 14, 2026 00:53
if (env instanceof RegionServerCoprocessorEnvironment) {
this.serverName = ((RegionServerCoprocessorEnvironment) env).getServerName();
}
if (env instanceof HasRegionServerServices) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this check necessary? Are there cases where env is not an instance of RegionServerCoprocessorEnvironment here or is this just defensive coding? Can we add a comment here to clarify?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants