Skip to content

[fix][test] Drop racy concurrent-delete assertion in CompactionConcurrencyTest#25827

Open
merlimat wants to merge 1 commit into
apache:masterfrom
merlimat:mmerli/fix-compaction-concurrency-test
Open

[fix][test] Drop racy concurrent-delete assertion in CompactionConcurrencyTest#25827
merlimat wants to merge 1 commit into
apache:masterfrom
merlimat:mmerli/fix-compaction-concurrency-test

Conversation

@merlimat
Copy link
Copy Markdown
Contributor

Motivation

CompactionConcurrencyTest.testDisableCompactionConcurrently fires two threads that both call deleteSubscriptionAsync(\"__compaction\") on the same topic and asserts that at least one of them fails:

assertTrue(f1.get().isCompletedExceptionally() || f2.get().isCompletedExceptionally());

That assertion has been flaky since the test was migrated from ProducerConsumerBase to SharedPulsarBaseTest in #25381.

The original test relied on MockZooKeeper.delay(...) to inject a delay into the cursor delete operation. This forced the two calls to overlap inside PersistentTopic#asyncDeleteCursorWithCleanCompactionLedger's synchronized block — the second one then hit the disablingCompaction.compareAndSet(false, true) == false branch and failed with SubscriptionBusyException.

SharedPulsarBaseTest uses an in-memory metadata store and there is no equivalent delay hook. The first delete can complete (and remove the subscription) before the second thread even reaches the broker. The second call then runs through asyncDeleteCursorWithClearDelayedMessage, sees subscriptions.get(\"__compaction\") == null, and returns success as a no-op:

PersistentSubscription persistentSubscription = subscriptions.get(subscriptionName);
if (persistentSubscription == null) {
    log.warn().attr(...).log(\"Can't find subscription, skip delete cursor\");
    unsubscribeFuture.complete(null);
    return;
}

Both futures complete successfully — perfectly valid behavior — and the assertion fails.

Example failure: https://scans.gradle.com/s/2odpjlsucvuly/tests/task/:pulsar-broker:test/details/org.apache.pulsar.broker.service.persistent.CompactionConcurrencyTest/testDisableCompactionConcurrently/2/output

Modifications

Drop the "at least one fails" assertion — the broker handles either outcome correctly (concurrent overlap → SubscriptionBusyException; sequential → no-op).

Keep the deadlock check (both futures complete in bounded time) and add a verification that the compaction subscription ends up removed, which is the actual invariant we care about.

Verifying this change

This change is already covered by CompactionConcurrencyTest.testDisableCompactionConcurrently. Locally I ran 5 times with --rerun-tasks; all passed.

Does this pull request potentially affect one of the following parts:

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

…rencyTest

testDisableCompactionConcurrently fires two threads that both call
deleteSubscriptionAsync("__compaction") and asserts that at least one
fails. That assertion was flaky after the test was migrated to
SharedPulsarBaseTest in apache#25381.

The original test used MockZooKeeper.delay(...) to inject a delay into
the cursor delete operation, forcing the two calls to overlap inside
PersistentTopic#asyncDeleteCursorWithCleanCompactionLedger's synchronized
block, which made one of them fail with SubscriptionBusyException.
SharedPulsarBaseTest uses an in-memory metadata store with no such hook,
so the first delete can complete (and remove the subscription) before the
second thread even reaches the broker. The second call then finds
subscriptions.get("__compaction") == null and returns success as a no-op
— both succeed legally.

Drop the "at least one fails" assertion; the broker handles either
outcome correctly. Keep the deadlock check (both futures complete) and
add a verification that the compaction subscription ends up removed,
which is the actual invariant we care about.

Example failure:
https://scans.gradle.com/s/2odpjlsucvuly/tests/task/:pulsar-broker:test/details/org.apache.pulsar.broker.service.persistent.CompactionConcurrencyTest/testDisableCompactionConcurrently/2/output
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant