perf(network): lazy ErrorMeter construction in RequestMetrics #3
Open · mashraf-222 wants to merge 1 commit into trunk
Conversation
Reduces RequestMetrics$ErrorMeter instance count on an idle broker from 17,010 to 6 by replacing eager pre-population in the constructor with computeIfAbsent on first markErrorMeter call. Backing map changes from EnumMap to ConcurrentHashMap to preserve thread-safety on the lazy-init path. Public API unchanged. MBean ObjectName format preserved. New RequestMetricsTest locks in the lazy contract with 7 tests including a 32-thread contention check. Evidence: ~400 KB per broker steady-state heap reduction at idle; 6/7 tests fail on trunk as regression guard.
Why this is a win

The claim here is unusual for a perf PR: the measurement is the claim. We're counting objects in a heap histogram. 17,010 before, 6 after. No confidence intervals, no rig sensitivity, no inference chain from microbenchmark to production — rerun it tomorrow on any machine and you get the same number. Contrast with most allocation-reduction work, which claims a per-call byte save and hopes it surfaces against GC noise. The per-broker saving is ~400 KB at idle. Small. What makes it worth shipping is that it's constant, always-on, and scales linearly with fleet size. Rough production arithmetic:
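Roughly: ~400 KB per broker × 1,000 brokers ≈ 400 MB of steady-state heap fleet-wide, the same figure the Results section below arrives at.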
Not dramatic. But it's free (8 production lines, zero runtime cost, public API unchanged) and every broker restart pays it back forever.
perf(network): lazy ErrorMeter construction in RequestMetrics
Summary
RequestMetrics eagerly allocated one ErrorMeter instance per Errors enum value (135) for every one of the ~126 request-metric groups, producing 17,010 ErrorMeter objects per broker regardless of which (API, error) tuples the broker ever reports. Making construction lazy (via computeIfAbsent on the first markErrorMeter call) drops the resident ErrorMeter instance count on an idle broker from 17,010 → 6 (−99.96%), saves ~408 KB of ErrorMeter objects plus several MB of tag LinkedHashMap/Entry storage, and preserves the Yammer MBean ObjectName format exactly. No existing test regresses; a new RequestMetricsTest locks in the lazy semantics.

Session mode: LIBRARY PRIMITIVE (memory-footprint refactor; impact scope is "every broker process"; argued via allocator call count and histogram delta, not via benchmark throughput).
What Changed
- server/src/main/java/org/apache/kafka/network/metrics/RequestMetrics.java: EnumMap<Errors, _> → ConcurrentMap<Errors, _>; delete the eager for (Errors error : Errors.values()) loop in the constructor; replace get(error) with computeIfAbsent(error, e -> new ErrorMeter(name, e)) in markErrorMeter. A sketch of the resulting shape follows this list.
- server/src/test/java/org/apache/kafka/network/metrics/RequestMetricsTest.java (new).

Production LOC delta: net −4 lines. Public API: unchanged — only one public method (markErrorMeter) exists, and its signature and documented behavior are unchanged.
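For orientation, a minimal sketch of the post-change shape (not the verbatim diff). The class name and the stubbed inner ErrorMeter are illustrative; the real inner class is unchanged and is sketched under "Why It Works" below.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

import com.yammer.metrics.core.Meter;
import org.apache.kafka.common.protocol.Errors;

public class RequestMetricsSketch {
    private final String name;
    // Was: an EnumMap pre-populated with one ErrorMeter per Errors value (135 entries).
    private final ConcurrentMap<Errors, ErrorMeter> errorMeters = new ConcurrentHashMap<>();

    public RequestMetricsSketch(String name) {
        this.name = name;
        // The eager `for (Errors error : Errors.values())` loop is gone.
    }

    public void markErrorMeter(Errors error, int count) {
        // computeIfAbsent runs the mapping function at most once per key, so each
        // (request, error) pair still yields exactly one ErrorMeter and one MBean.
        errorMeters.computeIfAbsent(error, e -> new ErrorMeter(name, e))
                   .getOrCreateMeter()
                   .mark(count);
    }

    // Stand-in for the existing, unchanged inner class (sketched in "Why It Works").
    static final class ErrorMeter {
        ErrorMeter(String requestName, Errors error) { /* holds the request/error tags */ }
        Meter getOrCreateMeter() { throw new UnsupportedOperationException("sketch only"); }
        void removeMeter() { /* unregisters the MBean if it was ever created */ }
    }
}
```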
Why It Works
The Yammer Meter registered by each ErrorMeter was already lazily constructed — getOrCreateMeter() is guarded by double-checked locking. So the 17,010 eager objects held only:

- a volatile Meter meter = null field (8 B),
- a LinkedHashMap<String,String> of tags (request=X, error=Y; ~160-200 B per map once entries are included),
- the RequestMetrics$ErrorMeter object itself (a 24 B shell).

On a typical deployment, only a handful of (API, Error) tuples ever fire (the idle broker showed 4-5 error= MBeans actually registered vs 17,010 ErrorMeter Java objects). The other 17,005 objects were dead weight. Replacing the eager loop with lazy computeIfAbsent makes the Java object lifecycle match the MBean lifecycle: create only what is actually used.

ConcurrentHashMap.computeIfAbsent guarantees at-most-once mapping-function execution per key across concurrent threads, preserving the invariant that each (request, error) pair produces exactly one ErrorMeter instance and therefore exactly one Yammer MBean registration.
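A sketch of that existing, unchanged inner layer, using the field names quoted above; registerErrorsPerSecMeter is a hypothetical stand-in for the real Yammer newMeter registration call.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import com.yammer.metrics.core.Meter;
import org.apache.kafka.common.protocol.Errors;

final class ErrorMeterSketch {
    // Tag map held per instance (~160-200 B once entries are counted); insertion
    // order request -> error is what fixes the ObjectName tag order.
    private final Map<String, String> tags = new LinkedHashMap<>();
    // Volatile 8-byte reference; the Yammer Meter (and its MBean) only exists
    // after the first mark for this (request, error) pair.
    private volatile Meter meter = null;

    ErrorMeterSketch(String requestName, Errors error) {
        tags.put("request", requestName);
        tags.put("error", error.name());
    }

    Meter getOrCreateMeter() {
        Meter local = meter;
        if (local != null)
            return local;
        synchronized (this) {              // double-checked locking, unchanged from trunk
            if (meter == null)
                meter = registerErrorsPerSecMeter(tags);
            return meter;
        }
    }

    // Hypothetical stand-in for the existing call that registers the "ErrorsPerSec"
    // Yammer Meter (and hence the MBean) with the request/error tags.
    private Meter registerErrorsPerSecMeter(Map<String, String> meterTags) {
        throw new UnsupportedOperationException("sketch only");
    }
}
```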
Why It Is Correct
Contract preserved:
- Only public method (markErrorMeter(Errors, int)), same signature.
- ObjectName format (kafka.network:type=RequestMetrics,name=ErrorsPerSec,request=X,error=Y) — tag insertion order request→error is preserved by the LinkedHashMap inside ErrorMeter (unchanged).
- First markErrorMeter call for a tuple creates the ErrorMeter + Yammer Meter + MBean; subsequent calls increment the existing Meter.
- removeMetrics() iterates over the ConcurrentMap.values() (weakly consistent iteration — safe for shutdown, no concurrent writes possible) and calls removeMeter() on each, then errorMeters.clear(). A minimal sketch of this path follows the list.
- Thread-safety: the pre-fix EnumMap.get and post-fix ConcurrentHashMap.get are both lock-free reads of a published entry. Only the cold computeIfAbsent mutator is new; its thread-safety is guaranteed by ConcurrentHashMap's contract.
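Continuing the RequestMetricsSketch from "What Changed", the shutdown path as described, as a sketch rather than the verbatim code:

```java
// Weakly consistent iteration over the ConcurrentMap values; each removeMeter()
// unregisters the corresponding Yammer MBean if it was ever created.
public void removeMetrics() {
    for (ErrorMeter errorMeter : errorMeters.values())
        errorMeter.removeMeter();
    errorMeters.clear();
}
```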
Evidence:
- RequestMetricsTest (new, 7 tests) locks in the lazy contract: empty map at construction; single entry after markErrorMeter; idempotent on repeated marks; distinct entries per distinct error; exactly one entry under 32-thread × 1000-iteration contention; MBean ObjectName ends with the exact suffix ":type=RequestMetrics,name=ErrorsPerSec,request=FetchTestProbe,error=NONE" (confirming tag ordering); removeMetrics() clears the map and unregisters all MBeans. A sketch of two of these assertions appears after this list.
- With the production change reverted (stash), 6 of the 7 tests fail with expected: <1> but was: <135> etc. (one test — the MBean ObjectName format — passes against trunk because the eager loop did not change the format). With the fix applied, all 7 pass.
- :core:test --tests kafka.network.SocketServerTest (61 tests, 0 failures) — this exercises updateErrorMetrics, which is the sole production caller of markErrorMeter.
- :server:test --tests org.apache.kafka.network.* (adjacent server network tests) green.
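For illustration, a sketch of two of the assertions described in the first bullet. It is not the actual RequestMetricsTest; it assumes, for illustration only, a RequestMetrics(String name) constructor and test-visible access to the errorMeters map, and it uses JUnit 5.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.common.protocol.Errors;
import org.apache.kafka.network.metrics.RequestMetrics;
import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;

class RequestMetricsLazyContractSketch {

    @Test
    void noErrorMetersAtConstruction() {
        RequestMetrics metrics = new RequestMetrics("FetchTestProbe"); // assumed constructor
        assertEquals(0, metrics.errorMeters.size());                   // empty until the first mark
    }

    @Test
    void exactlyOneEntryUnderContention() throws Exception {
        RequestMetrics metrics = new RequestMetrics("FetchTestProbe");
        int threads = 32;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                start.await();                             // all threads race the first mark
                for (int j = 0; j < 1000; j++)
                    metrics.markErrorMeter(Errors.NONE, 1);
                return null;
            });
        }
        start.countDown();
        pool.shutdown();
        assertTrue(pool.awaitTermination(30, TimeUnit.SECONDS));
        assertEquals(1, metrics.errorMeters.size());        // at-most-once construction per key
    }
}
```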
Benchmark Methodology
This is a memory-footprint change, not a latency/throughput change. The primary metric is the ErrorMeter resident instance count on an idle broker. Measurement is deterministic at N=1 — same workload, same JVM, same heap settings yield the same instance count up to minor internal events. The 6 remaining Java objects correspond to ErrorMeters that were touched by broker self-heartbeat traffic during the 30 s settle window. 5 of them materialized their inner Yammer Meter (visible in JMX as error=NONE MBeans for ApiVersions, BrokerHeartbeat, BrokerRegistration, ControllerRegistration, and DescribeCluster — see evidence/after-t1/mbean-list.txt). The sixth ErrorMeter object exists but its inner Meter field is still null, so it is not yet present in JMX — see "Results" below for the 6-vs-5 explanation.
Measurement setup:
- Fixed heap (-Xms1g -Xmx1g), KAFKA_JVM_PERFORMANCE_OPTS unchanged.
- ~30 s settle window before taking the histogram to let broker internal heartbeats fire.
- jcmd GC.run (Full GC, blocking) followed immediately by jcmd GC.class_histogram; the full GC ensures only live objects are counted.
- MBean inventory dumped with MBeanDumper.java using the Attach API (script in session dir); a minimal stand-in sketch appears at the end of this section.

Startup-time measurement was attempted (5 interleaved runs per side) but confidence intervals overlap at N=5 on this 4-vCPU rig. This PR therefore does not claim a startup-time improvement; see "Results" for details.
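The MBeanDumper.java mentioned above is not reproduced here. As a rough stand-in, the same inventory can be pulled over remote JMX; the service URL and port below are assumptions for illustration.

```java
import java.util.Set;

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ErrorsPerSecMBeanList {
    public static void main(String[] args) throws Exception {
        // Assumes the broker was started with remote JMX enabled on port 9999.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection server = connector.getMBeanServerConnection();
            // Only (request, error) tuples that have actually fired show up here post-fix.
            ObjectName pattern = new ObjectName("kafka.network:type=RequestMetrics,name=ErrorsPerSec,*");
            Set<ObjectName> names = server.queryNames(pattern, null);
            System.out.println("ErrorsPerSec MBeans registered: " + names.size());
            names.forEach(System.out::println);
        } finally {
            connector.close();
        }
    }
}
```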
Results
Primary: RequestMetrics$ErrorMeter instance count on idle broker

| Metric | Baseline (trunk) | Feature (94b6886) |
| --- | --- | --- |
| RequestMetrics$ErrorMeter instances | 17,010 | 6 |

The 6 remaining Java objects correspond to ErrorMeters that were touched by broker self-heartbeat traffic during the 30 s settle window. Of these 6, 5 have materialized their inner Yammer Meter (visible in JMX as error=NONE MBeans — see the next table); the sixth has an ErrorMeter object but the inner Meter is still null pending its first mark call. This is expected and correct: both lazy layers (the outer computeIfAbsent for ErrorMeter and the inner double-checked getOrCreateMeter() for the Yammer Meter) compose cleanly. The 6-vs-5 gap is the window between the two lazy events.
Exact byte delta: 408,240 − 144 = 408,096 B = 398.53 KiB saved per broker RequestMetrics registry. That is the full per-broker savings: 17,010 is already the product of the ~126 request-metric groups × 135 Errors values, so the 408 KB figure is not per-group but per-broker process.
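Tying the counts to the byte figure (using the 24 B ErrorMeter shell size quoted under "Why It Works"): 17,010 × 24 B = 408,240 B on trunk and 6 × 24 B = 144 B after the change, so the delta is 408,096 B ≈ 398.5 KiB per broker; the several MB of tag-map storage noted elsewhere is on top of that.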
Fleet-scale context. The saving is per broker process, constant.
At 1,000 brokers (typical large deployment), fleet-wide steady-state
resident heap reduction is ~400 MB. Modest and durable. No runtime
cost on the hit path.
Secondary: MBean inventory (JMX ObjectName format preservation)
| | Baseline (trunk) | Feature (94b6886) |
| --- | --- | --- |
| kafka.network:type=RequestMetrics,* MBeans | 756 | 759 |
| MBeans with an error= tag | 4 | 5 |
| Tag format | request=X,error=Y | request=X,error=Y (identical) |

The slight variation in total count (756→759) and error= count (4→5) reflects timing of which self-heartbeats fired during the ~30 s settle window (DescribeCluster appears on feature but not baseline because the probe timing differed). ObjectName format is bit-identical in both directions — no rename, no re-ordered tags, no new MBean names.
Secondary: broker startup time (INCONCLUSIVE)
Raw per-run wall times (ms, cold start → first successful describeCluster via BrokerProbe): trunk exhibits a bimodal distribution on this 4-vCPU rig (three runs ~3300 ms, two runs ~4600 ms); feature (94b6886) is a tight cluster near 4600 ms.
N=5 is insufficient to resolve startup-time deltas on this rig because
trunk exhibits bimodal variance (stdev 688.63 ms vs feature's
129.34 ms). Deriving a credible startup claim would require N≥20 at
minimum. We do not claim a startup effect in either direction.
The primary metric (instance count) is unaffected by this rig noise
and is reported as the conclusive result.
Reproduction
Baseline vs feature heap histogram:
Tests:
Expected: 7 tests pass in RequestMetricsTest; 61 pass, 2 skipped, 0 fail in SocketServerTest.

Callers / Impact Scope
LIBRARY PRIMITIVE. markErrorMeter is the single public write method of RequestMetrics.errorMeters. Callers:
- core/src/main/scala/kafka/network/RequestChannel.scala:485 — the one production call, inside updateErrorMetrics, fired once per response per reported Errors bucket. Reached from RequestChannel.sendResponse and sendNoOpResponse.
- core/src/test/scala/unit/kafka/network/SocketServerTest.scala:1196 — one test.

Every Kafka broker process instantiates ~126 RequestMetrics groups at startup (one per ApiKeys.name plus the special FetchConsumer, FetchFollower, AddPartitionsToTxnVerification, and ListClientMetricsResources/ListConfigResources names). Every one of those groups pre-allocated 135 ErrorMeter objects eagerly. 17,010 is already the product of those two numbers, so the savings number (408,240 − 144 = 408,096 B = 398.53 KiB) is the full per-broker figure — not per-group. In addition, there is ~3 MB of downstream LinkedHashMap/Entry/String storage attached to the eager ErrorMeters' tags maps that is also eliminated.

Fleet-scale context: at 1,000 brokers, ~400 MB of steady-state heap reclaimed fleet-wide. Per broker the delta is small; its value is that it is constant, always-on, and requires no tuning.
End-to-end impact (latency, throughput, pause time) not measured — see
PR #1 GC re-measurement showing this rig's consumer JVM is GC-unpressured
at 1-2 GB heap. This PR explicitly does not claim infrastructure-cost
impact beyond the stated footprint reduction.
Risks And Limitations
- Hot-path lookup: ConcurrentHashMap.get on a map with ≤135 entries is ~10-30 ns, same order of magnitude as EnumMap (which uses array indexing). No measurable difference on the request-completion hot path.
- Per-RequestMetrics instance footprint: a ConcurrentHashMap is heavier than an EnumMap (rough estimate +64-96 B per instance) × 126 groups ≈ 10 KB total overhead. Offset by ≥400 KB saved.
- Tooling that pre-probes kafka.network:type=RequestMetrics,* for error=* may see fewer MBeans on freshly-started brokers (only those that have fired at least once). This is the explicit intent. JMX MBean names for MBeans that DO exist are unchanged.
- The first markErrorMeter call for each (API, Error) pair pays a CHM.computeIfAbsent cost (~100-300 ns on a small map) plus the Yammer MBean registration cost. This happens once per tuple per broker lifetime and is not on the per-request critical path for already-registered tuples. If a burst of never-before-seen error types arrives simultaneously, the MBean registration lock (Yammer internal) may briefly serialize them — no change from baseline since Yammer's newMeter is already synchronized on the registry.

Test Plan
- RequestMetricsTest (new, 7 tests): green on this branch, 6 FAIL on trunk as expected (regression guard verified).
- SocketServerTest (:core, exercises updateErrorMetrics): 61 tests, 0 failures, 2 skipped.
- :server:test --tests org.apache.kafka.network.* (adjacent network tests): green.
- Heap histogram confirms 6 resident ErrorMeter instances; ObjectName format unchanged.
- Not yet exercised here: a workload that actually fires diverse error types (e.g. fetch replication failures, producer throttling). Expected behavior: post-fix ErrorMeter count grows on demand up to the subset of (API, Error) tuples actually observed in the workload, never reaching 17,010.
Pre-PR Checklist
- Behavior contract preserved (one ErrorMeter per tuple; removeMetrics unregisters all)
- PR opened against the fork (codeflash-ai/kafka) unless upstream is explicitly requested by user