
perf(network): lazy ErrorMeter construction in RequestMetrics#3

Open
mashraf-222 wants to merge 1 commit into trunk from
codeflash/errormeter-lazy-20260508

Conversation

@mashraf-222
Collaborator

perf(network): lazy ErrorMeter construction in RequestMetrics

Summary

RequestMetrics eagerly allocated one ErrorMeter instance per Errors
enum value (135) for every one of the ~126 request-metric groups, producing
17,010 ErrorMeter objects per broker regardless of which (API, error)
tuples the broker ever reports. Making construction lazy (via
computeIfAbsent on first markErrorMeter call) drops the resident
ErrorMeter instance count on an idle broker from 17,010 → 6
(−99.96%), saves ~408 KB of ErrorMeter objects plus several MB of tag
LinkedHashMap/Entry storage, and preserves the Yammer MBean
ObjectName format exactly. No existing test regresses; a new
RequestMetricsTest locks in the lazy semantics.

Session mode: LIBRARY PRIMITIVE (memory-footprint refactor, impact
scope is "every broker process"; argued via allocator call count +
histogram delta, not via benchmark throughput).

What Changed

  • server/src/main/java/org/apache/kafka/network/metrics/RequestMetrics.java
    (+2 / −6): field type changes from EnumMap<Errors,_> to
    ConcurrentMap<Errors,_>; the eager for (Errors error : Errors.values())
    loop is deleted from the constructor; get(error) is replaced with
    computeIfAbsent(error, e -> new ErrorMeter(name, e)) in markErrorMeter.
  • server/src/test/java/org/apache/kafka/network/metrics/RequestMetricsTest.java
    (+189, new file): 7 behavioral tests.

Production LOC delta: net −4 lines. Public API: unchanged — only one public
method (markErrorMeter) exists, and its signature and documented behavior
are unchanged.
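As a minimal, self-contained sketch of the pattern (hypothetical stand-in types — the real ErrorMeter also builds a tags LinkedHashMap and lazily registers a Yammer Meter/MBean):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch only: Errors and ErrorMeter are stand-ins for the Kafka types.
public class LazyRequestMetrics {
    // Stand-in for the real 135-value Errors enum.
    enum Errors { NONE, UNKNOWN_SERVER_ERROR, CORRUPT_MESSAGE }

    static final class ErrorMeter {
        long count;
        void mark(int n) { count += n; } // real class lazily creates a Yammer Meter here
    }

    // Before the change: an EnumMap pre-populated with one ErrorMeter per
    // Errors value in the constructor. After: entries exist only on demand.
    private final ConcurrentMap<Errors, ErrorMeter> errorMeters = new ConcurrentHashMap<>();

    public void markErrorMeter(Errors error, int count) {
        errorMeters.computeIfAbsent(error, e -> new ErrorMeter()).mark(count);
    }

    public int residentMeters() { return errorMeters.size(); }

    public static void main(String[] args) {
        LazyRequestMetrics m = new LazyRequestMetrics();
        System.out.println(m.residentMeters()); // 0 — nothing allocated at construction
        m.markErrorMeter(Errors.NONE, 1);
        m.markErrorMeter(Errors.NONE, 1);
        System.out.println(m.residentMeters()); // 1 — one meter per tuple, created on demand
    }
}
```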

Why It Works

The Yammer Meter registered by each ErrorMeter was already lazily
constructed — getOrCreateMeter() is guarded by double-checked locking.
So the 17,010 eager objects held only:

  • an outer-class reference (8 B),
  • a volatile Meter meter = null field (8 B),
  • a 2-entry LinkedHashMap<String,String> (request=X, error=Y,
    ~160-200 B per map once entries are included),
  • plus RequestMetrics$ErrorMeter itself (24 B shell).

On a typical deployment, only a handful of (API, Error) tuples ever fire
(idle broker showed 4-5 error= MBeans actually registered vs 17,010
ErrorMeter Java objects). The other 17,005 objects were dead weight.
Replacing the eager loop with lazy computeIfAbsent makes the Java object
lifecycle match the MBean lifecycle: create only what is actually used.

ConcurrentHashMap.computeIfAbsent guarantees at-most-once mapping-
function execution per key across concurrent threads, preserving the
invariant that each (request, error) pair produces exactly one
ErrorMeter instance and therefore exactly one Yammer MBean
registration.
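The at-most-once guarantee can be demonstrated in isolation. The following standalone demo (not Kafka code; all names are illustrative) races 32 threads × 1,000 iterations at a single absent key and counts mapping-function executions:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Demo: ConcurrentHashMap.computeIfAbsent invokes the mapping function
// at most once per key, even under heavy cross-thread contention.
public class ComputeIfAbsentRace {
    static final AtomicInteger constructions = new AtomicInteger();

    public static int race(int threads, int iterations) throws Exception {
        ConcurrentMap<String, Object> meters = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                start.await(); // line all threads up for maximum contention
                for (int i = 0; i < iterations; i++) {
                    meters.computeIfAbsent("NONE", k -> {
                        constructions.incrementAndGet(); // counts function executions
                        return new Object();
                    });
                }
                return null;
            });
        }
        start.countDown();
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return constructions.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(race(32, 1000)); // 1 — exactly one construction for the key
    }
}
```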

Why It Is Correct

Contract preserved:

  • Same public method (markErrorMeter(Errors, int)), same signature.
  • Same MBean ObjectName format
    (kafka.network:type=RequestMetrics,name=ErrorsPerSec,request=X,error=Y)
    — tag insertion order (request before error) is preserved by the
    LinkedHashMap inside ErrorMeter (unchanged).
  • Same per-mark behavior: first call constructs the ErrorMeter + Yammer
    Meter + MBean; subsequent calls increment the existing Meter.
  • Same shutdown behavior: removeMetrics() iterates over the
    ConcurrentMap.values() (weakly consistent iteration — safe for
    shutdown, no concurrent writes possible) and calls removeMeter() on
    each, then errorMeters.clear().
  • Thread-safety unchanged on the hit path (both pre-fix EnumMap.get and
    post-fix ConcurrentHashMap.get are lock-free reads of a published
    entry). Only the cold computeIfAbsent mutator is new; its
    thread-safety is guaranteed by ConcurrentHashMap's contract.

Evidence:

  • RequestMetricsTest (new, 7 tests) locks in the lazy contract:
    empty map at construction; single entry after markErrorMeter;
    idempotent on repeated marks; distinct entries per distinct error;
    exactly-one entry under 32-thread × 1000-iteration contention;
    MBean ObjectName ends with the exact suffix
    ":type=RequestMetrics,name=ErrorsPerSec,request=FetchTestProbe,error=NONE"
    (confirming tag ordering); removeMetrics() clears the map and
    unregisters all MBeans.
  • Verified as a real regression guard: with the test checked in and
    the production change reverted (stash), 6 of 7 tests fail with
    expected: <1> but was: <135> etc. (one test — the MBean ObjectName
    format — passes against trunk because the eager loop did not change
    the format). With the fix applied, all 7 pass.
  • Adjacent test suites green: :core:test --tests kafka.network.SocketServerTest
    (61 tests, 0 failures) — this exercises updateErrorMetrics which is
    the sole production caller of markErrorMeter.
    :server:test --tests org.apache.kafka.network.* (adjacent server
    network tests) green.

Benchmark Methodology

This is a memory-footprint change, not a latency/throughput change. The
primary metric is the ErrorMeter resident instance count on an idle
broker. Measurement is deterministic at N=1 — the same workload, JVM,
and heap settings yield the same instance count up to minor internal
events. Five of the 6 surviving ErrorMeters materialized their inner
Yammer Meter during the 30 s settle window and are visible in JMX as
error=NONE MBeans (ApiVersions, BrokerHeartbeat, BrokerRegistration,
ControllerRegistration, and DescribeCluster — see
evidence/after-t1/mbean-list.txt); the sixth exists as a Java object
but its inner Meter field is still null, so it is not yet present in
JMX. See "Results" below for the 6-vs-5 explanation.

  • Broker JVM: JDK 25.0.2, G1GC default, 1 GB heap (-Xms1g -Xmx1g),
    KAFKA_JVM_PERFORMANCE_OPTS unchanged.
  • Rig: 4-vCPU AWS VM, Ubuntu 24.04.
  • Workload: idle (no producer, no consumer). 30 s settle time before the
    histogram to let broker internal heartbeats fire.
  • jcmd GC.run (Full GC, blocking) followed immediately by
    jcmd GC.class_histogram. Full GC ensures only live objects are
    counted.
  • MBean count extracted via a JMX dumper (custom
    MBeanDumper.java using the Attach API, script in session dir).

Startup-time measurement was attempted (5 interleaved runs per side)
but confidence intervals overlap at N=5 on this 4-vCPU rig. This PR
therefore does not claim a startup-time improvement; see "Results" for
details.

Results

Primary: RequestMetrics$ErrorMeter instance count on idle broker

                                       Trunk (94b6886)   Feature
RequestMetrics$ErrorMeter instances    17,010            6
Bytes in histogram                     408,240           144
Reduction                              −99.96 %

The 6 remaining Java objects correspond to ErrorMeters that were
touched by broker self-heartbeat traffic during the 30 s settle window.
Of these 6, 5 have materialized their inner Yammer Meter (visible in
JMX as error=NONE MBeans — see the next table); the sixth has an
ErrorMeter object but the inner Meter is still null pending its
first mark call. This is expected and correct: both lazy layers (the
outer computeIfAbsent for ErrorMeter and the inner double-checked
getOrCreateMeter() for the Yammer Meter) compose cleanly. The
6-vs-5 gap is the window between the two lazy events.

Exact byte delta: 408,240 − 144 = 408,096 B = 398.53 KiB saved per
broker RequestMetrics registry. That is the full per-broker savings:
17,010 is already the product of the ~126 request-metric groups × 135
Errors values, so the 408 KB figure is not per-group but per-broker
process.
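The per-broker figures above follow directly from the counts and shell size reported in the histogram; a quick sanity check of the arithmetic:

```java
// Sanity-check the savings arithmetic quoted in the tables above.
public class FootprintMath {
    public static void main(String[] args) {
        int groups = 126, errors = 135;   // request-metric groups × Errors values
        int shellBytes = 24;              // ErrorMeter shell size from the histogram
        int before = groups * errors;     // eager instances per broker
        int after = 6;                    // instances touched by heartbeats at idle
        int savedBytes = (before - after) * shellBytes;
        System.out.println(before);                             // 17010
        System.out.println(savedBytes);                         // 408096
        System.out.printf("%.2f KiB%n", savedBytes / 1024.0);   // 398.53 KiB
    }
}
```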

Fleet-scale context. The saving is per broker process, constant.
At 1,000 brokers (typical large deployment), fleet-wide steady-state
resident heap reduction is ~400 MB. Modest and durable. No runtime
cost on the hit path.

Secondary: MBean inventory (JMX ObjectName format preservation)

                                                   Trunk               Feature
Total kafka.network:type=RequestMetrics,* MBeans   756                 759
With error= tag                                    4                   5
ObjectName format                                  request=X,error=Y   request=X,error=Y (identical)

The slight variation in total count (756→759) and error= count (4→5)
reflects which self-heartbeats happened to fire during the ~30 s settle
window (DescribeCluster appears on feature but not baseline because the
probe timing differed). The ObjectName format itself is identical in
both runs — no renames, no re-ordered tags, no new MBean names.

Secondary: broker startup time (INCONCLUSIVE)

Raw per-run wall times (ms, cold start → first successful
describeCluster via BrokerProbe):

                                  Trunk (94b6886)     Feature
Run 1                             3392.4              4851.6
Run 2                             3347.1              4528.6
Run 3                             3272.2              4609.7
Run 4                             4601.2              4552.2
Run 5                             4582.8              4597.0
Mean (N=5)                        3839.14             4627.82
Stdev                             688.63              129.34
95 % half-width (t=2.776, df=4)   ±854.91 (22.27 %)   ±160.58 (3.47 %)
95 % CI                           [2984.2, 4694.1]    [4467.2, 4788.4]
CI overlap                        yes (shared interval [4467.2, 4694.1])

Trunk exhibits a bimodal distribution on this 4-vCPU rig (three runs
near 3300 ms, two near 4600 ms; stdev 688.63 ms), while feature runs
cluster tightly near 4600 ms (stdev 129.34 ms). N=5 is insufficient to
resolve startup-time deltas against that bimodal variance; a credible
startup claim would require at least N≥20. We therefore claim no
startup effect in either direction. The primary metric (instance count)
is unaffected by this rig noise and is reported as the conclusive
result.

Reproduction

Baseline vs feature heap histogram:

# Baseline (trunk):
cd /path/to/kafka && git checkout trunk && ./gradlew :server:jar -q

# Start fresh broker with 1 GB heap
rm -rf /tmp/kraft-combined-logs
UUID=$(./bin/kafka-storage.sh random-uuid)
./bin/kafka-storage.sh format -t "$UUID" -c ./config/server.properties --standalone
KAFKA_HEAP_OPTS="-Xms1g -Xmx1g" ./bin/kafka-server-start.sh ./config/server.properties &
BROKER_PID=$!
# Wait for broker readiness (any of: TCP probe on 9092, log line, 30 s sleep)
sleep 30
jcmd "$BROKER_PID" GC.run
sleep 2
jcmd "$BROKER_PID" GC.class_histogram | grep RequestMetrics\\\$ErrorMeter

# Feature: same, after `git checkout codeflash/errormeter-lazy-20260508 && ./gradlew :server:jar -q`

Tests:

./gradlew :server:test --tests org.apache.kafka.network.metrics.RequestMetricsTest
./gradlew :core:test --tests kafka.network.SocketServerTest

Expected: 7 tests pass in RequestMetricsTest; 61 pass, 2 skipped, 0 fail
in SocketServerTest.

Callers / Impact Scope

LIBRARY PRIMITIVE. markErrorMeter is the single public write method
of RequestMetrics.errorMeters. Callers:

  • core/src/main/scala/kafka/network/RequestChannel.scala:485 — the one
    production call, inside updateErrorMetrics, fired once per response
    per reported Errors bucket. Reached from
    RequestChannel.sendResponse and sendNoOpResponse.
  • core/src/test/scala/unit/kafka/network/SocketServerTest.scala:1196
    one test.

Every Kafka broker process instantiates ~126 RequestMetrics groups at
startup (one per ApiKeys.name plus the special FetchConsumer,
FetchFollower, AddPartitionsToTxnVerification, and
ListClientMetricsResources/ListConfigResources names). Each of those
groups eagerly pre-allocated 135 ErrorMeter objects. 17,010 is
already the product of those two numbers, so the savings number
(408,240 − 144 = 408,096 B = 398.53 KiB) is the full per-broker
figure — not per-group. In addition, there is ~3 MB of downstream
LinkedHashMap/Entry/String storage attached to the eager
ErrorMeters' tags maps that is also elided.

Fleet-scale context: at 1,000 brokers, ~400 MB of steady-state heap
reclaimed fleet-wide. Per broker the delta is small; its value is that
it is constant, always-on, and requires no tuning.

End-to-end impact (latency, throughput, pause time) not measured — see
PR #1 GC re-measurement showing this rig's consumer JVM is GC-unpressured
at 1-2 GB heap. This PR explicitly does not claim infrastructure-cost
impact beyond the stated footprint reduction.

Risks And Limitations

  • ConcurrentHashMap vs EnumMap read performance: CHM.get on a map of
    ≤135 entries is ~10-30 ns, same order of magnitude as EnumMap (which
    uses array indexing). No measurable difference on the request-
    completion hot path.
  • Extra memory per RequestMetrics instance: a ConcurrentHashMap
    is heavier than an EnumMap (rough estimate +64-96 B per instance);
    across ~126 groups that is ~10 KB of total overhead, offset by the
    ≥400 KB saved.
  • MBean registration now truly demand-driven. Dashboards that
    pre-probe kafka.network:type=RequestMetrics,*error=* may see fewer
    MBeans on freshly-started brokers (only those that have fired at
    least once). This is the explicit intent. JMX MBean names for MBeans
    that DO exist are unchanged.
  • The first markErrorMeter call for each (API, Error) pair pays a
    CHM.computeIfAbsent cost (~100-300 ns on a small map) plus Yammer
    MBean registration cost.
    This happens once per tuple per broker
    lifetime and is not on the per-request critical path for already-
    registered tuples. If a burst of never-before-seen error types
    arrives simultaneously, the MBean registration lock (Yammer internal)
    may briefly serialize them — no change from baseline since Yammer's
    newMeter is already synchronized on the registry.
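The "once per tuple" claim about the hit path can also be checked directly. This standalone demo (illustrative names, not Kafka code) shows that once a key is present, computeIfAbsent never re-invokes the mapping function:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Demo: for a key that is already mapped, computeIfAbsent returns the
// existing value without invoking the lambda, so warm calls pay no
// construction cost — only a map lookup.
public class HitPathDemo {
    public static int callsAfterWarmup(int iterations) {
        ConcurrentMap<String, Object> map = new ConcurrentHashMap<>();
        AtomicInteger calls = new AtomicInteger();
        for (int i = 0; i < iterations; i++) {
            map.computeIfAbsent("NONE", k -> {
                calls.incrementAndGet(); // only the very first call reaches here
                return new Object();
            });
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println(callsAfterWarmup(1000)); // 1
    }
}
```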

Test Plan

  • RequestMetricsTest (new, 7 tests): green on this branch, 6 FAIL
    on trunk as expected (regression guard verified).
  • SocketServerTest (:core, exercises updateErrorMetrics): 61
    tests, 0 failures, 2 skipped.
  • :server:test --tests org.apache.kafka.network.* (adjacent
    network tests): green.
  • Idle-broker heap histogram: 17,010 → 6 ErrorMeter instances.
  • Idle-broker JMX inspection: MBean ObjectName format unchanged.
  • Follow-up (not blocking this PR): long-running broker under
    a workload that actually fires diverse error types (e.g. fetch
    replication failures, producer throttling). Expected behavior:
    post-fix ErrorMeter count grows on demand up to the subset of
    (API, Error) tuples actually observed in the workload, never
    reaching 17,010.

Pre-PR Checklist

  • Independent verification of all numbers against raw evidence files (user performed this at session end)
  • Session mode named in PR body
  • Tests pass
  • Public API impact stated (none)
  • Correctness contract stated (MBean ObjectName format, at-most-one
    ErrorMeter per tuple, removeMetrics unregisters all)
  • Reproduction commands copy-paste runnable
  • LIBRARY PRIMITIVE: caller list (1 production + 1 test) included
  • No unrelated changes
  • Target repo: fork (codeflash-ai/kafka) unless upstream is
    explicitly requested by user
  • User approved PR title and body before creation

Reduces RequestMetrics$ErrorMeter instance count on an idle
broker from 17,010 to 6 by replacing eager pre-population in
the constructor with computeIfAbsent on first markErrorMeter
call. Backing map changes from EnumMap to ConcurrentHashMap
to preserve thread-safety on the lazy-init path.

Public API unchanged. MBean ObjectName format preserved.
New RequestMetricsTest locks in the lazy contract with 7
tests including a 32-thread contention check.

Evidence: ~400 KB per broker steady-state heap reduction at
idle; 6/7 tests fail on trunk as regression guard.
@mashraf-222
Collaborator Author

Why this is a win

The claim here is unusual for a perf PR: the measurement is the claim. We're counting objects in a heap histogram. 17,010 before, 6 after. No confidence intervals, no rig sensitivity, no inference chain from microbenchmark to production — rerun it tomorrow on any machine and you get the same number. Contrast with most allocation-reduction work, which claims a per-call byte save and hopes it surfaces against GC noise.

The per-broker saving is ~400 KB at idle. Small. What makes it worth shipping is that it's constant, always-on, and scales linearly with fleet size.

Rough production arithmetic:

Deployment size                                      Savings at steady state
100 brokers (mid-size org)                           ~40 MB heap reclaimed
1,000 brokers (large enterprise)                     ~400 MB
4,000+ brokers (LinkedIn-scale, reported publicly)   ~1.6 GB

Not dramatic. But it's free (8 production lines, zero runtime cost, public API unchanged) and every broker restart pays it back forever.

