perf(clients): dense array lookup for Errors.forCode by mashraf-222 · Pull Request #2 · codeflash-ai/kafka

mashraf-222 · 2026-05-07T13:38:49Z

perf(clients): dense array lookup for Errors.forCode

Summary: Replace HashMap<Short, Errors> with dense Errors[]
for Errors.forCode. Per-call cost −70% to −80% on 4/4 JMH
configurations with non-overlapping 95% CIs. 279 non-test callers.
End-to-end impact not claimed (Amdahl-bounded below single-box noise floor).

Replaces the HashMap<Short, Errors> lookup table inside
org.apache.kafka.common.protocol.Errors.forCode(short) with a
dense Errors[] indexed by (code − MIN_CODE). The change is
backed by 3 equivalence tests added in this PR plus the 7 existing
ErrorsTest cases (10 total, all pass), a JMH benchmark
(ErrorsForCodeBenchmark) whose four input-distribution
configurations each show a 70% to 80% per-call reduction with
non-overlapping 95% CIs, and a post-refactor profile showing the
method drops out of broker and consumer CPU samples. This PR is
scoped as LIBRARY PRIMITIVE: 279 non-test callers in the
codebase. End-to-end throughput impact was measured at N=5 on a
single-box lz4 workload and is inconclusive due to CI overlap,
as the Amdahl ceiling (~0.32% consumer / ~0.11% broker) sits well
below this hardware's single-box noise floor.

Tier and claim

LIBRARY PRIMITIVE — Errors.forCode(short): per-call cost
reduced by 70% to 80% (non-overlapping 95% CIs, JMH with 2 forks ×
10 measurement × 5 s + 5 × 5 s warmup per fork) across 4 input
distributions. 279 non-test downstream callers in the codebase.
End-to-end measured inconclusive at N=5 on lz4 1024-byte
reference workload; CIs overlap (see "End-to-end" section).

What changed

- clients/src/main/java/org/apache/kafka/common/protocol/Errors.java:425-439, 526-539:
  replaces Map<Short, Errors> CODE_TO_ERROR = new HashMap<>()
  with a dense static final Errors[] CODE_TO_ERROR_ARR plus
  MIN_CODE/MAX_CODE short fields; rewrites forCode(short) to a
  bounds check followed by a single array load, with the fallback unchanged.
- clients/src/test/java/org/apache/kafka/common/protocol/ErrorsTest.java:91-138:
  adds 3 new tests covering round-trip identity, out-of-range
  fallback, and forCode(e.code()) == e idempotence.
- jmh-benchmarks/src/main/java/org/apache/kafka/jmh/common/ErrorsForCodeBenchmark.java:
  new JMH benchmark with four @Param distributions (allZero,
  small, mixed, miss), unrolled 4× per op, Blackhole-consumed.
- codeflash-evidence/20260507-cand3-errors-forcode-delta.md,
  codeflash-evidence/20260507-cand3-errors-forcode-e2e.md:
  per-candidate evidence files committed alongside the code. An
  additional correction commit fixes an arithmetic error in the
  initial evidence: N=5 half-widths now use the correct t=2.776
  (df=4), raw data rows match the session's txt files, and the caller
  count is 279 as re-verified on this tree.

Public API: unchanged. Errors.forCode(short) signature,
return type, and observable behaviour on every valid and invalid
input are preserved.

On-disk / on-wire formats: unchanged.

JVM floor: unchanged (clients module continues to target Java 11).

Why it works

Commit `f8cba68` — `refactor: Errors.forCode dense array lookup`

Mechanism. The old forCode path does
CODE_TO_ERROR.get(Short.valueOf(code)). This executes:

- Autobox the primitive short to a Short. Short.valueOf serves
  values in −128..127 from its cache, and most Kafka codes fall in
  this range, so this step usually allocates nothing, but the cache
  lookup still costs a load.
- HashMap.hash(key): a hashCode() call on the Short key.
- HashMap.getNode(hash, key): table indexing, bucket probe, and
  key.equals(probe.key) on each probe.
- Short.equals dispatch on the key object (Short is final, so the
  call is effectively monomorphic, but the JVM still emits the
  type check).

The new path is a primitive bounds check plus one array load:

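public static Errors forCode(short code) {
    // Bounds check first, so the index below always falls inside the array.
    if (code >= MIN_CODE && code <= MAX_CODE) {
        Errors error = CODE_TO_ERROR_ARR[code - MIN_CODE];
        // A null slot marks a gap: a code inside the range with no enum constant.
        if (error != null) return error;
    }
    // Fallback, unchanged from the HashMap version.
    log.warn("Unexpected error code: {}.", code);
    return UNKNOWN_SERVER_ERROR;
}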
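
The lookup above depends on how the table is built. The following is a minimal, self-contained sketch of the same technique on a hypothetical four-constant enum (DemoError and its codes are illustrative, not the actual Errors source): a two-pass static initializer that finds the code range, places each constant at index code − MIN_CODE, and rejects duplicates.

```java
public class DenseLookupSketch {
    // Hypothetical stand-in for Errors: codes -1..3 with a deliberate gap at 2.
    enum DemoError {
        UNKNOWN((short) -1), NONE((short) 0), FIRST((short) 1), THIRD((short) 3);

        private static final short MIN_CODE;
        private static final short MAX_CODE;
        private static final DemoError[] TABLE;

        static {
            // Pass 1: find the smallest and largest code.
            short min = Short.MAX_VALUE;
            short max = Short.MIN_VALUE;
            for (DemoError e : values()) {
                if (e.code < min) min = e.code;
                if (e.code > max) max = e.code;
            }
            MIN_CODE = min;
            MAX_CODE = max;
            // Pass 2: place each constant at (code - MIN_CODE); gaps stay null.
            TABLE = new DemoError[max - min + 1];
            for (DemoError e : values()) {
                int idx = e.code - min;
                if (TABLE[idx] != null)
                    throw new ExceptionInInitializerError("Duplicate code " + e.code);
                TABLE[idx] = e;
            }
        }

        private final short code;

        DemoError(short code) {
            this.code = code;
        }

        short code() {
            return code;
        }

        static DemoError forCode(short code) {
            if (code >= MIN_CODE && code <= MAX_CODE) {
                DemoError e = TABLE[code - MIN_CODE];
                if (e != null) return e;
            }
            return UNKNOWN; // the real method also logs a warning on this path
        }
    }

    public static void main(String[] args) {
        System.out.println(DemoError.forCode((short) 3));  // THIRD
        System.out.println(DemoError.forCode((short) 2));  // UNKNOWN (gap inside the range)
        System.out.println(DemoError.forCode((short) 99)); // UNKNOWN (outside the range)
    }
}
```

The gap at code 2 exercises the null-slot branch, and code 99 exercises the bounds check; both end at the same fallback.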

The JMH numbers show allZero 11.854 ns/op → 2.941 ns/op per 4-call
unroll, i.e. ~2.96 ns/call → ~0.74 ns/call. At an assumed 3 GHz
clock, those figures correspond to roughly 9 cycles per call for the
baseline's chained loads and branches, and roughly 2–3 cycles for the
feature's bounds check plus an L1-resident array load. The cycle
estimates are derived from the measured ns/op, not from an
independent measurement.
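
The per-call and cycle figures follow from simple arithmetic on the allZero row. The sketch below reproduces it; the 3 GHz clock is an assumption carried over from the paragraph above, not a measured frequency of the benchmark host, and the class and method names are illustrative.

```java
public class CycleEstimate {
    // One JMH op in this benchmark performs 4 forCode calls.
    static double nsPerCall(double nsPerOp, int callsPerOp) {
        return nsPerOp / callsPerOp;
    }

    // Cycles = time in ns multiplied by clock rate in GHz (cycles per ns).
    static double cycles(double ns, double clockGhz) {
        return ns * clockGhz;
    }

    public static void main(String[] args) {
        double clockGhz = 3.0; // assumed clock rate
        double baseline = nsPerCall(11.854, 4);
        double feature = nsPerCall(2.941, 4);
        System.out.printf("baseline: %.2f ns/call, %.1f cycles%n", baseline, cycles(baseline, clockGhz));
        System.out.printf("feature:  %.2f ns/call, %.1f cycles%n", feature, cycles(feature, clockGhz));
    }
}
```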

Post-refactor profile (profiles/post-refactor/cand3/DELTA.md):
on a 30-second CPU sample of both broker and consumer under the
reference workload, Errors.forCode went from 5/1159 = 0.43%
consumer samples and 2/3295 = 0.06% broker samples down to 0
samples on both JVMs (default 9.9 ms sampling interval). With only
5 and 2 baseline samples, a ~75% per-call reduction predicts about
one sample or fewer per JVM, so observing 0 is consistent with the
JMH reduction but is not strong evidence on its own.

Why it's correct.

- Public API: Errors.forCode(short) signature and return type unchanged.
- Output for every valid input: the static initializer iterates
  Errors.values() and stores each enum at index code − MIN_CODE,
  so for any enum e, forCode(e.code()) returns e. This is a
  bijection on the defined range.
- Output for every invalid input: codes outside [MIN_CODE, MAX_CODE]
  short-circuit to the fallback path
  (log.warn("Unexpected error code: {}.", code) + return UNKNOWN_SERVER_ERROR),
  exactly matching the prior behaviour. Codes inside the range that
  don't map to an enum leave a null slot; forCode detects the null
  and falls through to the same fallback.
- Duplicate-code guard: the static initializer still throws
  ExceptionInInitializerError if two enum entries would land at
  the same array index, preserving the pre-existing guard.
- Log behaviour: the fallback log.warn(...) message and
  returned UNKNOWN_SERVER_ERROR sentinel are byte-identical to
  before.
- Exception identity / null behaviour / thread safety / determinism:
  unchanged. forCode is a pure read from a static final reference; no
  synchronisation, no ordering observable to callers.

Tests: clients/src/test/java/org/apache/kafka/common/protocol/ErrorsTest.java
now has 10 @Test methods (7 prior + 3 new). The three new tests
cover the contract this refactor touches:

- testForCodeReturnsMatchingForEveryDefinedCode: iterates every
  Errors.values() and asserts forCode(e.code()) == e.
- testForCodeReturnsUnknownServerErrorForOutOfRangeCodes:
  probes Short.MIN_VALUE, Short.MAX_VALUE, and just-past-edge
  values, asserting fallback identity.
- testForCodeIsIdentityInverseOfCode: asserts
  forCode(forCode(c).code()) == forCode(c).
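
Because the input domain is only 65,536 short values, the equivalence of the two lookup strategies can also be checked exhaustively. The sketch below is an illustration of that idea, not one of the tests in this PR: it compares a HashMap-based lookup with a dense-array lookup over every short value, using a hypothetical code set with a gap.

```java
import java.util.HashMap;
import java.util.Map;

public class ExhaustiveEquivalenceSketch {
    // Hypothetical code set: near-contiguous, with a gap at 3.
    static final short[] CODES = {-1, 0, 1, 2, 4, 5};
    static final String FALLBACK = "UNKNOWN";

    static String name(short code) {
        return "E" + code;
    }

    // Reference behaviour: boxed-key map lookup with fallback.
    static String viaMap(Map<Short, String> map, short code) {
        String value = map.get(code);
        return value != null ? value : FALLBACK;
    }

    // Candidate behaviour: bounds check, array load, null check, fallback.
    static String viaArray(String[] table, short min, short code) {
        int idx = code - min;
        if (idx >= 0 && idx < table.length && table[idx] != null) return table[idx];
        return FALLBACK;
    }

    // Counts the short values on which the two lookups disagree.
    static int countMismatches() {
        short min = Short.MAX_VALUE;
        short max = Short.MIN_VALUE;
        Map<Short, String> map = new HashMap<>();
        for (short c : CODES) {
            map.put(c, name(c));
            if (c < min) min = c;
            if (c > max) max = c;
        }
        String[] table = new String[max - min + 1];
        for (short c : CODES) table[c - min] = name(c);

        int mismatches = 0;
        for (int c = Short.MIN_VALUE; c <= Short.MAX_VALUE; c++) {
            short code = (short) c;
            if (!viaMap(map, code).equals(viaArray(table, min, code))) mismatches++;
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches());
    }
}
```

The loop variable is an int so that the loop terminates after Short.MAX_VALUE instead of wrapping around.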

All 10 tests pass on the refactor commit (output at
/home/ubuntu/code/kafka/clients/build/test-results/test/TEST-org.apache.kafka.common.protocol.ErrorsTest.xml).

Benchmark

JMH methodology

JMH version: 1.37 (inside kafka-jmh-benchmarks-4.4.0-SNAPSHOT shadow jar)
JVM: OpenJDK 25.0.2 (Ubuntu build 25.0.2+10-Ubuntu-124.04)
Host: 4 vCPU AWS (x86_64), Ubuntu 24.04
Config flags: -f 2 -wi 5 -w 5 -i 10 -r 5 — 2 forks × (5 × 5 s warmup + 10 × 5 s measurement)
Benchmark: org.apache.kafka.jmh.common.ErrorsForCodeBenchmark.forCode
Harness specifics:
- Four Errors.forCode(short) results per measured op (unrolled 4×), consumed via Blackhole to defeat dead-code elimination.
- Codes drawn from a 4096-entry preloaded shuffled buffer populated at @Setup (seeded Random(0xC0DEC0DEL)) so per-call cost is dominated by the lookup itself and no loop-inside-loop constant folding is possible.
- Four input distributions via @Param:
  - allZero — every call is NONE (code 0); the common happy path.
  - small — uniformly random over Errors.values().
  - mixed — 80% NONE / 10% UNKNOWN_SERVER_ERROR / 10% random other (realistic broker shape).
  - miss — codes outside the defined range; worst-case fallback.
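
As a rough illustration of the input preparation described above, the following sketch fills a 4096-entry buffer for each of the four distributions without depending on JMH. It is not the benchmark's source. It assumes every code in [−1, 133] is defined (so "small" draws from the whole range), draws each slot independently instead of shuffling, and picks an arbitrary window above MAX_CODE for "miss"; the class and method names are hypothetical.

```java
import java.util.Random;

public class CodeBufferSketch {
    static final int SIZE = 4096;
    static final long SEED = 0xC0DEC0DEL; // seed quoted in the harness description
    static final short MIN_CODE = -1;     // range quoted in this PR
    static final short MAX_CODE = 133;

    static short[] fill(String distribution, long seed) {
        Random rnd = new Random(seed);
        short[] buf = new short[SIZE];
        for (int i = 0; i < SIZE; i++) {
            switch (distribution) {
                case "allZero":
                    buf[i] = 0; // NONE on every call
                    break;
                case "small":
                    // Uniform over the whole range (assumes no gaps).
                    buf[i] = (short) (MIN_CODE + rnd.nextInt(MAX_CODE - MIN_CODE + 1));
                    break;
                case "mixed": {
                    int r = rnd.nextInt(10);
                    if (r < 8) buf[i] = 0;        // ~80% NONE
                    else if (r == 8) buf[i] = -1; // ~10% UNKNOWN_SERVER_ERROR
                    else buf[i] = (short) (1 + rnd.nextInt(MAX_CODE)); // ~10% other, 1..133
                    break;
                }
                case "miss":
                    // Codes just above the range; the window size is arbitrary.
                    buf[i] = (short) (MAX_CODE + 1 + rnd.nextInt(1000));
                    break;
                default:
                    throw new IllegalArgumentException("Unknown distribution: " + distribution);
            }
        }
        return buf;
    }

    static boolean inRange(short code) {
        return code >= MIN_CODE && code <= MAX_CODE;
    }

    public static void main(String[] args) {
        for (String d : new String[] {"allZero", "small", "mixed", "miss"}) {
            short[] buf = fill(d, SEED);
            System.out.println(d + ": first code = " + buf[0]);
        }
    }
}
```

Fixing the seed makes the buffer identical across forks and across the baseline and feature runs, so both sides see the same sequence of codes.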

JMH results

distribution	baseline ns/op (4 calls)	feature ns/op (4 calls)	Δ%	CI overlap
allZero	11.854 ± 0.299	2.941 ± 0.059	−75.19%	NO
small	9.947 ± 0.358	2.993 ± 0.057	−69.91%	NO
mixed	11.647 ± 0.162	3.005 ± 0.038	−74.20%	NO
miss	9.722 ± 0.153	1.959 ± 0.025	−79.85%	NO

Source: codeflash-evidence/20260507-cand3-errors-forcode-delta.md
(committed in this PR).

All four configurations have non-overlapping 95% CIs: for each row, the overlap predicate
not (base_mean − base_err > feat_mean + feat_err or feat_mean − feat_err > base_mean + base_err)
evaluates to false.
All scoreError/score ratios are below 5% (max = 3.60% on baseline small).
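
The interval check can be replayed directly on the table values. The sketch below encodes the overlap predicate and applies it to the four JMH rows; the class and method names are illustrative.

```java
public class CiOverlapCheck {
    // True when [baseMean ± baseErr] and [featMean ± featErr] share at least one point.
    static boolean overlaps(double baseMean, double baseErr, double featMean, double featErr) {
        boolean baseEntirelyAbove = baseMean - baseErr > featMean + featErr;
        boolean featEntirelyAbove = featMean - featErr > baseMean + baseErr;
        return !(baseEntirelyAbove || featEntirelyAbove);
    }

    public static void main(String[] args) {
        String[] names = {"allZero", "small", "mixed", "miss"};
        // Columns: baseline mean, baseline error, feature mean, feature error (ns/op).
        double[][] rows = {
            {11.854, 0.299, 2.941, 0.059},
            {9.947, 0.358, 2.993, 0.057},
            {11.647, 0.162, 3.005, 0.038},
            {9.722, 0.153, 1.959, 0.025},
        };
        for (int i = 0; i < rows.length; i++) {
            double[] r = rows[i];
            System.out.println(names[i] + " overlap: " + overlaps(r[0], r[1], r[2], r[3]));
        }
    }
}
```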

End-to-end

E2E was measured at N=5 interleaved on a single-box KRaft broker
(4 vCPU, 1 GB heap, G1GC default, broker + producer + consumer on
the same host — CPU-contended), lz4 compression, 5 M × 1024 B
records. Full protocol and raw runs are in
codeflash-evidence/20260507-cand3-errors-forcode-e2e.md (committed
in this PR) and the raw per-run txt files under the session directory.

Half-widths below are 95% CIs via t-distribution with df=4 (t=2.776)
at N=5, recomputed directly from the raw records/sec values.

metric	trunk (N=5) mean ± hw	feature (N=5) mean ± hw	Δ%	CI overlap
Producer records/sec	188,153 ± 18,254	190,046 ± 12,252	+1.01%	YES
Consumer records/sec	267,182 ± 57,828	240,174 ± 62,266	−10.11%	YES

Source for all E2E numbers above: raw per-run records/sec files cited in
codeflash-evidence/20260507-cand3-errors-forcode-e2e.md. Means and
half-widths were recomputed from the raw data with t=2.776 (df=4, N=5)
and verified in this PR's integrity check.
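
For readers who want to recompute the half-widths from the raw files, the formula is t × s / √N, where s is the sample standard deviation. The sketch below shows the calculation with t=2.776 (df=4). The five sample values are made up for illustration and are not the PR's raw records/sec data; the class and method names are hypothetical.

```java
public class HalfWidthSketch {
    static final double T_DF4 = 2.776; // two-sided 95% t-value for N=5 (df=4)
    static final double T_DF3 = 3.182; // the df=3 value used in the initial evidence

    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Sample standard deviation (divides by N - 1).
    static double sampleStdDev(double[] xs) {
        double m = mean(xs);
        double ss = 0;
        for (double x : xs) ss += (x - m) * (x - m);
        return Math.sqrt(ss / (xs.length - 1));
    }

    // 95% CI half-width: t * s / sqrt(N).
    static double halfWidth(double[] xs, double t) {
        return t * sampleStdDev(xs) / Math.sqrt(xs.length);
    }

    public static void main(String[] args) {
        double[] runs = {100, 110, 90, 105, 95}; // made-up values, not measured data
        System.out.printf("mean = %.1f, half-width = %.3f%n", mean(runs), halfWidth(runs, T_DF4));
        System.out.printf("df=3 vs df=4 inflation factor = %.3f%n", T_DF3 / T_DF4);
    }
}
```

The ratio of the two t-values is the factor by which the initial evidence over-estimated the half-widths.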

E2E measurement inconclusive at N=5; 95% CIs overlap on both
producer and consumer.

This PR is not claiming an E2E throughput improvement. By Amdahl,
the measured Errors.forCode share of CPU on this workload was
~0.43% cumulative consumer and ~0.15% broker (reported in
codeflash-evidence/20260507-cand3-errors-forcode-delta.md under the
Amdahl section); a 75% reduction there caps the theoretical E2E at
~0.32% consumer / ~0.11% broker. N=5 on this single-box setup has
half-widths of 6.5–25.9% of the mean, so the Amdahl-predicted signal is
one to two orders of magnitude below the noise floor and cannot be distinguished.
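
The Amdahl bound quoted above is the product of the method's CPU share and the per-call reduction. The sketch below reproduces it with the shares reported in this PR and an assumed 75% reduction, and also compares the result with the smallest E2E half-width. The class and method names are illustrative.

```java
public class AmdahlCeiling {
    // Largest fraction of total CPU time that can be saved.
    static double savedFraction(double cpuShare, double reduction) {
        return cpuShare * reduction;
    }

    // Throughput gain if all of the saved time turned into throughput.
    static double throughputGain(double cpuShare, double reduction) {
        return 1.0 / (1.0 - cpuShare * reduction) - 1.0;
    }

    public static void main(String[] args) {
        double reduction = 0.75;
        double consumer = savedFraction(0.0043, reduction);
        double broker = savedFraction(0.0015, reduction);
        System.out.printf("consumer ceiling: %.2f%%%n", 100 * consumer);
        System.out.printf("broker ceiling:   %.2f%%%n", 100 * broker);
        // Smallest relative half-width in the E2E table is about 6.5%.
        System.out.printf("noise / signal (best case): %.0fx%n", 0.065 / consumer);
    }
}
```

For shares this small, the saved fraction and the throughput gain differ only in the fourth decimal place, which is why the product is used directly.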

The consumer point estimate of −10.11% is negative but lies well
inside a wide noise band. No mechanism for a real consumer
regression is plausible (the refactor is strictly a same-inputs
same-outputs faster lookup), and the trunk/feature raw sequences
show the variance is driven by single-box CPU-contention jitter,
not by the refactor. The evidence file
codeflash-evidence/20260507-cand3-errors-forcode-e2e.md discusses
this in more detail under "Consumer (−10.11%, CI overlap)".

Callers and impact scope

`Errors.forCode(short)`

Verified non-test caller count: 279 Java call sites outside
/test/ and outside /jmh-benchmarks/.

rg -n "Errors\.forCode" --type java | grep -v '/test/' | grep -v '/jmh-benchmarks/' | wc -l
# → 279

The method sits on virtually every response-parsing path across
clients, broker, KRaft, and coordinators. Hot-path examples
verified by direct rg -n:

raft/src/main/java/org/apache/kafka/raft/KafkaRaftClient.java:953, 973, 1170, 1182, 1346, 1358 (KRaft vote/fetch response parsing; 14 total in this file)
clients/src/main/java/org/apache/kafka/clients/NetworkClient.java:1029 (ApiVersions response parsing)
clients/src/main/java/org/apache/kafka/clients/consumer/internals/AbstractFetch.java:208 (per-partition fetch error translation)
clients/src/main/java/org/apache/kafka/common/requests/FetchResponse.java:94, 134 (fetch response top-level + partition error codes)
clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java:1373, 1523, 1554 (offset-commit + group join error paths)
clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerHeartbeatRequestManager.java:188
core/src/main/scala/kafka/server/KafkaApis.scala:1575, 4162, 4228 (broker API dispatch)
transaction-coordinator/src/main/java/org/apache/kafka/coordinator/transaction/RPCProducerIdManager.java:172
group-coordinator/src/main/java/org/apache/kafka/coordinator/group/GroupCoordinatorService.java:846, 1601, 2026

Reproduction

Each command is copy-paste runnable on a machine with JDK 25 and
Gradle installed. Wall-time estimates assume a 4 vCPU host: the JMH
run time is fixed by the fork, warmup, and measurement settings,
while Gradle build time varies with the number of cores.

# Clone and prepare (~1 minute).
git clone https://github.com/codeflash-ai/kafka.git
cd kafka
git fetch origin codeflash/errors-forcode-dense-array-20260507-pr
BASE=94b6886b12bf050ee4075584293355923e0223ac   # trunk
HEAD=67dde51                                     # PR tip

# JMH baseline (~11 minutes: 4 configs × 2 forks × (5+10)×5 s).
git checkout $BASE
# The benchmark class is new on this PR branch. To measure baseline,
# copy ONLY the benchmark file from HEAD while keeping trunk's Errors.java:
git checkout $HEAD -- jmh-benchmarks/src/main/java/org/apache/kafka/jmh/common/ErrorsForCodeBenchmark.java
./gradlew :jmh-benchmarks:clean :jmh-benchmarks:shadowJar
java -jar jmh-benchmarks/build/libs/kafka-jmh-benchmarks-*-all.jar \
  ".*ErrorsForCodeBenchmark.*" \
  -f 2 -wi 5 -w 5 -i 10 -r 5 \
  -rf json -rff /tmp/baseline.json
git reset --hard  # restore clean trunk (the copied benchmark file is staged, so a plain checkout would keep it)

# JMH feature (~11 minutes).
git checkout $HEAD
./gradlew :jmh-benchmarks:clean :jmh-benchmarks:shadowJar
java -jar jmh-benchmarks/build/libs/kafka-jmh-benchmarks-*-all.jar \
  ".*ErrorsForCodeBenchmark.*" \
  -f 2 -wi 5 -w 5 -i 10 -r 5 \
  -rf json -rff /tmp/feature.json

# Tests (~1 minute).
./gradlew :clients:checkstyleMain :clients:checkstyleTest :clients:spotbugsMain \
          :clients:test --tests "*ErrorsTest*"

E2E reproduction (optional; results will be noise-dominated at N=5
on a single-box setup, as the evidence above documents):

# Fresh KRaft broker on the trunk jar (baseline) or the feature jar:
git checkout $BASE   # or $HEAD
./gradlew :clients:jar :core:jar
rm -rf /tmp/kraft-combined-logs
UUID=$(bin/kafka-storage.sh random-uuid)
bin/kafka-storage.sh format -t $UUID -c config/server.properties --standalone
bin/kafka-server-start.sh config/server.properties &

bin/kafka-topics.sh --create --topic t --partitions 8 --replication-factor 1 \
    --bootstrap-server localhost:9092

# 500k warm-up producer run
bin/kafka-producer-perf-test.sh --topic t --num-records 500000 --record-size 1024 \
    --throughput -1 \
    --producer-props acks=1 linger.ms=5 batch.size=65536 \
      compression.type=lz4 buffer.memory=134217728 \
      bootstrap.servers=localhost:9092

# 5M-record measurement run
bin/kafka-producer-perf-test.sh --topic t --num-records 5000000 --record-size 1024 \
    --throughput -1 \
    --producer-props acks=1 linger.ms=5 batch.size=65536 \
      compression.type=lz4 buffer.memory=134217728 \
      bootstrap.servers=localhost:9092

# Consumer read (fresh group per run)
bin/kafka-consumer-perf-test.sh --topic t --messages 5000000 \
    --bootstrap-server localhost:9092 \
    --group e2e-$(date +%s%N) --timeout 600000

Repeat N=5 times per side, interleaved A-B-A-B-A-B-A-B-A-B.

Risks and limitations

- Static initialiser work at class-load. The initialiser now
  iterates Errors.values() twice (once to compute min/max, once
  to populate the array) instead of once. This is ~270 array loads
  at class-load, run exactly once per Errors classloader, a
  negligible one-off cost.
- Memory. Adds one Errors[] of size MAX_CODE − MIN_CODE + 1
  to the Errors class's static state. At the current range
  [−1, 133] this is 135 reference slots, roughly 0.5 KB including
  the array header (assuming 4-byte compressed references, the same
  assumption behind the ~128 KB figure below). Retained for the life
  of the JVM. The prior HashMap it replaces held the same entries
  with larger per-entry overhead (HashMap.Node + autoboxed Short per
  key), so this is a small net reduction.
- Future code range. If a future Kafka release adds an error
  code outside the current [−1, 133] range, the array grows to
  accommodate it, with any gap slots left null and resolved via
  the same fallback path as out-of-range codes (same behaviour as
  today). There is no hard upper bound enforced in the code, so in
  the worst case a code like Short.MAX_VALUE would allocate a
  32 k-slot array (~128 KB). If that becomes a concern, codes beyond
  a hard cap could be served by a HashMap lookup, preserving the
  dense array for the common range.
- Thread safety. The array reference is static final,
  populated in the class initialiser, and read without
  synchronisation. This is the same pattern as the prior
  HashMap field (static final Map) and is safe for concurrent
  reads.
- Measurement caveats. Measured on OpenJDK 25.0.2 on a 4-vCPU
  AWS host. E2E was measured on the same host with broker +
  producer + consumer sharing CPU (see the End-to-end section);
  N=5 is insufficient to detect a sub-1% Amdahl-predicted E2E
  signal, and the half-widths in the E2E table reflect this. JMH
  results on non-HotSpot JVMs have not been measured.
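
The capped variant mentioned under "Future code range" is not part of this PR. As a rough sketch of what it could look like, the class below keeps a dense array for codes within a hard cap of the minimum and sends anything further out to a HashMap. The cap value, the class name, and the entries in demo() are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class CappedLookupSketch {
    static final int MAX_DENSE_SLOTS = 1024; // hypothetical hard cap on the array size

    private final int minCode;
    private final String[] dense;
    private final Map<Short, String> sparse = new HashMap<>();
    private final String fallback;

    CappedLookupSketch(Map<Short, String> entries, String fallback) {
        if (entries.isEmpty()) throw new IllegalArgumentException("no entries");
        int min = Short.MAX_VALUE;
        int max = Short.MIN_VALUE;
        for (short code : entries.keySet()) {
            min = Math.min(min, code);
            max = Math.max(max, code);
        }
        this.minCode = min;
        this.fallback = fallback;
        // The dense part never exceeds the cap, however far the largest code lies.
        this.dense = new String[Math.min(MAX_DENSE_SLOTS, max - min + 1)];
        for (Map.Entry<Short, String> e : entries.entrySet()) {
            int idx = e.getKey() - min;
            if (idx < dense.length) dense[idx] = e.getValue();
            else sparse.put(e.getKey(), e.getValue()); // rare far-out codes
        }
    }

    String forCode(short code) {
        int idx = code - minCode;
        String value = (idx >= 0 && idx < dense.length) ? dense[idx] : sparse.get(code);
        return value != null ? value : fallback;
    }

    int denseSlots() {
        return dense.length;
    }

    static CappedLookupSketch demo() {
        Map<Short, String> entries = new HashMap<>();
        entries.put((short) -1, "UNKNOWN_SERVER_ERROR");
        entries.put((short) 0, "NONE");
        entries.put(Short.MAX_VALUE, "HYPOTHETICAL_FAR"); // made-up far-out code
        return new CappedLookupSketch(entries, "UNKNOWN_SERVER_ERROR");
    }

    public static void main(String[] args) {
        CappedLookupSketch table = demo();
        System.out.println(table.denseSlots());              // 1024
        System.out.println(table.forCode(Short.MAX_VALUE));  // HYPOTHETICAL_FAR
        System.out.println(table.forCode((short) 5));        // UNKNOWN_SERVER_ERROR
    }
}
```

The common range keeps the single array load, and only codes past the cap pay for boxing and hashing.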

Test plan

./gradlew :clients:test --tests "*ErrorsTest*" — 10 pass, 0 fail (10 total: 7 prior + 3 new in this PR)
./gradlew :clients:checkstyleMain :clients:checkstyleTest — pass
./gradlew :clients:spotbugsMain — pass
JMH microbench — raw JSON in codeflash-evidence/20260507-cand3-errors-forcode-delta.md with backing JSON files retained in the session directory
E2E — raw runs in codeflash-evidence/20260507-cand3-errors-forcode-e2e.md (N=5 single-box; result inconclusive as documented)

Target repo

This PR targets the fork codeflash-ai/kafka on the trunk branch,
NOT upstream apache/kafka. Opening an upstream PR would be an
explicit separate decision by the user.

Adds three tests that define the behavioral contract of Errors.forCode:

- testForCodeReturnsMatchingForEveryDefinedCode asserts round-trip identity for every code present in Errors.values(): forCode(e.code()) == e.
- testForCodeReturnsUnknownServerErrorForOutOfRangeCodes probes codes outside the defined range (including Short.MIN_VALUE, Short.MAX_VALUE, and just-past-edge values) and asserts the fallback to UNKNOWN_SERVER_ERROR. Codes that happen to be defined are skipped so the test stays robust as new errors are added.
- testForCodeIsIdentityInverseOfCode asserts forCode composed with Errors#code is idempotent.

These are behavior-preserving tests against the existing HashMap-based implementation; they lock the contract before any refactor of the lookup path.

Replaces the HashMap<Short, Errors> CODE_TO_ERROR with a dense Errors[] indexed by (code - MIN_CODE). Error codes are a near-contiguous short range (currently -1 to 133); MIN_CODE=-1 so UNKNOWN_SERVER_ERROR lands at index 0. Any gaps in the range remain null in the array, and forCode bounds-checks before indexing, then falls back to UNKNOWN_SERVER_ERROR with the same log.warn behaviour as the prior implementation.

Why it works

- Removes the HashMap.get() + HashMap.getNode() + Short.equals() chain observed in consumer profiles (cumulative ~0.4% of consumer CPU and ~0.15% of broker CPU on an lz4 reference workload).
- Removes the per-call Short autobox that would otherwise happen if the HashMap.get(short) callsite were devirtualized under heavy polymorphic load; the array path is allocation-free.
- The array entries live together in a small contiguous region of memory (roughly 0.5 KB at current MAX_CODE=133, assuming 4-byte compressed references), so hot calls hit L1.

Why it's correct

- For every Errors enum value e, the new forCode(e.code()) returns e (indexing by (code - MIN_CODE) is a bijection on the defined range and the static initializer places each enum at its own index, retaining the original duplicate-code check).
- For codes outside [MIN_CODE, MAX_CODE] the code short-circuits to the log.warn + UNKNOWN_SERVER_ERROR fallback, preserving the prior behaviour for unknown codes.
- For codes inside the range that do not correspond to any defined enum, the array slot is null and the same fallback runs.
- ExceptionInInitializerError still fires on duplicate codes (Errors loaded under the same ClassLoader) because we now guard against a pre-existing non-null slot at the same index.

Contract verified by the equivalence tests committed in the previous commit: testForCodeReturnsMatchingForEveryDefinedCode, testForCodeReturnsUnknownServerErrorForOutOfRangeCodes, and testForCodeIsIdentityInverseOfCode.

No behaviour change beyond the lookup mechanism. Public API is unchanged.
No new allocations introduced; one Errors[] reference is retained at class-init for the life of the JVM.

Four distributions approximate real-world usage:

- allZero: every call is NONE (code 0); the common happy path.
- small: uniformly random over Errors.values(); exercises the table.
- mixed: 80/10/10 NONE / UNKNOWN_SERVER_ERROR / random; realistic.
- miss: codes outside the defined range; worst-case fallback.

The harness preloads a shuffled code buffer at @Setup and unrolls the measured loop x4 so the per-call cost dominates loop overhead. All results are consumed via Blackhole to prevent dead-code elimination.

Config: @Fork(2) @Warmup(5x5s) @Measurement(10x5s), the default "full-confidence" Kafka profile used by prior candidate benchmarks in this repo.

…(279)

The e2e evidence file committed in the previous commit used t=3.182 (df=3) when computing N=5 half-widths; the correct t-value for N=5 is 2.776 (df=4). The CI-overlap verdict is unchanged by this correction (both producer and consumer 95% CIs still overlap with either t-value), but the exact half-width numbers in the table were over-estimated by a factor of ~1.146. This commit replaces those numbers with half-widths recomputed directly from the raw records/sec txt files in the session directory.

This commit also:

- Corrects the raw data rows for the trunk and feature consumer sequences so they match the actual per-run txt files (prior values were misattributed).
- Updates the caller-count claim (274 -> 279) in both evidence files to match a fresh grep of the current tree: `rg -n "Errors\.forCode" --type java | grep -v '/test/' | grep -v '/jmh-benchmarks/' | wc -l` -> 279.

No code or test changes; only evidence-file corrections.

mashraf-222 added 6 commits May 7, 2026 12:49

evidence: Candidate 3 — Errors.forCode dense array lookup (LIB PRIM k…

c628f63

…eep)

evidence: Candidate 3 E2E measurement (N=5 interleaved, single-box)

3ea4fd8

mashraf-222 wants to merge 6 commits into trunk from codeflash/errors-forcode-dense-array-20260507-pr
