perf(operator): lookup table for Math.pow(10, n) in MathFunctions.round by mashraf-222 · Pull Request #3 · codeflash-ai/trino

mashraf-222 · 2026-04-27T16:00:03Z

Summary

Replace the per-row Math.pow(10, decimals) call in MathFunctions.round(double, long) and
roundReal(long, long) with a 19-entry, bit-exact lookup table (POWERS_OF_TEN_DOUBLE).
An independent rerun at 30-sample JMH rigor measures +26% to +387% throughput across 10
BenchmarkRoundFunction configurations, and the 99.9% CIs are non-overlapping in all 10.
All 51 TestMathFunctions tests pass. A regression check against an unrelated scalar
benchmark in the same module (BenchmarkBigIntOperators) shows no regression.

What Changed

core/trino-main/src/main/java/io/trino/operator/scalar/MathFunctions.java — one file,
21 lines (19+ / 2−). No public signatures changed, no new imports.
- New private static final double[] POWERS_OF_TEN_DOUBLE (19 entries: 10^0 .. 10^18).
- New private static double powerOfTen(long decimals) helper: bounds-checked lookup
  with a Math.pow(10, decimals) fallback for out-of-range inputs.
- Two call sites in round(double, long) and roundReal(long, long) swap
  Math.pow(10, decimals) → powerOfTen(decimals).
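A minimal sketch of the helper as described above, assuming the table holds the literals 1e0..1e18 and the fallback is exactly the old Math.pow(10, decimals) call. The class name PowerOfTenSketch and its simplified round method are hypothetical: the round body only illustrates where the per-row factor enters and is not a copy of Trino's MathFunctions.round(double, long).

```java
// Hypothetical sketch: names follow the PR description; this is not the PR's actual diff.
final class PowerOfTenSketch
{
    // 10^0 .. 10^18 as literals; every entry is exactly representable as a double.
    private static final double[] POWERS_OF_TEN_DOUBLE = {
            1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9,
            1e10, 1e11, 1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18,
    };

    private PowerOfTenSketch() {}

    // Bounds-checked lookup, with the previous Math.pow call as the fallback.
    static double powerOfTen(long decimals)
    {
        if (decimals >= 0 && decimals < POWERS_OF_TEN_DOUBLE.length) {
            return POWERS_OF_TEN_DOUBLE[(int) decimals];
        }
        // Negative or >= 19: same computation as before the change.
        return Math.pow(10, decimals);
    }

    // Simplified, hypothetical call site showing where the factor is used per row.
    // It is NOT a copy of Trino's MathFunctions.round(double, long).
    static double round(double num, long decimals)
    {
        if (Double.isNaN(num) || Double.isInfinite(num)) {
            return num;
        }
        double factor = powerOfTen(decimals); // was: Math.pow(10, decimals)
        // Round the magnitude half-up, then restore the sign (half away from zero).
        if (num < 0) {
            return -(Math.round(-num * factor) / factor);
        }
        return Math.round(num * factor) / factor;
    }

    public static void main(String[] args)
    {
        System.out.println(powerOfTen(2));     // 100.0 (table)
        System.out.println(powerOfTen(400));   // Infinity (fallback)
        System.out.println(round(3.14159, 2)); // 3.14
    }
}
```

The long is cast to int only after the range check, so negative or very large decimals can never index out of bounds; they take the Math.pow branch instead.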

Why It Works

Math.pow(double, double) has IEEE-754 semantics and cannot be constant-folded by the
JIT unless the compiler can prove the exponent is a compile-time constant. In the
SqlFunction dispatch path used by round(x, decimals), decimals arrives as an ordinary
method argument on every call, so the JIT emits a real Math.pow call (an intrinsic
that routes through a slow path for non-special arguments). Replacing that call with a
bounds-checked load from a 19-entry array removes the per-row pow computation.

JVM-level effects observed:

The *Actual benchmarks (which go through MathFunctions.round) rise from roughly
11–44M ops/s to roughly 53–55M ops/s. On the After side, throughput is flat across
decimals 0..4, which is the signature of the bottleneck having been removed: the value
of decimals no longer matters.
The *Baseline benchmarks (which just call Math.round directly, source unchanged)
are stable within ±1.6% across branches — measurement environment is not drifting.

Why It's Correct

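Bit-exact lookup values. Each literal POWERS_OF_TEN_DOUBLE[n] equals
Math.pow(10, n) bit-for-bit (checked with Double.doubleToRawLongBits) for every n in
[0, 18]. This also follows from the Math.pow contract: when both arguments are integers
and the mathematical result is exactly representable as a double, Math.pow must return
it exactly, and every 10^n with n ≤ 18 is exactly representable.
No precision change for any decimals in the lookup range.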
Bounds-checked fallback. For decimals < 0 or decimals >= 19, control falls
through to Math.pow(10, decimals) — preserving the exact prior behavior (including
Double.POSITIVE_INFINITY / NaN / 0.0 edge cases).
Thread-safety. POWERS_OF_TEN_DOUBLE is static final and initialized at class
load with literal values. powerOfTen is a pure static method with no shared state.
Safe for concurrent invocation from multiple worker threads.
Allocation. Zero new allocations in the hot path (one array load replaces one
intrinsic call).
Tests. ./mvnw -pl core/trino-main test -Dtest='TestMathFunctions' →
51/51 pass, 0 failures. Covers integer/decimal/double/real rounding across
positive, negative, zero, large, and out-of-range decimal arguments (i.e., the
fallback path is exercised).
Style/static analysis. ./mvnw -pl core/trino-main validate → clean
(checkstyle + modernizer). No wildcard imports, braces on single-statement
conditionals, no @author.
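The bit-exactness claim can be re-checked with a standalone program along these lines. It assumes the table is the 19 literals 1e0..1e18 (a hypothetical copy, not read from MathFunctions) and compares raw bit patterns, so even a one-ulp difference would be reported.

```java
// Standalone re-check of the bit-exact claim; the table here is a hypothetical copy.
final class BitExactCheck
{
    static final double[] POWERS_OF_TEN_DOUBLE = {
            1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9,
            1e10, 1e11, 1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18,
    };

    private BitExactCheck() {}

    // Returns the first n whose literal differs bit-for-bit from Math.pow(10, n), or -1 if none.
    static int firstMismatch()
    {
        for (int n = 0; n < POWERS_OF_TEN_DOUBLE.length; n++) {
            long literalBits = Double.doubleToRawLongBits(POWERS_OF_TEN_DOUBLE[n]);
            long powBits = Double.doubleToRawLongBits(Math.pow(10, n));
            if (literalBits != powBits) {
                return n;
            }
        }
        return -1;
    }

    public static void main(String[] args)
    {
        int mismatch = firstMismatch();
        System.out.println(mismatch == -1
                ? "all " + POWERS_OF_TEN_DOUBLE.length + " entries are bit-exact"
                : "mismatch at n = " + mismatch);
    }
}
```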

Benchmark Methodology

Harness: project's own BenchmarkRoundFunction (JMH 1.37), unchanged. Inputs set per
@Setup via Math.random(); each benchmark method uses the JMH Blackhole convention
to prevent DCE.
Primary config: 3 forks × (8 warmup + 10 measurement iterations), 500 ms per iteration,
throughput mode (ops/s), 99.9% confidence intervals (JMH default). 30 measurement
samples per row (3 forks × 10 iterations).
JVM: Temurin 25.0.3 (OpenJDK 64-Bit Server VM, 25.0.3+9-LTS).
JVM args: -Xms2g -Xmx2g.
Host: shared AWS VM (Linux 6.17.0-1010-aws). No CPU pinning, no turbo-boost
control. See Risks.
Control: benchmark's own {double,float}Baseline variants — source NOT touched by
this change; they are the in-harness stability indicator.
Regression check harness: BenchmarkBigIntOperators (same module, unrelated scalar
ops) at @Fork(2) -wi 5 -i 8 -w 500ms -r 500ms.

Results

Primary — `BenchmarkRoundFunction.{double,float}Actual` (calls the change target)

Config (numberOfDecimals)	Before (ops/s)	After (ops/s)	Change
doubleActual 0	18,022,279 ± 483,750	54,830,598 ± 1,547,356	+204%
doubleActual 1	11,345,867 ± 218,445	55,088,195 ± 855,819	+386%
doubleActual 2	43,587,610 ± 759,410	54,873,971 ± 917,210	+26%
doubleActual 3	11,335,843 ± 199,692	54,899,798 ± 844,525	+384%
doubleActual 4	11,391,682 ± 210,533	55,461,154 ± 864,121	+387%
floatActual 0	17,639,185 ± 388,151	53,292,111 ± 912,942	+202%
floatActual 1	11,070,728 ± 210,508	53,125,045 ± 1,322,295	+380%
floatActual 2	42,601,056 ± 1,193,689	54,797,856 ± 1,024,909	+29%
floatActual 3	11,215,648 ± 215,000	54,488,734 ± 926,626	+386%
floatActual 4	11,131,120 ± 215,785	54,183,499 ± 870,480	+387%

All 10 rows: 99.9% CIs non-overlapping. Worst-case speedup +26%; best-case +387%.
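The Change column and the non-overlap claim are simple arithmetic on the JMH output, where each value is printed as score ± error and the error is the half-width of the 99.9% confidence interval. The sketch below (hypothetical class CiCheck, not part of the PR) recomputes both for the worst-case row.

```java
// Hypothetical helper for re-checking the Results table; not part of the PR.
final class CiCheck
{
    private CiCheck() {}

    // Percent change of the mean score, as reported in the Change column.
    static double percentChange(double before, double after)
    {
        return (after / before - 1.0) * 100.0;
    }

    // JMH prints "score ± error"; error is the half-width of the confidence interval.
    // Two intervals overlap iff each lower bound is <= the other's upper bound.
    static boolean overlaps(double score1, double error1, double score2, double error2)
    {
        return score1 - error1 <= score2 + error2 && score2 - error2 <= score1 + error1;
    }

    public static void main(String[] args)
    {
        // Worst-case row: doubleActual, numberOfDecimals = 2.
        double before = 43_587_610;
        double beforeError = 759_410;
        double after = 54_873_971;
        double afterError = 917_210;
        System.out.printf("change = %+.0f%%, overlap = %b%n",
                percentChange(before, after),
                overlaps(before, beforeError, after, afterError));
        // prints: change = +26%, overlap = false
    }
}
```

Applied to the overflowChecksNegate row of the regression check (219,961,306 ± 5,327,069 vs. 227,940,001 ± 5,262,004), the same overlaps test returns true.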

Control — `BenchmarkRoundFunction.{double,float}Baseline` (source identical on both branches)

Config	Master	With change	Delta
doubleBaseline 0	20,851,999 ± 469,610	21,052,936 ± 700,633	+0.96%
doubleBaseline 1	13,101,400 ± 334,404	13,178,739 ± 212,654	+0.59%
doubleBaseline 2	53,772,971 ± 906,966	52,998,318 ± 942,592	−1.44%
doubleBaseline 3	13,107,369 ± 317,916	13,197,799 ± 283,491	+0.69%
doubleBaseline 4	13,058,875 ± 311,891	13,266,783 ± 273,155	+1.59%
floatBaseline 0	20,732,754 ± 434,061	20,912,909 ± 535,604	+0.87%
floatBaseline 1	13,048,808 ± 227,922	12,895,425 ± 323,583	−1.18%

Noise band ≈ ±1.6%. Measurement environment stable.

Regression check — `BenchmarkBigIntOperators` (unrelated scalar, same module)

16 samples/row, 99.9% CI:

Benchmark	Master	With change	Delta	CI overlap?
baseLineAdd	193,988,819 ± 12,334,681	194,253,152 ± 9,231,953	+0.14%	Yes
baseLineDivide	13,070,107 ± 656,337	13,181,312 ± 558,824	+0.85%	Yes
baseLineMultiply	128,693,285 ± 3,132,121	129,809,987 ± 3,397,608	+0.87%	Yes
baseLineNegate	350,956,869 ± 9,629,233	347,716,681 ± 13,075,593	−0.92%	Yes
baseLineSubtract	188,562,963 ± 6,570,374	188,298,235 ± 6,216,503	−0.14%	Yes
overflowChecksAdd	112,315,423 ± 5,760,297	113,182,751 ± 4,222,779	+0.77%	Yes
overflowChecksDivide	12,987,295 ± 528,168	13,100,536 ± 629,216	+0.87%	Yes
overflowChecksMultiply	70,903,932 ± 807,701	71,498,219 ± 1,267,001	+0.84%	Yes
overflowChecksNegate	219,961,306 ± 5,327,069	227,940,001 ± 5,262,004	+3.63%	Borderline (still within error bars)

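8 of 9 deltas are within ±1%. The +3.63% overflowChecksNegate delta is the outlier, but
its CI still overlaps master's (with-change lower bound ≈ 222.7M vs. master upper bound
≈ 225.3M ops/s). No regression is attributable to this change.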

Reproduction

# One-time environment (Trino requires Temurin/Oracle JDK 25; Ubuntu OpenJDK is
# rejected by the airbase enforcer)
curl -fsSL -o /tmp/temurin25.tar.gz 'https://api.adoptium.net/v3/binary/latest/25/ga/linux/x64/jdk/hotspot/normal/eclipse'
sudo mkdir -p /opt/temurin-25 && sudo tar -xzf /tmp/temurin25.tar.gz -C /opt/temurin-25 --strip-components=1
export JAVA_HOME=/opt/temurin-25
export PATH="$JAVA_HOME/bin:$PATH"

# Parent pom (one-time)
./mvnw -N install -DskipTests

# Build classpath + test classes for baseline
git checkout master
./mvnw -pl core/trino-main test-compile -q
./mvnw -pl core/trino-main dependency:build-classpath -Dmdep.outputFile=/tmp/cp.txt -Dmdep.includeScope=test -q
CP="core/trino-main/target/test-classes:core/trino-main/target/classes:$(cat /tmp/cp.txt)"

# Baseline run (expect ~12 min wall time at this rigor)
java -cp "$CP" -Xms2g -Xmx2g org.openjdk.jmh.Main \
  "io.trino.operator.scalar.BenchmarkRoundFunction" \
  -f 3 -wi 8 -i 10 -w 500ms -r 500ms -rf json -rff /tmp/before.json

# With change
git checkout perf/mathfunctions-round-pow10-lookup     # or this PR's head
./mvnw -pl core/trino-main test-compile -q
java -cp "$CP" -Xms2g -Xmx2g org.openjdk.jmh.Main \
  "io.trino.operator.scalar.BenchmarkRoundFunction" \
  -f 3 -wi 8 -i 10 -w 500ms -r 500ms -rf json -rff /tmp/after.json

Callers / Impact Scope

MathFunctions.round(double, long) and roundReal(long, long) are the SQL
round(x, decimals) implementations for double and real types. They are called
once per row when a query uses round(col, n) with a runtime-resolved (or column-valued)
decimals argument. The speedup applies to every such row in a scan.

When decimals is a literal that the planner can bind at compile time, the SQL engine
may short-circuit to a different code path — not measured here. This PR's win is
concretely on the general-purpose round(x, decimals) evaluator; end-to-end query-level
impact on a specific workload would need its own measurement.

Risks and Limitations

Shared-VM measurement environment. Benchmarks were run on an AWS VM without CPU
pinning or turbo-boost control. The per-config magnitude of the speedup (+26% to
+387%) dwarfs the control noise (±1.6%), so the direction and tier are robust, but a
reviewer running on a different host should expect the exact ratios to shift.
decimals 2 sees a smaller gain (+26% / +29%) because master's decimals == 2 path is
already fast (about 43M ops/s versus about 11M for decimals 1, 3, and 4). A likely cause
is that HotSpot's Math.pow intrinsic has a fast path for an exponent of 2.0 (computing
x * x); this was not confirmed with a profiler. The lookup still wins.
decimals < 0 or ≥ 19 hit the fallback (same Math.pow as before). This is
documented in the helper but not separately benchmarked; the fallback preserves prior
behavior.
No end-to-end SQL query benchmark. Micro-benchmark evidence only.
No -prof perfnorm or -prof gc. Attribution is from the diff ("remove a
Math.pow call") and throughput numbers, not from instruction-level profiler output.

Test Plan

./mvnw -pl core/trino-main test -Dtest='TestMathFunctions' → 51/51 pass, 0
failures.
./mvnw -pl core/trino-main validate → checkstyle + modernizer clean.
BenchmarkRoundFunction at 30-sample rigor, control stable, non-overlapping CIs.
BenchmarkBigIntOperators unrelated-scalar regression check — no regression.
Reviewer reproducing the benchmark per the Reproduction section (not run on CI).

Disclosure

This change was drafted by a codeflash-agent autonomous optimization session and then
independently re-benchmarked before this PR was opened. The agent's reported speedups
(1.24×–5.03×) match the reviewer's reproduction (1.26×–4.87×) within 5% row by row.
The numbers in this PR are the reviewer's 30-sample figures, presented directly.

…athFunctions.round

For double/real round(num, decimals), decimals is typically in [0, 18], but Math.pow(10, decimals) must be called per row because the JIT cannot prove the argument is a compile-time constant. Precompute 10^n for n in [0, 18] as a static double[] and read it via a bounds-checked index. The lookup values are bit-exact matches of Math.pow(10, n) (verified via doubleToRawLongBits), so the behavior is unchanged. Negative or out-of-range decimals fall through to Math.pow(10, decimals).

JMH BenchmarkRoundFunction (2 forks × 5 warmup × 10 measurement iterations, 500 ms each; two independent baseline and optimized runs to rule out JIT artifacts):

Type	decimals	Baseline	Optimized	Speedup
double	0	18.8M ops/s	57.7M ops/s	3.07x
double	1	11.5M ops/s	57.8M ops/s	5.03x
double	2	46.0M ops/s	58.2M ops/s	1.26x
double	3	11.7M ops/s	57.0M ops/s	4.87x
double	4	11.8M ops/s	58.3M ops/s	4.94x
float	0	18.6M ops/s	57.2M ops/s	3.07x
float	1	11.8M ops/s	57.5M ops/s	4.87x
float	2	44.9M ops/s	57.1M ops/s	1.27x
float	3	11.8M ops/s	57.2M ops/s	4.85x
float	4	11.7M ops/s	56.8M ops/s	4.85x

99% confidence intervals do not overlap. All 51 TestMathFunctions tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

mashraf-222 wants to merge 1 commit into master from perf/mathfunctions-round-pow10-lookup
