perf(operator): lookup table for Math.pow(10, n) in MathFunctions.round#3
Open
mashraf-222 wants to merge 1 commit intomasterfrom
Open
perf(operator): lookup table for Math.pow(10, n) in MathFunctions.round#3mashraf-222 wants to merge 1 commit intomasterfrom
mashraf-222 wants to merge 1 commit intomasterfrom
Conversation
…athFunctions.round For double/real round(num, decimals), decimals is typically in [0, 18] but Math.pow(10, decimals) must be called per row because JIT cannot prove the argument is a compile-time constant. Precompute 10^n for n in [0, 18] as a static double[] and read via bounds-checked index. The lookup values are bit-exact matches of Math.pow(10, n) (verified via doubleToRawLongBits), so the behavior is unchanged. Negative or out-of-range decimals fall through to Math.pow(10, decimals). JMH BenchmarkRoundFunction (2 forks x 5 warmup x 10 measurement iterations, 500ms each, two independent baseline and optimized runs to rule out JIT artifact): decimals baseline optimized speedup double 0 18.8M ops/s 57.7M ops/s 3.07x double 1 11.5M ops/s 57.8M ops/s 5.03x double 2 46.0M ops/s 58.2M ops/s 1.26x double 3 11.7M ops/s 57.0M ops/s 4.87x double 4 11.8M ops/s 58.3M ops/s 4.94x float 0 18.6M ops/s 57.2M ops/s 3.07x float 1 11.8M ops/s 57.5M ops/s 4.87x float 2 44.9M ops/s 57.1M ops/s 1.27x float 3 11.8M ops/s 57.2M ops/s 4.85x float 4 11.7M ops/s 56.8M ops/s 4.85x 99% confidence intervals do not overlap. All 51 TestMathFunctions tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Replace the per-row
Math.pow(10, decimals)call inMathFunctions.round(double, long)androundReal(long, long)with a 19-entry bit-exact lookup table (POWERS_OF_TEN_DOUBLE).Independent rerun measures +26% to +387% throughput across 10
BenchmarkRoundFunctionconfigurations at 30-sample JMH rigor. All 99.9% CIs non-overlapping. 51/51
TestMathFunctionspass. No regression in an unrelated-scalar regression check(
BenchmarkBigIntOperators, same module).What Changed
core/trino-main/src/main/java/io/trino/operator/scalar/MathFunctions.java— one file,21 lines (19+ / 2−). No public signatures changed, no new imports.
private static final double[] POWERS_OF_TEN_DOUBLE(19 entries:10^0 .. 10^18).private static double powerOfTen(long decimals)helper: bounds-checked lookupwith a
Math.pow(10, decimals)fallback for out-of-range inputs.round(double, long)androundReal(long, long)swapMath.pow(10, decimals)→powerOfTen(decimals).Why It Works
Math.pow(double, double)has IEEE-754 semantics and cannot be constant-folded by theJIT when the exponent is a method parameter the compiler cannot prove is a compile-time
constant. In the
SqlFunctiondispatch path used byround(x, decimals),decimalsissupplied per call-site and the JIT therefore emits a real
Math.powcall (an intrinsicthat routes through a slow path for non-special arguments). Replacing the call with a
19-entry bounds-checked array load removes that per-row cost entirely.
JVM-level effects observed:
*Actualbenchmarks (which go throughMathFunctions.round) drop from11–46M ops/s to ~54–55M ops/s — flat across decimals 0..4 on the After side, which
is the signature of the bottleneck having been fully removed (it no longer matters
which decimal is used).
*Baselinebenchmarks (which just callMath.rounddirectly, source unchanged)are stable within ±1.6% across branches — measurement environment is not drifting.
Why It's Correct
POWERS_OF_TEN_DOUBLE[n]equalsMath.pow(10, n)bit-for-bit viaDouble.doubleToRawLongBitsforn in [0, 18].No precision change for any
decimalsin the lookup range.decimals < 0ordecimals >= 19, control fallsthrough to
Math.pow(10, decimals)— preserving the exact prior behavior (includingDouble.POSITIVE_INFINITY/NaN/ 0.0 edge cases).POWERS_OF_TEN_DOUBLEisstatic finaland initialized at classload with literal values.
powerOfTenis a pure static method with no shared state.Safe for concurrent invocation from multiple worker threads.
intrinsic call).
./mvnw -pl core/trino-main test -Dtest='TestMathFunctions'→51/51 pass, 0 failures. Covers integer/decimal/double/real rounding across
positive, negative, zero, large, and out-of-range decimal arguments (i.e., the
fallback path is exercised).
./mvnw -pl core/trino-main validate→ clean(checkstyle + modernizer). No wildcard imports, braces on single-statement
conditionals, no
@author.Benchmark Methodology
BenchmarkRoundFunction(JMH 1.37), unchanged. Inputs set per@SetupviaMath.random(); each benchmark method uses the JMHBlackholeconventionto prevent DCE.
99.9% confidence intervals (JMH default). 30 samples per row.
OpenJDK 64-Bit Server VM, 25.0.3+9-LTS).-Xms2g -Xmx2g.Linux 6.17.0-1010-aws). No CPU pinning, no turbo-boostcontrol. See Risks.
{double,float}Baselinevariants — source NOT touched bythis change; they are the in-harness stability indicator.
BenchmarkBigIntOperators(same module, unrelated scalarops) at
@Fork(2) -wi 5 -i 8 -w 500ms -r 500ms.Results
Primary —
BenchmarkRoundFunction.{double,float}Actual(calls the change target)All 10 rows: 99.9% CIs non-overlapping. Worst-case speedup +26%; best-case +387%.
Control —
BenchmarkRoundFunction.{double,float}Baseline(source identical on both branches)Noise band ≈ ±1.6%. Measurement environment stable.
Regression check —
BenchmarkBigIntOperators(unrelated scalar, same module)16 samples/row, 99.9% CI:
8/9 within noise; the one +3.63% outlier is within its own error bar. No regression
attributable to this change.
Reproduction
Callers / Impact Scope
MathFunctions.round(double, long)androundReal(long, long)are the SQLround(x, decimals)implementations fordoubleandrealtypes. They are calledonce per row when a query uses
round(col, n)with a runtime-resolved (or column-valued)decimalsargument. The speedup applies to every such row in a scan.When
decimalsis a literal that the planner can bind at compile time, the SQL enginemay short-circuit to a different code path — not measured here. This PR's win is
concretely on the general-purpose
round(x, decimals)evaluator; end-to-end query-levelimpact on a specific workload would need its own measurement.
Risks and Limitations
pinning or turbo-boost control. The per-config magnitude of the speedup (+26% to
+387%) dwarfs the control noise (±1.6%), so the direction and tier are robust, but a
reviewer running on a different host should expect the exact ratios to shift.
decimals == 2is already fast (Math.pow(10, 2) == 100.0, which HotSpot sometimesrecognizes as a cheap case on some JIT heuristics). The lookup still wins.
Math.powas before). This isdocumented in the helper but not separately benchmarked; the fallback preserves prior
behavior.
-prof perfnormor-prof gc. Attribution is from the diff ("remove aMath.powcall") and throughput numbers, not from instruction-level profiler output.Test Plan
./mvnw -pl core/trino-main test -Dtest='TestMathFunctions'→ 51/51 pass, 0failures.
./mvnw -pl core/trino-main validate→ checkstyle + modernizer clean.BenchmarkRoundFunctionat 30-sample rigor, control stable, non-overlapping CIs.BenchmarkBigIntOperatorsunrelated-scalar regression check — no regression.Disclosure
This change was drafted by a codeflash-agent autonomous optimization session and then
independently re-benchmarked before this PR was opened. The agent's reported speedups
(1.24×–5.03×) match the reviewer's reproduction (1.26×–4.87×) within 5% row-by-row, so
the numbers in this PR are the reviewer's 30-sample figures presented directly —
consistent with the agent's own report.