bench: first_last remove noisy benchmarks, add update_batch #21487
alamb merged 7 commits into apache:main
cc @alamb
    )
    b.iter_custom(|iters| {
        // Every `evaluate` call mutates the accumulator, so prebuild `iters` accumulators
        let mut accumulators: Vec<Box<dyn GroupsAccumulator>> = (0..iters)
Why are we creating more than one accumulator?
I think the important benchmark is a single accumulator that merges multiple batches per iteration. This seems to merge one batch per iteration across multiple accumulators?
It's for scaling runs. A single run of evaluate / update_bench / merge_bench is extremely fast, so multiple accumulators are created so that the benchmark function can be run once on each of them. We cannot simply run 1000 iterations of evaluate/... on one accumulator, since each call mutates its state.
However, for update_bench / merge_bench we can merge multiple batches in one bench run. I'll try this.
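The prebuild pattern described above can be sketched with the standard library alone. This is a minimal illustration under assumptions: `ToyAccumulator` and `timed_evaluate` are hypothetical stand-ins, not DataFusion types. `timed_evaluate` only mirrors the shape of a closure passed to Criterion's `Bencher::iter_custom`, which receives the iteration count and returns the measured duration.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a stateful accumulator: `evaluate` drains the
/// state, so a second call on the same instance sees nothing.
struct ToyAccumulator {
    values: Vec<i64>,
}

impl ToyAccumulator {
    fn new(groups: usize) -> Self {
        Self { values: (0..groups as i64).collect() }
    }

    /// Takes the internal state out, like a mutating `evaluate` call.
    fn evaluate(&mut self) -> Vec<i64> {
        std::mem::take(&mut self.values)
    }
}

/// Same shape as an `iter_custom` closure: takes the number of iterations
/// and returns only the time spent inside the measured calls.
fn timed_evaluate(iters: u64, groups: usize) -> Duration {
    // Setup (not timed): one fresh accumulator per iteration.
    let mut accumulators: Vec<ToyAccumulator> =
        (0..iters).map(|_| ToyAccumulator::new(groups)).collect();

    // Timed region: exactly one `evaluate` per prebuilt accumulator.
    let start = Instant::now();
    for acc in accumulators.iter_mut() {
        black_box(acc.evaluate());
    }
    start.elapsed()
}
```

Calling `timed_evaluate(1_000, 64)` times 1000 `evaluate` calls while keeping the construction cost outside the measurement, which is the reason for prebuilding the accumulators.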
> Why are we creating more than one accumulator?
> I think the important benchmark is a single accumulator that merges multiple batches per iteration. This seems to merge one batch per iteration across multiple accumulators?
I've implemented these ideas, but the new update/merge benchmarks are still quite noisy, fluctuating by up to 10%. Still, the effect of UDF optimisation, like in the original PR, is clearly visible.
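For illustration, the "single accumulator, multiple batches per iteration" shape could look roughly like the following std-only sketch. `ToyFirstValue` and `timed_update` are hypothetical names and this is not the benchmark code from the PR; it only shows one accumulator absorbing several batches inside each timed run.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical per-group FIRST_VALUE state: for each group, the value with
/// the smallest ordering key seen so far, stored as (order_key, value).
struct ToyFirstValue {
    best: Vec<Option<(i64, i64)>>,
}

impl ToyFirstValue {
    fn new(groups: usize) -> Self {
        Self { best: vec![None; groups] }
    }

    /// Folds one batch of (group, order_key, value) rows into the state.
    fn update_batch(&mut self, batch: &[(usize, i64, i64)]) {
        for &(group, key, value) in batch {
            let slot = &mut self.best[group];
            let replace = match *slot {
                Some((best_key, _)) => key < best_key,
                None => true,
            };
            if replace {
                *slot = Some((key, value));
            }
        }
    }
}

/// Each iteration builds one accumulator (not timed) and then feeds it
/// every batch (timed), so a single measured run covers many batches.
fn timed_update(iters: u64, batches: &[Vec<(usize, i64, i64)>], groups: usize) -> Duration {
    let mut total = Duration::ZERO;
    for _ in 0..iters {
        let mut acc = ToyFirstValue::new(groups);
        let start = Instant::now();
        for batch in batches {
            acc.update_batch(black_box(batch));
        }
        total += start.elapsed();
        black_box(&acc.best);
    }
    total
}
```

Feeding more batches per run lengthens each measured run, which is the stated motivation for this shape; it does not by itself remove other sources of noise.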
Thanks @theirix
Which issue does this PR close?
Rationale for this change
Reliable benchmarks for GroupsAccumulator operations for FIRST_VALUE, LAST_VALUE
What changes are included in this PR?
1. As discussed in perf: optimise first_value, last_value aggregate function #21383 (comment), it's better to remove noisy fast benchmarks - done
2. Added bench for Accumulator (allows for measuring one of the improvements in the "perf" PR) - reliable measurement
3. Added initial benchmarks for update_batch, merge_batch - they test heavy paths in first_last, but their results are still unpredictable. The measured change ranges from -20% to +20%, even though statistical significance is good (p = 0.00), the running time is longer (hundreds of microseconds and up), and the variance is low (less than 10%).
I would suggest dropping bench (3), while keeping it in this PR for future reference.
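One hedged way to make "variance" concrete when reading a Criterion `time: [low estimate high]` line is to compare the width of the reported interval with its point estimate. `relative_width` below is a hypothetical helper, not part of Criterion, and the example numbers are copied from the `first_value update_bench nulls=0%, filter=false` entry in the raw output.

```rust
/// Width of a reported interval relative to its point estimate.
/// All three arguments must use the same time unit.
fn relative_width(low: f64, estimate: f64, high: f64) -> f64 {
    (high - low) / estimate
}
```

For `time: [698.62 µs 715.94 µs 734.97 µs]`, `relative_width(698.62, 715.94, 734.97)` is roughly 0.05, so the interval spans about 5% of the estimate, while the reported change against the baseline for the same entry is around +19%.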
Are these changes tested?
first_value, last_value aggregate function #21383 with the baseline.
Raw output:
Benchmarking first_value evaluate_bench nulls=0%, filter=false, first(2): Collecting 100 samples in estimated 8.1994 s (10k ite
first_value evaluate_bench nulls=0%, filter=false, first(2)
time: [7.1169 µs 7.4442 µs 7.8068 µs]
change: [−59.076% −56.116% −53.288%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) low mild
2 (2.00%) high mild
3 (3.00%) high severe
Benchmarking first_value evaluate_bench nulls=0%, filter=false, all: Collecting 100 samples in estimated 8.8861 s (10k iteratio
first_value evaluate_bench nulls=0%, filter=false, all
time: [61.417 µs 62.884 µs 64.445 µs]
change: [+16.139% +18.982% +21.955%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
Benchmarking first_value update_bench nulls=0%, filter=false: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.5s, enable flat sampling, or reduce sample count to 50.
first_value update_bench nulls=0%, filter=false
time: [698.62 µs 715.94 µs 734.97 µs]
change: [+14.291% +18.719% +23.681%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
8 (8.00%) high mild
3 (3.00%) high severe
Benchmarking first_value merge_bench nulls=0%, filter=false: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.9s, enable flat sampling, or reduce sample count to 50.
first_value merge_bench nulls=0%, filter=false
time: [790.57 µs 803.35 µs 816.87 µs]
change: [+21.098% +22.993% +24.543%] (p = 0.00 < 0.05)
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
11 (11.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe
Benchmarking first_value evaluate_bench nulls=0%, filter=true, first(2): Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.5s, enable flat sampling, or reduce sample count to 60.
Benchmarking first_value evaluate_bench nulls=0%, filter=true, first(2): Collecting 100 samples in estimated 5.5210 s (5050 ite
first_value evaluate_bench nulls=0%, filter=true, first(2)
time: [6.9505 µs 7.2537 µs 7.5774 µs]
change: [−58.159% −56.529% −54.753%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low mild
5 (5.00%) high mild
Benchmarking first_value evaluate_bench nulls=0%, filter=true, all: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.2s, enable flat sampling, or reduce sample count to 60.
Benchmarking first_value evaluate_bench nulls=0%, filter=true, all: Collecting 100 samples in estimated 6.1760 s (5050 iteratio
first_value evaluate_bench nulls=0%, filter=true, all
time: [61.186 µs 62.361 µs 63.591 µs]
change: [+22.124% +25.679% +29.909%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) high mild
4 (4.00%) high severe
Benchmarking first_value update_bench nulls=0%, filter=true: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.5s, enable flat sampling, or reduce sample count to 50.
first_value update_bench nulls=0%, filter=true
time: [1.0514 ms 1.0802 ms 1.1132 ms]
change: [+7.1670% +10.215% +13.276%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
7 (7.00%) high mild
first_value merge_bench nulls=0%, filter=true
time: [1.0923 ms 1.1098 ms 1.1277 ms]
change: [+3.0074% +5.0368% +7.0955%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
Benchmarking first_value trivial_update_bench nulls=0%, ignore_nulls=false: Collecting 100 samples in estimated 5.0097 s (2.2M
first_value trivial_update_bench nulls=0%, ignore_nulls=false
time: [622.71 ns 631.10 ns 640.70 ns]
change: [−50.114% −49.416% −48.705%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
8 (8.00%) high mild
Benchmarking first_value trivial_update_bench nulls=0%, ignore_nulls=true: Collecting 100 samples in estimated 5.0045 s (2.2M i
first_value trivial_update_bench nulls=0%, ignore_nulls=true
time: [679.38 ns 694.90 ns 712.37 ns]
change: [−43.205% −41.668% −39.912%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
Benchmarking first_value evaluate_bench nulls=90%, filter=false, first(2): Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.2s, enable flat sampling, or reduce sample count to 60.
Benchmarking first_value evaluate_bench nulls=90%, filter=false, first(2): Collecting 100 samples in estimated 5.2118 s (5050 i
first_value evaluate_bench nulls=90%, filter=false, first(2)
time: [7.4166 µs 7.8253 µs 8.2654 µs]
change: [−48.325% −45.555% −42.419%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
Benchmarking first_value evaluate_bench nulls=90%, filter=false, all: Collecting 100 samples in estimated 9.9570 s (10k iterati
first_value evaluate_bench nulls=90%, filter=false, all
time: [58.517 µs 59.706 µs 60.970 µs]
change: [−15.981% −11.746% −7.7891%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
Benchmarking first_value update_bench nulls=90%, filter=false: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.4s, enable flat sampling, or reduce sample count to 50.
first_value update_bench nulls=90%, filter=false
time: [780.07 µs 791.71 µs 804.95 µs]
change: [+5.9470% +8.0014% +10.066%] (p = 0.00 < 0.05)
Performance has regressed.
Benchmarking first_value merge_bench nulls=90%, filter=false: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.1s, enable flat sampling, or reduce sample count to 50.
first_value merge_bench nulls=90%, filter=false
time: [958.86 µs 970.42 µs 981.14 µs]
change: [+18.439% +20.316% +22.356%] (p = 0.00 < 0.05)
Performance has regressed.
Benchmarking first_value evaluate_bench nulls=90%, filter=true, first(2): Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.0s, enable flat sampling, or reduce sample count to 60.
Benchmarking first_value evaluate_bench nulls=90%, filter=true, first(2): Collecting 100 samples in estimated 6.0288 s (5050 it
first_value evaluate_bench nulls=90%, filter=true, first(2)
time: [6.8537 µs 7.1723 µs 7.5444 µs]
change: [−56.908% −54.593% −52.231%] (p = 0.00 < 0.05)
Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low mild
4 (4.00%) high mild
2 (2.00%) high severe
Benchmarking first_value evaluate_bench nulls=90%, filter=true, all: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.9s, enable flat sampling, or reduce sample count to 60.
Benchmarking first_value evaluate_bench nulls=90%, filter=true, all: Collecting 100 samples in estimated 5.8762 s (5050 iterati
first_value evaluate_bench nulls=90%, filter=true, all
time: [63.052 µs 64.334 µs 65.771 µs]
change: [−7.9370% −3.3294% +1.2769%] (p = 0.18 > 0.05)
No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low mild
3 (3.00%) high mild
2 (2.00%) high severe
Benchmarking first_value update_bench nulls=90%, filter=true: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.5s, enable flat sampling, or reduce sample count to 50.
first_value update_bench nulls=90%, filter=true
time: [973.31 µs 987.93 µs 1.0051 ms]
change: [−14.982% −13.123% −11.211%] (p = 0.00 < 0.05)
Performance has improved.
first_value merge_bench nulls=90%, filter=true
time: [1.0484 ms 1.0733 ms 1.1015 ms]
change: [−13.327% −8.5896% −4.0916%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
7 (7.00%) high mild
5 (5.00%) high severe
Benchmarking first_value trivial_update_bench nulls=90%, ignore_nulls=false: Collecting 100 samples in estimated 5.0029 s (2.2M
first_value trivial_update_bench nulls=90%, ignore_nulls=false
time: [531.48 ns 540.09 ns 549.00 ns]
change: [−53.396% −51.782% −50.199%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high mild
Benchmarking first_value trivial_update_bench nulls=90%, ignore_nulls=true: Collecting 100 samples in estimated 5.0038 s (2.2M
first_value trivial_update_bench nulls=90%, ignore_nulls=true
time: [915.47 ns 940.51 ns 964.57 ns]
change: [−42.061% −40.291% −38.529%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
5 (5.00%) high mild
Benchmarking last_value evaluate_bench nulls=0%, filter=false, first(2): Collecting 100 samples in estimated 9.3836 s (10k iter
last_value evaluate_bench nulls=0%, filter=false, first(2)
time: [7.0199 µs 7.3045 µs 7.6053 µs]
change: [−77.228% −67.635% −58.431%] (p = 0.00 < 0.05)
Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) high mild
2 (2.00%) high severe
Benchmarking last_value evaluate_bench nulls=0%, filter=false, all: Collecting 100 samples in estimated 8.4206 s (10k iteration
last_value evaluate_bench nulls=0%, filter=false, all
time: [59.921 µs 61.048 µs 62.232 µs]
change: [+4.8794% +8.3094% +11.439%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
Benchmarking last_value update_bench nulls=0%, filter=false: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.9s, enable flat sampling, or reduce sample count to 50.
last_value update_bench nulls=0%, filter=false
time: [700.36 µs 713.46 µs 726.83 µs]
change: [+9.3963% +11.898% +14.265%] (p = 0.00 < 0.05)
Performance has regressed.
Benchmarking last_value merge_bench nulls=0%, filter=false: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.9s, enable flat sampling, or reduce sample count to 50.
last_value merge_bench nulls=0%, filter=false
time: [836.97 µs 858.80 µs 884.97 µs]
change: [+23.796% +29.496% +37.261%] (p = 0.00 < 0.05)
Performance has regressed.
Found 19 outliers among 100 measurements (19.00%)
8 (8.00%) high mild
11 (11.00%) high severe
Benchmarking last_value evaluate_bench nulls=0%, filter=true, first(2): Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.3s, enable flat sampling, or reduce sample count to 60.
Benchmarking last_value evaluate_bench nulls=0%, filter=true, first(2): Collecting 100 samples in estimated 6.3338 s (5050 iter
last_value evaluate_bench nulls=0%, filter=true, first(2)
time: [7.3937 µs 7.8961 µs 8.4694 µs]
change: [−50.815% −47.152% −43.605%] (p = 0.00 < 0.05)
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
6 (6.00%) high mild
4 (4.00%) high severe
Benchmarking last_value evaluate_bench nulls=0%, filter=true, all: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.5s, enable flat sampling, or reduce sample count to 60.
Benchmarking last_value evaluate_bench nulls=0%, filter=true, all: Collecting 100 samples in estimated 6.4877 s (5050 iteration
last_value evaluate_bench nulls=0%, filter=true, all
time: [68.529 µs 72.235 µs 76.626 µs]
change: [−10.658% −4.0677% +2.9060%] (p = 0.26 > 0.05)
No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) high mild
2 (2.00%) high severe
last_value update_bench nulls=0%, filter=true
time: [1.0699 ms 1.0877 ms 1.1064 ms]
change: [−10.966% −8.9078% −6.9281%] (p = 0.00 < 0.05)
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) low mild
4 (4.00%) high mild
2 (2.00%) high severe
last_value merge_bench nulls=0%, filter=true
time: [1.2981 ms 1.3350 ms 1.3750 ms]
change: [+3.5348% +7.0744% +10.890%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
6 (6.00%) high mild
Benchmarking last_value trivial_update_bench nulls=0%, ignore_nulls=false: Collecting 100 samples in estimated 5.0073 s (2.3M i
last_value trivial_update_bench nulls=0%, ignore_nulls=false
time: [675.96 ns 691.10 ns 707.11 ns]
change: [−52.010% −51.201% −50.366%] (p = 0.00 < 0.05)
Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
1 (1.00%) low severe
11 (11.00%) low mild
4 (4.00%) high mild
1 (1.00%) high severe
Benchmarking last_value trivial_update_bench nulls=0%, ignore_nulls=true: Collecting 100 samples in estimated 5.0026 s (2.5M it
last_value trivial_update_bench nulls=0%, ignore_nulls=true
time: [722.68 ns 752.73 ns 786.26 ns]
change: [−47.665% −45.601% −43.316%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
Benchmarking last_value evaluate_bench nulls=90%, filter=false, first(2): Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.4s, enable flat sampling, or reduce sample count to 60.
Benchmarking last_value evaluate_bench nulls=90%, filter=false, first(2): Collecting 100 samples in estimated 6.4252 s (5050 it
last_value evaluate_bench nulls=90%, filter=false, first(2)
time: [7.1491 µs 7.5298 µs 7.9733 µs]
change: [−58.264% −56.205% −53.750%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
Benchmarking last_value evaluate_bench nulls=90%, filter=false, all: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.7s, enable flat sampling, or reduce sample count to 60.
Benchmarking last_value evaluate_bench nulls=90%, filter=false, all: Collecting 100 samples in estimated 6.6840 s (5050 iterati
last_value evaluate_bench nulls=90%, filter=false, all
time: [61.176 µs 63.371 µs 65.812 µs]
change: [−6.7601% −3.4004% +0.0769%] (p = 0.05 > 0.05)
No change in performance detected.
Found 22 outliers among 100 measurements (22.00%)
11 (11.00%) low mild
3 (3.00%) high mild
8 (8.00%) high severe
Benchmarking last_value update_bench nulls=90%, filter=false: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.3s, enable flat sampling, or reduce sample count to 50.
last_value update_bench nulls=90%, filter=false
time: [998.11 µs 1.0286 ms 1.0609 ms]
change: [+6.6045% +9.8044% +13.224%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
7 (7.00%) high mild
Benchmarking last_value merge_bench nulls=90%, filter=false: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.9s, enable flat sampling, or reduce sample count to 50.
last_value merge_bench nulls=90%, filter=false
time: [1.1076 ms 1.1214 ms 1.1338 ms]
change: [+5.6096% +9.2265% +12.860%] (p = 0.00 < 0.05)
Performance has regressed.
Benchmarking last_value evaluate_bench nulls=90%, filter=true, first(2): Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.7s, enable flat sampling, or reduce sample count to 60.
Benchmarking last_value evaluate_bench nulls=90%, filter=true, first(2): Collecting 100 samples in estimated 6.7049 s (5050 ite
last_value evaluate_bench nulls=90%, filter=true, first(2)
time: [7.3496 µs 7.7714 µs 8.2713 µs]
change: [−58.463% −54.783% −50.403%] (p = 0.00 < 0.05)
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
6 (6.00%) high mild
3 (3.00%) high severe
Benchmarking last_value evaluate_bench nulls=90%, filter=true, all: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.6s, enable flat sampling, or reduce sample count to 60.
Benchmarking last_value evaluate_bench nulls=90%, filter=true, all: Collecting 100 samples in estimated 6.6016 s (5050 iteratio
last_value evaluate_bench nulls=90%, filter=true, all
time: [58.656 µs 59.702 µs 60.853 µs]
change: [+0.5598% +3.4200% +6.1524%] (p = 0.01 < 0.05)
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
last_value update_bench nulls=90%, filter=true
time: [1.2175 ms 1.2303 ms 1.2426 ms]
change: [−0.2830% +1.3593% +3.0630%] (p = 0.12 > 0.05)
No change in performance detected.
Found 26 outliers among 100 measurements (26.00%)
13 (13.00%) low severe
1 (1.00%) low mild
8 (8.00%) high mild
4 (4.00%) high severe
last_value merge_bench nulls=90%, filter=true
time: [1.3002 ms 1.3176 ms 1.3346 ms]
change: [−10.464% −7.8770% −5.2989%] (p = 0.00 < 0.05)
Performance has improved.
Benchmarking last_value trivial_update_bench nulls=90%, ignore_nulls=false: Collecting 100 samples in estimated 5.0064 s (2.3M
last_value trivial_update_bench nulls=90%, ignore_nulls=false
time: [533.70 ns 539.66 ns 545.06 ns]
change: [−63.723% −62.553% −61.383%] (p = 0.00 < 0.05)
Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
4 (4.00%) low severe
5 (5.00%) low mild
7 (7.00%) high mild
3 (3.00%) high severe
Benchmarking last_value trivial_update_bench nulls=90%, ignore_nulls=true: Collecting 100 samples in estimated 5.0065 s (1.6M i
last_value trivial_update_bench nulls=90%, ignore_nulls=true
time: [1.4710 µs 1.4893 µs 1.5120 µs]
change: [−34.994% −33.485% −31.932%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
cargo bench --bench first_last -- --baseline main4 1158.42s user 40.19s system 140% cpu 14:11.29 total
Are there any user-facing changes?