Add stopping criterion sample count#341

Open
oleksandr-pavlyk wants to merge 3 commits intoNVIDIA:mainfrom
oleksandr-pavlyk:add-stopping-criterion-sample-count

@oleksandr-pavlyk
Collaborator

This PR builds on top of #338, so it should be merged after #338 is merged.


This PR adds a sample-count stopping criterion with an integral target-samples parameter, which lets users collect a deterministic number of samples.

### "sample-count" Stopping Criterion Parameters

* `--target-samples <count>`
  * Stop after at least `<count>` samples are collected.
  * Default is 100 samples.
  * The total number of collected samples is
    `max(--min-samples, --target-samples)`.
  * Applies to the most recent `--benchmark`, or all benchmarks if specified
    before any `--benchmark` arguments.
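
The `max(--min-samples, --target-samples)` rule can be illustrated with a small standalone sketch. It does not use NVBench; `sample_count_sketch` and `collected_samples` are hypothetical names, and the driver loop only assumes that the criterion is consulted once the minimum sample count has been reached.

```cpp
#include <cstdint>

// Hypothetical stand-in for the sample-count criterion; not NVBench's class.
struct sample_count_sketch
{
  explicit sample_count_sketch(std::int64_t target)
      : target_samples(target)
  {}

  void add_measurement() { ++seen; }
  bool is_finished() const { return seen >= target_samples; }

  std::int64_t target_samples;
  std::int64_t seen = 0;
};

// Assumed driver: keep sampling until the minimum is met AND the criterion
// reports that it is finished. Returns the number of samples collected.
inline std::int64_t collected_samples(std::int64_t target_samples,
                                      std::int64_t min_samples)
{
  sample_count_sketch criterion(target_samples);
  std::int64_t total = 0;
  do
  {
    criterion.add_measurement(); // one timed sample
    ++total;
  } while (total < min_samples || !criterion.is_finished());
  return total;
}
```

Under these assumptions, `collected_samples(100, 50)` yields 100 and `collected_samples(10, 50)` yields 50, matching the larger of the two options in each case.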

Together with --warmup-runs <count>, this lets users reproduce their existing practice of running a fixed number of warm-up runs followed by a fixed number of timed runs, while still benefiting from NVBench's benchmark reporting facilities and more accurate GPU timing.

The sample-count stopping criterion is essentially the "fixed" custom stopping criterion from examples/custom_criterion.cu, but its interaction with --min-samples has changed.

  • Previously, the stopping criterion was applied only once m_total_samples > m_min_samples, so running ./build/bin/nvbench.example.cpp20.custom_criterion --min-samples 50 would collect 51 samples.

  • This PR replaces that condition with m_total_samples >= m_min_samples.

    • The resulting edge case is handled in the stdrel stopping criterion's do_is_finished() method, which ensures that m_noise_tracker.back() is safe to call.

  • The custom_criterion.cu example needs to be changed in the future to implement a different stopping criterion.
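
The off-by-one effect of the two comparisons can be reproduced with a minimal model. This is a sketch rather than NVBench's measurement loop: `min_samples_gate` and `samples_with_gate` are hypothetical names, and the criterion is assumed to be satisfied as soon as it is consulted.

```cpp
#include <cstdint>

// Hypothetical switch between the old (>) and new (>=) gating comparison.
enum class min_samples_gate
{
  strictly_greater, // previous behaviour: total > min_samples
  greater_or_equal  // behaviour after this PR: total >= min_samples
};

// Counts samples taken before the loop stops, assuming the stopping
// criterion itself is already satisfied whenever it is consulted.
inline std::int64_t samples_with_gate(std::int64_t min_samples,
                                      min_samples_gate gate)
{
  std::int64_t total = 0;
  bool done          = false;
  while (!done)
  {
    ++total; // one timed sample
    const bool gate_open = (gate == min_samples_gate::strictly_greater)
                             ? (total > min_samples)
                             : (total >= min_samples);
    const bool criterion_satisfied = true; // assumption for this sketch
    done = gate_open && criterion_satisfied;
  }
  return total;
}
```

With `min_samples` set to 50, the strict comparison stops at 51 samples and the non-strict one at 50.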

CLI option --warmup-runs implemented and documented.

The warm-up count is enforced to always be positive.
This is necessary to ensure that JIT compilation has occurred
and that use of the blocking kernel does not result in time-outs.
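
One way to enforce a positive warm-up count at parse time is sketched below. `parse_warmup_runs` is a hypothetical helper, not NVBench's option parser, and rejecting bad input with an exception is an assumption; the actual implementation may report errors differently or clamp the value instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts the text following --warmup-runs to a
// strictly positive integer, throwing std::invalid_argument otherwise.
inline std::int64_t parse_warmup_runs(const std::string &text)
{
  std::size_t consumed = 0;
  long long value      = 0;
  try
  {
    value = std::stoll(text, &consumed);
  }
  catch (const std::exception &)
  {
    // std::stoll throws on non-numeric or out-of-range input.
    throw std::invalid_argument("--warmup-runs expects an integer, got '" + text + "'");
  }
  if (consumed != text.size())
  {
    throw std::invalid_argument("--warmup-runs has trailing characters: '" + text + "'");
  }
  if (value <= 0)
  {
    throw std::invalid_argument("--warmup-runs must be positive");
  }
  return static_cast<std::int64_t>(value);
}
```

In this sketch, `parse_warmup_runs("3")` returns 3, while `"0"`, `"-2"`, and `"3x"` are all rejected.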

A test for the option parser is added.

Because warm-up runs are executed without the blocking kernel,
the blocking kernel was not JIT-compiled until actual measurements were
collected. The module loading cost incurred during the first run
shows up as an elevated CPU time noise value for the first measurement,
as noted in NVIDIA#339.

This PR adds `this->block_stream(); this->unblock_stream();` before
executing the warm-up loop, which runs with the blocking kernel disabled.

This ensures that the blocking kernel is instantiated during warm-up,
while no other kernel is launched between its launch and the stream sync,
thus avoiding deadlock.
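
The call order described above can be modelled without CUDA. The mock below only records events. `mock_warmup` and its members are hypothetical and share only the `block_stream` / `unblock_stream` names with the PR, so it shows the ordering, not the actual stream behaviour.

```cpp
#include <string>
#include <vector>

// Hypothetical CUDA-free mock: each method only appends to an event log.
struct mock_warmup
{
  std::vector<std::string> log;

  void block_stream() { log.push_back("block_stream"); }     // would launch the blocking kernel
  void unblock_stream() { log.push_back("unblock_stream"); } // would release it
  void launch_warmup_kernel() { log.push_back("warmup_kernel"); }

  void run_warmup(int warmup_runs)
  {
    // Instantiate the blocking kernel up front, with nothing queued
    // between the block and the unblock.
    block_stream();
    unblock_stream();

    // Warm-up iterations then run with the blocking kernel disabled.
    for (int i = 0; i < warmup_runs; ++i)
    {
      launch_warmup_kernel();
    }
  }
};
```

After `run_warmup(2)`, the log holds `block_stream`, `unblock_stream`, and then two `warmup_kernel` entries, so the block/unblock pair is adjacent and precedes all warm-up work.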

--stopping-criterion sample-count --target-samples 100 stops once
max(--min-samples, --target-samples) samples are collected.
@oleksandr-pavlyk oleksandr-pavlyk marked this pull request as ready for review May 1, 2026 18:09
@oleksandr-pavlyk oleksandr-pavlyk self-assigned this May 5, 2026
@github-project-automation github-project-automation Bot moved this to Todo in CCCL May 5, 2026
@jrhemstad jrhemstad moved this from Todo to In Review in CCCL May 5, 2026
@oleksandr-pavlyk oleksandr-pavlyk added the type: enhancement New feature or request. label May 5, 2026