profiler development tool#317

Merged
mjreno merged 7 commits into modflowpy:develop from mjreno:profile
May 19, 2026

@mjreno mjreno commented May 18, 2026

A lightweight write-performance profiling suite for flopy4, comparing write times across flopy4 list, array ASCII, and array NetCDF formats against flopy3 baselines. Scripts cover three test MODFLOW 6 models spanning a range of grid sizes, package types, and data densities. In the near term, the tool is intended to support IO optimization work by allowing targeted, iterative profiling of the flopy4 write path.

To run locally and create output summaries (full testing requires a local checkout of modflow6-largetestmodels):

pixi run -e dev python docs/profile/run_all.py \
    --runs 5 \
    --models-root ../modflow6-largetestmodels \
    --output docs/profile/results/results.json \
    --report docs/profile/results/report.md
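
The `--runs` flag suggests each scenario is timed several times. As a rough illustration of that kind of repeated timing (not the code in docs/profile), here is a minimal sketch; `time_write` and `fake_write` are hypothetical names, and `fake_write` only stands in for a real flopy3 or flopy4 write call:

```python
import statistics
import tempfile
import time
from pathlib import Path


def time_write(write_fn, runs=5):
    """Call write_fn(workspace) `runs` times, each in a fresh temporary
    directory, and return the wall-clock duration of every call in seconds."""
    durations = []
    for _ in range(runs):
        with tempfile.TemporaryDirectory() as tmp:
            start = time.perf_counter()
            write_fn(Path(tmp))
            durations.append(time.perf_counter() - start)
    return durations


def fake_write(workspace):
    # Hypothetical stand-in for a package write: 10,000 list-style records.
    lines = (f"1 {i} {i} -100.0\n" for i in range(1, 10_001))
    (workspace / "model.wel").write_text("".join(lines))


durations = time_write(fake_write, runs=5)
summary = {
    "runs": len(durations),
    "median_s": statistics.median(durations),
    "min_s": min(durations),
}
print(summary)
```

Reporting the median alongside the minimum keeps a single slow run (cold cache, background load) from dominating the comparison.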

Current summary:

Sparse list packages (WEL, CHD — few active cells)
  flopy4 wins, consistently 3–8×. Holds across all models: frenchman-flat (~0.20s vs 0.86s), test1000 sparse scenario
  (~0.22s vs 1.93s), and test1005 (~0.73s vs 4.74s). The flopy4 list write path is simply faster for small row counts.

Dense array packages (Rcha, CONSTANT)
  flopy4 wins, roughly 8×. Both detect the uniform value and write a single CONSTANT line per period, but flopy4
  resolves that path significantly faster (~0.24s vs 2.0s for 582K cells).

Dense array packages (Rcha, INTERNAL)
  flopy4 wins, roughly 7×. For heterogeneous per-cell values both write full INTERNAL blocks; flopy4's
  numpy-backed write path is faster (~0.82s vs 5.7s).
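
The CONSTANT and INTERNAL cases above differ only in whether the array is uniform. A minimal sketch of that decision, assuming a 2D numpy array and MODFLOW 6 style array control lines; `format_array` is a hypothetical helper, not the flopy4 or flopy3 implementation:

```python
import io

import numpy as np


def format_array(values: np.ndarray) -> str:
    """Return array text for a 1D or 2D array: a single CONSTANT line when
    every value is equal, otherwise an INTERNAL block with one row per line."""
    if values.size > 0 and np.all(values == values.flat[0]):
        return f"CONSTANT {values.flat[0]:.8e}\n"
    buf = io.StringIO()
    buf.write("INTERNAL FACTOR 1.0\n")
    # savetxt formats the whole array in one call instead of looping per cell.
    np.savetxt(buf, np.atleast_2d(values), fmt="%.8e")
    return buf.getvalue()


uniform = np.full((3, 4), 0.001)
mixed = np.arange(6, dtype=float).reshape(2, 3)

print(format_array(uniform), end="")
print(format_array(mixed), end="")
```

The uniform array collapses to one line regardless of grid size, which is why the CONSTANT path should cost little more than the uniformity check itself.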

Dense list packages (Rch with hundreds of thousands of cells)
  flopy3 wins, but the gap narrowed significantly after targeted fixes. flopy4 improved from ~350s to ~22–25s for
  582K cells × 3 periods (now completes under the 30s threshold); flopy3 is still faster at ~12s (~2× gap vs the
  original ~30× gap). The correct answer for dense data is the array variant regardless.
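
The gap figures follow directly from the approximate timings quoted above, as this short check shows:

```python
# Approximate timings from the summary, 582K cells x 3 periods.
flopy3_s = 12.0
flopy4_before_s = 350.0
flopy4_after_s = (22.0, 25.0)

gap_before = flopy4_before_s / flopy3_s
gap_after = tuple(t / flopy3_s for t in flopy4_after_s)

print(f"before fixes: {gap_before:.1f}x slower than flopy3")
print(f"after fixes: {gap_after[0]:.1f}x to {gap_after[1]:.1f}x slower than flopy3")
```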

Grid-based ASCII (WELG/CHDG) for sparse data
  Comparable to list for test1000 (1.74M sparse elements, ~0.22s each). Slower than list for frenchman-flat (7.5M
  elements, 1.04s vs 0.20s) because you're writing an enormous mostly-NODATA array for a handful of active cells.
  NetCDF structured (see below) outperforms ASCII grid in both cases.
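
To make the element-count argument concrete, here is a small sketch contrasting the two layouts for the same sparse data. The grid shape, the cell values, and the helper names are made up for illustration, and the `NODATA` value is an assumed placeholder:

```python
NODATA = 3.0e30  # assumed placeholder for cells with no data

def list_records(active):
    """One text record per active cell: 1-based layer, row, column, value."""
    return [
        f"{k + 1} {i + 1} {j + 1} {q}"
        for (k, i, j), q in sorted(active.items())
    ]

def grid_values(active, shape):
    """One value per grid cell, NODATA wherever the cell is not active."""
    nlay, nrow, ncol = shape
    return [
        active.get((k, i, j), NODATA)
        for k in range(nlay)
        for i in range(nrow)
        for j in range(ncol)
    ]

shape = (2, 50, 60)  # hypothetical small grid: 6,000 cells
active = {(0, 10, 12): -250.0, (1, 4, 33): -75.0, (1, 40, 7): -120.0}

records = list_records(active)
grid = grid_values(active, shape)
print(len(records), len(grid))  # 3 6000
```

The list form scales with the number of active cells, while the grid form scales with the total cell count, which matches the frenchman-flat result where a handful of active cells sit in a 7.5M-element array.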

Array NetCDF — flopy4 only, no flopy3 equivalent
  netcdf_structured is the fastest write format across all models and scenarios tested — faster than both list ASCII
  and array ASCII.

  netcdf_mesh behaves differently: comparable to structured on small models (~0.28s on frenchman-flat) but
  1.7–1.8s on test1000, which is 13–20× slower than structured on the same data. The mesh format writes a separate
  layered geometry file whose cost scales with model size; structured does not.

Bottom line
  - netcdf_structured is the fastest write format tested and is recommended for array packages
  - For sparse packages, flopy4 list is faster than flopy3 list across the board
  - For dense data, use array packages (not list) — flopy4 wins clearly; NetCDF structured wins further
  - The one scenario where flopy3 beats flopy4 is dense list input, which is the wrong tool regardless of framework; after fixes the gap is now ~2× rather than ~30×

@mjreno mjreno marked this pull request as draft May 19, 2026 01:50
@mjreno mjreno changed the title profile sanity check profiler development tool May 19, 2026
@mjreno mjreno marked this pull request as ready for review May 19, 2026 16:51
@mjreno mjreno merged commit 5119a70 into modflowpy:develop May 19, 2026
16 checks passed