Simple adaptive integration #658

Merged
krzywon merged 42 commits into master from ticket-535-adaptive-integration on May 15, 2026

@pkienzle pkienzle commented Jul 30, 2025

Ready for review

This replaces PR #608 using a simple heuristic based on qr to set the number of integration points. (See #248)
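As a rough illustration of why the point count should follow qr (this is not the heuristic implemented in the PR): the integrand oscillates roughly like cos(qr·x), so a fixed-order Gauss-Legendre rule loses accuracy once qr outgrows the order. The sketch below uses a toy integral with a known answer, ∫₀¹ cos(qr·x) dx = sin(qr)/qr. The thresholds in `choose_order` are hypothetical and chosen only for this toy problem.

```python
import math

def leggauss(n):
    """Gauss-Legendre nodes and weights on [-1, 1] via Newton iteration on P_n."""
    nodes, weights = [], []
    for i in range(1, n + 1):
        x = math.cos(math.pi * (i - 0.25) / (n + 0.5))  # standard initial guess
        for _ in range(50):
            p_prev, p = 1.0, x
            for k in range(2, n + 1):  # three-term Legendre recurrence
                p_prev, p = p, ((2 * k - 1) * x * p - (k - 1) * p_prev) / k
            dp = n * (x * p - p_prev) / (x * x - 1.0)  # derivative of P_n
            step = p / dp
            x -= step
            if abs(step) < 1e-15:
                break
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights

def integrate(qr, n):
    """n-point estimate of the toy integral of cos(qr*x) for x in [0, 1]."""
    nodes, weights = leggauss(n)
    # Map the nodes from [-1, 1] to [0, 1]; the factor 0.5 is the Jacobian.
    return 0.5 * sum(w * math.cos(qr * 0.5 * (x + 1.0))
                     for x, w in zip(nodes, weights))

def choose_order(qr):
    """Hypothetical thresholds, for illustration only."""
    if qr < 10:
        return 20
    if qr < 100:
        return 76
    return 500

if __name__ == "__main__":
    for qr in (5.0, 50.0, 400.0):
        exact = math.sin(qr) / qr
        fixed = abs(integrate(qr, 20) - exact)
        adaptive = abs(integrate(qr, choose_order(qr)) - exact)
        print(f"qr={qr:6.1f}  |err| n=20: {fixed:.2e}  "
              f"|err| n={choose_order(qr)}: {adaptive:.2e}")
```

Real form factors are not plain cosines, so the order at which a given rule stops being adequate has to be found empirically per model, which is what explore/check_adaptive.py is for.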

Accuracy is usually comparable to a 10000-point gaussian integration. The target is a 1e-10 relative difference for small shapes (20 nm) and 2e-5 for large shapes (20 μm), though it isn't always achieved. For example, the following has as much as 3.5% error over some of its range:

$ python -m sasmodels.compare background=0 core_shell_cylinder -ngauss=0,10000 -engine=single,double! -random=83174 -nq=1000 -pars -neval=10
Randomize using -random=83174
scale: 0.291548
background: 0
sld_core: 8.92736
sld_shell: 11.2834
sld_solvent: 10.7719
radius: 291.224
thickness: 33528.1
length: 1.53983
GPU[32] t=11.76 ms, intensity=36122148864
DLL[64] t=357.14 ms, intensity=36122229862
|GPU[32]-DLL[64]|            max:1.888e+03  median:8.888e-03  98%:1.430e+03  rms:3.090e+02  zero-offset:+8.237e+01
|(GPU[32]-DLL[64])/DLL[64]|  max:3.448e-02  median:5.121e-06  98%:2.561e-03  rms:1.933e-03  zero-offset:+2.920e-04

Because we include a 20-point gaussian integration scheme, the adaptive scheme is frequently faster than the fixed 76-point gaussian integration in master, at least for small shapes. For large shapes it can be several times slower than the fixed scheme, though the increase in accuracy easily justifies the cost.

I've added explore/check_adaptive.py to systematically check all models with different aspect ratios (rod, disk, cube) for both small and large shapes. Because the cost of a 10000-point gaussian with nested integration is so high, accuracy checks are limited to a few points per model at high q.

The grid size for 2D adaptive integration is now limited to 100000 points and the outer loop is limited to 500 points, so the following are possible: (500x76, 500x20, 76x500, 76x76, 76x20, 20x5000, 20x500, 20x76, 20x20)
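The list of grids follows from the two limits if the candidate orders are 20, 76, 500 and 5000 and the first number is the outer loop. Those candidate orders are inferred from the listed combinations, not read from the code, so treat this as a consistency check rather than a description of the implementation.

```python
from itertools import product

ORDERS = (20, 76, 500, 5000)  # assumed candidate orders, inferred from the list
MAX_GRID = 100_000            # limit on outer * inner points
MAX_OUTER = 500               # limit on the outer loop

def allowed_grids(orders=ORDERS, max_grid=MAX_GRID, max_outer=MAX_OUTER):
    """Return every (outer, inner) pair that satisfies both limits."""
    return [
        (outer, inner)
        for outer, inner in product(orders, repeat=2)
        if outer <= max_outer and outer * inner <= max_grid
    ]

if __name__ == "__main__":
    for outer, inner in allowed_grids():
        print(f"{outer}x{inner} = {outer * inner} points")
```

Under these assumptions the function returns nine pairs, the same nine as listed; 500x500 and 76x5000 are excluded by the grid limit and 5000xN by the outer-loop limit.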

Latex triaxial ellipsoid model (see example/simul_fit.py) is now acceptable, albeit much slower than the pure 76x76 grid. Accuracy is improved compared to 76x76. Because 20x20 grids are available speed is usually much faster, at least for CPU models. GPU may still need some tuning.

Status

@pkienzle
Contributor Author

Example of bad triaxial ellipsoid (20% error):

$ python -m sasmodels.compare background=0 triaxial_ellipsoid -ngauss=0,10000 -engine=single,single! -nq=30 -random=716856 -pars
Randomize using -random=716856
scale: 0.00343363
background: 0
sld: 11.2141
sld_solvent: 10.9297
radius_equat_minor: 41.5349
radius_equat_major: 9142.92
radius_polar: 74.0436
GPU[32] t=58.31 ms, intensity=33
DLL[32] t=12646.33 ms, intensity=33
|GPU[32]-DLL[32]|            max:1.941e-03  median:1.907e-06  98%:1.849e-03  rms:6.002e-04  zero-offset:+2.745e-04
|(GPU[32]-DLL[32])/DLL[32]|  max:1.884e-01  median:1.179e-06  98%:1.810e-01  rms:5.891e-02  zero-offset:+2.319e-02

The fixed 76 point integration scheme works better for this example (0.3% error).

Maybe it is worth exploring Lebedev and other surface quadrature schemes for these nested integrals. It is messy, though, because not all of them are of the form ∫∫ F(q) sin(θ) dφ dθ.

@butlerpd
Member

butlerpd commented Oct 21, 2025

This was briefly discussed at today's fortnightly call and tagged as of interest to the upcoming camp. The question is whether it provides a minimal change that gives a reasonable speedup. It is noted that this PR not only adds the new adaptive integration, it also changes all the model files that currently use the GaussXX methods to use the new one. Probably would have been cleaner as two separate PRs?

Also at issue is what to do with the integration speedup already proposed a few years earlier and sitting in #608.
Michael Wagner agreed to look at this.

NOTE: there are conflicts that will need to be resolved before this can be merged

@DrPaulSharp force-pushed the ticket-535-adaptive-integration branch from bffeaf0 to 615df71 on November 3, 2025 16:18

@pkienzle
Contributor Author

This works well for rotationally symmetric shapes that only use 1D integrals.

Performance is unsatisfactory on shapes such as triaxial ellipsoid that need 2D integrals.

I could revert changes for those models until we've had a chance to explore other schemes such as Lebedev or Fibonacci.
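For reference, a minimal sketch of a Fibonacci lattice on the sphere, which replaces the nested (θ, φ) product grid with a single set of n equal-weight directions. It is checked here only on a smooth test function; how it behaves on strongly oscillating form factors at large qr is exactly what would need exploring, and none of this is code from the PR.

```python
import math

def fibonacci_sphere(n):
    """Return n nearly uniformly spaced unit vectors (x, y, z)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n   # midpoints, uniform in cos(theta)
        r = math.sqrt(1.0 - z * z)
        phi = golden_angle * i          # successive points rotate by the golden angle
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points

def sphere_average(f, n):
    """Equal-weight estimate of (1/4pi) * double integral of f sin(theta) dphi dtheta."""
    return sum(f(x, y, z) for x, y, z in fibonacci_sphere(n)) / n

if __name__ == "__main__":
    # The exact orientation average of z**2 over the unit sphere is 1/3.
    for n in (100, 1000, 10000):
        estimate = sphere_average(lambda x, y, z: z * z, n)
        print(n, abs(estimate - 1.0 / 3.0))
```

Each lattice point costs one kernel evaluation, so n plays the role of the product n_θ × n_φ in the current scheme.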

@pkienzle
Contributor Author

List of shapes with 2D integrals:

  • triaxial_ellipsoid
  • elliptical_cylinder, core_shell_bicelle_elliptical[_belt_rough]
  • parallelepiped, core_shell_parallelepiped, rectangular_prism, hollow_rectangular_prism[_thin_walls]
  • [bcc|fcc|sc]_paracrystal
  • barbell, capped_cylinder
  • superball, octahedron_truncated, nanoprisms
  • pringle

For these shapes the computational cost is quadratic in the number of integration points, so it is not feasible to fit large shapes accurately.

Consider returning NaN for q values that require more than a million evaluations to get better than 3e-3 accuracy. If these q values are dropped from the residuals calculation, the fit can still proceed for the low-q points, but the high-q points will be ignored. This may end up biasing the fit toward large shapes since the estimated log likelihood will be reduced.
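A minimal sketch of the NaN-dropping idea and of the bias it can introduce. The function names and the three-point data set are made up for illustration; this is not how the fitting code in sasmodels or bumps is organized.

```python
import math

def masked_residuals(data, theory, sigma):
    """Normalized residuals, skipping points where the model returned NaN."""
    return [
        (y - f) / dy
        for y, f, dy in zip(data, theory, sigma)
        if not math.isnan(f)
    ]

def nllf(data, theory, sigma):
    """Negative log likelihood up to a constant: half the sum of squared residuals."""
    return 0.5 * sum(r * r for r in masked_residuals(data, theory, sigma))

if __name__ == "__main__":
    data = [10.0, 5.0, 1.0]
    sigma = [1.0, 1.0, 1.0]
    evaluated = [9.0, 5.0, 3.0]        # high-q point computed, and it fits poorly
    skipped = [9.0, 5.0, math.nan]     # high-q point too expensive, returned as NaN
    print("all points:   ", nllf(data, evaluated, sigma))
    print("NaN dropped:  ", nllf(data, skipped, sigma))
```

The second value is lower only because a point was left out of the sum, so parameter sets that push more q values over the evaluation budget can look better than they are. Normalizing by the number of surviving points is one possible mitigation, though it changes the statistics of the fit.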

Triaxial ellipsoid, the five rectangular prisms and the three elliptical cylinders should be reasonably accurate for dimensions below 1 μm, though they can take several seconds per evaluation. [I only tested triaxial ellipsoid, parallelepiped and elliptical cylinder; the others follow the same code patterns so they are probably good but should still be tested.]

@butlerpd
Member

Would unrolling the integral to distribute it across GPUs help the speed?

@pkienzle pkienzle mentioned this pull request Apr 13, 2026
@pkienzle
Contributor Author

> would unrolling the integral to distribute on GPU's help the speed?

Yes, but not much. With 15000 cores and 150 q points evaluated in parallel we could potentially see a 100x improvement over the current speed. For a 1 μm cube this would turn a 5 s evaluation into a 0.05 s evaluation. But cost is growing as (qr)² or worse, so a 10 μm cube would be back at 5 s again. We need better algorithms for USAXS/USANS calculations.
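The arithmetic behind that estimate, as a small sketch. It assumes cost scales as (qr)² at fixed q, which is the optimistic end of "(qr)² or worse", and takes the 5 s and 100x figures from the comment above; the function name is made up.

```python
def eval_time(size_um, speedup=1.0, base_time_s=5.0, base_size_um=1.0, exponent=2):
    """Scale the quoted 5 s evaluation of a 1 um cube to another size.

    Assumes cost grows as (size / base_size) ** exponent at fixed q.
    """
    return base_time_s * (size_um / base_size_um) ** exponent / speedup

if __name__ == "__main__":
    print(eval_time(1.0))                  # current speed, 1 um cube
    print(eval_time(1.0, speedup=100.0))   # with a 100x GPU improvement
    print(eval_time(10.0, speedup=100.0))  # 10 um cube with the same improvement
```

A tenfold increase in size costs a factor of 100 under this assumption, which cancels the 100x gain from parallelism.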

@pkienzle
Contributor Author

... except that USAXS/USANS will be at lower q, so in practice it shouldn't be a problem.

The issue is with slit resolution, which pulls from very high q values. With $q^{-4}$ falloff the large q values don't contribute much to the resolution integral, so it didn't matter that they weren't computed accurately. This PR will make the resolution calculation take too long for accuracy that it doesn't need.

A few options:

  • limit the number of gaussian points when computing q values far above the nominal q
  • replace I(q) with a power law function when computing q values far above the nominal q
  • add an additional function for each model ∫I(q)dq from q0 to infinity to use in slit resolution calculations

All of these will require icky code in the interface between resolution function and model calculations.
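To make the power-law option concrete: if I(q) is taken to be I(q0)·(q/q0)^(-p) above some q0, the tail integral has the closed form I(q0)·q0/(p-1), so no model evaluations are needed beyond q0. The sketch below checks that formula numerically for p = 4. It is a plain ∫I(q)dq as written in the list above, not the actual slit-smearing integrand, and the function names and the sample values of q0 and I(q0) are hypothetical.

```python
import math

def power_law_tail(i_q0, q0, exponent=4.0):
    """Integral of i_q0 * (q / q0) ** -exponent from q0 to infinity (exponent > 1)."""
    return i_q0 * q0 / (exponent - 1.0)

def numeric_tail(f, q0, decades=4, steps=20000):
    """Trapezoid rule in log(q) for the integral of f from q0 to q0 * 10**decades."""
    t_max = decades * math.log(10.0)
    h = t_max / steps
    total = 0.0
    for k in range(steps + 1):
        q = q0 * math.exp(k * h)
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * f(q) * q  # dq = q dt with q = q0 * exp(t)
    return total * h

if __name__ == "__main__":
    q0, i_q0 = 0.01, 250.0  # made-up values for illustration

    def porod(q):
        return i_q0 * (q / q0) ** -4

    print(power_law_tail(i_q0, q0), numeric_tail(porod, q0))
```

Choosing q0 and the exponent per model is the part that would need care, since not every model has reached its asymptotic slope at the top of the measured range.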

Given that it'll break USAXS/USANS, I don't think we should merge this PR until we figure out how to handle slit resolution.

@pkienzle
Contributor Author

Model integration is now limited to fewer than 200000 (θ, φ) grid points per q value.

python explore/check_adaptive.py systematically checks all models for speed and accuracy using rod, disk and cube aspect ratios. Target evaluation time is less than 2 s for 201 points. It also checks that the adaptive model is no more than 2x slower than the current 76-point gaussian (it is frequently faster, but sometimes much slower).

Small models target the SANS range (size < 20 nm) with relative error ε < 1e-10 over q in [0.001, 1].

Large models target the USANS range (size < 20 μm) with relative error ε < 5e-5 for q < 0.002 and ε < 0.2 for q < 0.1 (slit resolution correction).
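A sketch of how a reported point can be checked against these targets. The thresholds are copied from the two paragraphs above; treating the q boundaries as inclusive is an assumption, and the function names are not taken from explore/check_adaptive.py. The target/actual pairs are copied from the run below.

```python
def relerr(target, actual):
    """Relative error of the adaptive result against the reference integration."""
    return abs(actual - target) / abs(target)

def tolerance(q, large):
    """Accuracy target for a given q; boundary handling (<=) is an assumption."""
    if not large:
        return 1e-10                    # small models, SANS range
    return 5e-5 if q <= 0.002 else 0.2  # measured q vs. slit-resolution tail

if __name__ == "__main__":
    # stacked_disks, small rods, q = 0.1
    small = relerr(1.86796449661114e-02, 1.86796449683030e-02)
    print(small, small > tolerance(0.1, large=False))
    # barbell, big rods, q = 0.1
    big = relerr(3.69725105775608e-02, 8.97299419744450e-03)
    print(big, big > tolerance(0.1, large=True))
```

Both points exceed their targets, which is why they appear in the report; the stacked_disks case misses the 1e-10 target only narrowly.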

Performance for most models seems acceptable, though pringle could use some work.

Here are the results for the current run, showing where the models do not achieve the target performance:

== Speed and accuracy tests for all adaptive integration models ==
* target evaluation time is 2 s (running on a mac M2 chip)
*     q in [1e-5, 1] with 40 points per decade for 201 points total
*     warns if the adaptive model is 2x slower than a 76-point gaussian
* large models tested against 5000 point gaussian integration
*     q=[5e-4, 1e-3, 2e-3] with tol=1e-5 relative (measured q)
*     q=[0.01, 0.1] with tol=0.2 relative (slit resolution limits)
* small models tested against 5000 point gaussian integration
*     q in [1e-3, 1] with 1 points per decade
!!!! These tests run very slowly --- don't use as part of CI !!!!



=== small rods: a=20 b=40 c=200 ===
stacked_disks background=0 thick_core=18.0 thick_layer=1.0 radius=20.0 n_stacking=10 sigma_d=1.0 sld_core=0 sld_layer=1 sld_solvent=0
  qvalue [1.00000000000000e-01]
  target [1.86796449661114e-02]
  actual [1.86796449683030e-02]
  relerr [1.17328227786191e-10]
capped_cylinder background=0 radius=10.0 radius_cap=20.0 length=194.64101615137756 sld=1 sld_solvent=0
  qvalue [1.00000000000000e+00]
  target [1.60757256812765e-05]
  actual [1.60759828022906e-05]
  relerr [1.59943643754886e-05]
pringle background=0 radius=20.0 thickness=200 alpha=0.5 beta=0.5 sld=1 sld_solvent=0
  qvalue [1.00000000000000e+00]
  target [8.94852694171906e-05]
  actual [8.90702124575539e-05]
  relerr [4.63827133046482e-03]
triaxial_ellipsoid background=0 sld=1 sld_solvent=0 radius_equat_minor=20 radius_equat_major=40 radius_polar=200
  qvalue [1.00000000000000e+00]
  target [5.55750011411858e-05]
  actual [5.52135920592964e-05]
  relerr [6.50308725988708e-03]


=== small disks: a=180 b=200 c=40 ===
capped_cylinder background=0 radius=90.0 radius_cap=100.0 length=0 sld=1 sld_solvent=0
  qvalue [1.00000000000000e+00]
  target [2.75569192104127e-05]
  actual [2.75569192194922e-05]
  relerr [3.29481259999601e-10]
core_shell_bicelle_elliptical_belt_rough background=0 radius=81.0 x_core=1.123456790123457 thick_rim=9.0 thick_face=9.0 length=22.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0 sigma=0.9
  qvalue [1.00000000000000e+00]
  target [1.57161805207350e-04]
  actual [1.57161805250852e-04]
  relerr [2.76794683857568e-10]
core_shell_bicelle_elliptical background=0 radius=81.0 x_core=1.123456790123457 thick_rim=9.0 thick_face=9.0 length=22.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0
  qvalue [1.00000000000000e+00]
  target [2.35631493977926e-04]
  actual [2.35631494069685e-04]
  relerr [3.89416052174107e-10]
core_shell_parallelepiped background=0 sld_core=0 sld_a=1 sld_b=1 sld_c=1 sld_solvent=0 length_a=162.0 length_b=182.0 length_c=22.0 thick_rim_a=9.0 thick_rim_b=9.0 thick_rim_c=9.0
  qvalue [1.00000000000000e+00]
  target [2.42326923165659e-04]
  actual [2.42333667936019e-04]
  relerr [2.78333512098069e-05]
elliptical_cylinder background=0 radius_minor=90.0 axis_ratio=5.0 length=40 sld=1 sld_solvent=0
  qvalue [1.00000000000000e+00]
  target [6.12554809035540e-05]
  actual [6.12554809913063e-05]
  relerr [1.43256288968885e-09]
hollow_rectangular_prism_thin_walls background=0 sld=1 sld_solvent=0 length_a=180 b2a_ratio=1.1111111111111112 c2a_ratio=0.2222222222222222
  qvalue [1.00000000000000e-01 1.00000000000000e+00]
  target [2.75787141463227e-02 3.07578043101286e-04]
  actual [2.75787141752361e-02 3.07450050065018e-04]
  relerr [1.04839812898515e-09 4.16131902580191e-04]
hollow_rectangular_prism background=0 sld=1 sld_solvent=0 length_a=180 b2a_ratio=1.1111111111111112 c2a_ratio=0.2222222222222222 thickness=9.0
  qvalue [1.00000000000000e-01 1.00000000000000e+00]
  target [1.60102657069033e-01 4.21190511154568e-04]
  actual [1.60102657153591e-01 4.21207536067096e-04]
  relerr [5.28147592961804e-10 4.04209308561229e-05]
parallelepiped background=0 sld=1 sld_solvent=0 length_a=180 length_b=200 length_c=40
  qvalue [1.00000000000000e-01 1.00000000000000e+00]
  target [6.54471658075875e-01 6.73870249394712e-05]
  actual [6.54471658146500e-01 6.74084672293595e-05]
  relerr [1.07910887192754e-10 3.18196120210968e-04]
pringle background=0 radius=100.0 thickness=40 alpha=0.09999999999999998 beta=0.09999999999999998 sld=1 sld_solvent=0
  qvalue [1.00000000000000e-01 1.00000000000000e+00]
  target [2.18733343077075e-01 6.83106606401085e-05]
  actual [2.18755844943364e-01 6.84997042754681e-05]
  relerr [1.02873507866670e-04 2.76741043913438e-03]
rectangular_prism background=0 sld=1 sld_solvent=0 length_a=180 b2a_ratio=1.1111111111111112 c2a_ratio=0.2222222222222222
  qvalue [1.00000000000000e-01 1.00000000000000e+00]
  target [6.54471658079301e-01 6.73870249688441e-05]
  actual [6.54471658146502e-01 6.74084672274739e-05]
  relerr [1.02679635813960e-10 3.18195656206445e-04]


=== small cubes: a=200 b=200 c=200 ===
core_shell_bicelle_elliptical background=0 radius=90.0 x_core=1.0 thick_rim=10.0 thick_face=10.0 length=180.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0
  qvalue [1.00000000000000e+00]
  target [1.02492851673532e-04]
  actual [1.02492851684422e-04]
  relerr [1.06252347489489e-10]
core_shell_parallelepiped background=0 sld_core=0 sld_a=1 sld_b=1 sld_c=1 sld_solvent=0 length_a=180.0 length_b=180.0 length_c=180.0 thick_rim_a=10.0 thick_rim_b=10.0 thick_rim_c=10.0
  qvalue [1.00000000000000e+00]
  target [6.98345673054726e-05]
  actual [7.00984945279586e-05]
  relerr [3.77932065264299e-03]
hollow_rectangular_prism_thin_walls background=0 sld=1 sld_solvent=0 length_a=200 b2a_ratio=1.0 c2a_ratio=1.0
  qvalue [1.00000000000000e-01 1.00000000000000e+00]
  target [1.04904095454152e-01 8.56430787049287e-04]
  actual [1.04904094452217e-01 8.56144929699485e-04]
  relerr [9.55096178891463e-09 3.33777526595498e-04]
hollow_rectangular_prism background=0 sld=1 sld_solvent=0 length_a=200 b2a_ratio=1.0 c2a_ratio=1.0 thickness=10.0
  qvalue [1.00000000000000e-01 1.00000000000000e+00]
  target [9.82008425165961e-01 2.42978141719527e-04]
  actual [9.82008421794487e-01 2.42955329124303e-04]
  relerr [3.43324355830293e-09 9.38874380336769e-05]
parallelepiped background=0 sld=1 sld_solvent=0 length_a=200 length_b=200 length_c=200
  qvalue [1.00000000000000e-01 1.00000000000000e+00]
  target [8.65241732031019e-02 1.16766869284687e-05]
  actual [8.65241716425501e-02 1.16947856496101e-05]
  relerr [1.80360212514061e-08 1.54998770218593e-03]
pringle background=0 radius=100.0 thickness=200 alpha=0.001 beta=0.001 sld=1 sld_solvent=0
  qvalue [1.00000000000000e+00]
  target [2.87134261111185e-05]
  actual [2.86855009494164e-05]
  relerr [9.72547183815323e-04]
rectangular_prism background=0 sld=1 sld_solvent=0 length_a=200 b2a_ratio=1.0 c2a_ratio=1.0
  qvalue [1.00000000000000e-01 1.00000000000000e+00]
  target [8.65241732033467e-02 1.16766869304376e-05]
  actual [8.65241716425503e-02 1.16947856493809e-05]
  relerr [1.80388478401046e-08 1.54998751367380e-03]


=== big rods: a=1000 b=2000 c=200000 ===
stacked_disks background=0 thick_core=18000.0 thick_layer=1000.0 radius=1000.0 n_stacking=10 sigma_d=1000.0 sld_core=0 sld_layer=1 sld_solvent=0
  qvalue [5.00000000000000e-04 1.00000000000000e-03 2.00000000000000e-03]
  target [2.96917938719141e+04 2.60914421213193e+04 1.42434747458794e+04]
  actual [2.96507625109331e+04 2.60745206784249e+04 1.42398416282128e+04]
  relerr [1.38190912809033e-03 6.48543795156304e-04 2.55072426594440e-04]
barbell background=0 radius=500.0 radius_bell=1000.0 length=196267.94919243112 sld=1 sld_solvent=0
  qvalue [2.00000000000000e-03 1.00000000000000e-01]
  target [9.96690490148662e+04 3.69725105775608e-02]
  actual [9.84699740235045e+04 8.97299419744450e-03]
  relerr [1.20305651876225e-02 7.57306332264860e-01]
capped_cylinder background=0 radius=500.0 radius_cap=1000.0 length=199732.05080756888 sld=1 sld_solvent=0
  qvalue [2.00000000000000e-03 1.00000000000000e-01]
  target [9.54509605532157e+04 3.75843968790354e-02]
  actual [9.50819794058331e+04 8.75212961204810e-03]
  relerr [3.86566196132542e-03 7.67133961462342e-01]
pringle background=0 radius=1000.0 thickness=200000 alpha=0.5 beta=0.5 sld=1 sld_solvent=0
  qvalue [5.00000000000000e-04 1.00000000000000e-03 2.00000000000000e-03 1.00000000000000e-02 1.00000000000000e-01]
  target [5.29285175436828e+05 2.25418078593777e+05 6.00092800236531e+04 5.04753936370161e+02 8.73018584913834e-02]
  actual [5.29442866649065e+05 2.26829405624553e+05 6.03425846714318e+04 2.53341235858443e+02 6.97729804115215e-03]
  relerr [2.97932418204737e-04 6.26093097581434e-03 5.55421840834110e-03 4.98089628224999e-01 9.20078470702422e-01]
! ** pringle is slow: 21.9 s for 201 points in [1e-05, 1.0]
triaxial_ellipsoid background=0 sld=1 sld_solvent=0 radius_equat_minor=1000 radius_equat_major=2000 radius_polar=200000
  qvalue [5.00000000000000e-04 1.00000000000000e-03 2.00000000000000e-03]
  target [2.76304345549817e+06 9.28680416872306e+05 1.11806895058781e+05]
  actual [2.77380253296110e+06 8.56140969167502e+05 1.35838503298750e+05]
  relerr [3.89392263864717e-03 7.81102372645142e-02 2.14938517229500e-01]
! ** gauss-76 is better than adaptive for {model_name} at some q values
  qvalue [2.00000000000000e-03]
  relerr [2.14938517229500e-01]
  rel-76 [1.16121876367224e-03]


=== big disks: a=180000 b=200000 c=1000 ===
barbell background=0 radius=90000.0 radius_bell=100000.0 length=0 sld=1 sld_solvent=0
  qvalue [2.00000000000000e-03]
  target [7.27129821730912e+02]
  actual [7.41190530349738e+02]
  relerr [1.93372740308673e-02]
capped_cylinder background=0 radius=90000.0 radius_cap=100000.0 length=0 sld=1 sld_solvent=0
  qvalue [2.00000000000000e-03]
  target [1.72595023131101e+03]
  actual [1.75785915885211e+03]
  relerr [1.84877448736521e-02]
elliptical_cylinder background=0 radius_minor=90000.0 axis_ratio=200.0 length=1000 sld=1 sld_solvent=0
  qvalue [5.00000000000000e-04 1.00000000000000e-03 2.00000000000000e-03]
  target [2.45889265741404e+06 5.75220887915049e+05 1.10114203285981e+05]
  actual [2.45711448458698e+06 5.75304384192084e+05 1.10121261853637e+05]
  relerr [7.23160005252941e-04 1.45155154809525e-04 6.41022451726544e-05]
! ** elliptical_cylinder is slow: 2.3 s for 201 points in [1e-05, 1.0]
hollow_rectangular_prism background=0 sld=1 sld_solvent=0 length_a=180000 b2a_ratio=1.1111111111111112 c2a_ratio=0.005555555555555556 thickness=9000.0
  qvalue [1.00000000000000e-01]
  target [9.50759234453635e-04]
  actual [7.44395165727090e-04]
  relerr [2.17051868915199e-01]
parallelepiped background=0 sld=1 sld_solvent=0 length_a=180000 length_b=200000 length_c=1000
  qvalue [1.00000000000000e-01]
  target [4.41200774157535e-04]
  actual [3.38306441963541e-04]
  relerr [2.33214305642298e-01]
pringle background=0 radius=100000.0 thickness=1000 alpha=0.09999999999999998 beta=0.09999999999999998 sld=1 sld_solvent=0
  qvalue [5.00000000000000e-04 1.00000000000000e-03 2.00000000000000e-03]
  target [1.37488167338956e+02 1.98707910686554e+01 2.57324811716904e+00]
  actual [1.25957204092523e+02 1.65063126716671e+01 2.33684264436572e+00]
  relerr [8.38687682701073e-02 1.69317788374091e-01 9.18704540094648e-02]
! ** pringle is slow: 39.3 s for 201 points in [1e-05, 1.0]


=== big cubes: a=200000 b=200000 c=200000 ===
stacked_disks background=0 thick_core=18000.0 thick_layer=1000.0 radius=100000.0 n_stacking=10 sigma_d=1000.0 sld_core=0 sld_layer=1 sld_solvent=0
  qvalue [5.00000000000000e-04]
  target [9.02716549915439e+03]
  actual [9.02052754443698e+03]
  relerr [7.35331009277812e-04]
core_shell_parallelepiped background=0 sld_core=0 sld_a=1 sld_b=1 sld_c=1 sld_solvent=0 length_a=180000.0 length_b=180000.0 length_c=180000.0 thick_rim_a=10000.0 thick_rim_b=10000.0 thick_rim_c=10000.0
  qvalue [1.00000000000000e-03 1.00000000000000e-01]
  target [6.98345673054720e+04 2.36105090032928e-04]
  actual [7.00984945279583e+04 6.73462507203931e-05]
  relerr [3.77932065264824e-03 7.14761546601978e-01]
hollow_rectangular_prism_thin_walls background=0 sld=1 sld_solvent=0 length_a=200000 b2a_ratio=1.0 c2a_ratio=1.0
  qvalue [1.00000000000000e-03 1.00000000000000e-01]
  target [8.56430787049287e+02 1.13816015343293e-01]
  actual [8.56144929699486e+02 5.55607842901889e-02]
  relerr [3.33777526595066e-04 5.11836852462231e-01]
hollow_rectangular_prism background=0 sld=1 sld_solvent=0 length_a=200000 b2a_ratio=1.0 c2a_ratio=1.0 thickness=10000.0
  qvalue [1.00000000000000e-03 1.00000000000000e-01]
  target [2.42978141719527e+05 9.87931292106438e-04]
  actual [2.42955329124302e+05 4.03638319935387e-04]
  relerr [9.38874380342003e-05 5.91430777463521e-01]
parallelepiped background=0 sld=1 sld_solvent=0 length_a=200000 length_b=200000 length_c=200000
  qvalue [1.00000000000000e-03 1.00000000000000e-01]
  target [1.16766869284687e+04 2.57757352046674e-05]
  actual [1.16947856496101e+04 7.97730885917925e-06]
  relerr [1.54998770218589e-03 6.90510908967797e-01]
pringle background=0 radius=100000.0 thickness=200000 alpha=0.001 beta=0.001 sld=1 sld_solvent=0
  qvalue [5.00000000000000e-04 1.00000000000000e-03 2.00000000000000e-03 1.00000000000000e-01]
  target [5.31579388633462e+05 5.27862463514218e+04 4.82947216221278e+03 6.11226958747059e-05]
  actual [4.61289228961633e+05 5.06119805022712e+04 4.70845458421073e+03 3.86634456557850e-06]
  relerr [1.32228903480484e-01 4.11900068566250e-02 2.50581376053736e-02 9.36744534738716e-01]
! ** pringle is slow: 34.6 s for 201 points in [1e-05, 1.0]
rectangular_prism background=0 sld=1 sld_solvent=0 length_a=200000 b2a_ratio=1.0 c2a_ratio=1.0
  qvalue [1.00000000000000e-03 1.00000000000000e-01]
  target [1.16766869304377e+04 3.55638693182220e-05]
  actual [1.16947856493809e+04 1.77632602060632e-05]
  relerr [1.54998751367394e-03 5.00525096211571e-01]

@pkienzle
Contributor Author

GPU performance on mac M2 is bad compared to cpu for small models, though large models see some benefit.

Speed check cpu vs gpu on mac ocl, after setting gpu_speed_check and speed_only in explore/check_adaptive.py:

$ python explore/check_adaptive.py
== Speed and accuracy tests for all adaptive integration models ==
* target evaluation time is 2 s (running on a mac M2 chip)
*     q in [1e-5, 1] with 40 points per decade for 201 points total
*     warns if the adaptive model is 2x slower than a 76-point gaussian
* large models tested against 5000 point gaussian integration
*     q=[5e-4, 1e-3, 2e-3] with tol=1e-5 relative (measured q)
*     q=[0.01, 0.1] with tol=0.2 relative (slit resolution limits)
* small models tested against 5000 point gaussian integration
*     q in [1e-3, 1] with 1 points per decade
!!!! These tests run very slowly --- don't use as part of CI !!!!



=== small rods: a=20 b=40 c=200 ===
core_shell_bicelle background=0 radius=18.0 thick_rim=2.0 thick_face=2.0 length=196.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0
  cpu double: 0.5 ms  gpu single: 3.9 ms  [*** 6.1x slow down]
core_shell_cylinder background=0 radius=18.0 thickness=2.0 length=196.0 sld_core=0 sld_shell=1 sld_solvent=0
  cpu double: 0.3 ms  gpu single: 3.9 ms  [*** 12.0x slow down]
core_shell_ellipsoid background=0 radius_equat_core=18.0 x_core=5.444444444444445 thick_shell=2.0 x_polar_shell=1 sld_core=0 sld_shell=1 sld_solvent=0
  cpu double: 0.2 ms  gpu single: 2.2 ms  [*** 11.2x slow down]
cylinder background=0 sld=1 sld_solvent=0 radius=20.0 length=200
  cpu double: 0.2 ms  gpu single: 2.6 ms  [*** 11.1x slow down]
ellipsoid background=0 sld=1 sld_solvent=0 radius_equatorial=20.0 radius_polar=100.0
  cpu double: 0.1 ms  gpu single: 1.6 ms  [*** 12.8x slow down]
flexible_cylinder background=0 length=2000 kuhn_length=200 radius=40 sld=1 sld_solvent=0
  cpu double: 0.1 ms  gpu single: 0.7 ms  [*** 7.1x slow down]
hollow_cylinder background=0 radius=18.0 thickness=2.0 length=200 sld=1 sld_solvent=0
  cpu double: 0.3 ms  gpu single: 3.1 ms  [*** 10.1x slow down]
core_shell_bicelle_elliptical_belt_rough background=0 radius=9.0 x_core=2.111111111111111 thick_rim=1.0 thick_face=1.0 length=198.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0 sigma=0.1
  cpu double: 6.8 ms  gpu single: 45.9 ms  [*** 5.7x slow down]
core_shell_bicelle_elliptical background=0 radius=9.0 x_core=2.111111111111111 thick_rim=1.0 thick_face=1.0 length=198.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0
  cpu double: 6.9 ms  gpu single: 45.8 ms  [*** 5.6x slow down]
core_shell_parallelepiped background=0 sld_core=0 sld_a=1 sld_b=1 sld_c=1 sld_solvent=0 length_a=18.0 length_b=38.0 length_c=198.0 thick_rim_a=1.0 thick_rim_b=1.0 thick_rim_c=1.0
  cpu double: 4.8 ms  gpu single: 41.4 ms  [*** 7.7x slow down]
elliptical_cylinder background=0 radius_minor=10.0 axis_ratio=0.2 length=200 sld=1 sld_solvent=0
  cpu double: 2.5 ms  gpu single: 29.2 ms  [*** 10.5x slow down]
hollow_rectangular_prism_thin_walls background=0 sld=1 sld_solvent=0 length_a=20 b2a_ratio=2.0 c2a_ratio=10.0
  cpu double: 3.0 ms  gpu single: 24.5 ms  [*** 7.1x slow down]
hollow_rectangular_prism background=0 sld=1 sld_solvent=0 length_a=20 b2a_ratio=2.0 c2a_ratio=10.0 thickness=1.0
  cpu double: 4.2 ms  gpu single: 39.5 ms  [*** 8.3x slow down]
parallelepiped background=0 sld=1 sld_solvent=0 length_a=20 length_b=40 length_c=200
  cpu double: 2.9 ms  gpu single: 25.8 ms  [*** 7.8x slow down]
rectangular_prism background=0 sld=1 sld_solvent=0 length_a=20 b2a_ratio=2.0 c2a_ratio=10.0
  cpu double: 2.8 ms  gpu single: 24.4 ms  [*** 7.6x slow down]


=== small disks: a=180 b=200 c=40 ===
core_shell_ellipsoid background=0 radius_equat_core=90.0 x_core=0.1111111111111111 thick_shell=10.0 x_polar_shell=1 sld_core=0 sld_shell=1 sld_solvent=0
  cpu double: 0.2 ms  gpu single: 1.2 ms  [*** 6.1x slow down]
cylinder background=0 sld=1 sld_solvent=0 radius=100.0 length=40
  cpu double: 0.2 ms  gpu single: 1.3 ms  [*** 5.6x slow down]
ellipsoid background=0 sld=1 sld_solvent=0 radius_equatorial=100.0 radius_polar=20.0
  cpu double: 0.1 ms  gpu single: 0.9 ms  [*** 8.3x slow down]
flexible_cylinder background=0 length=400 kuhn_length=40 radius=200 sld=1 sld_solvent=0
  cpu double: 0.1 ms  gpu single: 0.6 ms  [*** 7.3x slow down]
triaxial_ellipsoid background=0 sld=1 sld_solvent=0 radius_equat_minor=180 radius_equat_major=200 radius_polar=40
  cpu double: 29.0 ms  gpu single: 110.6 ms  [*** 2.8x slow down]


=== small cubes: a=200 b=200 c=200 ===
core_shell_ellipsoid background=0 radius_equat_core=90.0 x_core=1.0 thick_shell=10.0 x_polar_shell=1 sld_core=0 sld_shell=1 sld_solvent=0
  cpu double: 0.2 ms  gpu single: 1.2 ms  [*** 6.6x slow down]
cylinder background=0 sld=1 sld_solvent=0 radius=100.0 length=200
  cpu double: 0.2 ms  gpu single: 1.3 ms  [*** 5.5x slow down]
ellipsoid background=0 sld=1 sld_solvent=0 radius_equatorial=100.0 radius_polar=100.0
  cpu double: 0.1 ms  gpu single: 0.9 ms  [*** 8.8x slow down]
flexible_cylinder background=0 length=2000 kuhn_length=200 radius=200 sld=1 sld_solvent=0
  cpu double: 0.1 ms  gpu single: 0.5 ms  [*** 7.2x slow down]
elliptical_cylinder background=0 radius_minor=100.0 axis_ratio=1.0 length=200 sld=1 sld_solvent=0
  cpu double: 11.0 ms  gpu single: 214.6 ms  [*** 18.5x slow down]
triaxial_ellipsoid background=0 sld=1 sld_solvent=0 radius_equat_minor=200 radius_equat_major=200 radius_polar=200
  cpu double: 30.2 ms  gpu single: 110.0 ms  [*** 2.6x slow down]


=== big rods: a=1000 b=2000 c=200000 ===
flexible_cylinder background=0 length=2000000 kuhn_length=200000 radius=2000 sld=1 sld_solvent=0
  cpu double: 0.1 ms  gpu single: 0.6 ms  [*** 8.1x slow down]
barbell background=0 radius=500.0 radius_bell=1000.0 length=196267.94919243112 sld=1 sld_solvent=0
  cpu double: 778.3 ms  gpu single: 334.8 ms  [56% speed up]
capped_cylinder background=0 radius=500.0 radius_cap=1000.0 length=199732.05080756888 sld=1 sld_solvent=0
  cpu double: 771.3 ms  gpu single: 319.3 ms  [58% speed up]
core_shell_bicelle_elliptical_belt_rough background=0 radius=450.0 x_core=2.111111111111111 thick_rim=50.0 thick_face=50.0 length=199900.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0 sigma=5.0
  cpu double: 1297.7 ms  gpu single: 549.2 ms  [57% speed up]
core_shell_bicelle_elliptical background=0 radius=450.0 x_core=2.111111111111111 thick_rim=50.0 thick_face=50.0 length=199900.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0
  cpu double: 1302.1 ms  gpu single: 549.4 ms  [57% speed up]
core_shell_parallelepiped background=0 sld_core=0 sld_a=1 sld_b=1 sld_c=1 sld_solvent=0 length_a=900.0 length_b=1900.0 length_c=199900.0 thick_rim_a=50.0 thick_rim_b=50.0 thick_rim_c=50.0
  cpu double: 891.7 ms  gpu single: 497.2 ms  [44% speed up]
elliptical_cylinder background=0 radius_minor=500.0 axis_ratio=0.01 length=200000 sld=1 sld_solvent=0
  cpu double: 599.2 ms  gpu single: 335.3 ms  [44% speed up]
hollow_rectangular_prism background=0 sld=1 sld_solvent=0 length_a=1000 b2a_ratio=2.0 c2a_ratio=200.0 thickness=50.0
  cpu double: 778.6 ms  gpu single: 485.6 ms  [37% speed up]
parallelepiped background=0 sld=1 sld_solvent=0 length_a=1000 length_b=2000 length_c=200000
  cpu double: 546.8 ms  gpu single: 309.3 ms  [43% speed up]
rectangular_prism background=0 sld=1 sld_solvent=0 length_a=1000 b2a_ratio=2.0 c2a_ratio=200.0
  cpu double: 512.8 ms  gpu single: 296.2 ms  [42% speed up]


=== big disks: a=180000 b=200000 c=1000 ===
flexible_cylinder background=0 length=10000 kuhn_length=1000 radius=200000 sld=1 sld_solvent=0
  cpu double: 0.1 ms  gpu single: 0.6 ms  [*** 8.1x slow down]
barbell background=0 radius=90000.0 radius_bell=100000.0 length=0 sld=1 sld_solvent=0
  cpu double: 1046.0 ms  gpu single: 337.2 ms  [67% speed up]
capped_cylinder background=0 radius=90000.0 radius_cap=100000.0 length=0 sld=1 sld_solvent=0
  cpu double: 936.5 ms  gpu single: 337.3 ms  [63% speed up]
core_shell_bicelle_elliptical_belt_rough background=0 radius=81000.0 x_core=1.123456790123457 thick_rim=9000.0 thick_face=9000.0 length=-17000.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0 sigma=900.0
  cpu double: 1874.4 ms  gpu single: 836.5 ms  [55% speed up]
core_shell_bicelle_elliptical background=0 radius=81000.0 x_core=1.123456790123457 thick_rim=9000.0 thick_face=9000.0 length=-17000.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0
  cpu double: 1874.6 ms  gpu single: 839.9 ms  [55% speed up]
core_shell_parallelepiped background=0 sld_core=0 sld_a=1 sld_b=1 sld_c=1 sld_solvent=0 length_a=162000.0 length_b=182000.0 length_c=-17000.0 thick_rim_a=9000.0 thick_rim_b=9000.0 thick_rim_c=9000.0
  cpu double: 1721.5 ms  gpu single: 749.8 ms  [56% speed up]
elliptical_cylinder background=0 radius_minor=90000.0 axis_ratio=200.0 length=1000 sld=1 sld_solvent=0
! ** elliptical_cylinder is slow: 2.2 s for 201 points in [1e-05, 1.0]
  cpu double: 2225.0 ms  gpu single: 523.9 ms  [76% speed up]
hollow_rectangular_prism_thin_walls background=0 sld=1 sld_solvent=0 length_a=180000 b2a_ratio=1.1111111111111112 c2a_ratio=0.005555555555555556
  cpu double: 883.0 ms  gpu single: 453.4 ms  [48% speed up]
hollow_rectangular_prism background=0 sld=1 sld_solvent=0 length_a=180000 b2a_ratio=1.1111111111111112 c2a_ratio=0.005555555555555556 thickness=9000.0
  cpu double: 1633.8 ms  gpu single: 715.9 ms  [56% speed up]
parallelepiped background=0 sld=1 sld_solvent=0 length_a=180000 length_b=200000 length_c=1000
  cpu double: 1113.5 ms  gpu single: 463.5 ms  [58% speed up]
rectangular_prism background=0 sld=1 sld_solvent=0 length_a=180000 b2a_ratio=1.1111111111111112 c2a_ratio=0.005555555555555556
  cpu double: 1075.5 ms  gpu single: 437.0 ms  [59% speed up]
triaxial_ellipsoid background=0 sld=1 sld_solvent=0 radius_equat_minor=180000 radius_equat_major=200000 radius_polar=1000
  cpu double: 600.6 ms  gpu single: 259.9 ms  [56% speed up]


=== big cubes: a=200000 b=200000 c=200000 ===
flexible_cylinder background=0 length=2000000 kuhn_length=200000 radius=200000 sld=1 sld_solvent=0
  cpu double: 0.1 ms  gpu single: 0.6 ms  [*** 8.2x slow down]
barbell background=0 radius=100000.0 radius_bell=100000.0 length=0.0 sld=1 sld_solvent=0
  cpu double: 1000.5 ms  gpu single: 336.3 ms  [66% speed up]
capped_cylinder background=0 radius=100000.0 radius_cap=100000.0 length=0.0 sld=1 sld_solvent=0
  cpu double: 1000.8 ms  gpu single: 335.6 ms  [66% speed up]
core_shell_bicelle_elliptical_belt_rough background=0 radius=90000.0 x_core=1.0 thick_rim=10000.0 thick_face=10000.0 length=180000.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0 sigma=1000.0
  cpu double: 1765.2 ms  gpu single: 835.1 ms  [52% speed up]
core_shell_bicelle_elliptical background=0 radius=90000.0 x_core=1.0 thick_rim=10000.0 thick_face=10000.0 length=180000.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0
  cpu double: 1764.4 ms  gpu single: 839.7 ms  [52% speed up]
core_shell_parallelepiped background=0 sld_core=0 sld_a=1 sld_b=1 sld_c=1 sld_solvent=0 length_a=180000.0 length_b=180000.0 length_c=180000.0 thick_rim_a=10000.0 thick_rim_b=10000.0 thick_rim_c=10000.0
  cpu double: 1726.7 ms  gpu single: 751.9 ms  [56% speed up]
elliptical_cylinder background=0 radius_minor=100000.0 axis_ratio=1.0 length=200000 sld=1 sld_solvent=0
  cpu double: 912.3 ms  gpu single: 523.2 ms  [42% speed up]
hollow_rectangular_prism_thin_walls background=0 sld=1 sld_solvent=0 length_a=200000 b2a_ratio=1.0 c2a_ratio=1.0
  cpu double: 888.5 ms  gpu single: 453.8 ms  [48% speed up]
hollow_rectangular_prism background=0 sld=1 sld_solvent=0 length_a=200000 b2a_ratio=1.0 c2a_ratio=1.0 thickness=10000.0
  cpu double: 1641.9 ms  gpu single: 714.1 ms  [56% speed up]
parallelepiped background=0 sld=1 sld_solvent=0 length_a=200000 length_b=200000 length_c=200000
  cpu double: 1112.4 ms  gpu single: 464.0 ms  [58% speed up]
rectangular_prism background=0 sld=1 sld_solvent=0 length_a=200000 b2a_ratio=1.0 c2a_ratio=1.0
  cpu double: 1076.6 ms  gpu single: 436.8 ms  [59% speed up]

@pkienzle pkienzle marked this pull request as draft May 14, 2026 03:07
@pkienzle pkienzle marked this pull request as ready for review May 14, 2026 20:41
Collaborator

@krzywon krzywon left a comment

Tentatively approving this, but will hold off merging to think about the consequences of this a little more.

My test results using GPU and only comparing speed are quoted below. The only tests that were slower with the GPU were models that originally took 1 ms or less to calculate. I would say this 'slow-down', where a user's DREAM fit might now take 5-10 seconds instead of a second or two, won't be noticed. The speed-up for the slower models, many 10x faster than before, will, however, be appreciated.

== Speed and accuracy tests for all adaptive integration models ==

  • target evaluation time is 2 s (running on a mac M2 chip)
    • q in [1e-5, 1] with 40 points per decade for 201 points total
  • large models tested against 5000 point gaussian integration
    • q=[5e-4, 1e-3, 2e-3] with tol=1e-5 relative (measured q)
    • q=[0.01, 0.1] with tol=0.2 relative (slit resolution limits)
  • small models tested against 5000 point gaussian integration
    • q in [1e-3, 1] with 1 points per decade

!!!! These tests run very slowly --- don't use as part of CI !!!!
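
As a quick check on the header above, the stated q grid (q in [1e-5, 1] with 40 points per decade) does come out to 201 points. This is a minimal sketch that assumes the grid is log-spaced with both endpoints included; how the test script actually builds it is not shown here.

```python
import numpy as np

# Assumption: log-spaced grid with both endpoints included.
q_min, q_max = 1e-5, 1.0
points_per_decade = 40

decades = int(round(np.log10(q_max / q_min)))  # 5 decades
n_points = decades * points_per_decade + 1     # 200 intervals plus the end point
q = np.logspace(np.log10(q_min), np.log10(q_max), n_points)

print(n_points, q[0], q[-1])
```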

=== small rods: a=20 b=40 c=200 ===
core_shell_ellipsoid background=0 radius_equat_core=18.0 x_core=5.444444444444445 thick_shell=2.0 x_polar_shell=1 sld_core=0 sld_shell=1 sld_solvent=0
cpu double: 0.6 ms gpu: 1.5 ms [gpu 2.4x slower] ***
cylinder background=0 sld=1 sld_solvent=0 radius=20.0 length=200
cpu double: 0.7 ms gpu: 1.6 ms [gpu 2.4x slower] ***
ellipsoid background=0 sld=1 sld_solvent=0 radius_equatorial=20.0 radius_polar=100.0
cpu double: 0.3 ms gpu: 1.5 ms [gpu 5.6x slower] ***
flexible_cylinder background=0 length=2000 kuhn_length=200 radius=40 sld=1 sld_solvent=0
cpu double: 0.2 ms gpu: 0.9 ms [gpu 5.4x slower] ***
hollow_cylinder background=0 radius=18.0 thickness=2.0 length=200 sld=1 sld_solvent=0
cpu double: 0.8 ms gpu: 1.8 ms [gpu 2.3x slower] ***
barbell background=0 radius=10.0 radius_bell=20.0 length=125.35898384862244 sld=1 sld_solvent=0
cpu double: 12.9 ms gpu: 3.7 ms [gpu 3.5x faster]
capped_cylinder background=0 radius=10.0 radius_cap=20.0 length=194.64101615137756 sld=1 sld_solvent=0
cpu double: 10.1 ms gpu: 2.9 ms [gpu 3.5x faster]
pringle background=0 radius=20.0 thickness=200 alpha=0.5 beta=0.5 sld=1 sld_solvent=0
cpu double: 754.2 ms gpu: 85.4 ms [gpu 8.8x faster]
triaxial_ellipsoid background=0 sld=1 sld_solvent=0 radius_equat_minor=20 radius_equat_major=40 radius_polar=200
cpu double: 34.4 ms gpu: 7.8 ms [gpu 4.4x faster]

=== small disks: a=180 b=200 c=40 ===
core_shell_ellipsoid background=0 radius_equat_core=90.0 x_core=0.1111111111111111 thick_shell=10.0 x_polar_shell=1 sld_core=0 sld_shell=1 sld_solvent=0
cpu double: 0.7 ms gpu: 1.9 ms [gpu 2.7x slower] ***
cylinder background=0 sld=1 sld_solvent=0 radius=100.0 length=40
cpu double: 0.7 ms gpu: 2.2 ms [gpu 3.1x slower] ***
ellipsoid background=0 sld=1 sld_solvent=0 radius_equatorial=100.0 radius_polar=20.0
cpu double: 0.3 ms gpu: 2.0 ms [gpu 6.2x slower] ***
flexible_cylinder background=0 length=400 kuhn_length=40 radius=200 sld=1 sld_solvent=0
cpu double: 0.2 ms gpu: 1.3 ms [gpu 6.1x slower] ***
hollow_cylinder background=0 radius=90.0 thickness=10.0 length=40 sld=1 sld_solvent=0
cpu double: 1.0 ms gpu: 4.7 ms [gpu 4.6x slower] ***
capped_cylinder background=0 radius=90.0 radius_cap=100.0 length=0 sld=1 sld_solvent=0
cpu double: 30.6 ms gpu: 4.5 ms [gpu 6.8x faster]
core_shell_bicelle_elliptical_belt_rough background=0 radius=81.0 x_core=1.123456790123457 thick_rim=9.0 thick_face=9.0 length=22.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0 sigma=0.9
cpu double: 79.0 ms gpu: 26.4 ms [gpu 3.0x faster]
core_shell_bicelle_elliptical background=0 radius=81.0 x_core=1.123456790123457 thick_rim=9.0 thick_face=9.0 length=22.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0
cpu double: 85.4 ms gpu: 26.1 ms [gpu 3.3x faster]
core_shell_parallelepiped background=0 sld_core=0 sld_a=1 sld_b=1 sld_c=1 sld_solvent=0 length_a=162.0 length_b=182.0 length_c=22.0 thick_rim_a=9.0 thick_rim_b=9.0 thick_rim_c=9.0
cpu double: 44.0 ms gpu: 16.2 ms [gpu 2.7x faster]
elliptical_cylinder background=0 radius_minor=90.0 axis_ratio=5.0 length=40 sld=1 sld_solvent=0
cpu double: 177.3 ms gpu: 14.8 ms [gpu 12.0x faster]
hollow_rectangular_prism_thin_walls background=0 sld=1 sld_solvent=0 length_a=180 b2a_ratio=1.1111111111111112 c2a_ratio=0.2222222222222222
cpu double: 26.7 ms gpu: 7.3 ms [gpu 3.7x faster]
hollow_rectangular_prism background=0 sld=1 sld_solvent=0 length_a=180 b2a_ratio=1.1111111111111112 c2a_ratio=0.2222222222222222 thickness=9.0
cpu double: 28.2 ms gpu: 13.7 ms [gpu 2.1x faster]
pringle background=0 radius=100.0 thickness=40 alpha=0.09999999999999998 beta=0.09999999999999998 sld=1 sld_solvent=0
cpu double: 1987.8 ms gpu: 126.2 ms [gpu 15.7x faster]
triaxial_ellipsoid background=0 sld=1 sld_solvent=0 radius_equat_minor=180 radius_equat_major=200 radius_polar=40
cpu double: 40.7 ms gpu: 6.7 ms [gpu 6.1x faster]

=== small cubes: a=200 b=200 c=200 ===
core_shell_ellipsoid background=0 radius_equat_core=90.0 x_core=1.0 thick_shell=10.0 x_polar_shell=1 sld_core=0 sld_shell=1 sld_solvent=0
cpu double: 0.6 ms gpu: 1.6 ms [gpu 2.7x slower] ***
cylinder background=0 sld=1 sld_solvent=0 radius=100.0 length=200
cpu double: 0.7 ms gpu: 2.2 ms [gpu 3.0x slower] ***
ellipsoid background=0 sld=1 sld_solvent=0 radius_equatorial=100.0 radius_polar=100.0
cpu double: 0.3 ms gpu: 1.7 ms [gpu 5.7x slower] ***
flexible_cylinder background=0 length=2000 kuhn_length=200 radius=200 sld=1 sld_solvent=0
cpu double: 0.2 ms gpu: 1.2 ms [gpu 5.5x slower] ***
hollow_cylinder background=0 radius=90.0 thickness=10.0 length=200 sld=1 sld_solvent=0
cpu double: 1.1 ms gpu: 3.5 ms [gpu 3.1x slower] ***
barbell background=0 radius=100.0 radius_bell=100.0 length=0.0 sld=1 sld_solvent=0
cpu double: 20.0 ms gpu: 5.2 ms [gpu 3.9x faster]
capped_cylinder background=0 radius=100.0 radius_cap=100.0 length=0.0 sld=1 sld_solvent=0
cpu double: 19.7 ms gpu: 3.5 ms [gpu 5.6x faster]
core_shell_parallelepiped background=0 sld_core=0 sld_a=1 sld_b=1 sld_c=1 sld_solvent=0 length_a=180.0 length_b=180.0 length_c=180.0 thick_rim_a=10.0 thick_rim_b=10.0 thick_rim_c=10.0
cpu double: 31.8 ms gpu: 15.1 ms [gpu 2.1x faster]
hollow_rectangular_prism_thin_walls background=0 sld=1 sld_solvent=0 length_a=200 b2a_ratio=1.0 c2a_ratio=1.0
cpu double: 26.0 ms gpu: 7.5 ms [gpu 3.5x faster]
hollow_rectangular_prism background=0 sld=1 sld_solvent=0 length_a=200 b2a_ratio=1.0 c2a_ratio=1.0 thickness=10.0
cpu double: 34.8 ms gpu: 13.9 ms [gpu 2.5x faster]
parallelepiped background=0 sld=1 sld_solvent=0 length_a=200 length_b=200 length_c=200
cpu double: 22.7 ms gpu: 11.0 ms [gpu 2.1x faster]
pringle background=0 radius=100.0 thickness=200 alpha=0.001 beta=0.001 sld=1 sld_solvent=0
cpu double: 341.8 ms gpu: 20.7 ms [gpu 16.5x faster]
rectangular_prism background=0 sld=1 sld_solvent=0 length_a=200 b2a_ratio=1.0 c2a_ratio=1.0
cpu double: 21.1 ms gpu: 8.9 ms [gpu 2.4x faster]
triaxial_ellipsoid background=0 sld=1 sld_solvent=0 radius_equat_minor=200 radius_equat_major=200 radius_polar=200
cpu double: 28.0 ms gpu: 6.8 ms [gpu 4.1x faster]

=== big rods: a=1000 b=2000 c=200000 ===
core_shell_bicelle background=0 radius=900.0 thick_rim=100.0 thick_face=100.0 length=199800.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0
cpu double: 94.9 ms gpu: 5.4 ms [gpu 17.5x faster]
core_shell_cylinder background=0 radius=900.0 thickness=100.0 length=199800.0 sld_core=0 sld_shell=1 sld_solvent=0
cpu double: 104.8 ms gpu: 5.6 ms [gpu 18.8x faster]
core_shell_ellipsoid background=0 radius_equat_core=900.0 x_core=111.0 thick_shell=100.0 x_polar_shell=1 sld_core=0 sld_shell=1 sld_solvent=0
cpu double: 43.7 ms gpu: 3.0 ms [gpu 14.6x faster]
cylinder background=0 sld=1 sld_solvent=0 radius=1000.0 length=200000
cpu double: 64.1 ms gpu: 3.9 ms [gpu 16.5x faster]
ellipsoid background=0 sld=1 sld_solvent=0 radius_equatorial=1000.0 radius_polar=100000.0
cpu double: 19.7 ms gpu: 2.9 ms [gpu 6.9x faster]
flexible_cylinder background=0 length=2000000 kuhn_length=200000 radius=2000 sld=1 sld_solvent=0
cpu double: 0.3 ms gpu: 1.3 ms [gpu 4.5x slower] ***
hollow_cylinder background=0 radius=900.0 thickness=100.0 length=200000 sld=1 sld_solvent=0
cpu double: 86.7 ms gpu: 8.3 ms [gpu 10.5x faster]
stacked_disks background=0 thick_core=18000.0 thick_layer=1000.0 radius=1000.0 n_stacking=10 sigma_d=1000.0 sld_core=0 sld_layer=1 sld_solvent=0
cpu double: 152.5 ms gpu: 11.0 ms [gpu 13.9x faster]
barbell background=0 radius=500.0 radius_bell=1000.0 length=196267.94919243112 sld=1 sld_solvent=0
! ** barbell is slow: 2.6 s for 201 points in [1e-05, 1.0]
cpu double: 2648.0 ms gpu: 131.0 ms [gpu 20.2x faster]
capped_cylinder background=0 radius=500.0 radius_cap=1000.0 length=199732.05080756888 sld=1 sld_solvent=0
! ** capped_cylinder is slow: 2.1 s for 201 points in [1e-05, 1.0]
cpu double: 2054.1 ms gpu: 127.4 ms [gpu 16.1x faster]
core_shell_bicelle_elliptical_belt_rough background=0 radius=450.0 x_core=2.111111111111111 thick_rim=50.0 thick_face=50.0 length=199900.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0 sigma=5.0
cpu double: 563.1 ms gpu: 26.7 ms [gpu 21.1x faster]
core_shell_bicelle_elliptical background=0 radius=450.0 x_core=2.111111111111111 thick_rim=50.0 thick_face=50.0 length=199900.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0
cpu double: 568.4 ms gpu: 28.1 ms [gpu 20.2x faster]
core_shell_parallelepiped background=0 sld_core=0 sld_a=1 sld_b=1 sld_c=1 sld_solvent=0 length_a=900.0 length_b=1900.0 length_c=199900.0 thick_rim_a=50.0 thick_rim_b=50.0 thick_rim_c=50.0
cpu double: 424.4 ms gpu: 17.1 ms [gpu 24.8x faster]
elliptical_cylinder background=0 radius_minor=500.0 axis_ratio=0.01 length=200000 sld=1 sld_solvent=0
cpu double: 265.6 ms gpu: 14.5 ms [gpu 18.4x faster]
flexible_cylinder_elliptical background=0 length=2000000 kuhn_length=200000 radius=1000 axis_ratio=2.0 sld=1 sld_solvent=0
cpu double: 12.1 ms gpu: 3.7 ms [gpu 3.2x faster]
hollow_rectangular_prism_thin_walls background=0 sld=1 sld_solvent=0 length_a=1000 b2a_ratio=2.0 c2a_ratio=200.0
cpu double: 425.7 ms gpu: 19.7 ms [gpu 21.6x faster]
hollow_rectangular_prism background=0 sld=1 sld_solvent=0 length_a=1000 b2a_ratio=2.0 c2a_ratio=200.0 thickness=50.0
cpu double: 1149.9 ms gpu: 39.2 ms [gpu 29.3x faster]
parallelepiped background=0 sld=1 sld_solvent=0 length_a=1000 length_b=2000 length_c=200000
cpu double: 1087.3 ms gpu: 43.5 ms [gpu 25.0x faster]
pringle background=0 radius=1000.0 thickness=200000 alpha=0.5 beta=0.5 sld=1 sld_solvent=0
! ** pringle is slow: 70.7 s for 201 points in [1e-05, 1.0]
cpu double: 70700.5 ms gpu: 179.9 ms [gpu 393.0x faster]
rectangular_prism background=0 sld=1 sld_solvent=0 length_a=1000 b2a_ratio=2.0 c2a_ratio=200.0
cpu double: 288.6 ms gpu: 10.3 ms [gpu 28.1x faster]
triaxial_ellipsoid background=0 sld=1 sld_solvent=0 radius_equat_minor=1000 radius_equat_major=2000 radius_polar=200000
cpu double: 302.4 ms gpu: 14.5 ms [gpu 20.9x faster]

=== big disks: a=180000 b=200000 c=1000 ===
core_shell_bicelle background=0 radius=90000.0 thick_rim=10000.0 thick_face=10000.0 length=-19000.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0
cpu double: 202.5 ms gpu: 6.7 ms [gpu 30.3x faster]
core_shell_cylinder background=0 radius=90000.0 thickness=10000.0 length=-19000.0 sld_core=0 sld_shell=1 sld_solvent=0
cpu double: 213.7 ms gpu: 7.1 ms [gpu 30.2x faster]
core_shell_ellipsoid background=0 radius_equat_core=90000.0 x_core=-0.10555555555555556 thick_shell=10000.0 x_polar_shell=1 sld_core=0 sld_shell=1 sld_solvent=0
cpu double: 80.6 ms gpu: 4.1 ms [gpu 19.7x faster]
cylinder background=0 sld=1 sld_solvent=0 radius=100000.0 length=1000
cpu double: 125.9 ms gpu: 6.4 ms [gpu 19.6x faster]
ellipsoid background=0 sld=1 sld_solvent=0 radius_equatorial=100000.0 radius_polar=500.0
cpu double: 36.3 ms gpu: 3.8 ms [gpu 9.6x faster]
flexible_cylinder background=0 length=10000 kuhn_length=1000 radius=200000 sld=1 sld_solvent=0
cpu double: 0.6 ms gpu: 2.4 ms [gpu 4.0x slower] ***
hollow_cylinder background=0 radius=90000.0 thickness=10000.0 length=1000 sld=1 sld_solvent=0
cpu double: 198.4 ms gpu: 13.7 ms [gpu 14.4x faster]
stacked_disks background=0 thick_core=90.0 thick_layer=5.0 radius=100000.0 n_stacking=10 sigma_d=5.0 sld_core=0 sld_layer=1 sld_solvent=0
cpu double: 472.1 ms gpu: 10.2 ms [gpu 46.5x faster]
barbell background=0 radius=90000.0 radius_bell=100000.0 length=0 sld=1 sld_solvent=0
! ** barbell is slow: 7.8 s for 201 points in [1e-05, 1.0]
cpu double: 7831.3 ms gpu: 145.6 ms [gpu 53.8x faster]
capped_cylinder background=0 radius=90000.0 radius_cap=100000.0 length=0 sld=1 sld_solvent=0
! ** capped_cylinder is slow: 7.0 s for 201 points in [1e-05, 1.0]
cpu double: 7014.7 ms gpu: 134.7 ms [gpu 52.1x faster]
core_shell_bicelle_elliptical_belt_rough background=0 radius=81000.0 x_core=1.123456790123457 thick_rim=9000.0 thick_face=9000.0 length=-17000.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0 sigma=900.0
cpu double: 1976.7 ms gpu: 64.0 ms [gpu 30.9x faster]
core_shell_bicelle_elliptical background=0 radius=81000.0 x_core=1.123456790123457 thick_rim=9000.0 thick_face=9000.0 length=-17000.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0
cpu double: 1906.4 ms gpu: 61.7 ms [gpu 30.9x faster]
core_shell_parallelepiped background=0 sld_core=0 sld_a=1 sld_b=1 sld_c=1 sld_solvent=0 length_a=162000.0 length_b=182000.0 length_c=-17000.0 thick_rim_a=9000.0 thick_rim_b=9000.0 thick_rim_c=9000.0
cpu double: 1031.6 ms gpu: 19.1 ms [gpu 53.9x faster]
elliptical_cylinder background=0 radius_minor=90000.0 axis_ratio=200.0 length=1000 sld=1 sld_solvent=0
! ** elliptical_cylinder is slow: 2.0 s for 201 points in [1e-05, 1.0]
cpu double: 2025.1 ms gpu: 39.4 ms [gpu 51.4x faster]
flexible_cylinder_elliptical background=0 length=10000 kuhn_length=1000 radius=180000 axis_ratio=1.1111111111111112 sld=1 sld_solvent=0
cpu double: 141.8 ms gpu: 7.7 ms [gpu 18.4x faster]
hollow_rectangular_prism_thin_walls background=0 sld=1 sld_solvent=0 length_a=180000 b2a_ratio=1.1111111111111112 c2a_ratio=0.005555555555555556
cpu double: 765.7 ms gpu: 10.0 ms [gpu 76.4x faster]
hollow_rectangular_prism background=0 sld=1 sld_solvent=0 length_a=180000 b2a_ratio=1.1111111111111112 c2a_ratio=0.005555555555555556 thickness=9000.0
cpu double: 863.0 ms gpu: 36.8 ms [gpu 23.4x faster]
parallelepiped background=0 sld=1 sld_solvent=0 length_a=180000 length_b=200000 length_c=1000
cpu double: 557.8 ms gpu: 13.1 ms [gpu 42.6x faster]
pringle background=0 radius=100000.0 thickness=1000 alpha=0.09999999999999998 beta=0.09999999999999998 sld=1 sld_solvent=0
! ** pringle is slow: 31.5 s for 201 points in [1e-05, 1.0]
cpu double: 31505.9 ms gpu: 469.0 ms [gpu 67.2x faster]
rectangular_prism background=0 sld=1 sld_solvent=0 length_a=180000 b2a_ratio=1.1111111111111112 c2a_ratio=0.005555555555555556
cpu double: 371.4 ms gpu: 10.3 ms [gpu 36.1x faster]
triaxial_ellipsoid background=0 sld=1 sld_solvent=0 radius_equat_minor=180000 radius_equat_major=200000 radius_polar=1000
cpu double: 287.5 ms gpu: 16.2 ms [gpu 17.8x faster]

=== big cubes: a=200000 b=200000 c=200000 ===
core_shell_bicelle background=0 radius=90000.0 thick_rim=10000.0 thick_face=10000.0 length=180000.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0
cpu double: 157.0 ms gpu: 5.7 ms [gpu 27.5x faster]
core_shell_cylinder background=0 radius=90000.0 thickness=10000.0 length=180000.0 sld_core=0 sld_shell=1 sld_solvent=0
cpu double: 135.3 ms gpu: 5.9 ms [gpu 22.8x faster]
core_shell_ellipsoid background=0 radius_equat_core=90000.0 x_core=1.0 thick_shell=10000.0 x_polar_shell=1 sld_core=0 sld_shell=1 sld_solvent=0
cpu double: 50.2 ms gpu: 3.6 ms [gpu 14.1x faster]
cylinder background=0 sld=1 sld_solvent=0 radius=100000.0 length=200000
cpu double: 80.1 ms gpu: 4.5 ms [gpu 17.9x faster]
ellipsoid background=0 sld=1 sld_solvent=0 radius_equatorial=100000.0 radius_polar=100000.0
cpu double: 21.1 ms gpu: 2.2 ms [gpu 9.4x faster]
flexible_cylinder background=0 length=2000000 kuhn_length=200000 radius=200000 sld=1 sld_solvent=0
cpu double: 0.3 ms gpu: 1.3 ms [gpu 3.7x slower] ***
hollow_cylinder background=0 radius=90000.0 thickness=10000.0 length=200000 sld=1 sld_solvent=0
cpu double: 100.7 ms gpu: 8.6 ms [gpu 11.7x faster]
stacked_disks background=0 thick_core=18000.0 thick_layer=1000.0 radius=100000.0 n_stacking=10 sigma_d=1000.0 sld_core=0 sld_layer=1 sld_solvent=0
cpu double: 229.2 ms gpu: 10.9 ms [gpu 21.0x faster]
barbell background=0 radius=100000.0 radius_bell=100000.0 length=0.0 sld=1 sld_solvent=0
! ** barbell is slow: 3.0 s for 201 points in [1e-05, 1.0]
cpu double: 3019.8 ms gpu: 131.3 ms [gpu 23.0x faster]
capped_cylinder background=0 radius=100000.0 radius_cap=100000.0 length=0.0 sld=1 sld_solvent=0
! ** capped_cylinder is slow: 3.1 s for 201 points in [1e-05, 1.0]
cpu double: 3073.2 ms gpu: 133.3 ms [gpu 23.1x faster]
core_shell_bicelle_elliptical_belt_rough background=0 radius=90000.0 x_core=1.0 thick_rim=10000.0 thick_face=10000.0 length=180000.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0 sigma=1000.0
cpu double: 853.5 ms gpu: 31.3 ms [gpu 27.3x faster]
core_shell_bicelle_elliptical background=0 radius=90000.0 x_core=1.0 thick_rim=10000.0 thick_face=10000.0 length=180000.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0
cpu double: 841.5 ms gpu: 29.1 ms [gpu 28.9x faster]
core_shell_parallelepiped background=0 sld_core=0 sld_a=1 sld_b=1 sld_c=1 sld_solvent=0 length_a=180000.0 length_b=180000.0 length_c=180000.0 thick_rim_a=10000.0 thick_rim_b=10000.0 thick_rim_c=10000.0
cpu double: 505.0 ms gpu: 17.8 ms [gpu 28.3x faster]
elliptical_cylinder background=0 radius_minor=100000.0 axis_ratio=1.0 length=200000 sld=1 sld_solvent=0
cpu double: 506.8 ms gpu: 15.2 ms [gpu 33.4x faster]
flexible_cylinder_elliptical background=0 length=2000000 kuhn_length=200000 radius=200000 axis_ratio=1.0 sld=1 sld_solvent=0
cpu double: 56.0 ms gpu: 5.6 ms [gpu 10.0x faster]
hollow_rectangular_prism_thin_walls background=0 sld=1 sld_solvent=0 length_a=200000 b2a_ratio=1.0 c2a_ratio=1.0
cpu double: 360.5 ms gpu: 8.3 ms [gpu 43.2x faster]
hollow_rectangular_prism background=0 sld=1 sld_solvent=0 length_a=200000 b2a_ratio=1.0 c2a_ratio=1.0 thickness=10000.0
cpu double: 528.7 ms gpu: 15.6 ms [gpu 34.0x faster]
parallelepiped background=0 sld=1 sld_solvent=0 length_a=200000 length_b=200000 length_c=200000
cpu double: 307.3 ms gpu: 11.8 ms [gpu 26.1x faster]
pringle background=0 radius=100000.0 thickness=200000 alpha=0.001 beta=0.001 sld=1 sld_solvent=0
! ** pringle is slow: 12.2 s for 201 points in [1e-05, 1.0]
cpu double: 12232.7 ms gpu: 266.8 ms [gpu 45.9x faster]
rectangular_prism background=0 sld=1 sld_solvent=0 length_a=200000 b2a_ratio=1.0 c2a_ratio=1.0
cpu double: 274.2 ms gpu: 9.5 ms [gpu 28.8x faster]
triaxial_ellipsoid background=0 sld=1 sld_solvent=0 radius_equat_minor=200000 radius_equat_major=200000 radius_polar=200000
cpu double: 179.7 ms gpu: 18.0 ms [gpu 10.0x faster]
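
One way to check the claim that only the very fast models get slower on the GPU is to parse the timing lines and list the models where the GPU time exceeds the CPU time. The sketch below uses four (model, timing) pairs copied from the log above; the regular expression and the helper name `parse_timing` are illustrative assumptions, not part of the test script.

```python
import re

# Four (model, timing line) pairs copied from the log above.
SAMPLES = [
    ("cylinder", "cpu double: 0.7 ms gpu: 1.6 ms [gpu 2.4x slower] ***"),
    ("barbell", "cpu double: 12.9 ms gpu: 3.7 ms [gpu 3.5x faster]"),
    ("hollow_cylinder", "cpu double: 1.1 ms gpu: 3.5 ms [gpu 3.1x slower] ***"),
    ("pringle", "cpu double: 70700.5 ms gpu: 179.9 ms [gpu 393.0x faster]"),
]

TIMING = re.compile(r"cpu double:\s*([\d.]+) ms\s+gpu:\s*([\d.]+) ms")


def parse_timing(line):
    """Return (cpu_ms, gpu_ms) from one timing line (hypothetical helper)."""
    match = TIMING.search(line)
    if match is None:
        raise ValueError(f"not a timing line: {line!r}")
    return float(match.group(1)), float(match.group(2))


timings = {name: parse_timing(line) for name, line in SAMPLES}

# Models where the GPU is slower, and the largest CPU time among them.
slower = [name for name, (cpu, gpu) in timings.items() if gpu > cpu]
worst_cpu_ms = max(timings[name][0] for name in slower)

print("slower on gpu:", slower)
print("largest cpu time among them (ms):", worst_cpu_ms)
```

In this sample every model that is slower on the GPU takes roughly 1 ms on the CPU, which matches the observation in the review comment.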

// To force a fixed rather than adaptive integration scheme, replace [..., "lib/adaptive.c", ...]
// with [..., "lib/gauss<n>.c", "lib/nonadaptive.c", ...] in your source lists.

// Hack for barbell and capped cylinder keeps the outer integral to 76 points or fewer
Collaborator

Neither of the models listed here imports nonadaptive.c. Is this still used?

Contributor Author

nonadaptive.c is used with generate.set_integration_size. It is invoked, for example, by sasmodels.compare -ngauss=76, which sets a fixed 76-point gaussian integration.
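
The source-list substitution described in the code comment can be sketched in a few lines of Python. This is only an illustration: `use_fixed_gauss` is a hypothetical helper and the example source list is made up; the real switch is done by generate.set_integration_size, whose internals are not shown here.

```python
def use_fixed_gauss(source, n=76):
    """Return a copy of a model source list with lib/adaptive.c swapped
    for a fixed n-point gaussian scheme (hypothetical helper)."""
    fixed = []
    for name in source:
        if name == "lib/adaptive.c":
            # Replace the adaptive scheme with gauss<n> plus the nonadaptive shim.
            fixed.extend([f"lib/gauss{n}.c", "lib/nonadaptive.c"])
        else:
            fixed.append(name)
    return fixed


# Assumed example source list; the file names other than lib/adaptive.c
# are placeholders for whatever the model actually includes.
source = ["lib/polevl.c", "lib/sas_J1.c", "lib/adaptive.c", "cylinder.c"]
fixed_source = use_fixed_gauss(source, n=76)
print(fixed_source)
```

The helper returns a new list rather than editing in place, so the adaptive source list stays available for comparison runs.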

@butlerpd
Member

I agree -- this sounds like a major win and would be nice to include. I note that there is now a conflict with Octahedron_truncated.c, which I assume is due to the recent cleanup in the octahedron model? But that also raises the question of how this PR is different from PR #710 (Adaptive integration for truncated octahedron). It seems like this supersedes it?

@pkienzle
Contributor Author

The truncated octahedron had a bunch of parameter renames, and even the model name changed. This was initially built on top of the adaptive integration branch. When it looked like the adaptive integration branch wasn't going to be merged, a separate PR with just the renames was created. If this PR is applied, merge the adaptive integration one and delete the other; otherwise merge the other and redo the adaptive integration changes (including the reordering of the axes).

@pkienzle
Contributor Author

My test results using GPU and only comparing speed are quoted below. The only tests that were slower with the GPU were models that originally took 1 ms or less to calculate. I would say this 'slow-down', where a user's DREAM fit might now take 5-10 seconds instead of a second or two, won't be noticed. The speed-up for the slower models, many 10x faster than before, will, however, be appreciated.

There is also the question of how the speed of adaptive integration (with or without the GPU) relative to the existing 76-point gaussian compares to the change in accuracy. Adaptive is faster for small shapes but much slower for large shapes. This is a complicated tradeoff.

Note that accuracy improvement is highly nonlinear. It goes from being excellent to poor with a small change in integration order.
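
A toy illustration of this threshold behaviour, not the sasmodels kernels: Gauss-Legendre quadrature of cos(k x) on [-1, 1] has a large error while the number of points is too small to resolve the oscillations, and the error then falls off very quickly once the order passes a threshold. The integrand, the value k = 100 and the chosen orders are all assumptions made for the sketch.

```python
import numpy as np


def gauss_legendre_error(n, k):
    """Absolute error of n-point Gauss-Legendre quadrature of cos(k*x) on [-1, 1]."""
    x, w = np.polynomial.legendre.leggauss(n)
    estimate = np.sum(w * np.cos(k * x))
    exact = 2.0 * np.sin(k) / k  # closed-form value of the integral
    return abs(estimate - exact)


k = 100.0  # toy oscillation frequency, loosely playing the role of q*r
errors = {n: gauss_legendre_error(n, k) for n in (40, 60, 76, 100)}

for n, err in errors.items():
    print(f"n={n:4d}  abs error={err:.3e}")
```

Running this for a few values of k shows the threshold order moving roughly in proportion to k, which is the motivation for choosing the number of points from qr.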

@krzywon
Collaborator

krzywon commented May 15, 2026

I'm going ahead and merging this for the alpha

@krzywon krzywon merged commit 55c5b9c into master May 15, 2026
26 checks passed
@krzywon krzywon deleted the ticket-535-adaptive-integration branch May 15, 2026 20:53