Simple adaptive integration#658
Conversation
|
Example of bad triaxial ellipsoid (20% error): $ python -m sasmodels.compare background=0 triaxial_ellipsoid -ngauss=0,10000 -engine=single,single! -nq=30 -random=716856 -pars
Randomize using -random=716856
scale: 0.00343363
background: 0
sld: 11.2141
sld_solvent: 10.9297
radius_equat_minor: 41.5349
radius_equat_major: 9142.92
radius_polar: 74.0436
GPU[32] t=58.31 ms, intensity=33
DLL[32] t=12646.33 ms, intensity=33
|GPU[32]-DLL[32]| max:1.941e-03 median:1.907e-06 98%:1.849e-03 rms:6.002e-04 zero-offset:+2.745e-04
|(GPU[32]-DLL[32])/DLL[32]| max:1.884e-01 median:1.179e-06 98%:1.810e-01 rms:5.891e-02 zero-offset:+2.319e-02The fixed 76 point integration scheme works better for this example (0.3% error). Maybe it is worth exploring Lebedev and other surface quadrature schemes for these nested integrals. It is messy, though, because not all of them are of the form ∫∫ F(q) sin(θ) dφ dθ. |
|
This was briefly discussed at today's fortnightly call and tagged as of interest to the upcoming camp. Question is whether it provides a minimal change to provide a reasonable speedup. It is noted that this PR not only adds the new adaptive integation it changes all the model files that currently use the GaussXX methods with this one. Probably would have been cleaner as two separate PRs? Also at issue is what to do with the integration speedup already proposed a few years earlier and sitting in #608
|
bffeaf0 to
615df71
Compare
|
This works well for rotationally symmetric shapes that only use 1D integrals. Performance is unsatisfactory on shapes such as triaxial ellipsoid that need 2D integrals. I could revert changes for those models until we've had a chance to explore other schemes such as Lebedev or Fibonacci. |
…sasmodels into ticket-535-adaptive-integration
|
List of shapes with 2D integrals:
For these shapes the computational cost is quadratic in the number of integration points, so it is not feasible to fit large shapes accurately. Consider returning NaN for q values that require more than a million evaluations to get better than 3e-3 accuracy. If these q are dropped from the residuals calculation the fit can still proceed for the low q points but the high q points will be ignored. This may end up biasing the fit toward large shapes since the estimated log likelihood will be reduced. Triaxial ellipsoid, the five rectangular prisms and the three elliptical cylinders should be reasonably accurate for dimensions below 1 μm, though they can take several seconds per evaluation. [I only tested triaxial ellipsoid, parallelepiped and elliptical cylinder; the others follow the same code patterns so they are probably good but should still be tested.] |
|
would unrolling the integral to distribute on GPU's help the speed? |
Yes, but not much. With 15000 cores and 150 q points evaluated in parallel we could potentially see a 100x improvement over the current speed. For a 1 μm cube this would turn a 5 s evaluation into a 0.05 s evaluation. But cost is growing as (qr)² or worse, so a 10 μm cube would be back at 5 s again. We need better algorithms for USAXS/USANS calculations. |
|
... except that USAXS/USANS will be at lower q, so in practice it shouldn't be a problem. The issue is with slit resolution, which pulls from a very high q values. With A couple of options:
All of these will require icky code in the interface between resolution function and model calculations. Given that it'll break USAXS/USANS, I don't think we should merge this PR until we figure out how to handle slit resolution. |
|
Model integration is now limited to fewer than 200000 (θ, φ) grid points per q value.
Small models target the SANS range (size < 20 nm) with relative error ε < 1e-10 over q in [0.001, 1]. Large models target the USANS range (size < 20 μm) with relative error ε < 5e-5 for q < 0.002 and ε < 0.2 for q < 0.1 (slit resolution correction). Performance for most models seems acceptable, though Here's the results for the current run, showing where the models to not achieve the target performance: |
|
GPU performance on mac M2 is bad compared to cpu for small models, though large models see some benefit. Speed check cpu vs gpu on mac ocl, after setting gpu_speed_check and speed_only in |
krzywon
left a comment
There was a problem hiding this comment.
Tentatively approving this, but will hold off merging to think about the consequences of this a little more.
My test results using GPU and only comparing speed are quoted below. The only tests that were slower with the GPU were models that originally took 1 ms or less to calculate. I would say this 'slow-down', where a users DREAM fit might now take 5-10 seconds instead of a second or two won't be noticed. The speed up for the slower models, many 10x faster than before, will, however, be appreciated.
== Speed and accuracy tests for all adaptive integration models ==
- target evaluation time is 2 s (running on a mac M2 chip)
- q in [1e-5, 1] with 40 points per decade for 201 points total
- large models tested against 5000 point gaussian integration
- q=[5e-4, 1e-3, 2e-3] with tol=1e-5 relative (measured q)
- q=[0.01, 0.1] with tol=0.2 relative (slit resolution limits)
- small models tested against 5000 point gaussian integration
- q in [1e-3, 1] with 1 points per decade
!!!! These tests run very slowly --- don't use as part of CI !!!!
=== small rods: a=20 b=40 c=200 ===
core_shell_ellipsoid background=0 radius_equat_core=18.0 x_core=5.444444444444445 thick_shell=2.0 x_polar_shell=1 sld_core=0 sld_shell=1 sld_solvent=0
cpu double: 0.6 ms gpu: 1.5 ms [gpu 2.4x slower] ***
cylinder background=0 sld=1 sld_solvent=0 radius=20.0 length=200
cpu double: 0.7 ms gpu: 1.6 ms [gpu 2.4x slower] ***
ellipsoid background=0 sld=1 sld_solvent=0 radius_equatorial=20.0 radius_polar=100.0
cpu double: 0.3 ms gpu: 1.5 ms [gpu 5.6x slower] ***
flexible_cylinder background=0 length=2000 kuhn_length=200 radius=40 sld=1 sld_solvent=0
cpu double: 0.2 ms gpu: 0.9 ms [gpu 5.4x slower] ***
hollow_cylinder background=0 radius=18.0 thickness=2.0 length=200 sld=1 sld_solvent=0
cpu double: 0.8 ms gpu: 1.8 ms [gpu 2.3x slower] ***
barbell background=0 radius=10.0 radius_bell=20.0 length=125.35898384862244 sld=1 sld_solvent=0
cpu double: 12.9 ms gpu: 3.7 ms [gpu 3.5x faster]
capped_cylinder background=0 radius=10.0 radius_cap=20.0 length=194.64101615137756 sld=1 sld_solvent=0
cpu double: 10.1 ms gpu: 2.9 ms [gpu 3.5x faster]
pringle background=0 radius=20.0 thickness=200 alpha=0.5 beta=0.5 sld=1 sld_solvent=0
cpu double: 754.2 ms gpu: 85.4 ms [gpu 8.8x faster]
triaxial_ellipsoid background=0 sld=1 sld_solvent=0 radius_equat_minor=20 radius_equat_major=40 radius_polar=200
cpu double: 34.4 ms gpu: 7.8 ms [gpu 4.4x faster]=== small disks: a=180 b=200 c=40 ===
core_shell_ellipsoid background=0 radius_equat_core=90.0 x_core=0.1111111111111111 thick_shell=10.0 x_polar_shell=1 sld_core=0 sld_shell=1 sld_solvent=0
cpu double: 0.7 ms gpu: 1.9 ms [gpu 2.7x slower] ***
cylinder background=0 sld=1 sld_solvent=0 radius=100.0 length=40
cpu double: 0.7 ms gpu: 2.2 ms [gpu 3.1x slower] ***
ellipsoid background=0 sld=1 sld_solvent=0 radius_equatorial=100.0 radius_polar=20.0
cpu double: 0.3 ms gpu: 2.0 ms [gpu 6.2x slower] ***
flexible_cylinder background=0 length=400 kuhn_length=40 radius=200 sld=1 sld_solvent=0
cpu double: 0.2 ms gpu: 1.3 ms [gpu 6.1x slower] ***
hollow_cylinder background=0 radius=90.0 thickness=10.0 length=40 sld=1 sld_solvent=0
cpu double: 1.0 ms gpu: 4.7 ms [gpu 4.6x slower] ***
capped_cylinder background=0 radius=90.0 radius_cap=100.0 length=0 sld=1 sld_solvent=0
cpu double: 30.6 ms gpu: 4.5 ms [gpu 6.8x faster]
core_shell_bicelle_elliptical_belt_rough background=0 radius=81.0 x_core=1.123456790123457 thick_rim=9.0 thick_face=9.0 length=22.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0 sigma=0.9
cpu double: 79.0 ms gpu: 26.4 ms [gpu 3.0x faster]
core_shell_bicelle_elliptical background=0 radius=81.0 x_core=1.123456790123457 thick_rim=9.0 thick_face=9.0 length=22.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0
cpu double: 85.4 ms gpu: 26.1 ms [gpu 3.3x faster]
core_shell_parallelepiped background=0 sld_core=0 sld_a=1 sld_b=1 sld_c=1 sld_solvent=0 length_a=162.0 length_b=182.0 length_c=22.0 thick_rim_a=9.0 thick_rim_b=9.0 thick_rim_c=9.0
cpu double: 44.0 ms gpu: 16.2 ms [gpu 2.7x faster]
elliptical_cylinder background=0 radius_minor=90.0 axis_ratio=5.0 length=40 sld=1 sld_solvent=0
cpu double: 177.3 ms gpu: 14.8 ms [gpu 12.0x faster]
hollow_rectangular_prism_thin_walls background=0 sld=1 sld_solvent=0 length_a=180 b2a_ratio=1.1111111111111112 c2a_ratio=0.2222222222222222
cpu double: 26.7 ms gpu: 7.3 ms [gpu 3.7x faster]
hollow_rectangular_prism background=0 sld=1 sld_solvent=0 length_a=180 b2a_ratio=1.1111111111111112 c2a_ratio=0.2222222222222222 thickness=9.0
cpu double: 28.2 ms gpu: 13.7 ms [gpu 2.1x faster]
pringle background=0 radius=100.0 thickness=40 alpha=0.09999999999999998 beta=0.09999999999999998 sld=1 sld_solvent=0
cpu double: 1987.8 ms gpu: 126.2 ms [gpu 15.7x faster]
triaxial_ellipsoid background=0 sld=1 sld_solvent=0 radius_equat_minor=180 radius_equat_major=200 radius_polar=40
cpu double: 40.7 ms gpu: 6.7 ms [gpu 6.1x faster]=== small cubes: a=200 b=200 c=200 ===
core_shell_ellipsoid background=0 radius_equat_core=90.0 x_core=1.0 thick_shell=10.0 x_polar_shell=1 sld_core=0 sld_shell=1 sld_solvent=0
cpu double: 0.6 ms gpu: 1.6 ms [gpu 2.7x slower] ***
cylinder background=0 sld=1 sld_solvent=0 radius=100.0 length=200
cpu double: 0.7 ms gpu: 2.2 ms [gpu 3.0x slower] ***
ellipsoid background=0 sld=1 sld_solvent=0 radius_equatorial=100.0 radius_polar=100.0
cpu double: 0.3 ms gpu: 1.7 ms [gpu 5.7x slower] ***
flexible_cylinder background=0 length=2000 kuhn_length=200 radius=200 sld=1 sld_solvent=0
cpu double: 0.2 ms gpu: 1.2 ms [gpu 5.5x slower] ***
hollow_cylinder background=0 radius=90.0 thickness=10.0 length=200 sld=1 sld_solvent=0
cpu double: 1.1 ms gpu: 3.5 ms [gpu 3.1x slower] ***
barbell background=0 radius=100.0 radius_bell=100.0 length=0.0 sld=1 sld_solvent=0
cpu double: 20.0 ms gpu: 5.2 ms [gpu 3.9x faster]
capped_cylinder background=0 radius=100.0 radius_cap=100.0 length=0.0 sld=1 sld_solvent=0
cpu double: 19.7 ms gpu: 3.5 ms [gpu 5.6x faster]
core_shell_parallelepiped background=0 sld_core=0 sld_a=1 sld_b=1 sld_c=1 sld_solvent=0 length_a=180.0 length_b=180.0 length_c=180.0 thick_rim_a=10.0 thick_rim_b=10.0 thick_rim_c=10.0
cpu double: 31.8 ms gpu: 15.1 ms [gpu 2.1x faster]
hollow_rectangular_prism_thin_walls background=0 sld=1 sld_solvent=0 length_a=200 b2a_ratio=1.0 c2a_ratio=1.0
cpu double: 26.0 ms gpu: 7.5 ms [gpu 3.5x faster]
hollow_rectangular_prism background=0 sld=1 sld_solvent=0 length_a=200 b2a_ratio=1.0 c2a_ratio=1.0 thickness=10.0
cpu double: 34.8 ms gpu: 13.9 ms [gpu 2.5x faster]
parallelepiped background=0 sld=1 sld_solvent=0 length_a=200 length_b=200 length_c=200
cpu double: 22.7 ms gpu: 11.0 ms [gpu 2.1x faster]
pringle background=0 radius=100.0 thickness=200 alpha=0.001 beta=0.001 sld=1 sld_solvent=0
cpu double: 341.8 ms gpu: 20.7 ms [gpu 16.5x faster]
rectangular_prism background=0 sld=1 sld_solvent=0 length_a=200 b2a_ratio=1.0 c2a_ratio=1.0
cpu double: 21.1 ms gpu: 8.9 ms [gpu 2.4x faster]
triaxial_ellipsoid background=0 sld=1 sld_solvent=0 radius_equat_minor=200 radius_equat_major=200 radius_polar=200
cpu double: 28.0 ms gpu: 6.8 ms [gpu 4.1x faster]=== big rods: a=1000 b=2000 c=200000 ===
core_shell_bicelle background=0 radius=900.0 thick_rim=100.0 thick_face=100.0 length=199800.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0
cpu double: 94.9 ms gpu: 5.4 ms [gpu 17.5x faster]
core_shell_cylinder background=0 radius=900.0 thickness=100.0 length=199800.0 sld_core=0 sld_shell=1 sld_solvent=0
cpu double: 104.8 ms gpu: 5.6 ms [gpu 18.8x faster]
core_shell_ellipsoid background=0 radius_equat_core=900.0 x_core=111.0 thick_shell=100.0 x_polar_shell=1 sld_core=0 sld_shell=1 sld_solvent=0
cpu double: 43.7 ms gpu: 3.0 ms [gpu 14.6x faster]
cylinder background=0 sld=1 sld_solvent=0 radius=1000.0 length=200000
cpu double: 64.1 ms gpu: 3.9 ms [gpu 16.5x faster]
ellipsoid background=0 sld=1 sld_solvent=0 radius_equatorial=1000.0 radius_polar=100000.0
cpu double: 19.7 ms gpu: 2.9 ms [gpu 6.9x faster]
flexible_cylinder background=0 length=2000000 kuhn_length=200000 radius=2000 sld=1 sld_solvent=0
cpu double: 0.3 ms gpu: 1.3 ms [gpu 4.5x slower] ***
hollow_cylinder background=0 radius=900.0 thickness=100.0 length=200000 sld=1 sld_solvent=0
cpu double: 86.7 ms gpu: 8.3 ms [gpu 10.5x faster]
stacked_disks background=0 thick_core=18000.0 thick_layer=1000.0 radius=1000.0 n_stacking=10 sigma_d=1000.0 sld_core=0 sld_layer=1 sld_solvent=0
cpu double: 152.5 ms gpu: 11.0 ms [gpu 13.9x faster]
barbell background=0 radius=500.0 radius_bell=1000.0 length=196267.94919243112 sld=1 sld_solvent=0
! ** barbell is slow: 2.6 s for 201 points in [1e-05, 1.0]
cpu double: 2648.0 ms gpu: 131.0 ms [gpu 20.2x faster]
capped_cylinder background=0 radius=500.0 radius_cap=1000.0 length=199732.05080756888 sld=1 sld_solvent=0
! ** capped_cylinder is slow: 2.1 s for 201 points in [1e-05, 1.0]
cpu double: 2054.1 ms gpu: 127.4 ms [gpu 16.1x faster]
core_shell_bicelle_elliptical_belt_rough background=0 radius=450.0 x_core=2.111111111111111 thick_rim=50.0 thick_face=50.0 length=199900.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0 sigma=5.0
cpu double: 563.1 ms gpu: 26.7 ms [gpu 21.1x faster]
core_shell_bicelle_elliptical background=0 radius=450.0 x_core=2.111111111111111 thick_rim=50.0 thick_face=50.0 length=199900.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0
cpu double: 568.4 ms gpu: 28.1 ms [gpu 20.2x faster]
core_shell_parallelepiped background=0 sld_core=0 sld_a=1 sld_b=1 sld_c=1 sld_solvent=0 length_a=900.0 length_b=1900.0 length_c=199900.0 thick_rim_a=50.0 thick_rim_b=50.0 thick_rim_c=50.0
cpu double: 424.4 ms gpu: 17.1 ms [gpu 24.8x faster]
elliptical_cylinder background=0 radius_minor=500.0 axis_ratio=0.01 length=200000 sld=1 sld_solvent=0
cpu double: 265.6 ms gpu: 14.5 ms [gpu 18.4x faster]
flexible_cylinder_elliptical background=0 length=2000000 kuhn_length=200000 radius=1000 axis_ratio=2.0 sld=1 sld_solvent=0
cpu double: 12.1 ms gpu: 3.7 ms [gpu 3.2x faster]
hollow_rectangular_prism_thin_walls background=0 sld=1 sld_solvent=0 length_a=1000 b2a_ratio=2.0 c2a_ratio=200.0
cpu double: 425.7 ms gpu: 19.7 ms [gpu 21.6x faster]
hollow_rectangular_prism background=0 sld=1 sld_solvent=0 length_a=1000 b2a_ratio=2.0 c2a_ratio=200.0 thickness=50.0
cpu double: 1149.9 ms gpu: 39.2 ms [gpu 29.3x faster]
parallelepiped background=0 sld=1 sld_solvent=0 length_a=1000 length_b=2000 length_c=200000
cpu double: 1087.3 ms gpu: 43.5 ms [gpu 25.0x faster]
pringle background=0 radius=1000.0 thickness=200000 alpha=0.5 beta=0.5 sld=1 sld_solvent=0
! ** pringle is slow: 70.7 s for 201 points in [1e-05, 1.0]
cpu double: 70700.5 ms gpu: 179.9 ms [gpu 393.0x faster]
rectangular_prism background=0 sld=1 sld_solvent=0 length_a=1000 b2a_ratio=2.0 c2a_ratio=200.0
cpu double: 288.6 ms gpu: 10.3 ms [gpu 28.1x faster]
triaxial_ellipsoid background=0 sld=1 sld_solvent=0 radius_equat_minor=1000 radius_equat_major=2000 radius_polar=200000
cpu double: 302.4 ms gpu: 14.5 ms [gpu 20.9x faster]=== big disks: a=180000 b=200000 c=1000 ===
core_shell_bicelle background=0 radius=90000.0 thick_rim=10000.0 thick_face=10000.0 length=-19000.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0
cpu double: 202.5 ms gpu: 6.7 ms [gpu 30.3x faster]
core_shell_cylinder background=0 radius=90000.0 thickness=10000.0 length=-19000.0 sld_core=0 sld_shell=1 sld_solvent=0
cpu double: 213.7 ms gpu: 7.1 ms [gpu 30.2x faster]
core_shell_ellipsoid background=0 radius_equat_core=90000.0 x_core=-0.10555555555555556 thick_shell=10000.0 x_polar_shell=1 sld_core=0 sld_shell=1 sld_solvent=0
cpu double: 80.6 ms gpu: 4.1 ms [gpu 19.7x faster]
cylinder background=0 sld=1 sld_solvent=0 radius=100000.0 length=1000
cpu double: 125.9 ms gpu: 6.4 ms [gpu 19.6x faster]
ellipsoid background=0 sld=1 sld_solvent=0 radius_equatorial=100000.0 radius_polar=500.0
cpu double: 36.3 ms gpu: 3.8 ms [gpu 9.6x faster]
flexible_cylinder background=0 length=10000 kuhn_length=1000 radius=200000 sld=1 sld_solvent=0
cpu double: 0.6 ms gpu: 2.4 ms [gpu 4.0x slower] ***
hollow_cylinder background=0 radius=90000.0 thickness=10000.0 length=1000 sld=1 sld_solvent=0
cpu double: 198.4 ms gpu: 13.7 ms [gpu 14.4x faster]
stacked_disks background=0 thick_core=90.0 thick_layer=5.0 radius=100000.0 n_stacking=10 sigma_d=5.0 sld_core=0 sld_layer=1 sld_solvent=0
cpu double: 472.1 ms gpu: 10.2 ms [gpu 46.5x faster]
barbell background=0 radius=90000.0 radius_bell=100000.0 length=0 sld=1 sld_solvent=0
! ** barbell is slow: 7.8 s for 201 points in [1e-05, 1.0]
cpu double: 7831.3 ms gpu: 145.6 ms [gpu 53.8x faster]
capped_cylinder background=0 radius=90000.0 radius_cap=100000.0 length=0 sld=1 sld_solvent=0
! ** capped_cylinder is slow: 7.0 s for 201 points in [1e-05, 1.0]
cpu double: 7014.7 ms gpu: 134.7 ms [gpu 52.1x faster]
core_shell_bicelle_elliptical_belt_rough background=0 radius=81000.0 x_core=1.123456790123457 thick_rim=9000.0 thick_face=9000.0 length=-17000.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0 sigma=900.0
cpu double: 1976.7 ms gpu: 64.0 ms [gpu 30.9x faster]
core_shell_bicelle_elliptical background=0 radius=81000.0 x_core=1.123456790123457 thick_rim=9000.0 thick_face=9000.0 length=-17000.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0
cpu double: 1906.4 ms gpu: 61.7 ms [gpu 30.9x faster]
core_shell_parallelepiped background=0 sld_core=0 sld_a=1 sld_b=1 sld_c=1 sld_solvent=0 length_a=162000.0 length_b=182000.0 length_c=-17000.0 thick_rim_a=9000.0 thick_rim_b=9000.0 thick_rim_c=9000.0
cpu double: 1031.6 ms gpu: 19.1 ms [gpu 53.9x faster]
elliptical_cylinder background=0 radius_minor=90000.0 axis_ratio=200.0 length=1000 sld=1 sld_solvent=0
! ** elliptical_cylinder is slow: 2.0 s for 201 points in [1e-05, 1.0]
cpu double: 2025.1 ms gpu: 39.4 ms [gpu 51.4x faster]
flexible_cylinder_elliptical background=0 length=10000 kuhn_length=1000 radius=180000 axis_ratio=1.1111111111111112 sld=1 sld_solvent=0
cpu double: 141.8 ms gpu: 7.7 ms [gpu 18.4x faster]
hollow_rectangular_prism_thin_walls background=0 sld=1 sld_solvent=0 length_a=180000 b2a_ratio=1.1111111111111112 c2a_ratio=0.005555555555555556
cpu double: 765.7 ms gpu: 10.0 ms [gpu 76.4x faster]
hollow_rectangular_prism background=0 sld=1 sld_solvent=0 length_a=180000 b2a_ratio=1.1111111111111112 c2a_ratio=0.005555555555555556 thickness=9000.0
cpu double: 863.0 ms gpu: 36.8 ms [gpu 23.4x faster]
parallelepiped background=0 sld=1 sld_solvent=0 length_a=180000 length_b=200000 length_c=1000
cpu double: 557.8 ms gpu: 13.1 ms [gpu 42.6x faster]
pringle background=0 radius=100000.0 thickness=1000 alpha=0.09999999999999998 beta=0.09999999999999998 sld=1 sld_solvent=0
! ** pringle is slow: 31.5 s for 201 points in [1e-05, 1.0]
cpu double: 31505.9 ms gpu: 469.0 ms [gpu 67.2x faster]
rectangular_prism background=0 sld=1 sld_solvent=0 length_a=180000 b2a_ratio=1.1111111111111112 c2a_ratio=0.005555555555555556
cpu double: 371.4 ms gpu: 10.3 ms [gpu 36.1x faster]
triaxial_ellipsoid background=0 sld=1 sld_solvent=0 radius_equat_minor=180000 radius_equat_major=200000 radius_polar=1000
cpu double: 287.5 ms gpu: 16.2 ms [gpu 17.8x faster]=== big cubes: a=200000 b=200000 c=200000 ===
core_shell_bicelle background=0 radius=90000.0 thick_rim=10000.0 thick_face=10000.0 length=180000.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0
cpu double: 157.0 ms gpu: 5.7 ms [gpu 27.5x faster]
core_shell_cylinder background=0 radius=90000.0 thickness=10000.0 length=180000.0 sld_core=0 sld_shell=1 sld_solvent=0
cpu double: 135.3 ms gpu: 5.9 ms [gpu 22.8x faster]
core_shell_ellipsoid background=0 radius_equat_core=90000.0 x_core=1.0 thick_shell=10000.0 x_polar_shell=1 sld_core=0 sld_shell=1 sld_solvent=0
cpu double: 50.2 ms gpu: 3.6 ms [gpu 14.1x faster]
cylinder background=0 sld=1 sld_solvent=0 radius=100000.0 length=200000
cpu double: 80.1 ms gpu: 4.5 ms [gpu 17.9x faster]
ellipsoid background=0 sld=1 sld_solvent=0 radius_equatorial=100000.0 radius_polar=100000.0
cpu double: 21.1 ms gpu: 2.2 ms [gpu 9.4x faster]
flexible_cylinder background=0 length=2000000 kuhn_length=200000 radius=200000 sld=1 sld_solvent=0
cpu double: 0.3 ms gpu: 1.3 ms [gpu 3.7x slower] ***
hollow_cylinder background=0 radius=90000.0 thickness=10000.0 length=200000 sld=1 sld_solvent=0
cpu double: 100.7 ms gpu: 8.6 ms [gpu 11.7x faster]
stacked_disks background=0 thick_core=18000.0 thick_layer=1000.0 radius=100000.0 n_stacking=10 sigma_d=1000.0 sld_core=0 sld_layer=1 sld_solvent=0
cpu double: 229.2 ms gpu: 10.9 ms [gpu 21.0x faster]
barbell background=0 radius=100000.0 radius_bell=100000.0 length=0.0 sld=1 sld_solvent=0
! ** barbell is slow: 3.0 s for 201 points in [1e-05, 1.0]
cpu double: 3019.8 ms gpu: 131.3 ms [gpu 23.0x faster]
capped_cylinder background=0 radius=100000.0 radius_cap=100000.0 length=0.0 sld=1 sld_solvent=0
! ** capped_cylinder is slow: 3.1 s for 201 points in [1e-05, 1.0]
cpu double: 3073.2 ms gpu: 133.3 ms [gpu 23.1x faster]
core_shell_bicelle_elliptical_belt_rough background=0 radius=90000.0 x_core=1.0 thick_rim=10000.0 thick_face=10000.0 length=180000.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0 sigma=1000.0
cpu double: 853.5 ms gpu: 31.3 ms [gpu 27.3x faster]
core_shell_bicelle_elliptical background=0 radius=90000.0 x_core=1.0 thick_rim=10000.0 thick_face=10000.0 length=180000.0 sld_core=0 sld_face=1 sld_rim=1 sld_solvent=0
cpu double: 841.5 ms gpu: 29.1 ms [gpu 28.9x faster]
core_shell_parallelepiped background=0 sld_core=0 sld_a=1 sld_b=1 sld_c=1 sld_solvent=0 length_a=180000.0 length_b=180000.0 length_c=180000.0 thick_rim_a=10000.0 thick_rim_b=10000.0 thick_rim_c=10000.0
cpu double: 505.0 ms gpu: 17.8 ms [gpu 28.3x faster]
elliptical_cylinder background=0 radius_minor=100000.0 axis_ratio=1.0 length=200000 sld=1 sld_solvent=0
cpu double: 506.8 ms gpu: 15.2 ms [gpu 33.4x faster]
flexible_cylinder_elliptical background=0 length=2000000 kuhn_length=200000 radius=200000 axis_ratio=1.0 sld=1 sld_solvent=0
cpu double: 56.0 ms gpu: 5.6 ms [gpu 10.0x faster]
hollow_rectangular_prism_thin_walls background=0 sld=1 sld_solvent=0 length_a=200000 b2a_ratio=1.0 c2a_ratio=1.0
cpu double: 360.5 ms gpu: 8.3 ms [gpu 43.2x faster]
hollow_rectangular_prism background=0 sld=1 sld_solvent=0 length_a=200000 b2a_ratio=1.0 c2a_ratio=1.0 thickness=10000.0
cpu double: 528.7 ms gpu: 15.6 ms [gpu 34.0x faster]
parallelepiped background=0 sld=1 sld_solvent=0 length_a=200000 length_b=200000 length_c=200000
cpu double: 307.3 ms gpu: 11.8 ms [gpu 26.1x faster]
pringle background=0 radius=100000.0 thickness=200000 alpha=0.001 beta=0.001 sld=1 sld_solvent=0
! ** pringle is slow: 12.2 s for 201 points in [1e-05, 1.0]
cpu double: 12232.7 ms gpu: 266.8 ms [gpu 45.9x faster]
rectangular_prism background=0 sld=1 sld_solvent=0 length_a=200000 b2a_ratio=1.0 c2a_ratio=1.0
cpu double: 274.2 ms gpu: 9.5 ms [gpu 28.8x faster]
triaxial_ellipsoid background=0 sld=1 sld_solvent=0 radius_equat_minor=200000 radius_equat_major=200000 radius_polar=200000
cpu double: 179.7 ms gpu: 18.0 ms [gpu 10.0x faster]
| // To force a fixed rather than adaptive integration scheme, replace [..., "lib/adaptive.c", ...] | ||
| // with [..., "lib/gauss<n>.c", "lib/nonadaptive.c", ...] in your source lists. | ||
|
|
||
| // Hack for barbell and capped cylinder keeps the outer integral to 76 points or fewer |
There was a problem hiding this comment.
Neither of the models listed here are importing nonadaptive.c. Is this still used?
There was a problem hiding this comment.
nonadaptive.c is used with generate.set_integration_size. It is invoked with sasmodels.compare -ngauss=76 for example to set a fixed order 76 gaussian integration.
|
I agree -- this sounds like a major win and would be nice to include. I note that there is now a conflict with |
|
truncated octahedron had a bunch of renames of parameters and even the model name. This was initially set on top of the adaptive integration branch. When it looked like the adaptive integration branch wasn't going to be merged, a separate pr with just the renames was created. Merge the adaptive integration one and delete the other if this PR is applied, otherwise merge the other and redo the adaptive integration changes (including the reordering of the axes). |
There is also the speed of adaptive (or adaptive with gpu) relative to the existing 76-point gaussian (it is faster for small shapes but much slower for large shapes) compared to the change in accuracy. This is a complicated tradeoff. Note that accuracy improvement is highly nonlinear. It goes from being excellent to poor with a small change in integration order. |
|
I'm going ahead and merging this for the alpha |
Ready for review
This replaces PR #608 using a simple heuristic based on qr to set the number of integration points. (See #248)
Accuracy is usually comparable to a 10000 point gaussian integration. The target is 1e-10 difference for small shapes (20 nm), and 2e-5 for large shapes (20 μm), though it isn't always achieved. For example, the following has a much as 3.5% error over some of its range:
Because we include a 20 point gaussian integration scheme, speed is frequently faster than the fixed 76 point gaussian integration in master, at least for small shapes. For large shapes it can be several times slower than the fixed scheme, though the increase in accuracy easily justifies the cost.
I've added
explore/check_adaptive.pyto systematically check all models with difference aspect ratios (rod, disk, cube) for both small and large shapes. Because the cost for a 10000 point gaussian with nested integration is so high accuracy checks are limited to a few points per model at high q.The grid size for 2D adaptive integration is now limited to 100000 points and the outer loop is limited to 500 points, so the following are possible: (500x76, 500x20, 76x500, 76x76, 76x20, 20x5000, 20x500, 20x76, 20x20)
Latex triaxial ellipsoid model (see example/simul_fit.py) is now acceptable, albeit much slower than the pure 76x76 grid. Accuracy is improved compared to 76x76. Because 20x20 grids are available speed is usually much faster, at least for CPU models. GPU may still need some tuning.
Status