Agenda
- Alternative Random Number Generators Test Results
- EET
- Poisson sampling, RNG implementation strategy
- EET investigation strategy
- Task check-ins (5/12)
- ActivitySim Application & Analysis Guide?
- Explicit Telecommute Update?
- Model Calibration Partial Automation Design?
- Share existing calibration notebooks in consortium directory
- Telecommute Frequency Model Update and Tests (5/19)
Notes
Admin
Joe noted that ActivitySim meetings are being migrated to a new Zoom account to resolve a concurrent-meeting conflict in the Zephyr license. Participants should expect a batch of revised meeting invites with updated links over the coming days.
Alternative Random Number Generators Test Results (RSG)
David presented an overview of why RNG performance is a bottleneck for Explicit Error Terms (EET) and summarized his testing of alternative approaches.
Why EET draws so many more random numbers: Legacy ActivitySim draws one random number per chooser to make a choice. EET draws one random number per alternative per chooser (a Gumbel draw), so the number of draws grows with the size of the choice set, which is especially large in two-zone models. For SANDAG, with ~500,000 workers, 6,000 TAZs, 30 sampled TAZs, and 30,000 MAZs, this produces on the order of 105 billion draws, making the current RNG a serious runtime bottleneck.
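The notes do not record how the 105 billion figure was derived. The sketch below shows one breakdown that reproduces it from the quoted inputs; the breakdown itself (one draw per TAZ for each of the 30 samples, plus one draw per MAZ) is an assumption, not something stated in the meeting.

```python
# Inputs quoted in the notes (approximate SANDAG figures).
workers = 500_000
tazs = 6_000
sampled_tazs = 30
mazs = 30_000

# ASSUMED breakdown (not stated in the notes): each of the 30 TAZ samples
# needs one Gumbel draw per TAZ, plus one draw per MAZ for the MAZ-level choice.
draws_per_worker = tazs * sampled_tazs + mazs
total_draws = workers * draws_per_worker

print(f"{draws_per_worker:,} draws per worker, {total_draws:,} draws in total")
```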
Why EET produces better scenario results: Because error terms are held constant across scenarios, only changes in deterministic components of the utility shift a person's choice. In the legacy approach, the same random number applied to shifted probability distributions can produce spurious mode switches — EET eliminates this instability.
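The stability argument can be illustrated with a toy example. This is a minimal sketch, not ActivitySim code: the utilities, the three-alternative choice set, and all function names are hypothetical. Only alternative 0 improves between scenarios, so any chooser who switches to something other than alternative 0 counts as a spurious switch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_choosers, n_alts = 10_000, 3

# Random inputs are drawn once and reused in both scenarios.
u = rng.random(n_choosers)                    # legacy: one uniform per chooser
eps = rng.gumbel(size=(n_choosers, n_alts))   # EET: one Gumbel per chooser and alternative

def legacy_choice(v, u):
    """Monte Carlo choice: invert the cumulative logit probabilities at u."""
    p = np.exp(v - v.max())
    p /= p.sum()
    idx = (u[:, None] >= np.cumsum(p)).sum(axis=1)
    return np.minimum(idx, len(v) - 1)  # guard against round-off in the last bin

def eet_choice(v, eps):
    """EET choice: pick the alternative with the highest total utility v + eps."""
    return np.argmax(v + eps, axis=1)

def spurious_switches(before, after, improved=0):
    """Count choosers who changed choice but did not move to the improved alternative."""
    return int(((before != after) & (after != improved)).sum())

base = np.array([0.0, 0.0, 0.0])    # hypothetical base-scenario utilities
build = np.array([0.5, 0.0, 0.0])   # hypothetical scenario: only alternative 0 improves

legacy_spurious = spurious_switches(legacy_choice(base, u), legacy_choice(build, u))
eet_spurious = spurious_switches(eet_choice(base, eps), eet_choice(build, eps))
print("legacy spurious switches:", legacy_spurious)
print("EET spurious switches:", eet_spurious)
```

With fixed error terms, raising one alternative's utility can only pull choosers toward that alternative, so the EET count is zero by construction. In the legacy approach the cumulative probability boundaries move, so some choosers flip between alternatives whose utilities did not change.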
Hash-based stateless RNG: David proposed and tested a stateless, hash-based RNG to avoid costly NumPy reseeding. The approach combines a global seed, chooser ID, model ID, alternative ID, and offset into a 64-bit integer, scrambles it with a fast hash mixer, and draws a uniform value u (convertible to a Gumbel draw via the double-log transform -ln(-ln(u))). Testing showed this approach passes uniformity and correlation checks, preserves ActivitySim's reproducibility invariant (same inputs → same output), and is substantially faster than the current reseeding-heavy approach.
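A minimal sketch of the idea follows. It is not David's implementation: the notes do not name the hash mixer or say how the IDs are combined, so the SplitMix64 finalizer, the chained XOR-and-mix combination, and the function names used here are all assumptions chosen for illustration.

```python
import numpy as np

def mix64(x):
    """SplitMix64 finalizer (assumed mixer); uint64 array arithmetic wraps modulo 2**64."""
    z = np.atleast_1d(np.asarray(x, dtype=np.uint64)) + np.uint64(0x9E3779B97F4A7C15)
    z = (z ^ (z >> np.uint64(30))) * np.uint64(0xBF58476D1CE4E5B9)
    z = (z ^ (z >> np.uint64(27))) * np.uint64(0x94D049BB133111EB)
    return z ^ (z >> np.uint64(31))

def stateless_uniform(seed, chooser_id, model_id, alt_id, offset=0):
    """Uniform draws in the open interval (0, 1) that depend only on the IDs."""
    h = mix64(seed)
    # Assumed combination scheme: fold each ID in with XOR, then re-mix.
    for part in (chooser_id, model_id, alt_id, offset):
        h = mix64(h ^ np.asarray(part, dtype=np.uint64))
    # Use the top 52 bits and add 0.5 so the result is never exactly 0 or 1.
    return ((h >> np.uint64(12)).astype(np.float64) + 0.5) * 2.0**-52

def stateless_gumbel(seed, chooser_id, model_id, alt_id, offset=0):
    """Standard Gumbel draws via the double-log transform."""
    return -np.log(-np.log(stateless_uniform(seed, chooser_id, model_id, alt_id, offset)))

# One draw per chooser and alternative, using broadcasting instead of loops.
chooser_ids = np.arange(1000)[:, None]
alt_ids = np.arange(30)[None, :]
draws = stateless_gumbel(42, chooser_ids, model_id=7, alt_id=alt_ids)
```

Because no generator state is kept, a chooser's draws do not depend on which other choosers are in the table or on processing order; drawing for the first ten choosers alone returns the same values as the first ten rows above.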
RNG Investigation — Jeff Newman (Driftless Analytics)
Jeff independently pursued a complementary approach: using modern, compact-state RNGs (PCG64 or SFC64) with fast hash-based seeding, but retaining per-chooser state in memory to avoid reseeding within a step. Key points:
- The Mersenne Twister (MT19937) requires ~2.5 KB of state per chooser, making in-memory state retention impractical at scale. PCG64/SFC64 require only 16 bytes, making it feasible to hold state for all choosers simultaneously.
- By not reseeding within a step, the speed benefit of faster generators is fully realized — Jeff measured 50–100× speedup in random number draws alone.
- Jeff noted a theoretical concern about hash-based generators relative to purpose-built algorithms (risk of correlated streams at scale), but argued this is not a practical issue for ActivitySim, since each chooser draws only thousands of numbers from its seed, not billions. Because PCG64 is a recognized standard, adopting it would also offer additional protection against external criticism.
Both David and Jeff confirmed their benchmarks measured raw draw speed only, not full ActivitySim runs.
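Jeff's pattern of seeding once and keeping per-chooser state can be sketched with NumPy's built-in PCG64 bit generator. This is an illustration only: the notes do not describe his seeding function, so `np.random.SeedSequence` stands in for the "fast hash-based seeding", and the helper name `make_chooser_streams` is hypothetical.

```python
import numpy as np

def make_chooser_streams(global_seed, chooser_ids):
    """Build one PCG64 generator per chooser, seeded once from (global_seed, chooser_id)."""
    return {
        int(cid): np.random.Generator(
            np.random.PCG64(np.random.SeedSequence([global_seed, int(cid)]))
        )
        for cid in chooser_ids
    }

# Seed once at the start of a step...
streams = make_chooser_streams(42, range(5))

# ...then draw repeatedly; each call continues the chooser's stream without reseeding.
first = {cid: gen.random(3) for cid, gen in streams.items()}
second = {cid: gen.random(3) for cid, gen in streams.items()}
```

Each chooser's stream depends only on the global seed and the chooser ID, so the draws are reproducible regardless of which other choosers are present.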
Discussion
Joel raised the key open question: how much of the EET runtime penalty actually comes from random number draws, and how much from log calculations and data structure overhead? Jan confirmed that a significant share of current EET runtime stems from an unvectorized implementation: random numbers are generated in Python loops and materialized as dense in-memory arrays before utilities are computed. This cost can be substantially reduced without touching the RNG itself.
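A small timing harness along these lines could help separate the costs Joel asked about. It is a sketch under stated assumptions, not ActivitySim code: the array sizes are arbitrary, the function names are hypothetical, and it measures only raw draws and the double-log transform, not utilities or data structures.

```python
import time
import numpy as np

def gumbel_loop(rng, n_choosers, n_alts):
    """Unvectorized pattern: one Python-level RNG call per chooser."""
    out = np.empty((n_choosers, n_alts))
    for i in range(n_choosers):
        out[i] = -np.log(-np.log(rng.random(n_alts)))
    return out

def gumbel_vectorized(rng, n_choosers, n_alts):
    """Vectorized pattern: one RNG call and one transform for the whole table."""
    return -np.log(-np.log(rng.random((n_choosers, n_alts))))

def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

n_choosers, n_alts = 2_000, 30  # arbitrary sizes for illustration

loop_draws, t_loop = timed(gumbel_loop, np.random.default_rng(1), n_choosers, n_alts)
vec_draws, t_vec = timed(gumbel_vectorized, np.random.default_rng(1), n_choosers, n_alts)

# Split the vectorized cost into the uniform draw and the log transform.
uniforms, t_draw = timed(np.random.default_rng(1).random, (n_choosers, n_alts))
_, t_log = timed(lambda x: -np.log(-np.log(x)), uniforms)

print(f"loop: {t_loop:.4f}s  vectorized: {t_vec:.4f}s")
print(f"uniform draw: {t_draw:.4f}s  log transform: {t_log:.4f}s")
```

Both functions consume the generator in the same order, so they return the same values from the same seed; only the time spent differs.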
Jan noted that the Poisson sampling implementation would eliminate the largest current bottleneck (sampling), leaving the final choice step as the primary remaining difference from Monte Carlo. The group agreed that profiling is the essential next step to isolate costs precisely before committing to an RNG implementation path.
Joe summarized the consensus: proceed with Poisson sampling integration (low cost, clear benefit), and continue the RNG investigation in parallel, pending profiling results.
Action Items
| Action | Owner |
|---|---|
| Proceed with Poisson sampling implementation | Jan Zill (Joe to confirm funding with exec team offline) |
| David to send email aligning on scope for remaining RNG investigation task | David Hensle → Joe Castiglione |
| Jan to share profiling code, timings, and prototype implementations at Thursday engineering meeting | Jan Zill |
| David to run additional profiling on SANDAG model | David Hensle |