Commit 693ca9e
[Fix] serialize worker job submissions to preserve worker_id ordering (#935)
* fix: serialize worker job submissions to preserve worker_id ordering
Worker threads were submitting Flux jobs concurrently, causing
executorlib_worker_id to not correspond to the Flux scheduling order.
This made worker_id unreliable for resource mapping (e.g. GPU assignment).
Two changes:
- Add threading.Event chain in BlockAllocationTaskScheduler so each
  worker waits for the previous worker to finish submitting before
  starting its own submission.
- Call self._future.jobid() after FluxExecutor.submit() to block until
  the job is actually registered with the Flux broker, not just queued
  in the async FluxExecutor.
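
The event chain described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `submit_in_order` and `run_workers` are hypothetical names, and the short `time.sleep` only stands in for a blocking job submission such as the `jobid()` call. Each worker waits on the event of the previous worker and sets its own event once its submission is done.

```python
import random
import threading
import time


def submit_in_order(worker_id, wait_for, done, log):
    """Hypothetical worker body: submit only after the previous worker has."""
    if wait_for is not None:
        # Block until the previous worker signals that its submission finished.
        wait_for.wait()
    try:
        # Stand-in for a blocking submission (assumption, not the Flux API).
        time.sleep(random.uniform(0, 0.01))
        log.append(worker_id)
    finally:
        # Always release the next worker, even if the submission raised.
        done.set()


def run_workers(n_workers):
    """Start n_workers threads and return the order in which they submitted."""
    log = []
    events = [threading.Event() for _ in range(n_workers)]
    threads = []
    for worker_id in range(n_workers):
        # Worker 0 has no predecessor; every other worker waits on worker_id - 1.
        wait_for = events[worker_id - 1] if worker_id > 0 else None
        threads.append(
            threading.Thread(
                target=submit_in_order,
                args=(worker_id, wait_for, events[worker_id], log),
            )
        )
    # Start in reverse so the ordering comes from the chain, not the start order.
    for thread in reversed(threads):
        thread.start()
    for thread in threads:
        thread.join()
    return log


if __name__ == "__main__":
    print(run_workers(5))
```

Setting the event in a `finally` block is a design choice of this sketch: a failed submission should not leave every later worker blocked forever.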
* Update blockallocation.py
* Update spawner_flux.py
* Update spawner_flux.py
---------
Co-authored-by: Ilgar Baghishov <ilgar@login15.chn.perlmutter.nersc.gov>
Co-authored-by: Jan Janssen <jan-janssen@users.noreply.github.com>
1 parent 2fa06c8 commit 693ca9e
2 files changed
Lines changed: 20 additions & 1 deletion
Lines changed: 18 additions & 1 deletion
| Original lines | New lines | Change |
|---|---|---|
| 1-7 | 1-7 | line 4 replaced (1 deletion, 1 addition) |
| 83-88 | 83-90 | 2 lines added (new lines 86-87) |
| 91-96 | 93-104 | 6 lines added (new lines 96-101) |
| 217-222 | 225-232 | 2 lines added (new lines 228-229) |
| 245-251 | 255-266 | 5 lines added (new lines 258-260 and 262-263) |
| 256-261 | 271-278 | 2 lines added (new lines 274-275) |
Lines changed: 2 additions & 0 deletions
| Original lines | New lines | Change |
|---|---|---|
| 146-151 | 146-153 | 2 lines added (new lines 149-150) |