Skip to content

CA-426408: Consider chunks in task expiry calculation#7074

Merged
changlei-li merged 2 commits into
xapi-project:masterfrom
changlei-li:private/changleli/calculate-task-expiry
May 20, 2026
Merged

CA-426408: Consider chunks in task expiry calculation#7074
changlei-li merged 2 commits into
xapi-project:masterfrom
changlei-li:private/changleli/calculate-task-expiry

Conversation

@changlei-li
Copy link
Copy Markdown
Contributor

A test of VM with 239 VBDs (1 system disk + 238 iSCSI VBDs) fails because of Timed out while waiting on task xxx. In the fail task:

VM.start (task 61)
  → perform_atomics([..., Parallel("Devices.plug"), ...])
    → perform_atomic(Parallel("Devices.plug"))
      ↓
      Parallel:task=61.atoms=2.(Devices.plug (no qemu))
      │
      │ queue_atomics_and_wait(ops=[atom0, atom1])
      │   expiration(atom0) = atomic_expires_after(Nested_parallel(239))
      │                     = Float.max(3600, ...) = 3600s  ← BUG
      │   event_wait(task=79, timeout=3600s)
      │
      ├─ task 79 (worker): Nested_parallel:task=79.atoms=239.(VBDs.activate_epoch_and_plug RW)
      │    │
      │    │ queue_atomics_and_wait(239 ops, chunks of 10)
      │    ├─ chunk 0:  10 × Serial(VBD_set_active, VBD_epoch_begin, VBD_plug)
      │    ├─ chunk 1:  10 × ...
      │    ├─ ...
      │    └─ chunk 23:  9 × ...
      │    └─ total: 82 minutes
      │
      └─ task 80 (worker): Serial(VIF_set_active, VIF_plug)

The task expiry calculation for parallel and nested_parallel doesn't consider the multi chunks. Just get the max expiry in the parallel task list. In fact the parallel task split to 24 chunks to run.

A test of VM with 239 VBDs (1 system disk + 238 iSCSI VBDs)
fails because of `Timed out while waiting on task xxx`.
In the fail task:
```
VM.start (task 61)
  → perform_atomics([..., Parallel("Devices.plug"), ...])
    → perform_atomic(Parallel("Devices.plug"))
      ↓
      Parallel:task=61.atoms=2.(Devices.plug (no qemu))
      │
      │ queue_atomics_and_wait(ops=[atom0, atom1])
      │   expiration(atom0) = atomic_expires_after(Nested_parallel(239))
      │                     = Float.max(3600, ...) = 3600s  ← BUG
      │   event_wait(task=79, timeout=3600s)
      │
      ├─ task 79 (worker): Nested_parallel:task=79.atoms=239.(VBDs.activate_epoch_and_plug RW)
      │    │
      │    │ queue_atomics_and_wait(239 ops, chunks of 10)
      │    ├─ chunk 0:  10 × Serial(VBD_set_active, VBD_epoch_begin, VBD_plug)
      │    ├─ chunk 1:  10 × ...
      │    ├─ ...
      │    └─ chunk 23:  9 × ...
      │    └─ total: 82 minutes
      │
      └─ task 80 (worker): Serial(VIF_set_active, VIF_plug)
```
The task expiry calculation for parallel and nested_parallel
doesn't consider the multi chunks. Just get the max expiry in
the parallel task list. In fact the parallel task split to
24 chunks to run.

Signed-off-by: Changlei Li <changlei.li@citrix.com>
@changlei-li changlei-li requested a review from psafont May 15, 2026 06:22
@changlei-li
Copy link
Copy Markdown
Contributor Author

More log

2026-04-18T17:27:30.471958+00:00 gpuuk-43-04d xenopsd-xc: [debug||10 |Async.VM.start R:817db30a35e3|xenops_server] begin_Parallel:task=61.atoms=2.(Devices.plug (no qemu) VM=76c45aa2-26b8-29bb-6ce0-21b74e2fa867)
...
2026-04-18T17:27:30.489186+00:00 gpuuk-43-04d xenopsd-xc: [debug||11 |Parallel:task=61.atoms=2.(Devices.plug (no qemu) VM=76c45aa2-26b8-29bb-6ce0-21b74e2fa867)|xenops_server] begin_Nested_parallel:task=79.atoms=239.(VBDs.activate_epoch_and_plug RW VM=76c45aa2-26b8-29bb-6ce0-21b74e2fa867)
...
2026-04-18T18:27:30.831718+00:00 gpuuk-43-04d xenopsd-xc: [debug||10 |Async.VM.start R:817db30a35e3|xenops_server] end_Parallel:task=61.atoms=2.(Devices.plug (no qemu) VM=76c45aa2-26b8-29bb-6ce0-21b74e2fa867)
2026-04-18T18:27:30.831811+00:00 gpuuk-43-04d xenopsd-xc: [error||10 |Async.VM.start R:817db30a35e3|xenops_server] Timed out while waiting on task 79 (Parallel:task=61.atoms=2.(Devices.plug (no qemu) VM=76c45aa2-26b8-29bb-6ce0-21b74e2fa867))
...
2026-04-18T18:49:52.149702+00:00 gpuuk-43-04d xenopsd-xc: [debug||11 |Parallel:task=61.atoms=2.(Devices.plug (no qemu) VM=76c45aa2-26b8-29bb-6ce0-21b74e2fa867)|xenops_server] end_Nested_parallel:task=79.atoms=239.(VBDs.activate_epoch_and_plug RW VM=76c45aa2-26b8-29bb-6ce0-21b74e2fa867)

@last-genius
Copy link
Copy Markdown
Contributor

I wonder if this PR allows to increase the number of VBDs in the limit test back to 255? I remember testing an increased VBD limit with #6577 , and limit tests failing because of timeouts.

@lindig
Copy link
Copy Markdown
Contributor

lindig commented May 15, 2026

Could you summarise how the timeout was calculated before and after the change? I have not understood the difference. Do we want a timeout for the entire set of of plus, or per individual plug? And is the new idea to batch plus into groups, each of which has now a timeout?

Comment thread ocaml/xenopsd/lib/xenops_server.ml Outdated
(List.length ops + max_parallel_atoms - 1) / max_parallel_atoms
in
let per_chunk =
List.map atomic_expires_after ops |> List.fold_left Float.max 0.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the max of the individual ops. It's one of the ops, not a chunk.

Copy link
Copy Markdown
Member

@minglumlu minglumlu May 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, if I understand correctly, it could be something like:

let expires_of = List.map atomic_expires_after in
let max_of = List.fold_left Float.max 0. in
Xenops_utils.chunks max_parallel_atoms ops
|> List.map (fun ops -> ops |> expires_of |> max_of)
|> List.fold_left ( +. ) 0.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. This is more accurate.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it correct to say the timeout calculation follows the structure of the operation? For parallel ops take the maximum, for sequential ops take the sum - and this is evaluated bottom up over this tree of operations.

Copy link
Copy Markdown
Member

@minglumlu minglumlu May 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the parallel ops are split as chunks. The real parallel happens within each chunk. Chunks are serialized. Like there are 100 threads in parallel from programmer's point of view, but actually only 10 CPU cores under the hood.

@changlei-li
Copy link
Copy Markdown
Contributor Author

Could you summarise how the timeout was calculated before and after the change? I have not understood the difference. Do we want a timeout for the entire set of of plus, or per individual plug? And is the new idea to batch plus into groups, each of which has now a timeout?

Hi @lindig. In the original code
For task61, it has atom0: Nested_parallel(239 tasks). To calculate the timeout, it go though the Nested_parallel 239 tasks. Since every task has 3 Serial ops. So every task 60 min. Then it get the max timeout of all the task list, that is, 60 min as task61 timeout.

task79 is atom0 of task61. When it runs, it splits to 24 chunks, and executes in serial. In one chunks, 10 tasks executes in parallel. In every chunk, it calculates timeout, (will also be 60 min), so task 79 can complete(finally takes 82 min in the test), but task 61 can't wait so long.

In the view of task61, it thinks all Nested_parallel(239 tasks) execute in parallel. But that is wrong. It splits to 24 chunks in serial.

Copy link
Copy Markdown
Member

@psafont psafont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code previously expected all the tasks to happen in parallel, instead of being serialized in chunks, where tasks are parallelized.

The new approach is more lax than necessary when calculating the timeout. This is because uses the timeout for the longest atom for each chunk, instead of taking the longest of each chunk, and then adding it. This might cause timeouts to trigger much later if a task gets hung, but it's difficult to know if it's going to be a big issue or not.

@changlei-li
Copy link
Copy Markdown
Contributor Author

The code previously expected all the tasks to happen in parallel, instead of being serialized in chunks, where tasks are parallelized.

The new approach is more lax than necessary when calculating the timeout. This is because uses the timeout for the longest atom for each chunk, instead of taking the longest of each chunk, and then adding it. This might cause timeouts to trigger much later if a task gets hung, but it's difficult to know if it's going to be a big issue or not.

@psafont I will change to the calculation of Ming's comment, i.e. taking the longest of each chunk, and then adding it you mentioned and have a test.

Signed-off-by: Changlei Li <changlei.li@citrix.com>
@changlei-li
Copy link
Copy Markdown
Contributor Author

test:
New timeout for 24 chunks is 86400. No timeout error now.

2026-05-20T00:52:23.772418+00:00 genuk-21-09d xenopsd-xc: [debug||34 |Async.VM.start R:8491a7289fd6|xenops_server] begin_Parallel:task=64.atoms=2.(Devices.plug (no qemu) VM=839491b4-3069-4e26-e609-2e6718f0d3a5)
2026-05-20T00:52:23.779293+00:00 genuk-21-09d xenopsd-xc: [debug||34 |Async.VM.start R:8491a7289fd6|xenops_server] Task 90 expires_after=86400.000000
2026-05-20T00:52:23.782596+00:00 genuk-21-09d xenopsd-xc: [debug||43 |Parallel:task=64.atoms=2.(Devices.plug (no qemu) VM=839491b4-3069-4e26-e609-2e6718f0d3a5)|xenops_server] begin_Nested_parallel:task=90.atoms=239.(VBDs.activate_epoch_and_plug RW VM=839491b4-3069-4e26-e609-2e6718f0d3a5)
...
2026-05-20T03:23:22.279141+00:00 genuk-21-09d xenopsd-xc: [debug||43 |Parallel:task=64.atoms=2.(Devices.plug (no qemu) VM=839491b4-3069-4e26-e609-2e6718f0d3a5)|xenops_server] end_Nested_parallel:task=90.atoms=239.(VBDs.activate_epoch_and_plug RW VM=839491b4-3069-4e26-e609-2e6718f0d3a5)
2026-05-20T03:23:22.282805+00:00 genuk-21-09d xenopsd-xc: [debug||34 |Async.VM.start R:8491a7289fd6|xenops_server] end_Parallel:task=64.atoms=2.(Devices.plug (no qemu) VM=839491b4-3069-4e26-e609-2e6718f0d3a5)

@changlei-li changlei-li added this pull request to the merge queue May 20, 2026
Merged via the queue into xapi-project:master with commit 54a5bce May 20, 2026
16 checks passed
@changlei-li changlei-li deleted the private/changleli/calculate-task-expiry branch May 20, 2026 05:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants