CA-426408: Consider chunks in task expiry calculation#7074
Conversation
A test of VM with 239 VBDs (1 system disk + 238 iSCSI VBDs)
fails because of `Timed out while waiting on task xxx`.
In the fail task:
```
VM.start (task 61)
→ perform_atomics([..., Parallel("Devices.plug"), ...])
→ perform_atomic(Parallel("Devices.plug"))
↓
Parallel:task=61.atoms=2.(Devices.plug (no qemu))
│
│ queue_atomics_and_wait(ops=[atom0, atom1])
│ expiration(atom0) = atomic_expires_after(Nested_parallel(239))
│ = Float.max(3600, ...) = 3600s ← BUG
│ event_wait(task=79, timeout=3600s)
│
├─ task 79 (worker): Nested_parallel:task=79.atoms=239.(VBDs.activate_epoch_and_plug RW)
│ │
│ │ queue_atomics_and_wait(239 ops, chunks of 10)
│ ├─ chunk 0: 10 × Serial(VBD_set_active, VBD_epoch_begin, VBD_plug)
│ ├─ chunk 1: 10 × ...
│ ├─ ...
│ └─ chunk 23: 9 × ...
│ └─ total: 82 minutes
│
└─ task 80 (worker): Serial(VIF_set_active, VIF_plug)
```
The task expiry calculation for parallel and nested_parallel
doesn't consider the multi chunks. Just get the max expiry in
the parallel task list. In fact the parallel task split to
24 chunks to run.
Signed-off-by: Changlei Li <changlei.li@citrix.com>
|
More log |
|
I wonder if this PR allows to increase the number of VBDs in the limit test back to 255? I remember testing an increased VBD limit with #6577 , and limit tests failing because of timeouts. |
|
Could you summarise how the timeout was calculated before and after the change? I have not understood the difference. Do we want a timeout for the entire set of of plus, or per individual plug? And is the new idea to batch plus into groups, each of which has now a timeout? |
| (List.length ops + max_parallel_atoms - 1) / max_parallel_atoms | ||
| in | ||
| let per_chunk = | ||
| List.map atomic_expires_after ops |> List.fold_left Float.max 0. |
There was a problem hiding this comment.
This is the max of the individual ops. It's one of the ops, not a chunk.
There was a problem hiding this comment.
For example, if I understand correctly, it could be something like:
let expires_of = List.map atomic_expires_after in
let max_of = List.fold_left Float.max 0. in
Xenops_utils.chunks max_parallel_atoms ops
|> List.map (fun ops -> ops |> expires_of |> max_of)
|> List.fold_left ( +. ) 0.
There was a problem hiding this comment.
You are right. This is more accurate.
There was a problem hiding this comment.
Is it correct to say the timeout calculation follows the structure of the operation? For parallel ops take the maximum, for sequential ops take the sum - and this is evaluated bottom up over this tree of operations.
There was a problem hiding this comment.
I understand the parallel ops are split as chunks. The real parallel happens within each chunk. Chunks are serialized. Like there are 100 threads in parallel from programmer's point of view, but actually only 10 CPU cores under the hood.
Hi @lindig. In the original code task79 is atom0 of task61. When it runs, it splits to 24 chunks, and executes in serial. In one chunks, 10 tasks executes in parallel. In every chunk, it calculates timeout, (will also be 60 min), so task 79 can complete(finally takes 82 min in the test), but task 61 can't wait so long. In the view of task61, it thinks all Nested_parallel(239 tasks) execute in parallel. But that is wrong. It splits to 24 chunks in serial. |
psafont
left a comment
There was a problem hiding this comment.
The code previously expected all the tasks to happen in parallel, instead of being serialized in chunks, where tasks are parallelized.
The new approach is more lax than necessary when calculating the timeout. This is because uses the timeout for the longest atom for each chunk, instead of taking the longest of each chunk, and then adding it. This might cause timeouts to trigger much later if a task gets hung, but it's difficult to know if it's going to be a big issue or not.
@psafont I will change to the calculation of Ming's comment, i.e. taking the longest of each chunk, and then adding it you mentioned and have a test. |
Signed-off-by: Changlei Li <changlei.li@citrix.com>
|
test: |
A test of VM with 239 VBDs (1 system disk + 238 iSCSI VBDs) fails because of
Timed out while waiting on task xxx. In the fail task:The task expiry calculation for parallel and nested_parallel doesn't consider the multi chunks. Just get the max expiry in the parallel task list. In fact the parallel task split to 24 chunks to run.