Commit bd39a99

Extend documentation with technical concepts and Mermaid diagrams
- Add docs/concepts.md explaining internal architecture, class hierarchy, and execution flow.
- Include Mermaid class and sequence diagrams.
- Add sphinxcontrib-mermaid to documentation dependencies and configuration.
- Update docs/_toc.yml to include the new concepts page.

Co-authored-by: jan-janssen <3854739+jan-janssen@users.noreply.github.com>
1 parent 0829868 commit bd39a99

4 files changed

Lines changed: 107 additions & 0 deletions

.ci_support/environment-docs.yml

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ dependencies:
 - numpy
 - openmpi
 - sphinx
+- sphinxcontrib-mermaid
 - sphinx_rtd_theme
 - cloudpickle =3.1.2
 - h5py =3.16.0

docs/_config.yml

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ launch_buttons:

 sphinx:
   extra_extensions:
+    - 'sphinxcontrib.mermaid'
     - 'sphinx.ext.autodoc'
     - 'sphinx.ext.napoleon'
     - 'sphinx.ext.viewcode'

docs/_toc.yml

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@ format: jb-book
 root: README
 chapters:
 - file: installation.md
+- file: concepts.md
 - file: 1-single-node.ipynb
 - file: 2-hpc-cluster.ipynb
 - file: 3-hpc-job.ipynb

docs/concepts.md

Lines changed: 104 additions & 0 deletions
@@ -0,0 +1,104 @@
# Technical Concepts

The `executorlib` package scales up Python functions for High Performance Computing (HPC) by extending the `Executor` interface of the Python standard library. This document explains the underlying technical concepts and the internal architecture of `executorlib`.
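
For orientation, the snippet below shows the standard-library `Executor` interface that the text refers to, using `concurrent.futures.ThreadPoolExecutor`. It does not use `executorlib` itself; the helper function `add` is only an illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def add(a, b):
    """Toy function standing in for a real workload."""
    return a + b


# submit() returns a Future immediately; result() blocks until the call finished.
with ThreadPoolExecutor(max_workers=1) as exe:
    future = exe.submit(add, 1, 2)
    result = future.result()
```

The `submit()` call and the returned `Future` are the parts of the interface that the diagrams in this document build on.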

## Internal Architecture

The `executorlib` library is structured into four primary modules:

* **`executor`**: Defines the user-facing `Executor` classes (e.g., `SingleNodeExecutor`, `SlurmClusterExecutor`, `SlurmJobExecutor`, `FluxClusterExecutor`, `FluxJobExecutor`). These classes provide the primary interface for users to submit tasks.
* **`task_scheduler`**: Manages the distribution and scheduling of tasks. It handles task queues, resource allocation, and coordinates with spawners.
* **`standalone`**: Contains utility functions and classes that do not depend on other internal modules. This includes serialization (using `cloudpickle`), ZMQ-based communication (`SocketInterface`), and input validation.
* **`backend`**: Contains the code executed by the worker processes to perform the actual function calls.

## Class Hierarchy and Coupling

The following diagram illustrates the relationship between the main classes in `executorlib`.

```{mermaid}
classDiagram
    class FutureExecutor {
        <<interface>>
    }
    class BaseExecutor {
        -_task_scheduler: TaskSchedulerBase
        +submit(fn, *args, **kwargs) Future
        +shutdown(wait)
    }
    class TaskSchedulerBase {
        -_future_queue: Queue
        -_process: Thread
        +submit(fn, *args, **kwargs) Future
    }
    class BaseSpawner {
        <<interface>>
        +bootup(command_lst)
        +shutdown(wait)
    }
    class SocketInterface {
        +send_dict(input_dict)
        +receive_dict() dict
    }

    FutureExecutor <|-- BaseExecutor
    BaseExecutor o-- TaskSchedulerBase
    TaskSchedulerBase <|-- OneProcessTaskScheduler
    TaskSchedulerBase <|-- BlockAllocationTaskScheduler
    TaskSchedulerBase <|-- DependencyTaskScheduler
    TaskSchedulerBase <|-- FileTaskScheduler

    OneProcessTaskScheduler o-- BaseSpawner
    BaseSpawner <|-- MpiExecSpawner
    BaseSpawner <|-- SrunSpawner
    BaseSpawner <|-- FluxPythonSpawner

    OneProcessTaskScheduler ..> SocketInterface : uses
```
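
The relationships in the diagram can be sketched in plain Python. The classes below reuse the names from the diagram but are simplified stand-ins, not the `executorlib` implementation. `RecordingSpawner` and the placeholder command `["python", "worker.py"]` are hypothetical, and the function is executed inline instead of in a worker process.

```python
from abc import ABC, abstractmethod
from concurrent.futures import Executor, Future


class BaseSpawner(ABC):
    """Interface for starting and stopping worker processes."""

    @abstractmethod
    def bootup(self, command_lst): ...

    @abstractmethod
    def shutdown(self, wait=True): ...


class RecordingSpawner(BaseSpawner):
    """Hypothetical spawner that only records the commands it receives."""

    def __init__(self):
        self.commands = []
        self.running = False

    def bootup(self, command_lst):
        self.commands.append(list(command_lst))
        self.running = True

    def shutdown(self, wait=True):
        self.running = False


class TaskSchedulerBase:
    """Holds a spawner and turns submitted calls into futures."""

    def __init__(self, spawner):
        self._spawner = spawner

    def submit(self, fn, *args, **kwargs):
        future = Future()
        self._spawner.bootup(["python", "worker.py"])  # placeholder command
        try:
            future.set_result(fn(*args, **kwargs))  # run inline for brevity
        except Exception as exc:
            future.set_exception(exc)
        return future

    def shutdown(self, wait=True):
        self._spawner.shutdown(wait=wait)


class BaseExecutor(Executor):
    """User-facing executor that delegates everything to its task scheduler."""

    def __init__(self, task_scheduler):
        self._task_scheduler = task_scheduler

    def submit(self, fn, /, *args, **kwargs):
        return self._task_scheduler.submit(fn, *args, **kwargs)

    def shutdown(self, wait=True, *, cancel_futures=False):
        self._task_scheduler.shutdown(wait=wait)
```

Because `BaseExecutor` inherits from `concurrent.futures.Executor`, it can be used as a context manager, for example `with BaseExecutor(TaskSchedulerBase(RecordingSpawner())) as exe: exe.submit(sum, [1, 2, 3])`.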

## Execution Flow

When a user submits a function to an executor, several steps occur in the background to ensure the task is executed with the requested resources and the result is returned.

```{mermaid}
sequenceDiagram
    participant User
    participant Executor
    participant TaskScheduler
    participant Spawner
    participant Backend

    User->>Executor: submit(fn, args, resource_dict)
    Executor->>TaskScheduler: submit(fn, args, resource_dict)
    TaskScheduler->>TaskScheduler: Add to _future_queue
    TaskScheduler-->>User: Return Future object

    Note over TaskScheduler, Spawner: Task loop in background thread

    TaskScheduler->>Spawner: bootup(command)
    Spawner->>Backend: Start worker process
    TaskScheduler->>Backend: Send function and arguments (ZMQ/File)
    Backend->>Backend: Execute function
    Backend->>TaskScheduler: Send result (ZMQ/File)
    TaskScheduler->>User: Update Future with result
```
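
The queue-and-thread part of this flow can be illustrated with the standard library alone. `MiniTaskScheduler` is a hypothetical, heavily simplified class: it borrows the attribute names `_future_queue` and `_process` from the class diagram, but it runs the function inside the background thread, whereas the flow above hands it to a separate worker process.

```python
import queue
import threading
from concurrent.futures import Future


class MiniTaskScheduler:
    """Hypothetical scheduler: one queue plus one background thread."""

    def __init__(self):
        self._future_queue = queue.Queue()
        self._process = threading.Thread(target=self._task_loop, daemon=True)
        self._process.start()

    def submit(self, fn, *args, **kwargs):
        future = Future()
        self._future_queue.put(
            {"fn": fn, "args": args, "kwargs": kwargs, "future": future}
        )
        return future  # handed back before the task has run

    def _task_loop(self):
        while True:
            task = self._future_queue.get()
            if task is None:  # shutdown signal
                break
            future = task["future"]
            if not future.set_running_or_notify_cancel():
                continue  # cancelled while waiting in the queue
            try:
                result = task["fn"](*task["args"], **task["kwargs"])
                future.set_result(result)
            except Exception as exc:
                future.set_exception(exc)

    def shutdown(self, wait=True):
        self._future_queue.put(None)
        if wait:
            self._process.join()
```

The caller only ever sees the `Future`: `submit()` returns at once, and the background loop fills in either the result or the exception later.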

## Communication Modes

`executorlib` supports two primary communication modes between the main process and the worker processes:

### Interactive Communication (ZMQ-based)
Used by `SingleNodeExecutor` and the HPC job executors (`SlurmJobExecutor`, `FluxJobExecutor`). It uses [ZeroMQ (ZMQ)](https://zeromq.org) and [cloudpickle](https://github.com/cloudpipe/cloudpickle) to exchange Python objects in memory, without going through the filesystem. This mode is suited to low-latency task distribution within an allocation.
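
The request/reply pattern behind `send_dict()` and `receive_dict()` can be sketched without ZMQ. The example below is a stand-in under stated assumptions: it replaces the ZMQ socket with a `multiprocessing.Pipe`, `cloudpickle` with the standard `pickle` module, and the worker process with a thread. `PipeInterface`, `worker_loop`, and the dictionary keys (`fn`, `args`, `kwargs`, `result`, `error`, `shutdown`) are hypothetical names chosen for illustration.

```python
import pickle
import threading
from multiprocessing import Pipe


class PipeInterface:
    """Hypothetical wrapper exposing send_dict/receive_dict over a pipe."""

    def __init__(self, connection):
        self._connection = connection

    def send_dict(self, input_dict):
        # Serialize the whole message, including the function reference.
        self._connection.send_bytes(pickle.dumps(input_dict))

    def receive_dict(self):
        return pickle.loads(self._connection.recv_bytes())


def worker_loop(interface):
    """Answer one request after another until a shutdown message arrives."""
    while True:
        message = interface.receive_dict()
        if message.get("shutdown"):
            interface.send_dict({"result": True})
            break
        try:
            result = message["fn"](*message["args"], **message["kwargs"])
            interface.send_dict({"result": result})
        except Exception as exc:
            interface.send_dict({"error": exc})


main_end, worker_end = Pipe()
worker = threading.Thread(target=worker_loop, args=(PipeInterface(worker_end),))
worker.start()

main = PipeInterface(main_end)
main.send_dict({"fn": max, "args": (3, 7), "kwargs": {}})
reply = main.receive_dict()

main.send_dict({"shutdown": True})
main.receive_dict()
worker.join()
```

Plain `pickle` can only serialize functions that are importable by name, such as the built-in `max` used here. Serializing functions defined interactively is one reason a library would prefer `cloudpickle`.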

### File-based Communication
Used by the HPC cluster executors (`SlurmClusterExecutor`, `FluxClusterExecutor`). It uses the filesystem to communicate between the main process and the individual HPC jobs. This mode is necessary when tasks are submitted as independent jobs to a scheduler like SLURM or Flux, where direct network communication between the login node and compute nodes might be restricted.
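
A minimal sketch of the file-based idea, assuming a simple pickle file per task: the main process writes the task to disk, a separate process reads it, runs the function, and writes the result back. The function `run_via_files`, the file names, and the use of a local subprocess in place of a scheduler job are all assumptions for illustration, not the on-disk format used by `executorlib`.

```python
import pickle
import subprocess
import sys
import tempfile
from pathlib import Path

# Code executed by the separate worker process: read task, run it, write result.
WORKER_CODE = """
import pickle, sys
with open(sys.argv[1], "rb") as f:
    task = pickle.load(f)
result = task["fn"](*task["args"], **task["kwargs"])
with open(sys.argv[2], "wb") as f:
    pickle.dump(result, f)
"""


def run_via_files(fn, *args, **kwargs):
    """Run fn in another process, exchanging data only through files."""
    with tempfile.TemporaryDirectory() as tmp:
        task_file = Path(tmp) / "task_in.pkl"
        result_file = Path(tmp) / "task_out.pkl"
        task_file.write_bytes(
            pickle.dumps({"fn": fn, "args": args, "kwargs": kwargs})
        )
        # On a cluster this step would be a job submission to the scheduler.
        subprocess.run(
            [sys.executable, "-c", WORKER_CODE, str(task_file), str(result_file)],
            check=True,
        )
        return pickle.loads(result_file.read_bytes())
```

For example, `run_via_files(sum, [1, 2, 3])` writes the task, waits for the worker to finish, and reads the result from the output file. No socket connection between the two processes is needed, which is the property that matters when login and compute nodes cannot reach each other directly.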

## Resource Management

A key feature of `executorlib` is that resources can be specified per function call through the `resource_dict`.

* **`cores`**: Number of MPI ranks or CPU cores.
* **`threads_per_core`**: Number of OpenMP threads per core.
* **`gpus_per_core`**: Number of GPUs per core.
* **`cwd`**: Working directory for the task.

The `TaskScheduler` translates these resource requirements into the appropriate commands for the `Spawner` (e.g., `mpiexec`, `srun`, or `flux run`).
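
How such a translation might look is sketched below. The function `build_command` and the exact mapping of keys to flags are assumptions for illustration and may differ from what `executorlib` generates. The flags themselves (`mpiexec -n`, `srun -n`, `--cpus-per-task`, `--gpus-per-task`) are standard launcher options.

```python
def build_command(resource_dict, worker_command, launcher="mpiexec"):
    """Prefix a worker command with a launcher call derived from resource_dict.

    Hypothetical helper: the key-to-flag mapping is an assumption.
    """
    cores = resource_dict.get("cores", 1)
    threads = resource_dict.get("threads_per_core", 1)
    gpus = resource_dict.get("gpus_per_core", 0)

    if launcher == "mpiexec":
        prefix = ["mpiexec", "-n", str(cores)]
    elif launcher == "srun":
        prefix = ["srun", "-n", str(cores), f"--cpus-per-task={threads}"]
        if gpus > 0:
            prefix.append(f"--gpus-per-task={gpus}")
    else:
        raise ValueError(f"unknown launcher: {launcher}")

    return prefix + list(worker_command)


command = build_command(
    {"cores": 2, "threads_per_core": 3, "gpus_per_core": 1},
    ["python", "worker.py"],
    launcher="srun",
)
```

The `cwd` entry is not part of the command in this sketch. It would typically be passed as the working directory when the process is started, for example through the `cwd` argument of `subprocess.Popen`.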
