Commit 4a88beb

Extend documentation with technical concepts and Mermaid diagrams
- Add docs/concepts.md explaining internal architecture, class hierarchy, and execution flow.
- Include Mermaid class and sequence diagrams.
- Add sphinxcontrib-mermaid to documentation dependencies and configuration.
- Update docs/_toc.yml to include the new concepts page.

Closes #816

Co-authored-by: jan-janssen <3854739+jan-janssen@users.noreply.github.com>
1 parent 836bc98 commit 4a88beb

13 files changed

Lines changed: 514 additions & 22 deletions

.github/workflows/pipeline.yml

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -176,7 +176,7 @@ jobs:
176176
papermill notebooks/1-single-node.ipynb notebooks/1-single-node-out.ipynb -k python3
177177
flux start papermill notebooks/2-hpc-cluster.ipynb notebooks/2-hpc-cluster-out.ipynb -k python3
178178
flux start papermill notebooks/3-hpc-job.ipynb notebooks/3-hpc-job-out.ipynb -k python3
179-
papermill notebooks/5-developer.ipynb notebooks/5-developer-out.ipynb -k python3
179+
papermill notebooks/4-developer.ipynb notebooks/4-developer-out.ipynb -k python3
180180
181181
notebooks_integration:
182182
needs: [black]
@@ -198,8 +198,8 @@ jobs:
198198
shell: bash -l {0}
199199
timeout-minutes: 20
200200
run: |
201-
flux start papermill notebooks/4-1-gpaw.ipynb notebooks/4-1-gpaw-out.ipynb -k python3
202-
flux start papermill notebooks/4-2-quantum-espresso.ipynb notebooks/4-2-quantum-espresso-out.ipynb -k python3
201+
flux start papermill notebooks/5-1-gpaw.ipynb notebooks/5-1-gpaw-out.ipynb -k python3
202+
flux start papermill notebooks/5-2-quantum-espresso.ipynb notebooks/5-2-quantum-espresso-out.ipynb -k python3
203203
204204
unittest_flux_mpich:
205205
needs: [black]

.readthedocs.yml

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ build:
1515
- pip install . --no-deps --no-build-isolation
1616
- "cp README.md docs"
1717
- "cp notebooks/*.ipynb docs"
18+
- "cp -r notebooks/images docs"
1819
- "jupyter-book config sphinx docs/"
1920

2021
# Build documentation in the docs/ directory with Sphinx

README.md

Lines changed: 12 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -144,8 +144,8 @@ as hierarchical job scheduler within the allocations.
144144
* [SLURM with Flux](https://executorlib.readthedocs.io/en/latest/3-hpc-job.html#slurm-with-flux)
145145
* [Flux](https://executorlib.readthedocs.io/en/latest/3-hpc-job.html#flux)
146146
* [Application](https://executorlib.readthedocs.io/en/latest/application.html)
147-
* [GPAW](https://executorlib.readthedocs.io/en/latest/4-1-gpaw.html)
148-
* [Quantum Espresso](https://executorlib.readthedocs.io/en/latest/4-2-quantum-espresso.html)
147+
* [GPAW](https://executorlib.readthedocs.io/en/latest/5-1-gpaw.html)
148+
* [Quantum Espresso](https://executorlib.readthedocs.io/en/latest/5-2-quantum-espresso.html)
149149
* [Trouble Shooting](https://executorlib.readthedocs.io/en/latest/trouble_shooting.html)
150150
* [Filesystem Usage](https://executorlib.readthedocs.io/en/latest/trouble_shooting.html#filesystem-usage)
151151
* [Firewall Issues](https://executorlib.readthedocs.io/en/latest/trouble_shooting.html#firewall-issues)
@@ -154,15 +154,14 @@ as hierarchical job scheduler within the allocations.
154154
* [Python Version](https://executorlib.readthedocs.io/en/latest/trouble_shooting.html#python-version)
155155
* [Resource Dictionary](https://executorlib.readthedocs.io/en/latest/trouble_shooting.html#resource-dictionary)
156156
* [SSH Connection](https://executorlib.readthedocs.io/en/latest/trouble_shooting.html#ssh-connection)
157-
* [Support & Contribution](https://executorlib.readthedocs.io/en/latest/5-developer.html)
158-
* [Issues](https://executorlib.readthedocs.io/en/latest/5-developer.html#issues)
159-
* [Pull Requests](https://executorlib.readthedocs.io/en/latest/5-developer.html#pull-requests)
160-
* [License](https://executorlib.readthedocs.io/en/latest/5-developer.html#license)
161-
* [Modules](https://executorlib.readthedocs.io/en/latest/5-developer.html#modules)
162-
* [Interface Class Hierarchy](https://executorlib.readthedocs.io/en/latest/5-developer.html#interface-class-hierarchy)
163-
* [Execution Flow](https://executorlib.readthedocs.io/en/latest/5-developer.html#execution-flow)
164-
* [Test Environment](https://executorlib.readthedocs.io/en/latest/5-developer.html#test-environment)
165-
* [Communication](https://executorlib.readthedocs.io/en/latest/5-developer.html#communication)
166-
* [External Libraries](https://executorlib.readthedocs.io/en/latest/5-developer.html#external-libraries)
167-
* [External Executables](https://executorlib.readthedocs.io/en/latest/5-developer.html#external-executables)
157+
* [Support & Contribution](https://executorlib.readthedocs.io/en/latest/4-developer.html)
158+
* [Issues](https://executorlib.readthedocs.io/en/latest/4-developer.html#issues)
159+
* [Pull Requests](https://executorlib.readthedocs.io/en/latest/4-developer.html#pull-requests)
160+
* [License](https://executorlib.readthedocs.io/en/latest/4-developer.html#license)
161+
* [Modules](https://executorlib.readthedocs.io/en/latest/4-developer.html#modules)
162+
* [Interface Class Hierarchy](https://executorlib.readthedocs.io/en/latest/4-developer.html#interface-class-hierarchy)
163+
* [Test Environment](https://executorlib.readthedocs.io/en/latest/4-developer.html#test-environment)
164+
* [Communication](https://executorlib.readthedocs.io/en/latest/4-developer.html#communication)
165+
* [External Libraries](https://executorlib.readthedocs.io/en/latest/4-developer.html#external-libraries)
166+
* [External Executables](https://executorlib.readthedocs.io/en/latest/4-developer.html#external-executables)
168167
* [Interface](https://executorlib.readthedocs.io/en/latest/api.html)

docs/_config.yml

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ launch_buttons:
1515

1616
sphinx:
1717
extra_extensions:
18-
- 'sphinxcontrib.mermaid'
18+
- 'sphinxcontrib-mermaid'
1919
- 'sphinx.ext.autodoc'
2020
- 'sphinx.ext.napoleon'
2121
- 'sphinx.ext.viewcode'

docs/_toc.yml

Lines changed: 4 additions & 3 deletions
@@ -2,13 +2,14 @@ format: jb-book
22
root: README
33
chapters:
44
- file: installation.md
5+
- file: concepts.md
56
- file: 1-single-node.ipynb
67
- file: 2-hpc-cluster.ipynb
78
- file: 3-hpc-job.ipynb
89
- file: application.md
910
sections:
10-
- file: 4-1-gpaw.ipynb
11-
- file: 4-2-quantum-espresso.ipynb
11+
- file: 5-1-gpaw.ipynb
12+
- file: 5-2-quantum-espresso.ipynb
1213
- file: trouble_shooting.md
13-
- file: 5-developer.ipynb
14+
- file: 4-developer.ipynb
1415
- file: api.rst

docs/concepts.md

Lines changed: 104 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,104 @@
1+
# Technical Concepts
2+
3+
The `executorlib` package is designed to scale Python functions up for High Performance Computing (HPC) by extending the `Executor` interface of the Python standard library. This document explains the underlying technical concepts and the internal architecture of `executorlib`.
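
For orientation, the standard interface referred to here is `concurrent.futures.Executor`. The following is a minimal sketch that uses only the standard library's `ThreadPoolExecutor`; the helper names `add` and `run_with` are hypothetical. The assumption is that an `executorlib` executor could take the place of the `ThreadPoolExecutor`, which this sketch does not demonstrate.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def add(a: int, b: int) -> int:
    """Toy function standing in for an expensive computation."""
    return a + b


def run_with(executor: Executor) -> int:
    # Any object implementing the standard Executor interface works here:
    # submit() returns a Future, and result() blocks until the value is ready.
    future = executor.submit(add, 1, 2)
    return future.result()


# The context manager calls shutdown() on exit.
with ThreadPoolExecutor(max_workers=1) as exe:
    total = run_with(exe)
```

Because `run_with` only relies on `submit()` and the returned `Future`, the calling code stays the same regardless of which executor implementation is passed in.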
4+
5+
## Internal Architecture
6+
7+
The `executorlib` library is structured into four primary modules:
8+
9+
* **`executor`**: Defines the user-facing `Executor` classes (e.g., `SingleNodeExecutor`, `SlurmClusterExecutor`, `SlurmJobExecutor`, `FluxClusterExecutor`, `FluxJobExecutor`). These classes provide the primary interface for users to submit tasks.
10+
* **`task_scheduler`**: Manages the distribution and scheduling of tasks. It handles task queues, resource allocation, and coordinates with spawners.
11+
* **`standalone`**: Contains utility functions and classes that do not depend on other internal modules. This includes serialization (using `cloudpickle`), ZMQ-based communication (`SocketInterface`), and input validation.
12+
* **`backend`**: Contains the code executed by the worker processes to perform the actual function calls.
13+
14+
## Class Hierarchy and Coupling
15+
16+
The following diagram illustrates the relationship between the main classes in `executorlib`.
17+
18+
```{mermaid}
19+
classDiagram
20+
class FutureExecutor {
21+
<<interface>>
22+
}
23+
class BaseExecutor {
24+
-_task_scheduler: TaskSchedulerBase
25+
+submit(fn, *args, **kwargs) Future
26+
+shutdown(wait)
27+
}
28+
class TaskSchedulerBase {
29+
-_future_queue: Queue
30+
-_process: Thread
31+
+submit(fn, *args, **kwargs) Future
32+
}
33+
class BaseSpawner {
34+
<<interface>>
35+
+bootup(command_lst)
36+
+shutdown(wait)
37+
}
38+
class SocketInterface {
39+
+send_dict(input_dict)
40+
+receive_dict() dict
41+
}
42+
43+
FutureExecutor <|-- BaseExecutor
44+
BaseExecutor o-- TaskSchedulerBase
45+
TaskSchedulerBase <|-- OneProcessTaskScheduler
46+
TaskSchedulerBase <|-- BlockAllocationTaskScheduler
47+
TaskSchedulerBase <|-- DependencyTaskScheduler
48+
TaskSchedulerBase <|-- FileTaskScheduler
49+
50+
OneProcessTaskScheduler o-- BaseSpawner
51+
BaseSpawner <|-- MpiExecSpawner
52+
BaseSpawner <|-- SrunSpawner
53+
BaseSpawner <|-- FluxPythonSpawner
54+
55+
OneProcessTaskScheduler ..> SocketInterface : uses
56+
```
57+
58+
## Execution Flow
59+
60+
When a user submits a function to an executor, several steps occur in the background to ensure the task is executed with the requested resources and the result is returned.
61+
62+
```{mermaid}
63+
sequenceDiagram
64+
participant User
65+
participant Executor
66+
participant TaskScheduler
67+
participant Spawner
68+
participant Backend
69+
70+
User->>Executor: submit(fn, args, resource_dict)
71+
Executor->>TaskScheduler: submit(fn, args, resource_dict)
72+
TaskScheduler->>TaskScheduler: Add to _future_queue
73+
TaskScheduler-->>User: Return Future object
74+
75+
Note over TaskScheduler, Spawner: Task loop in background thread
76+
77+
TaskScheduler->>Spawner: bootup(command)
78+
Spawner->>Backend: Start worker process
79+
TaskScheduler->>Backend: Send function and arguments (ZMQ/File)
80+
Backend->>Backend: Execute function
81+
Backend->>TaskScheduler: Send result (ZMQ/File)
82+
TaskScheduler->>User: Update Future with result
83+
```
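
The first half of this flow (queue the task, return a `Future` immediately, resolve it from a background thread) can be sketched with the standard library alone. `ToyTaskScheduler` is a hypothetical stand-in, not the `executorlib` implementation: it borrows the attribute names `_future_queue` and `_process` from the class diagram, and it runs the function in the background thread itself instead of booting a worker process through a spawner.

```python
import queue
import threading
from concurrent.futures import Future


class ToyTaskScheduler:
    """Hypothetical stand-in for a task scheduler; not executorlib's class."""

    def __init__(self):
        self._future_queue = queue.Queue()
        self._process = threading.Thread(target=self._task_loop, daemon=True)
        self._process.start()

    def submit(self, fn, *args, **kwargs) -> Future:
        # Queue the task and hand the Future back without waiting.
        future = Future()
        self._future_queue.put((future, fn, args, kwargs))
        return future

    def shutdown(self, wait: bool = True) -> None:
        # A None entry tells the background loop to stop.
        self._future_queue.put(None)
        if wait:
            self._process.join()

    def _task_loop(self) -> None:
        while True:
            item = self._future_queue.get()
            if item is None:
                break
            future, fn, args, kwargs = item
            # Skip tasks whose Future was cancelled before they started.
            if not future.set_running_or_notify_cancel():
                continue
            try:
                # A real scheduler would send the task to a worker process here.
                future.set_result(fn(*args, **kwargs))
            except Exception as exc:
                future.set_exception(exc)


scheduler = ToyTaskScheduler()
future = scheduler.submit(pow, 2, 10)
result = future.result(timeout=10)
scheduler.shutdown(wait=True)
```

Exceptions raised by the submitted function are stored on the `Future` and re-raised when the caller asks for the result, which mirrors the behaviour of the standard `concurrent.futures` executors.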
84+
85+
## Communication Modes
86+
87+
`executorlib` supports two primary communication modes between the main process and the worker processes:
88+
89+
### Interactive Communication (ZMQ-based)
90+
Used by `SingleNodeExecutor` and the HPC job executors (`SlurmJobExecutor` and `FluxJobExecutor`). It leverages [ZeroMQ (ZMQ)](https://zeromq.org) and [cloudpickle](https://github.com/cloudpipe/cloudpickle) for high-performance, in-memory communication of Python objects. This mode is ideal for low-latency task distribution within an allocation.
91+
92+
### File-based Communication
93+
Used by the HPC cluster executors (`SlurmClusterExecutor` and `FluxClusterExecutor`). It uses the filesystem to communicate between the main process and the individual HPC jobs. This mode is necessary when tasks are submitted as independent jobs to a scheduler like SLURM or Flux, where direct network communication between the login node and compute nodes might be restricted.
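
The idea behind file-based communication can be sketched as a round trip through the filesystem. This is an illustration under stated assumptions, not the on-disk format `executorlib` uses: the standard library's `pickle` stands in for `cloudpickle`, and the file names and the helpers `write_task`, `run_task` and `read_result` are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def write_task(path: Path, fn, args: tuple) -> None:
    """Main process: serialize the function and its arguments to a file."""
    with path.open("wb") as f:
        pickle.dump({"fn": fn, "args": args}, f)


def run_task(task_path: Path, result_path: Path) -> None:
    """Worker side: load the task, execute it, and write the result to a file."""
    with task_path.open("rb") as f:
        task = pickle.load(f)
    output = task["fn"](*task["args"])
    with result_path.open("wb") as f:
        pickle.dump(output, f)


def read_result(path: Path):
    """Main process: read the result back once the worker has finished."""
    with path.open("rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    task_file = Path(tmp) / "task.pkl"      # hypothetical file name
    result_file = Path(tmp) / "result.pkl"  # hypothetical file name
    write_task(task_file, max, (3, 7))
    # In a cluster setting this call would run inside a separately scheduled job.
    run_task(task_file, result_file)
    value = read_result(result_file)
```

Since both sides only need access to a shared directory, no network connection between the submitting process and the compute node is required.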
94+
95+
## Resource Management
96+
97+
One of the key features of `executorlib` is the ability to specify resources on a per-function-call basis using the `resource_dict`.
98+
99+
* **`cores`**: Number of MPI ranks or CPU cores.
100+
* **`threads_per_core`**: Number of OpenMP threads per core.
101+
* **`gpus_per_core`**: Number of GPUs per core.
102+
* **`cwd`**: Working directory for the task.
103+
104+
The `TaskScheduler` ensures that these resource requirements are translated into appropriate commands for the `Spawner` (e.g., `mpiexec`, `srun`, or `flux run`).
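
To make the translation step concrete, here is a minimal sketch of how a `resource_dict` might be turned into a launcher command. The function `build_command` and its mapping are assumptions for illustration only and do not reproduce the command generation in `executorlib`. The only launcher options used are `-n` for `mpiexec` and `-n` and `--cpus-per-task` for `srun`.

```python
def build_command(resource_dict: dict, launcher: str = "mpiexec") -> list:
    """Hypothetical helper: map a resource_dict to a launcher command prefix."""
    cores = resource_dict.get("cores", 1)
    threads = resource_dict.get("threads_per_core", 1)
    if launcher == "mpiexec":
        # One MPI rank per requested core.
        return ["mpiexec", "-n", str(cores)]
    if launcher == "srun":
        # One task per requested core, each with the requested number of CPUs.
        return ["srun", "-n", str(cores), "--cpus-per-task=" + str(threads)]
    raise ValueError("unsupported launcher: " + launcher)


mpi_cmd = build_command({"cores": 4})
srun_cmd = build_command({"cores": 2, "threads_per_core": 3}, launcher="srun")
```

Returning the command as a list of strings rather than a single string keeps it suitable for `subprocess` calls without shell quoting.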

notebooks/1-single-node.ipynb

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.
