
Commit 8ccc39e

Update paper.md
1 parent 37664c9 commit 8ccc39e

1 file changed

Lines changed: 1 addition & 16 deletions

File tree

docs/paper/paper.md

# Features and Implementation
Based on prior experience with the development of the pyiron workflow framework [@pyiron], the design philosophy of Executorlib is centered on the timeless principle of not reinventing the wheel. Rather than implementing its own job scheduler, Executorlib leverages existing job schedulers to request and manage Python processes and the associated computing resources. Further, instead of defining new syntax and concepts, Executorlib extends the existing interface of the Executor class in the Python standard library. Taken together, this makes changing the mode of execution in Executorlib as easy as changing the Executor class, while the interface remains the same.
4343

## Example
To illustrate the usage of Executorlib and explain the technical processes occurring in the background, we consider the simple example of doubling the numbers in the range from 1 to 4 on a laptop, a local workstation, or a single compute node of an HPC system. With Executorlib, this can be achieved using the following code:
46-
```python
from executorlib import SingleNodeExecutor

with SingleNodeExecutor() as exe:
    # submit each summation for asynchronous execution in a separate process
    future_lst = [exe.submit(sum, [i, i]) for i in range(1, 5)]
    # result() blocks until the corresponding submission has finished
    print([fs.result() for fs in future_lst])
```
In this example, each individual summation is executed concurrently in a separate process. We note the strict adherence to the standard Python Executor interface: the example remains functional when the Executorlib `SingleNodeExecutor()` object is replaced with either the `ThreadPoolExecutor` or the `ProcessPoolExecutor` from the Python standard library. Following the initialization of the Executor context with the with statement, the summation function `sum` is submitted for execution with the argument `[i, i]`, generated by the for loop iterating over the range 1 to 4. \autoref{fig:process} illustrates the internal functionality of Executorlib. The submission function `submit()` requests the computational resources from the job scheduler, which can be SLURM, Flux, or a local job scheduler, depending on the choice of Executor class. The Python function is then executed asynchronously in a newly created Python process. The user interacts with this asynchronously executing Python process (on the right in \autoref{fig:process}) through the `concurrent.futures.Future` object returned by the submission function, again as defined by the Python standard library. In the code example above, the `concurrent.futures.Future` object is named `fs`; it offers a function to check the status of the asynchronous Python process (`done()`) and a function to block the calling process until the asynchronous execution is completed (`result()`). In contrast to the standard library Executors, however, the Executorlib Executors allow for execution across multiple nodes of HPC systems, which enables highly compute-intensive workloads that require extensive computational resources, as we now show.
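
To make this interchangeability concrete, here is the same computation with the standard library `ProcessPoolExecutor`; only the import and the Executor class change (the `__main__` guard is required whenever a process-based Executor is used in a script):

```python
from concurrent.futures import ProcessPoolExecutor

if __name__ == "__main__":  # required for process-based executors in scripts
    with ProcessPoolExecutor() as exe:
        future_lst = [exe.submit(sum, [i, i]) for i in range(1, 5)]
        print([fs.result() for fs in future_lst])
```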
![Illustration of the communication between the Executorlib Executor, the job scheduler and the Python process to asynchronously execute the submitted Python function (on the right).\label{fig:process}](process.png){width="50%"}

## Job Schedulers
Currently, Executorlib supports five different job schedulers, implemented as different Executor classes. The first is the `SingleNodeExecutor` for rapid prototyping on a laptop or local workstation, in a way that is functionally similar to the standard `ProcessPoolExecutor`. The second, the `SlurmClusterExecutor`, submits Python functions as individual jobs to a SLURM job scheduler using the `sbatch` command, which can be useful for long-running tasks, e.g., tasks that call a compute-intensive legacy code. This mode also has the advantage that the required hardware resources do not all have to be secured prior to launching the workflow and can naturally vary over time. The third is the `SlurmJobExecutor`, which distributes Python functions within an existing SLURM job using the `srun` command. It can be nested in a function submitted to a `SlurmClusterExecutor` to increase the computational efficiency of shorter tasks, as the already requested computing resources are sub-divided rather than new computing resources being requested from the SLURM job scheduler. Analogously, the `FluxClusterExecutor` submits Python functions as individual jobs to a Flux job scheduler, and the `FluxJobExecutor` distributes Python functions within a Flux job. Given the hierarchical approach of the Flux scheduler, there is no limit to the number of `FluxJobExecutor` instances that can be nested inside each other to construct hierarchical workflows. Finally, the `FluxJobExecutor` can also be nested in a `SlurmClusterExecutor`, and it is commonly more efficient than the `SlurmJobExecutor` as it uses the Flux resource manager rather than communicating with the central SLURM job scheduler through the `srun` command. While the `SlurmClusterExecutor` and the `FluxClusterExecutor` use file-based communication under the hood (the Python function to execute and its inputs are stored on the file system, the function is executed in a separate Python process, and its output is again stored in a file), the other Executors rely on socket-based communication to improve computational efficiency.
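
As a minimal sketch of such nesting (assuming the default constructor arguments are sufficient; scheduler-specific options such as queues, run time limits, or cache directories are omitted here), a function submitted to the `SlurmClusterExecutor` can internally open a `FluxJobExecutor` to sub-divide its allocation among shorter tasks:

```python
from executorlib import SlurmClusterExecutor


def doubled_numbers(n):
    # sub-divide the already requested SLURM allocation with Flux, instead of
    # requesting new resources from the central SLURM job scheduler
    from executorlib import FluxJobExecutor

    with FluxJobExecutor() as inner:
        future_lst = [inner.submit(sum, [i, i]) for i in range(1, n + 1)]
        return [fs.result() for fs in future_lst]


# the outer Executor submits the whole function as a single sbatch job
with SlurmClusterExecutor() as exe:
    print(exe.submit(doubled_numbers, 4).result())
```
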
## Resource Assignment
To assign dedicated computing resources to individual Python functions, the Executorlib Executor classes extend the submission function `submit()` to accept not only the Python function and its inputs, but also a Python dictionary specifying the requested computing resources, `resource_dict`. The resource dictionary can define the number of compute cores, the number of threads, and the number of GPUs, as well as job scheduler specific parameters like the working directory, the maximum run time, or the maximum memory. With this hierarchical approach, Executorlib allows the user to finely control the execution of each individual Python function, using parallel communication libraries like the Message Passing Interface (MPI) for Python [@mpi4py] or GPU-optimized libraries to aggressively optimize complex compute-intensive tasks of heterogeneous HPC workflows that are best solved by tightly coupled parallelization approaches, while offering a simple and easy-to-maintain approach to the orchestration of many such weakly coupled tasks. This ability to seamlessly combine different programming models further accelerates the rapid prototyping of heterogeneous HPC workflows without sacrificing the performance of critical code components.
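
The following sketch illustrates this resource assignment; the `cores` key follows the description above, and the exact set of supported keys should be checked against the Executorlib documentation:

```python
from executorlib import SingleNodeExecutor


def get_rank_and_size():
    # with more than one core assigned, the function can use the Message
    # Passing Interface (MPI) for Python to coordinate between its ranks
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    return comm.Get_rank(), comm.Get_size()


with SingleNodeExecutor() as exe:
    # request two compute cores for this single function call
    fs = exe.submit(get_rank_and_size, resource_dict={"cores": 2})
    print(fs.result())
```

Scheduler-specific parameters such as the working directory, the maximum run time, or the maximum memory would be passed through the same dictionary.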

# Usage To-Date
While initially developed in the US DOE Exascale Computing Project’s Exascale Atomistic Capability for Accuracy, Length and Time (EXAALT) project to accelerate the development of computational materials science simulation workflows for the exascale, Executorlib has since been generalized to support a wide range of backends and HPC clusters at different scales. Based on this generalization, it has also been integrated into the pyiron workflow framework [@pyiron] as the primary task scheduling interface.
