The convergence of artificial intelligence (AI) and high-performance computing (HPC) workflows [@workflows] is one of the key drivers of the rise of Python workflows for HPC. To avoid intrusive code changes, interfaces to performance-critical scientific software packages were traditionally implemented using file-based communication and control shell scripts, leading to poor maintainability, portability, and scalability. However, this approach is losing ground to more efficient alternatives, such as direct Python bindings, which are now increasingly common in scientific software packages and especially in machine learning packages and AI frameworks. These bindings enable the programmer to easily express complex workloads that require the orchestration of multiple codes.

Still, Python workflows for HPC come with their own challenges: (1) safely terminating Python processes, (2) controlling the resources of Python processes, and (3) managing Python environments [@pythonhpc]. The first two of these challenges can be addressed by developing strategies and tools to interface HPC job schedulers such as SLURM [@slurm] with Python, in order to control the execution and manage the computational resources required to execute heterogeneous HPC workflows. A number of Python workflow frameworks have been developed for both types of interfaces, ranging from domain-specific solutions for fields like high-throughput screening in computational materials science, e.g. fireworks [@fireworks], aiida [@aiida] and pyiron [@pyiron], to generalized Python interfaces for job schedulers [@myqueue; @psij] and task-scheduling frameworks that implement their own task scheduling on top of the HPC job scheduler, e.g. dask [@dask], parsl [@parsl] and jobflow [@jobflow]. While these tools can be powerful, they introduce new constructs that are unfamiliar to most Python developers, adding complexity and creating a barrier to entry.
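To make the traditional pattern concrete, the following is a minimal sketch of driving a job scheduler from Python through shell commands and file-based communication, which illustrates the maintainability problem described above. The job script `run_simulation.sh` and the output file `result.txt` are hypothetical placeholders; only the `sbatch` call and its `--wait` flag are standard SLURM.

```python
import subprocess

# Traditional pattern: submit a shell job script to SLURM via the sbatch
# command-line tool; --wait blocks until the job has finished.
# "run_simulation.sh" is a hypothetical placeholder for a real job script.
submission = subprocess.run(
    ["sbatch", "--wait", "run_simulation.sh"],
    capture_output=True,
    text=True,
    check=True,
)
print(submission.stdout)  # e.g. "Submitted batch job 123456"

# Results are exchanged through the file system rather than as Python
# objects, coupling the workflow to file formats and directory layouts.
# "result.txt" is likewise a hypothetical placeholder.
with open("result.txt") as handle:
    result = handle.read()
```

Direct Python bindings replace both steps of this pattern: the computation is invoked as a Python function call and its results are returned as Python objects, which is what makes the orchestration of multiple codes from a single workflow script tractable.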